Wednesday 29 January 2014

Using Google Maps with Blogger



In a previous post, entitled A Grave Too Far, you may have noticed a Google Map. How many people noticed, though, that it wasn’t a simple static image? How do you embed an active, or “live”, map that can be scrolled, magnified, etc?



There are basically two ways that a Google Map can be embedded in a blog post. The easiest, and the most common, is to simply display a static image captured from Google Maps.

Just position and size the relevant view in Google Maps. Remove any unwanted location markers using the menu in the top right-hand corner, close the left panel down, and use Ctrl+Alt+PrtSc (under Windows) to capture the browser’s image. You’ll need to edit the captured image to remove peripheral detail of the browser window, but you can then insert the result into your blog as with any other image.

In the example below, I’ve also added a link below the image, entitled ‘View Larger Map’, that will take the reader to the full-blown map if they so choose. The URL to associate with this link is provided for you in Google Maps: simply click the ‘SHARE’ option in the left-hand panel, and copy the URL from the ‘Share link’ box.


This is also the mechanism you would use if you wanted to embed the map into a Word document as there is no way, currently, of embedding a live map there.

In a blog post, though, you have another option. This time, copy the HTML from the ‘Embed map’ box in that Google Maps ‘SHARE’ window. Put a recognisable place-marker in your blog text at the position you want the embedded map (e.g. “[GoogleMap]”) before switching from the Compose view to the HTML view. This may look a bit scary now as you’re looking at the raw HTML for your post. Search for your place-marker and replace it completely with that HTML from Google Maps. Return to the Compose view.
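For reference, the embed HTML that Google Maps supplies is essentially a single <iframe> element, something like the sample below; the src value is generated for you, and the exact attributes may vary, so treat this purely as an illustration of what to expect:

<iframe src="https://www.google.com/maps/embed?pb=..." width="425" height="350" frameborder="0" style="border:0"></iframe>

In other words, despite the scary appearance of the HTML view, you’re only pasting one self-contained element into your post.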


You’ll notice that a very similar ‘View Larger Map’ link is provided automatically below the embedded map. In fact, the main visible difference between this live map and the static image is the presence of the navigation and magnification control in the corner.

Some tips: In Google Maps, you can customise the embedded map before copying its HTML (see the option at the bottom of the ‘Link’ page), but the default ‘Medium’ size works best for a blog post. When editing your blog, don’t replace the place-marker until you have finished your other edits; when there are other pictures in the blog post, I have had great trouble chasing them around the Compose window and trying to keep them from moving of their own accord. This is a Blogger issue rather than anything you’ve done wrong. Finally, the embedded map that you’ll see in the Compose window, and during any preview, is just a snapshot rather than a live map, and it will not respond to your cursor. Don’t worry, though, as it will work in the published edition.

You’ll agree that the live map is much richer than the static image. However, Google Maps only provides modern-day maps at the current time. In principle, other sites could use a similar mechanism to give access to historical maps, and this would reduce the plethora of copies floating around the Internet, many of which carry no citation of their original source and no attribution. Google have set a good precedent here which could be adopted on a greater scale.

**** Updated 24 Feb 2017 **** The option to share has moved again.

Saturday 25 January 2014

Revisionism in Genealogy



There is a maxim that ‘History is written by the survivors’[1], and so it should be no surprise that genealogists are occasionally guilty of recording history the way they would like it to be. You could be surprised, however, at the various reasons for this.

We like to think that modern generations in the Western world are thick-skinned regarding information uncovered during research into their family history. We all know that some things were covered up, and not spoken of, in older times to avoid social embarrassment, such as illegitimate births. Serious genealogists strive to be objective about what they might find, but there is often a threshold at which they might still balk. In my personal research, I have encountered desertions (from both family and military), attempted murders, suicides, prosecutions for child neglect, death by alcohol, and numerous prison sentences, but I record them all. You have to be prepared for whatever you might find in your sources.


Figure 1 - Cast from The Munsters TV series[2]


But suppose I also had one possible case of incest somewhere in my data. Although I would not be upset by that myself, I would not document it in any shared material in case it upset members of my extended family. Hence, sensitivity is one potential reason for trying to change, or at least hide, the real history.

In some cultures, this sensitivity may be more acute and may still apply today. If someone finds their ancestors were on the wrong side of a civil war, or had affiliations to the wrong religion, or were part of the wrong caste or social group, or had different ethnic origins, then they may try to hide the details, or they may even try to destroy the physical evidence. As DNA analysis becomes more common in other parts of the world, I see it potentially being used in a divisive manner to root out and exclude people with the wrong ancestry. As long as there are tags, labels, beliefs, and creeds by which we can be differentiated then there will be people who will strive to exercise them.

If you had found that a long-dead ancestor was guilty of war crimes then could this put living people’s lives in danger? I have certainly heard of cases where a researcher, by being clever enough to identify and/or locate someone’s estranged spouse, has put that person’s life in danger. Hence, sensing a clear and present danger is another potential reason.

Of course, these issues only really apply to people performing historical research, including family history and micro-history. Anyone who is content with a mere family tree probably doesn’t have to worry as much, … or do they? If a couple were previously married to other people then most of us would also record their prior marriages, but what if they were never released from those marriages? What if the couple were never married at all? We’re now getting into the software realm, and the impact that a less-than-adequate product may have. It could be argued that the convictions of the software author, or the ethos of the organisation behind its development, may play a part in which aspects of real life it will address.

Irrespective of whether your software product deliberately glosses over the complexities of real life, its scope and design may inadvertently distort the past. What if this hypothetical couple had multiple marriage dates, as in separate civil and religious ceremonies? Could your product accommodate that without getting upset? There’s a whole gamut of detail that we may be unable to record precisely due to software limitations, such as simply having alternative names, including professional names, nicknames, etc. The same applies to having alternate spellings of names, or different versions in different languages. What about adoption, changes of sex, polygamy, same-sex unions, or simply wanting to record some significant event type that your product doesn’t know about?

There are probably two related software issues here: the fact that we only have a lineage-linked data format for sharing (i.e. GEDCOM), and the reluctance of vendors to take on a wider scope. GEDCOM is now inadequate in a number of ways, but arguably the most important of these is that it records lineage rather than history. For people, like myself, who want to record many aspects of history (including narrative, places, events, and unrelated people), this makes GEDCOM useless. Unfortunately, there are still many people who insist ‘We don’t need new standards — GEDCOM is OK’, which effectively means ‘I’m all right Jack. I don’t want to know your problems’. Some vendors, though, hide behind GEDCOM compatibility as a shield against having to consider any radically new direction in their product development, occasionally using the circular argument that ‘There’s no proven market for that’. I discussed this in an older post at: The Commercial Realities of Data Standards.

By far the weakest reason for distorting history is a personal desire to simplify things, either to keep it all cosy and rosy, or because the researcher was simply too tired to try to capture, and understand, the essence of that history. I have heard tales of people not being interested in recording someone’s birth family because adoptive parents had taken their place in all aspects of their life, or conversely of someone who wasn’t interested in adoptive parents as they weren’t blood relations. I find this blinkered approach particularly insidious, so you can imagine my horror when I heard several such declarations in one discussion thread last year. Before I describe these, I want to emphasise that they were from long-standing and respected genealogists.

The idea that some women hyphenated their names after marriage, or that some husbands adopted their wife’s surname, was greeted with derision, and a flat refusal to record either practice in favour of the “standard” convention. This was further justified with the statement that anyone disagreeing could go and look in someone else’s genealogy. The suggestion that the case of surnames in same-sex unions was not covered by any convention was greeted with the claim that these were not part of genealogy, and so irrelevant. The issue of having versions of a name in multiple languages, and with possible variations in spelling, was considered far too complex, with one participant declaring they would simply standardise such a name to the modern Anglicised version. Parts of this thread even suggested that the consideration of medieval names, for which the surname concept didn’t yet exist, was a waste of time as so few people could reliably trace their ancestry back that far.

Part of the reason for these declarations was an irrational fear that catering for the generic cases would increase the cost and complexity of their software products. Another stated reason, though, was the belief that the parts of the world where unusual customs and traditions are commonplace were not big users of genealogy software. Hence, their own parochial, English-speaking history should not be affected. The fact that a huge number of people in, say, the US — where genealogy has been shown to be very popular — would find that their ancestors came from those other parts of the world shows this to be a staggeringly naïve viewpoint.

So let’s not lose sight of the bigger picture.  As genealogists, we must stay faithful to the evidence we uncover, and that means we need a way of marking sensitive information in order to control, or even prevent, sharing. Further than that, though, we need to be able to address real history by place and event, not just by person and lineage. Finally, we also need to take a generic stance with regard to other lifestyles, other cultures and other time periods.




[1] The maxim is credited to Social Forces (Oct 1931) by the Yale Book of Quotations, ed. Fred R. Shapiro (R. R. Donnelley & Sons, 2006) and to Max Lerner (1902-1992) by the Columbia Dictionary of Quotations. It is a derivative of ‘History is written by the victors’ (or /winners/conquerors) which, although often credited to Sir Winston Churchill, is probably much older and its true origin is unknown.
[2] Publicity photo from 28 Aug 1964 of the cast of the TV program The Munsters. Seated, from left: Butch Patrick (Eddie), Yvonne DeCarlo (Lilly), Fred Gwynne (Herman). Standing: Beverly Owen (Marilyn) and Al Lewis (Grandpa). Attribution: By CBS Television (eBay item photo front photo back) [Public domain], via Wikimedia Commons (https://commons.wikimedia.org/wiki/File%3AMunsters_cast_1964.JPG : accessed 21 Jan 2014). NB: My preferred image from this program wasn't in the public domain. I managed to track down the current copyright holder, Universal Studios, who advised me that a single image, displayed for non-commercial use, irrespective of resolution, would cost me $250, and that I would have to register at https://www.universalclips.com in order to obtain a licence. I felt this was unrealistic given that I'm not making a documentary or writing a book. Effectively, licensing policies like this do not cater for such Web-based "sampling" by individuals, and so it's hardly surprising that they get ignored.

Wednesday 22 January 2014

Using Feedburner with Blogger



Once you have your Blogger account working, you may come across a free tool provided by Google called Feedburner, but what is it? Even if you already use it, you may not be quite sure what it does. As well as trying to explain this as simply as I can, I also want to highlight some potential log-jams that you may encounter.


Once you have started publishing to your blog, you will want people to find and read it. Relying on them finding it by accident through a Web search is not really going to work, so you will be sharing each post into news streams such as those in Google+ (Circles and Communities) and Facebook. However, it’s easy to miss stuff in these streams, and if someone wants to follow your every word then they will need a way to subscribe and be notified when something new has been published.

RSS (Really Simple Syndication) and Atom are both mechanisms for the syndication of Web feeds. In other words, they allow changes on one site to be syndicated to one-or-more other sites. In the context of a blog, this simply means that they allow users to see when new material has been published. When users subscribe to a Web feed, they may see these changes through a feed reader (such as Google Reader, Bloglines, or NetVibes) or by email. A feed reader is sometimes called an aggregator program as it allows you to bring together content from multiple feeds (e.g. from different blogs) so that you can view them all in one place.

A feed reader just needs the URL of the target blog (e.g. http://parallax-viewpoint.blogspot.com) for it to subscribe. It’s not unlike an email program in that it periodically checks your subscribed feeds and shows the number of new posts. You can then decide to selectively download and read them. This is all very well if you’re familiar with those tools, or you don’t mind learning new tools. However, many people would prefer to just receive a simple email containing the latest blog post when it appears.
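As an aside, if you’re curious about what a feed reader actually does, here’s a minimal Python sketch using the freely available feedparser library (which copes with both RSS and Atom); the feed URL is my own blog’s, as used in the examples below:

import feedparser  # third-party library: pip install feedparser

# Fetch and parse the blog's feed.
feed = feedparser.parse("http://parallax-viewpoint.blogspot.com/feeds/posts/default")

print(feed.feed.title)          # the blog's title
for entry in feed.entries[:5]:  # the five most recent posts
    print(entry.published, entry.title, entry.link)

A real feed reader simply repeats something like this on a schedule, remembers which entries it has already seen, and shows you the new ones.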

So where does Feedburner fit into this? Well, Blogger can publish updates via four different feed URLs, such as:

Atom feeds:-
http://parallax-viewpoint.blogspot.com/feeds/posts/default
http://parallax-viewpoint.blogspot.com/atom.xml
RSS feeds:-
http://parallax-viewpoint.blogspot.com/feeds/posts/default?alt=rss
http://parallax-viewpoint.blogspot.com/rss.xml

It’s possible for other sites and tools to make use of any of these, and so you would have no way of seeing the total number of your subscribers. Those subscribers may not see identically rendered copies of your post, either. What Feedburner does is redirect all these URLs to a new data feed of its own, such as:

http://feeds.feedburner.com/blogspot/xxxxxx

The xxxxxx is a string of characters generated for you. This redirection means your subscribers will all feed from the same place, and Feedburner can then generate subscriber statistics for you.

In order to set this up, you must first have a Feedburner account. Go to Feedburner and sign in with your Google Account. Put your blog URL (e.g. http://parallax-viewpoint.blogspot.com/) into the ‘Burn a Feed Right This Instant’ box and click ‘Next’. Specify a feed title, and take note of the ‘Feed Address’ that it creates for you as you’ll need to tell Blogger about it. The Feedburner account should now show up in your Google dashboard (https://www.google.com/settings/dashboard) along with your Blogger account, etc.

Then go to your Blogger dashboard. Select Settings→Other from the left panel, and go to the Site Feed section. In the ‘Post Feed Redirect URL’ field, enter the URL of your Feedburner feed (e.g. http://feeds.feedburner.com/blogspot/xxxxxx). Set the ‘Allow Blog Feed’ field to “Full”.
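If you later want to verify that the redirection is working, the following minimal Python sketch (using the freely available requests library) fetches the original Blogger feed, follows any redirects, and reports where it ends up; assuming the ‘Post Feed Redirect URL’ has taken effect, the final address should be the Feedburner one:

import requests  # third-party library: pip install requests

# Request the original Blogger feed; redirects are followed automatically.
response = requests.get("http://parallax-viewpoint.blogspot.com/feeds/posts/default")
print(response.url)  # should now show the feeds.feedburner.com address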

OK, you now have a Feedburner feed. Now let’s make it easy for email subscribers. Go back to your Feedburner account, select the Publicize tab, and then the ‘Email Subscription’ entry on the left. Click ‘Activate’. You can now customise some settings under the ‘Email Subscription’ section such as the title/body used for confirmation emails, the title used for notification emails, and the time-of-day when notifications should be sent.

Back in the Blogger dashboard, select Layout on the left. Pick a panel (usually the right panel) and ‘Add a Gadget’. Choose the ‘Follow By Email’ gadget, provide a label for the email address field, and specify the Feedburner feed URL, as shown above. This will provide a very simple field on each blog page into which a user can enter an email address. Those users will be sent a confirmation email which they must respond to in order to receive email notifications from your blog.

So far, so good; I hope. A quick search of the Internet, though, shows lots of people have problems getting email subscription working, so what’s the problem? Well, Feedburner keeps a copy of the last few blog posts so that it can compare them with the latest information from your original blog feed, and so only notify people of updates. However, it has a space limit of 512KB. Note that this is the size of the HTML rather than of your original text, and it does not include any images or attachments. It’s therefore a little difficult to gauge. The two main issues are exceeding this size limit, and having unrecognisable content in your feed.
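If you suspect that you’re close to that limit, a quick way to check is to download the original feed and measure it; here’s a minimal Python sketch using only the standard library:

import urllib.request

# Download the original Blogger feed and report its size in kilobytes.
with urllib.request.urlopen("http://parallax-viewpoint.blogspot.com/feeds/posts/default") as f:
    data = f.read()

print(len(data) // 1024, "KB")  # Feedburner will reject anything over 512KB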

Preparation of your blog post in Microsoft Word can be a cause of both of these issues, but this has already been covered in my previous post at: Using Microsoft Word with Blogger.

In the Feedburner dashboard, under the Troubleshootize tab, there are tools to validate the original (Blogger) feed and the Feedburner feed. If these show errors such as the following then there’s a simple explanation:

Undefined entry element: georss:featurename 3 occurrences
... 2" width="72" /><thr:total>0</thr:total><georss:featurename>Naples, FL,

Undefined entry element: georss:box 3 occurrences
nt>26.1420358 -81.7948103</georss:point><georss:box>25.913972299999998

The problem here is that you’ve set a Location in the ‘Post Settings’ down the right-hand side of your blog post. This generates geocoding data for you but, at the time of writing, it’s not in a format expected by Feedburner, and so errors are thrown up as a result. If you simply unset that Location property on your post then this error should go away.

Another reason for exceeding the size limitation is if you generate a lot of lengthy posts, or even a few very lengthy ones. By default, Blogger provides details of the last 25 posts to Feedburner, and the sum total of these may be too great. This can be restricted with a parameter on the end of your original feed address. Click the ‘Edit feed details…’ button at the top of the page, and add a max-results parameter to the end of the ‘Original Feed’ address, such as:

http://parallax-viewpoint.blogspot.com/feeds/posts/default?max-results=3

I specified a very low value for myself since, although I only generate about one post per week, they tend to be quite large.

Sunday 19 January 2014

You’re Probably Right



When anyone mentions statistics or probability being applied to genealogical research, there’s usually a sharp reaction. There are some valid questions here that would benefit from thoughtful discussion but, unfortunately, the knee-jerk reactions tend to be for all the wrong reasons.


It’s hard to find a single reason why this topic gets such an adverse reaction since the arguments made against it are rarely put together very carefully. I have seen some reactions based purely on the fear that any application of numbers means that assessments will be estimated to an inappropriate level of precision, such as 12.8732%. That’s just ludicrous, of course!

In this post, I won’t actually be making a case for the use of statistics since I am still experimenting with an implementation of this myself and it isn’t straightforward. What I will try to do is identify what is and is not open to debate, and ideally add some degree of clarity. Although I have a mathematical background, it only briefly touched on statistics. It is a specialist field, and many folks will have a skewed picture of it, whether they’re mathematically inclined or not. It is also a technical field and so a few symbols and numbers are inevitable, but I will try to balance things with real-life illustrations.

Statistics is generally about the collection and analysis of data. Despite what politicians might have us believe, statistics proves nothing, and this is important for the purposes of this article. Statistical analysis can demonstrate a correlation between two sets of data but it cannot indicate whether either is a consequence of the other, or whether they both depend on something else. The classic example is data that shows a correlation between the sales of sunglasses and ice-cream — it doesn’t imply that the wearing of sunglasses is necessary for the eating of ice-cream.

Mathematical statistics is about the mathematical treatment of probability, but there is more than one interpretation of probability. The standard interpretation, called frequentist probability, uses it as a measure of the frequency or chance of something happening. Taking the roll of a die as a simple example, we can calculate the number of ways that it can fall and so attribute a probability to each face (1/6, or roughly 16.7%). Alternatively, we could look at past performance of the die and use that to determine the probabilities; a method that works better in the case where a die is weighted. When dealing with the individual events (e.g. each roll of the die), they may be independent of one another, or dependent on previous events. A real-life demonstration of independent events would be the roulette wheel. If the ball had fallen on red 20 times then we’d all instinctively bet on black next, even though the red/black probability is unchanged. Conversely, if you’d selected 20 red cards from a deck of playing cards then the probability of a black being next has increased.
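To make that distinction concrete, here’s a minimal Python sketch; note that the roulette figure assumes a single-zero European wheel (18 red, 18 black, and 1 green zero), which I didn’t specify above, so treat the exact numbers as illustrative:

# Roulette: each spin is independent, so 20 reds in a row change nothing.
p_black_roulette = 18 / 37               # the same on every spin

# Cards: each draw removes a card, so successive draws are dependent.
p_black_after_20_reds = 26 / (52 - 20)   # all 26 blacks remain among 32 cards

print(p_black_roulette)        # 0.486..., unchanged by the run of reds
print(p_black_after_20_reds)   # 0.8125, up from the initial 0.5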

The other major interpretation of probability is called Bayesian probability after the Rev. Thomas Bayes (1701–1761), a mathematician and theologian who first provided a theorem to express how a subjective degree of belief should change to account for new evidence. His work was later developed further by the famous French mathematician and astronomer Pierre-Simon, marquis de Laplace (1749–1827). It is this view of probability, rather than anything to do with frequency or chance, which is relevant to inferential disciplines such as genealogy. Essentially, a Bayesian probability represents a state of knowledge about something, such as a degree of confidence. This is where it gets philosophically interesting because some people (the objectivists) consider it to be a natural extension of traditional Boolean logic to handle concepts that cannot be represented by pairs of values with such exactitude as true/false, definite/impossible, or 1/0. Other people (the subjectivists) consider it to be simply an attempt to quantify personal belief in something.

In actuarial fields, such as insurance, a person is categorised according to their demographics, and the previous record of those demographics is used to attribute a numerical risk factor (and an associated insurance premium) to that person. This is therefore a frequentist application. Consider now a bookmaker who is giving odds on a horse race. You might think he’s simply basing his numbers on the past performance of the horses but you’d be wrong. A good bookmaker watches the horses in the paddock area, and sees how they look, move and behave. He may also talk to trainers. His odds are based on experience and knowledge of his field and so this is more of a Bayesian application.

Accepted genealogy certainly accommodates qualitative assessments such as primary/secondary information, original/derivative sources, impartial/subjective viewpoint, etc. When we consider the likelihood of a given scenario, we might use terms such as possible, very likely, or extremely improbable, and Elizabeth Shown Mills offers a recommended list of such terms[1]. Although there is no standard list, we all accept that our preferred terms are ordered, with each lying between the likelihoods of the adjacent terms. These lists are not linear, meaning that the relative likelihoods are not evenly spaced. They actually form a non-linear[2] scale since we have more terms the closer we get to the delimiting ‘impossible’ and ‘definite’. In effect, our assessments asymptotically approach these idealistic terms, but never actually get there.
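Purely as an illustration of such a scale (the terms and their spacing below are my own invention, not a standard list), a hyperbolic function such as tanh reproduces the behaviour just described: the probability gaps shrink towards extremes that are never quite reached. A minimal Python sketch:

import math

# An invented ordered scale, symmetric about 'possible' and asymptotic at
# both ends: it never quite reaches 0% ('impossible') or 100% ('definite').
terms = ["extremely improbable", "very unlikely", "unlikely", "possible",
         "likely", "very likely", "extremely likely"]
for i, term in enumerate(terms):
    x = i - 3                          # centre the scale on 'possible'
    p = (1 + math.tanh(0.9 * x)) / 2   # symmetric and asymptotic at both ends
    print(term, round(p * 100, 1), "%")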

As part of my work on STEMMA®, I experimented with putting a numerical ‘Surety’ value against items of evidence when used to support/refute a conjecture, and also on the likelihood of competing explanations of something. This turned out to be more cumbersome than I’d imagined, although a better user interface in the software could have helped. The STEMMA rationale for using percentages in the Surety attribute rather than simple integers was partly so that it allowed some basic arithmetic to assess reasoning. For instance, if A => B, and B => C, then the surety of C is surety(A) * surety(B). Another goal, though, was that of ‘collective assessment’. Given three alternatives, X, Y, & Z, simple integers might allow an assessment of X against Y, or X against Z, but not X against all the remaining alternatives (i.e. Y+Z) since they wouldn’t add up to 100%.
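To make that arithmetic concrete, here’s a minimal Python sketch; the names and numbers are purely illustrative and are not actual STEMMA code:

# Chained reasoning: if A => B with 90% surety, and B => C with 80% surety,
# then the overall surety of C is the product of the two.
surety_a = 0.90
surety_b = 0.80
print(surety_a * surety_b)   # 0.72, i.e. 72%

# Collective assessment: because the percentages must total 100%, alternative X
# can be weighed against all the remaining alternatives (Y+Z), not just one.
alternatives = {"X": 0.60, "Y": 0.25, "Z": 0.15}
print(alternatives["X"], "vs", 1 - alternatives["X"])   # 0.6 vs 0.4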

Although I didn’t know it, my concept of ‘collective assessment’ was getting vaguely close to something called conditional probabilities in Bayes’ work. A conditional probability is the probability of an event (A) given that some other event (B) is true. Mathematicians write this as P(A | B) but don’t get too worried about this; just treat it as a form of shorthand. Bayes’ theorem can be summarised as[3]:

P(A | B) = P(B | A) × P(A) / P(B)

It helps you to invert a conditional probability so that you can look at it the other way around. A classic example that’s often used to demonstrate this involves a hypothetical criminal case. Suppose an accused man is considered to have one-chance-in-a-hundred of being guilty of a murder (i.e. 1%). This is known as the prior probability and we’ll refer to it as P(G), i.e. the probability that he’s Guilty. Then some new Evidence (E) comes along; say a bloodied murder weapon found in his house, or some DNA evidence. We might say that the probability of finding that evidence if he was guilty (i.e. P(E | G)) is 95%, but the probability of finding it if he was NOT guilty (i.e. P(E | ¬ G))[4] is just 10%[5]. What we want is the new probability of him being guilty given that this evidence has now been found, i.e. P(G | E). This is known as the posterior probability (yeah, yeah, no jokes please!). The calculation itself is not too difficult, although the result is not at all obvious.

P(G | E) = P(E | G) × P(G) / P(E) = (95% × 1%) / ((95% × 1%) + (10% × 99%)) = 8.8%
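For anyone who wants to check the arithmetic, here’s a minimal Python sketch of this two-outcome form of the theorem; the function name is mine, purely for illustration:

def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    # Bayes' theorem for a two-outcome event, with P(E) expanded in the denominator.
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# The murder example: P(G) = 1%, P(E | G) = 95%, P(E | not G) = 10%.
print(round(posterior(0.01, 0.95, 0.10) * 100, 1), "%")   # 8.8 %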

This may just look like a bunch of numbers to many readers, but the mention of finding new evidence must be ringing bells for everyone. If you had estimated the likelihood of an explanation at such-and-such, but a new item of evidence came along, then you should be able to adjust that likelihood appropriately with this theorem.

So what about a genealogical example? Well, here’s a real one that I briefly toyed with myself. An ancestor called Susanna Kindle Richmond was born illegitimately in 1827. I estimated that there was a 15% chance that her middle name was the surname of the biological father. If we call this event K, for Kindle, then it means P(K) is 15%. This figure could be debated but it’s the difference between the prior and posterior versions of this probability that is more significant. In other words, even if this was a wild guess, it’s the change that any new evidence makes that I should take notice of. It turns out that the name ‘Kindle’ is quite a rare surname. FreeBMD counted fewer than 100 instances of Kindle/Kindel in the civil registrations of vital events for England and Wales. In the baptism records, I later found that there was a Kindle family living on the same street during the same year as Susanna’s baptism. Let’s call this event, of finding a Neighbour with the surname Kindle, N. I estimated the chance of finding a neighbour with this surname if it was also the surname of her father at 1%, and the probability of finding one if it wasn’t the surname of her father at 0.01%. What I wanted was the new estimation of K, i.e. K | N. Well, following the method in the murder example:

P(K | N) = P(N | K) × P(K) / P(N) = (1% × 15%) / ((1% × 15%) + (0.01% × 85%)) = 94.6%
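Feeding these numbers into the posterior() sketch given earlier for the murder example confirms the calculation:

print(round(posterior(0.15, 0.01, 0.0001) * 100, 1), "%")   # 94.6 %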

This is a rather stark result, arising from the very low probabilities involved. I’m not claiming that this is a perfect example, or that my estimates are spot on, but it was designed to illustrate the following two points. Firstly, it demonstrates that the results from Bayes’ theorem can run counter to our intuition. Secondly, though, it demonstrates the difficulty in using the theorem correctly because this example is actually flawed. The value of 1% for P(N | K) is fair enough as it represents the probability of finding a neighbour with the surname Kindle if her middle name was her father’s surname. However, the figure of 0.01% for P(N | ¬ K) was really representing the random chance of finding such a neighbour if her middle name wasn’t Kindle at all. What it should have represented was the probability of finding such a neighbour if her middle name was Kindle but it wasn’t the surname of her father. In particular, it failed to consider that the two families may simply have been close friends.

There is no room for debate on the mathematics of probability, including Bayesian probability and Bayes’ theorem. The application of this mathematics is accepted in an enormous number of real-life fields, and genealogy is not fundamentally different to them. From my professional experience, I know that many companies use Bayesian forecasting to good effect in the analytical field known as business intelligence. The only controversial point presented here is the determination of those subjective assessments. All of the fields where Bayes’ theorem is applied involve people who are quantifying assessments based on experience and expertise. We already know that genealogists make qualitative assessments, but would it be a natural step to put numerical equivalents on their ordered scales of terms? We wouldn’t argue that ‘definite’ means 100%, or that ‘impossible’ means 0%, but employing numbers in between is more controversial even though we may use a phrase like “50 : 50” in normal speech.

I believe there are two issues that would benefit from rational debate: where those estimations come from, and whether it would be practical for genealogists to specify them and make use of them through their software. Although businesses proactively use Bayesian forecasting, the only examples I’ve seen in fields such as law and medicine have been ex post facto (after the event). For my part, I find it very easy to put approximate numbers against real-life perceived risks, and the likelihood of possible scenarios. I have no idea where these come from, and I can’t pretend that someone else would conjure the same values. Maybe it’s a simple familiarity with numbers, or maybe people are just wired differently; I really don’t know!

Even if this works for some of us, it is unlikely to work for all of us. By itself, though, this is not a reason for dismissing it out-of-hand, or lashing out at the mathematically-inspired amongst the community. A potential reaction such as ‘We happen to be qualified genealogists, and not bookmakers’ would say more about misplaced pride than considered analysis. Genealogists and bookmakers are both experts in their own fields. When they say they’re sure of something, they don’t mean absolutely, 100% sure, but to what extent are they sure?



[1] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore, Maryland: Genealogical Pub. Co., 2009), p.19.
[2] If you’re thinking “logarithmic” then you would be wrong. The range is symmetrically asymptotic at both ends and so is hyperbolic.
[3] This simple form applies where each event has just two outcomes: a result happening or not happening. There is a more complicated form that applies where each event may have an arbitrary number of outcomes.
[4] I’m using the logical NOT sign (¬) here to indicate the inverse of an event’s outcome. The convention is to use a macron (bar over the letter) but that requires a specialist typeface.
[5] Yes, that’s right, 10% and 95% do not add up to 100%. The misunderstanding that they should plagues a number of examples that I’ve seen. The probability of finding the evidence if he was guilty, P(E | G), and the probability of finding the evidence if he was not guilty, P(E | ¬ G), are like “apples and oranges” because they cover different situations, and so they will not add up to 100%. However, the probability of not finding the evidence if he was guilty, P(¬ E | G), is the inverse of P(E | G) and so they would total 100%.