Friday 19 June 2015

Old Dogs, New Tricks

As well as taking a brief look at the relationship between evidence, inference, and conjecture, I also want to explain why it’s so important to remember the level to which each of your statements is substantiated. If we accidentally promote some statement beyond that level then we will be distorting our knowledge, and hence our representation of history.

This is not to say that we can only make “proven” statements, or even just substantiated ones; we’re entitled to make whatever statements we see fit in our research, ranging from conjecture and rumour, through hypothesis and theory, to some conclusion that we have a degree of confidence in. This range is necessary since it is impossible to know everything, or to prove anything at all in the strict sense (see Proof of the Pudding), but we still want answers. What we should not do is pretend, to ourselves or to others, that something is more than what it is.

Having come to genealogy from a scientific background, I have often held up science as a paragon of virtue when it comes to evidence-based analysis and understanding of the universe. For instance, when someone recently posted a quote from one of the greatest physicists of the 20th Century, Richard Feynman (“I’d rather have questions that can’t be answered than answers that can’t be questioned”, which leaves no room for dogma, old or new), I just had to add a comment about his exceptional approach to academic integrity. I still research topics in physics, but what I didn’t anticipate was that I would subconsciously apply some of my new critical genealogical thinking when hopping between subjects. It does seem that old dogs can pick up new tricks!

Figure 1 – Oh no, he's done it again! New ball, please!


Evidence, more than any other genealogical concept (including proof), is hard for most of us to grasp fully. Evidence is often defined as information that is used to support or refute some claim. It therefore involves some interpretation by a researcher, and their interpretation may not be the same as that of another researcher. In other words, it’s a relative concept: relative to both the claim and the researcher. Without a specific claim, information cannot be deemed evidence of anything, and without a researcher, the raw information cannot be interpreted to derive that evidence.

I admit to struggling with the concept myself, and I think part of the reason is the loose way that we use such terms in everyday speech. One of my ways of working, which I call Source Mining, involves reading the raw information in a source, such as a newspaper, and tagging items that I believe will be relevant to a particular piece of research. I’m effectively collecting references that may be significant, to a lesser or greater extent, to my overall goal: that of writing about the history of some person, family, industry, or place. Those items aren’t evidence because I have not made any specific claims. When I’ve consulted several sources, I begin looking for correlations between them (items that support or contradict each other), as well as assembling things into a timeline. But they’re still not evidence! It’s only when I make a specific claim or statement, such as ‘A was the child of B and C’ or ‘A moved from X to Y in 1887 because he changed jobs’, that I need to substantiate it. I then link my statement to those selected items of information, or to my written analysis if those items don’t yield direct and non-conflicting evidence by themselves.

So, it’s not necessarily the case that we set out with a specific research question. Evidence is related to the claim or statement rather than to the way it was derived, or even to the reason that it was sought. The above methodology makes the connections in the bottom-up direction (meaning building upwards from the raw information) rather than in a top-down direction, where the research is framed by a specific question. My overall goal was not a question, and so the information I’d assimilated was not evidence of anything related to that goal.

Let me take a simple example from “Who Was Simeon Webber?” I had found that the marriage of Simon Webber, the railway labourer, to Elizabeth Samson occurred over 18 years after the birth of their first child, Mary Ann Webber (actually 18 years 106 days, according to my Date Subtraction Tool). The associated birth and marriage certificates provided direct evidence of those events, but did not shed any light on why there was such a huge gap. At that point, any suggested reason would have been pure conjecture or speculation since there was no information to point the way. Such speculation isn’t necessarily idle, since it may direct future research efforts in an attempt to support or refute the possibility.

At a later stage, I found that the family had emigrated from England to the US, and I had several sources confirming their residency in Illinois. As well as providing direct evidence of their residency and naturalization, those sources provided indirect evidence[1] for a reasonable explanation of the late marriage: it must have been easier to emigrate as a family if there was paperwork to back it up, especially if US immigration back then was anything like it is now. The difference was that I then had some extra information that I could interpret in order to make some inferences.

But I couldn’t say that I knew this; it would have been a theory or a hypothesis (which is a sort of prototype theory)[2], depending on how much I knew of US immigration during that time period. That evidence merely supported my hypothesis, but it also supported alternatives. As it stood, I was confident that the trigger for the marriage was the major event of emigration, but not about the specific reason for it. Was it the potential difficulty of being admitted as a family with no marriage (I have read about polygamous marriages and under-age wives being issues during 19th Century US immigration), or was it to do with property rights, or with custody of the children should something happen during the journey? My evidence only really supported the claim that emigration was the trigger for the marriage, and I would need more information before I could claim anything more specific than that. Of course, none of the information that I had shed any light on why they didn’t marry in the first place; that’s a whole different question.
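For readers who keep their research in software, the distinction drawn above (raw information only becomes evidence once it is linked to a specific claim, and each claim carries its own level of substantiation) can be sketched as a small data model. This is a minimal, hypothetical Python illustration, not part of any existing genealogy program or data standard: the names `InfoItem`, `Statement`, and `Level`, and the four-step ordering of levels (taken loosely from the conjecture, hypothesis, theory, and conclusion range mentioned earlier), are all assumptions. Its one rule is that a statement cannot be promoted to a higher level unless new supporting evidence has been linked since its level last changed.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering of how well a statement is substantiated."""
    CONJECTURE = 1
    HYPOTHESIS = 2
    THEORY = 3
    CONCLUSION = 4


@dataclass(frozen=True)
class InfoItem:
    """An item of raw information tagged while mining a source; not yet evidence."""
    source: str
    text: str


@dataclass
class Statement:
    """A specific claim; information only becomes evidence when linked to one."""
    claim: str
    level: Level = Level.CONJECTURE
    supports: list[InfoItem] = field(default_factory=list)
    refutes: list[InfoItem] = field(default_factory=list)
    # Number of supporting items recorded when the level last changed
    _support_at_last_change: int = field(default=0, init=False, repr=False)

    def link(self, item: InfoItem, supporting: bool = True) -> None:
        """Record an item as evidence for (or against) this claim."""
        (self.supports if supporting else self.refutes).append(item)

    def promote(self, new_level: Level) -> None:
        """Raise the level, but only if new supporting evidence has been linked."""
        if new_level <= self.level:
            raise ValueError("promotion must raise the level")
        if len(self.supports) <= self._support_at_last_change:
            raise ValueError("no new supporting evidence since the level last changed")
        self.level = new_level
        self._support_at_last_change = len(self.supports)


if __name__ == "__main__":
    # Hypothetical items, loosely modelled on the Webber example above
    claim = Statement("Emigration was the trigger for the late marriage")
    claim.link(InfoItem("naturalization record", "family resident in Illinois"))
    claim.promote(Level.HYPOTHESIS)  # allowed: supporting evidence was linked
    try:
        claim.promote(Level.THEORY)  # refused: nothing new since the last promotion
    except ValueError as err:
        print(err)
```

The refusal in `promote` mirrors the advice not to promote a guess without more evidence. A fuller model would also need to weigh the conflicting items in `refutes` and to link statements to the researcher’s written analysis, both of which this sketch deliberately ignores.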

You may have wondered why academic historians don’t put such emphasis on the finer details of their research, including the “proof” of event dates, relationships, etc. They might argue, erroneously in my opinion, that the small-scale information doesn’t matter that much when they’re researching large-scale history, relevant on a national or global level, and that genealogy, being a form of micro-history, is more dependent upon it. Even if we accept such an argument, how many historical articles have you read that distinguish individual claims and statements according to how well they are substantiated? If they don’t differentiate between something supported directly and something they’ve inferred, including to what level of confidence, then another historian may regurgitate it as “fact”. Biographers are notoriously guilty of similar faults, and yet their work is much closer to micro-history, so they have no such excuse.


There are a number of sciences that relate to the past, such as archaeology, Egyptology, palaeontology, and geology. They can all be forgiven for having to make some inferences based on the best available information at the time, but they rarely distinguish between what has been found or measured and what is an interpretation of it. Worse still, especially on TV, that interpretation is generally presented as though it were fact, and anyone voicing a professional opinion that differs from the mainstream view, or from some acknowledged expert, may be risking their career. This can be a sinister side of science, and I had believed that physics was beyond that.

I’m currently taking another look at physics because it’s just as guilty in some ways. There are many things that are depicted as fact when they’re merely theory, and — worse still — some blatant misrepresentation of what mathematical physics is telling us. Believe me when I tell you that this is worse than suggesting some genealogical information tells us something when there’s obviously some interpretation required. You see, mathematics does not describe cause-and-effect; it describes a continuous relationship between certain variables. The equation that describes the path of a cannonball does not say that it was ejected from a cannon to finish on the ground, or vice versa; it merely describes the total path.

Let me select a popular subject as an example: the Big Bang. The public are led to believe that there was a massive primordial explosion from which the universe was created. Well, that’s wrong on just about every count. The event would not have been big; it would have been infinitesimally small. There would have been no bang because there was no space, no air, and no one to hear it. There wasn’t even an explosion, because that implies material expanding into pre-existing space, whereas space and time supposedly began at this event. In fact, the whole event is inferred from the extrapolation of certain spectral measurements of distant galaxies that imply they are all travelling away from each other. A similar story can be seen in topics such as quantum theory, relativity, etc. I have always considered myself to be an extreme sceptic, but I intend to reapply my critical thinking in these areas very soon.


So what can we learn from this? Critical thinking is good; think like a detective. We can’t always expect solid evidence and so we’ll have statements that vary widely in the level to which they’re substantiated — just make sure you remember those levels by documenting any evidence, where it came from, and how it relates to your statements. A guess is fine as long as you remember that it’s a guess, and don’t promote it without more evidence. Understand what you’re doing, and why, during your research, as opposed to blindly following a prescription. If you think you’re being hassled by so-called genealogy police, just remember that people who advocate a prescription probably need a prescription.

[1] Elizabeth Shown Mills, “QuickLesson 13: Classes of Evidence―Direct, Indirect & Negative”, Evidence Explained: Historical Analysis, Citation & Source Usage ( : accessed 15 Jun 2015).
[2] Elizabeth Shown Mills, “QuickLesson 16: Speculation, Hypothesis, Interpretation & Proof”, Evidence Explained: Historical Analysis, Citation & Source Usage ( : accessed 15 Jun 2015).

Friday 5 June 2015

Light-Bulb Moments

Listening to the continuing discussions within FHISO (the Family History Information Standards Organisation), I have to wonder whether something is missing, something that’s fundamentally necessary for the production of a data standard for genealogy. Let me start with a question: how many programmers does it take to change a light bulb?

There are many versions of this joke, and of similar ones, on the Internet. The most common answer is:

"None. That's a hardware problem."

A less common one, aimed specifically at Windows programmers, is:

"472. One to write WinGetLightBulbHandle, one to write WinQueryStatusLightBulb, one to write WinGetLightSwitchHandle, ... etc ..."

Another variation is:

"None. Darkness is a software 'feature'."

What these jokes demonstrate is the popular notion of the programmer as someone who lives in a virtual reality, and who is not fully connected to the real world. Coming from this background myself, I have to admit that there are rather too many real instances of the stereotype for me to be entirely comfortable with the jokes. But is this a fair criticism of programmers, software developers, and other software people, working in the genealogy field?

If I extrapolated that notion to a real bulb-changing scenario then I might imagine the programmer taking careful note of the bulb fitting. Is it a bayonet or screw fitting? If it’s a screw fitting then what size is it? What shape bulb is required: standard, golf-ball, pygmy, or other? Does the replacement conform to national standards? Should they use a low-energy one as they’re the way to go — allegedly?

The problem is that treating the change as an isolated exercise, no matter how deep the attention to detail, misses the fact that the result has to perform a particular function. Without an examination of the functional requirements, the appropriate power rating (wattage) and bulb colour may be wrongly selected.

OK, enough of the analogies before I get lynched. My point is that the FHISO mailing lists are currently dominated by programmers and other people with a background in software development. There are some particularly brave non-programmers on those lists, and their astuteness and functional focus are both very noticeable. Dialogue between the two groups has helped to rescue the concept of genealogy from the mire of BMD (birth, marriage, and death) data and family trees, but software thinking keeps retreating into that safe world: a place where the computer needs to understand everything, and where anything written in a national language (including research notes, proof arguments, and conclusions, if not general biographical narrative) is deemed too flexible and ill-defined.

There’s a subtle issue here, concerning the programmer’s attitude to genealogy, that isn’t seen in most other industry sectors. Software may be applied to a huge range of activities, including banking and financial services, business, healthcare, manufacturing, accounting, aerospace and defence, government, engineering, and education, to name but a few. It would be extremely rare to find programmers who knew these subjects so well that they needed no functional requirements from anyone else, and yet that is frequently what happens in the genealogical sector. So why is that?

Interestingly, international and cultural differences are topics about which both groups can make gross assumptions. Unless they have specifically studied such differences, or researched in other locales, they can both be guilty of retreating into their own native norms and cultural traditions; they both need functional inputs from authoritative sources.

Part of the genealogical difference may arise from a chicken-and-egg case of “what you see is what you get”. Early software products were quite naïve in their understanding of genealogy, and that image has been perpetuated by the advertising and online tools of the major genealogical companies. It’s hard to break ranks when everything you see appears to follow the same paradigm.

Another part of the difference will certainly be due to the fact that those programmers supposedly indulge in some form of genealogy, and so as consumers they’re effectively writing their own specifications. This is a very important factor since it’s hard to assess the type of genealogy they indulge in, or to what depth of knowledge they aspire.

This should not be taken as a suggestion that they’re bad genealogists, or that they do not indulge at all, but simply that their work is rarely evident. This is actually a fault of many genealogists, from the amateur through to the professional; very few of us publish our genealogical works beyond mere trees, and some don’t even do that. Is that for reasons of confidence, capabilities, copyright, or something else beginning with ‘c’? Unfortunately, in the absence of such public material, how can we possibly know that we’re trying to solve the same problems, or aim for the same goal?

There is a core group of genealogists who publish written works on their Web site or their blog, and I take my hat off to them. They capture much more than a mere tree would, and offer readable material for friends, family, and colleagues to access. Professionals and academics may publish articles in genealogical journals, where they are read by their peers, but they publish far less online. It was suggested to me recently that their drive to produce reliable and well-documented works may be construed as elitism, and so they would rather avoid that level of criticism. I made a conscious decision to publish written genealogical works on my own blog (with some initial trepidation, as I was neither a writer nor a historian), primarily to show the type of genealogy that I do, and to put some context around my software views and requirements. Although I include citations in those works, I do admit to some embarrassment over them. There are definitely folks who see citations as some sort of elitist competition, rather than a matter of research integrity, and so I find myself trying to “nibble off the edges” to reduce their volume, much as a journal editor might.

This general situation is sad for a couple of reasons. Firstly, genealogy needs more online works to guide and inspire others. There is so much value that can be added through writing, not just in capturing recollections but also in explaining your journey through time and sources to capture the lives of your antecedents. Secondly, in the context of FHISO, working on a standard for representing our data will be an impossible task if we cannot determine the range of our collective goals. We all have different notions of what constitutes genealogy, and although many will have similar notions, some will have quite dissimilar ones. Writing a specification for your own personal style, approach, and goals is a dangerous pitfall.

Maybe we’ll be in the darkness for some time to come, unless someone manages to change that broken bulb.