Friday, 23 September 2016

Reaping What We Sow — Part I

Several pundits have questioned what genealogy really is, usually focusing their answers on the interpretation of the word. Even I’ve contrasted the semantics of the terms genealogy and family history, as used in the US and the UK, at What Is Genealogy? In this article, though, I want to consider the question in a much wider arena: is what we’re doing what we really want to do, and how has the Internet influenced this? Or, with a Monty Python twist, what has the digital age ever done for us?

On 9 Aug 2015, genealogist Janet Few prompted a flurry of diverse opinion with a post entitled Internet Genealogy - is this progress?. She suggested that although ease of access to record images was of great benefit, thoroughness and rigour had been compromised in the interests of speed. She also argued that Web site changes were largely in the interests of profit rather than of “serious researchers”.

Only a couple of days before that, at Is It Time to Let Go of the Internet in Genealogy?, Amy Johnson Crow bemoaned the continued use of the adjective online as though it indicated some fundamentally different resource. In other words, the Internet is here to stay, and is now a fundamental part of genealogical research, so why emphasise it?

So who is right? Is the Internet simply part-and-parcel of our pursuit, or is it a crucial opportunity that has been missed through a combination of commercial interests and a hands-off fear of the technological leviathan?

I want to make the case that genealogy has come off its rails with the advent of Internet genealogy, and that the different interests, diverse skills, and entrenched viewpoints within our community have unintentionally left it injured, disrespected, and a pale shadow of what it should be. In order to do this, I will first look at the most change-laden contributions to genealogy of recent times. In Part II, I will examine the repercussions of those contributions and consider whether they have collectively been good or bad for genealogy.

Figure 1 – Barren tree in an infertile landscape.[1]


Back in Are we a Genealogical Community?, I was naïve enough to suggest that we were a single community. I now want to renege on that and suggest that we currently have a multitude of largely independent communities operating under the same umbrella. I will refer to the main three driving forces as:

  • Software Genealogy
  • Traditional Genealogy
  • Commercial Genealogy

The recipients of their uncoordinated efforts are the everyday devotees, enthusiasts and hobbyists, including the end-users of any technology.

Software Genealogy

This group includes those with strong software backgrounds who are either producing products or who are researching into the application of software to genealogy. I accept that I fall into this category myself, and so when I criticise it then I am implicitly accepting my own failings.

For the first eight years of my research, I used no genealogy product or database. The initial task through which I entered genealogy was a complex family mystery that left me with many clues and hypotheses, other writings, verbal recollections, and newspaper cuttings, but comparatively few official records. When the time came, I found that no software came remotely close to taking over, so I had to write my own; so was born my STEMMA project.

Current software tools appear to force a binary choice: your primary focus must be either lineage or family history, and so your respective tool of choice must involve either a tree or some form of narrative aid. This is woefully inadequate and prevents the proper integration of, say, a family history write-up with access to the associated biological and marital relationships, events, timelines, and geography. There may be a growing number of sites advocating written history, but there is an implicit assumption that such writing will separately use either a normal word-processor, a blog, or some wiki-style tool.

So what about those people who are working upwards from information encountered in various sources, and making inferences or arguments as they go? I examined this source-based approach back in Source Mining, and discussed its advantages and differing goals, but it is not a feature of any mainstream products or Web sites. There are some newer products that help keep track of your evidence, and its relationship to sources, but they are not — as far as I’m aware — advocating a different methodology.

So where lies the root cause of this discrepancy between what I want to do and what’s expected of me? In Light-bulb Moments I suggested that programmers were effectively writing specifications for whatever form of genealogy they happen to indulge in themselves. I also suggested that it was hard to assess the type of genealogy they indulged in, or to what depth of knowledge they aspired, if they didn’t publish their work. This was the main reason why I decided to publish some of my own research on this blog; putting it in the spotlight would allow people to assess whether my still-evolving software ideas had any merit in the wider world. In practice, though, my association with software is something of a stigma that makes it hard to be taken seriously in certain quarters, or to cooperate in productive debate.

Software people generally have a talent for looking at things in abstract ways that can lead to clever and efficient designs that may have longevity beyond their originally-envisaged functional requirements. This is a two-edged sword, though, and it can lead to oversimplification of a problem, or to approaches that are just too abstract to be useful in the real world. STEMMA has been criticised for being an overly-complex data model, to which I would counter that it is modelling data and relationships that are part of the real world, and that reductive software thinking can ultimately lead to reduced potential.

A good example of this is narrative. I have heard statements that genealogical narrative is too free-form for computer representation, and that what it expresses is therefore too opaque for software to understand. This speaks volumes about a particular mindset, and those commentators must be reminded that it’s people that do genealogy, not software. Narrative is a very rich medium that is essential for genealogists, but it must not be supported alone.

One of the most important contributions from software genealogy should have been data standards but all attempts have been unsuccessful to date. My work within FHISO has shown a number of things: that it is impossible to get the major software people around the same table; that our different ideas of genealogy are often at odds, and sometimes not grounded with sufficient experience; that the industry is content to sit on the sidelines and wait for something to appear, which may then be ignored; and that only a very small number of non-software people have been able to tolerate the abstract discussions and make valuable contributions.

Traditional Genealogy

This group includes those who undertake professional genealogy, publish books and write for academic journals, or who promote the rigorous handling of evidence and sources in research methodology. Judging by the membership of organisations such as APG and NGS, this influential group makes up a surprisingly small proportion of all US genealogists, and the same pattern is probably evident in Europe too. It is undeniable that their guidance can be found in books and on certain Web sites, but it is not linked or advocated by any of the big commercial sites, and that puts it in a different domain to the ones frequented by the majority of genealogists. In other words, why would they hunt it out if they’ve never heard of it?

The importance of promoting rigorous research, and the clear and detailed writing-up of its fruits, cannot be overstated. Unfortunately, these recommendations are deeply-rooted in traditional printed forms of media. Publishing books involving genealogical research, or writing articles for academic journals, may attract more kudos — and may even be more profitable — but the readership will be smaller; the average genealogist will not be consulting those sources, and that is a loss in more than one respect.

Ideally, such work should be published online, not simply as a source of information but as a beacon to guide other researchers. We might all benefit from reading well-researched and clearly-presented write-ups from professionals, but most genealogists will never see one. Is there a reason behind this?

There is a gulf between printed and online genealogy that may be traced simply to technology, but one that is rapidly becoming a chasm. There is a perception of software genealogy as being related only to databases of conclusions. For instance, the following is a quote from Evidence Explained QuickLesson 20.

Step 4: Data entry?…this is the point—but not until this point—that we cherry-pick individual bits of data and record them in a spread sheet or other data-management software. We need only cut-and-paste them from our research report…[2]

In effect, this says that genealogical software plays little part in the research process, and is simply a repository for discrete so-called “facts” derived from the real research. So while a word-processor, blog, or wiki might be employed by a user, they would not be considered genealogical software, and by implication any notion of a product that embraced both narrative and research aids could not be entertained.

Instead, most serious genealogists attempt to employ those good teachings in the area of the prevailing software tool: the family tree. This dilutes their intent to the mere association of a source citation with a “fact”, such as a date, name, or place. These could be construed as proof summaries, but that assumes that the evidence from those sources is direct and non-conflicting for each claim. It is no wonder then that online trees are still full of errors since a “fact” is worthless — no matter how many citations you offer for it — if it is for the wrong person. Although rarely done, proof arguments (the why rather than the where) could be provided in notes fields, but then bigger claims such as why a whole family upped and left for faraway climes would require real narrative to convey it properly. Hanging snippets of narrative off a conclusion-based tree is like putting the cart before the horse.

Furthermore, as I recently commented on one of James Tanner’s blogs (Important updates to the Website and with the Family Tree), a 'source' is a source of information that you've mentioned in a work (positively or negatively), not necessarily a source of so-called “facts”, and so the skewed usage of citations in online trees will eventually lead people to misunderstand what sources are.

Despite this group collectively publishing many works, it does not embrace or direct any software group on its own behalf. The net effect of this slightly obvious statement is that it has no direct influence on software research, and so carte blanche is effectively given to the other groups.

Commercial Genealogy

On the face of it, access to digitised sources should be a windfall for every genealogist who has a computer. The benefits include immediate access to records that we might have to travel to see in another form, and faster searching due to them being indexed on selected items of information. This is clearly progress, but at what cost?

The commercial Web sites that host such records need to finance their digitisation, transcription, indexing, storage, and purchase of more data, as well as making a profit. Creating a mass-market genealogy was a fundamental requirement to make this work: too few users and the subscription cost would be too high; too complicated and it would put off the newcomers. In other words, that progress has only been possible by providing a simple model where you give the end-users masses of data to satisfy their searches, and some simple tools to make use of their finds.

That simple model is the ubiquitous online family tree. I believe this model was too simplistic, and out of necessity has since been distorted beyond the original concept, but more on that in Part II. For now, I want to highlight the fact that a simplistic model combined with mass-market advertising will undoubtedly redefine what genealogy is, and so it has been; it is now clear that the majority of genealogists equate the pursuit with family trees. Historical research and the determination of events in people’s lives have been replaced by a philatelic point-and-click collecting of names and vital-event dates and places. There is nothing wrong with online trees, per se, except that the concept has been sold to the public through relentless advertising until the majority of genealogists now talk about building family trees without so much as a blink. All their limitations and failings are then reflected negatively upon the pursuit of genealogy.

Collaboration is an essential element of genealogy; if you can’t share then progress is impeded and future generations are robbed of their histories. Being able to exchange genealogical data in a static file, such as GEDCOM, has fallen way behind modern requirements, mainly due to the inability of software genealogy to come up with correspondingly modern standards. It is left to commercial genealogy to support collaboration and sharing, but that is then impacted by both their simplistic model and their commercial considerations. They can only share tree-based data (either unified or user-owned), and primarily with subscribers to the same site. Anyone who doubts that should try to contact a researcher on a site they don’t subscribe to, or add a constructive piece of information to their tree. I have written about other forms of collaboration, such as working on identification of census individuals (Collaboration Without Tears), but they would be so far removed from their existing model that they are dismissed as distractions. In the case of a unified tree, the desire to keep things simple has resulted in naïve models that spawn both edit wars and a diffusion of less-rigorous research into the collective effort.

Although FamilySearch does not strictly qualify for this category, I am including it because their software uses similar models. In particular, I recently queried their site’s conditions of use as they appeared to hinder collaboration involving research written up on other sites, and I received the following response on 2 Sep 2016:

As you may know, nearly all of the records within the collection of FamilySearch International are governed by contracts between the original record custodian and FamilySearch. For most contracts, FamilySearch merely acquires rights for a patron to use the records for incidental, personal, noncommercial genealogical research purposes. This includes the right to extract factual data of the patron's direct family line and then reformat that data to add to the patron's personal family tree which the patron may then use as desired.

However, publication or distribution of the actual record images/documents (including via print or the Web) and wholesale indexing, transcribing, and/or translating of the records (even when these activities are for non-profit purposes) are prohibited under the contracts. Therefore, you must acquire written permission from the custodian of the original records before publishing an image of a record or document. Once this is accomplished, you may proceed as the record custodian directs. FamilySearch will have no further objections.

I'm pretty sure that this doesn't happen much in the real world, and that most people think images displayed there are in the public domain. For real collaboration, it should be possible for patrons to declare that their images or documents are available under a Creative Commons licence, rather than a blanket restriction and the expectation that patrons will respond to such written requests.

Next Step

I will wait until Part II to look at how these contributions have left us where we are now.

[1] Dead tree, Salton Sea, taken 16 Feb 2012; image credit: Dan Eckert ( : accessed 28 Jul 2016).
[2] Elizabeth Shown Mills, “QuickLesson 20: Research Reports for Research Success”, Evidence Explained: Historical Analysis, Citation & Source Usage ( : accessed 14 Sep 2016).