Thursday 7 April 2022

What a Mess

This will be my last blog post for the foreseeable future, and probably forever. This is not a matter of free time, or of advancing years, or even of competing tasks, but of a complete disillusionment with modern genealogy.

I will continue to research in my spare time, but this will be to produce something of standing for my family and extended family to read; I have lost faith in the public world of genealogy. But let me explain in more detail because this has been on the horizon for a while, and yet previously planned retreats have all fallen down for different reasons.

I am used to academic research, and to how academic research works in other fields. The purpose of research in those other fields is to find answers, or truths, and to produce a valuable collective body of work through collaboration. Virtually by definition, it is not a commercial goal.

I have previously pondered over the nature of genealogy (What is Genealogy?), and considered its difference from family history, but there is a more systemic difference that touches on collaboration, software, and commercial forces. Although genealogy has a well-respected academic side, it generally considers the internet, and digital resources in general, as only good for derivative sources such as images and transcriptions, and not for publications. This field has high standards and produces quality work in traditional publications such as books and journals, but the internet is considered inappropriate for publication due to its transient and ephemeral nature.

If we look to commercial genealogy then we see two quite different worlds: that of the generation of derivative sources that people can search, and that of online trees. Other than wanting improved search tools and options, I have no real criticisms of the many digitisation and transcription projects, but of online trees I have many. In fact I have written so many articles on this subject that I won't even begin to enumerate links to them. Irrespective of whether we are considering "unified trees" or "user-owned trees", there are fundamental issues with their structure and the process by which they are generated.

In terms of structure, a tree is appropriate for representing biological lineage, but dreadful for representing history — can you imagine a family tree attempting to detail, say, the events of WWII? But non-biological lineage, such as fostering and adoption, or even weaker associations between people, break this visualisation and can result in a cat's cradle of complexity and confusion. A tree is also limiting in terms of proof arguments (particularly if they reference multiple individuals, families or generations), citations that refer to actual claims (as opposed to simple hyperlinks saying where you got your information), and linking to external resources (images or document scans) that are not specific to single individuals.

But worse than this is the process by which we are expected to construct such trees. We are all probably aware now of the variable quality of trees (although I still find it vexing when I see 'trees are not a valid source', since it depends on the claim), and that trees can persist online long after someone may have dabbled for a few months using a free trial or a subscription birthday present. There is no responsibility taken by the respective companies for the accuracy of what their subscribers publish, and they appear to be uninterested in why academia looks down on these published works. It is impractical for these companies to fact-check everything, and so I am not suggesting that as the solution, but they do not acknowledge, publicly, that the simple paradigm of building trees directly from their raw digitised records is naive (despite their advertising). There are many difficult cases of family reconstruction that require effort, possibly an enormous amount of effort, to get around missing information, ambiguous information, or even deliberately obfuscated information, and so make a case for what really happened in the past.

But two experienced researchers might reach different conclusions, both of which appear to fit available information, and so how should that be dealt with? Well, the red mist and edit wars commonly associated with "unified trees" are not the answer. If left to software people then they might suggest transactional get and commit operations, analogous to those in software source-control systems. If you don't know what these are then it's probably best not to ask; they're complicated, generally with horrible user interfaces, and even get software people into trouble.

Well, why don't these companies look at how collaborative research works everywhere else? I can't believe that they're ignorant of it, and so I can only assume that they fear it would be too complicated for their subscribers, or that it would cost them money, or even that it's just a huge step into the unknown and they don't want to kill their cash cow.

Collaborative research elsewhere is not a linear, one-step 'raw data leading to final conclusions'; it is stepwise, and it involves prior work by other researchers. Researchers can then look at the work of others and build from it (or refute it). This means real written work, with real citations, is the starting point, as claims have to be justified not just by pointing to data that appears to confirm them, but by explaining why, and why not something else.

OK, so not everyone will be able or willing to produce such written work, but there are people who do, and regularly do so: bloggers. I have already made a case that online genealogy companies could take advantage of this in a way that requires minimal investment, would not run into copyright or attribution problems, and would increase traffic to the respective blogs — surely, a win-win (Blogs as Genealogical Sources). Briefly summarised, the author of a blog article would give permission to the genealogical company to list the corresponding URL in one of their databases, and would provide meta-data to ensure that it showed up in the results of appropriate searches. The genealogy company would store such information in a database of so-called authored works (i.e. the URL, name of author, article title, and meta-data), but would not copy the body of the works. When these works showed up in a genealogical search, the end-user would click on one of them in order to be directed to the original blog article.
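For the software-minded, here is a minimal sketch of what such a database of authored works might look like. It is only an illustration under my own assumptions: the names AuthoredWork, AuthoredWorksIndex and make_work, the choice of surnames and places as the meta-data, and the example.org URL are all hypothetical, and no genealogy company offers anything like this. The point is simply that the catalogue holds a pointer and some search terms, never the body of the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthoredWork:
    """One catalogue entry: a pointer to a blog article, never its body."""
    url: str
    author: str
    title: str
    # Hypothetical meta-data supplied by the author to drive searches.
    surnames: frozenset = field(default_factory=frozenset)
    places: frozenset = field(default_factory=frozenset)


def make_work(url, author, title, surnames=(), places=()):
    """Build an entry, lower-casing meta-data so searches ignore case."""
    return AuthoredWork(
        url,
        author,
        title,
        frozenset(s.lower() for s in surnames),
        frozenset(p.lower() for p in places),
    )


class AuthoredWorksIndex:
    """Assumed in-memory stand-in for the company's database."""

    def __init__(self):
        self._works = []

    def register(self, work, permission_given):
        # The author must explicitly allow the URL to be listed.
        if not permission_given:
            raise PermissionError("author has not given permission to list this URL")
        self._works.append(work)

    def search(self, surname=None, place=None):
        """Return entries matching every criterion that was supplied."""
        results = []
        for work in self._works:
            if surname is not None and surname.lower() not in work.surnames:
                continue
            if place is not None and place.lower() not in work.places:
                continue
            results.append(work)
        return results


if __name__ == "__main__":
    index = AuthoredWorksIndex()
    index.register(
        make_work(
            "https://example.org/blog/smiths-of-nottingham",  # placeholder URL
            "A. Blogger",
            "The Smiths of Nottingham",
            surnames=["Smith"],
            places=["Nottingham"],
        ),
        permission_given=True,
    )
    # The end-user sees the title and follows the URL to the original article.
    for hit in index.search(surname="smith"):
        print(hit.title, hit.url)
```

A real service would obviously need persistence, richer meta-data (date ranges, for instance), and some way of withdrawing permission, but none of that changes the basic shape.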

Yes, there would be some smaller issues, such as rating these works, or citing them, and so on, but it's academic as there has been no subsequent engagement (zero, with a capital Z) by any of the companies, including the ones I approached directly.

Modern genealogists rely on the search functions within these online companies, and possibly on Google (although woe betide us if we have to research a surname such as 'covid'), but they would be less likely to find relevant printed books or journal articles. This sort of scheme could even be extended to cover non-internet sources, but there is yet another possibility, one that flies in the face of the view that research has to be written up in paper-based journals.

People who have researched in other fields may be aware of sites such as arXiv.org (the 'X' actually represents the Greek letter chi, and so the site name is pronounced as "archive"). These contain online articles, submitted online, and viewed online. They are much more accessible and searchable than the old paper-based journals, and it is entirely possible that this could be done for genealogical research, but it would take the initiative away from any forward-looking genealogy company. Does that matter to genealogists? Probably not, as there are many searchable resources that do not fall under their control. Would it contribute to accuracy and to a truly collaborative approach in modern genealogy?

I wish I could be optimistic here, but I'm not!