Thursday 7 April 2022

What a Mess

This will be my last blog post for the foreseeable future, and probably forever. This is not a matter of free time, or of advancing years, or even of competing tasks, but of a complete disillusionment with modern genealogy.

I will continue to research in my spare time, but this will be to produce something of standing for my family and extended family to read; I have lost faith in the public world of genealogy. But let me explain in more detail because this has been on the horizon for a while, and yet previously planned retreats have all fallen down for different reasons.

I am used to academic research, and how academic research works in other fields. The purpose of research in those other fields is to find answers (truths) and to produce a valuable collective body of work through collaboration. Virtually by definition, it is not a commercial goal.

I have previously pondered over the nature of genealogy (What is Genealogy?), and considered its difference from family history, but there is a more systemic difference that touches on collaboration, software, and commercial forces. Although genealogy has a well-respected academic side, it generally considers the internet, and digital resources in general, as only good for derivative sources such as images and transcriptions, and not for publications. This field has high standards and produces quality work in traditional publications such as books and journals, but the internet is considered inappropriate for publication due to its transient and ephemeral nature.

If we look to commercial genealogy then we see two quite different worlds: that of the generation of derivative sources that people can search, and that of online trees. Other than wanting better search tools and options, I have no real criticisms of the many digitisation and transcription projects, but of online trees I have many. In fact I have written so many articles on this subject that I won't even begin to enumerate links to them. Irrespective of whether we are considering "unified trees" or "user-owned trees", there are fundamental issues with their structure and the process by which they are generated.

In terms of structure, a tree is appropriate for representing biological lineage, but dreadful for representing history — can you imagine a family tree attempting to detail, say, the events of WWII? But non-biological lineage, such as fostering and adoption, or even weaker associations between people, break this visualisation and can result in a cat's cradle of complexity and confusion. A tree is also limiting in terms of proof arguments (particularly if they reference multiple individuals, families or generations), citations that refer to actual claims (as opposed to simple hyperlinks saying where you got your information), and linking to external resources (images or document scans) that are not specific to single individuals.

But worse than this is the process by which we are expected to construct such trees. We are all probably aware now of the variable quality of trees (although I still find it vexing when I see 'trees are not a valid source', since it depends on the claim) and that trees can persist online long after someone may have dabbled for a few months using a free trial or a subscription birthday present. There is no responsibility taken by the respective companies for the accuracy of what their subscribers publish, and they appear to be uninterested in why academia looks down on these published works. It is impractical for these companies to fact-check stuff, and so I am not suggesting that is the solution, but they do not acknowledge, publicly, that the simple paradigm of building trees directly from their raw digitised records is naive (despite their advertising). There are many difficult cases of family reconstruction that require effort, possibly an enormous amount of effort, to get around missing information, ambiguous information, or even deliberately obfuscated information, and so make a case for what really happened in the past.

But two experienced researchers might reach different conclusions, both of which appear to fit available information, and so how should that be dealt with? Well, the red mist and edit wars commonly associated with "unified trees" are not the answer. If left to software people then they might suggest transactional get and commit operations, analogous to those in software source-control systems. If you don't know what these are then it's probably best not to ask; they're complicated, generally with horrible user interfaces, and even get software people into trouble.

Well, why don't these companies look at how collaborative research works everywhere else? I can't believe that they're ignorant of it, and so I can only assume that they fear it would be too complicated for their subscribers, or that it would cost them money, or even that it's just a huge step into the unknown and they don't want to kill their cash cow.

Collaborative research elsewhere is not a linear one-step 'raw-data leading to final conclusions'; it's stepwise, and involves prior work by other researchers. Researchers can then look at the work of others and build from it (or refute it). This means real written work, with real citations, is a starting point, as claims have to be justified, not just by pointing to data that appears to confirm them, but by explaining why, and why not something else.

OK, so not everyone will be able or willing to produce such written work, but there are people who do, and regularly do so: bloggers. I have already made a case that online genealogy companies could take advantage of this in a way that requires minimal investment, would not run into copyright or attribution problems, and would increase traffic to the respective blogs — surely, a win-win (Blogs as Genealogical Sources). Briefly summarised, the author of a blog article would give permission to the genealogical company to list the corresponding URL in one of their databases, and would provide meta-data to ensure that it showed up in the results of appropriate searches. The genealogy company would store such information in a database of so-called authored works (i.e. the URL, name of author, article title, and meta-data), but would not copy the body of the works. When these works showed up in a genealogical search, the end-user would click on one of them in order to be directed to the original blog article.
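To make the shape of that scheme concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not anything a genealogy company has built: the names `AuthoredWork`, `AuthoredWorksIndex` and `make_work`, and the choice of surnames and places as the meta-data, are all hypothetical. The points it tries to show are that a work is listed only with the author's permission, that only the URL, author, title and meta-data are stored (never the body), and that a search hit simply hands back the URL of the original article.

```python
from dataclasses import dataclass, field


@dataclass
class AuthoredWork:
    """One entry in a hypothetical database of authored works.

    Note there is deliberately no field for the article body: only the
    URL, author, title, and author-supplied meta-data are kept.
    """
    url: str
    author: str
    title: str
    # Illustrative meta-data fields (an assumption; a real scheme
    # would need to agree on what meta-data is useful).
    surnames: set = field(default_factory=set)
    places: set = field(default_factory=set)


def make_work(url, author, title, surnames=(), places=()):
    """Build an entry, lower-casing meta-data so searches ignore case."""
    return AuthoredWork(
        url=url,
        author=author,
        title=title,
        surnames={s.lower() for s in surnames},
        places={p.lower() for p in places},
    )


class AuthoredWorksIndex:
    """A toy in-memory stand-in for the company's database."""

    def __init__(self):
        self._works = []

    def register(self, work, author_permission):
        # A work is listed only if its author has given permission.
        if not author_permission:
            raise PermissionError("author has not agreed to the listing")
        self._works.append(work)

    def search(self, surname=None, place=None):
        """Return entries matching every criterion that was supplied."""
        results = []
        for work in self._works:
            if surname is not None and surname.lower() not in work.surnames:
                continue
            if place is not None and place.lower() not in work.places:
                continue
            results.append(work)
        return results


if __name__ == "__main__":
    index = AuthoredWorksIndex()
    index.register(
        make_work(
            "https://example.org/blog/some-article",  # placeholder URL
            "A. Blogger",
            "A Hypothetical Case Study",
            surnames=["Example"],
            places=["Exampleton"],
        ),
        author_permission=True,
    )
    # The end-user would be sent to the original article via its URL.
    for hit in index.search(surname="example"):
        print(hit.title, "->", hit.url)
```

A real service would obviously sit on a proper database and search engine, but the stored record need be no richer than this, which is why the investment required is so small.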

Yes, there would be some smaller issues such as rating these works, or citing them, and so on, but it's academic as there has been no subsequent engagement (Zero, with a capital Z) by any of the companies, including the ones I approached directly.

Modern genealogists rely on the search functions within these online companies, and possibly on Google (although woe betide us if we have to research a surname such as 'covid'), but they would be less likely to find relevant printed books or journal articles. This sort of scheme could even be extended to cover non-internet sources, but there is yet another possibility, one that flies in the face of the view that research has to be written up in paper-based journals.

People who have researched in other fields may be aware of sites such as arXiv.org (the 'X' actually represents the Greek letter chi, and so the site name is pronounced as "archive"). These contain online articles, submitted online, and viewed online. They are much more accessible and searchable than the old paper-based journals, and it is entirely possible that this could be done for genealogical research, but it would take the initiative away from any forward-looking genealogy company. Does that matter to genealogists? Probably not as there are many searchable resources that do not fall under their control. Would it contribute to accuracy and to a truly collaborative approach in modern genealogy?

I wish I could be optimistic here, but I'm not!

9 comments:

  1. Tony, I find this incredibly sad, as your blog is something I have enjoyed and learned from for many years -- almost since it began. I'm not convinced that a blogger presenting the results of their research needs one of the online genealogy companies to validate their existence, but respect your opinion.

  2. Thanks for your support, Helen. I didn't mean that those companies would "validate" the existence of the bloggers, but that making their material more readily accessible would benefit both camps, as well as being a step in a better direction for true collaboration.

  3. Tony, I understand and agree whole-heartedly with much of what you have said. But it would be a shame if voices like yours were no longer heard. I, too, always find your posts and emails of interest and value and hope you will continue to write, even though the vast majority of your audience may not understand the importance of proper research. We won't ever get everyone to do things "right" but I am sure there are enough of us out here in genealogy-land to exchange and provide information in the most appropriate manner. I would also encourage you to submit this latest post for publication in one or more journals and magazines where it will have a bigger audience.

    1. More thanks for this support. I have been disillusioned for a while, and my articles have gradually become more critical and blunt, but to no effect. The people I would like to reach are still not listening. Thanks too for the suggestion. I might give that a shot, but my mind is made up.

  4. I completely agree about the benefits, as long as the 'rating' issue can be overcome -- a bad blog post could perhaps do even more damage than a bad tree if it was seen as more definitive.

    1. I agree. A good work should not depend upon the name of the author, or their credentials, or even the length of the article, but on what is written and the strength of the case being made.

  5. Tony, I agree with what you say, but am very sorry to hear that you don't plan to blog anymore. I've enjoyed so many of your posts (even though some of the more technical ones were over my head!) and hope you change your mind about blogging. Parallax View will remain on my follow list, just in case you do. :)

    1. Thanks, Linda, but it's time for me to move on. I will continue to support SVG-FTG because I use that myself.

  6. Sad to see you leave blogging but very proud of all the work you put in over the years x
    Emerald.
