Thursday 6 December 2018

Research in Online Trees

My previous post, The Future of Online Trees, prompted a flurry of reaction, most of which was positive; however, the reaction did suggest, implicitly, that many people find it hard to think beyond their entrenched views, and that my explanation may have assumed too much.

This follow-up collects some of the explanatory comments that I've since posted around the Web, and tries to make a coherent argument for what genealogical research should entail.

The previous article made several negative comments about existing online trees, including:

  • That assembling their associated conclusions directly from raw digitised information is not always easy, and gets very hard as you go further back in time (e.g. before census returns and civil registration).
  • That it's hard to tell naive trees from properly researched ones, and that, no, a bunch of citations is not a useful indicator.
  • That naive trees usually persist long after a creator may have abandoned them, and could steer new researchers down the wrong path.
  • That proof arguments (i.e. reasoned explanation), as opposed to simple proof statements (i.e. citations), are almost never provided.

The primary issue behind my article was that many genealogical conclusions require their research work to be written up in order for them to be assessed, and for that research to be cited either by trees or other research work. Proof statements alone are only applicable when the sources offer direct answers and do not conflict with each other, but identity problems and family reconstruction can require lengthy arguments that examine multiple sources. The results will often address groups of correlated people rather than just some specific person, and so it's not realistic to expect that work to be tucked away in a single person entity (on a tree) or in a single person page.

An associated issue is that the contributors to online trees (and probably the users of genealogy software in general) routinely talk about individual "claims", and the supporting sources for those claims, as though they're all independent of each other. This is a fallacy! The idea that a specific claim can be justified in isolation, and linked directly to one or more sources that give a direct answer, is a huge oversimplification of the research process, and yet this is a mindset that is hard to argue against.

One of the positive things I suggested in my previous article (possibly the only one) was that there are researchers who do publish their work online (e.g. in blogs), and that online trees could reference their research as "authored works": a recognised source category that supplements those of "original" and "derivative". There is no issue at all with representing this in GEDCOM files (the data format most often used to transfer genealogical data between programs and sites), nor any significant issue with online providers recognising such work as a specific source category.
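As a rough illustration of the GEDCOM point, the sketch below builds a source record for a piece of published research and cites it from a person. The tags used (SOUR, TITL, AUTH, PUBL, NOTE, PAGE) are standard GEDCOM 5.5.1 tags, but everything else is an assumption of mine: the function names are hypothetical, the title, author and person are placeholders, and recording the category in a NOTE is only one possible convention, since GEDCOM has no dedicated field for a source category.

```python
def source_record(xref, title, author, published, category="authored work"):
    """Return the GEDCOM lines for one source record.

    The NOTE line carrying the source category is an illustrative
    convention, not something defined by the GEDCOM standard.
    """
    return [
        f"0 @{xref}@ SOUR",
        f"1 TITL {title}",
        f"1 AUTH {author}",
        f"1 PUBL {published}",
        f"1 NOTE Source category: {category}",
    ]


def citation(xref, level=1, where=None):
    """Return the GEDCOM lines that cite a source record from another record."""
    lines = [f"{level} SOUR @{xref}@"]
    if where:
        # PAGE says where within the source the relevant argument is found.
        lines.append(f"{level + 1} PAGE {where}")
    return lines


# Placeholder data: none of these names refer to real people or articles.
record = source_record(
    "S1",
    "An example research article",
    "A. Researcher",
    "Personal blog, 2018",
)
person = ["0 @I1@ INDI", "1 NAME John /Example/"] + citation("S1", where="Section 2")

gedcom_text = "\n".join(record + person)
print(gedcom_text)
```

The point of the sketch is only that the tree entry carries a pointer to the written-up research, rather than trying to hold the whole argument itself.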

Many traditional genealogists write up their work in academic journals, but this is more about kudos than about helping a community of genealogists; few of us will be subscribers to these journals. This is a shame because they cannot distance themselves from online genealogy, nor ignore the associated problems: we're all tarred with the same brush. If we describe our work as "genealogy" then it will be linked automatically to the prevailing impression of its most common form: online trees.

It may be hard to see what I'm getting at if you haven't participated in research in other fields. All the fields that I am aware of, such as those in science and medicine, rely on published works. These could be in journals or online, but by far their biggest difference from what is currently considered genealogical research is that newer works cite older works. The consensus is then built up through layers of research, each of which may support or refute previous work. There's a saying about standing on the shoulders of giants, and it makes perfect sense: someone could have spent a lifetime solving one particular deep mystery, and so to expect someone else (beginner or otherwise) to find the same answer directly from raw online information is unrealistic. I cannot think of any other area of research that works as genealogy currently does, where conclusions are either copied blindly from those of someone else or constructed independently from raw information. This is a little like a surgeon creating an independent textbook themselves by simply dissecting the evidence (a cadaver in this case). Knowledge and progress come from sharing research, and from building on the research of others. It's step-wise, progressive, and takes time. And without seeing any written research, you cannot tell whether someone reached their conclusions in 30 minutes or 30 years.

So what size of work are we talking about? Is it just a single paragraph? Well, it could be, or it could be a couple of thousand words, as with several blog articles that I've encountered. I have two unpublished works of 5,000 words, myself, that I want to contribute to the community, and for posterity, but also a work-in-progress that is already at 10,000 words, such is the complexity.

A note on the use of wikis as a medium for collaborative research is necessary because they were mentioned by a few people. It is true that wikis can be, and are, used for such research purposes, but they have significant weaknesses. They are often limited in the richness of their presentation, usually amounting to more of a protracted discussion, as the old BetterGEDCOM wiki demonstrated, whereas genealogical research requires support for rich formatting, images, tables, and citations. Not all blogs offer this, but there are usually ways of achieving it (see Summarised Blogger Tips for instance). Wikis have little, if any, editorial control, and no attribution support beyond their confines. Also, the fact that they constitute a confined medium, forcing people to contribute outside any personal medium or prior work, would put too many people off. By contrast, blogs are not confined, they may be linked to or associated with other work by the author, and their articles have immediate attribution. Wikipedia was also mentioned as an example of successful collaboration, but it has strict rules that prevent original work or theories being presented. It relies on secondary sources, and so implicitly collects information that is already in the mainstream. This certainly doesn't prevent edit wars but it does set it apart from collaborative wiki-based genealogy.

So, my suggestion is to separate attributable research work from tree-based conclusions, and to cite such work for the harder cases rather than just some raw information. This suggestion is not rocket science, so why aren't we doing it?
