Thursday, 29 September 2016

Reaping What We Sow — Part II



Has genealogy found itself in a rut? Part I of this article looked at the major contributors to modern genealogy so I now want to examine the repercussions of their efforts and question whether, collectively, they’ve resulted in good or bad for us as users.

Digitised Sources

Back in Internet Genealogy - is this progress?, Janet Few mentioned that many end-users do not venture beyond the brief details that have been transcribed for them, often doing no more than saving any images to their computer. As well as wasting any un-transcribed content, this would also mean that they have not verified the transcribed portions. Because only enough is transcribed to support a database index, valuable information may be missed. How many researchers look at the neighbours in a census, or travelling companions in a passenger list? But not all online sources have accompanying images, and so end-users are then expected to blindly accept someone else’s limited transcription.

A converse to this situation occurs with certain newspaper archives. Although they will have used OCR to generate text from the image of a whole article, as opposed to merely selected items, they do not allow the end-user to save that fully searchable and editable text; instead, they expect users to simply save the image or transcribe it themselves. In other words, rather than relying on someone’s transcription, end-users are then forced to rely on the image and make their own transcriptions (if any).

Full and accurate transcription takes effort, but it is valuable, so why are there so few tools that support it, or standards for its representation? What we have here is a failure of software and commercial genealogy to appreciate the importance of a transcription, and its relationship to an image copy of the original. Particularly in the newspaper case, there is no excuse whatsoever for not providing access to the already transcribed text. But even if you have a full transcript of a document, your online trees offer no way to keep that and the associated image together as a single item associated with your source reference. Where are users expected to include an analysis of such a source?

Finally, it may be difficult-to-impossible to determine the provenance of the digital information (image or transcript). Although this is gradually changing, that change has been the result of pressure from traditional genealogy, and generally from research-orientated genealogists who need that information in order to make a considered argument. Maybe this was initially considered too complicated for the host’s intended market, or maybe they just didn’t understand genealogical research to that extent.

There are adequate technologies for recording provenance, and other information, in the image itself, in meta-data that would be invisible to the end-user but not to the software. What is missing is an agreed standard from software genealogy and commitment within commercial genealogy. What we have is zilch! A tentative proposal was produced within FHISO but there was no feedback or discussion over it.
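As a minimal sketch of one such technology, the Python below writes a provenance note into a PNG file as a standard tEXt chunk and reads it back, using only the standard library. The keyword "Provenance", the helper names, and the sample text are assumptions made for illustration; they are not taken from the FHISO proposal or from any agreed genealogical standard.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        yield png[pos + 4:pos + 8], png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def add_text(png: bytes, keyword: str, text: str) -> bytes:
    """Insert a tEXt chunk (Latin-1 keyword, NUL, Latin-1 text) after IHDR."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    ihdr_end = len(PNG_SIGNATURE) + 12 + 13  # IHDR is first and holds 13 data bytes
    return png[:ihdr_end] + _chunk(b"tEXt", payload) + png[ihdr_end:]


def read_text(png: bytes) -> dict:
    """Collect every tEXt chunk as a keyword -> text mapping."""
    found = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
    return found


def tiny_png() -> bytes:
    """A 1x1 greyscale image, standing in for a real scan."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte plus one pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


# Hypothetical provenance text; the keyword is an assumption, not a standard.
NOTE = "Hypothetical Record Office, ref. ABC/1/2, scanned 2016"
stamped = add_text(tiny_png(), "Provenance", NOTE)
print(read_text(stamped))
```

An image viewer ignores the extra chunk, so the end-user sees nothing different, while software can recover the note whenever the file is copied or shared.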

Online Trees

Software is all about organisation and visualisation of your data, and these are quite different to each other. In effect, the organisation helps make data accessible, but the visualisation is designed to convey information to the end-user. One organisational schema will typically support multiple visualisations, dependent upon the perspective being studied, the regional settings and preferences of the user, and the sophistication of the product.

A family tree is essentially just one way of visualising lineage — another being a pedigree chart — but the simplistic model used to attract the mass market has made it into an organisational structure upon which anything and everything is expected to be hung. This effectively treats a tree like a wardrobe full of hangers, and results in an inappropriate organisation of the data.

But what is meant by “your family tree”? A search for that phrase suggests there are hundreds of thousands of references, combined with verbs such as: find, discover, climb, Google, trace, flesh-out, grow, build, create, and research. Do we all have just one family tree? Obviously not, unless you believe that only male ancestors carrying your own surname are significant in your history; however, even if we accept that we have multiple family trees then where do we put step relations, half siblings, adopted or foster parents? Where do we put incidental people who may have been so important? Where do we record non-vital historical events in all those lives?

Surprisingly, many casual genealogists do look only at their “surname tree” and commercial genealogy eagerly accommodates that, but what happens if you want to store the lineage associated with every paternal and maternal branch of each generation? How many distinct trees are we limited to?

What happens, too, when you want to attach a document or photograph involving multiple people, maybe from distinct trees or even from none of your trees? If you’re not forced to duplicate it then are you allowed to see all the persons it is attached to? What about shared events where multiple persons were involved, such as a wedding, or even a census? Merely attaching the same item to multiple people leaves no room for a representation of the event and its history. Such requirements should be common sense, but the unswerving adherence to using trees as an organisational concept means that they’re hard to accommodate.

Figure 1 – Documents or images relating to people on different trees.

As an aside, notice that the linkages shown in this diagram are bidirectional. Yes, it is possible for specific parts of an image to be connected to entries in a tree — or some other organisational framework — a little like being tagged on a social-networking site. As mentioned already, the technology is there but the standards and commitment are not.
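The bidirectional linkage can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the identifier scheme (a tree prefix followed by a person id), and the sample entries are all hypothetical, and no existing product is being described.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A rectangular part of an image, in pixels."""
    x: int
    y: int
    width: int
    height: int


class LinkIndex:
    """Keeps each link once per direction so either end can be queried."""

    def __init__(self):
        self._by_item = defaultdict(list)    # item id -> [(person id, region)]
        self._by_person = defaultdict(list)  # person id -> [(item id, region)]

    def link(self, item_id: str, person_id: str, region: Optional[Region] = None):
        self._by_item[item_id].append((person_id, region))
        self._by_person[person_id].append((item_id, region))

    def people_in(self, item_id: str):
        """Everyone attached to one document or image, whatever their tree."""
        return [person for person, _ in self._by_item.get(item_id, [])]

    def items_for(self, person_id: str):
        """Every document or image attached to one person."""
        return [item for item, _ in self._by_person.get(person_id, [])]


index = LinkIndex()
index.link("photo-1", "treeA:I12", Region(40, 60, 120, 160))
index.link("photo-1", "treeB:I7", Region(200, 55, 110, 150))
index.link("photo-1", "no-tree:neighbour")  # someone in none of the trees
index.link("census-page-4", "treeA:I12")

print(index.people_in("photo-1"))
print(index.items_for("treeA:I12"))
```

The photograph is stored once, yet it can be reached from people in two different trees and from a person in no tree at all, without duplicating it.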

Maybe the biggest criticism of online trees is that they’re conclusion-based; they usually represent someone’s conclusions based on absent evidence. Support for source citations began to appear when it became obvious that unsubstantiated conclusions were propagating like a virus, but then citations are insufficient unless the evidence is direct and non-conflicting. Where is the incentive for writing any justification? Is it incompatible with the mass-market objective of commercial genealogy?

Trees are generally about discrete data: names, vital-event dates and places, and biological relationships. When combined with the point-and-click ease with which search results can be added to a tree then there’s no room for justifying an operation. If the expected name isn’t visible in the search results then it is a brick wall for many people, and similarly if there are too many close alternatives. That simple model offered by the commercial sites presumes that the discrete data you require are all in their records, somewhere, and that you’ll just be assembling them into your tree. From that perspective, a citation serves only to say where a datum came from, and not why you believe it to be relevant or correct. In effect, online trees bypass huge parts of the research process. A source-based approach would fix that, and yet still allow the tree as a visualisation of the underlying data.

Collaboration

Ask any group of people what they think collaboration means and the majority will mention unified online trees or exchanging GEDCOM files. They’re not wrong but there is more: there are forums, groups, message boards, and wikis devoted to helping people and that is also a form of collaboration. Although collaboration existed before Internet genealogy there are many more possibilities now, but do they really help?

Let’s try to break down the types of collaboration into some broad categories.

  • Operational: helping others with questions about the how, why, or where. This is one of the biggest uses of the groups and forums.
  • Research: working with others on a given research topic, such as a family or a surname. Working with other family members qualifies, but so too would one-name and one-place groups, as would any crowd-sourcing initiative. I will also add unified trees to this category.
  • Publication: effectively sharing work that we’ve already done with others who may be interested. Examples include blogs, dedicated Web sites, and user-owned trees.

It’s interesting to compare which of these are currently supported within commercial genealogy. Most sites provide public user-owned trees and some provide unified trees. Not all offer a community area for operational collaboration; only FamilySearch offers an area for “memories” that can be linked to entries in trees; and Findmypast have no offerings that fit these categories. But who supports collaborative research? Even collaborative publication is not supported well.

In Part I, I mentioned that sites could allow their patrons to upload images or documents under a Creative Commons licence, and that this would help researchers elsewhere to make use of them with appropriate attribution. My emphasis is to indicate that those sites invariably have an insular approach to collaboration. They are not really concerned with non-subscribers, but it’s a two-way street and they stand to benefit from a little more vision.

Let me present an example. I recently published an article entitled A Sad Career in which I researched the short life of a girl who was not related to me at all. I therefore do not have a tree for her family, but I still want to share all my research with descendants of that family. Try as I did, the best I could do was to add a link to a tree on Ancestry, but I could not contact the owner there or elsewhere. It was about this time that I suggested people like me could volunteer meta-data for their articles (or Web pages) that would allow these sites to make them freely searchable (see Blogs as Genealogical Sources). Making written research articles available as another type of source means that these sites could retain their existing focus on family trees, and would not have to embark on a major enhancement to support narrative locally.

As that article states: this proposal should be a win-win for all concerned; however, commercial genealogy has not even acknowledged it. The worrying aspect to this is that if, as these commercial sites claim, they have genealogists on their teams then they would be well aware of the written suggestion and subsequent discussions. Alternatively, they must be in an ivory tower.

Software Tools

There are a couple of software tools that currently acknowledge evidence and its respective sources, but I am not aware of any that allow you to work upwards from the information in sources without a precisely predefined goal. I have discussed support for source-based genealogy at Our Days of Future Passed — Part III, and a particular way of working at Source Mining. The difference from the usual approach is that source-based genealogy dissects a source to identify information to be associated with several persons (rather than searching for specific details for a given person), while source mining assembles the history of a person, family, place, etc., from all the information that can be found for it across multiple sources. All the same issues of handling source references, conflicts, and interpretation apply but the goal is much more general than finding, say, a birth or marriage. Under Digitised Sources, above, I suggested that a scaled-down version of this approach would also apply when building family trees, but an inappropriate data organisation might render the approach difficult-to-impossible to implement.

I recently had to make contributions to online trees at Ancestry and FamilySearch and was shocked to find how hard it was to do something as simple as take a given census page and associate all the details of a household with the respective tree entries. I had to switch from person to person and continually repeat myself, but when looking at it from the perspective of the census event for that household then it is so much more natural — the citation is the same; the event, place and date are the same; the people are usually part of the same family; proof arguments relating to identification of the family, analysis of errors, etc., are the same. So, as well as the existing ergonomics being very poor, the obvious place to hold a written source analysis is stolen away.
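To make the contrast concrete, here is a small Python sketch of the event-centred alternative, in which the census entry for a household is recorded once and each person's view is derived from it. The field names, identifiers, place, and household details are hypothetical, and this is not how Ancestry, FamilySearch, or STEMMA actually store their data.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One shared event: citation, date, place, and analysis are held once."""
    kind: str
    date: str        # ISO 8601, so plain string sorting is chronological
    place: str
    citation: str
    analysis: str = ""
    participants: dict = field(default_factory=dict)  # person id -> details for that person


def timeline(events, person_id):
    """Derive one person's view from the shared events they took part in."""
    relevant = [e for e in events if person_id in e.participants]
    return [(e.date, e.kind, e.participants[person_id])
            for e in sorted(relevant, key=lambda e: e.date)]


census = Event(
    kind="census",
    date="1881-04-03",
    place="Hypothetical Street, Exampletown",
    citation="1881 census, hypothetical piece and folio reference",
    analysis="Household identified by matching the names and ages of all three members.",
    participants={
        "P1": {"role": "head", "age": 40},
        "P2": {"role": "wife", "age": 38},
        "P3": {"role": "daughter", "age": 12},
    },
)
marriage = Event(
    kind="marriage",
    date="1868-06-01",
    place="Exampletown",
    citation="Hypothetical parish register entry",
    participants={"P1": {"role": "groom"}, "P2": {"role": "bride"}},
)

events = [census, marriage]
print(timeline(events, "P1"))
print(timeline(events, "P3"))
```

The citation and the written analysis belong to the event, so they are entered once for the whole household, and the per-person view becomes just another visualisation.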

But what else has been missed by our tools? In addition to source-based genealogy, that series of posts beginning at Our Days of Future Passed — Part I identified the following alternative approaches that have been tried within software genealogy: arboreal (tree) genealogy, event-based genealogy, and narrative genealogy. One of the main thrusts of that series was that these should not have to be exclusive alternatives. As already explained, the many possible visualisations, including the user interactions with them, must be supported by a separate organisational framework, which in the STEMMA case is a single all-embracing one.

Another thrust was that of generalising the subjects of our research from people to include places, groups, and animals, including an orthogonal treatment of events, evidence, sources, narrative, and hierarchical relationships. This level of generalisation was the reason why I prefer the term micro-history to either family history or genealogy.

There are many variations in the way that software tools can be written, with each having its own benefits, as recently detailed by Tamura Jones in a series of posts related to genealogical software choices. In Do Genealogists Really Need a Database?, I justified my elimination of a relational database on the basis that:

  • they use indexes that are inefficient and not a good fit for historical data,
  • they limit sharing because each product has its own schema,
  • they risk data loss as disk-based linkages may become corrupt,
  • they force an extra stage before data can be accessed or viewed,
  • they reduce the longevity of the data as the database is tied to the longevity of a specific product.

As that same article concludes, having an all-embracing representation in a file that doesn’t require a database gives a special degree of freedom that genuinely helps with sharing. These genealogical contributions, or “bundles” as I refer to them internally, may be likened to Word, Excel, and Powerpoint files in the Microsoft Office paradigm — so termed not because Microsoft invented it but because Office is a ubiquitous example of it. The essential elements of this paradigm are that files can be directly exchanged and immediately visualised by recipients, and that other products may use an API to access the associated data for alternative visualisations.
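The paradigm can be illustrated with a short Python sketch: a self-contained file is written, read back with no database in between, and then presented in two different ways. The JSON layout, the function names, and the people are assumptions made for illustration only; this is not the STEMMA file format.

```python
import json
import tempfile
from pathlib import Path

# A hypothetical bundle: data and narrative travel together in one file.
bundle = {
    "persons": {
        "P1": {"name": "Ann Example"},
        "P2": {"name": "John Example"},
        "P3": {"name": "Mary Example"},
    },
    "parents": {"P3": ["P1", "P2"]},  # child id -> parent ids
    "narrative": "Mary was the daughter of Ann and John (invented example).",
}


def save_bundle(data: dict, path: Path) -> None:
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")


def load_bundle(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def pedigree_view(data: dict, person_id: str, depth: int = 0) -> list:
    """One visualisation: the person followed by indented ancestors."""
    lines = ["  " * depth + data["persons"][person_id]["name"]]
    for parent_id in data["parents"].get(person_id, []):
        lines.extend(pedigree_view(data, parent_id, depth + 1))
    return lines


def name_index_view(data: dict) -> list:
    """A second visualisation of the same file: an alphabetical name list."""
    return sorted(person["name"] for person in data["persons"].values())


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example-bundle.json"
    save_bundle(bundle, path)   # the sender writes a single file
    received = load_bundle(path)  # the recipient opens it directly

print("\n".join(pedigree_view(received, "P3")))
print(name_index_view(received))
```

Nothing has to be imported into a product-specific schema before the recipient can view the data, and a different tool is free to present the same file in its own way.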

Figure 2 – Visualising Excel data.

If an Excel spreadsheet were received in an email then you could immediately click on it to see its information in a registered viewer. This would be the Excel tool in this case, but note that an Adobe PDF file could use a dedicated read-only viewer that is freely available; you only need a licence if you want to create such files yourself. Alternatively, software developers can write other tools that use the publicly documented API to present information in different ways, and possibly to supplement suites of proprietary software.

Figure 3 – Visualising STEMMA data.

These failures by software genealogy may be linked to the prevailing notion of genealogy as family trees maintained in databases. A particular concern of mine is that this model restricts what parts of our history we can leave for the future, and how long its digital representation will be accessible. Consider that a JPG image file is backed by an international standard that is publicly accessible, whereas a genealogical database generally has a proprietary schema implemented in a proprietary database engine (I can list several such engines that are no longer available).

Narrative

In Part I, I suggested that the fundamental medium of narrative is poorly supported by our tools, and that it is either squeezed into some internal (to a tree) plain-text field, or relegated to some external tool such as a word-processor, blog, or wiki. I know of no public tool that supports the integration of narrative — whether for research write-ups, stories and memories, proof arguments, or transcription — with trees, timelines, geography, and other forms of genealogical visualisation. Again, I am making the distinction between organisation and visualisation so your first reading of this paragraph may be misleading.

The concept of genealogical narrative has been tainted by the laughable claims of some products that they can generate narrative from discrete data in trees, and I want to distance myself from that. As an example, the ProGen manual for professional genealogy implies that software combined with narrative must mean template-generated robot-speak,[1] and this perception will spread as long as such claims exist.

There may be a more subtle issue with narrative which relates to its usage on the Web. In Our Days of Future Passed — Part II, I started distinguishing narrative essay from narrative report, and rather more recently noted that ProGen (p.354) also used the term narrative report in the same fashion, albeit with negative connotations. A forum post from June 2016, entitled Hereinafter Unsure, began as a simple question about potential confusion with the use of hereinafter in a citation, but quickly degenerated into a circular debate about writing for the Web and the use of this same term.

The term is widely used outside of genealogy, but I’d adopted it to describe the format of my own research articles. These articles are certainly not research reports, which are far more rigorous and intended to describe research undertaken for a client. Neither are they historical accounts, which would be wholly about lives or events, or even case studies, which might be more appropriate in an academic journal. My usage reflects the fact that they embrace both the research journey and the uncovered history in a single narrative form, together with identification of sources and analysis of evidence.

It soon became clear that there was a deep difference of opinion over whether a readable account of research, plus the inclusion of evidence analysis and citations, was a bad combination; furthermore, over whether the latter belong in scholarly journals and books rather than in narrative shared on the Web or with family. My belief is that the Web is the primary mechanism for sharing our narrative, and that such material will be found by corresponding search operations rather than by reading journals. If true then the proper handling of sources and evidence makes the difference between yet more throwaway genealogical claims and something of value that can be used and cited by others.

The sad element of this is that online narrative would provide the missing venue for the full use of the good methods taught within traditional genealogy, the very ones that are now being poorly applied to online trees.

So is there a general feeling that the majority of genealogists are not up to the challenge of a written account? I sincerely hope not. In fact, I believe that the majority would not only be capable, they would welcome the encouragement from both commercial genealogy and traditional genealogy. It is true that there are some so-called genealogy police who can deliver heavy-handed criticism, and they will effectively discourage these people. But remember two things: experts who know this craft inside-out did not arrive by parachute out of thin air (it took them time to achieve it), and these people will not be writing for academic publications. I personally don’t care if such work isn’t grammatically perfect, or without every citation perfectly crafted and punctuated, as long as all the details and reasoning are captured. People can always learn to do it better, but presuming it to be the preserve of academics is a self-fulfilling prophecy.

Conclusion

The price we’ve paid for having access to online records was giving genealogy a mass-market appeal — meaning ease and simplicity — but the momentum associated with that market has changed the face of genealogy, probably forever. It’s now very difficult to take a step back and look at what might have been achieved with more vision and in the absence of this legacy.

There are many genealogists who have only ever known computer-based genealogy — using the Internet and/or local applications — and their perception will therefore have been determined by the currently available sites and products. A case in point is that many of these users do not know how to handle conflicting information from different sources, such as ages in different census returns — not because it’s too complicated but because their tools offer them the wrong orientation. If their family tree only accommodates conclusions then it’s hardly surprising if they resort to creating multiple birth events.
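A short Python sketch shows what the other orientation might look like: each recorded age is kept as a separate, sourced claim, and the claims are compared instead of being turned into rival birth events. The class name, the sample ages, and the simple year arithmetic (which ignores the exact census date) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgeClaim:
    """What one source says; nothing here is a conclusion yet."""
    source: str
    census_year: int
    age: int

    def birth_year_range(self):
        # Someone aged N on census night was born in one of two calendar
        # years, depending on whether that year's birthday had passed.
        return (self.census_year - self.age - 1, self.census_year - self.age)


def overlap(claims):
    """Return the birth years consistent with every claim, or None on conflict."""
    earliest = max(claim.birth_year_range()[0] for claim in claims)
    latest = min(claim.birth_year_range()[1] for claim in claims)
    return (earliest, latest) if earliest <= latest else None


agreeing = [
    AgeClaim("1851 census (hypothetical entry)", 1851, 30),
    AgeClaim("1861 census (hypothetical entry)", 1861, 40),
]
conflicting = [
    AgeClaim("1851 census (hypothetical entry)", 1851, 30),
    AgeClaim("1861 census (hypothetical entry)", 1861, 38),
]

print(overlap(agreeing))
print(overlap(conflicting))
```

When the claims conflict, the result is a prompt to weigh the sources and write up the reasoning, and both claims stay on record whichever birth year is finally accepted.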

At some point, the software industry was always going to look at genealogy as a market for new applications, but what would prospective software developers currently see? Family trees. It is hardly surprising that their data models, databases, and the products themselves, all try to deliver variations on this theme.

In an episode of Mondays with Myrt on 10 Aug 2015 (timestamp 1:14:20), I suggested that genealogy was serving the interests of the software industry, rather than software serving the interests of genealogists. Had genealogy not turned into a big enough market then the software industry would have selected some other field of endeavour to focus its talents on. Unfortunately, it’s still rare to find professional software designers who are also research-orientated genealogists, and I believe this is hindering further progress.

These endeavours (software and history/genealogy) would almost certainly have been exclusive career choices since they would each have required a wealth of experience to become truly proficient, and each would have required a quite different academic background. On top of this, both have their own jargon, but with much room for confusion and ambiguity when participants interact.

The main problem is one of education, but not the traditional forms such as how to use a given Web site, or how to use a given product, or even attaining qualifications such as BCG certification; there is a lack of reciprocating knowledge, and vision, in both software genealogy and traditional genealogy. Software genealogy certainly has a naïve and overly-simple view of its goal, and so it needs input as part of that education. I have mentioned a perception within traditional genealogy, though, of software as necessarily conclusion-based, and possibly even a mistrust of its capabilities or reliability. It’s a poor analogy but one that everyone can relate to: no one would now prepare a document using pen-and-paper before entering it into a word-processor. A word-processor is a complicated program, but it has evolved to the point where almost everyone can use it to some level of productivity, often with no documentation.

Traditional genealogy acknowledges the power and importance of narrative, for all its many uses such as the handling of evidence and inference, but it does not sell this notion within the digital world. As a result, the requirement has not been picked up by software genealogy or by commercial genealogy. This one deficiency, alone, can be linked to poorly researched trees, a dearth of reasoning and justification on these trees, and a disrespect of genealogy by academic historians. It is welcome, therefore, to see kindex — whom I met at RootsTech 2016 — progressing with their mark-up for narrative; and also to hear FamilySearch reinventing itself and emphasising stories and memories over trees — although I hope they can also see a place for fully researched, reasoned, and sourced narrative.

So where does this Gordian knot leave me? Well, I admit that I struggle to find common ground. Software genealogy is too concerned with something different to what I'm doing; traditional genealogy has taught me much, but it appears uninterested in its future within a digital world; commercial genealogy provides me with valuable data, but I fear it is not interested in what I really want to achieve. Someone please tell me that I'm not alone!




[1] Elizabeth Shown Mills, ed., Professional Genealogy: A Manual for Researchers, Writers, Editors, Lecturers, and Librarians (Baltimore: Genealogical Publishing Co., 2001), p.355; book hereinafter cited as ProGen.