Wednesday, 2 December 2015

Our Days of Future Passed — Part II

After bringing things up-to-date regarding STEMMA V4.0 in my previous post, Our Days of Future Passed — Part I, I now want to expand on its support for narrative genealogy.

When asked what this is, most people would respond that it is storytelling. It is true that recounting stories of our own experience or recollection would be a part of this, but there is still more. Unfortunately, the terminology for distinguishing types of authored work is something of a minefield, with many different terms being applied inconsistently. I will keep to the following terms in this article, which I hope will be meaningful and acceptable to readers:

  • Narrative essays: typically contain personal non-fictional storytelling for the purpose of sharing an experience, recollections, or a point of view.
  • Narrative reports: write-up of research, analysis of information, conclusions, etc., using a narrative format. That is, weaving the research process into a description of the events uncovered. This type of narrative would generally be from the point of view of source information rather than personal experience.
  • Research report: formalised report of a specific research assignment, usually for a client. It might record everything that was searched, who & what was searched for, everything that was found, everything you hoped to find but couldn’t (with reasons), all the negative results, analysis, and a research plan for future work.
For completeness, research notes are the sum total of everything we know about a person, or other subject, expressed as raw records with separate commentary, and in an easily accessible typescript form. This is the accepted meaning, but I suggest that this disorganised concept stems from inadequate digital representation.

Let’s just focus on the first two of these for now: narrative essays and narrative reports. The difference between them might be blurred if researching something that has its roots in living memories. Where would we write them? The main answers would probably be: blogs, dedicated Web sites, or word-processor documents, but are they sufficient?

The problem with those approaches is that your narrative is then disconnected from any other type of data, and not described by a data model. On its own, it probably doesn’t need a data model, but if you want to integrate it with anything more structured — which basically means all that multi-linked data that I described in Part I — then it is essential. Before I can explain that integration, I must first describe some of the advantages and disadvantages of narrative.

Narrative Essays

Natural language is a very rich means of expression, and it can be used to describe events, circumstances, objects, people, emotions, analysis, evidence, and conclusions. Furthermore, this can be done objectively, in a matter-of-fact way, or with the elegance and beauty of seductive prose, whichever is appropriate for the material. Remember, too, that having some natural-language content is essential if you want to share your history with friends, relatives, or peers. If anyone believes that template-generated sentences, where software inserts discrete data values into some stock template, constitute narrative then they need to read more books. There are no rules for whether these contributions would have to be about a particular person, family, or other subject; they could cover any historical topic, and even describe your research and reasoning in arriving at your picture of the past.

However, narrative also has a disadvantage: it is sequential. While a master of the art could take you on a journey using just their words, it is still a pre-prepared journey that has to be followed in sequence; you cannot easily navigate your own way around the information in the story. By contrast, the multi-linked entities in Part I would allow you to navigate their hierarchies (e.g. lineage), events, timelines, and geography — together, and with no restrictions.

Now imagine that both of these were combined, and that you could, for instance, navigate from a person reference, or a place reference, found in some narrative, to its respective hierarchy, then to a nearby entity in that same hierarchy, and finally to a mention of the new entity in some other narrative.

Figure 1 - Navigating between narrative and hierarchies.

STEMMA effectively integrates separate narrative articles (“pieces of non-fictional prose that is an independent part of a publication”) with its multi-linked data describing hierarchies, events, geography, sources, etc., and this allows the freedom to navigate between all of them. Note that the links between subject references (in the narrative) and subject entities can be considered bi-directional, and so can be navigated in either direction.
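The navigation path described above (narrative, to entity, to a neighbour in the hierarchy, to another narrative) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the keys, article identifiers, and function names are invented, and nothing here reflects STEMMA's actual file format or any existing tool.

```python
from collections import defaultdict

# Hypothetical keys and data, invented for illustration only.
refs_in_article = defaultdict(set)      # article id -> entity keys it mentions
articles_for_entity = defaultdict(set)  # entity key -> articles mentioning it
parent_of = {"pJohn": "pMary"}          # a tiny lineage hierarchy: child -> parent


def link(article_id, entity_key):
    """Record one subject reference so it can be followed in either direction."""
    refs_in_article[article_id].add(entity_key)
    articles_for_entity[entity_key].add(article_id)


def articles_via_parents(article_id):
    """Narrative -> entity -> parent in the hierarchy -> other narrative."""
    found = set()
    for key in refs_in_article.get(article_id, set()):
        parent = parent_of.get(key)
        if parent is not None:
            found |= articles_for_entity.get(parent, set())
    return found - {article_id}


link("essay-1", "pJohn")
link("essay-2", "pMary")
```

Storing each link in both dictionaries is what makes it navigable in either direction; here, `articles_via_parents("essay-1")` would lead from an essay mentioning the child to the essay mentioning the parent.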

In effect, I’m saying that narrative supplements that multi-linked structured data, and that you cannot truly represent the past without it. I recently found a real-world analogy to this synergy after talking with Brian Miller, the CEO of a company that produces ceramic outdoor plaques that can be associated with a gravestone or other memorial marker. The plaques include a QR code that a passer-by can scan to see stories of that person’s history, and so breathe life into what would otherwise be simple names and dates, possibly with brief relationship details such as “wife of”, “husband of”, etc.

Source Information

If you’re serious about genealogy then you will be interested in original documents (including copies or derivatives thereof), and maybe even authored works by other people. Just as holding narrative in a separate location makes poor use of it, so does keeping only images or other facsimiles of those documents. What I’m about to talk about here is transcription, and including those transcriptions with the rest of the data.

It’s tempting to think that a transcription is simply text, and so not fundamentally different to narrative. However, there are many additional issues to consider, such as unusual or erroneous spelling, unknown or uncertain words, insertions and deletions, emphasis, marginal notes, footnotes, and so on. Capturing the essence of these in a transcription is essential if you plan to study it, or even to understand it properly.

Electronic documents use a system of mark-up, analogous to original manuscript mark-up, that embeds information or instructions within the text. This may be presentational mark-up that gives instructions on how to present something (e.g. that a word or phrase should be in italics), or semantic mark-up that associates meaning or other information with a word or phrase (e.g. that a phrase is actually a hyperlink that must take you to a given URL). STEMMA uses such a system in its narrative support. A large part of it is common to both authored work and transcription, and a smaller part is specific to transcription. Because authored work will often need to quote transcribed text, both sets of features are provided by the same rich-text narrative tool. For the masochists amongst you, an example showing STEMMA’s mark-up being applied to an evidence-of-age document may be found at: Transcription Anomalies.

Semantic Mark-Up

In order to introduce semantic mark-up, let’s begin by looking at a person reference such as “Tony Proctor”. When producing, say, a narrative essay then the mark-up allows the author to ‘generate a reference to the Person entity whose identifying key is such-and-such’. There are options allowing a choice of formal/informal name, or even some custom description of the person. As well as inserting the selected name into the text, this also marks it as a person reference.

This approach can be applied to all of the STEMMA subject types: person, animal, place, and group. It can even be applied to the names of events, or to raw dates.
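To make the idea of 'generating a reference from a key' a little more concrete, here is a minimal Python sketch. It is not STEMMA's mark-up: the `Person` and `SubjectRef` classes, the `person_ref` function, the key `pJane`, and the names are all made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    key: str
    formal_name: str
    informal_name: str


@dataclass(frozen=True)
class SubjectRef:
    subject_type: str  # "person", "animal", "place", or "group"
    key: str           # identifying key of the entity referred to
    text: str          # the words actually inserted into the narrative


# Hypothetical data: a single invented person entity.
PEOPLE = {"pJane": Person("pJane", "Jane Elizabeth Doe", "Jane Doe")}


def person_ref(key: str, style: str = "informal",
               custom: Optional[str] = None) -> SubjectRef:
    """Build the inserted text and tag it as a reference to a Person entity."""
    person = PEOPLE[key]
    if custom is not None:
        text = custom  # some custom description chosen by the author
    elif style == "formal":
        text = person.formal_name
    elif style == "informal":
        text = person.informal_name
    else:
        raise ValueError(f"unknown name style: {style}")
    return SubjectRef("person", key, text)
```

Whatever wording is chosen, the resulting reference carries the same key, so `person_ref("pJane", custom="my late aunt")` still points at the same person entity as `person_ref("pJane", style="formal")`.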

Figure 2 - Relationship between subject entities and narrative.

Note that this diagram illustrates how each of the subject entities is still independently connected to its respective hierarchy and shared events. The subjects might be referenced in many separate narrative articles, but these could be found directly from those hierarchies and events, or vice versa.

Now let’s switch to a person reference encountered during a transcription. In this case, the subject reference was already present, and the goal of the mark-up is simply to tag it as a person reference, etc. Note, however, that a subject reference is not necessarily the same as a name; phrases such as “my grandmother”, “my dog”, “his regiment”, or “their home” are all valid examples of subject references — for person, animal, group, and place, respectively — but none are names. This may also apply to dates since phrases such as “next year” or “last week” are still date references.

This observation leads to a choice by the transcriber: either a subject reference can be connected to an existing subject entity, or left as a reference to some unidentified or incidental subject. STEMMA terms these options deep semantics when we want to make the association and shallow semantics when we simply want to mark it as a reference to a subject of a given type. This choice also applies to date references where we may not be able to identify the date value beyond reasonable doubt, but we still know that it’s a date.
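The distinction between deep and shallow semantics can be pictured as an optional link on each tagged reference. The following Python sketch assumes a hypothetical `TaggedRef` class of my own devising (the field names and keys are not taken from the STEMMA specification):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaggedRef:
    text: str                          # words exactly as transcribed
    subject_type: str                  # "person", "animal", "place", "group", "date"
    resolved_to: Optional[str] = None  # entity key or date value, if identified

    @property
    def semantics(self) -> str:
        # Deep: associated with a known entity or value.
        # Shallow: only marked as a reference of a given type.
        return "deep" if self.resolved_to is not None else "shallow"


# Hypothetical references tagged while transcribing a letter.
refs = [
    TaggedRef("my grandmother", "person", resolved_to="pMary"),
    TaggedRef("his regiment", "group"),
    TaggedRef("next year", "date"),
]

# Shallow references remain available for later identification.
unresolved = [r.text for r in refs if r.semantics == "shallow"]
```

A shallow reference loses nothing: it is still known to be a group or a date, and it can be promoted to a deep one later simply by filling in `resolved_to`.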

Narrative Reports and Research Reports

A narrative report will probably have a slightly more academic approach than a narrative essay, and one of the most important requirements will be for source reference notes and for general footnotes/endnotes. Since a narrative report, by definition, will involve researching information from a number of sources then it should include traditionally formatted citations for them. STEMMA’s mark-up allows the production of citations, and these may be generated directly in a corresponding footnote/endnote, or inline with your other text so that more complex (possibly multi-source) citations can be placed in a custom footnote/endnote. Illustrations of this may be found in: Cite Seeing.

Research reports are a more emotive issue, but bear with me. I do not have any figures that indicate how many professionals disseminate their research reports via paper or electronic means, but all will undoubtedly have been created using a word-processor. Reactions to the suggestion that a research report could be disseminated in some computerised form, other than from a word-processor, are largely based on the fear that it would somehow limit freedom of expression, or force the use of some database, but these are unfounded.

STEMMA’s rich-text narrative does require a specialised word-processor tool so, for a moment, let’s assume that this tool was freely available. It has the same capabilities for layout, formatting, tables, pictures, and reference notes, as do most word-processors. If you had to use that, and the recipient also had a corresponding reader, then there would be no loss of freedom. However, the STEMMA version would also allow subject references to be flagged, and formatted differently to the surrounding text — no more need to use that horribly non-international approach of uppercasing surnames. Furthermore, those subject references could be clickable, and could take you to pictorial representations of some event or lineage information that was uncovered. That structured information could also be lifted out of the report by a compliant genealogy product since it would understand the same format; there would be no need for the client to mess around trying to cut-and-paste pieces for data entry.

There are many advantages to this approach, but I also know that it’s currently a step too far for some readers. Until such a format becomes as ubiquitous as our word-processor formats, it will remain in the abode of digital dragons and uncharted territory.

Making use of Narrative

So is this just about exploration? What about searching? While it is possible to search word-processor documents, or blogs, you have to know exactly what to search for, and this is fraught with problems for subjects with alternative names. Searching marked-up narrative means that the software can automatically check all the alternative modes of reference without you having to know them. Not convinced? OK, consider this problem that I had several times before my software had matured sufficiently. You come across a surname, or someone asks you about a surname, and you want to search all the persons you have in your data, all their aliases, their maiden names, alternative spellings, and find all the narrative and transcribed references to them. Furthermore, you want to check previously unidentified people or incidental people. And you don’t want to worry about ambiguities such as tailor/Tailor and baker/Baker, or confusing the surname London with the place of the same name. That’s a powerful feature, but it’s then quite easy to perform. I would even consider buying a product based on that one capability … had I not already done it.
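A toy version of that surname search might look like the Python sketch below. It is an assumption-laden illustration rather than a description of my own software: the `Ref` class, the alias table, the keys, and the sample references are all invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ref:
    article: str
    text: str
    subject_type: str
    entity_key: Optional[str] = None  # None = unidentified or incidental subject


# Hypothetical alternative names (maiden name, variant forms) per person entity.
ALIASES = {"pAnn": {"Ann Baker", "Ann Tailor", "Annie Baker"}}

# Only tagged references are indexed; an untagged word such as "baker"
# (the occupation) in ordinary prose never enters this list.
REFS = [
    Ref("essay-1", "Ann Tailor", "person", "pAnn"),
    Ref("letter-3", "Mrs Baker", "person"),          # unidentified person
    Ref("letter-3", "London", "place", "plLondon"),  # the place, not a surname
    Ref("essay-2", "Jack London", "person"),
]


def search_surname(surname):
    """Find person references matching a surname under any known alias."""
    wanted = surname.lower()
    hits = []
    for ref in REFS:
        if ref.subject_type != "person":
            continue  # skips the place called London
        names = {ref.text} | ALIASES.get(ref.entity_key, set())
        if any(wanted in name.lower().split() for name in names):
            hits.append(ref)
    return hits
```

Searching for "Baker" finds the reference written as "Ann Tailor" (through her aliases) as well as the unidentified "Mrs Baker", while searching for "London" returns only the person and not the place.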

There will undoubtedly be a researcher amongst you who will ask ‘what happens if I suspect who a person reference refers to, but I do not want to connect it directly to a corresponding person entity, and instead want to build a case for why it is that person?’. Well, full marks to whoever asked that! It was one of the last core pieces to appear in the STEMMA specification.

The subject of connecting a person reference (or other subject reference) to the logic, and to prototype persons[1], before making a concluding link to a person entity, will be the subject for Part III in this series. This will also discuss why reasoning must be expressed using natural language, and not in some wholly formalised computer-speak.

[1] A prototype subject begins as the details of some initial subject reference (effectively a persona in the case of persons), and may be merged with other prototypes before being connected to a subject entity.