After bringing things up to date regarding STEMMA V4.0 in my
previous post, Our
Days of Future Passed — Part I, I now want to expand on its support for narrative genealogy.
When asked what this is, most people would respond that it
is storytelling. It is true that recounting stories of our own experience or
recollection would be a part of this, but there is more to it. Unfortunately,
the terminology for distinguishing types of authored work is something of a
minefield, with many different terms being applied inconsistently. In this
article, I will use the following terms, which I hope will be meaningful and
acceptable to readers:
- Narrative essays: typically contain personal non-fictional storytelling for the purpose of sharing an experience, recollections, or a point of view.
- Narrative reports: a write-up of research, analysis of information, conclusions, etc., using a narrative format; that is, weaving the research process into a description of the events uncovered. This type of narrative would generally be from the point of view of source information rather than personal experience.
- Research report: a formalised report of a specific research assignment, usually for a client. It might record everything that was searched, who and what was searched for, everything that was found, everything you hoped to find but couldn’t (with reasons), all the negative results, analysis, and a research plan for future work.
Let’s just focus on the first two of these for now:
narrative essays and narrative reports, the distinction between which might be
blurred when researching something rooted in living memory. Where
would we write them? The main answers would probably be blogs, dedicated Web
sites, or word-processor documents, but are these sufficient?
The problem with those approaches is that your narrative is
then disconnected from any other type of data, and not described by a data
model. On its own, it probably doesn’t need a data model, but if you want to
integrate it with anything more structured — which basically means all that
multi-linked data that I described in Part I — then it is essential. Before I can
explain that integration, I must
first describe some of the advantages and disadvantages of narrative.
Natural language is a very rich means of expression, and it can
be used to describe events, circumstances, objects, people, emotions, analysis,
evidence, and conclusions. Furthermore, this can be done objectively, in a
matter-of-fact way, or with the elegance and beauty of seductive prose —
whichever is appropriate for the material. Remember, too, that having some natural-language
content is essential if you want to share your history with friends, relatives,
or peers. If anyone believes that template-generated sentences, where software
inserts discrete data values into some stock template, constitute narrative,
then they need to read more books. Nor is there any rule that these
contributions must be about a particular person, family, or other
subject; they could cover any historical topic, and even describe your research
and reasoning in arriving at your picture of the past.
However, narrative also has a disadvantage: it is sequential.
While a master of the art could take you on a journey using just their words,
it is still a pre-prepared journey that has to be followed in sequence; you
cannot easily navigate your own way around the information in the story. By
contrast, the multi-linked entities in Part I would allow you to navigate their
hierarchies (e.g. lineage), events, timelines, and geography — together, and
with no restrictions.
Now imagine that both of these were combined, and that you
could, for instance, navigate from a person reference, or a place reference,
found in some narrative, to its respective hierarchy, then to a nearby entity
in that same hierarchy, and finally to a mention of the new entity in some
other narrative.
Figure 1 - Navigating between narrative and hierarchies.
STEMMA effectively integrates separate narrative articles (each “a piece of non-fictional prose
that is an independent part of a publication”) with its multi-linked data
describing hierarchies, events, geography, sources, etc., and this allows the
freedom to navigate between all of them. Note that the links between subject
references (in the narrative) and subject entities can be considered
bi-directional, and so can be navigated in either direction.
In effect, I’m saying that narrative supplements that
multi-linked structured data, and that you cannot truly represent the past
without it. I recently found a real-world analogy to this synergy after talking
with Brian Miller, CEO of http://history-to-share.com/.
His company produce ceramic
outdoor plaques that can be associated with a gravestone, or other
memorial marker. They include a QR code that allows a
passer-by to scan it and see stories of that person’s history, and so breathe life
into what would otherwise be simple names and dates, possibly with brief
relationship details such as “wife of”, “husband of”, etc.
If you’re serious about genealogy then you will be
interested in original documents — including copies or derivatives thereof —
and maybe even authored works by other people. Just as holding narrative in a
separate location fails to make good use of it, so does keeping only
images or other facsimiles of those documents. What I want to talk about
here is transcription, and including those transcriptions with the rest of the
data.
It’s tempting to think that a transcription is simply text,
and so not fundamentally different to narrative. However, there are many
additional issues to consider, such as unusual or erroneous spelling, unknown or
uncertain words, insertions and deletions, emphasis, marginal notes, footnotes,
and so on. Capturing the essence of these in a transcription is essential if
you plan to study it, or even to understand it properly.
Electronic documents use a system of mark-up, analogous to original manuscript mark-up, that embeds
information or instructions within the text. This may be presentational mark-up that gives instructions on how to present
something (e.g. that a word or phrase should be in italics), or semantic mark-up that associates meaning
or other information with a word or phrase (e.g. that a phrase is actually a
hyperlink that must take you to a given URL). STEMMA uses such a system in its
narrative support, and a large part of it is common to both authored work and
transcription, while a smaller part is specific to transcription. Since
authored work will often need to quote transcribed text, both features are
provided by the same rich-text narrative tool. For the masochists
amongst you, an example showing STEMMA’s mark-up being applied to an
evidence-of-age document may be found at: Transcription Anomalies.
In order to introduce semantic mark-up, let’s begin by
looking at a person reference such as “Tony Proctor”. When producing, say, a
narrative essay, the mark-up allows the author to ‘generate a reference to
the Person entity whose identifying key is such-and-such’. There are options
allowing a choice of formal/informal name, or even some custom description of
the person. As well as inserting the selected name into the text, this also
marks it as a person reference.
This approach can be applied to all of the STEMMA subject
types: person, animal, place, and group. It can even be applied to the names of
events, or to raw dates.
Figure 2 - Relationship between subject entities and narrative.
Note that this diagram illustrates how each of the subject
entities is still independently connected to its respective hierarchy and
shared events. The subjects might be referenced in many separate narrative
articles, but these could be found directly from those hierarchies and events,
or vice versa.
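To make the idea of a marked-up subject reference more concrete, here is a minimal Python sketch under loudly stated assumptions: STEMMA’s actual mark-up syntax is not reproduced, and every class, field, and key below (`Subject`, `SubjectRef`, `Article`, `build_reverse_index`, `P-0001`) is hypothetical. It shows a reference that carries the key of a subject entity alongside its chosen wording (formal, informal, or custom), and how an index from entity to narrative can then be derived, so that the link can be followed in either direction.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class SubjectType(Enum):
    PERSON = "person"
    ANIMAL = "animal"
    PLACE = "place"
    GROUP = "group"


@dataclass
class Subject:
    key: str            # hypothetical identifying key of the subject entity
    kind: SubjectType
    formal_name: str
    informal_name: str


@dataclass
class SubjectRef:
    key: str   # key of the referenced subject entity (narrative -> entity)
    text: str  # wording actually inserted into the narrative


@dataclass
class Article:
    title: str
    refs: list = field(default_factory=list)

    def reference(self, subject, style="formal", custom=None):
        """Insert a subject reference using the formal name, informal name, or custom wording."""
        if custom is not None:
            text = custom
        elif style == "informal":
            text = subject.informal_name
        else:
            text = subject.formal_name
        self.refs.append(SubjectRef(subject.key, text))
        return text


def build_reverse_index(articles):
    """Entity -> narrative direction: which articles mention each subject key."""
    index = defaultdict(list)
    for article in articles:
        for ref in article.refs:
            if article.title not in index[ref.key]:
                index[ref.key].append(article.title)
    return index


mary = Subject("P-0001", SubjectType.PERSON, "Mary Ann Smith", "Mary Smith")
essay = Article("Childhood memories")
report = Article("Smith family origins")
essay.reference(mary, style="informal")
report.reference(mary, custom="my great-grandmother")
index = build_reverse_index([essay, report])
print(index["P-0001"])  # ['Childhood memories', 'Smith family origins']
```

The key point of the design is that the wording is free to vary while the key stays fixed, which is what allows both articles to be found from the same entity.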
Now let’s switch to a person reference encountered during a
transcription. In this case, the subject reference is already present, and the
goal of the mark-up is simply to tag it as a person reference, etc. Note, however,
that a subject reference is not necessarily the same as a name; phrases such as
“my grandmother”, “my dog”, “his regiment”, or “their home” are all valid
examples of subject references — for person, animal, group, and place,
respectively — but none are names. This may also apply to dates since phrases
such as “next year” or “last week” are still date references.
This observation leads to a choice for the transcriber:
either a subject reference can be connected to an existing subject entity, or it
can be left as a reference to some unidentified or incidental
subject. STEMMA terms the first option deep semantics, where we make the
association, and the second shallow semantics, where we simply mark it as a
reference to a subject of a given type. This choice also applies to date
references, where we may not be able to identify the date value beyond
reasonable doubt, but we still know that it’s a date.
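The deep/shallow distinction can be modelled quite simply as a reference whose link to an entity, or to a date value, is optional. This is an illustrative Python sketch only; the names `TranscribedSubjectRef`, `TranscribedDateRef`, and the key `P-0001` are assumptions, not part of the STEMMA specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TranscribedSubjectRef:
    text: str                  # words as they appear in the source, e.g. "my grandmother"
    kind: str                  # "person", "animal", "place" or "group"
    key: Optional[str] = None  # subject entity key for deep semantics; None for shallow

    @property
    def semantics(self) -> str:
        return "deep" if self.key is not None else "shallow"


@dataclass
class TranscribedDateRef:
    text: str                     # e.g. "last week"
    value: Optional[date] = None  # set only if the date is known beyond reasonable doubt

    @property
    def semantics(self) -> str:
        return "deep" if self.value is not None else "shallow"


refs = [
    TranscribedSubjectRef("my grandmother", "person", key="P-0001"),  # identified
    TranscribedSubjectRef("his regiment", "group"),                   # type known, identity not
    TranscribedDateRef("last week"),                                  # still a date reference
]
for r in refs:
    print(f"{r.text!r}: {r.semantics}")
```

Either way the transcribed wording is preserved unchanged; only the presence or absence of the link differs.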
A narrative report will probably have a slightly more
academic approach than a narrative essay, and one of the most important
requirements will be for source reference notes and for general
footnotes/endnotes. Since a narrative report, by definition, involves
researching information from a number of sources, it should include
traditionally formatted citations for them. STEMMA’s mark-up allows the
production of citations, and these may be generated either directly in a corresponding
footnote/endnote, or inline with your other text so that more complex (possibly
multi-source) citations can be placed in a custom footnote/endnote.
Illustrations of this may be found in: Cite Seeing.
Research reports are a more emotive issue, but bear with me.
I do not have any figures that indicate how many professionals disseminate
their research reports via paper or electronic means, but all will undoubtedly
have been created using a word-processor. Reactions to the suggestion that a research
report could be disseminated in some computerised form, other than a
word-processor document, are largely based on the fear that it would somehow limit
freedom of expression, or force the use of some database, but these fears are
unfounded.
STEMMA’s rich-text narrative does require a specialised
word-processor tool so, for a moment, let’s assume that this tool was freely
available. It has the same capabilities for layout, formatting, tables,
pictures, and reference notes, as do most word-processors. If you had to use
that, and the recipient also had a corresponding reader, then there would be no
loss of freedom. However, the STEMMA version would also allow subject
references to be flagged, and formatted differently to the surrounding text —
no more need to use that horribly non-international approach of uppercasing
surnames. Furthermore, those subject references could be clickable, and could
take you to pictorial representations of some event or lineage information that
was uncovered. That structured information could also be lifted out of the
report by a compliant genealogy product since it would understand the same
format; there would be no need for the client to mess around trying to
cut-and-paste pieces for data entry.
There are many advantages to this approach, but I also know
that it’s currently a step too far for some readers. Until such a format
becomes as ubiquitous as our word-processor formats, it will remain in the
abode of digital dragons and uncharted territory.
So is this just about exploration? What about searching?
While it is possible to search word-processor documents, or blogs, you have to
know exactly what to search for, and this is fraught with problems for subjects
with alternative names. Searching marked-up narrative means that the software
can automatically check all the alternative modes of reference without you
having to know them. Not convinced? OK, consider this problem that I had several
times before my software had matured sufficiently. You come across a surname,
or someone asks you about a surname, and you want to search all the persons you
have in your data, all their aliases, their maiden names, alternative
spellings, and find all the narrative and transcribed references to them.
Furthermore, you want to check previously unidentified people or incidental
people. And you don’t want to worry about ambiguities such as tailor/Tailor and
baker/Baker, or confusing the surname London with the place of the same name.
That’s a demanding search, but with marked-up narrative it becomes quite easy to perform. I would even consider
buying a product based on that one capability … had I not already done it.
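The following minimal sketch suggests why marked-up references make that search tractable. It is a hypothetical model, not the author’s software: `Person`, `Ref`, `search_surname`, and the sample data are invented for illustration, and it assumes, for simplicity, that a surname is the last word of a name. Persons are matched under any of their recorded names; deep references are then found by entity key, whatever wording they use, and shallow person references by their text. Only references marked as persons are examined, so an occupation such as baker in plain text, or a place reference to London, cannot produce a false hit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    key: str
    names: list  # all known names: aliases, maiden names, alternative spellings


@dataclass
class Ref:
    article: str
    text: str                  # wording as it appears in the narrative or transcription
    kind: str                  # "person", "place", "group", ...; untagged words are not refs
    key: Optional[str] = None  # None for unidentified or incidental subjects (shallow)


def surname_of(name: str) -> str:
    # Simplifying assumption: the surname is the last word of the name.
    return name.split()[-1].casefold()


def search_surname(surname, persons, refs):
    wanted = surname.casefold()
    # Every person known under this surname by any of their names.
    keys = {p.key for p in persons if any(surname_of(n) == wanted for n in p.names)}
    hits = []
    for r in refs:
        if r.kind != "person":
            continue  # e.g. a place reference to London is never a surname hit
        if r.key in keys:
            hits.append(r)  # deep reference: matched by entity key, not by wording
        elif r.key is None and wanted in (w.casefold() for w in r.text.split()):
            hits.append(r)  # shallow reference to an unidentified or incidental person
    return hits


persons = [
    Person("P-0001", ["Mary London", "Mary Baker"]),   # married name, maiden name
    Person("P-0002", ["John Taylor", "John Tayler"]),  # alternative spellings
]
refs = [
    Ref("Essay A", "Grandma Mary", "person", "P-0001"),
    Ref("Essay A", "London", "place", "L-0001"),
    Ref("Transcript B", "Mrs London", "person"),
    Ref("Transcript B", "John Tayler", "person", "P-0002"),
]
for hit in search_surname("London", persons, refs):
    print(hit.article, "->", hit.text)
```

Searching for “Baker” here finds “Grandma Mary” through the maiden name, even though neither surname appears in that wording.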
There will undoubtedly be a researcher amongst you who will
ask: ‘What happens if I suspect who a person reference refers to, but I do not want
to connect it directly to a corresponding person entity, and instead want to
build a case for why it is that person?’ Well, full marks to whoever asked
that! It was one of the last core pieces
to appear in the STEMMA specification.
Connecting a person reference (or other
subject reference) to the supporting logic, and to prototype
persons[1], before
making a concluding link to a person entity, will be the topic of Part III in
this series. This will also discuss why reasoning must be expressed
using natural language, and not in some wholly formalised computer-speak.
[1] A prototype subject
begins as the details of some initial subject reference (effectively a persona, in the case of persons), and
may be merged with other prototypes before being connected to a subject
entity.