Yes, that old chestnut! How would we handle people having
multiple names — or even no name at all — if we had better software?
Back in A
Place For Everything, I explained the difference between a place and its
name. This had to be said because far too many people don’t make the
distinction, and so don’t recognise that a name is just one of several possible
properties for identifying a real thing (irrespective of whether it still
exists now). The industry’s preoccupation with family trees (i.e. lineage, and
hence genealogy in its literal sense),
at the expense of history, is probably at the root of this.
You might expect, therefore, that there would be less excuse
in the context of a person. Surely, we all appreciate that the person and their
name, or names, are two different things. Well, apparently not! I still see
questions asking how to find someone’s “real name”, as though everyone has just
one unique name that they can be referenced by and indexed by. They then run
into difficulties when they know someone exists but they don’t have a reliable
or complete name, or the person was never assigned a name, or their name was
shared by a close relative. When you factor in the many reasons for someone
changing their name, or having multiple concurrent names, the assumption of a
single “real name” becomes an obvious cause of confusion.
Now I admit that the existence of an actual person
corresponding to some name found in a source might be harder to verify than the
case of an actual place, and this is acknowledged to some extent through the
use of the persona concept when recording data. At some point, though, that
persona will be associated with
a “conclusion person” entity in your data, and that entity will have many
properties (e.g. date of birth), of which their personal names are simply one instance.
From a software perspective, a personal name cannot be the key
that defines a Person entity since it is neither unique nor fixed. So what functionality
are we looking for from a person’s name(s)?
- To record their preferred epithets.
- To use as one of several keys in identifying a person from a source.
- To use as a title or label in reports, charts, etc.
At first sight, you may be thinking that these are all the
same — and in some products they are — but there are fundamental differences.
The preferred name is not the same as the accepted variations of it, and the
annotation used for display may not even be a name at all.
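As a rough illustration of that separation, here is a minimal Python sketch. It is not STEMMA's schema: the `Person` class, its field names, and the sample people are all invented for the example. The only point is that identity comes from an opaque key, while the preferred names, the accepted variants, and the display label live in separate fields.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # Hypothetical fields, for illustration only.
    # Identity is an opaque key that is never derived from a name.
    key: str = field(default_factory=lambda: uuid.uuid4().hex)
    # The names the person actually went by.
    preferred_names: List[str] = field(default_factory=list)
    # Forms accepted when matching a name found in a source.
    accepted_variants: List[str] = field(default_factory=list)
    # Label for reports and charts; it need not be a name at all.
    title: str = ""


# Invented sample data: a father and son sharing one name, and an unnamed child.
father = Person(preferred_names=["John Smith"],
                accepted_variants=["John Smith", "J. Smith"],
                title="John Smith (1820)")
son = Person(preferred_names=["John Smith"],
             accepted_variants=["John Smith", "J. Smith"],
             title="John Smith (1845)")
infant = Person(title="Unnamed infant (1850)")

# Same name, different people; no name, but still a person.
print(father.key != son.key)
print(infant.preferred_names)
```

Because the key carries the identity, two people can share every name they ever used, and a person with no recorded name can still exist in the data.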
Let me start by using my own name as an illustration.
Although I am known in most circles as Tony Proctor, I was assigned the given
names Anthony Charles at birth. Now I have never changed my name but even this
simple case means I have several alternatives by which I might be referenced:
In other words, I have a full name, and an accepted
diminutive form, but several variations that could still refer to me. If I had
more than one middle name, or I had changed my name, or I had variations in my
native language, or separate stage/professional names, then you could imagine
this diagram becoming a lot more complex.
Now you might be about to say that initialisms are obvious
and can be deduced from the given names, if they’re known of course. If so,
then you’re thinking of English-speaking, Western conventions. Initials are not
applicable to logogram (or ideogram) based languages. Also, whilst we accept
their use in modern Latin-based languages, it would be a gross generalisation
to assume that all alphabet-based languages, modern or ancient, use this custom
in personal names. People of other cultures who have adopted Romanised versions
of their native names may also not use initials.
Whether someone changed their name through marriage, deed
poll, at a point of immigration, or when entering a different phase of their
life, there will be a date associated with that change, and possibly a
significant life-event to which it should be connected. Sometimes those names
are mutually exclusive (changing from one to another) and sometimes they run
together, but those dates primarily describe the preferred names rather than
the accepted variations. This means that the accepted variations may still be
used in sources long after one of those life-events. STEMMA®
handles names by dividing them into groups, with each group having any relevant
date ranges and name type. Each group consists of a series of accepted name
variants, each represented by a sequence of tokens, and a separate set of canonical names that represent the
preferred versions. The same scheme is used for places as well as for people.
When looking up a person (or place) by name, each of the accepted variations is
compared, in sequence, against the required name.
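The grouping and lookup just described might be sketched in Python roughly as follows. This is an assumption-laden illustration rather than STEMMA's actual representation: `NameGroup`, `tokens`, `find_group`, the "birth" type label, and the particular list of variants are all hypothetical. Note that the lookup deliberately ignores the date range, since accepted variants may still appear in sources long after a name change.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameGroup:
    name_type: str                    # e.g. "birth"; the label is an assumption
    canonical: List[str]              # preferred versions of the name
    variants: List[List[str]]         # accepted variants, each a token sequence
    start_year: Optional[int] = None  # the date range describes the preferred names
    end_year: Optional[int] = None


def tokens(name: str) -> List[str]:
    """Split a name into case-insensitive tokens, dropping surrounding dots and commas."""
    return [t.strip(".,").casefold() for t in name.split()]


def find_group(groups: List[NameGroup], wanted: str) -> Optional[NameGroup]:
    """Compare each accepted variant, in sequence, against the required name."""
    target = tokens(wanted)
    for group in groups:
        for variant in group.variants:
            if [t.casefold() for t in variant] == target:
                return group
    return None


# Illustrative variants only; this is not a definitive list.
groups = [
    NameGroup(
        name_type="birth",
        canonical=["Anthony Charles Proctor", "Tony Proctor"],
        variants=[
            ["Anthony", "Charles", "Proctor"],
            ["Anthony", "Proctor"],
            ["Tony", "Proctor"],
            ["A", "C", "Proctor"],
        ],
    )
]

print(find_group(groups, "A. C. Proctor") is groups[0])  # True
print(find_group(groups, "Charles Proctor"))             # None
```

Storing each variant as a token sequence, rather than as given-name and surname fields, keeps the scheme neutral about cultural naming conventions; the initials are listed explicitly instead of being derived.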
What about evidential variations? This is probably the most
common cause of confusion, although it’s no different, say, to handling a range
of birth years found in different census sources. The years may differ because
details were provided by someone else, or ages were rounded up/down for census
purposes, or the birthday fell on a different side of the census day, etc., but it
doesn’t mean that the person had multiple birth dates. With names, just as with
other personal data, the evidential forms have to be recorded, but separately
from the conclusion forms. The diagram below illustrates this with an example
event supported by two sources, both of which have misspellings of my name
(surname in first case, and given name in second case). In STEMMA, the
evidential forms are associated with the Event-to-Source link, as explained in Evidence
and Where to Stick It, whereas the conclusion forms are part of the Person
entity.
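To make the separation concrete, here is a small Python sketch of one event supported by two sources. The class names (`Person`, `SourceLink`, `Event`), the source labels, and the two misspellings are invented for illustration; they are assumptions, not STEMMA's schema. The evidential spellings sit on the link between the event and each source, and the conclusion name stays on the person.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    key: str
    canonical_name: str        # conclusion form, held by the Person entity


@dataclass
class SourceLink:
    source: str                # hypothetical source label
    name_as_recorded: str      # evidential form, exactly as the source has it


@dataclass
class Event:
    description: str
    person: Person
    links: List[SourceLink] = field(default_factory=list)


person = Person(key="P1", canonical_name="Anthony Charles Proctor")

# Invented misspellings: surname in the first source, given name in the second.
event = Event(
    description="Example event",
    person=person,
    links=[
        SourceLink(source="Source A", name_as_recorded="Anthony Procter"),
        SourceLink(source="Source B", name_as_recorded="Antony Proctor"),
    ],
)

# The person still has one conclusion name, however the sources spelled it.
for link in event.links:
    print(f"{link.source}: {link.name_as_recorded!r} -> {event.person.canonical_name!r}")
```

Recording a new misspelling then means adding a link, never editing the person's own names.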
So what about identifying a person on a computer display, or
in a genealogical report? In the context of variant spellings, Elizabeth Shown
Mills advocates picking a common spelling and using it consistently[1].
It is very common to place a woman’s maiden name in parentheses,
such as Sarah (Smith) Jones. For other kinds of alternative name there
may be different conventions, such as separating them with a slash (or
solidus, ‘/’), or specifying an “aka” (also-known-as) in parentheses. When a
name is ambiguous, either because it has been used in several generations, or
because it was also used for a deceased sibling, then you might add the year of birth in
parentheses. If a child didn’t live long enough to be given a name, or you
simply don’t know it, then you may still want to identify that child in a report. All
these cases that go beyond simple variations of spelling share the same
issue: the annotation is no longer a
personal name. In our software, we should never store some display annotation
and call it a personal name. This issue has been covered in excellent detail by
Tamura Jones[2]. What is
needed is a separate title/label field specifically for display purposes, and STEMMA
provides such a field using the <Title> element in both the Person and
Place entities.
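As a hedged sketch of how such a label might be produced, the following Python function composes a display title using the conventions mentioned above. The function `display_title`, its parameters, and the “Unnamed” placeholder are hypothetical, and the John Smith data is invented. Whether the label is generated like this or typed by hand, the result belongs in the title/label field and never in a name field.

```python
from typing import Optional


def display_title(given: Optional[str],
                  surname: Optional[str],
                  maiden: Optional[str] = None,
                  birth_year: Optional[int] = None) -> str:
    """Build a label for reports and charts; the result is not a personal name."""
    if not given and not surname:
        # No known name: the person still needs a label.
        label = "Unnamed"
    else:
        # Maiden name goes in parentheses between given name and surname.
        parts = [given, f"({maiden})" if maiden else None, surname]
        label = " ".join(p for p in parts if p)
    if birth_year is not None:
        # The year of birth disambiguates names reused within a family.
        label += f" ({birth_year})"
    return label


print(display_title("Sarah", "Jones", maiden="Smith"))   # Sarah (Smith) Jones
print(display_title("John", "Smith", birth_year=1820))   # John Smith (1820)
print(display_title(None, None, birth_year=1850))        # Unnamed (1850)
```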
GEDCOM made a fair stab at handling multiple names. Version
5.5.1 supported multiple names for a given person, and even had a range of name
types that could be used to distinguish them. GEDCOM names are generally
unstructured sequences of tokens — which is good for generality — but this
version also had an optional PERSONAL_NAME_PIECES description which allowed the
name tokens to be categorised, albeit with a warning that most systems will not
use this alternative form. It wasn’t until the draft GEDCOM XML 6.0
specification that individual names were given a NAME_PRINCIPAL_FORM which was
roughly equivalent to STEMMA’s canonical
names, but there was still no title/label facility.
Several products do accommodate multiple names for a single
person, although their separation of preferred versus accepted variants, names
of different types, names over different time spans, titles/labels for display
purposes, and evidential variants is not as organised as in STEMMA. By
attempting to take short-cuts, our software may deprive genealogists of the
flexibility they need, and effectively corrupt our recorded histories. The
following is an excellent quote from a genealogist on a Usenet Newsgroup[3]:
“There is no excuse for software that pretends 80% of the people in the
world are wrong about their own names.”
I wish I’d said that!
[1] Elizabeth Shown Mills, “Re: [TGF] Surname spelling variants”, Transitional-Genealogists-Forum-L, message dated 24 Jun 2011 (http://archiver.rootsweb.ancestry.com/th/read/transitional-genealogists-forum/2011-06/1308944480 : accessed 3 Feb 2014).
[2] Tamura Jones, “FNU LNU MNU UNK”, Modern Software Experience, 11 Aug 2013 (http://www.tamurajones.net/FNULNUMNUUNK.xhtml : accessed 22 Nov 2013); also his previous works: “The Lnu Family Mystery”, Modern Software Experience, 11 Aug 2013 (http://www.tamurajones.net/TheLnuFamilyMystery.xhtml : accessed 22 Nov 2013); “Unk is a Real Name”, Modern Software Experience, 10 Aug 2013 (http://www.tamurajones.net/UnkIsARealName.xhtml : accessed 22 Nov 2013).