GeneaBloggers

Tuesday, 9 September 2014

Cite Seeing



It’s about time that I presented my STEMMA® approach to sources and citations[1]. Although the initial design approach wasn’t unusual, it has since evolved by trying to match all the real, hand-crafted citations in my own research reports, without restricting things to some “standard” list of source-types or to formatted samples published on paper or online.



The concept of a citation depends somewhat on the context. Some view it as the abstract act of citing a source of information or some scholarly work, ignoring contexts such as military awards and traffic citations. In STEMMA, a Citation entity (capitalisation deliberate for clarity) is a generalised representation of information location, sources, and repositories. For most genealogists, though, the term has come to mean the formatted reference notes appearing in a footnote or endnote, even more so than the source-list and source-label variants.

A citation has a number of purposes: to demonstrate intellectual honesty (not claiming prior work as your own), to allow the reader to assess your sources independently, and to allow the strength of your information sources to be assessed. In order that a citation can be understood by other readers, there are conventions, depending upon the type of source being cited, for the ordering, formatting, and separation of its elements. Probably the best-known resource for genealogists crafting citations is Evidence Explained[2] (hereinafter EE).

Despite any overlap, we should not confuse the concept of a footnote/endnote with that of a reference-note citation. A footnote or endnote is a general mechanism that may also be used for annotation (e.g. clarifying a word or phrase) or for discursive notes (commentary that digresses from the main subject). A research report has uses for all of these, and so STEMMA had to accommodate each of them.

It’s reasonable to ask why computer storage needs to encode a citation at all. Why not simply retain the carefully crafted formatted version? Well, that version effectively sets in concrete things such as the layout of the terms (someone may want a different ordering, say for ISO 690 compliance), the punctuation characters (e.g. see International Variations in Quotation Marks), the general style (CMOS, EE, others), and the locale. The last of these covers a number of subtle aspects that should differ for users in different locales. The formatting of a date is an obvious example, but whether you put punctuation characters inside or outside quotation marks is a less frequently mentioned one. Since computer software cannot reliably decompose a formatted citation, it also cannot indicate which piece is a title, which is an author, which is a date of publication, etc. This is semantic information that would need to be attached to the relevant parts if anything other than a human were to make use of it.

There are several design schemes that suggest breaking apart citations into a number of separate citation-elements (e.g. an author’s name), and relying on a separate citation-template system to regenerate a formatted edition appropriate to a given reader. The main differences between them might be summarised as follows:

  • Whether there’s a fixed, master list of source-types.
  • How the source-types are named or catalogued.
  • Whether the citation-element names alone are relied upon to convey the semantics.
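As a minimal sketch of that general idea (not of any particular scheme), the Python below keeps a citation as named citation-elements and regenerates the formatted form on demand. The element names, sample values, and the two style labels are hypothetical; the styles differ only in where the trailing comma falls relative to the closing quotation mark, which is exactly the kind of locale detail that a stored, pre-formatted citation would freeze.

```python
# Minimal sketch: one set of citation-elements, two hypothetical output styles.
# Element names ("Author", "Title", ...) and sample values are illustrative only.

def quoted(text: str, trailing: str, style: str) -> str:
    """Quote a title, placing trailing punctuation according to the style."""
    if style == "US":
        # American convention: the comma goes inside the closing quotation mark.
        return f"\u201c{text}{trailing}\u201d"
    # British (logical) convention: single quotes, punctuation outside.
    return f"\u2018{text}\u2019{trailing}"


def render_article(elements: dict, style: str) -> str:
    """Regenerate a formatted reference note from separate citation-elements."""
    title = quoted(elements["Title"], ",", style)
    return f"{elements['Author']}, {title} {elements['Journal']} ({elements['Year']})."


citation = {"Author": "A. Researcher", "Title": "Cite Seeing",
            "Journal": "Example Journal", "Year": "2014"}

for style in ("US", "UK"):
    print(render_article(citation, style))
```

The stored data never changes; only the template chosen for the reader does.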

STEMMA also uses citation-elements but with some important differences. Each source-type is identified by a Uniform Resource Identifier (URI). A URI generally looks like a URL, but you may define URIs freely under a root domain name that you own. An earlier post, Digital Freedom, explained how their visible semantics, decentralised creation, hierarchical derivatives, and versioning make them a cornerstone for extensible systems like STEMMA. The result is that you can define as many custom source-types as your research or your locale requires.

The citation-elements are defined as part of each source-type. That means their names and properties (e.g. their data-type, whether they’re optional, and whether they’re multi-valued) can be chosen independently for each source-type. Any semantic information can be attached to the individual citation-elements as necessary. For instance:

<Dataset Name='Example' xmlns:DC='http://purl.org/dc/elements/1.1/'>

<Citation Key='cBook'>
    <Title>Generic Citation for published books</Title>
    <URI>http://stemma.parallaxview.co/source-type/book/</URI>
    <!-- Each Param defines one citation-element for this source-type -->
    <Params>
        <Param Name='Author' SemType='DC:Creator'/>
        <Param Name='Title' SemType='DC:Title'/>
        <Param Name='Publisher'
            SemType='DC:Publisher.CorporateName'/>
        <Param Name='PublisherAddr'
            SemType='DC:Publisher.CorporateName.Address'/>
        <Param Name='Date' Type='Date' SemType='DC:Date'/>
        <Param Name='Pages' Optional='1'/>
    </Params>
</Citation>

</Dataset>

This STEMMA Citation entity can then be used to describe any number of simple book references. This example employs the ‘Dublin Core’ semantic tags, including their tentative refinements, but STEMMA can select other systems by using a different namespace (as indicated here by the “DC:” prefix). Such a custom entity is guaranteed not to clash with any others, since its URI lies under a domain that its creator owns, and citations that use it are transportable. What is required in a receiving product is a citation-template that can format it appropriately and, if you want to generate new instances of it, the verbiage associated with the source-type and its citation-element names for your locale.
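To illustrate how a receiving product might use such a definition, here is a minimal Python sketch that reads the cBook entity with the standard library's ElementTree and checks a proposed set of citation-element values against it. Treating an untyped Param as text, and checking a Date as a plain ISO 8601 calendar date, are assumptions made for the sketch; STEMMA's own date-value format is richer.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The cBook Citation entity, with straight attribute quotes so that it parses
# as XML. The SemType values are treated as plain strings here.
DEFINITION = """
<Citation Key='cBook'>
  <Title>Generic Citation for published books</Title>
  <URI>http://stemma.parallaxview.co/source-type/book/</URI>
  <Params>
    <Param Name='Author' SemType='DC:Creator'/>
    <Param Name='Title' SemType='DC:Title'/>
    <Param Name='Publisher' SemType='DC:Publisher.CorporateName'/>
    <Param Name='PublisherAddr' SemType='DC:Publisher.CorporateName.Address'/>
    <Param Name='Date' Type='Date' SemType='DC:Date'/>
    <Param Name='Pages' Optional='1'/>
  </Params>
</Citation>
"""


def load_params(xml_text: str):
    """Return the source-type URI and a dict describing each citation-element."""
    root = ET.fromstring(xml_text.strip())
    params = {}
    for p in root.iter("Param"):
        params[p.get("Name")] = {
            "type": p.get("Type", "Text"),      # assumption: untyped means text
            "optional": p.get("Optional") == "1",
            "semtype": p.get("SemType"),
        }
    return root.findtext("URI").strip(), params


def validate(values: dict, params: dict) -> list:
    """List the problems with a proposed set of citation-element values."""
    problems = []
    for name, spec in params.items():
        if name not in values:
            if not spec["optional"]:
                problems.append(f"missing required element: {name}")
        elif spec["type"] == "Date":
            try:
                # Assumption: a plain ISO 8601 calendar date, not STEMMA's
                # richer date-value format.
                date.fromisoformat(values[name])
            except ValueError:
                problems.append(f"{name} is not an ISO date: {values[name]}")
    for name in values:
        if name not in params:
            problems.append(f"unknown element: {name}")
    return problems
```

Because the element names and properties come from the definition itself, the same code works unchanged for any custom source-type.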



This diagram illustrates how the main components of this scheme operate together. The source-type URI is used to fetch the definition of the source-type, either through a discovery service (on the Internet) or from a local repository. That definition will also include the verbiage appropriate to one-or-more locales.

User input for a source reference is solicited using that locale-specific verbiage, respecting the citation-element data-types and other properties in the process.
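The lookup and input steps might be sketched in Python as follows. The resolver consults a local repository first and falls back to a discovery service, caching what it finds; the dictionary layout of a source-type (an ordered list of element names plus prompt verbiage keyed by locale tag) and the discover callable are hypothetical stand-ins, not part of STEMMA.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shape of a source-type definition: the ordered citation-element
# names plus prompt verbiage keyed by locale tag. This is an assumption.
SourceType = dict


class SourceTypeResolver:
    """Look up a source-type by URI: local repository first, then discovery."""

    def __init__(self, local: Dict[str, SourceType],
                 discover: Callable[[str], Optional[SourceType]]):
        self.local = local          # local repository keyed by URI
        self.discover = discover    # stand-in for an Internet discovery service

    def resolve(self, uri: str) -> SourceType:
        if uri in self.local:
            return self.local[uri]
        found = self.discover(uri)
        if found is None:
            raise LookupError(f"unknown source-type: {uri}")
        self.local[uri] = found     # cache the fetched definition locally
        return found


def prompts(source_type: SourceType, locale: str, fallback: str = "en") -> List[str]:
    """Return the input prompt for each citation-element in the given locale."""
    verbiage = source_type["verbiage"]
    texts = verbiage.get(locale, verbiage[fallback])
    return [texts[name] for name in source_type["elements"]]
```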

When generating a formatted citation, say for a report, the software product must interface with some citation-template tool that has a relevant template for that source-type. Developer note: STEMMA currently passes objects to a primitive tool, which then calls back on well-defined methods to obtain the specific details required by the template, e.g. a contact’s formal or informal name, a contact’s address, or a formatted place-hierarchy. This is more flexible than passing fixed items of text, since the template can ask for exactly the form of each detail that it needs.
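A rough Python sketch of that callback style follows. Contact, Place, formal_name, informal_name, hierarchy, and the template function are all hypothetical names chosen for illustration; the point is that the template asks the objects for the form of each detail it needs, rather than receiving pre-formatted strings.

```python
from typing import Protocol


class Contact(Protocol):
    def formal_name(self) -> str: ...
    def informal_name(self) -> str: ...


class Place(Protocol):
    def hierarchy(self, separator: str) -> str: ...


class SimpleContact:
    def __init__(self, given: str, surname: str):
        self.given, self.surname = given, surname

    def formal_name(self) -> str:
        return f"{self.given} {self.surname}"

    def informal_name(self) -> str:
        return self.given.split()[0]


class SimplePlace:
    def __init__(self, *parts: str):    # most specific part first
        self.parts = parts

    def hierarchy(self, separator: str) -> str:
        return separator.join(self.parts)


def death_certificate_template(subject: Contact, place: Place, when: str) -> str:
    # The template calls back for exactly the forms it needs.
    return (f"death certificate for {subject.formal_name()}, died {when}; "
            f"citing location {place.hierarchy(', ')}")
```

A different template could call informal_name() or pick another separator without any change to the stored data.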

A nice feature of this scheme is its freedom: it does not depend on some standards body defining the many hundreds of samples that are published in EE. It works equally well for different preferences and different locales since it is merely a mechanism, not a standard list. Software developers sometimes think too much in terms of a formulaic approach to citations (‘you plug these values into a template and out pops your formatted reference’) whereas real-life citations need much more freedom. Those same developers may also view EE as just a list of prescribed citation forms for all conceivable sources, rather than a comprehensive work on analysing evidence and crafting whatever citations we find necessary. As Elizabeth Shown Mills says herself, citations are an art rather than a science.

I now want to describe some basic STEMMA mechanisms for attaching information to a body of text, and then illustrate how they would be used in combination to replicate my hand-crafted editions. I won’t suggest that my own citations are good examples for anyone to follow, but I do strive to make them functional and relevant. That means that they sometimes get quite complicated, involving separate layers, analytical notes, and occasionally more than one source reference in the same reference note. 

Case 1 – Simple reference-note citation


The following shows a simple sentence that references a UK certificate for a ‘death overseas’. The associated citation is generated in a footnote:

The certificate came through and confirmed the location of her death as Park Hotel, Ingenbohl, Canton [Kanton] Schwyz, Switzerland.16

…etc…

16 England, death certificate for Mary Phyllis Ashbee, died 13 Jun 1984; citing location Switzerland; Death Abroad (1966 to 1994), General Register Office (GRO), Southport.

<Narrative><Text>
The certificate came through and confirmed the location of her death as Park Hotel, Ingenbohl, <Alt Value='Kanton'>Canton</Alt> Schwyz, Switzerland.
<CitationRef Key='cDeathsOverseasUK'>
<Param Name='Name'> Mary Phyllis Ashbee </Param>
<Param Name='Date'> 1984-06-13 </Param>
<Param Name='Country' Key='wSwitzerland'/>
</CitationRef>
</Text></Narrative>

The CitationRef could have specified an explicit Mode=’RefFootnote’ but that’s the default and so is unnecessary. Note that it is a layered citation indicating where the originals are held. This is achieved through the Citation entity (cDeathsOverseasUK) linking to another one (cDeathsAbroadGRO) using a ParentCitationLnk; thus creating a citation chain.
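A chain of this kind can be modelled very simply. In the hypothetical Python sketch below, each Citation entity holds a format string for its own layer plus an optional link to its parent, and rendering walks the links, joining the layers with semicolons. The format strings are invented to reproduce footnote 16 above, and the date is passed pre-formatted rather than being formatted per locale.

```python
# Hypothetical in-memory form of two chained Citation entities. Each holds a
# format string for its own layer and the key of its parent (ParentCitationLnk).
CITATIONS = {
    "cDeathsOverseasUK": {
        "layer": ("England, death certificate for {Name}, died {Date}; "
                  "citing location {Country}"),
        "parent": "cDeathsAbroadGRO",
    },
    "cDeathsAbroadGRO": {
        "layer": "Death Abroad (1966 to 1994), General Register Office (GRO), Southport",
        "parent": None,
    },
}


def render_chain(key, params, citations=CITATIONS):
    """Render a citation chain, most specific layer first, separated by '; '."""
    layers, seen = [], set()
    while key is not None:
        if key in seen:                       # guard against a looping chain
            raise ValueError(f"cycle in citation chain at {key}")
        seen.add(key)
        entity = citations[key]
        layers.append(entity["layer"].format(**params))
        key = entity["parent"]
    return "; ".join(layers) + "."


print(render_chain("cDeathsOverseasUK",
                   {"Name": "Mary Phyllis Ashbee", "Date": "13 Jun 1984",
                    "Country": "Switzerland"}))
```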

The Case 1 example also uses a second mechanism to annotate the text; in this case, to provide the alternative German spelling, Kanton, of the word Canton. Notice that this annotation is correctly placed in editorial brackets when the final form is non-interactive, such as on a printed page.
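The choice between the two presentations might look like this in Python. The function name and the tooltip markup used for interactive output are assumptions for illustration; only the bracketed, non-interactive form is described above.

```python
import html


def render_alt(text: str, alt: str, interactive: bool) -> str:
    """Render text that carries an alternative form (cf. the Alt element)."""
    if interactive:
        # Assumption: interactive output is HTML with the alternative as a tooltip.
        return f'<span title="{html.escape(alt)}">{html.escape(text)}</span>'
    # Non-interactive output: the alternative goes in editorial brackets.
    return f"{text} [{alt}]"
```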

Case 2 – Discursive notes


This example uses a different mechanism to create a footnote that simply contains discursive notes. There is no source reference in this case.

This confirmed the death occurred at the British Military Hospital, Peshawar, and the cause of death as ‘Cerebral Haemorrhage, result of motor accident’. It also gave his rank as Lance Corporal in the 14th/20th [King’s] Hussars4, and his service number as 551091.

…etc…

4 British cavalry regiment created through the merger of the 14th King's Hussars and the 20th Hussars in 1922. The honorific "King's" was added back into the title in 1936.

<Narrative><Text>
This confirmed the death occurred at the British Military Hospital, Peshawar, and the cause of death as ‘Cerebral Haemorrhage, result of motor accident’. It also gave his rank as Lance Corporal in the
<NoteRef Mode='Footnote'>14th/20th [King’s] Hussars
<Narrative><Text>
British cavalry regiment created through the merger of the 14th King's Hussars and the 20th Hussars in 1922. The honorific "King's" was added back into the title in 1936.
</Text></Narrative>
</NoteRef>, and his service number as 551091.
</Text></Narrative>

This NoteRef element creates a footnote and inserts a footnote indicator into the main text. There are other options, though, such as Mode=’Inline’ which would place the text in editorial brackets at that location.

In this example, the relevant text was placed inside the NoteRef, but it could equally have been placed in a Text element elsewhere, and the NoteRef element made to point to it using a NoteKey attribute. For instance:

<NoteRef Mode='Footnote' NoteKey='tHussars'> 14th/20th [King’s] Hussars </NoteRef>
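The two NoteRef modes, and the choice between an embedded note and a NoteKey reference, can be sketched in Python. The NoteRef dataclass and render function are hypothetical models of the behaviour described, not STEMMA code, and the footnote indicator is rendered as a plain trailing number, as in the examples above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NoteRef:
    anchor: str                      # text that the note is attached to
    note: Optional[str] = None       # note text held inside the NoteRef...
    note_key: Optional[str] = None   # ...or the key of a Text held elsewhere
    mode: str = "Footnote"           # "Footnote" or "Inline"


def render(parts: list, texts: Dict[str, str],
           first_number: int = 1) -> Tuple[str, List[str]]:
    """Return (body text, footnotes) for a mix of plain strings and NoteRefs."""
    body, footnotes, number = [], [], first_number
    for part in parts:
        if isinstance(part, str):
            body.append(part)
            continue
        note = part.note if part.note is not None else texts[part.note_key]
        if part.mode == "Inline":
            body.append(f"{part.anchor} [{note}]")     # editorial brackets
        else:
            body.append(f"{part.anchor}{number}")     # plain-text indicator
            footnotes.append(f"{number} {note}")
            number += 1
    return "".join(body), footnotes
```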

Case 3 – Analytical notes


This case attaches a simple analytical note to a citation in the form of another layer (i.e. separated by a semicolon). That extra layer is achieved by appending the note within a local footnote rather than in the Citation entity itself, thus clearly separating personal opinion from the details of the citation.

Near the end of the burial register3 were the entries for all three of the soldiers who died in that road accident:

…etc…

3 Burial register held at Garrison Church, Risalpur, NWFP, Pakistan (1915–1947), photocopy; Asia, Pacific and Africa Collections (APAC), The British Library, 96 Euston Road, London; source of photocopy was a typed document so unclear whether original was typed or whether it was a transcript itself.

<Narrative><Text>
Near the end of the burial register
<NoteRef Mode='Footnote'>
<Narrative><Text>
<CitationRef Key='cAPACBurialReg' Mode='RefInline'>
<Param Name='Church' Key='wGarrisonChurch'/>
<Param Name='From'> 1915 </Param>
<Param Name='To'> 1947 </Param>
<Param Name='Media'> photocopy </Param>
</CitationRef>; source of photocopy was a typed document so unclear whether the original was typed or whether it was a transcript itself.
</Text></Narrative>
</NoteRef> were the entries for all three of the soldiers who died in that road accident:
</Text></Narrative>

This may take a couple of glances to see what is happening. The outermost NoteRef generates a footnote, but inside that footnote is a citation generated inline, followed by a layer representing the analytical note. In this instance, the cAPACBurialReg Citation entity also points to a parent entity representing APAC, forming a chain.
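The separation that Case 3 relies on can be shown with a small Python sketch: the inline citation text is whatever a citation-template tool produced for the Citation chain, while the analytical note is held only in the local footnote and added as a further '; ' layer. The variable and function names are hypothetical, and the strings reproduce footnote 3 above.

```python
# Inline rendering of the cAPACBurialReg chain, as a citation-template tool
# might return it (reproduced from footnote 3). It belongs to the Citation
# entities and is reusable by any reference to this source.
inline_citation = ("Burial register held at Garrison Church, Risalpur, NWFP, Pakistan "
                   "(1915–1947), photocopy; Asia, Pacific and Africa Collections (APAC), "
                   "The British Library, 96 Euston Road, London")

# The analytical note is held only in this local footnote.
local_note = ("source of photocopy was a typed document so unclear whether original "
              "was typed or whether it was a transcript itself")


def compose_footnote(citation_text: str, *notes: str) -> str:
    """Join an inline citation and any local analytical notes as '; ' layers."""
    return "; ".join((citation_text,) + notes) + "."


footnote_3 = compose_footnote(inline_citation, local_note)
```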

Case 4 – Multiple sources


Cases of reference notes mentioning multiple sources may be relatively rare outside of professional circles but I do have the following instance:

A check in the GRO index of births and deaths only gave one real possibility: Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham, Essex; died 2003 in East Surrey.3

…etc…

3 "England & Wales, Free BMD Index: 1837-1983", database, FreeBMD (http://freebmd.org.uk/cgi/search.pl : accessed 5 Aug 2014), birth entry for Elsie E. Emms; citing West Ham, 1913, Jan [Q1], vol. 4A:642. FreeBMD, death entry for Elsie Evelyn Emms; citing East Surrey, 2003, Jan [Q1], district number 7551B, register number ESB5, entry number 184, date of reg. 0303.

<Narrative><Text>
A check in the GRO index of births and deaths only gave one real possibility: Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham, Essex; died 2003 in East Surrey.
<NoteRef Mode='Footnote'>
<CitationRef Key='cFreeBMDBirth' Mode='RefInline'>
<Param Name='Name'> Elsie E. Emms </Param>
<Param Name='RegDistrict' Key='wWestHam'/>
<Param Name='RegDate'> 1913-01:03 </Param>
<Param Name='Accessed'> 2014-08-05 </Param>
…etc…
</CitationRef>. <CitationRef Key='cFreeBMDDeath' Mode='RefInline'>
<Param Name='Name'> Elsie Evelyn Emms </Param>
<Param Name='RegDistrict' Key='wEastSurrey'/>
<Param Name='RegDate'> 2003-01:03 </Param>
…etc…
</CitationRef>
</NoteRef>
</Text></Narrative>

The reason that these two sources are included in the same reference note is that the conclusion was derived from a correlation of the two, and the details cannot be factored into two independent references.

This case also employs the NoteRef mechanism to generate a footnote containing two inline citations. Note that the registration quarters (e.g. 1913-01:03 for the January quarter of 1913) are provided using the STEMMA date-value string format.
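A product importing index entries might convert a GRO quarter into such a value. The Python sketch below assumes, from the two examples above, that 1913-01:03 denotes the month range January to March 1913 and that quarters are named after their first month; it is not a description of the full STEMMA date-value grammar, and the function name is hypothetical.

```python
# Assumption: GRO index quarters are named after their first month, and a
# value such as "1913-01:03" is read as the month range January to March 1913.
QUARTERS = {"Jan": (1, 3), "Apr": (4, 6), "Jul": (7, 9), "Oct": (10, 12),
            "Q1": (1, 3), "Q2": (4, 6), "Q3": (7, 9), "Q4": (10, 12)}


def quarter_to_date_value(year: int, quarter: str) -> str:
    """Convert e.g. (1913, 'Jan') to the month-range string '1913-01:03'."""
    first, last = QUARTERS[quarter]
    return f"{year:04d}-{first:02d}:{last:02d}"
```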

Attribution 


The terms citation and attribution are often confused and used interchangeably. In principle, a citation references an information source, such as a prior work, whereas attribution gives appropriate credit to individuals. In a journalistic context, though, the act of citing one’s source (e.g. an interview with someone) is called attribution. For the purposes of genealogical and historical research, I usually reserve the term attribution for when someone’s material has been directly included in a report or collection (e.g. an image), which then contrasts with referencing some external source of information consulted during my research. Even then, though, there are grey areas. The point of mentioning attribution here is that the same Citation entity can be used to model attribution, too, and without any confusion or loss of functionality. In other words, the underlying mechanism is sufficiently flexible and general-purpose that the two become syntactically equivalent.




[1] This is a STEMMA-specific article, and so is not directly related to recent FHISO sources & citations discussions, or to Louis Kessler’s recent post, or to Randy Seaver’s recent post.
[2] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore, Maryland: Genealogical Pub. Co., 2009),

Friday, 29 August 2014

The Game of the Name



Yet another subject where there is little or no agreement. Let me try and explain some of the many issues with personal names, and with other types of name, and then present my own approach to handling them.


This is probably one of the areas most likely to trap the unwary, whether through insular attitudes or through limited knowledge of other cultures. We so desperately want to record our names as we know them rather than as we see them that we may fail to consider the bigger picture. Most people reading this will have names consisting of one or more given names (the parts chosen to distinguish members of a family) and a single surname (the inherited part).

English-speaking people sometimes select one of their middle names as their preferred given name, rather than the norm of selecting the first one. However, this is far from unusual in, say, Germany where one of the given names (the Rufname, or “call name”) — which may be the second or third one — is identified as the primary one. Hence, the concept of a first name and middle names is inappropriate for them.

If we’re lucky then we may have honorifics expressing esteem or respect. In English-language names, these may be academic titles (e.g. Dr. or Prof.), honorific prefixes (e.g. the Honourable, or His Holiness), honorific titles (e.g. Sir, Lord, Dame, Lady), or post-nominal letters (e.g. VC, OBE, PhD). These are mostly either prefixes or postfixes[1]. Another type of postfix is a generational title (e.g. Jr., Sr., I, II, III, etc.), although the Irish equivalent is actually an infix rather than a prefix or postfix (e.g. Óg in Seán Óg Ó Súilleabháin).

Spanish-speaking people often have two or more surnames, but even English-speaking people may have double-barrelled or hyphenated surnames. In German, a family may have a second surname, preceded by the word vulgo (meaning “so-called” or “also known as”), in order to show their association with a farm or other property. Such a vulgo name may change, therefore, when that family moves.

While there is a lot of variation so far, it’s still possible to describe distinct cultural patterns. Every so often, someone suggests having the flexibility to store the precisely categorised parts of their (usually Western, English-speaking) names, and of “foreign names”, together in their software. If they have some knowledge of software development then they may be suggesting that Object Orientation (OO) can help to treat those different patterns in a consistent way. However, let’s look at some more variations.

Traditional Chinese names can use something called a Generational name to identify members of a particular generation, including siblings, cousins, etc. There is no Western equivalent of this custom.

Many cultures employ ‘name particles’, analogous to grammatical particles, to separate the various parts of their names. For instance: “von”, “van”, “der”, “de [la]”, “d’”, “the”, “[son] of”, “mc”, “mac”, “Ó”, “Ní”, “Nic”, “Mhic”, “Bean”, “Ui”, “y”, etc. These may occur almost anywhere, and their behaviour under case conversion and sorting is culturally dependent.

Then there’s the important case that all genealogists should be aware of: the patronym and matronym[2]. These are surnames based on the given name of a male or female ancestor, respectively. For instance: son of William (now Williamson, or Wilson), van Dijk, Nic Dhòmhnaill, Nikolayevich.

The OO advocates would suggest ‘no problem’, but what is the practicality and the ultimate goal of categorising every single token in a personal name, and then rigidly representing that classification in digital storage?

Personal names, as described here, haven’t always existed. At one time, people would have been given an epithet based on their occupation (e.g. William the thatcher), their origin (or topoanthroponym, e.g. Robin of Loxley), or some other attribute (e.g. Little John). Even now, we may encounter epithetic titles such as Earl of Huntingdon, or Henry VIII. This is where many software schemes start to break down, and these cases are usually given scant consideration on the basis that few researchers can trace their lineage back that far, or can reliably identify titled ancestors.

Indeed, an ancestor’s identification may have been just a single-word mononym, as opposed to a multi-word polynym, so how can you categorise that? I have pointed out previously that Native Americans typically have unstructured names, and that they may have different names assigned at different phases of their lives. My point here is that the particularly diverse cultural origins within the US are not simply the product of latter-day immigration, and that they will eventually affect many researchers.

What I’ve briefly described here are structural differences in personal names. These may vary from the name structures that we take for granted in the West, through other structures that we’re less familiar with, to having no discernible structure at all. Little wonder that the design of STEMMA® makes a case for handling names as simple, uncategorised sequences of tokens (multiple names being just alternative sequences), but more on that later.

In a previous post, One Name to Rule Them All, I explained the many types and forms of name by which a person may be identified in practice, and how that is a different set from their preferred identifications. It also explained the relationship to evidential forms (with their possible misspellings, transcription errors, and informality) and to the labels that we, as researchers, want to identify them by in our reports or charts.

In contrast to the purely structural differences, there are a number of considerations that might be described as processing differences, including the following:

  • How they’re sorted.
  • Behaviour under case-conversion.
  • Behaviour under capitalisation.
  • How they’re compared.
  • Handling of initials.
  • Handling inheritance.

In the West, we commonly replace middle names with their initials, and sometimes our forenames too, but this is not a universal option. Initials are not applicable to logogram-based (or ideogram-based) languages. Also, whilst we accept their usage in some modern Latin-based languages, it would be a gross generalisation to assume that all Latin-based languages, or indeed all alphabet-based languages, modern and ancient, use this custom in personal names. Even people of other cultures who have adopted Romanised versions of their native names may not use initials.

Another issue with initials involves the case where we know an initial but not what it stands for. As with any abbreviation, an initial must be represented with a trailing period in order to prevent ambiguity, even in the single-letter case, because a name may legitimately contain a single-letter token that is not an initial, such as the Irish Ó (or O-fada, meaning ‘descendant of’) and the Spanish y (meaning ‘and’).

If we were searching for Frederick and some text contained frederick or FREDERICK then we would still expect a match; this is called case-blind matching. If the text contained Frédérick then we may still expect a match; this is called accent-blind matching. These are quite common ways of performing a textual match in software, but less well known is that Unicode makes specific recommendations about which composed and decomposed forms should be equivalent: http://www.unicode.org/reports/tr15/. A composed form involves one Unicode character and a decomposed form involves two or more Unicode characters. For instance, the Angstrom sign (U+212B, Å) should match the combination of Latin capital A (U+0041, A) plus combining ring above (U+030A), as well as the precomposed Latin capital A with ring above (U+00C5, Å). This is normally achieved by normalising each piece of text to a lowest common denominator (e.g. lower-cased, with no diacritical marks, and in decomposed form) and comparing the results using a standard match.
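A minimal Python sketch of that normalise-then-compare approach, using the standard unicodedata module: NFD decomposition, removal of combining marks for accent-blind matching, and casefold() for case-blind matching. The function names are hypothetical, and stripping every combining mark is a blunt rule that some locales would reject, so treat it as an illustration rather than a recommendation.

```python
import unicodedata


def match_key(text: str) -> str:
    """Reduce text to a lowest common denominator for case- and accent-blind matching."""
    # NFD: full canonical decomposition, e.g. "é" -> "e" + U+0301, and the
    # Angstrom sign U+212B -> U+00C5 -> "A" + U+030A.
    decomposed = unicodedata.normalize("NFD", text)
    # Accent-blind: drop combining marks such as U+0301 and U+030A.
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Case-blind: casefold() is a more aggressive lower() (e.g. "ß" -> "ss").
    return bare.casefold()


def names_match(a: str, b: str) -> bool:
    return match_key(a) == match_key(b)
```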

Now if we’re sorting a mixture of text from different locales then we have another problem that tends to get ignored by software people: culturally preferred sort orders. Although there is an international sort order, it is basically just a convenience for software people as it relies on the numeric character codes. However, different cultures want to sort their characters in slightly different ways. This issue was encountered by the SQL standard when Unicode text columns were introduced since it made its column-specific “collation sequences” all but useless. In effect, sort orders should be selected by the application, dependent upon the current end-user, and not implied by the data itself.
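To show why the sort order belongs to the user rather than to the data, here is a hypothetical Python sketch in which each tailoring is just a table of letter groups sharing one sort rank. Swedish does place å, ä, and ö after z; the second table, which folds them into a and o, is an invented contrast. Production code would use full collation data, such as the Unicode Collation Algorithm with locale tailorings, rather than tables like these.

```python
from typing import Callable, List

# Hypothetical, simplified tailorings: each entry is a group of letters that
# share one primary sort rank.
LATIN = list("abcdefghijklmnopqrstuvwxyz")
SWEDISH_LIKE = LATIN + ["å", "ä", "ö"]                        # extra letters after z
ACCENT_FOLDING = ["aåä"] + LATIN[1:14] + ["oö"] + LATIN[15:]  # å, ä as a; ö as o


def make_sort_key(groups: List[str]) -> Callable[[str], List[int]]:
    rank = {ch: i for i, group in enumerate(groups) for ch in group}

    def key(name: str) -> List[int]:
        # Characters outside the table sort after it, by code point.
        # Assumes precomposed (NFC) input, e.g. "Å" as one character.
        return [rank.get(ch, len(groups) + ord(ch)) for ch in name.casefold()]

    return key


names = ["Zetterberg", "Öberg", "Andersson", "Åberg"]
swedish_order = sorted(names, key=make_sort_key(SWEDISH_LIKE))
folded_order = sorted(names, key=make_sort_key(ACCENT_FOLDING))
```

The same four names sort differently depending only on which table the current user's application selects.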

Sorting and collation are troublesome in many ways. For instance, some cultures sort on their given names rather than their surname, and the position of those parts is similarly dependent upon culture. When a name includes multiple surnames, as in the Spanish-speaking world, then the sorting may attach priority to either of them depending on the person’s location. Also, any name particles may be considered significant (i.e. involved in the sort) or ignored during the sorting. Finally, the ideographic characters in Japanese names can be pronounced in different ways, and if the sorting is to reflect the way that the name is spoken then additional information is usually required to assist the sorting. In summary, there are two pieces of information required for correct sorting: the sorted representation (e.g. ‘surname, given-names’ in English) and a possible overriding “sort as” instruction when one-or-more tokens do not sort according to simple text rules.
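The two pieces of information might be combined as follows. In this hypothetical Python sketch, a ListingName holds the sorted representation and an optional “sort as” override, here used to ignore a name particle, which some conventions do and others do not. The names are invented examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListingName:
    listing: str                    # sorted representation, e.g. "surname, given-names"
    sort_as: Optional[str] = None   # overriding "sort as" instruction, if any

    def sort_key(self) -> str:
        # Simple text rules apply unless an override says otherwise.
        return (self.sort_as or self.listing).casefold()


names = [ListingName("Evans, Tom"),
         ListingName("van Dijk, Jan", sort_as="Dijk, Jan van"),
         ListingName("Davies, Ann")]
listing = [n.listing for n in sorted(names, key=ListingName.sort_key)]
```

Without the override, “van Dijk, Jan” would sort after “Evans, Tom”.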

Case conversion is not something I recommend, despite it being commonplace, since the specific choice of character case may be important in a given language (e.g. Irish), or there may be no duality for a given character (e.g. the German eszett, ß). Even capitalisation (normally considered to be the uppercasing of the initial letter, as with English proper nouns) is problematic. Sometimes it may apply to the first two letters (e.g. O’Connor), or to the second letter (e.g. the Irish hUiginn), or to something more exotic such as deShannon, deSouza, or diCaprio (all of which may incur an unwanted initial capitalisation). See Letter Case and Capitalisation, respectively.
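The damage done by naive capitalisation and case conversion is easy to demonstrate in Python, using the built-in str.capitalize() as a stand-in for the usual “upper-case the first letter” rule. The helper name is hypothetical.

```python
EXAMPLES = ["hUiginn", "deSouza", "diCaprio", "O'Connor"]


def naive_capitalise(name: str) -> str:
    """What many programs do: upper-case the first letter, lower-case the rest."""
    return name.capitalize()


# Every example is altered by the naive rule.
damaged = {name: naive_capitalise(name) for name in EXAMPLES
           if naive_capitalise(name) != name}

# Case conversion is not reversible either: "ß" upper-cases to "SS",
# which lower-cases to "ss", so the original character is lost.
round_trip = "Strauß".upper().lower()
```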

Lastly, there is name inheritance. In cultures where there is an inherited part of a personal name — which isn’t true of all of them — then it may be via the father’s line (patrilineal), or the mother’s line (matrilineal), or both in the Spanish-speaking world. The inherited part may be a surname or a given name (as in patronyms) but in Russia it is common to have both a surname and a patronym. Even in cultures where we think we recognise a simple case of a name being inherited from the father, the way in which that name is represented may depend on the sex of the child. In other words, we can never assume that it is simply tacked on. In marriage, it may be normal in some cultures for the woman to not take the man’s family name, but this has also become a life-style choice in many Western cases. The man may take the woman’s name, or they may both take a hybrid name. The essential fact here is that there are no rules. There are just conventions, and these will depend on the culture or social group involved.

As well as wanting to adopt a portable approach to personal names, and so avoid trying to taxonomise the non-taxonomical, I also wanted STEMMA to adopt the same approach for both place names and group names. This isn’t as wild as it first sounds. If you abandon any formalised structural differences, then you find that all of the processing differences except inheritance (discussed above) also apply to all three. I take it as obvious that all of these entity types also share the common requirement of supporting alternative names, possibly in different languages, and of linking name changes to specific dates or events.

In order to describe the STEMMA approach — which is still evolving[3] — I want to avoid simply showing code and use a schematic representation instead. Personal names are represented by a series of time-dependent descriptions for each distinct name, as follows:


The optional From and To fields may be dates or Events at which the name came into use or was (officially) no longer used, and the Name Type field may be something like “Maiden” or “Adopted”. More important are the Canonical Names section, which contains the preferred renderings of this name, and the Match Sequences section, which may contain additional matching instructions.

Note that this same structure is also used for the names of places and of groups. The only difference is the vocabulary used for the Name Type field.

The mode of usage for the first three canonical names is fairly obvious. The Listing mode is used for ordered listings of names, and may be supplemented by a separate “sort as” instruction for the problem cases mentioned above. The match-sequences may specify very simple parsing instructions for accepting name variants beyond the canonical ones, using the following notation:

Nameⁱ          - simple name token, e.g. Tony.
{name, ...}ⁱ   - mandatory selection from alternative tokens.
[name, ...]ⁱ   - optional selection from alternative tokens.

The optional ‘i’ superscript indicates that initials are appropriate for the respective tokens.
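To make the notation concrete, here is a hypothetical Python sketch that expands a match-sequence into the variant strings it would accept. Each Slot is a mandatory ({...}) or optional ([...]) selection, with an initials flag for the ‘i’ superscript; generating an initial as the first letter plus a period is a simplifying assumption, and STEMMA itself builds a parse tree rather than enumerating strings.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, List


@dataclass
class Slot:
    alternatives: List[str]    # the tokens inside {...} or [...]
    optional: bool = False     # True for [...], False for {...}
    initials: bool = False     # the 'i' superscript

    def choices(self) -> List[str]:
        out = []
        for alt in self.alternatives:
            out.append(alt)
            if self.initials:
                out.append(alt[0] + ".")   # assumption: initial = first letter + period
        if self.optional:
            out.append("")                 # the slot may be omitted entirely
        return list(dict.fromkeys(out))    # drop duplicates, keep order


def expand(slots: List[Slot]) -> Iterator[str]:
    """Yield every name variant that the match-sequence accepts."""
    for combo in product(*(slot.choices() for slot in slots)):
        yield " ".join(token for token in combo if token)


sequence = [Slot(["Anthony", "Tony"]),
            Slot(["John"], optional=True, initials=True),
            Slot(["Smith"])]
```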

Let’s look at a trivial example:


Now STEMMA’s name handling has been accused of being cumbersome and verbose, but let me explain its layered approach. At run-time, when the data is loaded, the name information is used to create a simple parse tree using the normalised (see above) tokens. Developer Note: It turns out that this can be stored economically by using token indices into an “atom table”, and a local table (for the current person) is just as effective as a global table (for the whole tree). Despite the commonality of surnames, etc., the shorter local indices take up considerably less space and may be packed more densely without data-alignment issues. The match-sequences section feeds the generation of the parse tree, but note that it is a simple representative form and so not significant in terms of repetitions or parse efficiency. The canonical names are also part of this feed, in conjunction with the match-sequences, and so if we know the relevant personal name style (see below) then the match-sequences are only required to express cases beyond the canonical ones. The above example can be simplified, therefore, by omitting all the explicit match-sequences.
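The developer note about local atom tables might be sketched in Python like this. LocalAtomTable is a hypothetical name; the sketch interns each person's normalised tokens into a per-person table so that a name becomes a sequence of one-byte indices, which suffices while a person has no more than 256 distinct tokens.

```python
from array import array
from typing import Dict, Iterable, List


class LocalAtomTable:
    """Per-person table mapping normalised name tokens to small indices."""

    def __init__(self) -> None:
        self.atoms: List[str] = []
        self.index: Dict[str, int] = {}

    def intern(self, token: str) -> int:
        if token not in self.index:
            self.index[token] = len(self.atoms)
            self.atoms.append(token)
        return self.index[token]

    def encode(self, tokens: Iterable[str]) -> array:
        # Typecode "B" stores one unsigned byte per token; it raises
        # OverflowError if a person ever needs more than 256 distinct tokens.
        return array("B", [self.intern(t) for t in tokens])

    def decode(self, codes: Iterable[int]) -> List[str]:
        return [self.atoms[c] for c in codes]


table = LocalAtomTable()
formal = table.encode(["elsie", "evelyn", "emms"])
with_initial = table.encode(["elsie", "e.", "emms"])
```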

Furthermore, each of the main subject entities (Person, Place, and Group) has a shorter mechanism for specifying a Semi-Formal canonical name in the very simplest of cases: PersonalName, PlaceName, and GroupName, respectively.

In other words, the STEMMA approach has been designed bottom-up; starting with what is required for the in-memory parse tree, and then working up to a simplified and practical representation within the data files. The intermediate representations are not always required but their availability gives the flexibility and power of expression when it is needed.

I have mentioned a name style in this article, but I am still looking for an acceptable vocabulary. The style of name (and hence the rules for sorting, initials, inheritance, etc.) obviously depends on the relevant culture or social group, but these must include historical ones as well as modern-day ones. The computer locale is inadequate for this, as is a simple language identifier (ISO 639) or country identifier (ISO 3166). My own name style, for instance, is very common but terms such as English-speaking, Anglo-Saxon, or Anglo-American do not adequately describe the group using this style, or the actual conventions associated with the style.

GEDCOM also handled names as unstructured lists of tokens, albeit with the family name enclosed between slashes. It supported multiple names per person, and even a NAME_TYPE record to categorise them, e.g. as maiden, married, or immigrant. V5.5 introduced an optional PERSONAL_NAME_PIECES description to allow the individual name tokens to be typed, e.g. as given name, surname, etc. However, V5.5.1, the last official specification, contained a warning that this wasn’t portable. The STEMMA approach is considerably more powerful than either of the GEDCOM schemes, but retains a certain level of compatibility with GEDCOM’s original scheme. I hope that my research has indicated why that general direction is the more portable one, both between different name styles and between the names of different entity types.
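For comparison, here is a minimal Python sketch of reading a GEDCOM NAME value, which is an unstructured token list apart from the slashes around the surname. The function name is hypothetical, and the sketch is a simplification of the GEDCOM grammar, offered only to show how close that original scheme is to a plain token sequence.

```python
import re
from typing import List, Tuple


def parse_gedcom_name(value: str) -> List[Tuple[str, bool]]:
    """Split a NAME value such as 'Elsie Evelyn /Emms/' into (token, is_surname)."""
    tokens = []
    # The capturing group keeps the /surname/ segments in the split result.
    for part in re.split(r"(/[^/]*/)", value):
        if len(part) >= 2 and part.startswith("/") and part.endswith("/"):
            tokens += [(t, True) for t in part[1:-1].split()]
        else:
            tokens += [(t, False) for t in part.split()]
    return tokens
```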



[1] Although suffix and postfix are usually treated as synonyms — both as nouns and as verbs — I prefer to use the less-common postfix as it was directly modelled on prefix and so more accurately expresses the opposite condition.
[2] The Oxford English Dictionary (and many others) declares patronym to be a noun, as expected, but patronymic to be both a noun and an adjective. Interestingly, it presents a similar dual usage for matronymic but doesn’t list matronym, despite it being in common use and listed in other dictionaries. I do not know the etymology of this but using the –ic derivation as a noun really grates on the ear. I make no apologies, therefore, for reserving the –ic forms as adjectives, and thus being consistent with words such as: acronymic, toponymic, antonymic, eponymic, metonymic, and homonymic.
[3] This presentation involves changes that will be published in STEMMA V2.3.