GeneaBloggers

Tuesday, 9 September 2014

Cite Seeing

It’s about time that I presented my STEMMA® approach to sources and citations[1]. Although the initial design approach wasn’t unusual, it has since evolved by trying to match all the real, hand-generated citations in my own narrative reports, and without having to restrict things to some “standard” list of source-types, or some formatted samples published on paper or online.


The concept of a citation depends somewhat on the context. Some view it as the abstract act of citing a source of information or some scholarly work — ignoring contexts such as military awards and traffic citations. In STEMMA, a Citation entity (capitalisation deliberate for clarity) is a generalised representation of information location, sources, and repositories. For most genealogists, though, the term has come to mean the formatted reference notes appearing in a footnote or endnote; even more so than the source-list and source-label variants.

A citation has a number of purposes: intellectual honesty (not claiming prior work as your own), to allow your sources to be independently assessed by the reader, and to allow the strength of your information sources to be assessed. In order that a citation can be understood by other readers, there are conventions for the ordering, formatting, and separation of the elements that depend upon the type of source being cited. Probably the best known resource for genealogists crafting citations is Evidence Explained[2] (hereinafter EE).

Despite any overlap, we should not confuse the concept of a footnote/endnote with that of a reference-note citation. That is a general mechanism that may also be used for annotation (e.g. clarifying a word or phrase) or discursive notes (commentary which digresses from the main subject). There are cases for all of these in a narrative report and so STEMMA had to accommodate each of them.

It’s reasonable to ask why computer storage needs to somehow encode a citation. Why not simply retain the carefully-crafted formatted version? Well, that version effectively sets in concrete things such as the layout of the terms (someone may want a different ordering, say for ISO 690 compliance), the punctuation characters (e.g. see International Variations in Quotation Marks), the general style (CMOS, EE, others), and the locale. The last one of these covers a number of subtle aspects that should differ for users in different locales. The formatting of a date might be an obvious example, but whether you put punctuation characters inside or outside quotation marks is a less-spoken-of one. Since computer software cannot reliably decompose a formatted citation then it also means that it cannot indicate which piece is a title, which is an author, which is a date of publication, etc. This is semantic information that would need to be attached to the relevant parts if anything other than a human was to make use of it.

There are several design schemes that suggest breaking apart citations into a number of separate citation-elements (e.g. an author’s name), and relying on a separate citation-template system to regenerate a formatted edition appropriate to a given reader. The main differences between them might be summarised as follows:

  • Whether there’s a fixed, master list of source-types.
  • How the source-types are named or catalogued.
  • Whether the citation-element names are limited to convey the semantics.

STEMMA also uses citation-elements but with some important differences. Each source-type is identified by a Uniform Resource Identifier (URI). A URI generally looks like a URL but it may be defined freely if you own the root domain name. Digital Freedom explained how their visible semantics, decentralised creation, hierarchical derivatives, and versioning make them a cornerstone for extensible systems like STEMMA. The result is that you can define as many custom source-types as either your research or your locale require.

The citation-elements are defined as part of each source-type. That means their names and properties (e.g. their data-type, whether they’re optional, and whether they’re multi-valued) can be chosen independently for each source-type. Any semantic information can be attached to the individual citation-elements as necessary. For instance:

<Dataset Name=’Example’ xmlns:DC=’ http://purl.org/dc/terms/’>

<Citation Key=’cBook’ Abstract=’1’>
    <Title> Generic Citation for published books </Title>
    <URI> http://stemma.parallaxview.co/source-type/book/ <URI>
    <Params>
        <Param Name=’Author’ SemType=’DC:creator’/>
        <Param Name=’Title’ SemType=’DC:title’/>
        <Param Name=’Publisher’ SemType=’DC:publisher’/>
        <Param Name=’Date’ Type=’Date’ SemType=’DC:date’/>
        <Param Name=’Page’  Optional=’1’/>
    </Params>
</Citation>

This STEMMA Citation entity can then be used to describe any number of simple book references. This example employs the ‘Dublin Core’ semantic tags, including their tentative refinements, but STEMMA can select other systems by using a different namespace (as indicated here by the “DC:” prefix). Such a custom entity is guaranteed not to clash with any others, and citations that use it are transportable. What is required in a receiving product is a citation-template that can format it appropriately, and — if you wanted to generate new instances of it — the verbiage associated with the source-type and its citation-element names for your locale.


This diagram illustrates how the main components of this scheme operate together. The source-type URI is used to fetch the definition of the source-type, either through a discovery service (on the Internet) or from a local repository. That definition will also include the verbiage appropriate to one-or-more locales.

User input for a source reference is solicited using that locale-specific verbiage and acknowledging the citation-element data-types and other properties in the process.

When generating a formatted citation — say for a report — the software product must interface to some citation-template tool which has a relevant template for that source-type. Developer note: STEMMA currently passes objects to a primitive tool, which then calls back on well-defined methods to obtain the specific details required by the template, e.g. a contact’s formal/informal name, a contact’s address, or a formatted place-hierarchy. This is more flexible than passing fixed items of text.

A nice feature of this scheme is that there is a lot of freedom, and it’s not expecting some standards body to define the many hundreds of samples that are published in EE. It works equally well for different preferences and different locales since it is merely a mechanism, not a standard list. Software developers sometimes think too much in terms of a formulaic approach to citations (‘you plug these values into a template and out pops your formatted reference’) whereas real-life citations need much more freedom. Those same developers may also view EE as just a list of prescribed citation forms for all conceivable sources rather than a comprehensive work on analysing evidence and crafting whatever citations we find necessary. As Elizabeth Shown Mills says herself: citations are an art rather than a science.

I now want to describe some basic STEMMA mechanisms for attaching information to a body of text, and then illustrate how they would be used in combination to replicate my hand-crafted editions. I won’t suggest that my own citations are good examples for anyone to follow, but I do strive to make them functional and relevant. That means that they sometimes get quite complicated, involving separate layers, analytical notes, and occasionally more than one source reference in the same reference note.

Case 1 – Simple reference-note citation

The following shows a simple sentence that references a certificate for a ‘death overseas’ in the UK. The associated citation is generated in a footnote

The certificate came through and confirmed the location of her death as Park Hotel, Ingenbohl, Canton [Kanton] Schwyz, Switzerland.16

…etc…

16 England, death certificate for Mary Phyllis Ashbee, died 13 Jun 1984; citing location Switzerland; Death Abroad (1966 to 1994), General Register Office (GRO), Southport.

<Narrative><Text>
The certificate came through and confirmed the location of her death as Park Hotel, Ingenbohl, <Alt Value=’Kanton’>Canton</Alt> Schwyz, Switzerland.<CitationRef Key=’cDeathsOverseasUK’>
    <Param Name=’Name’> Mary Phyllis Ashbee </Param>
    <Param Name=’Date’> 1984-06-13 </Param>
    <Param Name=’Country’ Key=’wSwitzerland’/>
</CitationRef>
</Text></Narrative>

The CitationRef could have specified an explicit Mode=’RefFootnote’ but that’s the default and so is unnecessary. Note that it is a layered citation indicating where the originals are held. This is achieved through the Citation entity (cDeathsOverseasUK) linking to another one (cDeathsAbroadGRO) using a ParentCitationLnk; thus creating a citation chain.

The example also uses a second mechanism to provide annotation on the text; in this case, to provide the alternative German spelling for the cantons of Switzerland. Notice that this annotation is correctly placed in editorial brackets when the final form is non-interactive, such as on a printed page.

Case 2 – Discursive notes

This example uses a different mechanism to create a footnote that simply contains discursive notes. There is no source reference in this case.

This confirmed the death occurred at the British Military Hospital, Peshawar, and the cause of death as ‘Cerebral Haemorrhage, result of motor accident’. It also gave his rank as Lance Corporal in the 14th/20th [King’s] Hussars4, and his service number as 551091.

…etc…

4 British cavalry regiment created through the merger of the 14th King's Hussars and the 20th Hussars in 1922. The honorific "King's" was added back into the title in 1936.

<Narrative><Text>
This confirmed the death occurred at the British Military Hospital, Peshawar, and the cause of death as ‘Cerebral Haemorrhage, result of motor accident’. It also gave his rank as Lance Corporal in the
<NoteRef Mode=’Footnote’>14th/20th [King’s] Hussars
<Narrative><Text>
British cavalry regiment created through the merger of the 14th King's Hussars and the 20th Hussars in 1922. The honorific "King's" was added back into the title in 1936.
</Text></Narrative>
</NoteRef>, and his service number as 551091.
</Text></Narrative>

This NoteRef element creates a footnote and inserts a footnote indicator into the main text. There are other options, though, such as Mode=’Inline’ which would place the text in editorial brackets at that location.

In this example, the relevant text was placed inside the NoteRef, but it could equally have been placed in a Text element elsewhere, and the FromText element (new in V4.0) used to include it.

Case 3 – Analytical notes

This case attaches a simple analytical note to a citation in the form of another layer (i.e. separated by a semicolon). That extra layer is achieved by appending the note in a local footnote rather than in the Citation entity itself; thus clearly separating personal opinion from the details of the citation.

Near the end of the burial register3 were the entries for all three of the soldiers who died in that road accident:

…etc…

3 Burial register held at Garrison Church, Risalpur, NWFP, Pakistan (1915–1947), photocopy; Asia, Pacific and Africa Collections (APAC), The British Library, 96 Euston Road, London; source of photocopy was a typed document so unclear whether original was typed or whether it was a transcript itself.

<Narrative><Text>
Near the end of the burial register
<NoteRef  Mode=’Footnote’>
<Narrative><Text>
<CitationRef Key=’cAPACBurialReg’ Mode=’RefInline’>
    <Param Name=’Church’ Key=’wGarrisonChurch’/>
    <Param Name=’From’> 1915 </Param>
    <Param Name=’To’> 1947 </Param>
    <Param Name=’Media’> photocopy </Param>
</CitationRef>; source of photocopy was a typed document so unclear whether the original was typed or whether it was a transcript itself.
</Text></Narrative>
</NoteRef> were the entries for all three of the soldiers who died in that road accident:
</Text></Narrative>

This may take a couple of glances to see what is happening. The outermost NoteRef is generating a footnote, but inside the footnote is a citation generated inline and followed by a layer representing the analytical note. In this instance, the cAPACBurialReg also points to a parent entity representing APAC in a chain.

Case 4 – Multiple sources

Cases of reference notes mentioning multiple sources may be relatively rare outside of professional circles but I do have the following instance:

A check in the GRO index of births and deaths only gave one real possibility: Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham, Essex; died 2003 in East Surrey.3

…etc…

3 Transcribed GRO Index for England and Wales (1837–1983), database, FreeBMD (http://freebmd.org.uk/cgi/search.pl : accessed 5 Aug 2014), birth entry for Elsie E. Emms; citing West Ham, 1913, Mar [Q1], vol. 4A:642. "England & Wales deaths 1837-2007", database, FindMyPast (www.findmypast.org.uk : accessed 5 Aug 2014), entry for Elsie Evelyn Emms; citing East Surrey, 2003, Mar [Q1], district number 7551B, register number ESB5, entry number 184, date of reg. 0303.

<Narrative><Text>
A check in the GRO index of births and deaths only gave one real possibility: Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham, Essex; died 2003 in East Surrey.
<NoteRef  Mode=’Footnote’>
<CitationRef Key=’cFreeBMDBirth’ Mode=’RefInline’>
    <Param Name=Name’> Elsie E. Emms </Param>
    <Param Name=’RegDistrict’ Key=’wWestHam’/>
    <Param Name=’RegDate’> 1913-01:03 </Param>
    <Param Name=’Accessed’> 2014-08-05 </Param>
…etc…
</CitationRef>. <CitationRef Key=’cFindMyPastDeath’ Mode=’RefInline’>
    <Param Name=Name’> Elsie Evelyn Emms </Param>
    <Param Name=’RegDistrict’ Key=’wEastSurrey’/>
    <Param Name=’RegDate’> 2003-01:03 </Param>
…etc…
</CitationRef>
</NoteRef>
</Text></Narrative>

The reason that these two sources are included in the same reference note is that the conclusion was derived from a correlation of the two, and the details cannot be factored into two independent references.

This case also employs the NoteRef mechanism to generate a footnote containing two inline citations. Note that the dates of registration (e.g. 1913-Q1) are provided using the STEMMA date-value string format.

Attribution

The terms citation and attribution are often confused and used interchangeably. In principle, a citation references an information source, such as a prior work, whereas attribution gives appropriate credit to individuals. In a journalistic context, though, the act of citing ones source (e.g. an interview with someone) is called attribution. For the purposes of genealogical and historical research, I usually reserve the term attribution for when someone’s material has been directly included in a report or collection (e.g. an image), which then contrasts with referencing some external source of information consulted during my research. Even then, though, there are grey areas. The point of mentioning attribution here is that the same Citation entity can be used to model attribution, too, and without any confusion or loss of functionality. In other words, the underlying mechanism is sufficiently flexible and general-purpose that the two become syntactically equivalent.

** Post updated on 22 Nov 2015 to align with the changes in STEMMA V4.0 **




[1] This is a STEMMA-specific article, and so is not directly related to recent FHISO sources & citations discussions, or to Louis Kessler’s recent post, or to Randy Seaver’s recent post.
[2] Elizabeth Shown Mills, Evidence Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore, Maryland: Genealogical Pub. Co., 2009),