It’s about time that I presented my STEMMA® approach to
sources and citations [1].
Although the initial design approach wasn’t unusual, it has since evolved by
trying to match all the real, hand-generated citations in my own narrative reports,
and without having to restrict things to some “standard” list of source-types,
or some formatted samples published on paper or online.
The concept of a citation
depends somewhat on the context. Some view it as the abstract act of citing a
source of information or some scholarly work — ignoring contexts such as
military awards and traffic citations. In STEMMA, a Citation entity
(capitalisation deliberate for clarity) is a generalised representation of
information location, sources, and repositories. For most genealogists, though,
the term has come to mean the formatted reference notes appearing in a footnote
or endnote; even more so than the source-list and source-label variants.
A citation serves several purposes: intellectual honesty
(not claiming prior work as your own), allowing the reader to assess your
sources independently, and indicating the strength of the information sources
behind your claims. So that a citation can be understood by other readers,
there are conventions for the ordering, formatting, and separation of its
elements, and these depend upon the type of source being cited.
Probably the best-known resource for genealogists crafting citations is
Evidence Explained [2] (hereinafter EE).
Despite any overlap, we should not confuse the concept of a
footnote/endnote with that of a reference-note citation. A footnote or endnote
is a general mechanism that may also be used for annotation (e.g. clarifying a
word or phrase) or discursive notes (commentary which digresses from the main
subject).
There are cases for all of these in a narrative report and so STEMMA had to
accommodate each of them.
It’s reasonable to ask why computer storage needs to somehow
encode a citation. Why not simply retain the carefully-crafted formatted version?
Well, that version effectively sets in concrete things such as the layout of
the terms (someone may want a different ordering, say for ISO 690 compliance),
the punctuation characters (e.g. see International Variations in Quotation
Marks), the general style (CMOS, EE, others), and
the locale. The last one of these covers a number of subtle aspects that should
differ for users in different locales. The formatting of a date might be an
obvious example, but whether you put punctuation characters inside or outside
quotation marks is a less-spoken-of one. Computer software cannot reliably
decompose a formatted citation, and so it cannot indicate which piece is a
title, which is an author, which is a date of publication, etc. This is
semantic information that would need to be attached to the relevant parts if
anything other than a human were to make use of it.
There are several design schemes that suggest breaking apart
citations into a number of separate citation-elements (e.g. an author’s name),
and relying on a separate citation-template system to regenerate a formatted
edition appropriate to a given reader. The main differences between them might
be summarised as follows:
- Whether there’s a fixed, master list of source-types.
- How the source-types are named or catalogued.
- Whether the citation-element names are restricted to a fixed vocabulary so
that the names themselves convey the semantics.
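The general idea of separating citation-elements from citation-templates can be sketched in a few lines of Python. This is only an illustration of the principle and not how any of those schemes, or STEMMA, is actually implemented: the style names and template strings are hypothetical, and neither claims to reproduce a real style guide. The element values are taken from footnote [2].

```python
from string import Template

# Citation-elements held separately from any formatting.
elements = {
    "Author": "Elizabeth Shown Mills",
    "Title": "Evidence Explained",
    "Publisher": "Genealogical Pub. Co.",
    "Date": "2009",
}

# Two hypothetical citation-templates (assumptions, not real style rules).
templates = {
    "author-first": Template("$Author, $Title ($Publisher, $Date)"),
    "title-first": Template("$Title. $Author. $Publisher, $Date."),
}


def format_citation(style, values):
    """Regenerate a formatted citation from the stored elements."""
    return templates[style].substitute(values)


print(format_citation("author-first", elements))
print(format_citation("title-first", elements))
```

The same stored elements produce either ordering, which is the point of not storing only the formatted text.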
STEMMA also uses citation-elements but with some important
differences. Each source-type is identified by a Uniform Resource Identifier
(URI). A URI generally looks like a URL but it may be defined freely if you own
the root domain name.
Digital Freedom explained how their visible semantics, decentralised creation,
hierarchical derivatives, and versioning make them a cornerstone for extensible
systems like STEMMA. The result is that you can define as many custom
source-types as either your research or your locale require.
The citation-elements are defined as part of each
source-type. That means their names and properties (e.g. their data-type, whether
they’re optional, and whether they’re multi-valued) can be chosen independently
for each source-type. Any semantic information can be attached to the individual
citation-elements as necessary. For instance:
<Dataset Name='Example' xmlns:DC='http://purl.org/dc/terms/'>
  <Citation Key='cBook' Abstract='1'>
    <Title> Generic Citation for published books </Title>
    <URI> http://stemma.parallaxview.co/source-type/book/ </URI>
    <Params>
      <Param Name='Author' SemType='DC:creator'/>
      <Param Name='Title' SemType='DC:title'/>
      <Param Name='Publisher' SemType='DC:publisher'/>
      <Param Name='Date' Type='Date' SemType='DC:date'/>
      <Param Name='Page' Optional='1'/>
    </Params>
  </Citation>
</Dataset>
This STEMMA Citation entity can then be used to describe any number of simple
book references. This
example employs the ‘Dublin Core’ semantic tags, including their tentative
refinements, but STEMMA can select other systems by using a different namespace
(as indicated here by the “DC:” prefix). Such a custom entity is guaranteed not
to clash with any others, and citations that use it are transportable. What is
required in a receiving product is a citation-template that can format it
appropriately, and — if you wanted to generate new instances of it — the
verbiage associated with the source-type and its citation-element names for
your locale.
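To show how a receiving product might make use of such a definition, here is a minimal Python sketch that reads the Params of the book example and checks a set of supplied values against them. It is an illustration under assumptions, not STEMMA's own processing: the function names are hypothetical, and treating a missing Type attribute as plain text is my simplification for the sketch.

```python
import xml.etree.ElementTree as ET

# The Params from the book example, written with straight quotes.
PARAMS_XML = """
<Params>
  <Param Name='Author' SemType='DC:creator'/>
  <Param Name='Title' SemType='DC:title'/>
  <Param Name='Publisher' SemType='DC:publisher'/>
  <Param Name='Date' Type='Date' SemType='DC:date'/>
  <Param Name='Page' Optional='1'/>
</Params>
"""


def load_params(xml_text):
    """Return {name: properties} for each Param in the definition."""
    root = ET.fromstring(xml_text)
    return {
        p.get("Name"): {
            "type": p.get("Type", "Text"),  # default is an assumption
            "optional": p.get("Optional") == "1",
            "semtype": p.get("SemType"),
        }
        for p in root.findall("Param")
    }


def check_values(params, values):
    """List missing mandatory elements and unknown element names."""
    problems = [
        f"missing: {name}"
        for name, props in params.items()
        if not props["optional"] and name not in values
    ]
    problems += [f"unknown: {name}" for name in values if name not in params]
    return problems


params = load_params(PARAMS_XML)
print(check_values(params, {"Author": "A", "Editor": "E"}))
```

Because the element names and properties travel with the source-type, the check needs no built-in knowledge of what a "book" is.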
This diagram illustrates how the main components of this
scheme operate together. The source-type URI is used to fetch the definition of
the source-type, either through a discovery service (on the Internet) or from a
local repository. That definition will also include the verbiage appropriate to
one-or-more locales.
User input for a source reference is solicited using that
locale-specific verbiage and acknowledging the citation-element data-types and
other properties in the process.
When generating a formatted citation (say, for a report) the
software product must interface to some citation-template tool which has a
relevant template for that source-type.
Developer note: STEMMA currently passes objects to a primitive tool, which then
calls back on well-defined methods to obtain the specific details required by
the template, e.g. a contact’s formal/informal name, a contact’s address, or a
formatted place-hierarchy. This is more flexible than passing fixed items of
text.
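The callback idea in that developer note can be illustrated with a small Python sketch. It is not the STEMMA tool itself: the Contact class, its method names, and the placeholder syntax are all hypothetical, chosen only to show a template asking an object for the details it needs rather than being handed fixed strings.

```python
import string


class Contact:
    """Stand-in for a data object; the method names are hypothetical."""

    def __init__(self, formal, informal):
        self._formal = formal
        self._informal = informal

    def formal_name(self):
        return self._formal

    def informal_name(self):
        return self._informal


def render(template, obj):
    """Fill each {method_name} placeholder by calling back on the object."""
    out = []
    for literal, field, _spec, _conv in string.Formatter().parse(template):
        out.append(literal)
        if field:
            # The template decides which detail it wants; the object supplies it.
            out.append(getattr(obj, field)())
    return "".join(out)


contact = Contact("Dr Jane Smith", "Jane")  # illustrative values only
print(render("Interview with {formal_name}", contact))
print(render("As {informal_name} recalled", contact))
```

Two templates can draw different details from the same object, which fixed text would not allow.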
A nice feature of this scheme is that there is a lot of
freedom, and it’s not expecting some standards body to define the many hundreds
of samples that are published in EE. It works equally well for different
preferences and different locales since it is merely a mechanism, not a
standard list. Software developers sometimes think too much in terms of a
formulaic approach to citations (‘you plug these values into a template and out
pops your formatted reference’) whereas real-life citations need much more
freedom. Those same developers may also view EE as just a list of prescribed
citation forms for all conceivable sources rather than a comprehensive work on
analysing evidence and crafting whatever citations we find necessary. As
Elizabeth Shown Mills says herself: citations are an art rather than a science.
I now want to describe some basic STEMMA mechanisms for
attaching information to a body of text, and then illustrate how they would be
used in combination to replicate my hand-crafted editions. I won’t suggest that
my own citations are good examples for anyone to follow, but I do strive to
make them functional and relevant. That means that they sometimes get quite
complicated, involving separate layers, analytical notes, and occasionally more
than one source reference in the same reference note.
Case 1 – Simple reference-note citation
The following shows a simple sentence that references a UK
certificate for a ‘death overseas’. The associated citation is
generated in a footnote.
The certificate came through and confirmed the location of her death as Park
Hotel, Ingenbohl, Canton [Kanton] Schwyz, Switzerland.16
…etc…
16 England, death certificate for Mary Phyllis Ashbee, died 13 Jun 1984; citing
location Switzerland; Death Abroad (1966 to 1994), General Register Office
(GRO), Southport.
<Text>
The certificate came through and confirmed the location of her
death as Park Hotel, Ingenbohl, <Alt Value='Kanton'>Canton</Alt>
Schwyz, Switzerland.<CitationRef Key='cDeathsOverseasUK'>
<Param Name='Name'> Mary Phyllis Ashbee </Param>
<Param Name='Date'> 1984-06-13 </Param>
<Param Name='Country' Key='wSwitzerland'/>
</CitationRef>
</Text>
The CitationRef could have specified an explicit Mode='Footnote'
but that’s the default and so is unnecessary. Note that it is a layered
citation indicating where the originals are held. This is achieved through the
Citation entity (cDeathsOverseasUK) linking to another one (cDeathsAbroadGRO) using
a ParentCitationLnk; thus creating a citation chain.
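A citation chain of this kind can be pictured with a short Python sketch. It is a simplification under stated assumptions: each entity here holds an already-formatted layer of text, and the way I have split footnote 16 between the two entities is my guess for illustration rather than the actual STEMMA definitions. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    key: str
    text: str  # already-formatted layer, for brevity
    parent: Optional["Citation"] = None  # plays the role of ParentCitationLnk


def layered(citation):
    """Walk the chain from child to parent, joining layers with semicolons."""
    layers = []
    while citation is not None:
        layers.append(citation.text)
        citation = citation.parent
    return "; ".join(layers) + "."


gro = Citation(
    "cDeathsAbroadGRO",
    "Death Abroad (1966 to 1994), General Register Office (GRO), Southport",
)
cert = Citation(
    "cDeathsOverseasUK",
    "England, death certificate for Mary Phyllis Ashbee, died 13 Jun 1984; "
    "citing location Switzerland",
    parent=gro,
)
print(layered(cert))
```

The repository layer is written once and shared by every citation that links to it.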
The example also uses a second mechanism to provide
annotation on the text; in this case, to provide the alternative German
spelling for the cantons of Switzerland. Notice that this annotation is
correctly placed in editorial brackets when the final form is non-interactive,
such as on a printed page.
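As a rough illustration of that non-interactive rendering, the following Python sketch turns an Alt element into its bracketed form. It handles only the Alt element and ignores everything else STEMMA allows inside Text; the function name is hypothetical and the fragment is a shortened version of the example above.

```python
import xml.etree.ElementTree as ET


def render_print(text_xml):
    """Flatten a Text fragment for print, bracketing any Alt values."""
    root = ET.fromstring(text_xml)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "Alt":
            # Displayed text first, then the alternative in editorial brackets.
            parts.append(f"{child.text} [{child.get('Value')}]")
        else:
            # Other elements are reduced to their plain text in this sketch.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


fragment = "<Text>Ingenbohl, <Alt Value='Kanton'>Canton</Alt> Schwyz</Text>"
print(render_print(fragment))
```

An interactive renderer could instead show the alternative as a tooltip or similar, using the same stored markup.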
Case 2 – Discursive notes
This example uses a different mechanism to create a footnote
that simply contains discursive notes. There is no source reference in this
case.
This confirmed the death occurred at the British Military Hospital, Peshawar,
and the cause of death as ‘Cerebral Haemorrhage, result of motor accident’. It
also gave his rank as Lance Corporal in the 14th/20th [King’s] Hussars4,
and his service number as 551091.
…etc…
4 British cavalry regiment created through the merger of the 14th King's
Hussars and the 20th Hussars in 1922. The honorific "King's" was
added back into the title in 1936.
<Text>
This confirmed the death occurred at the British Military
Hospital, Peshawar, and the cause of death as ‘Cerebral Haemorrhage, result of
motor accident’. It also gave his rank as Lance Corporal in the
<NoteRef Mode='Footnote'>14th/20th [King’s] Hussars
<Text>
British cavalry regiment created through the merger of the 14th
King's Hussars and the 20th Hussars in 1922. The honorific "King's"
was added back into the title in 1936.
</Text>
</NoteRef>, and his service number as 551091.
</Text>
This NoteRef element creates a footnote and inserts a
footnote indicator into the main text. There are other options, though, such as
Mode='Inline' which would place the text in editorial brackets at that
location.
In this example, the relevant text was placed inside the
NoteRef, but it could equally have been placed in a Text element elsewhere, and
the FromText element (new in V4.0) used to include it.
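The difference between the two modes can be sketched in Python as follows. This is a toy model, not STEMMA's renderer: the function name is hypothetical, the footnote indicator is shown as a plain number because this page cannot show superscripts, and the exact placement of the bracketed text for the inline mode is my assumption.

```python
def render_note(body, note, mode, footnotes):
    """Return the main-text rendering; footnote text is collected separately."""
    if mode == "Inline":
        # Note text appears at that location in editorial brackets.
        return f"{body} [{note}]"
    if mode == "Footnote":
        footnotes.append(note)
        # The indicator is the position of the note in the collected list.
        return f"{body}{len(footnotes)}"
    raise ValueError(f"unsupported mode: {mode}")


footnotes = []
note = "British cavalry regiment created in 1922"  # abbreviated for the sketch
print(render_note("14th/20th Hussars", note, "Footnote", footnotes))
print(render_note("14th/20th Hussars", note, "Inline", footnotes))
print(footnotes)
```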
Case 3 – Analytical notes
This case attaches a simple analytical note to a citation in
the form of another layer (i.e. separated by a semicolon). That extra layer is
achieved by appending the note in a local footnote rather than in the Citation
entity itself; thus clearly separating personal opinion from the details of the
citation.
Near the end of the burial register3 were the entries for all three of
the soldiers who died in that road accident:
…etc…
3 Burial register held at Garrison Church, Risalpur, NWFP, Pakistan
(1915–1947), photocopy; Asia, Pacific and Africa Collections (APAC), The
British Library, 96 Euston Road, London; source of photocopy was a typed
document so unclear whether original was typed or whether it was a transcript
itself.
<Text>
Near the end of the burial register
<NoteRef Mode='Footnote'>
<Text>
<CitationRef Key='cAPACBurialReg' Mode='Inline'>
<Param Name='Church' Key='wGarrisonChurch'/>
<Param Name='From'> 1915 </Param>
<Param Name='To'> 1947 </Param>
<Param Name='Media'> photocopy </Param>
</CitationRef>; source of photocopy was a typed document so
unclear whether the original was typed or whether it was a transcript itself.
</Text>
</NoteRef> were the entries for all three of the soldiers
who died in that road accident:
</Text>
It may take a couple of glances to see what is happening here.
The outermost NoteRef is generating a footnote, but inside the footnote is a
citation generated inline and followed by a layer representing the analytical
note. In this instance, the cAPACBurialReg also points to a parent entity
representing APAC in a chain.
Case 4 – Multiple sources
Cases of reference notes mentioning multiple sources may be
relatively rare outside of professional circles but I do have the following
instance:
A check in the GRO index of births and deaths only gave one real possibility:
Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham, Essex; died 2003
in East Surrey.3
…etc…
3 Transcribed GRO Index for England and Wales (1837–1983), database, FreeBMD
(http://freebmd.org.uk/cgi/search.pl : accessed 5 Aug 2014), birth entry for
Elsie E. Emms; citing West Ham, 1913, Mar [Q1], vol. 4A:642. "England & Wales
deaths 1837-2007", database, FindMyPast (www.findmypast.org.uk : accessed
5 Aug 2014), entry for Elsie Evelyn Emms; citing East Surrey, 2003, Mar [Q1],
district number 7551B, register number ESB5, entry number 184, date of
reg. 0303.
<Text>
A check in the GRO index of births and deaths only gave one real
possibility: Elsie Evelyn Emms, born 16 Feb 1913 in Wooldridge, West Ham,
Essex; died 2003 in East Surrey.
<NoteRef Mode='Footnote'>
<CitationRef Key='cFreeBMDBirth' Mode='Inline'>
<Param Name='Name'> Elsie E. Emms </Param>
<Param Name='RegDistrict' Key='wWestHam'/>
<Param Name='RegDate'> 1913-01:03 </Param>
<Param Name='Accessed'> 2014-08-05 </Param>
…etc…
</CitationRef>. <CitationRef Key='cFindMyPastDeath'
Mode='Inline'>
<Param Name='Name'> Elsie Evelyn Emms </Param>
<Param Name='RegDistrict' Key='wEastSurrey'/>
<Param Name='RegDate'> 2003-01:03 </Param>
…etc…
</CitationRef>
</NoteRef>
</Text>
The reason that these two sources are included in the same
reference note is that the conclusion was derived from a correlation of the
two, and the details cannot be factored into two independent references.
This case also employs the NoteRef mechanism to generate a
footnote containing two inline citations. Note that the dates of registration
(e.g. 1913-Q1) are provided using the STEMMA date-value string format.
Attribution
The terms citation and attribution are often confused and used
interchangeably. In principle, a citation references an information source,
such as a prior work, whereas attribution gives appropriate credit to
individuals. In a journalistic context, though, the act of citing one’s source
(e.g. an interview with someone) is called attribution. For the purposes of
genealogical and historical research, I usually reserve the term attribution
for when someone’s material has been directly included in a report or
collection (e.g. an image), which then contrasts with referencing some external
source of information consulted during my research. Even then, though, there
are grey areas. The point of mentioning attribution here is that the same
Citation entity can be used to model attribution, too, and without any
confusion or loss of functionality. In other words, the underlying mechanism is
sufficiently flexible and general-purpose that the two become syntactically
equivalent.
** Post updated on 19 Apr 2017 to align with the changes in
STEMMA V4.1 **
[1] This is a STEMMA-specific article, and so is not directly related to recent
FHISO sources & citations discussions, or to Louis Kessler’s recent post, or to
Randy Seaver’s recent post.
[2] Elizabeth Shown Mills, Evidence
Explained: Citing History Sources from Artifacts to Cyberspace (Baltimore,
Maryland: Genealogical Pub. Co., 2009),