Parallax View ®: October 2013

Wednesday 30 October 2013

Micro-history for Genealogists

Micro-history is a term growing in its usage, but what is it? Why is it important to genealogists and family historians?

According to Wikipedia, Micro-history is the intensive historical study of a well-defined smaller unit of research. However, this is a little vague and betrays the fact that it is hard to pin down with any consensus. An un-credited[1] page at the University of Victoria (Canada) Web site, entitled What is Micro-history?, researches the history and various interpretations of the term, and goes on to suggest that micro-history focuses on marginalised individuals, isolated localities, and locally-significant events.

In previous posts, I have lumped a number of fringe topics loosely associated with genealogy and family history under the micro-history umbrella, including One-Place Studies, One-Name Studies, house histories, personal histories, and organisational histories. This is largely based on the dictionary definition which is not unlike the Wikipedia one above.

More than anything, though, in its current usage micro-history is by people – ordinary people – rather than necessarily about people. We all have a story to tell, and we all have knowledge that we want to pass on. It doesn’t matter whether it is tales of your local area, the place you were raised, your family, or your friends. People’s history, if you like. This has the capability to uncover the real life that traditional history is likely to miss, and is more likely to engage ordinary people in the appreciation of history generally.

But isn’t it the same as local history? Going by the dictionary definition, it probably is. However, local history tends to be perceived as more about politics, industry, geology, geography, development, religion, etc. This is typified by the British Association for Local History (BALH) whose Web site states ‘Our purpose is to encourage and assist the study of Local History as an academic discipline…’. Local history is rarely about the lives of ordinary people. This has to come from people themselves, either those directly involved, or their descendants, friends, acquaintances, etc.

We may be forgiven for thinking that TV producers can only think of celebrity genealogy. However, the BBC has some genuinely good history programmes which fall squarely within this discussion of micro-history.

Secret History of Our Streets. Six ordinary streets, each telling us about how life in London has changed in 150 years. Involves both past and present residents.
Reel History of Britain. A social history of 20th Century Britain showing how people worked and lived using viewers' personal memories and rare archive newsreel footage.

It may be hard to see how the specific subject in each programme could achieve mass appeal but it does. Many viewers feel a connection between their own history and the people being interviewed.

There are many micro-history Web sites being created as independent projects, not affiliated to any guiding organisation or society, and involving public collaboration. Subjects include houses, streets, old photographs, villages, oral history and storytelling, pubs, schools, and organisations. Would it be naïve to at least expect them to have some central listing, just as UK local history groups are currently listed at Local History Groups? There is a Microhistory Network which was ‘created as a loose group in January 2007 to bring together historians interested in the theory and practice of microhistory’. It has members from around the world but it appears to have a more scholarly approach compared to those public collaborations.

At the time of writing, the UK Heritage Lottery Fund (HLF) uses money raised via the National Lottery to give grants to help share and preserve our heritage in communities across the UK. A new funding programme, called Sharing Heritage, aims to help people towards this goal.

So are these public contributions significant to genealogists and family historians? Absolutely yes! Although we may have direct evidence for discrete events from so-called reliable sources, there is little chance of interpolating without public contributions. Genealogy wants so much to be acknowledged as an historical discipline by scholarly historians (see Are Genealogists Historians Too?) that we risk falling into the same trap as them: not being able to see the whole forest for all the trees. A serious issue posed to diligent genealogists is how to deal with such subjective and hard-to-substantiate evidence. This is where attribution rather than citation is important. Attribution (in the journalistic sense) gives credit to the individual providing information or evidence. All those public contributions should, in principle, have clearly visible attribution. Anyone hiding behind a username such as MickeyMouse1066 would be simply diluting their own contribution.

I’m certainly not the only person to suggest that micro-history is important to genealogists and family historians[2] but I would further like to suggest that micro-history is the continuation of an ancient tradition of oral family-lore and folklore, and so is essential for the preservation of our histories. When this tradition is continued into our modern electronic world, and especially the Internet, then it results in contributions that interlock and partially substantiate each other, as well as providing a very important social connection.

Currently in the UK, the Office for National Statistics (ONS) is reviewing the needs for the national census beyond 2011, and how those needs might best be met. It is not unexpected that genealogical research is way down the list of needs, although it is there. Detailed UK census data is closed for 100 years, which is interesting since retaining information such as individual personal names cannot be justified for statistical research alone. Hence, some type of historical research must be anticipated, be it genealogy, micro-history or otherwise. It may be impractical but I feel it would be a beautiful idea to allow online respondents to leave a time capsule for their descendants – a short paragraph about themselves that they would wish to tell if they could. A picture would be nice, too, but it would probably take too much space. As the old saying almost goes: a picture is bigger than a thousand words.

[1] Although the article is un-credited, it appears under the Victoria Brewing Company pages created by Sarah Alford, Heather Fyfe, and Liam Haggarty.

[2] Anne Patterson Rodda, Trespassers in Time: Genealogists and Microhistorians (CreateSpace Independent Publishing Platform, July 2012).

Sunday 20 October 2013

Do Genealogists Really Need a Database?

If we have a software product on our computer that looks after our family history data then we expect it to have some type of database too. Why is that? The question is not as silly as it sounds. The perceived requirement is largely a hangover from the past that introduces incompatible proprietary representations and limits data longevity.

This is a bold statement but I will guide you through the rationale for it. This will require me to explain some technical details of databases and computers which I hope you’ll find useful. Those people who are already aware of these details can just skip over those paragraphs (but hopefully not the whole blog!).

A database is an organised collection of data, which basically means it is indexed to support efficient access to it. Databases are mostly disk-resident (more on this later) but there are various types, including relational, object-orientated, and multi-dimensional (or OLAP). The majority of modern databases are relational ones that use the SQL query language, but there is a growing trend to use other forms called NoSQL, such as ones based on key-value stores.

The reason for NoSQL is that SQL databases were designed to support high levels of consistency and reliability in their data operations. For instance, ensuring that complex financial transactions either complete fully or not at all — the last thing you want is for your money to disappear down a hole, or end up in two different accounts. The term for this integrity feature is ACID and it imposes a large performance penalty. This all makes sense for large commercial databases that process concurrent transactions in real-time but it is extraneous for your personal family history database.
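The all-or-nothing behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration of atomicity, not of any genealogy product; the accounts table, the names, and the amounts are invented for the example.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    # Using the connection as a context manager commits if the block
    # completes and rolls back if an exception escapes it.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The debit above has already been applied inside the
            # transaction, so this error forces it to be undone.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()

transfer(conn, "alice", "bob", 30)       # completes fully
try:
    transfer(conn, "alice", "bob", 500)  # fails part-way and is rolled back
except ValueError:
    pass

print(balances(conn))
```

The failed transfer leaves no trace: the money neither disappears nor ends up in two accounts. That guarantee is what a bank needs, and what a single-user family history file rarely does.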

If and when we get a modern standard for representing family history data then it will primarily describe a data model, that is, a logical representation of the data. The actual physical representation in a file depends on which syntax is adopted, and there may be several of those. This is presented in more detail at: Commercial Realities of Data Standards. However, no standard can, or should, mandate a particular database or a particular database schema. For a start, no two SQL databases behave the same, despite there being a SQL standard. The choice of how a database schema is organised and indexed depends as much on the design choices of the vendor and the capabilities of their product as it does on a data-model specification. Effectively, a database cannot form a definitive copy of your data since you cannot transport the content directly to the database of another product, and if your product becomes defunct then you could be left with an unreadable representation. This is important if you want to bequeath the fruits of your research to an archive, or to your surviving family.
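The distinction between a logical data model and its physical syntax can be sketched as follows. The Person class and its three fields are purely hypothetical and do not correspond to any existing or proposed standard; the point is only that one model can be written out in more than one syntax and read back unchanged.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Person:
    """A toy logical model: what is recorded, independent of file syntax."""
    key: str
    name: str
    birth_year: int


def to_xml(person):
    """Serialise the model using an (invented) XML syntax."""
    elem = ET.Element("Person", {"Key": person.key})
    ET.SubElement(elem, "Name").text = person.name
    ET.SubElement(elem, "BirthYear").text = str(person.birth_year)
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    """Rebuild the model from the XML syntax."""
    elem = ET.fromstring(text)
    return Person(
        elem.get("Key"),
        elem.findtext("Name"),
        int(elem.findtext("BirthYear")),
    )


def to_json(person):
    """Serialise the same model using a JSON syntax."""
    return json.dumps(asdict(person))


def from_json(text):
    """Rebuild the model from the JSON syntax."""
    return Person(**json.loads(text))


person = Person("p1", "Example Person", 1800)
print(to_xml(person))
print(to_json(person))
```

Neither serialisation says anything about how a product might index or store the data internally, which is why a standard can define the model without mandating a database.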

The goal of organisations like FHISO, and BetterGEDCOM before it, is to define a data model for the exchange and long-term preservation of genealogical and family-history data. This means that whatever the vendor’s database looks like, they must support an additional format for sharing and preservation. Unfortunately, the old GEDCOM format is woefully inadequate for this purpose and there is no accepted replacement that fits this bill. If such a representation existed then why couldn’t we work directly from it? The idea isn’t that new but there are technical arguments that are always levelled against it, and I will come to these soon. If the representation were covered by an accepted international standard then the problems of sharing and longevity disappear. It also opens up your data for access by other, niche software components — say for a special type of charting or analysis. This isn’t possible if it’s all hidden in some opaque proprietary database. It also means there’s less chance of an unrecoverable corruption because the internal complexities of the database do not apply.

Let’s pick on a controversial representation: XML. This is an international standard that is here to stay[1], and that means that any standards organisation must at least define an XML representation of their data in order to prevent multiple incompatible representations being defined elsewhere. There are some people, including software vendors, who vehemently hate XML, and would refuse to process it. The main reason appears to be a perceived inefficiency in the format and so it is ideal for this illustration.

Yes, XML appears verbose. Repetitive use of the same key words (element and attribute names) means it can take up more disk space than other, custom representations. However, that same repetition means that a compressed (or zipped) version is dramatically smaller. Disk space is cheap, though. Even a humble memory stick now has thousands of times more storage than the hard disks that I grew up with. I have seen many XML designers try to use very short key words as though this helps make it more efficient. Apart from making it more cryptic, it doesn’t help at all.
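A small experiment gives a feel for the effect of compression on repetitive mark-up. This sketch uses Python's zlib module on synthetic data. The element names and record layout are invented, and the exact byte counts will vary with the data, so no particular figures are claimed here.

```python
import zlib


def build_xml(person, name, year, count=10_000):
    """Build a repetitive XML document using the given element names."""
    rows = (
        f"<{person}><{name}>Example {i}</{name}>"
        f"<{year}>{1800 + i % 100}</{year}></{person}>\n"
        for i in range(count)
    )
    return ("<Root>\n" + "".join(rows) + "</Root>\n").encode("utf-8")


def report(label, raw):
    """Print and return the raw and compressed sizes in bytes."""
    packed = zlib.compress(raw, 9)
    print(f"{label}: {len(raw)} bytes raw, {len(packed)} bytes compressed")
    return len(raw), len(packed)


# Same content, once with descriptive element names and once with cryptic ones.
long_raw, long_packed = report(
    "long names", build_xml("Person", "Name", "BirthYear")
)
short_raw, short_packed = report("short names", build_xml("P", "N", "B"))
```

Running it shows how much of the difference between descriptive and cryptic names survives compression, at least for disk space.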

But what about its memory usage? Right, now we’re getting to the more serious issues. Let me first bring the non-software people up-to-speed about memory. A computer has a limited amount of physical memory (or RAM). Secondary storage, such as hard disks and memory sticks, is typically much larger in size than the available RAM. However, data has to be transferred from secondary storage to RAM before it can be processed by a program, and code (the program instructions) has to be similarly transferred before it can be executed. The operating system (O/S) creates an abstract view of memory for each program called a virtual address space, and this is made up of both RAM and secondary storage. This means that the O/S has to do a lot of work behind the scenes to support this abstraction by making code and data available in RAM when it’s needed, and this process is called paging. It keeps track of pages (fixed-sized chunks of code/data), and when they were last used, so that it can push older ones back out to disk in order to make room for newer requirements. If a program tries to randomly (rather than sequentially) access a large amount of data in its virtual address space then it can result in disastrous performance, and an effect known as thrashing.

The size of each program’s virtual address space is constrained by the computer’s address size. This means that on a 32-bit machine, a program can only address 0 to 4GB, irrespective of how much RAM is available. The situation is often worse than this because some of the virtual address space is used to give each program access to shared components.

Under Microsoft Windows, for instance, this is usually a 50/50 split so that each program can only address up to 2GB[2]. Hence, bringing large amounts of data into virtual memory unnecessarily can be inefficient on these systems. It is possible to create software that can manipulate massive amounts of data within these limitations[3] but the effort is huge.

So does XML require a lot of memory? Well, not in terms of the key words. These are compiled into a dictionary in memory, and references to these dictionary entries are small and of fixed-size. Hence, it basically doesn’t matter how big the words are. There are two approaches to processing an XML file, though, and the results are very different:

Tree-based. These load an XML file into an internal tree structure and allow a program to navigate around it. The World-Wide Web Consortium’s (W3C) Document Object Model (DOM) is a commonly used version. These are very easy to use but they can be memory hungry. Also, a program may only want access to part of the data, or it may want to convert the DOM into a different tree that’s more amenable to what it is doing. In the latter case, it means there will be two trees in memory before one is discarded in favour of the other.
Event-based. These perform a lexical scan of the XML file on disk and let the program know of parsing events, such as the start and end of elements, through a call-back mechanism. The program can then decide what it wants to listen for and what it wants to do with it. This is possibly less common because it requires more configuration to be provided in the code. SAX is the best known example.
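The two approaches can be compared side by side using Python's standard library, with xml.dom.minidom as the tree-based parser and xml.sax as the event-based one. The sample document and its element names are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

SAMPLE = """<People>
  <Person Key="p1"><Name>Example One</Name></Person>
  <Person Key="p2"><Name>Example Two</Name></Person>
  <Place Key="w1"><Name>Example Village</Name></Place>
</People>"""


def person_keys_tree(xml_text):
    """Tree-based: load the whole document, then navigate around it."""
    doc = minidom.parseString(xml_text)
    return [
        node.getAttribute("Key")
        for node in doc.getElementsByTagName("Person")
    ]


class PersonKeyHandler(xml.sax.ContentHandler):
    """Event-based: the parser calls back as each element starts."""

    def __init__(self):
        super().__init__()
        self.keys = []

    def startElement(self, name, attrs):
        # Only listen for the elements of interest; everything else is
        # scanned by the parser but never stored.
        if name == "Person":
            self.keys.append(attrs.get("Key"))


def person_keys_events(xml_text):
    """Run the event-based parser and collect the Person keys."""
    handler = PersonKeyHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.keys


print(person_keys_tree(SAMPLE))
print(person_keys_events(SAMPLE))
```

Both functions return the same keys. The tree version holds the entire document in memory while it works, whereas the event version keeps only the list it is building.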

XOM is an interesting open-source alternative in the Java world. Although based on SAX, it creates a tree representation in memory. However, the event-driven core allows it to be tailored to only load the XML parts of interest to the program. In effect, there is no reason why XML cannot be processed efficiently.
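XOM itself is a Java library, but a comparable idea of loading only the parts of interest can be sketched in Python with ElementTree's iterparse function. This is an analogy rather than a port of XOM, and the sample document is again invented.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<People>
  <Person Key="p1"><Name>Example One</Name></Person>
  <Person Key="p2"><Name>Example Two</Name></Person>
  <Place Key="w1"><Name>Example Village</Name></Place>
</People>"""


def load_people_only(source):
    """Stream through the XML, keeping Person data and discarding the rest."""
    people = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Person":
            # The element is complete at its "end" event, so its children
            # can be read here.
            people[elem.get("Key")] = elem.findtext("Name")
        if elem.tag in ("Person", "Place"):
            # Release the subtree once it has been handled (or ignored).
            elem.clear()
    return people


people = load_people_only(io.BytesIO(SAMPLE))
print(people)
```

Only the small dictionary of wanted values stays in memory, even though the program is written in a tree-like style.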

But… but … but … I have 26 million people in my tree and…. Yes, and you want to see them all at once, right? Many new computers now have 64-bit addressing which means their virtual address space is effectively unlimited (about 1.8 x 10¹⁹ bytes) and they’re fully capable of allowing a program to use as much of the cheap RAM as it wants. Database vendors have known this for some time and found they could achieve massive performance boosts using in-memory databases. Unfortunately, there are still many 32-bit computers out there, and also many programs whose 32-bit addressing is set in concrete, even in a 64-bit environment.
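The address-space figures quoted in this post follow directly from the address size: an n-bit address can distinguish 2 to the power n bytes. A quick check of the arithmetic:

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary sense)


def address_space_bytes(bits):
    """Number of distinct byte addresses available with `bits`-bit addressing."""
    return 2 ** bits


print(address_space_bytes(32) // GIB)    # 4
print(f"{address_space_bytes(64):.2e}")  # 1.84e+19
```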

STEMMA® is a logical data model but predominantly uses XML as its file representation. It is tackling these issues by allowing data to be split across separate documents (i.e. files), and each document to be comprised of one-or-more self-contained datasets. This means that there is a lot of choice for how you want to divide up your data on disk. For instance, separating out shared places or events, or dividing people based on their surname or geography. When a document, or dataset, is loaded into memory then it is indexed at that point. One dataset can be loaded and indexed in less time than it takes to double-click. Memory-based indexes are far more efficient and flexible than database ones since they can be designed specifically to suit the program’s requirements.
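The idea of indexing a dataset as it is loaded can be sketched as below. The mark-up here is a made-up stand-in and is not the STEMMA schema; the two indexes (by key and by surname) are simply examples of indexes designed around what a program happens to need.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical dataset; element and attribute names are illustrative only.
DATASET = """<Dataset Name="Example">
  <Person Key="p1"><Surname>Smith</Surname><Forename>Ann</Forename></Person>
  <Person Key="p2"><Surname>Jones</Surname><Forename>Ben</Forename></Person>
  <Person Key="p3"><Surname>Smith</Surname><Forename>Cal</Forename></Person>
</Dataset>"""


def load_and_index(xml_text):
    """Parse a dataset and build in-memory indexes in a single pass."""
    by_key = {}                     # unique key -> element
    by_surname = defaultdict(list)  # surname -> list of keys
    for person in ET.fromstring(xml_text).iter("Person"):
        key = person.get("Key")
        by_key[key] = person
        by_surname[person.findtext("Surname")].append(key)
    return by_key, by_surname


by_key, by_surname = load_and_index(DATASET)
print(by_surname["Smith"])
print(by_key["p2"].findtext("Forename"))
```

Because the indexes are ordinary in-memory structures, a program is free to add whatever others it needs (by place, by date range, and so on) without any change to the file on disk.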

This post is rather more technical than my others but I wanted to give a clear picture of the issues involved. There are many advantages to using so-called flat files in conjunction with memory-based indexing, including better sharing, greater longevity, stability and reliability, and supporting multiple applications on the same data. I hope that this post will encourage developers to think laterally and consider these advantages. During my own work, for instance, I found that a STEMMA document — which implicitly includes trees, narrative, timelines, and more — effectively became a “genealogical document”, in the sense of a word-processor document. It could be received by email and immediately loaded into a viewer (analogous to something like Word or Acrobat) to view the document contents, and to navigate around them in multiple ways.

[1] XML has a lot going for it, including schema versioning and namespaces. However, I personally draw the line at XSLT which is difficult to write, difficult to read/diagnose, very inefficient when applied to large XML files, and impossible to optimise.

[2] 32-bit Windows systems do have a special boot option to change this split to 3GB/1GB but it is rarely used. The situation can be worse on some other O/S types, such as the old DEC VMS systems which only allowed 1GB addressability for each program’s P0-space.

[3] I was once software architect for a company that implemented a multi-dimensional database that ran on all the popular 32-bit machines of the time. This handled many Terabytes (thousands of GB) of data by performing the paging itself.

Wednesday 16 October 2013

Claverley Property Document Transcript

Sue Adams makes a case for transcription being an essential step to assimilating and understanding an historical document at Claverley Property Document Analysis, Part 1: Transcript. She uses the example of a manorial court record of a property transaction in 1844 and has invited me to show how this might be transcribed in STEMMA®.

Sue had already transcribed the raw text and inserted her own annotation relating parts of it to the handwritten original. She had also included explicit line numbers to help make that correlation more easily.

This is a useful exercise for me as STEMMA is still an evolving project that hopes to address transcription as part of its comprehensive narrative support. There are several parts to this exercise so I would like to itemise them for subsequent discussion:

i. Identifying the people, places, events, and dates.
ii. Linking people and place references to any corresponding entities.
iii. Handling marginalia.
iv. Handling uncertain characters.
v. Handling uncertain or unfamiliar words.
vi. Adding line numbers.
vii. Linking multiple page scans with single transcription.

In the interests of clarity I want to approach these items in a stepwise manner. I’ll initially deal with item vii.

Scanned images are represented in STEMMA using a Resource entity. The situation in Sue’s example of having multiple related page-scans but a single transcription occurs frequently and there are several ways of handling it. The common feature is to put those scans in a single folder (or use a common root for the file name) and employ a parameterised Resource entity to represent them all. This is convenient since it also provides a single point to associate the transcribed text.

<Title> Manorial court records for Claverley property transaction, 1844 </Title>

<Type> Document </Type>

<URL> file:mydocuments/ClaverleyProp/P{$Image}.jpg </URL>

</Params>

<Text>

... transcribed text from below...

</Text>

</Resource>
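To make the parameterised URL concrete, the following sketch expands a {$Name} placeholder by plain text substitution. This is only a guess at the intent, assuming simple textual replacement. The expand_params helper is hypothetical, and the page values 1 to 3 are invented since the actual image file names are not given here.

```python
import re


def expand_params(template, params):
    """Replace each {$Name} placeholder with the matching parameter value."""

    def substitute(match):
        # A missing parameter raises KeyError rather than being silently
        # left in place.
        return str(params[match.group(1)])

    return re.sub(r"\{\$(\w+)\}", substitute, template)


template = "file:mydocuments/ClaverleyProp/P{$Image}.jpg"

# Hypothetical page values, for illustration only.
urls = [expand_params(template, {"Image": n}) for n in (1, 2, 3)]
for url in urls:
    print(url)
```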

The ‘Page’ parameter is defined more for documentation purposes than for image access. A specific image could be represented using a derivative of this Resource using STEMMA’s inheritance mechanism. For instance:

</Params>

</Resource>

This creates another named Resource that inherits properties from the base Resource. However, in this instance, I will elect to generate transient unnamed Resource entities for each page using a ResourceRef element since I can do this on-the-fly and place them at the appropriate points in the transcription body. For instance:

…narrative text...

</ResourceRef>

…narrative text...

The Mode attribute causes the image not to be displayed, but identifies it as related to the current transcription. This allows transcribed lines, paragraphs, columns, etc., to be linked to specific locations in the current image (using x/y attributes), thus allowing parallel scrolling of the image and transcription.

I’ll now add relevant mark-up to the text in order to handle items i and iii-v. This will also include an equivalent to Sue’s annotation linking the text segments to the corresponding page images. STEMMA has support for diplomatic transcription[1] and transcription notes (see Recording Evidence) but there are schemes with more comprehensive sets such as TEI. The strength of STEMMA lies in the way it identifies persons, places, events, etc., and links them to any corresponding entities in your data, which we’ll come to later.

A dwelling house together with its outbuildings, curtilage, and the adjacent land appropriated to its use.

</Text>

Something added to another, more important thing; an appendage.

</Text>

A way to enter a place or the act of entering a place.

</Text>

<ms>

<Text>

New court session starts half way down the page

</Text>

</ResourceRef>

<ms>

Manor of Claverley} to wit

<DateRef Value='1844-04-25'>25th April 1844</DateRef>

</ms>

</Anom>

The Court Baron purchased of <PersonRef>Thomas Whitman</PersonRef>

Esquire Lord of this manor held at the dwelling house of

<PersonRef>John Crowther</PersonRef> called the <PlaceRef>Kings Arms</PlaceRef> situate in <PlaceRef>Claverley</PlaceRef>

within this manor on <DateRef Value='1844-04-25'>Thursdays the twenty fifth day of

April in the year of our Lord One thousand eight hundred

and forty four and in the seventh year of the reign of her

present Majesty Queen Victoria</DateRef> Before <PersonRef>Francis Harrison</PersonRef> deputy

Steward there and in the presence of <PersonRef>Christopher Gabert</PersonRef> and

<PersonRef>Edward Crowther</PersonRef> two copyholders of this manor.

</ms>

</Text>

<ms>

<Text>

Case 1 not transcribed as it does not concern people of interest. I photographed the start of the court session, then skipped to the cases of interest on a later page. Page number query - not the page following the previous image. Case x starting half way down page.

</Text>

</ResourceRef>

To this Court come <PersonRef>John Wilson</PersonRef> of <PlaceRef>Aston</PlaceRef> within this

manor Farmer and <PersonRef>Samuel Nicholls</PersonRef> late of <PlaceRef>Catstree</PlaceRef> in the

<PlaceRef>parish of Worfield</PlaceRef> but now of <PlaceRef>Bridgnorth</PlaceRef> in the <PlaceRef>county of Salop</PlaceRef>

Gentleman Devisees in trust named in the last will and testament

of <PersonRef>John Felton</PersonRef> heretofore of <PlaceRef>Hopstone</PlaceRef> but late of <PlaceRef>Draycott</PlaceRef> within

this manor Yeoman late copyholder of this manor deceased

in their own proper persons and in consideration of the Sum

of three hundred and fifteen pounds seven shillings of lawful

British money to them the said <PersonRef>John Wilson</PersonRef> and <PersonRef>Samuel Nicholls</PersonRef>

.in hand well and truly paid by <PersonRef>Sarah Ward Nicholls</PersonRef> of

<PlaceRef>Catstree</PlaceRef> aforesaid Spinster before the passing of this surrender

as and for the purchase money for the hereditaments hereinafter

mentioned surrender into the hands of the Lord of this manor

by his deputy Steward aforesaid by the rod according to the custom

</ResourceRef>

of this manor All that piece or parcel of land called or known

by the name of <PlaceRef>Mill Hill</PlaceRef> and all that newly erected <Alt> messuage <FromText Key='tMessuage'/></Alt> or

dwelling house and outbuildings on the same piece of land or some

part thereof with the <Alt> appurtenances <FromText Key='tAppurtenances'/></Alt> formerly <PersonRef>Grosvenors</PersonRef> and

late <PersonRef>Onions's[?]</PersonRef> situate in the <PlaceRef>township of Sleathton</PlaceRef> in the <PlaceRef>manor

of Claverley</PlaceRef> in the <PlaceRef>county of Salop</PlaceRef> formerly in the occupation

of <PersonRef>John Felton</PersonRef> and now of <PersonRef>William Ferrington</PersonRef> or his undertennants

containing by admeasurement three acres one rood and sixteen

perches or thereabouts being by computation the half of one

third part of a nook of land To the use and behoof of the

said <PersonRef>Sarah Ward Nicholls</PersonRef> her heirs and assigns for ever at

the will of the Lord according to the custom of this manor

<Text>

Case y. Undecipherable mark in margin.

</Text>

</NoteRef>

To this Court comes <PersonRef>Sarah Ward Nicholls</PersonRef> of <PlaceRef>Catstree</PlaceRef> in

the <PlaceRef>parish of Worfield</PlaceRef> in the <PlaceRef>County of Salop</PlaceRef> Spinster in her own

proper person and by virture of a surrender to her use at this

Court made by <PersonRef>John Wilson</PersonRef> of <PlaceRef>Aston</PlaceRef> within this manor

Farmer and <PersonRef>Samuel Nicholls</PersonRef> late of <PlaceRef>Catstree</PlaceRef> aforesaid but now

of <PlaceRef>Bridgnorth</PlaceRef> in the said <PlaceRef>County of Salop</PlaceRef> Gentleman Devisees in

trust named in the last will and testament of <PersonRef>John Felton</PersonRef>

heretofore of <PlaceRef>Hopstone</PlaceRef> but late of <PlaceRef>Draycott</PlaceRef> within this manor

Yeoman late a copyholder of this manor deceases desires to

be admitted tenant to the Lord of this manor according to the

custom of this manor of and to All that piece or parcel of land

called or known by the name of <PlaceRef>Mill Hill</PlaceRef> and all that newly

erected <Alt> messuage <FromText Key='tMessuage'/></Alt> or dwelling house and outbuildings on the same

piece of land or some part thereof with the <Alt> appurtenances <FromText Key='tAppurtenances'/></Alt> formerly

<PersonRef>Grosvenors</PersonRef> and late <PersonRef>Onions's</PersonRef> situate in the <PlaceRef>township of Heathton</PlaceRef>

in the <PlaceRef>manor of Claverley</PlaceRef> in the <PlaceRef>county of Salop</PlaceRef> formerly in the

occupation of <PersonRef>John Felton</PersonRef> and now of <PersonRef>William Ferrington</PersonRef> or

his undertenants containing by admeasurement three acres one

rood and sixteen perches or thereabouts being by computation

the half of one third part of a nook of land To whom the

Lord of this manor by his deputy Steward aforesaid by the

rod according to the custom of this manor hath granted the

premises aforesaid with the <Alt> appurtenances <FromText Key='tAppurtenances'/></Alt> and seizin thereof

To have and to hold the same premises with the <Alt> appurtenances<FromText Key='tAppurtenances'/></Alt>

unto the said <PersonRef>Sarah Ward Nicholls</PersonRef> her heirs and assigns

To the use and behoof of the said <PersonRef>Sarah Ward Nicholls</PersonRef> her heirs

</ResourceRef>

and assigns for ever at the will of the Lord according to the

custom of this manor by the rents and customary services

therefore due and of right accustomed and for such estate and

<Alt>ingress <FromText Key='tIngress'/></Alt> the said <PersonRef>Sarah Ward Nicholls</PersonRef> doth give to the Lord

for a fine six pence half penny and four sixth parts of a

farthing and she is admitted tenant thereof in form aforesaid

and doth to the Lord fealty

<Alt Value='Francis'>Fran</Alt> Harrison

<Text>Signature</Text></PersonRef>

Deputy Steward of the said manor

<Text>

End of court session, another session follows.

</Text>

</NoteRef>

</ms>

</Text>

</Narrative>

This may look complicated but remember that this is the internal representation. Using an appropriate tool, it would all look just like a fancy word processor (see Structured Narrative for an in-depth presentation).

The Anom element has been used to reference text in the margin, and the Alt element has been used to add both alternatives (as in the signature) and clarifications, for instance to provide definitions for Messuage and Appurtenances, neither of which is in my day-to-day vocabulary. The NoteRef element has been used to add general transcription notes and annotation.

There is a reference to Queen Victoria which I’ve left without any PersonRef mark-up. This was because it made more sense to include it as part of the preceding DateRef element.

In the mark-up so far, I have only identified person and place references by raw PersonRef and PlaceRef elements respectively. These indicate that they are such references but they do not identify the actual person or place being referenced. This difference is part of STEMMA’s E&C support and is referred to as shallow and deep semantics. I do not know whether all the associated persons will be represented individually in Sue’s data, but I am guessing that the places might be. The following is a small example demonstrating how some of the raw place references can be associated with specific places, and how the corresponding Place entities can be used to hold images, documents, maps, or historical narrative (see A Place for Everything for further details).

<Title> Shropshire </Title>

<Type> County </Type>

<Names>

<Canonical>Shropshire</Canonical>

<Token>Shropshire</Token>

<Token>Salop</Token>

<Token>Shrops</Token>

</Tokens>

</Sequence>

</Sequences>

</Names>

</Place>

<Title> Claverley Parish </Title>

<Type> ParishCivil </Type>

<PlaceName> Claverley </PlaceName>

<Text>

Claverley became a Royal Manor in 1102

</Text>

</Place>

<Title> Claverley Village </Title>

<Type> Village </Type>

<PlaceName> Claverley </PlaceName>

</Place>

<Title> Aston </Title>

<Type> Hamlet </Type>

<PlaceName> Aston </PlaceName>

</Place>

<Title> Bridgnorth Parish </Title>

<Type> ParishCivil </Type>

<PlaceName> Bridgnorth </PlaceName>

</Place>

<Title> Bridgnorth </Title>

<PlaceName> Bridgnorth </PlaceName>

</Place>

<Title> Kings Arms Public House </Title>

<Type> Building </Type>

<PlaceName> Kings Arms </PlaceName>

</Place>

This subset of the available places was selected to show the depth of the place-hierarchy and to demonstrate one with name variants. A typical reference to a couple of these would be:

<PersonRef>John Crowther</PersonRef> called the <PlaceRef Key='wKingsArms'>Kings Arms</PlaceRef> situate in <PlaceRef Key='wClaverleyVillage'>Claverley</PlaceRef>

within this manor on…

Note that the original place names, as they were written, are preserved in this scheme. The extra Key attribute is effectively a conclusion that associates those units of evidence with real places.
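The difference between a raw reference and a keyed one can be illustrated by a short sketch that walks a marked-up fragment and reports each reference, together with the entity it resolves to where a Key is present. The wrapping Text element and the PLACES lookup table are assumptions made for this example. In particular, pairing these two keys with these titles is a guess, and a real tool would resolve keys against the loaded Place entities.

```python
import xml.etree.ElementTree as ET

FRAGMENT = (
    "<Text><PersonRef>John Crowther</PersonRef> called the "
    "<PlaceRef Key='wKingsArms'>Kings Arms</PlaceRef> situate in "
    "<PlaceRef Key='wClaverleyVillage'>Claverley</PlaceRef> "
    "within this manor</Text>"
)

# Hypothetical lookup standing in for the loaded Place entities.
PLACES = {
    "wKingsArms": "Kings Arms Public House",
    "wClaverleyVillage": "Claverley Village",
}


def list_references(xml_text, places):
    """Return (element, text as written, key, resolved title) per reference."""
    refs = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag in ("PersonRef", "PlaceRef"):
            key = elem.get("Key")  # None for a raw (shallow) reference
            title = places.get(key) if key else None
            refs.append((elem.tag, elem.text, key, title))
    return refs


references = list_references(FRAGMENT, PLACES)
for ref in references:
    print(ref)
```

The text as written is always retained. The key, where present, is a separate conclusion layered on top of it, and it can be revised without touching the transcription.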

It’s worth pointing out, at this stage, that there are two broad approaches to creating a proof argument: top-down and bottom-up. The top-down approach is intuitively what most people would think of. You would write your main conclusions and cite the appropriate bits of evidence along the way. STEMMA builds the other way so that pieces of evidence — say in a transcript — can be highlighted using a NoteRef element and connected to a piece of text proposing a rationale or explanation. They, in turn, can be linked to other items of text to create a hierarchy of conclusions. However, this is a structural issue and a good tool can make the process appear in either direction.

In summary, I’ve tried to give a relatively clear picture of how STEMMA would be applied to this transcription, as opposed to simply producing one of potentially many end results to pick through. The code you see in this blog was largely created by hand and so I apologise for any coding errors.

** Post updated on 19 Apr 2017 to align with the changes in STEMMA V4.1 **

[1] A clear discussion on the difference between diplomatic transcription and typographic facsimile may be found in: Mary-Jo Kline and Susan Holbrook Perdue, Guide to Documentary Editing, 3rd edition, chapter 5, section IV (http://gde.upress.virginia.edu/05-gde.html#h2.4 : accessed 16 Oct 2013).