Sunday, 22 November 2015

STEMMA V4.0



A little later than I had expected, but I have now completed the changes necessary for STEMMA V4.0. The specification is now published on the STEMMA Web site, and I anticipate that it will be the last major revision necessary for this micro-history data model (although small refinements will continue).

The main focus of this change has been the correct separation of conclusions from information and evidence, allowing them to support drill-down (inspecting a conclusion to see the associated how and why) and the alternative bottom-up approach of Source Mining. Although this has been a goal from the earliest work on this project, the associated research and experimentation hasn’t always taken the correct path. That is the nature of research, though, and the model is better for it.

Much of the text on the Web site has been revised, often with significant re-wording, as have some of my older blog-posts. This particular subject sits between two different worlds (genealogy and software), each with its own vocabulary, and those vocabularies may clash or cause ambiguity. However, I also admit that some of my older word choices were the result of genealogical inexperience.

Changes to the data model include:

  • Introduction of a new Source entity that embraces both the Citations and the Resources for a particular information source. Citation and Resource entities are now connected to the Source entity rather than to each other.
  • Support for source assimilation & analysis, source mining, and the ability to drill-down on conclusions, all provided via the Source entity.
  • The <References> element, within Events, is now superseded by <SourceLnk> which links to the new Source entity. Enclosed *Ref elements (e.g. <PersonRef>) changed to *Lnk elements for consistency. Removal of the ID attribute introduced in V3.0.
  • Support for cross-source analysis and correlation via a new Matrix entity.
  • Support for a generalised approach to multi-tier personae.
  • Addition of an Animal entity, strongly modelled on the Person entity, including related mark-up and namespaces.
  • <CitationLnk>/<ResourceLnk> from Person, Place, Group, and Event entities, changed to <SourceLnk>.
  • Review of the goal of sticking to XHTML tags for presentation: replacement of the <Hi> element with HTML-like ones, and the addition of support for <sup>/<sub> elements, columnar text, simple tables, and indentation.
  • Removal of ‘Unreadable’ mode from the <Anom> element.
  • Support for distinguishing manuscript and typescript transcriptions in the <Text> element. Support for numbering lines and pages in transcriptions. Positional control over annotations such as marginalia.
  • <FromText> element added to <Narrative> in order to share re-usable sections of text. As a result, the NoteKey attribute in the semantic mark-up was no longer required and has been removed.
  • Categorisation of the layers in a Citation chain.
  • The optional <DisplayFormat> element of the Citation entity has been re-interpreted as a set of pre-formatted language-specific strings. This may exist in addition to the mandatory set of named parameter values, and the two together can also be used as a simple citation-template.
  • The Intrinsic Functions, mentioned at the end of Semantic Mark-up, have been changed to Intrinsic Methods in preparation for defining a run-time object model. The set is also supplemented by ones for accessing subject-entity names.
  • Small changes to the subject-entity *-name-mode vocabularies to factor out a generic name-mode (missing from the previous specification).
  • Place coordinates (including bounding shapes) are now time-dependent, the same as any parent-Place link.
  • Added Canton and Colony to place-type vocabulary. The place-type of House is now replaced by Number and Apartment for flexibility.
  • <Quality>, <Reliability>, and <Credibility> elements moved from the Citation entity to the new Source entity.
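
To make the first of these changes more concrete, the following is a rough sketch of the new linkage, in which Citations and Resources each point at a Source, and an Event reaches both through its Source links. It is written in Python purely for illustration and is not STEMMA syntax: the class names mirror the entity names above, but every field name, the sample data, and the drill_down helper are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Source:
    key: str
    title: str  # hypothetical field, for illustration only


@dataclass
class Citation:
    key: str
    source_key: str  # Citation connects to a Source, not to a Resource


@dataclass
class Resource:
    key: str
    source_key: str  # Resource connects to a Source, not to a Citation


@dataclass
class Event:
    key: str
    # Loosely analogous to <SourceLnk> elements within an Event
    source_links: List[str] = field(default_factory=list)


def drill_down(event: Event,
               sources: Dict[str, Source],
               citations: List[Citation],
               resources: List[Resource]) -> Dict[str, dict]:
    """For each Source linked from the event, collect the Citations and
    Resources that are attached to that Source."""
    result = {}
    for skey in event.source_links:
        result[skey] = {
            "title": sources[skey].title,
            "citations": [c.key for c in citations if c.source_key == skey],
            "resources": [r.key for r in resources if r.source_key == skey],
        }
    return result


# Invented sample data
sources = {"sCensus": Source("sCensus", "Census household entry")}
citations = [Citation("cCensus", "sCensus")]
resources = [Resource("rImage", "sCensus"), Resource("rTranscript", "sCensus")]
event = Event("eResidence", source_links=["sCensus"])

report = drill_down(event, sources, citations, resources)
```

Because neither a Citation nor a Resource refers to the other in this sketch, either can be added or replaced without touching the other, which is the practical effect of routing both through the Source.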

Although small refinements will continue, I want to concentrate subsequent efforts on describing the advantages and philosophy of the data model, and on providing more worked examples.

A series of blog-posts will follow this one, providing a high-level introduction in order to set the scene.