An original goal of STEMMA was to be able to represent
rich-text narrative that could be used for authored works, including essays,
memories, and reports. In addition, it aimed to support transcription,
including transcribed extracts, which has quite specific requirements of its
own.
STEMMA V4.1 has concentrated on its mark-up in these areas
and has solved a number of long-standing issues with some novel approaches.
Such was the success of the approach to textual transcription that this version
also addresses audio transcription as a companion to it. I know of no other
system that addresses both of these in a consistent manner, and certainly not
when including rich-text authored work and semantic mark-up.
A goal of HTML5 was to separate structure and content from
presentation in Web pages. STEMMA has applied a similar principle to its descriptive
mark-up for both authored work and transcription.
STEMMA is not a presentation format. It therefore concerns
itself with narrative structure, content, and semantics, but not the finer
details of the presentation such as colours and fonts. STEMMA narrative may be
transformed into any number of presentational formats for visualisation (e.g.
HTML+CSS), and it is in these formats that such things would be configured,
including page size, style galleries, choice of footnote/endnote/tablenote
indicators, heading and cell formatting in tables, caption position, paragraph
separation, styles for semantic elements, and so on.
Unusually, it has also applied this principle to both
textual and audio transcription. Identifying the structure and content is more
important than the finer details of their style and presentation, and the interpretation of any stylistic differences requires analysis rather than being simply a display matter. For instance, marking where a manuscript used
different colours in different places is more important than the specific
colours and shades — that level of detail can be written in narrative for the
reviewer rather than trying to use some limitless taxonomy for the software.
The same applies to different writing styles, which may or may not be evidence of multiple authors. In a typescript document, it would equally apply
to different fonts, font-sizes, ink intensity, marginal alignment, or even
usage of grammar; all of these could have a bearing on the analysis of that
document.
In audio transcription, this approach simplifies a complex
area by giving freedom to the transcriber to detail the different voices,
intonations, noises, and gestures.
The functionality of STEMMA’s descriptive mark-up has now
evolved to the level where I can automatically generate blog-posts for research
articles directly from my internal representation.
In order to demonstrate the new version, I have used the
recent 5000-word article entitled Jesson
Lesson to generate a fully-worked STEMMA example, available at JessonLesson.xml.
This genuine research article included precise layout, transcribed extracts,
tabulation, endnotes and tablenotes, and hyperlinked images. Its 47 endnotes
included examples of reference-note citations, discursive notes, analytical
commentary, and multi-source references — the handling of which was outlined
previously at Cite
Seeing — but also included examples of conflated citations where details of
multiple people are placed in a single note for readability.
It was always a personal goal to produce better-quality research articles, and so to force STEMMA to address real-world scenarios rather
than “desktop scenarios”. As a result of this, STEMMA’s general approach to
citations has shifted slightly. Although support of citation-elements —
implemented using its Parameter mechanism — has been enhanced, the focus of the
computer-readable form is now on correlation and interrogation rather than mere
formatting. The number of real-world
cases (see list under Citations)
is just too great for authored works to delegate formatting entirely to
software that acts blindly from mere values. This version, therefore, finds
a bridge between preferred hand-crafted forms and computer-readable
citation-elements.
Another area that has been enhanced greatly is tables, which
now support control over table width, column widths and alignment, captions,
and tablenotes (i.e. citations deposited at the foot of a table).
The existing <ts> element, used to mark text
transcribed from a typescript document, and the <ms> element, used to
mark text transcribed from a manuscript document, both have new ‘id’ and
‘scheme’ attributes. These label the respective contributions with user-defined
tags — ‘id’ for distinct contributions, such as different authors, and ‘scheme’
for stylistic variations — that can be described separately for the benefit of
the reviewer.
For instance:
<NoteRef><Text Class='Legend'>
bold-blue – text was written with a broad-tipped turquoise felt
marker.
</Text></NoteRef>
<ms scheme='bold-blue'>This section is now out-of-date and is
being reworked</ms>
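The ‘id’ attribute follows the same pattern for distinct contributions. As an illustrative sketch only — the label ‘annotator’ is an arbitrary user-defined tag, not a prescribed value — a second hand in the same manuscript might be marked as:
<NoteRef><Text Class='Legend'>
annotator – comments added in pencil by a second, later hand.
</Text></NoteRef>
<ms id='annotator'>date queried - see parish register</ms>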
The elements <page>, <col>, <p>, <line>,
and <posn> now take SVG-like image coordinates (percentage displacement
from top-left image corner) for linking transcription elements to a copy of the
original document. One use of this is to support parallel scrolling of image
and transcription for the end-user.
The associated image is specified by a preceding
<ResourceRef> element identifying a Resource entity using the mode
‘SynchImage’.
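As a rough sketch of how this might fit together — the ‘Key’ value, the element nesting, and the coordinate attribute names and values below are assumptions for illustration, not prescribed forms — a transcription synchronised with a page image could resemble:
<!-- Resource key 'rPageImage' is a hypothetical example -->
<ResourceRef Key='rPageImage' Mode='SynchImage'/>
<!-- coordinates are percentage offsets from the top-left corner of the image -->
<page x='2' y='3'>
<p x='6' y='10'>First paragraph of the transcribed page,
with a marked location <posn x='6' y='22'/>part-way through the text.</p>
</page>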
For audio transcription, the <voice> element provides
the analogy to <ts>/<ms>, and it similarly takes ‘id’ and ‘scheme’
attributes. This allows different vocal (or other audio) contributions to be
distinguished, and also their intonation, emotional delivery, artificial
accents, etc.
Additional features are supported in a way analogous to textual
transcription:
- Anomalous contributions from an individual that cannot be represented as text, including noises, pauses, and gestures – see <Anom>
- Alternative word meanings, clarifications, or other notes – <Alt> and <NoteRef>, exactly as with textual transcription
- Time synchronisation – time-stamping with <time>. This is analogous to the <posn> element, and other x/y coordinates, used for textual transcription.
For time-stamping, the associated recording is specified by
a preceding <ResourceRef> element identifying a Resource entity using the
mode ‘SynchAudio’, analogous to ‘SynchImage’ for images.
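Putting these pieces together, a minimal sketch of an audio transcription might look as follows; the Resource key, the speaker tags, and the exact form of the <time> and <Anom> content are illustrative assumptions rather than prescribed usage:
<!-- Resource key 'rInterviewAudio' is a hypothetical example -->
<ResourceRef Key='rInterviewAudio' Mode='SynchAudio'/>
<!-- illustrative timestamp marking an offset into the recording -->
<time value='00:02:15'/>
<voice id='interviewer'>And where were you born?</voice>
<voice id='subject' scheme='hesitant'><Anom>long pause</Anom> It would have been in the old house.</voice>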
As well as marking distinct voices, these features include
the ability to mark overlapping contributions and background contributions. An
example demonstrating many of these features may be found at Dialogue
Transcription.
Specific changes to the data model include the following:
- ‘WhereIn’ attribute added to Citation Parameter definitions. This finally provides the missing criteria necessary for the automatic generation of shortened subsequent reference-note citations. ‘Subst’ attribute added to Citation Parameter values in order to override formatting, or to provide a substitution in cases where a value is unavailable.
- <ParentCitationLnk> now allowed in both <CitationLnk> and <CitationRef> elements in order to create transient chained citations.
- Quality element, within Source entity, moved inside the Frame element.
- Review of entries in citation-layer-type namespace.
- DataControl element of Resource entity supports attribution text.
- Control of table widths, and individual column widths and alignments.
- Ability to align images when embedded within narrative.
- Ability to hyperlink images embedded in narrative.
- Requirement for enclosing Narrative element dropped for Text elements, except for top-level Narrative entities. Text elements can now be nested.
- <cb> replaced with <col>, and relationship between paragraphs and columns now reversed (paragraphs now within columns).
- ResourceRef Mode=SynchImage allows synchronisation between images and transcriptions.
- Corresponding SVG-x/y coordinates added to elements <page>, <col>, <p>, and <line>. Additional <posn> element defined to associate coordinates with arbitrary text locations.
- <Page>/<Line> renamed to <page>/<line> and moved alongside <p>/<col> as related to structure and content rather than semantics.
- Mode=Tablenote attribute supplementing Footnote and Endnote in various places.
- Text-element Header=boolean attribute replaced with Class=Header | H1 | H2 | H3 | Caption | Footnote | Endnote | Legend | Tablenote.
- Text-element Class=Caption attribute used in Resource/ResourceRef and tables for generating captions.
- Text-element Class=Footnote | Endnote | Tablenote attribute used in CitationRef to allow pre-formed (preferred) citations.
- Deprecated the <Text> attributes Abstract=boolean, Extract=boolean, Manuscript=boolean, and Transcript=boolean.
- <voice> mark-up added to supplement existing <ts>/<ms> mark-up. <ts>/<ms>/<voice> all enhanced to cope with different hands, voices, fonts, colours, etc.
- In transcripts of audio recordings, support for multiple voices, overlapping dialogue, intonation, gestures, noises, pauses, timestamps, etc.
- ResourceRef Mode=SynchAudio allows synchronisation between audio recordings and transcriptions, analogous to SynchImage for textual transcription (above).
- Complete revision of Mode values for CitationRef element.
- Relaxation of Date Parameters in order to cover the full range of calendars. One requirement was to represent the date-of-issue for newspaper sources that predated the Julian-to-Gregorian changeover.
Further refinements to this data model are uncertain, as it has now achieved the level of stability and functionality required for its serious usage.