Parallax View ®

Tuesday, 9 May 2017

Interactive Trees in Blogs Using SVG

OK, something here for you all to play around with. This post demonstrates a proof-of-concept (POC) development showing how interactive family trees can be embedded within blog-posts, or within arbitrary Web pages. But you’ll have to wait until the bottom of this post before you can play with these toys.

Firstly, why would anyone want to do this? Well, the writers of genealogical blogs are generally given scant tools to do their job well; there should be no reason at all why a blog-post could not be as well-presented as an article in a journal, including the use of tables and endnotes, but the available tools are overly simplistic.

This POC is looking at the other end of the spectrum: highly specialised tools for genealogical writers. Including segments of family trees into a blog-post is wonderful to help show the relationships between the people it mentions. But more than this, if the segments are interactive (with clickable actions on the nodes) then they can be used for navigation and a host of other functions.

Until now, I have resorted to drawing my own tree segments in something like Word (using its ‘Shapes’ canvas) and then copying the results to my blogs as static images. Apart from being laborious, the end results were never as clear as I would like, and would go all fuzzy if I tried to expand them. The problem is that normal raster graphics involve fixed grids of pixels (with just one ideal resolution) and often employ “lossy” compression techniques to keep the size down. The technology that I’ve used for this POC is known as Scalable Vector Graphics (SVG), which gives crystal-clear images at all resolutions.

SVG can be embedded within HTML, but I know what you’re thinking: I’m not a programmer and I don’t want to learn how to use this technology or HTML. That’s fair enough, and it would be difficult to generate this stuff by hand anyway. What I’ve done is devise a simple notation that describes the people, relationships, and associated notes. I run this through a piece of code-generating software and out pops the relevant combination of SVG, HTML, CSS, and JavaScript.
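To give a feel for what such a generator does, here is a minimal Python sketch. It is not the actual tool, and the input format (a list of people plus parent-to-child links) is only a stand-in for the real notation, which isn't shown in this post. The `showDetails` click handler is likewise a hypothetical JavaScript function that the host page would have to supply.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input: (id, display name) pairs and (parent id, child id) links.
# This is NOT the author's notation, only a stand-in for illustration.
PEOPLE = [("p1", "William Ashbee"), ("p2", "Mary Phyllis Ashbee")]
LINKS = [("p1", "p2")]

BOX_W, BOX_H, GAP = 180, 30, 40  # box size and vertical gap, in SVG user units


def tree_to_svg(people, links):
    """Return an SVG string with one clickable box per person, stacked vertically."""
    # Top-left corner of each person's box, keyed by person id.
    pos = {pid: (10, 10 + i * (BOX_H + GAP)) for i, (pid, _) in enumerate(people)}
    height = 20 + len(people) * BOX_H + max(len(people) - 1, 0) * GAP
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {BOX_W + 20} {height}">'
    ]
    # Draw connecting lines first so that the boxes are painted on top of them.
    for parent, child in links:
        (px, py), (cx, cy) = pos[parent], pos[child]
        parts.append(
            f'<line x1="{px + BOX_W // 2}" y1="{py + BOX_H}" '
            f'x2="{cx + BOX_W // 2}" y2="{cy}" stroke="black"/>'
        )
    for pid, name in people:
        x, y = pos[pid]
        # showDetails() is a hypothetical page-level JavaScript handler.
        parts.append(
            f'<g id={quoteattr(pid)} onclick="showDetails(this.id)">'
            f'<rect x="{x}" y="{y}" width="{BOX_W}" height="{BOX_H}" '
            f'fill="white" stroke="black"/>'
            f'<text x="{x + 8}" y="{y + 20}">{escape(name)}</text></g>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = tree_to_svg(PEOPLE, LINKS)
```

Because the output is plain text with a `viewBox` rather than a fixed pixel grid, the same markup stays sharp at any size, and it can be pasted directly into the HTML of a post.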

So why don’t I generate this directly from a GEDCOM file? Well, this is primarily a POC, although I will be using it for serious purposes myself. The POC is not far removed from an installable product, but I don’t use GEDCOM and so I’ll leave that step to someone else, and similarly with taking the data from an online family tree using an API. Alternatively, trying to directly embed parts of a live online tree within a blog-post, or other Web page, using an HTML <iframe> element, would have a couple of big disadvantages: (1) search engines, such as Google, would not find the associated data in your blog-post, and (2) you would have no control over which parts to trim off.

The notation I have devised does have several configuration settings allowing selected branches to be trimmed (but still indicated), horizontal or vertical tree orientations to be selected, and actionable events for when clicking. It has built-in support for multiple spouses and dubious parentage of children (connecting to parents with a different line style).

Compared with wiki approaches to narrative, such as WeRelate, blogs are less suited to collaboration, and they generally have much less real-estate (available screen area) to play with. Although the SVG output from my code generator could be used in both blogs and wikis, it may rely on scroll bars in cases of limited space, such as blogs and mobile devices. In the blog case, though, it is easy to jump from the embedded SVG to a full-width copy in a page of its own; all it needs is a separate site to host the file (see below).

Example

OK, now for the example. The following tree represents part of my Ashbee line, and involves several people I’ve mentioned in previous blog-posts. The green circles represent families (in this context: marriage and/or children), and both the person nodes and the family circles can be clicked to present associated details in separate panels (using Ctrl+Click or Shift+Click on the nodes/circles will close those details, if necessary). These details can include anything you like, and I’ve used them here to include links to my relevant blog posts (i.e. using the tree as a navigational tool). For instance, check the details on the male line from the topmost William Ashbee to Mary Phyllis Ashbee. The dashed lines represent trimmed parts of the tree where further children or siblings have not been shown.

There’s a button to switch between the horizontal and vertical orientations (either of which may use scroll bars) and another button to expand the current orientation into a browser page all of its own. I’ve used neocities to host the HTML files for these full-width versions, mainly because it’s free and it was staggeringly simple to use for this purpose.

The tool to generate these trees is now freely shared with the genealogy blogging community. Since the time of writing, it has undergone many improvements, including thumbnail images and searchable photos. See SVG-FTG Summary.

Wednesday, 3 May 2017

The Jester and the Conjuror

Probably the most profound thing I've written to date, and I cannot express it accurately any other way. The jester and the conjuror are performers in a dialogue poem — tarot counterparts being the fool and the magician — representing mankind’s naïve questions and some authority dismissing those questions until the jester realises that he’s asking the wrong ones, at which point the conjuror acknowledges a kindred connection.

What are we but weak warm flesh, and blood in its hold;

ashes and dust in a tick, or the blink of a soul.

Foolish son of Adam —

Your fickle dust is shared. You are not your physicality:

mere silhouettes dancing on the face of reality.

Mountain, sea, and tempest share not your dreams of ebb and flow.

For them no time, and beauty none in song, love, or rainbow.

Are we lost in the vastness of infinity,

forgotten in the silence of eternity?

Foolish son of man —

No rule nor law can lay siege to such a far-distant wall.

Beyond number only are falsehoods by which you trip and fall.

Though celerity and brilliance be weighed and measured,

missed is the glister of your gift, and of life so treasured.

Why am I given free will to steer the fateless,

but so little time to illume the fathomless?

Foolish son of Jof —

Your will is held fast in its gyves by time’s pattern aslope,

bound between the weight of memory and the wings of hope.

Causation is the illusion that affords you your thought,

but seeking root and reason by chasing change will yield nought.

So all the myriad waning moons and mourning suns

witness not the passing lives, nor of what becomes?

Learned son reflected —

Now is the seat and palpable throne of the conscious mind,

extending dominion over qualia and their kind.

Unbowed by the measurable world, unconquered by rune,

for this is your fate; this is your legacy; this is You!

Dedicated to my late father

#Poetry #Science #Reality

Sunday, 23 April 2017

Transcription Boundaries

We all know about transcription, right … or do we? What are the ultimate goals? What are the limits, and are they inherent ones or self-imposed ones? I’m taking this opportunity to expand on some important transcription breakthroughs in the recent STEMMA V4.1 release.

Most people would begin by transcribing textual sources paragraph-by-paragraph, or sometimes line-by-line, dependent upon the actual source. It would quickly become apparent, though, that various scenarios cannot be transcribed directly as literatim text, such as uncertain characters or words, crossed-out text, text inserted or changed, and marginal annotation. What those people then have to do is decide on some form of mark-up to represent those scenarios (see Power of Annotation), but which one?

There are many schemes, ranging from old-style manuscript mark-up[1], through simple ASCII-character mark-up, to full-blown mark-up languages such as TEI (Text Encoding Initiative). This latter technology, for instance, can represent semi-diplomatic or full diplomatic transcription of textual sources in digital form. Diplomatic transcription might be valuable for preservation, but is that what we need for analysis?

Typescript Sources

This should be the easiest of the cases: given a page of typed text, we might employ OCR to automate the conversion to a digital form. This is all very well if the text is perfectly readable, but barely-readable sections, or additional hand-written annotation, would require a mark-up scheme.

And yet there are some subtle, but profoundly important, situations that rarely get mentioned. The presence of different fonts or typefaces in a printed electronic document would be taken for granted as indicating some semantic difference (e.g. a heading, abstract, or a footnote), but what about documents produced on an old-style typewriter? The presence of different typefaces might then indicate that a document was written on different machines at different times. Similarly with the alignment of the lines, or the marginal indent. But how do we indicate that in the digital form?

Suppose that there was a difference in the sophistication of the grammar in different sections, one that might provide a vital clue to different authors. How would that be represented?

A more important question is who would be the beneficiary of those indications? Schemes concerned with preservation will employ software taxonomies to categorise every eventuality, but those subtleties (which could be crucial to the analysis and interpretation of a document by a researcher) would almost certainly be excluded as unimportant in the digital representation.

Manuscript Sources

When transcribing manuscript documents, the points I’ve just raised become much more prominent. Contributions from different authors are generally more obvious because of their handwriting styles, and these obviously need to be distinguished in order to support any analysis, but what about stylistic variations?

Suppose that someone had underlined a word. That would clearly be an indication of emphasis, and the transcriber might represent it using some mark-up language (e.g. <u>word</u>) or some lightweight mark-up language (e.g. __word__), but what if a different word was underlined twice, or more times? This question also applies to text that has been struck out. My point is that this is an important piece of information to capture, but how much more is required for analysis than for preservation?

As another example, consider if the author had used different coloured inks. James Joyce and Virginia Woolf both used different coloured pens or crayons in their work. Should a mark-up scheme have taxonomies for the basic colours, or all possible shades and hues? Character size and intensity (e.g. from a firm hand) can also be indicative of something. Who would benefit, though, from knowing that one paragraph was in dark green and another in light green: the software or the researcher? Is there a practical limit to the number of important variations that software taxonomies can distinguish, and if so then why do we insist on that route?

Audio Sources

Schemes that deal with audio transcription are generally specialist, and distinct from those related to textual transcription. The main reason is that those stylistic variations multiply exponentially. Not only do the transcriptions have to distinguish between contributions from different speakers, but they also need to indicate such things as speaking quickly/slowly, loudly/softly/whispered, singing, false accents, mimicry, and even different intonation. Schemes for audio transcription try to define taxonomies for these cases — although there will always be cases that aren’t covered — and the area of intonation is treated in a very formal way by linguistic analysis.

There may be cases of unknown words, slang, or strange pronunciations, each of which may need clarifying annotation.

While it is clear that the field is complex, I want to make an argument that there is a broad categorisation of the scenarios that has parallels in textual transcription, and that a single approach can deal with all three transcription source types. First, let’s look at some further complexities for audio.

There may be utterances or sounds from a given contributor that cannot be transcribed directly as text. For instance, a sneeze, cough, sniff, yawn, whistle, laugh, or swallow.

There may be a significant pause in someone’s speech that is important in the context of their words.

There may be any number of gestures or items of non-verbal communication that are equally important to capture within the transcript. For instance, a nod, smile, head-shake, squint, frown, or applause.

There may be instances where different voices — each of which is being transcribed — are overlapping each other, or where there is some untranscribed background contribution.

Conclusion

We can group all the above scenarios into the following broad categories:

1. Language from different contributors. Distinguishing different hands, voices, etc.
2. Stylistic differences from any particular contributor. Different emphasis, emotional delivery, typeface, handwriting, etc.
3. Annotation where explanation or clarification is needed. Examples are unusual words, unknown words, slang, or local pronunciations.
4. Contributions that cannot be transcribed directly or wholly as text. This includes changes, marginal notes, noises, gestures, and pauses.
5. Parallel contributions. This category is specifically related to audio.

STEMMA’s transcription support is designed to make material searchable, but also to support deep analysis. Some of these categories were already catered for in the cases of textual transcription, but supplementing them to cater for the remaining categories implicitly addressed audio transcription too. For instance, the <Alt> and <NoteRef> elements already catered for category #3 and needed no changes. The <Anom> element already represented textual anomalies, and so was extended to address the other anomalies in category #4.

The way that <Anom> was extended set the scene for the other extensions I will describe in a moment. Its existing taxonomy (see the http://stemma.parallaxview.co/anomaly-mode/ namespace) was given extra items of Gesture, Noise, and Pause. Within these, though, the specific gestures and noises are described using text, by and for the researcher, and not by using some limitless software taxonomy.

The STEMMA transcription elements <ts> (typescript sources) and <ms> (manuscript sources) were supplemented by <voice> (audio sources), and each was enhanced to cope with categories #1 and #2. They were extended with new attributes of ‘id’ and ‘scheme’. For instance:

<ms id='id' scheme='scheme'>An example sentence</ms>

What these attributes do is attach a key representing the contributor (e.g. a hand, or a voice) and a key representing a specific stylistic variation of that contributor. There are no taxonomies used here since the differentiation and description may be subjective; the differentiation is designed to support analysis, not simply rendition; and there need be no constraints.
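As an illustration of how such keys might be used during analysis, the following Python sketch collects the transcribed runs belonging to each contributor/style combination. The `<Transcript>` wrapper element and the key values (`hand-A`, `pencil`, and so on) are invented for the example; only the `<ts>`, `<ms>`, and `<voice>` elements and their ‘id’ and ‘scheme’ attributes come from the description above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: the <Transcript> wrapper and the key values are invented
# for illustration; only <ms> with 'id' and 'scheme' comes from the post.
SAMPLE = """
<Transcript>
  <ms id='hand-A'>Dear John,</ms>
  <ms id='hand-A' scheme='underlined'>do not</ms>
  <ms id='hand-A'>sell the farm.</ms>
  <ms id='hand-B' scheme='pencil'>Received 3 May</ms>
</Transcript>
"""


def group_by_contributor(xml_text):
    """Map (id, scheme) keys to the list of text runs carrying that key."""
    groups = defaultdict(list)
    for el in ET.fromstring(xml_text).iter():
        if el.tag in ("ts", "ms", "voice"):
            key = (el.get("id"), el.get("scheme"))  # a missing attribute gives None
            groups[key].append("".join(el.itertext()).strip())
    return dict(groups)


runs = group_by_contributor(SAMPLE)
```

The keys carry no meaning for the software. What `hand-B` or `pencil` actually signifies would be explained by the researcher in accompanying narrative.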

The last category (#5) is addressed by specific variations of the <voice> element that allow it to be used as a container for multiple contributions.

A small example of an audio transcription employing these features may be found at Dialogue Transcription. The <ts>, <ms>, and <voice> elements are documented at Descriptive Mark-up.

The rationale behind this approach is actually quite a well-known one, although not in this field. In the area of Web mark-up, HTML5 tries to separate structure and content from presentation, the latter being left to something like CSS. For the formatting of Web pages, this avoids cluttering the mark-up describing the structure and content of page information, and ensures a consistent presentational style is applied across the pages. For transcription, it avoids cluttering the mark-up describing the structure and content from various contributors, while leaving complete freedom to the researcher to describe these in narrative as part of their analysis process.
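The same separation can be sketched in a few lines of Python. The sketch assumes a hypothetical rendering step that turns a transcription element into an HTML `<span>`, with the ‘scheme’ key becoming a CSS class; the class-naming convention and the stylesheet rule are invented for illustration and are not part of STEMMA.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical presentation layer: the class names and colours are invented.
# The transcription itself only carries the 'scheme' key, never the styling.
STYLESHEET = ".scheme-bold-blue { color: teal; font-weight: bold; }"


def to_html(fragment):
    """Render one <ts>/<ms>/<voice> element as an HTML span with CSS classes."""
    el = ET.fromstring(fragment)
    classes = [el.tag]
    scheme = el.get("scheme")
    if scheme:
        # Keep only characters that are safe in a CSS class name.
        classes.append("scheme-" + re.sub(r"[^A-Za-z0-9_-]", "-", scheme))
    text = escape("".join(el.itertext()))
    return f'<span class="{" ".join(classes)}">{text}</span>'


rendered = to_html("<ms scheme='bold-blue'>This section is now out-of-date</ms>")
```

How a given scheme looks on screen is then decided entirely in the stylesheet, and can be changed without touching the transcription.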

[1] Rarely usable in a computer-based transcription because the old symbol set does not correspond with available symbols in an electronic document.

Wednesday, 19 April 2017

STEMMA V4.1

An original goal of STEMMA was to be able to represent rich-text narrative that could be used for authored works, including essays, memories, and reports. In addition, it aimed to support transcription, including transcribed extracts, which has quite specific requirements of its own.

STEMMA V4.1 has concentrated on its mark-up in these areas and has solved a number of long-standing issues with some novel approaches. Such was the success of the approach to textual transcription that this version also addresses audio transcription as a companion to it. I know of no other system that addresses both of these in a consistent manner, and certainly not when including rich-text authored work and semantic mark-up.

Overview

A goal of HTML5 was to separate structure and content from presentation in Web pages. STEMMA has applied a similar principle to its descriptive mark-up for both authored work and transcription.

STEMMA is not a presentation format. It therefore concerns itself with narrative structure, content, and semantics, but not the finer details of the presentation such as colours and fonts. STEMMA narrative may be transformed into any number of presentational formats for visualisation (e.g. HTML+CSS), and it is in these formats that such things would be configured, including page size, style galleries, choice of footnote/endnote/tablenote indicators, heading and cell formatting in tables, caption position, paragraph separation, styles for semantic elements, and so on.

Unusually, it has also applied this principle to both textual and audio transcription. Identifying the structure and content is more important than the finer details of their style and presentation, and the interpretation of any stylistic differences requires analysis rather than it simply being a display matter. For instance, marking where a manuscript used different colours in different places is more important than the specific colours and shades; that level of detail can be written in narrative for the reviewer rather than trying to use some limitless taxonomy for the software. Similarly with different written styles, which may or may not have been evidence of multiple authors. In a typescript document, it would equally apply to different fonts, font-sizes, ink intensity, marginal alignment, or even usage of grammar; all of these could have a bearing on the analysis of that document.

In audio transcription, this approach simplifies a complex area by giving freedom to the transcriber to detail the different voices, intonations, noises, and gestures.

Authored Work

The functionality of STEMMA’s descriptive mark-up has now evolved to the level where I can automatically generate blog-posts for research articles directly from my internal representation.

In order to demonstrate the new version, I have used the recent 5000-word article entitled Jesson Lesson to generate a fully-worked STEMMA example, available at JessonLesson.xml. This genuine research article included precise layout, transcribed extracts, tabulation, endnotes and tablenotes, and hyperlinked images. Its 47 endnotes included examples of reference-note citations, discursive notes, analytical commentary, and multi-source references — the handling of which was outlined previously at Cite Seeing — but also included examples of conflated citations where details of multiple people are placed in a single note for readability.

It was always a personal goal to produce better quality research articles, and so force STEMMA to address real-world scenarios rather than “desktop scenarios”. As a result of this, STEMMA’s general approach to citations has shifted slightly. Although support of citation-elements — implemented using its Parameter mechanism — has been enhanced, the focus of the computer-readable form is now on correlation and interrogation rather than mere formatting. The number of real-world cases (see list under Citations) is just too great for authored works to delegate formatting entirely to software that acts blindly from mere values. This version, therefore, finds a bridge between preferred hand-crafted forms and computer-readable citation-elements.

Another area that has been enhanced greatly is tables, which now support control over table width, column widths and alignment, captions, and tablenotes (i.e. citations deposited at the foot of a table).

Textual Transcription

The existing <ts> element, used to mark text transcribed from a typescript document, and the <ms> element, used to mark text transcribed from a manuscript document, both have new ‘id’ and ‘scheme’ attributes. These label the respective contributions with user-defined tags — ‘id’ for distinct contributions, such as different authors, and ‘scheme’ for stylistic variations — that can be described separately for the benefit of the reviewer.

For instance:

<NoteRef><Text>
bold-blue – text was written with a broad-tipped turquoise felt marker.
</Text></NoteRef>

<ms scheme='bold-blue'>This section is now out-of-date and is being reworked</ms>

The elements <page>, <col>, , <line>, and <posn> now take SVG-like image coordinates (percentage displacement from top-left image corner) for linking transcription elements to a copy of the original document. One use of this is to support parallel scrolling of image and transcription for the end-user.

The associated image is specified by a preceding <ResourceRef> element identifying a Resource entity using the mode ‘SynchImage’.
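To show why percentage-based coordinates are convenient for this, here is a small Python sketch that converts them to pixel offsets for whatever size the image happens to be displayed at, and derives a scroll position from a line’s vertical coordinate. The function names are my own for this illustration, and the sketch says nothing about how the coordinates are actually written on the STEMMA elements.

```python
def percent_to_pixels(x_pct, y_pct, image_width, image_height):
    """Convert percentage displacements from the top-left corner into pixel offsets."""
    if not (0 <= x_pct <= 100 and 0 <= y_pct <= 100):
        raise ValueError("percentages must lie between 0 and 100")
    return (round(image_width * x_pct / 100), round(image_height * y_pct / 100))


def scroll_offset(y_pct, image_height, viewport_height):
    """Scroll position that brings a transcribed line to the top of the viewport."""
    _, y_px = percent_to_pixels(0, y_pct, image_width=0, image_height=image_height)
    # Clamp so that the viewer never scrolls past the bottom of the image.
    return max(0, min(y_px, image_height - viewport_height))
```

Since the coordinates are relative, the same transcription stays aligned with the image whether it is shown as a thumbnail or at full resolution.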

Audio Transcription

For audio transcription, the <voice> element provides the analogy to <ts>/<ms>, and it similarly takes ‘id’ and ‘scheme’ attributes. This allows different vocal (or other audio) contributions to be distinguished, and also their intonation, emotional delivery, artificial accents, etc.

Additional features are supported in a way analogous to textual transcription:

Anomalous contributions from an individual that cannot be represented as text, including noises, pauses, and gestures – see <Anom>
Alternative word meanings, clarifications, or other notes – <Alt> and <NoteRef>, exactly as with textual transcription
Time synchronisation – time-stamping with <time>. This is analogous to the <posn> element, and other x/y coordinates, used for textual transcription.

For time-stamping, the associated recording is specified by a preceding <ResourceRef> element identifying a Resource entity using the mode ‘SynchAudio’, analogous to ‘SynchImage’ for images.

As well as marking distinct voices, these features include the ability to mark overlapping contributions and background contributions. An example demonstrating many of these features may be found at Dialogue Transcription.
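Time synchronisation can be pictured in the same way as the image coordinates used for textual transcription. The Python sketch below assumes that the time-stamps have already been extracted into a list of (start time in seconds, text) pairs; that representation and the sample values are invented for illustration, and the format used by `<time>` itself is not covered here.

```python
from bisect import bisect_right

# Hypothetical data: (start time in seconds, transcribed text) pairs, as might be
# collected from time-stamps in a transcript. Values are invented.
SEGMENTS = [(0.0, "Hello."), (2.5, "How are you?"), (6.0, "[laughs]")]


def segment_at(segments, seconds):
    """Return the text of the segment being spoken at the given playback time."""
    starts = [start for start, _ in segments]  # must already be in ascending order
    index = bisect_right(starts, seconds) - 1
    return segments[index][1] if index >= 0 else None
```

A player could call something like `segment_at` as the recording advances in order to highlight the matching part of the transcript.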

Change Details

Specific changes to the data model include the following:

‘WhereIn’ attribute added to Citation Parameter definitions. This finally provides the missing criteria necessary for the automatic generation of shortened subsequent reference-note citations. ‘Subst’ attribute added to Citation Parameter values in order to override formatting, or provide a substitution for cases of a value being unavailable.
<ParentCitationLnk> now allowed in both <CitationLnk> and <CitationRef> elements in order to create transient chained citations.
Quality element, within Source entity, moved inside the Frame element.
Review of entries in citation-layer-type namespace.
DataControl element of Resource entity supports attribution text.
Control of table widths, and individual column widths and alignments.
Ability to align images when embedded within narrative.
Ability to hyperlink images embedded in narrative.
Requirement for enclosing Narrative element dropped for Text elements, except for top-level Narrative entities. Text elements can now be nested.
<cb> replaced with <col>, and relationship between paragraphs and columns now reversed (paragraphs now within columns).
ResourceRef Mode=SynchImage allows synchronisation between images and transcriptions.
Corresponding SVG-x/y coordinates added to elements <page>, <col>, , and <line>. Additional <posn> element defined to associate coordinates with arbitrary text locations.
<Page>/<Line> renamed to <page>/<line> and moved alongside /<col> as related to structure and content rather than semantics.
Mode=Tablenote attribute supplementing Footnote and Endnote in various places.
Text-element Header=boolean attribute replaced with Class=Header | H1 | H2 | H3 | Caption | Footnote | Endnote | Legend | Tablenote.
Text-element Class=Caption attribute used in Resource/ResourceRef and tables for generating captions.
Text-element Class=Footnote | Endnote | Tablenote attribute used in CitationRef to allow pre-formed (preferred) citations.
Deprecated the <Text> attributes Abstract=boolean, Extract=boolean, Manuscript=boolean, and Transcript=boolean.
<voice> mark-up added to supplement existing <ts>/<ms> mark-up. <ts>/<ms>/<voice> all enhanced to cope with different hands, voices, fonts, colours, etc.
In transcripts of audio recordings, support for multiple voices, overlapping dialogue, intonation, gestures, noises, pauses, timestamps, etc.
ResourceRef Mode=SynchAudio allows synchronisation between audio recordings and transcriptions, analogous to SynchImage for textual transcription (above).
Complete revision of Mode values for CitationRef element.
Relaxation of Date Parameters in order to cover the full range of calendars. One requirement was to represent the date-of-issue for newspaper sources that predated the Julian-to-Gregorian changeover.

Further refinements to this data model are uncertain as it has now achieved the level of stability and functionality that was required for its serious usage.