In A
Calendar for Your Date — Part I, I gave a brief tour of the variations in
current and historical calendar systems. I now want to approach the question of
how we should represent dates that are not expressed according to our Gregorian
calendar.
The different calendar systems may be categorised as
follows:[1]
- Empirical. The start of the months or years is determined by direct
observation, and intercalary days or months are inserted on an ad hoc basis.
- Calculated. These are rule-based and so are predictable. Lunisolar
and Solar calendars may be astronomical rather than arithmetic in that the
start of months and years may be determined through astronomical calculation
rather than purely by using a fixed rule. Calendars with “wandering years”,
such as the Egyptian civil calendar and the Mayan calendar, have a simple fixed
number of days per year.
Conversion between calculated calendars can be approached
algorithmically, if enough information is known, but empirical calendars
require the use of tables, and it is rarely the case that enough historical
information survives to make this an accurate process.
In general, when we see an historical date, we cannot always
convert it directly and unambiguously to our modern Gregorian calendar; it
requires some interpretation, and that in turn requires information about the
actual calendar variant that was being used, the social group involved, their
political and religious leanings, the weather, and maybe even the geographical
coordinates.
As a modern-day analogy of this problem, consider if I’d
written the date 9/6/2015. Now did I mean June 9th or September 6th?
If the date had been written in the US then you might believe that the latter
alternative is more likely. However, some knowledge of the author, and the fact
he has worked in the US but that he is English by birth, might suggest the
former alternative is more likely. Hopefully, you see the problem: the
converted value cannot always be faithful to the original source information,
and it might require an update based on the analysis of new information, or the
availability of a revised algorithm or tables.
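The ambiguity of 9/6/2015 can be made concrete with a small Python sketch. It simply tries both readings and keeps every one that forms a valid date; the function name `candidate_readings` is my own invention for illustration. Choosing between the surviving readings still needs the contextual knowledge described above.

```python
from datetime import datetime


def candidate_readings(text: str) -> dict:
    """Return every valid ISO 8601 reading of a numeric d/m/y or m/d/y date."""
    readings = {}
    for label, fmt in (("day-first", "%d/%m/%Y"), ("month-first", "%m/%d/%Y")):
        try:
            readings[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            pass  # this reading is impossible (e.g. a month number above 12)
    return readings


print(candidate_readings("9/6/2015"))    # two readings survive
print(candidate_readings("25/12/2015"))  # only the day-first reading survives
```

Software can narrow the field when one reading is impossible, but it cannot pick between two valid readings without knowing something about the author.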
A number of resources exist for calendar conversions — both
algorithms and documentation — although some seem to have disappeared since I
first saw them:
| URL | Description | Status | Notes |
|---|---|---|---|
| | Converter for historical calendars | | |
| | Calendars and Their History | Currently inaccessible | See "inexact" in sec 1.3 observational calendars |
| | Indian Calendrical Calculations | | |
| | Convert a Date | | |
| | Pancanga (version 3.14) | | |
These resources involve specific calendar variants that
would have to be known in advance, and at least one acknowledges the inexact
nature of general calculations. In effect, a calculated date can never be more
accurate than the original, and there will be many cases where it will be less
accurate.
What I’m suggesting is that wholesale conversion of
historical dates to the Gregorian calendar is a very bad approach for genealogy,
and for historians in general. Obviously we need the ability to put dates from
different calendars on the same timeline, but that does not mean discarding the
original information in favour of a calculated alternative; a process which may
involve an increased degree of uncertainty or imprecision (as differentiated in
Warm
Fuzzy Dates), as well as some loss of evidence. Furthermore, if software is
going to collate these dates then it needs a representation that it can
understand, and we cannot make do with just the written evidential form and the
calculated variant.
Let me explain what this last sentence means by using a
small example. STEMMA applies a bilateral approach to all data items
assimilated into a computer-readable form, as explained in Returning
to Normalised Names and Dates and in Is That
a Fact?. What this means is that it holds a transcript of the original
evidential form and a separate computer-readable (normalised) form. A
computer-readable date would be used for sorting and collation, but also for
generating a display form — say for a report or a chart — according to the
regional settings and personal preferences of the current end-user. For
instance:
Evidential form: 25 Dec 56
Normalised form: 1856-12-25
Display form: 25th December 1856
There are a couple of points to note in this simple Gregorian
example. Firstly, those software developers who believe that it is possible to
automatically convert written (Gregorian-)dates to a normalised form (ISO 8601
in this case) would probably have interpreted this date as 1956 rather than
1856, thus emphasising the importance of the context of the information. I
could have used an evidential form such as “Christmas 56” to hammer that home
but I wanted to give a sense of its subtlety. Similarly, an evidential form of
“my birthday” is also referencing a date, but whose birthday, and in which year
— all of which is contextual information that a researcher would use to apply a
conversion. My second point is that the display form is generally (in modern
software) produced according to a “short”, “medium”, “long”, or “full” request,
and that request would examine the end-user’s settings in order to generate a
consistent representation for readability. This is an approach that could be
applied to all calendars, in principle.
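The two points above can be sketched in Python. This is only an illustration under my own assumptions, not STEMMA's actual data model: the class `RecordedDate`, the helper names, and the two display styles are all hypothetical. The `naive_year` function applies the POSIX pivot rule (00 to 68 become 20xx, 69 to 99 become 19xx), which is one common context-free guess; whatever pivot is chosen, it cannot recover 1856 without the researcher's knowledge of the source.

```python
from dataclasses import dataclass
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


@dataclass(frozen=True)
class RecordedDate:
    evidential: str  # transcript of the source, never altered
    normalised: str  # ISO 8601 value decided by the researcher


def naive_year(two_digit: int) -> int:
    """Context-free guess using the POSIX pivot rule for two-digit years."""
    return 2000 + two_digit if two_digit < 69 else 1900 + two_digit


def ordinal(day: int) -> str:
    """Return the day number with its English ordinal suffix."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def display(rec: RecordedDate, style: str = "long") -> str:
    """Generate a display form from the normalised form, never from the transcript."""
    d = date.fromisoformat(rec.normalised)
    if style == "short":
        return f"{d.day:02d}/{d.month:02d}/{d.year}"
    return f"{ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"


rec = RecordedDate(evidential="25 Dec 56", normalised="1856-12-25")
print(naive_year(56))         # the automatic guess lands in the wrong century
print(display(rec, "long"))   # 25th December 1856
print(display(rec, "short"))  # 25/12/1856
```

In real software the style and the day/month order would come from the end-user's regional settings rather than being hard-coded as they are here.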
In the case of a calendar conversion, a missing item would
be a normalised value in the alternative calendar; one that must be flagged as
being “calculated” to avoid ambiguity. The next example includes a date from
the French Republican calendar converted to the Gregorian calendar:
Evidential form: 18 Brum an VIII
Normalised form: #FR#08-02-18
Display form: 18 Brumaire An 8
Normalised form: 1799-11-09 (Calc)
Display form: 9th November 1799 (Calc)
The normalised form is the STEMMA one since there is no standard
that I am aware of. The associated display form uses Arabic year numbers rather
than Roman numerals. Coinage of the time often used these rather than the Roman
numerals used elsewhere, but it would obviously be a display setting. The two
extra fields show the equivalent normalised date after conversion to the
Gregorian calendar, and its associated display form. The two normalised forms
are therefore distinct in that one is a direct implementation of the evidential
form whereas the other is a derivation. The second should therefore be flagged
as a calculated datum using something akin to the GEDCOM CAL flag (see
DATE_APPROXIMATED in the specification). STEMMA would allow the two forms to be
bound using the DATE_ENTITY
that’s also used for synchronised dates
(i.e. its generalised form of dual dates).
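For the curious, the calculated Gregorian value in that example can be reproduced with a short Python sketch. Several things here are my assumptions rather than anything defined by STEMMA: the `#FR#yy-mm-dd` layout is inferred purely from the single example above, month 13 is my own placeholder for the complementary days, and the function names are hypothetical. The arithmetic relies on the calendar's epoch of 22 September 1792 and on years III, VII and XI being the ones with a sixth complementary day, so it is only meaningful for years I to XIV.

```python
from datetime import date, timedelta

EPOCH = date(1792, 9, 22)   # 1 Vendémiaire An I
SEXTILE_YEARS = {3, 7, 11}  # years that had a sixth complementary day


def parse_fr(normalised: str) -> tuple:
    """Split an assumed '#FR#yy-mm-dd' value into (year, month, day)."""
    prefix = "#FR#"
    if not normalised.startswith(prefix):
        raise ValueError(f"not a French Republican value: {normalised!r}")
    year, month, day = (int(part) for part in normalised[len(prefix):].split("-"))
    return year, month, day


def fr_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a French Republican date (years I to XIV) to a Gregorian date."""
    if not 1 <= year <= 14:
        raise ValueError("calendar was only in use for years I to XIV")
    if 1 <= month <= 12:
        max_day = 30
    elif month == 13:  # assumed encoding for the complementary days
        max_day = 6 if year in SEXTILE_YEARS else 5
    else:
        raise ValueError("month must be 1 to 13")
    if not 1 <= day <= max_day:
        raise ValueError("day out of range for that month")
    days_before_year = sum(366 if y in SEXTILE_YEARS else 365
                           for y in range(1, year))
    offset = days_before_year + (month - 1) * 30 + (day - 1)
    return EPOCH + timedelta(days=offset)


calc = fr_to_gregorian(*parse_fr("#FR#08-02-18"))
print(f"{calc.isoformat()} (Calc)")  # 1799-11-09 (Calc)
```

Note that the result is printed with its “(Calc)” flag: it is a derivation, and it should travel alongside the original normalised form rather than replace it.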
Unfortunately, there are no data standards to accommodate
the normalised representation of dates in other calendars. All we have is the
ISO 8601 date standard[2],
which is specific to the Gregorian calendar and largely the result of an
amalgamation of previous standards. Much of its content gets ignored in favour
of the pure representation of a Gregorian date and/or time, and that includes
ranges, ordinal dates, etc. A critique of that standard may be found at: Is
the ISO Date Standard Bad?.
GEDCOM 5.5 includes a small set of “date escapes”[3]
that can prefix a date value in order to address different calendars:
@#DGREGORIAN@ — Gregorian calendar
@#DJULIAN@ — Julian calendar
@#DHEBREW@ — Jewish calendar
@#DFRENCH R@ — French Republican calendar
@#DROMAN@ — for future definition
@#DUNKNOWN@ — for unknown calendars
This sounds
like a step in the right direction although the specification offers little
help on the encoding of year numbers or month names for the non-Gregorian
calendars. It does acknowledge the ambiguity of using words rather than numbers
via the statement: “No future calendar types will use words (e.g. month names)
from this list: FROM, TO, BEF, AFT, BET, AND, ABT, EST, CAL, or INT”.
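To show how little is involved in recognising these escapes, here is a minimal Python sketch that separates the calendar name from the rest of a DATE value. The function name `split_date_escape` is hypothetical, and I am assuming that a value without an escape should be treated as Gregorian. Interpreting the remainder (month names, year numbers) is the part that the specification leaves under-defined, and this sketch does not attempt it.

```python
import re

# Matches a leading escape such as "@#DJULIAN@" or "@#DFRENCH R@".
ESCAPE = re.compile(r"^@#D([A-Z ]+)@\s*(.*)$")


def split_date_escape(value: str) -> tuple:
    """Return (calendar name, remaining date text) for a GEDCOM DATE value."""
    text = value.strip()
    match = ESCAPE.match(text)
    if match:
        return match.group(1), match.group(2)
    return "GREGORIAN", text  # assumption: no escape means Gregorian


print(split_date_escape("@#DJULIAN@ 1 MAR 1740"))  # ('JULIAN', '1 MAR 1740')
print(split_date_escape("25 DEC 1856"))            # ('GREGORIAN', '25 DEC 1856')
```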
In February 2015, Bob Coret analysed the usage of this calendar feature in a
sample of 82.9 million DATE lines from about 7000 GEDCOM files.[4] He
reported the following very low permille (i.e. tenths of a percent) usage, with
all other escapes at zero:
@#DJULIAN@ 0.123 ‰
@#DHEBREW@ 0.013 ‰
@#DFRENCH R@ 0.006 ‰
Clearly this feature is very underutilised, but what is the
reason? Is it that few people have dates in alternative calendars, or that they
only store the Gregorian equivalents, or that their software does not support
this feature?
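To put those permille figures into perspective, the following Python lines turn them back into approximate line counts. Both inputs are rounded values quoted from the analysis, so the results are rough estimates of my own and not figures reported by Bob Coret.

```python
TOTAL_DATE_LINES = 82_900_000  # rounded sample size quoted above

USAGE_PERMILLE = {"JULIAN": 0.123, "HEBREW": 0.013, "FRENCH R": 0.006}


def approx_count(rate_permille: float, total: int = TOTAL_DATE_LINES) -> int:
    """Convert a permille rate into an approximate number of DATE lines."""
    return round(total * rate_permille / 1000)


for calendar, rate in USAGE_PERMILLE.items():
    print(f"{calendar}: roughly {approx_count(rate):,} lines")
```

That works out at roughly ten thousand Julian dates, about a thousand Hebrew ones, and around five hundred French Republican ones across the whole sample.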
Family Historian uses a “[J]” prefix for entering dates in the Julian
calendar, and this has also become a display option in some other products
(e.g. TNG). For instance: “[J] 1 Mar 1740”. A consequence is that this
alternative syntax occasionally creeps into exported GEDCOM dates and muddies
the waters.
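An importer that meets the stray “[J]” prefix could map it onto the standard escape before any further processing. The Python sketch below is a hypothetical clean-up step of my own, assuming that the prefix carries exactly the same meaning as @#DJULIAN@ and that the rest of the value is otherwise a normal GEDCOM date.

```python
import re

J_PREFIX = re.compile(r"^\[J\]\s*", re.IGNORECASE)


def to_gedcom_escape(value: str) -> str:
    """Rewrite a leading "[J]" as the GEDCOM Julian date escape."""
    text = value.strip()
    if J_PREFIX.match(text):
        # GEDCOM month abbreviations are upper case, so normalise the case too.
        return "@#DJULIAN@ " + J_PREFIX.sub("", text).upper()
    return text  # leave everything else untouched


print(to_gedcom_escape("[J] 1 Mar 1740"))  # @#DJULIAN@ 1 MAR 1740
```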
The Unicode Common Locale Data Repository (CLDR) has also proposed a set of calendar
names for computer use at: http://unicode.org/repos/cldr/trunk/common/bcp47/calendar.xml,
although I cannot see any details of corresponding date encodings. It appears
to be part of an extension to BCP47
(“Tags for Identifying Languages”) called RFC 6067 for “subtags that specify
language and/or locale-based behaviour or refinements to language tags,
according to work done by the Unicode Consortium”.
The MARC Extended Date/Time Format (EDTF)
makes no mention of calendars as it is applicable only to the Gregorian
calendar.
The Society of American Archivists (SAA) adopted DACS
(Describing Archives, A Content Standard) in 2004. This mentions alternative
calendar systems but only from a written point of view as opposed to a
digital one. Their Standards for Archival Description, Chapter 7 (Codes),
does mention the Julian calendar but only in the context of ordinal dates.
The MSS Working
Group discusses a number of issues related to date/time representation,
including dates from non-Gregorian calendars.
The ISO 8601 standard that addresses the Gregorian calendar
has a few attractive core features:
- It uses fixed-length all-numeric fields and so avoids language issues and textual ambiguities (see GEDCOM list of avoided names, above).
- The resultant text is implicitly sortable without the host software having to understand dates at all.
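The second feature is easy to demonstrate. In this small Python sketch, a plain text sort puts fixed-length ISO 8601 dates into chronological order, whereas the same dates in a written form end up in an order that means nothing. The sample dates are simply ones used elsewhere in this post.

```python
iso_dates = ["1856-12-25", "1799-11-09", "2015-06-09", "1740-03-01"]
written_dates = ["25 Dec 1856", "9 Nov 1799", "9 Jun 2015", "1 Mar 1740"]

# A plain string sort: the software needs no knowledge of dates at all.
print(sorted(iso_dates))
# ['1740-03-01', '1799-11-09', '1856-12-25', '2015-06-09']

# The same sort applied to the written forms is not chronological.
print(sorted(written_dates))
# ['1 Mar 1740', '25 Dec 1856', '9 Jun 2015', '9 Nov 1799']
```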
Ideally, any standard for the computer-readable date formats
in the other calendars should adopt a similar approach. This was STEMMA’s goal
from its inception. However, it found that it had to adopt a variation
of the ISO 8601 format in order to support missing levels of granularity
(such as yearly quarters) and to correctly sort differing granularities with
respect to each other — two criticisms in the aforementioned article. Apart
from the easy cases of the Julian and French Republican calendars, it has made
no further headway. What it has done, though, is to create a generic Date
entity that can be back-filled with the encodings for any number of calendars —
once they’ve been defined — and without changing its overall data model. This
is an approach that I strongly recommend to FHISO in order to avoid prematurely
dismissing this issue, and then later finding that some method of date escapes is required, similar to
GEDCOM.
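The idea of a generic entity that can be back-filled might look something like the following Python sketch. It is not STEMMA's actual Date entity: the `DateValue` class, the registry, and the calendar codes ("GREG", "TBD") are all hypothetical names chosen for illustration. The point is only that a value in a calendar with no defined encoding can still be stored, and that adding the encoding later does not change the data model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Optional

# Hypothetical registry: calendar code -> function mapping an encoded value
# to a common day number that is used only for collation.
CONVERTERS: Dict[str, Callable[[str], int]] = {}


def register(code: str):
    """Decorator that back-fills the encoding for one calendar."""
    def wrap(func):
        CONVERTERS[code] = func
        return func
    return wrap


@register("GREG")
def gregorian_day_number(value: str) -> int:
    return date.fromisoformat(value).toordinal()


@dataclass(frozen=True)
class DateValue:
    calendar: str  # calendar code, even if no encoding is defined yet
    value: str     # normalised value in that calendar

    def day_number(self) -> Optional[int]:
        """Return a collation key, or None if the calendar is not yet supported."""
        converter = CONVERTERS.get(self.calendar)
        return converter(self.value) if converter else None


known = DateValue("GREG", "1799-11-09")
pending = DateValue("TBD", "encoding not yet defined")
print(known.day_number() is not None)  # True
print(pending.day_number())            # None: stored, but not yet collatable
```

Registering a converter for another calendar later on is a one-function change, and every `DateValue` already stored for that calendar becomes collatable without being touched.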
A number of papers were received by FHISO on the subject of calendars, and their
approaches and coverage appear to be very constructive. At the time of writing,
there was no associated Exploratory Group established for research into this
field.
| CFPS | Title | Description |
|---|---|---|
| | Proposal to support dates BC as negative years | This paper presents a case for allowing dates BC to be recorded using the standard Julian and Gregorian calendars, and proposes a representation for such dates that is naturally sortable. |
| | Proposal to extend the calendar style mechanism of CFPS 43 into an abstract formatting model | CFPS 43's style mechanism is extended into an abstract formatting model that would allow applications to correctly format dates written in many unknown calendar systems. |
| | Proposal to support the Julian calendar similarly to CFPS 17 | Proposal for a Julian calendar with years starting on 1 Jan |
| | Proposal to add style to the wholly-numeric representation of dates in CFPS 13 | Proposal to separate presentation from representation in calendars in order to avoid a proliferation of calendars. |
| | Proposal for compound calendars to resolve a difficulty with default calendars | Proposal to allow the default calendar to be dependent on the date. |
| | Proposal for a Generalised Dual-Date Representation | Proposal for a generalised dual-date representation that applies to multiple calendars |
| | Proposal to Accommodate Alternative World Calendar Systems | Proposed adoption of a date syntax applicable to multiple world calendars, both historical and modern-day. |
A question I have heard before is why those uncertainties
and inaccuracies should be relevant to genealogists. What difference does it
make if you’re a few days out, or a month, or possibly even a year? I entirely
disagree with this thinking. Even if you’re only building a family tree then
the relationships and vital events might not be supported by direct and
non-conflicting evidence; there may have to be some interpretation, and some
correlation with information from elsewhere in order to justify them.
A bigger question people may pose is why the historical
calendars are of interest to genealogists. After all, there is at least some
agreed synchronisation between the six principal calendars that are in use
today. Not many people can trace their lineage back to, say, Caesar. Well, even
historical characters had lineage, and family history, so whether you’re
studying modern genealogy or ancient genealogy should be irrelevant. More than
this, though, I do not consider genealogy (including family trees and family
history) to be a special case that needs its own standards and methodologies.
It is a form of micro-history, which in turn is a form of history. The information
that we uncover and analyse in our research does not come from a world of its
own, and it cannot be considered in isolation. All those events — both
large-scale and small-scale — relate to the real world, and will affect each
other. Historical research needs a consistent scheme that respects the
integrity of our sources and the information found therein. To suggest that
software standards, or the Internet, or populist genealogy products, must stick
to Gregorian dates would be a case of the tail wagging the dog.
[1] E. G. Richards, Mapping Time: The Calendar and its History (1998; reprint, Oxford University Press, 2005), p.99.
[2] Data elements and interchange formats —
Information interchange — Representation of dates and times, International
Standard, ISO 8601:2004(E), 3rd ed. 1 Dec 2004; online copy obtained
from http://dotat.at/tmp/ISO_8601-2004_E.pdf
(accessed 11 Jul 2015).
[3] Actually, version 5.3 also contained this feature but version 5.4 omitted it
with the following explanatory statement: “The Lineage-Linked GEDCOM Form is
restricted to Gregorian calendar forms. This version of GEDCOM chose not to
support multiple calendars. The reason is that support of multiple calendars would
require each receiving system to handle multiple calendar conversions”. Source:
Tamura Jones, "FamilySearch GEDCOM Specifications", Modern Software Experience (http://www.tamurajones.net/FamilySearchGEDCOMSpecifications.xhtml
: accessed 19 Jul 2015).
[4] Bob Coret, "Usage of calendars in GEDCOM", Bob Coret in English, posted 5 Feb 2015 (http://blog-en.coret.org/2015/02/usage-of-calendars-in-gedcom.html : accessed 19 Jul 2015).