Sunday 19 July 2015

A Calendar for Your Date — Part II



In A Calendar for Your Date — Part I, I gave a brief tour of the variations in current and historical calendar systems. I now want to approach the question of how we should represent dates that are not expressed according to our Gregorian calendar.

The different calendar systems may be categorised as follows:[1]

Empirical. The start of the months or years is determined by direct observation and intercalary days or months are inserted on an ad hoc basis.

Calculated. These are rule-based and so are predictable. Lunisolar and solar calendars may be astronomical rather than arithmetic, in that the start of months and years may be determined through astronomical calculation rather than purely by a fixed rule. Calendars with “wandering years”, such as the Egyptian civil calendar and the Mayan calendar, have a simple fixed number of days per year.

Conversion between calculated calendars can be approached algorithmically, if enough information is known, but empirical calendars require the use of tables, and it is rarely the case that enough historical information survives to make this an accurate process.



In general, when we see an historical date, we cannot always convert it directly and unambiguously to our modern Gregorian calendar; it requires some interpretation, and that in turn requires information about the actual calendar variant that was being used, the social group involved, their political and religious leanings, the weather, and maybe even the geographical coordinates.

As a modern-day analogy of this problem, consider the date 9/6/2015. Did I mean June 9th or September 6th? If the date had been written in the US then you might believe that the latter is more likely. However, knowing that the author has worked in the US but is English by birth might suggest that the former is more likely. Hopefully, you see the problem: the converted value cannot always be faithful to the original source information, and it might require an update based on the analysis of new information, or the availability of a revised algorithm or tables.
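Software shares the same blind spot: the string parses cleanly under either convention, and nothing in the string says which one applies. A minimal Python sketch using the standard `datetime.strptime`; the convention labels `day-first` and `month-first` are hypothetical:

```python
from datetime import date, datetime

def parse_numeric_date(text: str, convention: str) -> date:
    """Parse a numeric date; the convention must be supplied from context."""
    formats = {"day-first": "%d/%m/%Y", "month-first": "%m/%d/%Y"}
    return datetime.strptime(text, formats[convention]).date()

written = "9/6/2015"
british = parse_numeric_date(written, "day-first")     # 9 June 2015
american = parse_numeric_date(written, "month-first")  # 6 September 2015
print(british.isoformat(), american.isoformat())      # 2015-06-09 2015-09-06
```

Both calls succeed, which is the point: the convention is an input to the conversion, not something the parser can discover.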

A number of resources exist for calendar conversions — both algorithms and documentation — although some seem to have disappeared since I first saw them:

  • Converter for historical calendars
  • Calendars and Their History (currently inaccessible; see "inexact" in sec. 1.3, observational calendars)
  • Indian Calendrical Calculations
  • Convert a Date
  • Pancanga (version 3.14)

These resources involve specific calendar variants that would have to be known in advance, and at least one acknowledges the inexact nature of general calculations. In effect, a calculated date can never be more accurate than the original, and there will be many cases where it will be less accurate.

What I’m suggesting is that wholesale conversion of historical dates to the Gregorian calendar is a very bad approach for genealogy, and for historians in general. Obviously we need the ability to put dates from different calendars on the same timeline, but that does not mean discarding the original information in favour of a calculated alternative, a process which may involve an increased degree of uncertainty or imprecision (as differentiated in Warm Fuzzy Dates), as well as some loss of evidence. Furthermore, if software is going to collate these dates then it needs a representation that it can understand, and we cannot make do with just the written evidential form and the calculated variant.

Let me explain what this last sentence means by using a small example. STEMMA applies a bilateral approach to all data items assimilated into a computer-readable form, as explained in Returning to Normalised Names and Dates and in Is That a Fact?. What this means is that it holds a transcript of the original evidential form and a separate computer-readable (normalised) form. A computer-readable date would be used for sorting and collation, but also for generating a display form — say for a report or a chart — according to the regional settings and personal preferences of the current end-user. For instance:

Evidential form:     25 Dec 56
Normalised form:     1856-12-25
Display form:        25th December 1856

There are a couple of points to note in this simple Gregorian example. Firstly, those software developers who believe that it is possible to automatically convert written (Gregorian) dates to a normalised form (ISO 8601 in this case) would probably have interpreted this date as 1956 rather than 1856, thus emphasising the importance of the context of the information. I could have used an evidential form such as “Christmas 56” to hammer that home, but I wanted to give a sense of its subtlety. Similarly, an evidential form of “my birthday” also references a date, but whose birthday, and in which year? All of that is contextual information that a researcher would use to apply a conversion. My second point is that the display form is generally (in modern software) produced according to a “short”, “medium”, “long”, or “full” request, and that request would examine the end-user’s settings in order to generate a consistent representation for readability. This is an approach that could, in principle, be applied to all calendars.
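The bilateral record and the generation of a display form from the normalised value can be sketched in a few lines of Python. This is an illustration under assumptions, not STEMMA's schema: `RecordedDate`, `ordinal`, and the style names are hypothetical, and fixed English, day-first formats stand in for the end-user's regional settings (the “full” style is omitted).

```python
from dataclasses import dataclass
from datetime import date

# Month names kept in code rather than taken from the operating-system locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def ordinal(n: int) -> str:
    """English ordinal: 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, 21st, 22nd."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

@dataclass(frozen=True)
class RecordedDate:
    evidential: str   # transcribed exactly as written in the source
    normalised: date  # the researcher's interpretation, used for sorting and collation

    def display(self, style: str = "long") -> str:
        """Generate a display form; a real product would consult the user's settings."""
        d = self.normalised
        if style == "short":
            return f"{d.day:02d}/{d.month:02d}/{d.year}"
        if style == "medium":
            return f"{d.day} {MONTHS[d.month - 1][:3]} {d.year}"
        if style == "long":
            return f"{ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"
        raise ValueError(f"unsupported style: {style!r}")

christmas = RecordedDate(evidential="25 Dec 56", normalised=date(1856, 12, 25))
print(christmas.normalised.isoformat())  # 1856-12-25
print(christmas.display("long"))         # 25th December 1856
```

Note that the evidential string is never parsed: the year 1856 is the researcher's contextual judgement, stored alongside the transcript rather than replacing it.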

In the case of a calendar conversion, the missing item would be a normalised value in the alternative calendar, one that must be flagged as “calculated” to avoid ambiguity. The next example includes a date from the French Republican calendar converted to the Gregorian calendar.

Evidential form:     18 Brum an VIII
Normalised form:     #FR#08-02-18
Display form:        18 Brumaire An 8
Normalised form:     1799-11-09         (Calc)
Display form:        9th November 1799  (Calc)

The normalised form is the STEMMA one since there is no standard that I am aware of. The associated display form uses Arabic year numbers rather than Roman numerals. Coinage of the time often used these rather than the Roman numerals used elsewhere, but it would obviously be a display setting. The two extra fields show the equivalent normalised date after conversion to the Gregorian calendar, and its associated display form. The two normalised forms are therefore distinct in that one is a direct implementation of the evidential form whereas the other is a derivation. The second should therefore be flagged as a calculated datum using something akin to the GEDCOM CAL flag (see DATE_APPROXIMATED in the specification). STEMMA would allow the two forms to be bound using the DATE_ENTITY that’s also used for synchronised dates (i.e. its generalised form of dual dates).
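The derivation of the calculated Gregorian form might be sketched as follows. It assumes the `#FR#yy-mm-dd` layout inferred from the single example above (months 1 to 12 of 30 days, month 13 for the complementary days) and, rather than a formula, uses a table of the Gregorian start date of each republican year, because the calendar's years officially began on the day of the autumnal equinox at Paris rather than by a fixed arithmetic rule. Only year VIII (1 Vendémiaire VIII = 23 September 1799) is filled in; the table, the function name, and the `"Calc"` flag are hypothetical.

```python
import re
from datetime import date, timedelta

# Hypothetical table: Gregorian date of 1 Vendémiaire for each republican year.
# Only year VIII is shown; other years must come from a reliable concordance.
YEAR_START = {8: date(1799, 9, 23)}

FR_PATTERN = re.compile(r"^#FR#(\d{2})-(\d{2})-(\d{2})$")

def french_republican_to_gregorian(normalised: str) -> tuple[date, str]:
    """Derive a Gregorian date from a '#FR#yy-mm-dd' value, flagged as calculated."""
    match = FR_PATTERN.match(normalised)
    if match is None:
        raise ValueError(f"not a French Republican normalised date: {normalised!r}")
    year, month, day = (int(g) for g in match.groups())
    # Allows a sixth complementary day without checking whether the year had one.
    if not 1 <= month <= 13 or not 1 <= day <= (30 if month <= 12 else 6):
        raise ValueError(f"month or day out of range: {normalised!r}")
    if year not in YEAR_START:
        raise KeyError(f"no start date recorded for year {year}")
    day_of_year = (month - 1) * 30 + day  # 1-based position within the republican year
    return YEAR_START[year] + timedelta(days=day_of_year - 1), "Calc"

gregorian, flag = french_republican_to_gregorian("#FR#08-02-18")
print(gregorian.isoformat(), flag)  # 1799-11-09 Calc
```

The result comes back paired with its flag so that it cannot be mistaken for a transcribed value, mirroring the “(Calc)” marking in the example.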

Unfortunately, there are no data standards to accommodate the normalised representation of dates in other calendars. All we have is the ISO 8601 date standard[2], which is specific to the Gregorian calendar and largely the result of an amalgamation of previous standards. Much of its content gets ignored in favour of the pure representation of a Gregorian date and/or time, and that includes ranges, ordinal dates, etc. A critique of that standard may be found at: Is the ISO Date Standard Bad?.

GEDCOM 5.5 includes a small set of “date escapes”[3] that can prefix a date value in order to address different calendars:

@#DGREGORIAN@ — Gregorian calendar
@#DJULIAN@ — Julian calendar
@#DHEBREW@ — Jewish calendar
@#DFRENCH R@ — French Republican calendar
@#DROMAN@ — for future definition
@#DUNKNOWN@ — for unknown calendars

This sounds like a step in the right direction although the specification offers little help on the encoding of year numbers or month names for the non-Gregorian calendars. It does acknowledge the ambiguity of using words rather than numbers via the statement: “No future calendar types will use words (e.g. month names) from this list: FROM, TO, BEF, AFT, BET, AND, ABT, EST, CAL, or INT”.
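Splitting a GEDCOM `DATE` value into its calendar escape and the remaining date text is straightforward. The helper below is hypothetical, not part of any GEDCOM library. It only recognises an escape at the very start of the value (GEDCOM also allows escapes inside ranges such as `BET ... AND ...`, which this ignores), and the French date text `18 BRUM 8` is illustrative, given how little help the specification offers on year encoding.

```python
import re

# The escapes listed in GEDCOM 5.5; a value with no escape is taken as Gregorian.
KNOWN_ESCAPES = {"GREGORIAN", "JULIAN", "HEBREW", "FRENCH R", "ROMAN", "UNKNOWN"}
ESCAPE_PATTERN = re.compile(r"^@#D([A-Z ]+)@\s*(.*)$")

def split_calendar_escape(value: str) -> tuple[str, str]:
    """Return (calendar, remaining date text) for a GEDCOM DATE value."""
    text = value.strip()
    match = ESCAPE_PATTERN.match(text)
    if match is None:
        return "GREGORIAN", text
    calendar, rest = match.group(1), match.group(2)
    if calendar not in KNOWN_ESCAPES:
        raise ValueError(f"unrecognised calendar escape: @#D{calendar}@")
    return calendar, rest

print(split_calendar_escape("@#DJULIAN@ 1 MAR 1740"))  # ('JULIAN', '1 MAR 1740')
print(split_calendar_escape("@#DFRENCH R@ 18 BRUM 8"))  # ('FRENCH R', '18 BRUM 8')
```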

In February 2015, Bob Coret analysed the usage of this calendar feature in a sample of 82.9 million DATE lines from about 7000 GEDCOM files.[4] He reported the following very low permille (i.e. tenths of a percent) usage, all others being zero:

@#DJULIAN@      0.123 ‰
@#DHEBREW@      0.013 ‰
@#DFRENCH R@    0.006 ‰

Clearly this feature is very underutilised, but what is the reason? Is it that few people have dates in alternative calendars, or that they only store the Gregorian equivalents, or that their software does not support this feature?

Family Historian uses a “[J]” prefix for entering dates in the Julian calendar, and this has also become a display option in some other products (e.g. TNG). For instance: “[J] 1 Mar 1740”. A consequence is that this alternative syntax occasionally creeps into exported GEDCOM dates and muddies the waters.
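A cleanup pass that rewrites that prefix into the standard GEDCOM 5.5 escape might look like this hypothetical sketch. It assumes the text after `[J]` is otherwise valid GEDCOM date syntax and simply uppercases it, since GEDCOM month codes are uppercase; anything without the prefix is left untouched.

```python
import re

# Matches the prefixed form, e.g. "[J] 1 Mar 1740".
J_PREFIX = re.compile(r"^\[J\]\s*(.+)$")

def to_gedcom_julian(value: str) -> str:
    """Rewrite a '[J]'-prefixed date as a GEDCOM 5.5 Julian escape; leave others alone."""
    match = J_PREFIX.match(value.strip())
    if match is None:
        return value
    return "@#DJULIAN@ " + match.group(1).upper()

print(to_gedcom_julian("[J] 1 Mar 1740"))  # @#DJULIAN@ 1 MAR 1740
```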

The Unicode Common Locale Data Repository (CLDR) has also proposed a set of calendar names for computer use at: http://unicode.org/repos/cldr/trunk/common/bcp47/calendar.xml, although I cannot see any details of corresponding date encodings. It appears to be part of an extension to BCP 47 (“Tags for Identifying Languages”), defined in RFC 6067, for “subtags that specify language and/or locale-based behaviour or refinements to language tags, according to work done by the Unicode Consortium”.
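Although the CLDR file defines calendar names rather than date encodings, a calendar preference can already travel inside a language tag through the `-u-ca-` keyword, e.g. `ar-SA-u-ca-islamic-umalqura`. A minimal, hypothetical extraction sketch follows. It is not a full BCP 47 parser: it does no validation and does not handle private-use subtags.

```python
from typing import Optional

def calendar_from_language_tag(tag: str) -> Optional[str]:
    """Return the value of the 'ca' key in a tag's '-u-' extension, if present."""
    parts = tag.lower().split("-")
    if "u" not in parts:
        return None
    i = parts.index("u") + 1
    while i < len(parts) and len(parts[i]) > 1:  # another singleton ends the extension
        if parts[i] == "ca":
            values = []
            i += 1
            # Keys are two characters; type subtags are three to eight.
            while i < len(parts) and len(parts[i]) > 2:
                values.append(parts[i])
                i += 1
            return "-".join(values) or None
        i += 1
    return None

print(calendar_from_language_tag("ar-SA-u-ca-islamic-umalqura"))  # islamic-umalqura
print(calendar_from_language_tag("ja-JP-u-nu-latn-ca-japanese"))  # japanese
print(calendar_from_language_tag("en-GB"))                        # None
```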

The MARC Extended Date/Time Format (EDTF) makes no mention of calendars as it is applicable only to the Gregorian calendar.

The Society of American Archivists (SAA) adopted DACS (Describing Archives, A Content Standard) in 2004. This mentions alternative calendar systems but only from a written point of view as opposed to a digital one. Their Standards for Archival Description, Chapter 7 (Codes), does mention the Julian calendar but only in the context of ordinal dates.

The MSS Working Group discusses a number of issues related to date/time representation, including dates from non-Gregorian calendars.

The ISO 8601 standard that addresses the Gregorian calendar has a few attractive core features:

  • It uses fixed-length, all-numeric fields and so avoids language issues and textual ambiguities (see the GEDCOM list of words to be avoided, above).
  • The resultant text is implicitly sortable without the host software having to understand dates at all.
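Both properties are easy to check. The sketch below, using arbitrary sample values, confirms that plain string sorting of `YYYY-MM-DD` values matches chronological sorting via `date.fromisoformat`. It then shows what happens with mixed granularities: a year-only or month-only value sorts ahead of every day inside it. Whether that is the desired collation is a separate question, and the property relies on four-digit, non-negative years.

```python
from datetime import date

# Fixed-width ISO 8601 calendar dates sort correctly as plain text.
iso_values = ["1856-12-25", "1799-11-09", "1856-02-03", "1740-03-12"]
as_text = sorted(iso_values)
as_dates = sorted(iso_values, key=date.fromisoformat)
assert as_text == as_dates

# Mixed granularities also sort, but a coarser value always precedes
# every finer value within it: "1856" lands before "1856-12-25".
mixed = sorted(["1856-12-25", "1856", "1856-12", "1855-06-01"])
print(mixed)  # ['1855-06-01', '1856', '1856-12', '1856-12-25']
```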

Ideally, any standard for the computer-readable date formats in the other calendars should adopt a similar approach. This was STEMMA’s goal from its inception. However, it found that it had to adopt a variation of the ISO 8601 format in order to support missing levels of granularity (such as yearly quarters) and to correctly sort differing granularities with respect to each other — two criticisms in the aforementioned article. Apart from the easy cases of the Julian and French Republican calendars, it has made no further headway. What it has done, though, is to create a generic Date entity that can be back-filled with the encodings for any number of calendars — once they’ve been defined — and without changing its overall data model. This is an approach that I strongly recommend to FHISO in order to avoid prematurely dismissing this issue, and then later finding that some method of date escapes is required, similar to GEDCOM.
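A minimal sketch of such a generic entity, assuming a registry that maps a calendar code to the pattern its normalised form must match. The names `DateEntity`, `NormalisedForm`, `CALENDAR_FORMATS`, and the `FR` code are hypothetical, not STEMMA's actual schema; the point is only that supporting another calendar means adding a registry entry, not changing the entity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical registry: calendar code -> pattern its normalised form must match.
# New calendars are added here without touching the DateEntity model below.
CALENDAR_FORMATS = {
    "GREGORIAN": re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$"),
    "FR": re.compile(r"^#FR#\d{2}-\d{2}-\d{2}$"),
}

@dataclass
class NormalisedForm:
    calendar: str
    value: str
    calculated: bool = False  # True when derived by conversion rather than transcribed

@dataclass
class DateEntity:
    evidential: str
    forms: list[NormalisedForm] = field(default_factory=list)

    def add(self, calendar: str, value: str, calculated: bool = False) -> None:
        pattern = CALENDAR_FORMATS.get(calendar)
        if pattern is None:
            raise KeyError(f"no encoding defined yet for calendar {calendar!r}")
        if not pattern.match(value):
            raise ValueError(f"{value!r} is not a valid {calendar} normalised form")
        self.forms.append(NormalisedForm(calendar, value, calculated))

brumaire = DateEntity("18 Brum an VIII")
brumaire.add("FR", "#FR#08-02-18")
brumaire.add("GREGORIAN", "1799-11-09", calculated=True)
```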

A number of papers were received by FHISO on the subject of calendars, and their approaches and coverage appear to be very constructive. At the time of writing, there was no associated Exploratory Group established for research into this field.

  • Proposal to support dates BC as negative years: presents a case for allowing dates BC to be recorded using the standard Julian and Gregorian calendars, and proposes a naturally sortable representation for such dates.
  • Proposal to extend the calendar style mechanism of CFPS 43 into an abstract formatting model: extends CFPS 43’s style mechanism into an abstract formatting model that would allow applications to correctly format dates written in many unknown calendar systems.
  • Proposal to support the Julian calendar similarly to CFPS 17: proposes a Julian calendar with years starting on 1 Jan.
  • Proposal to add style to the wholly-numeric representation of dates in CFPS 13: proposes separating presentation from representation in calendars, in order to avoid a proliferation of calendars.
  • Proposal for compound calendars to resolve a difficulty with default calendars: proposes allowing the default calendar to depend on the date.
  • Proposal for a Generalised Dual-Date Representation: a generalised dual-date representation that applies to multiple calendars.
  • Proposal to Accommodate Alternative World Calendar Systems: proposes adopting a date syntax applicable to multiple world calendars, both historical and modern-day.


A question I have heard before is why those uncertainties and inaccuracies should be relevant to genealogists. What difference does it make if you’re a few days out, or a month, or possibly even a year? I entirely disagree with this thinking. Even if you’re only building a family tree then the relationships and vital events might not be supported by direct and non-conflicting evidence; there may have to be some interpretation, and some correlation with information from elsewhere in order to justify them.

A bigger question people may pose is why the historical calendars are of interest to genealogists. After all, there is at least some agreed synchronisation between the six principal calendars that are in use today. Not many people can trace their lineage back to, say, Caesar. Well, even historical characters had lineage, and family history, so whether you’re studying modern genealogy or ancient genealogy should be irrelevant. More than this, though, I do not consider genealogy (including family trees and family history) to be a special case that needs its own standards and methodologies. It is a form of micro-history, which in turn is a form of history. The information that we uncover and analyse in our research does not come from a world of its own, and it cannot be considered in isolation. All those events — both large-scale and small-scale — relate to the real world, and will affect each other. Historical research needs a consistent scheme that respects the integrity of our sources and the information found therein. To suggest that software standards, or the Internet, or populist genealogy products, must stick to Gregorian dates would be a case of the tail wagging the dog.



[1] E. G. Richards, Mapping Time: The Calendar and its History (1998; reprint, Oxford University Press, 2005), p.99.
[2] Data elements and interchange formats — Information interchange — Representation of dates and times, International Standard, ISO 8601:2004(E), 3rd ed. 1 Dec 2004; online copy obtained from http://dotat.at/tmp/ISO_8601-2004_E.pdf (accessed 11 Jul 2015).
[3] Actually, version 5.3 also contained this feature but version 5.4 omitted it with the following explanatory statement: “The Lineage-Linked GEDCOM Form is restricted to Gregorian calendar forms. This version of GEDCOM chose not to support multiple calendars. The reason is that support of multiple calendars would require each receiving system to handle multiple calendar conversions”. Source: Tamura Jones, "FamilySearch GEDCOM Specifications", Modern Software Experience (http://www.tamurajones.net/FamilySearchGEDCOMSpecifications.xhtml : accessed 19 Jul 2015).
[4] Bob Coret, "Usage of calendars in GEDCOM", Bob Coret in English, posted 5 Feb 2015 (http://blog-en.coret.org/2015/02/usage-of-calendars-in-gedcom.html : accessed 19 Jul 2015).

Sunday 12 July 2015

A Calendar for Your Date — Part I



In our everyday world, a calendar is generally something that you hang on the wall, and by which you count off the days until some birthday, a vacation, or the next public holiday. A calendar is a much bigger concept than this, though, and it affects both genealogists and historians in ways that we don’t like to think about since they complicate our worldview.

So what is a calendar? A calendar is a mechanism by which dates are reckoned in a given culture. Historically, that meant that it allowed the passing of days to be recorded and so the return of the seasons or astronomical phenomena to be predicted.

There are six principal calendars in current use: Gregorian, Jewish, Islamic, Indian, Chinese, and Julian,[1] but a list of many historical calendars can be found at: List of Calendars. If we encounter a source with a date expressed according to one of these calendars then how should we represent it? Before I try and answer that, I want to give a small tour to illustrate how complex the subject is.

Calendars can be based on natural cycles, such as astronomical events, or contrived (man-made) cycles. The main astronomical systems are:[2]

Lunar. Calendars based on the counting of lunations: cycles of the phases of the moon. The average lunation is now known to be 29.530589 days, but historically it wasn’t known to that level of precision. Addition of an extra day, to get things back in step, might have been done on an ad hoc basis rather than according to some rigid rule. Besides the natural fluctuations, and the difficulty of observing phase changes in bad weather, there is the problem that the first instant of a new phase depends on both the latitude and longitude of the observer. The Islamic calendar is the only modern-day example.

Lunisolar. Calendars where the cycles of the moon and of the year (i.e. seasons) were combined and extra months occasionally added to keep them synchronised — sometimes relying on an arbitrary decision by a local priest rather than a formulaic approach. The Jewish, Chinese, Japanese, and Indian calendars are examples.

Solar. Calendars based entirely on the cycle of the sun, and abandoning those of the moon. Although our modern Gregorian calendar has months, these are not tied to the phases of the moon and so it is a solar calendar rather than a lunisolar one. Other examples include the Egyptian calendar, which had a fixed 365-day so-called “wandering” year, and the Julian calendar, which had an average year of 365.25 days (three years of 365 days followed by a leap year of 366).

The root of many problems with calendars based on natural cycles is that there is no fixed integral relationship between the periods of astronomical events such as the rotation of the earth about its axis (a day), the phases of the moon (a month), and the orbit of the earth about the sun (a year): their relationships are both fractional and continually varying. In fact, we cannot even say that the length of a day now is the same as it was at some point in prehistory because there was no common yardstick by which to directly compare them.[3] We can extrapolate our understanding of the motions within our solar system back to ancient times, but only the predictable motions. If some asteroid had once passed close by the earth then it could have had a significant effect on the mean solar day. We’re currently aware of these fractional relationships by virtue of the leap year, where an extra day is inserted into our calendar every four years, except when the year is a multiple of 100 but not a multiple of 400. This process of adding a day (in general: intercalation) is simply trying to keep our notions of a day and a year in step.
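The leap-year rules above translate directly into code, and averaging over one complete cycle of each rule gives the mean year it implies. A minimal sketch (the function names are illustrative) using exact fractions:

```python
from fractions import Fraction
from typing import Callable

def is_julian_leap(year: int) -> bool:
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0

def is_gregorian_leap(year: int) -> bool:
    """Gregorian rule: every fourth year, except multiples of 100 that are not multiples of 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def mean_year(is_leap: Callable[[int], bool], cycle: int) -> Fraction:
    """Exact average year length, in days, over one full cycle of a leap-year rule."""
    leap_years = sum(1 for year in range(1, cycle + 1) if is_leap(year))
    return 365 + Fraction(leap_years, cycle)

print(float(mean_year(is_julian_leap, 4)))       # 365.25
print(float(mean_year(is_gregorian_leap, 400)))  # 365.2425
```

The Gregorian cycle contains 97 leap years in 400, hence 365 + 97/400 days.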



Although several contrived cycles have existed in the past, our most familiar modern-day example is the seven-day week.

The years themselves may be counted from the beginning of some reign or from some important or regular cultural event. Years counted according to the reign of some sovereign, monarch, or pope are termed regnal years. Even now, Acts of Parliament in England might be dated as, say, 3 Elizabeth II, meaning the third year of the reign of Elizabeth II. In ancient Greece, years were sometimes numbered according to the Olympiad: the four-yearly period between successive games. The Japanese nengo system counts years according to a number of eras, originally determined by court officials but later by the accession of emperors (similar to the old Chinese dynasties). Our own modern year numbering is according to the Christian Era (AD/BC), also known as the Common Era (CE/BCE), although this has only a single epoch. Years in the French Republican calendar were counted from the Republican Era, beginning with the first year of the republic. Years may sometimes be named rather than numbered, as with Roman ones named after a consul, or Greek ones named after an archon. Knowledge of the associated epochs (the starting points for the counting) in an alternative calendar system is therefore a prerequisite for accurate conversion of its dates. Such knowledge becomes scarcer as the epoch becomes less important. In England, the records of manors held by the church sometimes used episcopal or abbatial years. Even though the sequence of bishops or abbots is generally known, their exact dates may not be, especially in the case of minor abbots who did not sit in the House of Lords.
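Converting a regnal year depends entirely on knowing the epoch. A minimal sketch, assuming each regnal year runs from one anniversary of the accession to the day before the next (the function name is illustrative, and other regnal or era systems may count differently), using the accession of Elizabeth II on 6 February 1952:

```python
from datetime import date

def regnal_year(event: date, accession: date) -> int:
    """Regnal year containing `event`, counting year 1 from the accession date."""
    if event < accession:
        raise ValueError("event precedes the accession")
    years = event.year - accession.year
    if (event.month, event.day) < (accession.month, accession.day):
        years -= 1  # this calendar year's anniversary has not yet been reached
    return years + 1

ELIZABETH_II_ACCESSION = date(1952, 2, 6)
print(regnal_year(date(1954, 6, 1), ELIZABETH_II_ACCESSION))  # 3
```

Run in reverse, “3 Elizabeth II” therefore covers 6 February 1954 to 5 February 1955, a span rather than a single year number, and only if the accession date is known.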

What the majority of the world uses today is known as the Gregorian calendar, named after Pope Gregory XIII, who introduced it via a papal bull in 1582 as a replacement for the Julian calendar. The Julian calendar used an average year of 365.25 days, but the average tropical year is more like 365.24219 days, and that meant that the calendar was falling very slightly behind the seasons each year. A major problem was that Easter was defined by the First Council of Nicaea (AD 325) in relation to the spring equinox, and after nearly 13 centuries of the overly long year, the calendar date of the equinox had drifted backwards by about 10 days from where it used to be. As well as setting a new average calendar year of 365.2425 days, the Gregorian calendar removed 10 days in order to restore the equinox, and hence the reckoning of Easter, to where it had been before.
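The “about 10 days” can be checked from the year lengths quoted in the paragraph above, taking AD 325 and 1582 as the end points. This is a simplification, since the tropical year is itself not constant:

```python
# Average year lengths quoted above, in days.
JULIAN_YEAR = 365.25
TROPICAL_YEAR = 365.24219
GREGORIAN_YEAR = 365.2425

drift_per_year = JULIAN_YEAR - TROPICAL_YEAR           # about 0.0078 days per year
years_since_nicaea = 1582 - 325                        # 1257 years
print(round(drift_per_year * years_since_nicaea, 1))   # 9.8 (days)

# With the same figures, years for the Gregorian mean year to drift one whole day.
print(round(1 / (GREGORIAN_YEAR - TROPICAL_YEAR)))     # 3226
```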

I don’t want to dwell too long on this Julian-to-Gregorian transition during this particular post (I’ll cover that another time) as there are many other calendar systems. I want to illustrate the complexities of those calendar systems, and explain how converting between them is not an exact science. If we allow our modern Gregorian calendar to be used in a proleptic fashion, where it can be used for dates prior to its invention, then we find that there is some uncertainty in translating dates from those other systems to our modern system.
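For the rule-based Julian calendar, at least, the arithmetic part of a conversion to the proleptic Gregorian calendar is mechanical. A minimal sketch goes through the Julian Day Number using the well-known integer formulas, then uses Python's `datetime.date`, which is itself proleptic Gregorian. It assumes the Julian year begins on 1 January. In England before 1752 the civil year began on 25 March, so a source dated “1 Mar 1740” may belong to 1741 by modern reckoning: exactly the kind of context that arithmetic cannot supply. Results before AD 1 are outside what `datetime.date` can represent.

```python
from datetime import date

def julian_day_number(year: int, month: int, day: int, gregorian: bool) -> int:
    """Julian Day Number of a Julian or (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12   # 1 for January and February, otherwise 0
    y = year + 4800 - a      # years counted from March of -4800
    m = month + 12 * a - 3   # March = 0 ... February = 11
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if gregorian:
        return jdn - y // 100 + y // 400 - 32045
    return jdn - 32083

def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to a proleptic Gregorian datetime.date."""
    jdn = julian_day_number(year, month, day, gregorian=False)
    return date.fromordinal(jdn - 1721425)  # JDN 1721426 is Gregorian 0001-01-01

print(julian_to_gregorian(1582, 10, 4))  # 1582-10-14
print(julian_to_gregorian(1740, 3, 1))   # 1740-03-12
```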

The Julian calendar was introduced by Julius Caesar in 46 BC as a reform of the earlier Roman calendar, supposedly introduced when Rome was founded by Romulus in about 753 BC. The Roman calendar was a lunar calendar with 10 months (December being the 10th month) and a year of 304 days. This was unworkable for farmers who needed to be more aware of the approaching seasons, and so Numa Pompilius (the second king of Rome), in about 713 BC, introduced two more months, thus making the year 354 days (12 x 29.5 days), although an extra day was added to make it 355 for superstitious reasons. This still drifted with respect to the solar year and so several schemes were devised to try and improve it using intercalary days or months. However, the decisions about when and how much to intercalate were usually in the hands of the priests, who shamelessly misused that power (when they hadn’t neglected the actual need) for political reasons (e.g. changing the term of someone’s office) or financial advantage (e.g. taxes, rents). When the Julian calendar was introduced, so many corrections were needed to put things back in order that 46 BC was a year of some 445 days, often referred to as “the year of confusion”. In effect, there are many calendar variations here, and some rather irregular intercalations.[4]

The Revised Julian calendar was a variation of the Julian calendar (which is still used by parts of the Eastern Orthodox Church) conceived in 1923; its leap-year rule keeps it in step with the Gregorian calendar, at least until the year 2799.
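The 2799 limit follows from the leap-year rules. In the Revised Julian calendar a century year is a leap year only if dividing it by 900 leaves a remainder of 200 or 600. A short search, sketched below with illustrative function names, finds the first year in which that rule and the Gregorian one disagree:

```python
def is_gregorian_leap(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def is_revised_julian_leap(year: int) -> bool:
    """Century years are leap years only if year mod 900 is 200 or 600."""
    if year % 100 == 0:
        return year % 900 in (200, 600)
    return year % 4 == 0

def first_disagreement(start: int, stop: int):
    """First year in [start, stop) where the two leap-year rules differ, or None."""
    return next((y for y in range(start, stop)
                 if is_gregorian_leap(y) != is_revised_julian_leap(y)), None)

print(first_disagreement(1923, 10000))  # 2800
```

Both calendars treat 2000 and 2400 as leap years and 2100 to 2300 as common years; 2800 is a Gregorian leap year but not a Revised Julian one.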

Although often overlooked, the extra day in leap years of the Gregorian calendar was originally achieved by repeating the 24th February; a practice inherited from the Julian calendar. It wasn’t until 1662 that it was achieved by adding an extra day at the end of February. Today, a residual repercussion of this is a difference in the date of the feast of St. Matthias during leap years as celebrated by the Catholic and Anglican churches.[5] Earlier still, different Christian churches in Romanised Britain celebrated Easter one day apart.[6] In effect, knowing the name of a celebration or festival doesn’t uniquely determine its date.

The Islamic calendar is the only surviving lunar calendar. It consists of 12 months of alternating 29 and 30 days over a 30-year cycle, except in embolismic years where an intercalary day is added at the end of the 12th month. There are 11 such intercalations and so each 30-year cycle has 10631 days (30 x 12 x 29.5 + 11), and this keeps the month in synchronisation with the moon. There are some variations of the actual years in which the intercalation occurs but it is essentially a rule-based calendar. It is sometimes called the Tabular Islamic calendar in order to distinguish it from the “popular” Islamic calendar, where the actual start of each month is determined by a religious authority based on the first sighting of the crescent of the new moon. This empirical approach is obviously problematic as the sighting depends upon the weather and geographical coordinates, and the date can be one or two days different from the calculated one. It can therefore vary from one part of the Islamic world to another.[7]
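The cycle arithmetic can be checked with one common tabular variant, in which years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26, and 29 of each cycle are leap years; the compact test `(14 + 11 * year) % 30 < 11` selects exactly those years. Other variants place the intercalations differently, and no such rule can reproduce the observational calendar.

```python
def is_tabular_islamic_leap(year: int) -> bool:
    """One common tabular variant; other variants choose different leap years."""
    return (14 + 11 * year) % 30 < 11

def islamic_year_length(year: int) -> int:
    # Six 30-day and six 29-day months give 354 days; a leap year adds one.
    return 355 if is_tabular_islamic_leap(year) else 354

cycle_days = sum(islamic_year_length(y) for y in range(1, 31))
print(cycle_days)                  # 10631
print(round(cycle_days / 360, 6))  # 29.530556 (mean month; a lunation is about 29.530589)
```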

India’s calendars are particularly troublesome as they are “…intricate, complex, and subject to numerous local variations”.[8] Most are lunisolar but there are solar calendars too. When India became independent, in 1947, its first Prime Minister, Jawaharlal Nehru, set about a number of reforms, and one of these involved the Indian Calendar Reform Committee, appointed in 1952. They found that there were over 30 well-developed calendars in use across India, and they set about creating a unified Indian national calendar, which was adopted on 22 Mar 1957. However, India’s diverse population meant that the Gregorian calendar was still used (by Christians and for administration), and also the Islamic calendar. Indian and Gregorian dates are therefore presented side-by-side by The Gazette of India, in news broadcasts by All India Radio, and in calendars and communications issued by the Government of India.

In the older Indian calendars, the variations among the lunisolar ones included different month names, the date of the start of the year, the phase of the moon on which the year or month started, intercalation rules, and the era from which years were counted, thus making any attempt at general conversion to Gregorian dates “futile”.[9] Likewise, the solar calendars, used in parts of Bengal and Madras, “were susceptible to almost infinite variations”, thus making it very difficult to convert an Indian date to a Gregorian one without specific knowledge of the calendar and locality.[10]

A similar story of innumerable variations applies to the Chinese calendars.

What I wanted to emphasise in this first post is that conversion from these other calendars to our Gregorian one is not simply a matter of looking at the written form and then doing some arithmetic. Even when the calendar system was rule-based, rather than dependent upon observation or ad hoc decisions, the conversion may need information about the actual calendar variant being used, and about the location, religion, etc., of the people using it.

If an historian recorded a length in cubits then a very similar situation would arise since there were different cubit definitions used in different places. A write-up may give an indication of what that length might have been in feet (or metres) but the converted value would need substantiating, and it wouldn’t be a direct replacement for the original information.

To genealogists, a more familiar case might be how we handle dates and ages in our sources. Even if we encounter a date in our own calendar then it may be incomplete, or it may be secondary information. Although many genealogists treat such dates as “facts”, there’s no such thing in principle. With ages, the issue should be clearer still: we certainly don’t perform a simple subtraction of the age from the date of recording, and then use the result as a “fact”. Someone may have lied about their age, or it may have been age-next-birthday rather than age-last-birthday (as in early Canadian censuses), or it may have been rounded down (as in the 1841 census of England and Wales). The essential point is that the calculated date is not information that was in the consulted source; it is derived from it, and it may need more information in order to make that derivation. Any calendar conversion is also a calculation, and similarly distinct from source information.
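The age example can be made concrete. The sketch below turns a stated age into the range of birth dates it allows under each recording convention mentioned above. The convention names are hypothetical labels; the rounding rule assumes ages were rounded down to a multiple of five (as for adults in the 1841 census, taken on 6 June 1841); and, crucially, it assumes the age was not a lie. Even then it yields a range, not a date.

```python
from datetime import date, timedelta

def years_before(d: date, n: int) -> date:
    """Same day and month n years earlier; 29 Feb falls back to 28 Feb if needed."""
    try:
        return d.replace(year=d.year - n)
    except ValueError:
        return d.replace(year=d.year - n, day=28)

def birth_range(recorded_on: date, stated_age: int, convention: str) -> tuple[date, date]:
    """Earliest and latest birth dates consistent with a (truthful) stated age."""
    if convention == "last-birthday":
        low, high = stated_age, stated_age
    elif convention == "next-birthday":
        low, high = stated_age - 1, stated_age - 1
    elif convention == "rounded-down-5":  # true age anywhere from stated_age to stated_age + 4
        low, high = stated_age, stated_age + 4
    else:
        raise ValueError(f"unknown convention: {convention}")
    earliest = years_before(recorded_on, high + 1) + timedelta(days=1)
    latest = years_before(recorded_on, low)
    return earliest, latest

earliest, latest = birth_range(date(1841, 6, 6), 30, "rounded-down-5")
print(earliest, latest)  # 1806-06-07 1811-06-06
```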

In Part II of this post, I want to look at the representation of dates from other calendars, and why a general scheme should be important to us.



[1] “Introduction to Calendars”, USNO (http://aa.usno.navy.mil/faq/docs/calendars.php : accessed 10 Jul 2015), first paragraph.
[2] E. G. Richards, Mapping Time: The Calendar and its History (1998; reprint, Oxford University Press, 2005), pp. 92–97.
[3] An examination of fossil records has shown that a year once consisted of nearly 400 daily cycles, but the length of one such cycle is harder to determine. Observations have shown that the earth’s rotation is slowing down, and calculations suggest that it was once substantially faster, with some reports giving as little as 16 hours per day. That would result in a greater centrifugal force acting against gravity, and things would probably have felt lighter millions of years ago.
[4] David Ewing Duncan, The Calendar (London: Fourth Estate Ltd, 1998), pp.40–43.
[5] Richards, p.101.
[6] Duncan, pp.104–5.
[7] Richards, p.93, p.234.
[8] Richards, p.174.
[9] Richards, p.182.
[10] Richards, p.177.