Friday, 4 January 2019

Organising More Resources


Time for a rant since I keep on seeing the same old suggestions for organising digital resources that fail at "naming 101". Organising by person name, location, or date is not only restrictive but also impractical and unhelpful in many circumstances; and trying to put these terms into file or folder names is a road to nowhere.



So, Tony, what is your problem here? Well, there are several issues:

  • There's a huge difference between the physical organisation of material (digital or otherwise) and the indexing according to your mode(s) of access.
  • Coding in a file or folder name forces you to make a limiting choice. The classic example is a group photograph that has several people of different surname and family connections.
  • Access isn't always by just one category of index-term: you may want, for instance, all resources related to Proctors, in the city of Nottingham, during the 1950s; or, all Jessons who were present at a particular event.

I've said before that physically organising by the nature or provenance of the material is not only the archival way, but it has advantages for maintenance, changes of ownership, and even making inferences (e.g. identifying a person in a photograph). Anyone who splits up an inherited photographic collection should have their fingers taped together. In contrast, indexing helps you access your material according to various categories, such as surname, location, etc.

In Organising Digital Resources, I described the difference between these two concepts, and how indexing for your mode(s) of access can be done according to multiple inclusive categories; you're not forced to choose just one. Unfortunately, although we have real archivists in our community, the tendency is still to miss the analogy between their professional organisational schemes and the digital world, and so oversimplify such things to coding surnames, etc., into file and folder names. Professionals in the digital world do not do this, and their schemes would also be the ones used by archives to implement their own schemes.

Perhaps the best arguments against the way things should be done are: (1) that your software of choice may be rather limited, or (2) that browsing the resources in the absence of any specialised software leaves them hard to understand.

In a much older article, Organising Photographs, I mentioned the use of meta-data, and how this could be used to add important information (visible only to software) to images, or to add index-terms to all file types (including images) in order to aid in their access from a simple Windows search. This is not unlike writing information (or source labels) on the back of physical photographs, except that software could use the digital equivalent to help access them. A major goal of that article was to show that digital resources could be indexed using very simple software technology available on all our computers, in contrast to using some highly specialised genealogical software. Well, some people would still not like the invisible nature of that meta-data, and it's still poorly supported by standards, and hence by different computer operating systems.

So let's explore the analogies with physical artefacts. If you had a photograph album then you would probably have written details underneath each picture. If you didn't have an album — just a biscuit tin on the top of your wardrobe — then you may have written details on the back of each picture. However, I have some WWI photographs of soldiers that were sent to their families as postcards. That means you cannot write on either side without damage to the precious original, and in that case you might just have a separate piece of paper, or better still an envelope, with the salient details written on it.

One alternative for digital resources might be to use a simple non-specialised bit of software such as Excel, which uses multidimensional hierarchical indexes, effectively a scaled-down version of an OLAP database. It's proprietary, yes, and it's opaque, yes, but it can be kept alongside the digital resources.

Because I arrange my own digital material hierarchically, akin to a micro-archive (see Hierarchical Sources), I also use text files to describe the material at each level (e.g. for fonds, series, or even items), and this presents a very simple alternative that is a closer analogy to the "separate paper" idea: buddy files. For instance, in a collection of photographs, I would have a single text file of a fixed name (e.g. Description.txt) with all the details of what the collection is, where it came from, when, and who had it before that. Alongside each photograph (i.e. at the item level) I often have a buddy file of the same name with not just a plain-text description of the where, when, and who, but also tags that can be searched on.

For example, suppose I have an image of a family photograph called Picture1.jpg; I might then also have a Picture1.txt text file with tags as follows (descriptive text not shown here).

Figure 1 – Picture1.txt and Picture1.jpg

This has effectively three categories of hierarchical index-terms, and a search through the buddy files for #Proctor, #Nottingham, #1950s would throw up the name Picture1 whose associated image could be viewed unaided by any specialised software.
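To show how little software such a search needs, here is a minimal Python sketch. It is only an illustration under assumptions of my own: that a tag is any word beginning with '#' somewhere in the buddy file, that buddy files end in .txt, and that the collection-level Description.txt should be skipped. The function names find_items and tags_in are made up for this example.

```python
import re
from pathlib import Path

# Assumption: a tag is '#' followed by letters, digits, underscores or hyphens.
TAG_PATTERN = re.compile(r"#[\w-]+")


def tags_in(text):
    """Return the set of tags found in a buddy file's text, lower-cased."""
    return {tag.lower() for tag in TAG_PATTERN.findall(text)}


def find_items(root, wanted):
    """Return the names (without extension) of every buddy file under
    'root' that carries ALL of the wanted tags."""
    wanted = {tag.lower() for tag in wanted}
    matches = []
    for buddy in sorted(Path(root).rglob("*.txt")):
        # Skip the collection-level description; it is not an item buddy file.
        if buddy.name == "Description.txt":
            continue
        text = buddy.read_text(encoding="utf-8", errors="replace")
        if wanted <= tags_in(text):
            matches.append(buddy.stem)
    return matches
```

Calling find_items on the folder holding the collection, with the list ["#Proctor", "#Nottingham", "#1950s"], returns the names of the matching items, and each name leads straight to the image beside its buddy file. Because every wanted tag must be present, the categories combine freely rather than forcing a choice of just one.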

Here's another example for comparison:

Figure 2 – Picture2.txt and Picture2.jpg


How you name the item-level files is partly irrelevant, as long as they're unique at each level of organisation. For photographs, you could invent your own scheme of codes, similar to an archive, or use something more meaningful — it doesn't matter because anyone browsing the images directly would also have the buddy-file details on hand. For images of material obtained from an archive, though, and this would include any census images for England and Wales, I would strongly recommend using the assigned archival codes in the image names.

So, this is probably the simplest scheme possible, and it doesn't rely on hidden meta-data, or databases, or specialised genealogical software. Such software could still index your resources, as I've already explained, but this scheme provides your digital resources with plain-text notes and index-terms of their own — ones that would follow them if ever they were transferred elsewhere or duplicated for someone.

The organisation of your physical artefacts should follow the precedent set by archives, so why not organise digital resources in a similar fashion? For instance, my extended family has a large stone chess board, originally seated on a wooden table, dated 1859 with the name of my ancestor carved into it, and with the initials of an in-law who was a stone mason. I may have images of it but the artefact has a fundamental existence of its own, and I would need to index this in my software. This may be unusual, but many of us have letters, certificates, ephemera, other original documents, and photographs. Older photographs were obviously printed and so scans are derivative, but modern photographs are "born digital" and so it's the printed forms that are derivative. Either way, keeping paper-based copies is always wise. Believe it or not, there are people who recommend scanning old photographs so that the paper copies can be thrown out. No taped fingers for them; I recommend those nice white jackets that fasten at the back. :-)


So wouldn't it be a little messy to select one of these textual buddy files from the search results, find the corresponding image file(s), and then open it? Well, no, not at all. It's extremely easy for some programmer to create a tiny program to do this for you, much like the code attached to the aforementioned 'Organising Photographs' article. Put simply, you could right-click on the buddy file you want, and select 'Open with <ProgramName>', and that program would find the image file for you and automatically open it in place of the text file. To be more bullet-proof, it would be best to use a special file type rather than *.txt (e.g. *.meta), in which case the program could be registered as the one to always use for that file type, and you would merely have to double-click on the *.meta buddy file. There must be a commercial opportunity here.
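As a rough idea of how tiny such a program could be, here is a Python sketch. It is not the code from the 'Organising Photographs' article, just an illustration under assumptions of my own: the image sits in the same folder as the buddy file, shares its base name, and has one of the extensions listed in IMAGE_SUFFIXES. The function names are made up for this example.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumption: these are the image types kept alongside the buddy files.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp")


def find_image(buddy):
    """Return the image in the buddy file's folder that shares its base
    name, or None if there is no such image."""
    buddy = Path(buddy)
    for candidate in sorted(buddy.parent.iterdir()):
        same_name = candidate.stem == buddy.stem
        if same_name and candidate.suffix.lower() in IMAGE_SUFFIXES:
            return candidate
    return None


def open_with_default_viewer(path):
    """Hand the file to whatever program the operating system normally uses."""
    if sys.platform.startswith("win"):
        os.startfile(path)  # this call exists on Windows only
    elif sys.platform == "darwin":
        subprocess.run(["open", str(path)], check=False)
    else:
        subprocess.run(["xdg-open", str(path)], check=False)


def open_image_for(buddy):
    """Open the image that belongs to a buddy file; return False if none is found."""
    image = find_image(buddy)
    if image is None:
        return False
    open_with_default_viewer(image)
    return True
```

A wrapper script that passes its first command-line argument to open_image_for could then be registered against the *.meta file type, so that double-clicking a buddy file brings up its picture. Registering a file type is done through the operating system rather than in this code.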

[Free example code mentioned in Comments, and demonstration available on request]