Saturday, 19 December 2015

Organising Photographs



The question of how to organise your photographs in your file store, or even your general genealogical files, is a frequent one. Everyone has their own preferred scheme, but I want to try to add a different perspective on this subject. My suggestions use Windows as an example, but similar techniques are possible elsewhere.

The ultimate brick wall that everyone will stub their toes on is that there are multiple ways of categorising their photographs, and a simple filename cannot adequately embrace them all. For instance, you could name them by person (but which person if there’s a group?), by surname (again, which surname in, say, a wedding group?), by event, by place, or by date. Every attempt to achieve this using just the filename will be a compromise of some sort.

Figure 1 - There are different ways of grouping the same pictures.

I have written before that my own choice is to group the files by their provenance, and then rely on a software application to present them in other ways, and to associate them with descriptions, stories, timelines, and so on (see Hierarchical Sources). There are some issues with using a specialised application, but we will come to those in a moment.

Keywords

Another option is to use keywords, such that each picture can have an arbitrary number of keywords relating to personal names, surnames, places, events, dates, or whatever you want. This is good for finding related pictures in a large set, but is it ideal for organising them in a browsable way? Although products such as Adobe Photoshop Lightroom organise pictures by keyword, this is just another type of specialised application, and so has the same issues that I hinted at above; the keywords really need to be a core feature of the operating system.

Windows has a feature called Tags, which are effectively user-defined keywords that can be added to your files, but prior to Windows 7 only Microsoft Office documents supported them, and there weren’t many tools to make use of them. Before I talk about them, let me first present a bargain-basement equivalent that would work under, say, Windows XP (yes, there are still people who use XP). For illustration, let’s assume we have a folder containing the following three image files:

Ann_Jones_1985.jpg
Jane_Smith_1980.jpg
Joan_James_1983.jpg

By dividing up a filename using a character such as underscore, you’re effectively providing a set of keywords. Their order is not really important, as files can be found no matter where the relevant keywords appear. Words can be grouped together to create compound keywords by using a different character, such as the hyphen in Joan-Smith_John-James_Marriage_1970.jpg.
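To make the naming convention concrete, here is a minimal sketch in C of how such a filename could be taken apart into its keywords. The function name split_keywords and the two size limits are hypothetical, and turning the hyphen in a compound keyword back into a space is my own assumption about how you might want to display it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYWORDS 16      /* illustrative limit, not part of any standard */
#define MAX_KEYWORD_LEN 64   /* illustrative limit, includes the terminator */

/* Split a filename such as "Joan-Smith_John-James_Marriage_1970.jpg" into
 * keywords. The extension is dropped, '_' separates keywords, and a '-'
 * inside a compound keyword is turned back into a space (an assumption).
 * Returns the number of keywords stored in out. */
int split_keywords(const char *filename, char out[][MAX_KEYWORD_LEN], int max_out)
{
    assert(filename != NULL && out != NULL);

    const char *dot = strrchr(filename, '.');
    size_t stem_len = dot ? (size_t)(dot - filename) : strlen(filename);
    int count = 0;
    size_t pos = 0;   /* write position within the current keyword */

    for (size_t i = 0; i <= stem_len && count < max_out; i++) {
        /* Treat the end of the stem as one final separator. */
        char c = (i < stem_len) ? filename[i] : '_';
        if (c == '_') {
            if (pos > 0) {            /* skip empty keywords ("__") */
                out[count][pos] = '\0';
                count++;
                pos = 0;
            }
        } else if (pos < MAX_KEYWORD_LEN - 1) {
            out[count][pos++] = (c == '-') ? ' ' : c;
        }
    }
    return count;
}
```

Called on Joan-Smith_John-James_Marriage_1970.jpg with an array declared as char kw[MAX_KEYWORDS][MAX_KEYWORD_LEN], this yields four keywords: Joan Smith, John James, Marriage, and 1970.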

The old Windows XP Search box could search on multiple filename parts, separated by semicolons, and this would achieve a search-by-keyword.

Figure 2 - Searching by keyword under Windows XP.

In this example, the search is looking for all files with either “Jane” or “James” in their name. The actual ordering of the keywords in a filename might be chosen so that the default sorting achieves some vaguely useful grouping.

So how did this change under Windows 7? For a start, its new-style Search box allows Boolean operators so that you can now type “Jane OR James” (equivalent to the XP example, above) or “Jane AND James” (for which none of the three example files would have matched).
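The difference between the two Boolean searches can be sketched in a few lines of C. This is only an illustration of the logic, assuming a simple case-blind substring test on the filename; it is not how Windows Search is implemented, and the function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-blind substring test: does haystack contain needle? */
static bool contains_nocase(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return true;
    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;
        while (needle[i] != '\0' && haystack[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (needle[i] == '\0')
            return true;   /* the whole needle matched at this position */
    }
    return false;
}

/* "Jane OR James": true if the filename contains at least one term. */
bool matches_any(const char *filename, const char *const terms[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (contains_nocase(filename, terms[i]))
            return true;
    return false;
}

/* "Jane AND James": true only if the filename contains every term. */
bool matches_all(const char *filename, const char *const terms[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!contains_nocase(filename, terms[i]))
            return false;
    return true;
}
```

With the terms Jane and James, matches_any accepts Jane_Smith_1980.jpg and Joan_James_1983.jpg but not Ann_Jones_1985.jpg, while matches_all rejects all three, which agrees with the results described above.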

Another change in Windows 7 was that the support for file Tags was greatly increased. The file Properties dialog, on its Details tab, will show a Tags field if the current file-type supports them. Clicking to the right of the Tags label allows you to enter multiple keywords, separated by semicolons, and these are then hidden away inside the file’s meta-data.

Figure 3 - Entering keyword Tags in Windows 7.

In the Search box, where we had previously typed separate filename parts, we can now use terms such as “tag:Jane”, and it will then search for files with those Tags rather than ones with particular words in their filenames.

Figure 4 - Searching by Tags in Windows 7.

Again, we can use the Boolean operators to say something like “tag:Jane OR tag:James”. So what are the advantages of this scheme over the bargain-basement one using just the filename? Both schemes allow Boolean operators, and both operate case-blind. However, the Tags are discrete items of meta-data and so leave you free to name the file any way you want. Also, the Tag names are matched as complete words, so there’s no risk of an accidental match, such as “Ann” also matching “Anne” or “Anna”.
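The complete-word point is easy to show in code. The following C sketch assumes the Tags are available as one semicolon-separated string, as typed into the Tags field; the function names are hypothetical, and it illustrates the matching rule described here rather than anything Windows does internally.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-blind comparison of a tag (start pointer plus length) with wanted. */
static bool equals_nocase(const char *s, size_t len, const char *wanted)
{
    if (strlen(wanted) != len)
        return false;
    for (size_t i = 0; i < len; i++)
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)wanted[i]))
            return false;
    return true;
}

/* tags is a semicolon-separated list such as "Anne; Wedding; 1970".
 * Returns true only when one complete tag equals wanted, so "Ann"
 * does not match a file tagged "Anne". */
bool has_tag(const char *tags, const char *wanted)
{
    const char *p = tags;
    while (*p != '\0') {
        size_t len = strcspn(p, ";");     /* length up to the next ';' */
        const char *start = p;
        const char *end = p + len;

        /* Trim spaces around the tag. */
        while (start < end && isspace((unsigned char)*start))
            start++;
        while (end > start && isspace((unsigned char)end[-1]))
            end--;

        if (end > start && equals_nocase(start, (size_t)(end - start), wanted))
            return true;

        p += len;
        if (*p == ';')
            p++;                          /* step over the separator */
    }
    return false;
}
```

A substring search on a filename containing “Anne” would be satisfied by “Ann”, whereas has_tag("Anne; Wedding; 1970", "Ann") is false and has_tag("Anne; Wedding; 1970", "anne") is true.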

Windows 7 also allows you to sort your files by their Tags — look on the View menu, under Sort by — but keywords are still primarily a way of finding content rather than presenting it. If the advantages are so great for organising pictures, or any files, by their provenance, and for relying on a specialised application to present them in a much richer fashion — with the added context of stories, timelines, and so on — then why don’t we all do it that way?

This subject came up in a Google Hangout in DearMYRTLE's Genealogy Community, hosted by Pat Richley-Erickson (aka DearMYRTLE), on 19 Jan 2015. Twice during that Hangout (once at 15:00 into the recording, and then later at 35:40), Pat made the astute observation that relatives, and especially the younger ones, will just want to browse some “cool old photographs” and not mess around with a specialised application. It’s sad but true that if there isn’t a description directly visible when they open the file then they won’t find the details. Remember that in the traditional family albums there would usually have been something written under each picture.

The technology is there to put a description inside each picture — in that same meta-data area where the Tags live — and this could even include an optional “wire frame” diagram that could be overlaid to identify individuals in the picture. That could have relevant links for each of those people to the data held in your specialised genealogy application. You would probably have to write your own picture-viewer application in order to see all that content, but you would then be back to the same problem again.

Proxies

When you click on a file, your operating system checks what application is registered for opening a file of that type. Although you may use the same application for all your image file-types, it is possible to make the association type-specific; for instance, using Microsoft Paint (mspaint.exe) for *.bmp files and Microsoft Office Picture Manager (ois.exe) for *.jpg files. However, each association is fixed for a given file-type.

It is possible, though, to go via an intermediary application to make an intelligent choice for you. This would mean the vendor of your genealogy application producing a very tiny proxy application that looks at the image file you’re trying to open, and then determines whether to load it in the default image viewer (for that file-type) or in their own genealogy application.

Figure 5 - Using a proxy viewer.

I have written a sample C program that demonstrates this principle using alternative viewers for plain text files: proxy.c. It looks to see if the filename ends in a genealogical identifier of the form “ID-identifier” (e.g. Joan_James_1983_ID-1AF92G.jpg). It would be just as feasible to involve the folder path in its decision-making, or even to look inside the file’s meta-data for an identifier there, but this scheme was simpler.

If this sample proxy finds such an identifier then it launches a specialised viewer with the arguments: <filename> <identifier>, and in all other cases it launches a default viewer with the single argument: <filename>. When configured correctly, those young relatives could happily click on images anywhere on the computer, and each image would open in the appropriate viewer depending on whether it is part of your genealogy collection or not.
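For readers who want to see the shape of that decision, here is a minimal sketch in C. It is not the proxy.c mentioned above, only an illustration written under my own assumptions: that the identifier is alphanumeric, that it follows an underscore (or starts the name), and that it sits immediately before the extension. The function names and the quoting of the command line are hypothetical, and actually launching the viewer is left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Look for "ID-<identifier>" at the end of the filename stem, e.g.
 * "Joan_James_1983_ID-1AF92G.jpg" gives "1AF92G". Returns false if no
 * identifier is present or if it does not fit in id. */
bool extract_identifier(const char *path, char *id, size_t id_size)
{
    /* Skip any folder part: the name starts after the last separator. */
    const char *name = path;
    for (const char *p = path; *p != '\0'; p++)
        if (*p == '/' || *p == '\\')
            name = p + 1;

    const char *dot = strrchr(name, '.');
    size_t stem_len = dot ? (size_t)(dot - name) : strlen(name);

    /* Find the last "ID-" marker within the stem. */
    bool found = false;
    size_t marker = 0;
    for (size_t i = 0; i + 3 <= stem_len; i++)
        if (strncmp(name + i, "ID-", 3) == 0) {
            marker = i;
            found = true;
        }
    if (!found)
        return false;
    if (marker > 0 && name[marker - 1] != '_')
        return false;             /* e.g. "DAVID-1.jpg" is not an identifier */

    const char *start = name + marker + 3;
    size_t len = stem_len - (marker + 3);
    if (len == 0 || len >= id_size)
        return false;
    for (size_t i = 0; i < len; i++)
        if (!isalnum((unsigned char)start[i]))
            return false;         /* identifier must run to the end of the stem */

    memcpy(id, start, len);
    id[len] = '\0';
    return true;
}

/* Build the command the proxy would launch: the specialised viewer with
 * <filename> <identifier>, or the default viewer with <filename> alone.
 * Returns false if the command does not fit in cmd. */
bool build_command(const char *path, const char *default_viewer,
                   const char *special_viewer, char *cmd, size_t cmd_size)
{
    char id[64];
    int n;
    if (extract_identifier(path, id, sizeof id))
        n = snprintf(cmd, cmd_size, "\"%s\" \"%s\" %s", special_viewer, path, id);
    else
        n = snprintf(cmd, cmd_size, "\"%s\" \"%s\"", default_viewer, path);
    return n >= 0 && (size_t)n < cmd_size;
}
```

A real proxy would then pass the resulting command to the operating system (for example via CreateProcess on Windows) instead of just building the string.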

Yes, this would need some help from your genealogy application to ensure that your files have the correct identifier in their name, and hiding that information inside each file’s meta-data would be cleaner. What about the configuration, though? The proxy has to take over a number of file associations (one for each of the image types you’re interested in), and remember what their default viewers were so that it can invoke them when necessary. Well, that turns out to be quite easy: during installation, it would simply displace the existing default viewer for each file-type, and pass that viewer’s file-path to the proxy, either as another argument or via a command-line option. The new association therefore also serves as a record of the file-path of each default viewer. A later uninstall would then have those previous file-paths available so that it could put things back exactly as they were.

This approach can also be applied to non-image file-types, such as Word documents. This could make the difference between a machine that happens to hold your genealogy data, and a “genealogy machine”. Who knows, maybe someone will do this now.