I have been approached by the good people in the HSBC UK archives to help with a little project. The archives contain, amongst many other fascinating things, four boxes of war cards. Rachael Porter from the UK Archives team explains:
“Next year marks 100 years since the start of the First World War. Here at HSBC Archives we are keen to mark this anniversary and, in doing so, also showcase some of our records relating to this period. We have a set of index cards, naming Midland Bank staff members who went off to serve in the armed forces during the conflict. Each card records detailed information about the employee, and in some cases unique information about his service record, which cannot be found anywhere else. We’d love for members of these men’s families, and military history enthusiasts, to be able to access these records; and a project centered around digitizing them and making them available, whilst also tying in with the anniversary, would be a real achievement for us.”
The question is how to do that, and where, so that people get as much value as possible out of this amazing and important data. I am after a bit of help.
The how/where? (a few quickly thrown together ideas)
The boxes contain approximately 5,000 index cards. They have typed field names and handwritten details such as staff name, branch, rank, regiment etc. The back of each card includes notes on the person's time in the forces.
I think it would be great to scan the cards front and back and store them on a platform that allows other people to transcribe and add to the data. The actual scanning is tricky and time intensive (and there may be no way round that) because of both the volume and the fact that these cards are 100 years old, so sticking them in a duplex scanner may be a little risky. If anyone has experience of scanning index cards with these kinds of machines I would love to know more. Have you been involved in any projects that had to scan in a large set of paper-based data?
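To make the idea a little more concrete, here is a rough sketch of how one scanned card might be stored so that volunteers can fill in the details later. It is only an illustration: the class name, the field names and the search helper are my own assumptions based on the card layout described above, not an agreed schema or any existing platform's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WarCard:
    """One index card: two scans plus whatever has been transcribed so far.

    All names here are hypothetical, chosen to mirror the fields on the cards.
    """
    card_id: str                      # our own identifier, e.g. box and position
    front_scan: str                   # path or URL of the front image
    back_scan: str                    # path or URL of the back image
    staff_name: Optional[str] = None  # transcribed fields start out empty
    branch: Optional[str] = None
    rank: Optional[str] = None
    regiment: Optional[str] = None
    service_notes: str = ""           # free text from the back of the card

    def is_transcribed(self) -> bool:
        """True once every typed field on the front has a value."""
        return all([self.staff_name, self.branch, self.rank, self.regiment])


def search(cards: List[WarCard], term: str) -> List[WarCard]:
    """Case-insensitive search across every transcribed text field."""
    needle = term.lower()
    return [
        card for card in cards
        if any(
            needle in (value or "").lower()
            for value in (card.staff_name, card.branch, card.rank,
                          card.regiment, card.service_notes)
        )
    ]
```

Keeping the scans and the transcription in one record means a card can be published as soon as it is scanned and then improved over time, which is roughly how the crowdsourced projects mentioned below seem to work.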
A platform like the one built (in a week) for the Guardian MPs’ expenses investigation sprang to mind. This allowed masses of PDF scans to be uploaded and then provided some simple yet powerful tools for annotating and categorising the data. Unfortunately the site is no longer live but there are some good reads on how and why they built it.
Another good example of the genre is Old Weather, which has published thousands of old nautical weather logs and asks people to transcribe them. Whether it is feasible for the project to build something to this level is debatable (small budget and short timescales, as usual), but it would be great to try because it would be reusable for future data sets rather than just being a one-off exercise in scanning and tagging data.
I am looking for any platforms that support this kind of data load and amendment. At a basic level we could use Flickr to upload the images, then use tagging, sets (or whatever they are called now), comments etc to try and build a usable and searchable set of data. Evernote was another tool that sprang to mind, to capture and attempt to transcribe the data, but I am not sure if it is really suited to this kind of task, especially if you wanted to build something else on top of it.
I am also looking for suggestions of other tools that would assist with this, whether it be a platform like Flickr or a set of open source tools somewhere. Anything we can have a play with.
Also, are there any organisations or people that specialise in this, other archives or museums for example? Any specialist military history sites? Once the data is scanned and annotated, where should it live? Submitting the data set to the National Archives was a good suggestion by my colleague, as it seems they have a nice looking API.
This was a very quick scrawl of ideas. The key for me is tools to help with the capture, storage and annotation of the data. Any help would be greatly appreciated. Feel free to get in touch via the comments below or Twitter if you prefer.
Update 14:30 29/05/13
I have had some lovely ideas shared via Twitter and on our internal blog at work. Mechanical Turk has had many mentions and I was foolish not to include it. Simon shared the brilliant looking open hardware book scanner. Please keep them coming, you lovely people.
Update 10:30 31/05/13
I have had some pointers to Guardian staff who worked on the MPs’ expenses system and an offer to help code such a system from a brilliant man who used to work for the Guardian. We may have a solution from our Central Scanning department in Coventry, but we cannot test it until the week of June the 10th due to system access requirements. Fingers crossed. Rachael has been in touch with the Imperial War Museum. Things are looking good.
Update 15:40 26/06/13
It has been slow progress over the last few weeks, but then some interesting things happened within a couple of days. First, we have the go ahead and the means to scan the cards. This will be happening next Wednesday in Coventry, where we will be running the cards through these scanning beasts. We also had a great meeting with Luke Smith (see his great comment below for links to some interesting resources) from the Imperial War Museum, who is working on the fantastic looking Lives of the First World War http://www.livesofthefirstworldwar.org/. We have also received some very good advice, pointers and introductions to interesting people from Kim Plowright. It is getting a bit exciting.