Look at all the people…

A few months ago, one of my colleagues, Paul Hammer, a software developer with the UMD Libraries’ Software Systems Development and Research (SSDR), stopped by my office and mentioned to me that something in one of my recent blog posts was bothering him. Specifically, it was these two sentences:

Unfortunately, unlike our dependable analog collections, keeping track of all of this digitized content can sometimes be unwieldy.   One of my big goals is to reach the point where an inventory of these digital collections can provide me with the equivalent of a “Shelf location” and statistics at the push of a button.

Paul reminded me that a lot of human effort, management and coercion went into acquiring, tracking, cataloging and circulating information in the analog world. If the staff, managers and profession were not diligently encouraging librarians, archivists and other professionals to use similar standards and practices, then no two collections would be remotely comparable. He noted: “We need to recognize that this effort is just as big and difficult in the computer world. Computers do not do all of this work for you regardless of how much we wish it were otherwise. Computers just offer a really big room of shelves on which to put things and the ability to program helpers. Helpers who are only capable of doing *exactly* what you ask of them, at nearly the speed of light.”

I want to thank Paul for putting things in perspective. First, his comments reminded me that Rome was not built in a day. Second, as Paul and many of the recent projects I have worked on have shown, computers will only do exactly what you tell them to do and only contain as much logic as humans provide. Third, I think that it is safe to say that standards and best practices are even more important in the digital world than in the analog.

Last year, the UMD Libraries received funding for a project to digitize a portion of correspondence written by the American author Katherine Anne Porter, whose papers reside at the University of Maryland. What seemed at first to be a straightforward project turned into quite a complex and interesting one that is still not 100% complete. At least a dozen UMD Libraries’ staff participated in some portion of the project, not to mention external parties such as our digitization vendor. Joanne Archer in Special Collections and University Archives (SCUA) managed the project. Two content specialists within SCUA (Librarian Emeritus Beth Alvarez and PhD candidate Liz DePriest) selected the approximately 2000 letters for the first phase of digitization. Robin Pike, Manager, Digital Conversion and Media Reformatting (DCMR), facilitated the contracts and negotiation with the digitization vendor. The correspondence was digitized in eight batches, and Special Collections staff had to prepare metadata for every letter and prepare the packages for delivery. Once digitization was complete, Eric Cartier (DCMR) performed QC on all of the deliverables (TIF, JPG, OCR text and hOCR XML). Trevor Muñoz, Assistant Dean for Digital Humanities Research, used the raw data to develop several proof-of-concept possibilities for future data use and analysis. Josh Westgard, graduate assistant for Digital Programs and Initiatives (DPI), facilitated transfer of the files for preservation.

And that is not all. Fedora as a repository is an excellent example of a computer system that needs to be told exactly what to do. We had not, to date, added any complex objects of the type of these letters (digital objects represented by an image, an OCR file, and an hOCR file). DPI gathered the requirements for this new object type (UMD_CORRESPONDENCE) and delivered them to Software Systems Development and Research (SSDR). Ben Wallberg, Manager, SSDR, and two developers, Irina Belyaeva and Paul Hammer, worked to translate those requirements into reality. What followed was a period of testing and analysis. We currently add content to our Fedora repository in three ways: 1) one-by-one using a home-grown web-based administrative interface, 2) using project-specific batch loading scripts that require developer mediation, and 3) using a batch loader developed by Josh Westgard in DPI that currently only works with audio and video content. For the Katherine Anne Porter project, logic dictated that we go with Door #2 and use a project-specific batch loading process. In this case, SSDR and DPI agreed to use this as an excuse to develop and test an alternate method for batch ingest, with an eye towards developing a more generic, user-driven batch loader in future.

Irina and Paul worked on the batch loader for Katherine Anne Porter, and, when it was ready for testing, we ran into a series of minor but educational complications. First, it was necessary to massage and clean up the metadata much more than anticipated, since SCUA had been using the spreadsheet to capture more information than was needed for ingest. Second, other types of metadata errors caused the load to fail numerous times. This led, however, to the development of more rigorous validation checks on the metadata prior to ingest. After the load was complete, I worked with Josh Westgard to analyze its success, and we uncovered additional minor glitches, which we will account for in later loads.
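To give a rough sense of what a pre-ingest validation check can look like, here is a minimal sketch in Python. It is not the loader Irina and Paul wrote. The column names (identifier, title, date) and the function names are hypothetical, since the actual spreadsheet fields are not described here. The idea is simply to keep only the columns needed for ingest and to report problem rows before anything is loaded.

```python
import csv
import io

# Hypothetical column names; the real spreadsheet's fields are an assumption.
REQUIRED_FIELDS = ("identifier", "title", "date")


def validate_rows(rows, required=REQUIRED_FIELDS):
    """Split rows into (clean_rows, errors) before any ingest is attempted.

    Rows missing a required value, or repeating an identifier, are reported
    as errors. Columns not needed for ingest are dropped from clean rows.
    """
    clean, errors = [], []
    seen = set()
    # Spreadsheet row 1 is the header, so data rows start at 2.
    for line_no, row in enumerate(rows, start=2):
        missing = [f for f in required if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"row {line_no}: missing {', '.join(missing)}")
            continue
        ident = row["identifier"].strip()
        if ident in seen:
            errors.append(f"row {line_no}: duplicate identifier {ident}")
            continue
        seen.add(ident)
        clean.append({f: row[f].strip() for f in required})
    return clean, errors


def validate_csv(text):
    """Validate spreadsheet data exported as CSV text."""
    return validate_rows(csv.DictReader(io.StringIO(text)))
```

A load would then proceed only if the list of errors came back empty; otherwise the report goes back to whoever maintains the spreadsheet.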

The work is not complete. The letters are ingested, but not viewable. We still need to make changes to both our back-end administrative tool and our front-end public interface in order to accommodate this new content type. And who knows what other types of user needs and requirements will necessitate additional work? The data itself is rich and interesting. Our hope is that it will be used both by scholars conducting traditional types of archival research and by digital humanists interested in deciphering and analyzing the texts by computer-driven means.

This spring, Digital Systems and Stewardship hired its first ever Project Manager. Ann Levin comes to the UMD Libraries with years of experience working on systems much more complex than our own. As is obvious from the project description above, all of our work currently touches many different people with different skills and priorities within our organization. It is our hope that we can start to formalize some of this work, develop more consistent workflows, and develop policies and procedures that ensure adherence to specified best practices and standards moving forward. The work has already started. As Paul correctly pointed out to me several months ago, working with computers requires just as much human involvement as some of our analog work, if not more. Planning is key. One reason the word “digital” causes instant anxiety for many people is that, just as access and indexing can move much more swiftly in a digital system than in an analog one, it is also possible to eliminate data entirely in an instant. Paul provided this analogy:

Imagine an archive where everyone working there had the power to empty and restock the shelves with a wave of their hand.  That any given shelf could suddenly disappear.  That a box that used to be really popular can still be taken off the shelf but we have forgotten how to open it.  All of these things are all too possible in digital storage.  Think of the extra vigilance necessary just to know that what you have is really what you have.

Scary. But my original sentiment remains the same. With every new project, we move closer towards trusting our work, and reaching a point where creating, managing, and providing access to digital content really can seem as simple as the “push of a button.”  We just need to recognize all of the work, effort, and vigilance that goes into creating that single button.