Unstuck in the Mud: Concrete Tasks for Forward Motion

Last time, we talked about the first steps we were taking to identify the workflows and procedures that need to be teased out and codified as official policies for Special Collections born-digital materials. Over the last month our group has been revising existing policies and drafting new ones where necessary. Given the title of the post, it’s important to point out that one way to prevent getting stuck is to accept that first drafts can be rough-and-ready. It’s too easy to get hung up on word choice and formatting when writing formal policy language; the key is to just start writing and iron out the details later.

Another common sticking point for working groups is the existence of (real or perceived) external dependencies. “We can’t finish our work until they finish theirs!” The Born-Digital Working Group consists of two active subgroups, Tools and Policies/Procedures, supported by a less-active Administrative group for when higher-level support is needed. As we drafted policies, it became increasingly obvious that many decisions depended on input from the Tools group and on the larger institutional capacity for digital preservation. What file formats can we safely claim to preserve? What physical formats are we equipped to work with? The Tools group was facing a similar quandary: How could they firmly select tools without knowing the workflows (i.e. procedures) and the policy requirements that the tools need to meet?

Although the sub-groups were partly created to avoid the difficulty of finding large blocks of time compatible with diverse schedules, there was no helping it—the two groups had to spend some time together to solve these issues. With deadlines looming, we scheduled a joint meeting for April 1, informally dubbed “The Conclave.” No one was allowed to leave until we resolved the problems and had clear tasks to see us through to our All-Hands meeting in May.

Some of the topics we discussed included a debate over the creation of disk images and how we can clearly articulate the implications of this process to donors; the need to determine more precisely what file formats we will commit to migrating over time; and the identification of multiple work spaces for files that would ensure the appropriate levels of access. Perhaps the most helpful exercise was a detailed walkthrough of the BitCurator workflow, led by Porter Olsen and drawn on the whiteboard by Joshua Westgard. Doing this clearly illustrated where each group needed input from the other, allowing us to break off and tackle specific problems for 30 minutes before reconvening to wrap up and assign concrete next steps.

One of the outcomes of this three-hour discussion was a realization that we need to more clearly define the goals of a Special Collections born-digital records program. Drawing on previous surveys of staff needs and expectations, we have begun to concisely define the staff and system needs for working with these materials. We have also been coordinating more closely with Jennie Knies in the UMD Libraries’ Digital Stewardship unit to translate the collection requirements into technical requirements. Our IT developers will use these technical requirements to examine software and tools. The goal is to make sure that we have documented our full vision, regardless of technological limitations (or possibilities). Our development team will then work to identify the best way to implement this vision within the context of our other requirements and our existing systems. We have created a spreadsheet that attempts to tie all of this information together, linking requirements, potential tools, workflow stage, policy, and priority.
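For readers who want to picture that spreadsheet, here is a minimal sketch of how such rows might be modeled and exported. The column names follow the five categories listed above; the sample rows, the numeric priority scale, and the names `RequirementRow` and `to_csv` are illustrative assumptions, not a copy of our actual spreadsheet.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RequirementRow:
    """One spreadsheet row linking a requirement to tools, stage, and policy."""
    requirement: str
    potential_tools: str
    workflow_stage: str
    policy: str
    priority: int  # assumed scale: 1 is the highest priority


# Hypothetical sample rows, invented for illustration only.
rows = [
    RequirementRow("Identify file formats on received media",
                   "BitCurator", "Analysis", "Collection policy", 2),
    RequirementRow("Create disk images of donated media",
                   "BitCurator; FTK Imager", "Acquisition", "Donor agreement", 1),
]


def to_csv(items):
    """Return the rows as CSV text, sorted so the highest priority comes first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(RequirementRow)])
    writer.writeheader()
    for item in sorted(items, key=lambda r: r.priority):
        writer.writerow(asdict(item))
    return buffer.getvalue()


csv_text = to_csv(rows)
```

Keeping every requirement on one row with its workflow stage and governing policy makes it easy to sort or filter by whichever column a subgroup cares about.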

We will all come together as a group in May to see what we have accomplished over the course of the semester. Sometimes it seems like we have been spinning our wheels, but the Conclave helped pull us out of the mud and sharpened our focus and priorities for the future.

Born-Digital Working Group: Configuring FRED

Submitted by: Eric Cartier, April 5, 2013

FRED at UMD Libraries

In mid-March, the Tools subgroup met FRED, our Forensic Recovery of Evidence Device. The subject lines we’ve shared since then (e.g., “tinkering with FRED today”) reflect the approach we’re taking:  careful, playful, open-minded. We marveled at all the ports, laid out and photographed the various cables and adapters included in the toolbox, and took turns at the keyboard. There was much to do before any imaging occurred, though.

We spoke at length about network security, viruses, connecting to the Internet, and safeguarding personally identifiable information, which we’re sure to obtain in future images we make. Porter noted that Digital Intelligence, the company that manufactures FRED, assumes that one will connect the machine to the Internet, while Josh played the devil’s advocate, acting Thomas Pynchon-paranoid. The immediate action we took at the conversation’s conclusion was to connect to the Internet via a USB network adapter to install Microsoft Security Essentials. Next we updated all the Windows, Adobe, and Java applications. A clean machine, we agreed, should be virus protected and fitted with all the latest software updates.

The FRED system has two drives, one of which is dual partitioned into Windows 7 Ultimate (64 bit) and Win98 DOS. This is the operating system environment we initially worked in, where we made other essential downloads including BitCurator and Oracle VM VirtualBox. Later, because BitCurator is native Linux, we chose to install SUSE Linux 12.1 on FRED’s empty DATA drive.

FRED accessories

Returning to Windows 7, the first device we connected to the UltraBay 3D Hardware Write-Blocker was Digital Stewardship’s 2 TB external hard drive, which contained images of some media from the Bill Bly Collection. Tableau Imager didn’t recognize it, nor did it register a 2 GB thumb drive that we inserted in the USB 3.0 port, although each device was visible on the list of the computer’s drives. Reading through the text-based instructions again, we discovered that the UltraBay has a power supply independent of the FRED tower (Digital Intelligence does not include diagrams or screenshots in its instructions), which, once turned on, allowed us to image the thumb drive. No matter which target directory we selected, however, the external hard drive repeatedly failed to image, due to lack of storage space. Tableau Imager offers EnCase E01 and Raw Disk dd imaging options, both of which are set to capture all the bits, so 2 TB was a bit much to ask of the machine.
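The storage problem is easy to check ahead of time. The sketch below assumes a raw image needs at least as many bytes as the source device holds, whatever portion of that device is actually in use; an E01 image may come out smaller if compression is enabled. The function names and the 5 percent safety margin are our own illustrative choices, not part of Tableau Imager.

```python
import shutil


def required_bytes(source_size_bytes: int, margin_percent: int = 5) -> int:
    """Bytes to reserve for a raw image: the full device size plus a margin."""
    return source_size_bytes + (source_size_bytes * margin_percent) // 100


def can_hold_raw_image(source_size_bytes: int, target_dir: str,
                       margin_percent: int = 5) -> bool:
    """Return True if target_dir has enough free space for a raw image."""
    free = shutil.disk_usage(target_dir).free
    return free >= required_bytes(source_size_bytes, margin_percent)


# A "2 TB" drive, taken here as 2 * 10**12 bytes (an assumption about the label).
two_tb = 2 * 10**12
fits_here = can_hold_raw_image(two_tb, ".")
```

Running a check like this against each candidate target directory before imaging would have told us right away that no location on the machine could hold the external drive's image.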

Our progress configuring FRED has been fun and sometimes frustrating, but always steady. Over the next couple of months, our goal is to attempt to image every imaginable format on FRED and our BitCurator Digitization Workstation. Which system, with which software (BitCurator, Tableau Imager, FTK Imager), works most effectively? Learning what’s possible to accomplish with our equipment will be a beneficial exercise to complete before the arrival of our National Digital Stewardship Residency fellow in September.

Born-Digital Working Group: Policies and Procedures for Collecting Born-Digital Material

The Born-Digital Working Group, Policies and Procedures subgroup, has spent February examining the changes we will need to make to existing policies to accommodate born digital material. The goal of the subgroup over the course of the next few months is to:

  • examine current Special Collections policies such as collection development policies, donor agreements, and the UM processing manual
  • review policies that consider born-digital or electronic media at other institutions, especially within the AIMS project
  • create modular policies and agreements for the UMD Libraries that consider born-digital media
  • identify the input we will need from the Administrative and Tools subgroups that will determine the content of some of the policies.


Special Collections does not currently have an overarching collections policy. Instead each subject area within special collections has smaller, separate policies, none of which specifically address collecting born-digital material. Our subgroup will develop a policy for born digital material that will provide Special Collections staff who are working with donors a clear understanding of our capability to provide long term stewardship of digital material. It will also give guidance on the type of information that should be gathered at the early stages of donor development.  We expect that we will draw heavily on the born-digital sections of other institutions’ existing policies.

Examining the existing donor agreements at first glance seems to be the most straightforward aspect of our work. Special Collections uses a standardized deed of gift form which is modular in format and takes into account various rights, privacy, and use restrictions. We plan to add points and revise current statements to consider born-digital media. However, some of the questions we need to reflect in the donor agreements include how born-digital material will be transferred or captured, donors’ preference in terms of files previously deleted but recovered in the transfer process to the library, the scope of what we can provide in terms of preservation of the born-digital material, and specific conditions on access to materials. Although the donor agreement seemed the easiest place to start, it became clear that establishing what we can and are willing to collect (i.e. the collection policy) is the critical first step for this group. It’s also clear that we need to work closely with our tools group to understand what will be technically feasible at the University of Maryland.

While part of the scope of this group will be making changes to the Special Collections Processing Manual it is already clear that this will happen much later down the road once the tools group has made recommendations for ingesting and accessing born digital materials. 

Fortunately, we are not the first to begin work on these issues and we will be relying heavily on the work of other institutions. Our first steps are to examine the following resources: 


The BDWG has started its work in earnest at this point, and the questions we need to answer are becoming clearer. Our FRED (Forensic Recovery of Evidence Device) has arrived, so soon we will be able to start thinking more concretely about workflows and procedures.

Alas, poor Metadata!

Submitted by: Jennie Levine Knies, February 8, 2013

The Born-Digital Working Group has already undergone a radical change since the last blog post.  Originally, the group members divided into four subgroups in order to tackle the different aspects of the born-digital workflow.  We are now three.  RIP Metadata subgroup. The original intent of the Metadata subgroup was to look at everything needed to create a properly-described submission information package (SIP).   The group met on January 28 and quickly discovered that it was both very easy and very difficult to talk about this topic in a vacuum.  We discussed the redundancies not only between our work and the work of the Tools subgroup, but also with future decisions about access to content.   After much soul-searching, and a confusing white-board diagram involving a monkey, a hat, and a floppy disk, we suggested folding the Metadata subgroup into the Tools subgroup and focusing more on the initial acquisition and processing of born-digital content. Understanding the digital files and how to accession them on the digital shelf is our first real challenge.

The Tools subgroup will be using two different types of workstations to develop workflows to image, analyze, and prepare the born-digital content for submission into our repository.  In the non-digital world, the work of the Tools subgroup equates to picking up archival materials from a donor, moving them from the garbage bags in which they were stored to clean records-center cartons, assigning an accession number, and describing them enough that a basic accession record can be created.  We envisioned the work of the Metadata subgroup picking up at this point – at the point where the archivists would appraise, describe, place in context, and arrange the content. This is where the monkey and the hat come into the picture.

The Beast from Ryder, Djuna Barnes, 1928 (http://hdl.handle.net/1903.1/9024)

The University of Maryland currently uses a home-grown system for capturing archival description. The “monkey” is a Microsoft Access database fondly referred to as “The Beast,” into which Special Collections librarians enter all of their archival description using convenient forms, and from which it is then extracted by a Java-based script into a neat EAD-encoded archival finding aid and distributed online via ArchivesUM. The Beast allows for the basic metadata collection allowed by EAD – we gather series, sub series, box, folder, title, dates, physical description, and restriction information at the folder level, and occasionally at the item level. The “hat” is our Fedora-based Digital Collections repository. In a separate workflow, the University of Maryland is creating digitized content and ingesting it into our digital repository. The Digital Collections descriptive and technical metadata are also home-grown (something we hope to migrate out of in the not-so-distant future) and also much more detailed than what you might find in a traditional EAD finding aid. Like the archival collections, some material is described at a folder level and some at an item level, but item-level description is more common here. Currently, the two systems do not talk to each other. We developed a process to ingest the EAD finding aids into our Fedora-based Digital Collections at the time of ingest into ArchivesUM. But what is searchable in Digital Collections for the EAD finding aids is really just a collection-level record. As a side note, the University of Maryland Libraries also host an institutional repository (DRUM), which is entirely separate and based on DSpace. DRUM already houses a great deal of born-digital content, and the distinction between what is there and what is collected by our Special Collections may be growing less clear. We also have large amounts of data (both digitized books and web archives) currently stored in the Internet Archive, not to mention descriptive metadata in our catalog, that ultimately will need to be integrated with our other digital content.
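To make the Beast-to-EAD step a little more concrete, here is a minimal sketch of turning folder-level records into EAD-style component markup. It is not our actual export script (which is Java-based); the record fields, sample values, and function names are assumptions for illustration, and the output is a fragment rather than a complete, valid finding aid.

```python
import xml.etree.ElementTree as ET


def folder_component(record):
    """Build a <c02 level="file"> element from one folder-level record."""
    c02 = ET.Element("c02", level="file")
    did = ET.SubElement(c02, "did")
    ET.SubElement(did, "container", type="box").text = str(record["box"])
    ET.SubElement(did, "container", type="folder").text = str(record["folder"])
    ET.SubElement(did, "unittitle").text = record["title"]
    ET.SubElement(did, "unitdate").text = record["dates"]
    return c02


def series_component(title, records):
    """Wrap folder-level components in a <c01 level="series"> element."""
    c01 = ET.Element("c01", level="series")
    did = ET.SubElement(c01, "did")
    ET.SubElement(did, "unittitle").text = title
    for record in records:
        c01.append(folder_component(record))
    return c01


# Hypothetical rows, standing in for what a database query might return.
sample_records = [
    {"box": 1, "folder": 1, "title": "Letters", "dates": "1990-1992"},
    {"box": 1, "folder": 2, "title": "Drafts", "dates": "1993"},
]

series = series_component("Correspondence", sample_records)
xml_text = ET.tostring(series, encoding="unicode")
```

The nesting mirrors the series, box, folder, title, and date fields gathered in the forms, which is why folder-level description maps so directly onto a finding aid while richer item-level metadata does not.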

Where do born-digital materials fit into all of this? Like the rest of the five linear miles of archival collections at the University of Maryland, these items are part of archival collections, just in newer formats. Like the content in Digital Collections, they are digital, the difference being that they are not surrogates of analog items. Should born-digital materials be described in an archival finding aid? Should they be discoverable and viewable in some way in their native environment? Yes. Will our staff and users be happy about having to learn how to use another silo system to keep track of born-digital materials? Probably not. And this is why we dissolved the Metadata group. Until we know what our initial analysis and boxing/packaging processes are capable of returning to us, it is a little difficult to envision by what means the archivists will be able to describe the material. Parallel to the work of the Born-Digital Working Group is the expectation that in the next two years, the University of Maryland Libraries will migrate out of their home-grown system for archival finding aids, and move to something more widely adopted, most likely ArchivesSpace. When that happens, more dynamic automated linking between Digital Collections and the archival management tool will be developed. Thinking holistically, managing born-digital content needs to fall into that workflow somehow. We still envision that the Tools subgroup will gather some requirements that will really fall more into the area of archival description, and we still plan to do some experimentation with tools that allow for metadata gathering, such as BitCurator, Archivematica and Curator’s Workbench, to better understand how these work and what parts of the workflow they might help us to capture. Is this the right approach? After much thought, it feels more manageable to us, and anything that keeps us from feeling paralyzed or overwhelmed is a step in the right direction.

The Born-Digital Working Group Divides and Conquers

Back in October, we introduced the MITH/UM Libraries Born Digital Working Group (BDWG) with a post about processing the Bill Bly Collection.  Since then we’ve firmed up our goals (“start collecting/working with diverse born digital materials in the libraries”  being a bit nebulous and… huge) and divided ourselves into sub-groups to conquer them. Goals and groups decided upon, we’re going to try to give bi-weekly updates on our work, cross-posted to the MITH and Special Collections blogs. We’ll be cycling through the groups to ensure every area is covered; those areas are: tools, policies/procedures, metadata, and administration.

Tools
Originally called “Technology/BitCurator/hardware/software/tools,” this subgroup is dedicated to pre-processing work–everything that happens before an acquisition is deposited in the digital repository. The Tools group is led by Jennie Levine Knies and includes Amanda Visconti, Eric Cartier, Matt Kirschenbaum, Porter Olsen and Rachel Donahue.

Policy/Procedures
Dedicated to developing the many guidelines necessary to implement new digital workflows in the libraries. The Policy/Procedures group is led by Joanne Archer and includes Caitlin Wells, Daniel Mack, Rachel Donahue, Robin Pike, and Trevor Muñoz.

Metadata
Dedicated to data about data. Specifically, this group will look at everything that’s needed to create a properly-described submission information package (SIP). The Metadata Group is led by Joshua Westgard and includes Eric Cartier, Jennie Levine Knies, and Rachel Donahue.

Administration
Dedicated to providing the high-level support needed by change agents everywhere. Administration was originally lumped in with Policy/Procedures, but we broke it out to keep things specific and manageable. The Administration group is led by Trevor Muñoz and includes Daniel Mack, Jennie Levine Knies, Joanne Archer, Matthew Kirschenbaum, and Rachel Donahue.

As you read our posts in the future, bear in mind that we’re essentially starting from scratch. We’re unlikely to have anything amazingly groundbreaking to share, but we hope that being transparent about our work might help other organizations undergoing similar changes.

We Descended: Processing the Bill Bly Collection with the UMD Born-Digital Working Group

In early September of 2012 the University of Maryland Libraries and the Maryland Institute for Technology in the Humanities (MITH) joined forces to launch their Born-Digital Working Group. As a vehicle for leveraging some rapidly emerging institutional strengths in processing and maintaining born-digital collections, as well as conducting research around the challenges associated with those activities, the group was a natural way to give those efforts some internal structure and coherence. It also formalized a relationship between a university research library, in particular its Special Collections department, and a working digital humanities center. We are excited for this collaboration, one in which we can each learn from the other, bringing different sets of skills to the table to begin tackling the issues of stewarding born digital materials.

One area where work has already begun is in processing the papers of Bill Bly, who joins Deena Larsen as an early hypertext and electronic literature pioneer whose manuscript materials and collections of computer hardware and software are housed at MITH. The Bly collection includes a complete run of titles from the innovative hypertext publisher Eastgate Systems (still in their original packaging), as well as his own personal collection of hypertext fiction and ephemera harvested in the wild, and important records associated with community events like the 1999 CyberMountain conference. For MITH, the collection tantalizes us with questions related to our research agenda in textual scholarship and media archaeology. (Intriguingly, Bly is perhaps best-known for his ongoing fiction series We Descend, which features an archivist as its protagonist.)

One defining feature of the Bill Bly collection is the eclectic nature of the objects that it contains. In addition to the papers commonly found in a collection, the Bly collection also includes two laptops used by the author while writing We Descend; software manuals that range from the highly-specialized HyperCard hypertext authoring software to basic Mac OS user guides; vintage keyboards and mice; and loose floppy diskettes. Making appraisal decisions is key to archival work. What do we keep? What do we discard? These questions quickly arose in the Bly papers as well. The Born-Digital Working Group had a sustained discussion about what to do with the software manuals included in the Bly collection. Were they an important part of understanding the technological environment in which the author worked, and therefore essential to the collection? Or, because they are mass-produced books of no specific connection to Bly’s work, could they be moved into a reference library with a separation sheet indicating that they were originally contained in the collection? While we haven’t yet come to a definitive answer, it seems clear that similar conversations will become increasingly common in the archival community.

Gone are the days when an author’s papers are actually, well, papers. In addition to pens and typewriters, we now use keyboards, mice, even dictation software as the means by which we write. Will no author’s collection be complete without a functioning vintage system running the same version of Dragon NaturallySpeaking that the author used? Or, will practical constraints compel us to a different solution?

For Special Collections, the Bill Bly Papers allows staff to test-drive procedures for dealing with material of this type. As we started work on the papers last week our goal was to establish basic intellectual control over the collection, starting with the paper portion and moving to the electronic. We wanted to begin to understand where our accessioning procedures would be impacted by the existence of born digital material. As Special Collections is at the very beginning stages of dealing with born-digital material, working with the Bly papers allows us to begin conceptualizing how procedures and workflows for hybrid collections of papers and electronic materials might be different.

Thinking about all the steps involved in processing a hybrid collection can seem overwhelming. Luckily, projects already exist to guide us through this process, and we are consulting both OCLC’s report “You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content on Physical Media” and the AIMS white paper on born digital collections. Looking at the guidance in these reports in concert with our very first baby steps on this project, a number of questions have arisen already:

  • Does the legal agreement with this donor adequately cover digital material? How will we need to modify special collections donor agreements to cover born-digital material?
  • Our inventory consisted of recording the number and type of disks. Clearly, this is not truly accessioning these materials. What tools are available that will allow us to accession the files on those disks? Will we bother dealing with commercial software programs?
  • Will we use forensic imaging or simply copy the files?
  • Is the media important? Should we keep it, take photographs, or simply dispose of the media once the data is captured?

What we have already learned from this project is that we don’t have to know all the answers at this point and that we shouldn’t expect to. The collaboration between these two institutions allows us to experiment, to raise questions, and then seek the answers together. We recognize that a primary goal of this project is to more clearly define our capabilities and begin developing a born digital policy as the project develops. At this stage in the process, the questions truly do seem more important than the answers, which only adds to the value of what we can learn by collaborating to examine the issues demonstrated within the collection of an early adopter e-literature author such as Bill Bly.