Fedora 4 Digital Repository Implementation Project

I would like to take this opportunity to formally announce the launch of the Fedora 4 Digital Repository project, which aims to implement a new system to replace our existing 7-year-old Fedora 2.2.2-based Digital Collections.  Fedora has come a long way in the last several years, and we are very excited about the possibilities offered by the newest version.  Because the differences between our older version and the latest are so extensive, this is a far more complicated project than a simple upgrade.

An initial project planning group (Ben Wallberg, Peter Eichman, Bria Parker, and I) has outlined our primary objectives for the project:

  • Leverage repository improvements provided by Fedora 4 application
  • Migrate selected existing services and applications
  • Develop new features

You may read more about Fedora 4 as an application here: http://duraspace.org/node/2394.  Our complete objectives document is also available for reading: Fedora 4 Objectives.

It is important to note that we hope this new repository will reduce some silos in our portfolio and be more than just a place to house metadata and access copies of select digital assets.  We are moving forward with an awareness of the importance of a system that not only houses but also manages our digital assets, and that allows for more flexibility over who, what, when, where, and how our staff and our users can work with our content.

At a practical level, some of the changes/improvements we hope to make include:

  • Replacement of existing Administrative Tools interface with a community-developed and maintained application, such as Islandora.
  • Batch ingest mechanisms that can be user-operated and integrated with the Administrative Tools
  • Replacement of current homegrown metadata schemas with standard schemas, such as MODS and PREMIS
  • More advanced content model, allowing description and control of objects down to the node level, rather than at the descriptive record level
  • Enhanced user-generated reporting
  • Flexible authentication and authorization controls

This is a major project, one that will take approximately a year, although we have yet to set firm milestones or deadlines. In the meantime, we are ceasing any major development on the existing Fedora repository, with the exception of crucial maintenance issues. We have noted and categorized existing outstanding metadata sweeps and will handle those during the migration process.  We appreciate your patience as we work on the new system, which will be a most welcome improvement.

Stew of the Month: October 2014

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

New Technologies

Peter Eichman led a DSS brown bag introducing and demonstrating Vagrant.  Vagrant is a developer tool to “Create and configure lightweight, reproducible, and portable development environments.”  DSS developers have already begun using Vagrant to support development of the Libraries’ applications.

Josh Westgard has been focused primarily on setup and support of various web applications, including Omeka and ArchivesSpace, as well as digital preservation and file management tasks.

Collection-building

DRUM

Theses and dissertations from the 2014 summer sessions are now available in DRUM (http://hdl.handle.net/1903/3), bringing the total number of theses and dissertations in the collection to 9,799.

HathiTrust

Several of our colleagues in the Libraries are now assisting us in making copyright determinations for books in HathiTrust.  As part of the CRMS-World grant with the University of Michigan, Johnnie Love, Leigh Ann DePope, Loretta Tatum, Paul Bushmiller, and Yeo-Hee Koh join Donna King, Audrey Lengel, and Terry Owen as representatives of the University of Maryland Libraries on the project.

Historic Maryland Newspapers

In October the Historic Maryland Newspapers Project asked their Advisory Board to recommend newspaper titles for digitization during the 2014-2016 National Digital Newspaper Program (NDNP) grant cycle. Board members suggested titles from all over the state and ranked them in order of importance. After tallying and reviewing the results, Doug McElrath and Liz Caringola narrowed down the results to the following titles:

  1. Aegis & Intelligencer, Bel Air, 1864-1922 (published 1864-1923)
  2. Catoctin Clarion, Mechanicsville (Thurmont), 1871-1922 (published 1871-1942)
  3. Cecil Whig, Elkton, 1841-1922 (published 1841-current)
  4. Daily Banner, Cambridge, 1902-1922 (published 1902-1960)
  5. Democratic Advocate, Westminster, 1865-1922 (published 1865-1972)
  6. Montgomery County Sentinel, Rockville, 1856-1922 (published 1855-1974)
  7. Port Tobacco Times, and Charles County Advertiser, Port Tobacco, 1845-1898 (published 1845-1898)
  8. Prince George’s Enquirer and Southern Maryland Advertiser, Upper Marlboro, 1882-1922 (published 1882-1925)
  9. St. Mary’s Beacon/Gazette, Leonardtown, 1852-1922 (published 1845-1983)

The microfilm of these titles will be evaluated for technical quality and bibliographic completeness before making the final decision to digitize. In addition to these titles, the project will also complete digitization of Der Deutsche Correspondent from 1914 to 1918 in partnership with the Maryland Historical Society. Digitization for the second NDNP grant should begin in early 2015.

Plant Patents

Jennie Knies and Robin Pike met with staff of the Engineering and Physical Sciences Library (Nevenka Zdravkovska, Robin Dasler, Alex Carroll, and Jim Miller) to discuss a process and workflows for digitizing color plates from U.S. Government plant patents.  The pilot project is underway, and the patents should be available via the Libraries’ website in early 2015.  The project will complement similar efforts at other institutions, such as the New York Public Library, which digitized the color plates from 2012 to 2014.

Digitization Activities

Fifty volumes were included in the monthly shipment of books and serials for digitization by the Internet Archive, and 46 deteriorating films from the Library Media Services collection were sent to a film digitization vendor.

Abby Lee digitized Harmony or Chord formation, relation and progression: being introductory to the art of Musical Composition to which is prefixed a brief view of Musical Notation, circa 1871-1872, an unpublished, handwritten manuscript from the Lowell Mason Collection in SCPA. SCPA staff will transcribe the document to make it searchable for their patrons.

Twenty-one historical French pamphlets and 121 university publications were digitized in-house and submitted to the Internet Archive as part of an ongoing effort to make additional unique materials in both of these collection areas available to the public.

Software Development

Peter Eichman promoted the latest AlephRx improvement into production for use by USMAI staff.

No other projects were completed this month, but progress was made toward completing several ongoing projects.

Digital Scholarship and Publishing

The UMD Libraries Open Access Fund (http://www.lib.umd.edu/oa/openaccessfund) is now accepting applications.  Dean Steele was able to acquire additional funds from the Provost’s office, the Division of Research, and other deans on campus for 2014-2015.

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

The Consortial Library Applications Support (CLAS) team continued meeting with Ben Wallberg (DSS) through October, working on the process of installing and testing the Kuali Open Library Environment (OLE) for UM College Park. Our initial OLE test installation was populated with demonstration data from other OLE partner institutions and proved difficult to work with. In early October, we installed OLE version 1.5.3 and started to have more success with loading our own local data for testing purposes.  CLAS team members are also participating in weekly online meetings of the OLE Implementation Group, which includes representatives from OLE partners such as Duke, Villanova, and Indiana University. We plan to continue testing the OLE system to demonstrate its capabilities to USMAI for more informed decision-making.

Peter Eichman (DSS) recently lent his programming skills to the CLAS team to make some improvements to the AlephRx problem reporting system. The updated version of AlephRx was rolled out on October 13. Improvements include:

  • a new functional area for reports—Password Reset—which we hope will make submitting these requests easier;
  • improved consistency of terminology throughout the AlephRx website (titles, labels);
  • a new format for email messages, to encourage replies through the AlephRx web site and help keep the comments related to each Rx together;
  • a fix to the “Active” filter on the list of reports, so it now correctly includes any reports not “closed”, and a rearrangement of the filter buttons at the top of the summary list of reports into a more logical order; and
  • a new “from” e-mail address, to ensure reliable delivery of emails to DSS.

The CLAS team extends its thanks to our USMAI colleagues who were tapped to do testing of the form in the week before the rollout—your help in quality assurance testing is much appreciated. Many thanks also to Linda Seguin, who spearheaded the internal testing of the revised AlephRx system and took care of many details necessary to make the changeover to the new version go smoothly.

Staffing

DCMR welcomed two new digitization assistants: Rachel Dook and Brin Winterbottom. Rachel is also the Graduate Assistant in Preservation and Conservation, and Brin is an hourly student in the Art Library. Both are students in the iSchool.

Francis Kayiwa joined USS as a System Analyst and will be providing system administration and user support. He received his bachelor’s degree from St. Bonaventure University and his Master of Library Science from the State University of New York at Buffalo.  He comes to us from the Richard J. Daley Library at the University of Illinois at Chicago, where he worked as a Library Systems Coordinator.

Events

On Saturday, October 18, 2014, Sandra, Preston, Victoria, and Uche from USS were asked to provide a 3D printing demonstration for alumni before the homecoming football game. The event was held in the Samuel Riggs IV Alumni Center. USS displayed two of our 3D printers, the MakerBot Replicator (5th generation) and the MakerBot Replicator 2. Throughout the event, we printed miniature copies of Testudo similar to the statue in front of McKeldin Library, and we gave away over 100 miniature Testudo statues in assorted colors. The red Testudo miniature was popular for obvious reasons. We were easily the busiest table and saw a lot of interest in and excitement about 3D printing from all age groups. People were so excited that some of the 3D models we had on display mysteriously walked away, and one attendee was even willing to pay for a larger red Testudo statue we had on display.

We explained that 3D printers can produce prosthetic limbs as well as everyday household items, such as a wrench or a cup, and many alumni were pleased to learn how applicable 3D printing could be. We received many questions about why 3D printing is available in the Libraries but not in other places on campus, specifically in the Engineering and Architecture programs. We explained that libraries are no longer centered solely on books; they now provide an innovative, technology-driven environment for creative thinking and entrepreneurship. Many alumni were pleased to know that this service is available in the Libraries to all campus students. And even though we cannot be 100% certain, USS feels that the success of our 3D printing demonstration was the reason Maryland won the homecoming football game later that day.  Go Terps!

Conferences, Workshops and Professional Activities

Eric Cartier attended the Mid-Atlantic Regional Archives Conference (MARAC) in Baltimore in October and delivered a presentation entitled “Creating Digitization Workflows That Work at UMD Libraries.”

Liz Caringola joined Eric Cartier as co-chair of the Emerging Technologies Discussion Group.

Liz also attended MARAC and presented on the impact that links and citations in Wikipedia articles have on driving traffic to the digitized newspapers of the National Digital Newspaper Program (NDNP), and she suggested next steps for NDNP participants who wish to pursue Wikipedia-related activities.

Jennie Knies and Babak Hamidzadeh attended the Fall Academic Preservation Trust (APTrust) meeting in Washington, DC.  Jennie gave a presentation on the University of Maryland Libraries’ high-level preservation ecosystem and discussed potential uses for APTrust services within that context.

Ben Wallberg, Paul Hammer, Mohamed Abdul Rasheed, and Joshua Westgard along with Bria Parker from Metadata Services attended the DC Area Fedora Users Group meeting where they connected with other local Fedora users and received a nice overview of new features in Fedora 4.  Paul Hammer attended the second day of the meeting to receive Fedora 4 developer training.

Josh Westgard attended the IEEE conference in Bethesda, MD on Oct. 27-30, where he presented a poster describing his work to improve an upload bot currently being used by the National Archives to upload digitized materials from its collections into Wikimedia Commons.  The poster was summarized in a short article published in the proceedings of the conference: “The Bot Will Serve You Now: Automating Access to Archival Materials,” Proceedings of the 2014 IEEE International Conference on Big Data, Oct. 27-30, Washington DC, ed. Jimmy Lin, et al. (ISBN 978-1-4799-5665-4), pp. 73-74.

With leadership from Josh Westgard and Bria Parker, the Libraries Coding Workshop began meeting on a weekly basis on Oct. 13 and is going strong.  The participants are working through Codecademy lessons and on collaborative projects using Python, shell scripting, and XSLT.  Karl Nilsen gave a brief demonstration of natural language processing using 108 volumes of The Carpenter, a periodical produced by the United Brotherhood of Carpenters and Joiners of America. The periodical is part of the Libraries’ collections in labor history, and digitized issues are available from the Internet Archive.

Karl Nilsen completed a one-day course on natural language processing taught by District Data Labs.

Stew of the Month: September 2014

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

New Technologies

Born-Digital Workflows

Graduate Assistant Alice Prael (Digital Programs and Initiatives) began work with fellow graduate assistant Amy Wickner (Special Collections and University Archives) to formalize and finalize born-digital workflows and processes using the forensic workstation (FRED) in Hornbake Library and a suite of tools that includes BitCurator.

Coding Workshop

Josh Westgard and Bria Parker (Metadata Librarian, Technical Services) began planning to relaunch the UMD Libraries’ Coding Workshop. The first meeting will be in October.

ESRI Geoportal Sandbox

Paul Hammer worked with Jennie Knies and Mary Kate Cannistra (Public Services) to install and configure a sandbox version of Esri Geoportal Server, a free, open source product that enables discovery and use of geospatial resources including datasets, rasters, and web services. It helps organizations manage and publish metadata for their geospatial resources so users can discover and connect to those resources, and it supports standards-based clearinghouse and metadata discovery applications. The trial will run through January 2015.

Journal Survey Tool

DSS Staff (Jennie Knies, Mark Hemhauser, Paul Hammer, Uche Enwesi) worked with staff in Collections Services and Strategies to put the finishing touches on the Journal Survey Tool, a web-based application that will be used this fall to solicit input from faculty, staff, and graduate students about priorities for serials.  The main work on this project was completed this past spring, with extensive assistance from Josh Westgard.

One Button Studio Presentation Cart

Public Services and DivIT created a presentation room on the 2nd floor of McKeldin in the Terrapin Learning Commons (TLC) area a few years ago. The room was intended to let students record presentations and review them before presenting in class. At the time, the room contained a wide-screen TV, Blu-Ray player, Dell computer, and a hi-def camera mounted in the back of the room. However, the room’s setup wasn’t as user-friendly as hoped. Also, curious students using the room for group study often broke something, which meant calling a vendor to fix it.

USS reexamined the situation and envisioned other ways to accomplish the intended goal. A technician in USS discovered the “One Button Studio,” software that provides a very simple user interface for recording video to a USB flash drive. After more investigation and testing, USS acquired a Mac mini, a display monitor, a digital camera, a shotgun microphone, a small audio mixer, an HDMI adapter, and a rolling cart to begin building the One Button Studio environment. The Mac mini, audio mixer, HDMI adapter, and cables are all hidden inside the rolling cart’s cabinet, and because the system is on a rolling cart, it is mobile and can be used in multiple locations.

Using the One Button Studio cart is very simple. Once the cart is plugged in, the hidden Mac mini automatically turns on and loads the One Button Studio software; the student only needs to power on the digital camera. Once the software loads, the student plugs a USB flash drive into the USB connector. When an image appears on the screen, the student presses the One Button device to start the recording process, and presses it again when finished. The application then converts the video to a file and saves it to the flash drive for the student to watch later.

The cart currently lives behind the TLC desk and is loaned out like any other device from the desk. This way we have accountability when things go missing or break. The process from acquiring the One Button technology to finally providing a finished product to TLC took the effort of multiple staff in USS; this was truly a collaborative effort.

Tanner Wray from Montgomery College visited McKeldin Library with some colleagues to see the One Button Studio cart. Tanner was truly amazed. Many schools have rooms dedicated to recording, but this was the first time he had seen a portable recording system like ours.

[Photos: the full-length One Button Studio cart, and a closer view showing the camera, monitor, microphone, USB reader, and the button that starts the recording.]

Collection-building

DRUM

Publications from the Agriculture Law Education Initiative (ALEI) are currently being deposited in DRUM (http://hdl.handle.net/1903/15555).  ALEI (http://umaglaw.org/) is a new collaboration between the University of Maryland College Park College of Agriculture & Natural Resources, the University of Maryland Carey School of Law, and the University of Maryland Eastern Shore School of Agricultural and Natural Sciences committed to providing Maryland farmers with the information they need to prosper while complying with the complex network of laws and policies protecting the integrity of the state’s food system and environment.

Presentations from the spring 2014 MARAC (Mid-Atlantic Regional Archives Conference) meeting, held in Rochester, NY, this past April, have recently been deposited in DRUM (http://hdl.handle.net/1903/15602).  “Film, Freedom, and Feminism” was the theme for the spring meeting.

Prange Children’s Book Collection cleanup and wrap-up

Paul Hammer, working with Jennie Knies and Prange Collection staff, continued the process of quality control on the Prange Digital Children’s Book Collection.  At the time of writing, all 8,059 books have been successfully imported into Digital Collections.

Research Data

Karl Nilsen and Robin Dasler made substantial progress on deploying a local copy of the Extragalactic Distance Database (EDD). Steps included configuring the Apache HTTP server, securing the MySQL database and user accounts, loading data into the database, initializing a Git repository for the application code, and creating a development branch.

Karl Nilsen and colleagues in Research Data Services drafted a collection policy for data, software code, and other research products that were generated, produced, or collected by UMD researchers and their collaborators at other institutions. The policy will help guide the Libraries’ data curation and digital scholarship activities.

Digitization Activities

Robin Pike worked with Joanne Archer (SCUA), Yelena Luckert (Research Services), Vin Novara (SCPA), Bria Parker (MSD), and Linda Sarigol (LMS) to coordinate a schedule to ship materials to digitization vendors over the next fiscal year.

After Robin and Eric Cartier finalized the digitization setup, Eric began training digitization assistants Audrey Lengel, Alison Skaggs, and Abby Yee at the Performing Arts Audio Digitization Studio (PAADS). This studio will serve as the location to digitize audio requests and future projects from the Performing Arts Library, Special Collections in Performing Arts, and International Piano Archives at Maryland. It will be staffed by trained DCMR digitization assistants who will split their time between the Hornbake Digitization Center and PAADS, as needed.

After students digitized selected audiocassettes from the Katherine Anne Porter papers and the Paul Porter papers, Eric determined that the cassettes had substandard audio quality and performed basic audio restoration using specialized software on ten selected recordings with the worst audio quality. Special Collections personnel will determine if this level of audio restoration is sufficient, or if we will use a vendor to perform this work. This process goes beyond DCMR’s normal operations but is merited by the research importance of these prominent collections.

In celebration of Banned Books Week (September 21-27), SCUA had DCMR digitize numerous book covers and spines for the online exhibit on Flickr.

DCMR staff provided digitized images, along with digital video files retrieved from the Football Films collection, to Athletics Archivist Amanda Hawk for the Maryland Athletics Hall of Fame ceremony, which will occur on October 3.

Software Development

Shian Chang and Cindy Zhao, working closely with Laura Cleary in Special Collections, completed building the web framework for Special Collections exhibits in Hippo CMS.  The first exhibit released is Beyond the Battle: Bladensburg Rediscovered, which uses the Unify responsive website template.  Once the framework is fully implemented, we will be able to cut new exhibit sites from a template on demand, without any need for custom programming.

Peter Eichman dusted the cobwebs off AlephRx, the homegrown ticket tracking system used to track consortial requests and problems for Aleph.  He moved the code base into GitHub, modernized the development environment using Vagrant, and along the way made some improvements to the interface and fixed some bugs.

Mohamed Abdul Rasheed has continued progress on migrating the Jim Henson Works site to a Solr-based search, as well as adding a new search for the Special Collections in Performing Arts (SCPA) scores database.  Completion of both projects is expected in October.

Digital Scholarship and Publishing

In September, the Libraries officially approved the launch of a formal Digital Scholarship and Publishing Program.  The program builds on current offerings and introduces a new suite of services that are flexible, extensible, and vital to the needs of our faculty.  This includes providing platforms to publish electronic journals and other types of digital publications, as well as a limited menu of consulting services related to publishing, such as training on author identity.

After a banner inaugural year, the UMD Libraries Open Access Publishing Fund is up and running again for 2014-2015.  Submission information and selection criteria are available at http://www.lib.umd.edu/oa/openaccessfund.

USMAI (University System of Maryland and Affiliated Institutions Consortium)

Kuali OLE (Open Library Environment) implementation: The Consortial Library Applications Support (CLAS) team has been meeting with Ben Wallberg (DSS) to further develop an existing plan for testing OLE, which will then be executed to conduct the tests of the system. David Steelman, the developer working with the CLAS team, has used an open source tool called Vagrant to construct a reproducible environment in which Kuali OLE development can be done.

Aleph system support: Hans Breitenlohner investigated the recent pc_server performance issues, which were generating user reports of Aleph slowness on a nearly daily basis in recent weeks. Hans found that the problem stemmed from the way WorldCat Local (WCL), the Aleph z39.50 server, and the Aleph pc_server interact. The slowdowns in Aleph performance were occurring whenever a user searching WCL retrieved (or tried to retrieve) a title with a large number of items—and many titles in Aleph have well over 1200 items.

WCL requests records in online public catalog (OPAC) format, which includes item and circulation data. The z39 server retrieves item information from the pc_server, 50 items at a time, collects it, and returns it to WCL in a single response. This operation soon overloads the server capacity when there are large numbers of items. For example, a single hit on the New York Times (which has over 7,000 item holdings) in WCL would keep two of our CPUs (one quarter of our system) busy for about 17 minutes. Given these circumstances, it is not surprising that there were times when the requested work greatly exceeded the capacity of our system. To get around this, Hans added an additional check to the z39 server, which looks at the total number of items and modifies the response when the title has more than 1000 items. Hans’s fix seems to have alleviated the performance issues seen recently.
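
In rough pseudologic, the change looks something like the Python sketch below. This is illustrative only: the actual fix lives inside the Aleph z39.50 server, the 1,000-item threshold comes from the description above, and all names are hypothetical.

    # Illustrative sketch only -- the real fix is inside the Aleph z39.50
    # server, not Python. The 1,000-item threshold comes from the text above;
    # all names here are hypothetical.
    ITEM_LIMIT = 1000

    def build_opac_response(title_name, item_count, fetch_items):
        """Full item detail for ordinary titles, a short summary for huge ones."""
        if item_count > ITEM_LIMIT:
            # Skip the expensive 50-items-at-a-time pc_server calls entirely
            # and return an abbreviated response instead.
            return {"title": title_name,
                    "note": f"{item_count} items; item detail suppressed"}
        return {"title": title_name, "items": fetch_items(title_name)}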

Security patching: On September 26, Ex Libris notified its customers of vulnerability to the ‘shellshock’ exploit in all Unix/Linux systems that use the Bash shell (a popular command-line shell), posing a threat to all Ex Libris systems/products running on Unix/Linux. DSS systems administrators were already aware of the issue, and promptly applied the necessary patches to all affected servers.

Acquisitions/serials support for USMAI: Mark Hemhauser has been working with Towson and Yankee Book Peddler (YBP) on setting up a loader for shelf-ready firm orders. Mark also ran a fresh license report of USMAI licenses for College Park, and budget reports for Salisbury and the University of Baltimore.

Circulation support for USMAI: David Wilt ran 10 ad hoc reports for Bowie, College Park, Morgan, Saint Mary’s, UM Law, UB Law, and Frostburg. David also updated, changed, or created circulation rules/parameters and/or item statuses for UM Law, Shady Grove, and the University of Baltimore.

Aleph user interface support: The CLAS team noticed that we were receiving more problem report emails sent from the Aleph OPAC (catalog.umd.edu) by folks affiliated with the Montgomery County Public Schools (MCPS). Because there was no specific option for “Montgomery County Public Schools” in the dropdown of choices for “campus affiliation” on the problem form, MCPS folks were forced to choose “Other”. The form sends messages with an “Other” affiliation directly to the CLAS team, who would triage them and forward them to the appropriate staff. To streamline the process for MCPS patrons, Heidi Hanson modified the “problems/comments” form to add “Montgomery County Public Schools” to the dropdown of choices for “campus affiliation”. Now problem report email messages from MCPS-affiliated users are routed directly to the usg_mcps email group monitored by Priddy Library staff, who serve as MCPS liaisons.
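
The routing change boils down to a simple affiliation-to-recipient mapping. Here is a hedged Python sketch of the idea (not the actual OPAC form code; the addresses are placeholders):

    # Sketch of the routing idea only -- not the actual Aleph OPAC form code.
    # The email addresses are placeholders.
    ROUTES = {
        "Montgomery County Public Schools": "usg_mcps@example.edu",
        # ...other affiliations could route to their own groups...
    }
    CLAS_TRIAGE = "clas@example.edu"  # fallback for "Other"

    def recipient_for(affiliation: str) -> str:
        """Send the report straight to the right group, or to CLAS for triage."""
        return ROUTES.get(affiliation, CLAS_TRIAGE)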

Staffing

Josh Westgard joined Digital Programs and Initiatives on September 22, as Systems Librarian.

Alice Prael joined DSS in September as the graduate assistant, Digital Programs and Initiatives. Alice is currently in her second year at the iSchool, in the Digital Curation concentration.

The Historic Maryland Newspapers Project welcomed a new Student Assistant in September. Jordan Lee is in her second year of the MLS program and is a GA in the College of Behavioral and Social Sciences Advising Office. Welcome, Jordan!

David Steelman joined SSDR as a System Analyst and will be providing software development and application support for USMAI, working with the Consortial Library Applications Support (CLAS) team in DSS. David received his Bachelor of Science, Comprehensive, from Villanova University and his Master of Science in Computer Science from the University of Maryland, College Park.  He comes to us from Raytheon Solipsys Corporation, where he worked as a Senior Software Engineer on projects such as the Tactical Display Framework (TDF), a Java-based object-oriented command and control battle management package.

It has rained and poured this month for SSDR: we also welcomed three new Graduate Assistant software developers, each in the second year of their program.  Sakshi Jain is in the Master of Information Management program, while Rohit Arora and Vivian Thayil are in the Master of Telecommunications Engineering program.

Events

As part of the Future of the Research Library Speaker Series, Martin Sandler, Director of the Center for Library Initiatives for the CIC, will be speaking on Thursday, 4 December from 10:00 am to 11:30 am in the Special Events Room.  Details are available at http://www.lib.umd.edu/speakerseries.

Conferences, Workshops and Professional Activities

Jennie Levine Knies, Doug McElrath, and Liz Caringola attended the annual meeting of the National Digital Newspaper Program (NDNP) from September 16-18 in Washington, DC.  Liz Caringola presented on the results of our Wikipedia project this summer. Liz also represented Maryland at a pre-conference meeting called Beyond NDNP, which discussed issues of project sustainability after NDNP funding ends, shared infrastructure, and standards and best practices for newspaper digitization.

Josh Westgard was notified that his poster, “The Bot Will Serve You Now: Automating Access to Archival Materials,” was accepted by the IEEE Conference on Big Data, which will be held October 27-30, in Bethesda, Maryland.

Jennie Knies joined the BitCurator Consortium Start-Up Committee.  The Start-up Committee is the decision-making group until the Executive Council is elected in spring 2015.

Knight News Challenge: Libraries. Our application…

The Knight Foundation recently issued a news challenge: How might we leverage libraries as a platform to build more knowledgeable communities? Here at the University of Maryland Libraries, we felt that we had an idea.

Improving Discovery in Digital Newspapers through Crowdsourcing the Development of Semantic Models

“We will develop tools that enable users of digitized newspapers to intuitively create connections between the concepts, people, places, things, and ideas written about in the newspaper pages, which will facilitate further discovery and analysis by researchers at all levels.”
The process of working on this application was fun and inspiring.  Our Associate Dean for Digital Systems and Stewardship, Babak Hamidzadeh, had the original vision. He enlisted me (Jennie Knies) and Liz Caringola, our Maryland Historic Newspapers librarian, to help flesh out some of the ideas.  The UMD Libraries’ Communications director, Eric Bartheld, and our Director of Development, Heather Foss, also contributed. Ed Summers (MITH) and Dr. Ira Chinoy (Journalism) provided excellent feedback and encouragement. Rebecca Wilson, the UMD Libraries’ graphic designer, created a compelling graphic under a very tight deadline.

The application itself had very strict word/character requirements, which was a fascinating challenge in itself.  750 characters (that includes spaces!) to communicate the entire idea?

We think that we are uniquely positioned to develop these types of tools: we have the enthusiasm, the content (thanks to the Maryland Historic Newspapers project and to Chronicling America), and the resources and expertise to make this a reality.  Fingers crossed that we get a lot of “applause!” There are a lot of amazing proposals for the Knight Foundation to choose from, but I hope we get to be one of them.

UMD Libraries Join BitCurator Consortium as Charter Member

The University of Maryland Libraries are in the midst of working on policies, procedures, and workflows for managing born-digital content.  3 1/2″ and 5 1/4″ floppy disks, along with Zip disks, CD-ROMs, and DVDs, already live within the archival and manuscript collections in Special Collections and University Archives.  The challenges involved in preserving these media and the content stored on them are numerous.  Often, the equipment or software necessary to use older disks is obsolete or unavailable.  The disks themselves may become damaged due to misuse or simply time. Law enforcement agencies that need to read hard drives and other media for forensic research have been at the forefront of developing hardware, software, and other tools to work with older media.

Funded by the Andrew W. Mellon Foundation, BitCurator is a tool designed specifically for libraries and archives.  It is a fully-contained system with easy-to-use interfaces that support the standard activities necessary for copying, reading, and curating digital media. For the University of Maryland Libraries, the existence of BitCurator has saved us from having to reinvent the wheel when it comes to beginning our born-digital activities.  Our main installation lives in Hornbake Library, on our Forensic Recovery of Evidence Device (FRED).  This fall, two graduate assistants, Amy Wickner (Special Collections and University Archives) and Alice Prael (Digital Programs and Initiatives), will pick up where the UMD Libraries’ Born-Digital Working Group left off earlier this year to finalize some of our basic born-digital workflows.

The BitCurator Consortium operates as an affiliated community of the Educopia Institute, a non-profit organization that advances cultural, scientific, and scholarly institutions by catalyzing networks and collaborative communities to facilitate collective impact. The University of Maryland Libraries have signed on as a charter member and are delighted to be involved in this endeavor.

“Managing born-digital acquisitions is becoming a top concern in research libraries, archives, and museums worldwide,” shares co-founder Dr. Christopher (Cal) Lee. “The BCC now provides a crucial hub where curators can learn from each other, share challenges and successes, and together define and advance technical and administrative workflows for born-digital content.” Co-founder Dr. Matthew Kirschenbaum adds: “Tools without actively invested communities wither on the vine, become dead bits. The BCC is not just an extension of BitCurator, in a very real sense it will now become BitCurator.”

Institutions responsible for the curation of born-digital materials are invited to become members of the BCC. New members will join an active, growing community of practice and gain entry into an international conversation around this emerging set of practices. Other member benefits include:

  • Voting rights
  • Eligibility to serve on the BCC Executive Council and Committees
  • Professional development and training opportunities
  • Subscription to a dedicated BCC member mailing list
  • Special registration rates for BCC events


UMD’s Digital Preservation Policy, updates

In early 2014, the UMD Libraries published our first Digital Preservation Policy.  The policy specifies that it must be reviewed on an annual basis, so this summer a small task force (Robin Pike, Joanne Archer, and I) reviewed the document and made a few minor changes. The most significant change was to add an entire section about “Financial Commitment.”  The other change was to modify how we approach actual implementation of the plan. More on that below, but first, what have we accomplished in the past year?

In the past year, various players at the UMD Libraries have embarked on projects or development that ultimately ties into our Digital Preservation Policy. These activities include:

  • A repository research team (Jennie Knies, Ben Wallberg, Babak Hamidzadeh) developed a high-level requirements document for a Bit-Level Preservation System. Ben Wallberg presented on these requirements at Open Repositories 2014 in Helsinki, Finland
  • Software Systems Development and Research (SSDR) installed the ACE Audit Manager tool on our Digital Repository at the University of Maryland (DRUM) DSpace system
  • A task force consisting of Jennie Knies (DSS), and Joanne Archer and Cassie Schmitt (Special Collections) continued the work of the UMD’s Born-Digital Working Group to finalize workflows for processing born-digital archival and manuscript materials.  While not complete, we have developed a plan to complete the first stage of workflows by the end of 2014
  • Over 120,000 files were created and archived with UMD’s Division of Information Technology, and subsequently with Iron Mountain, along with an enhanced workflow for documenting those files
  • Robin Pike and Jennie Knies published “Catching Up: Creating a Digital Preservation Policy,” in Archival Practice 1, no. 1 (2014)
  • Began planning the upgrade of our Fedora repository from Fedora 2.2.2 to Fedora 4.0

Much of the work involving documentation and policy development, however, remains abstract and somewhat elusive.  In the past year, we have attempted to pull together all documentation of policies and procedures relating to digital preservation activities.  We have also begun the process of researching the real costs of digital preservation (storage costs, human resources, etc.).  In addition, I have written something that I informally call “Policies of Where to Put Stuff,” and formally something like “Digital Preservation Networks Policy,” a document for which I have had writer’s block for the last four months but hope to finish soon, as it is integral to how we manage digital content moving forward.

The Digital Preservation Policy, intended to be a high-level document to guide the creation and implementation of additional policies and procedures related to digital preservation, contained an appendix intended to outline the documentation necessary to implement the plan.  The appendix in the original plan was based on the Center for Research Libraries’ Metrics for Repository Assessment, which were based on the ISO 14721:2012 standard. This standard is commonly referred to as the OAIS reference model and was developed through the Consultative Committee for Space Data Systems (CCSDS).  The appendix was very detailed, and while it was broken into easily-understandable categories and clearly defined the types of policies and procedures we needed to establish, we have found it difficult to map those requirements and categories to the policies and procedures currently in place.

In July, I was fortunate enough to attend Digital Preservation 2014, the annual meeting of the National Digital Information Infrastructure and Preservation Program and the National Digital Stewardship Alliance.  There I heard a wonderful presentation by Bert Lyons from AVPreserve entitled Mapping Standards for Richer Assessments: NDSA Levels of Digital Preservation and ISO 16363:2012.  That was my “A-ha!” moment.  As Bert pointed out in his presentation, ISO 16363:2012 is very long, and there is a lot of overlap between individual components. AVPreserve has created a wonderful document that maps the NDSA levels of digital preservation to the ISO requirements.

The NDSA Levels of Digital Preservation, for those who are not familiar, are incredibly straightforward.  They are presented in a table broken into five primary sections: Storage and Geographic Location, File Fixity and Data Integrity, Information Security, Metadata, and File Formats. Associated with each category are four increasingly rigorous levels of digital preservation. For example, to fit into Level 1 under Storage and Geographic Location, the requirement is to have two copies that are not collocated, and to move files from things like hard drives or DVDs onto your own storage media as soon as possible.  Done! We have achieved Level 1.

As I write, my graduate student is creating a version of the NDSA Levels that we can annotate. I loved Bert’s simple suggestion that we use the NDSA Levels as a sort of bar graph to visualize our progress.  We then plan to use the AVPreserve mapping document to do a more detailed analysis of where we currently stand and where we need to go with our digital preservation program.
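
As a toy illustration of the bar-graph idea, the Python sketch below plots a level (0-4) for each NDSA category; the scores are placeholders, not our actual self-assessment, and it assumes matplotlib is installed.

    # Toy illustration of the bar-graph idea; the levels are placeholders,
    # not our actual self-assessment. Assumes matplotlib is installed.
    import matplotlib.pyplot as plt

    categories = ["Storage &\nGeographic Location", "File Fixity &\nData Integrity",
                  "Information\nSecurity", "Metadata", "File Formats"]
    levels = [1, 2, 1, 2, 1]  # hypothetical levels achieved (0-4)

    plt.bar(categories, levels)
    plt.ylim(0, 4)
    plt.yticks(range(5))
    plt.ylabel("NDSA level achieved")
    plt.title("Digital preservation progress (illustrative)")
    plt.tight_layout()
    plt.show()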

Initially, we wondered if annual review of the Digital Preservation Policy was excessive. However, in these early stages of our program, I realize now how important it is to take stock at regular, and frequent intervals. The UMD Libraries are currently also revising our strategic plan, and the results of that activity will most likely make for interesting revisions in 2015, when we sit down to review the policy again.

Look at all the people…

A few months ago, one of my colleagues, Paul Hammer, a software developer with the UMD Libraries’ Software Systems Development and Research (SSDR), stopped by my office and mentioned to me that something in one of my recent blog posts was bothering him. Specifically, it was these two sentences:

Unfortunately, unlike our dependable analog collections, keeping track of all of this digitized content can sometimes be unwieldy.   One of my big goals is to reach the point where an inventory of these digital collections can provide me with the equivalent of a “Shelf location” and statistics at the push of a button.

Paul reminded me that a lot of human effort, management, and coercion went into acquiring, tracking, cataloging, and circulating information in the analog world.  If the staff, managers, and profession had not diligently encouraged librarians, archivists, and other professionals to use similar standards and practices, then no two collections would be remotely comparable.  He noted: “We need to recognize that this effort is just as big and difficult in the computer world.  Computers do not do all of this work for you, regardless of how much we wish it were otherwise.  Computers just offer a really big room of shelves on which to put things and the ability to program helpers.  Helpers who are only capable of doing *exactly* what you ask of them — at nearly the speed of light.”

I want to thank Paul for putting things in perspective.  First, his comments reminded me that Rome was not built in a day. Second, as Paul and many of my recent projects have shown, computers will only do exactly what you tell them to do and only contain as much logic as humans provide to them.  Third, I think it is safe to say that standards and best practices are even more important in the digital world than in the analog.

Last year, the UMD Libraries received funding for a project to digitize a portion of the correspondence written by the American author Katherine Anne Porter, whose papers reside at the University of Maryland.  What seemed at first to be a straightforward project turned into quite a complex and interesting one that is still not 100% complete.  At least a dozen UMD Libraries staff participated in some portion of the project, not to mention external parties such as our digitization vendor.  Joanne Archer in Special Collections and University Archives (SCUA) managed the project.  Two content specialists within SCUA (Librarian Emeritus Beth Alvarez and PhD candidate Liz DePriest) selected the approximately 2,000 letters for the first phase of digitization.  Robin Pike, Manager, Digital Conversion and Media Reformatting (DCMR), facilitated the contracts and negotiation with the digitization vendor.  The correspondence was digitized in eight batches, and Special Collections staff had to prepare metadata for every letter and prepare the packages for delivery.  Once digitization was complete, Eric Cartier (DCMR) performed QC on all of the deliverables (TIF, JPG, OCR text, and hOCR XML).  Trevor Muñoz, Assistant Dean for Digital Humanities Research, used the raw data to develop several proof-of-concept possibilities for future data use and analysis.  Josh Westgard, graduate assistant for Digital Programs and Initiatives (DPI), facilitated transfer of the files for preservation.

And that is not all.  Fedora as a repository is an excellent example of a computer system that needs to be told exactly what to do.  We had not, to date, added any complex objects of the type represented by these letters (digital objects represented by an image, an OCR file, and an hOCR file).  DPI gathered the requirements for this new object type (UMD_CORRESPONDENCE) and delivered them to Software Systems Development and Research (SSDR).  Ben Wallberg, Manager, SSDR, and two developers, Irina Belyaeva and Paul Hammer, worked to translate those requirements into reality.  What followed was a period of testing and analysis.  Likewise, we currently add content to our Fedora repository in three ways: 1) one-by-one, using a home-grown web-based administrative interface; 2) using project-specific batch loading scripts that require developer mediation; and 3) using a batch loader developed by Josh Westgard in DPI that currently only works with audio and video content. For the Katherine Anne Porter project, logic dictated that we go with Door #2 and use a project-specific batch loading process.  In this case, SSDR and DPI agreed to use this as an opportunity to develop and test an alternate method for batch ingest, with an eye towards developing a more generic, user-driven batch loader in the future.

Irina and Paul worked on the batch loader for Katherine Anne Porter, and, when it was ready for testing, we ran into a series of minor but educational complications.  First, it was necessary to massage and clean up the metadata much more than anticipated, since SCUA had been using the spreadsheet to capture more information than was needed for ingest. Second, other types of metadata errors caused the load to fail numerous times. This led, however, to the development of more rigorous validation checks on the metadata prior to ingest.  After the load was complete, I worked with Josh Westgard to analyze the results, and we uncovered additional minor glitches, which we will account for in later loads.

The work is not complete.  The letters are ingested, but not viewable.  We still need to make changes to both our back-end administrative tool and our front-end public interface in order to accommodate this new content type.  And who knows what other types of user needs and requirements will necessitate additional work.  The data itself is rich and interesting.  Our hope is that it will be used both by scholars conducting traditional types of archival research as well as digital humanists interested in deciphering and analyzing the texts by computer-driven means.

This spring, Digital Systems and Stewardship hired its first-ever Project Manager.  Ann Levin comes to the UMD Libraries with years of experience working on systems much more complex than our own.  As is obvious from the project description above, all of our work currently touches many different people with different skills and priorities within our organization.  It is our hope that we can start to formalize some of this work, develop more consistent workflows, and develop policies and procedures that ensure adherence to specified best practices and standards moving forward. The work has already started.  As Paul correctly pointed out to me several months ago, working with computers requires just as much, if not more, human involvement than some of our analog work. Planning is key. One reason the word “digital” causes instant anxiety for many people is that just as things such as access and indexing can move much more swiftly in a digital system than in an analog one, it is also possible to eliminate data instantly. Paul provided this analogy:

Imagine an archive where everyone working there had the power to empty and restock the shelves with a wave of their hand.  That any given shelf could suddenly disappear.  That a box that used to be really popular can still be taken off the shelf but we have forgotten how to open it.  All of these things are all too possible in digital storage.  Think of the extra vigilance necessary just to know that what you have is really what you have.

Scary. But my original sentiment remains the same. With every new project, we move closer towards trusting our work, and reaching a point where creating, managing, and providing access to digital content really can seem as simple as the “push of a button.”  We just need to recognize all of the work, effort, and vigilance that goes into creating that single button.

Cool Tools: High Performance Sound Technologies for Access and Scholarship (HiPSTAS!)

I was delighted and intrigued to read an article in the March 26, 2014 web edition of the Chronicle of Higher Education: Scholars Collaborate to Make Sound Recordings More Accessible.  It described a project spearheaded by Tanya Clement, former University of Maryland employee, creator of In Transition: Selected Poems by the Baroness Elsa von Freytag-Loringhoven, and now assistant professor at the University of Texas at Austin.

I am always on the lookout for “cool tools” that we may consider using some day for our own work, and there are a lot out there. The HiPSTAS Research and Development with Repositories (HRDR) project is funded by an NEH Institute for Advanced Topics in the Digital Humanities grant to develop and evaluate a computational system that helps librarians and archivists discover and catalog sound collections.  From the HiPSTAS blog:

The HRDR project will include three primary products: (1) a release of ARLO (Automated Recognition with Layered Optimization) that leverages machine learning and visualizations to augment the creation of descriptive metadata for use with a variety of repositories (such as a MySQL database, Fedora, or CONTENTdm); (2) a Drupal ARLO module for Mukurtu, an open source content management system, specifically designed for use by indigenous communities worldwide; (3) a white paper that details best practices for automatically generating descriptive metadata for spoken word digital audio collections in the humanities.

I, for one, am looking forward to the output of this project, and to the prospect of a faster way to increase access to our fragile sound recordings.

Solr System

We are in the process of integrating Apache Solr with our Fedora-based digital repository on the back end.  I am not going to pretend that I know and understand all of the technical details about Solr as listed on its home page.  My layperson’s interpretation of its features is as follows:

  1. Solr is a standalone enterprise search server with a REST-like API. I think this means that Solr runs on its own and can be accessed via a URL in a web browser.
  2. You put documents in it (called “indexing”) via XML, JSON, CSV, or binary over HTTP. At UMD, our “documents” are the FOXML XML files where Fedora stores our metadata.
  3. You query it via HTTP GET and receive XML, JSON, or CSV results. We can use a web browser and a URL to construct queries (see the sketch below).
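
To make point 3 concrete, here is a minimal Python sketch of a Solr query over HTTP; the host, core name, and “Title” field are hypothetical stand-ins, not our actual configuration.

    # Minimal sketch of a Solr query over HTTP. The host, core name
    # ("fedora"), and "Title" field are hypothetical stand-ins.
    import requests

    params = {
        "q": "Title:(truman AND civil AND describ AND battl)",
        "q.op": "OR",  # the default boolean operator; see the discussion below
        "wt": "json",  # ask for JSON results instead of the XML default
        "rows": 10,
    }
    resp = requests.get("http://localhost:8983/solr/fedora/select", params=params)
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("Title"))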

At UMD, we ingest content into our Fedora repository via two methods: a home-grown web-based administrative interface for adding images, and batch ingest (which currently requires developer assistance). We use the administrative interface to manage the metadata for our digital objects. However, the administrative interface has always lacked robust reporting capabilities. Solr includes a robust administrative interface of its own that allows for the construction of complex queries and reporting outputs. For me, as a user, this is Solr’s greatest benefit. Our Software Systems Development and Research team tries whenever possible to put as much knowledge in the hands of the users.  It is a win-win situation: for them, it eliminates having to answer and investigate really basic questions for me, and for me, it enables me to achieve results and do my work without having to depend on others.

Solr first requires the development of a schema, which is essentially a file that explains what we want to index and how.  Understanding how to read and interpret the schema is a first step to understanding how Solr works. First, you define fields, and these fields are related to our metadata.  In a simple example, a “Title” field in Solr is an index on the <title> tag in our Fedora metadata.  Within a field, we can define how the field acts.  For example, we have defined a field type of “umd_default” that runs a series of filters on our data.  These filters are the key to understanding how searching works in Solr. I’m going to use the following piece of correspondence as an example: “Letter by Truman M. Hawley to his brother describing Civil War battle. Includes envelope, September 24, 1864.” When Solr indexes this title, it does a number of things, many of which are customizable, and this is what is important to understand.

  1. It separates and analyzes each word and assigns locations to them. “Letter” is in location 1 and takes up spaces 0-5 (the space at the end of the word is included in the word)
  2. It determines the type of word (is it alphanumeric? Or just a number? 1864 is just a number)
  3. It removes punctuation. Finally, a place where no one cares about commas.
  4. It removes stopwords. We apply a “StopFilterFactory” filter to remove stopwords. These can be customized. In our system, “by,” and “to,” are considered stopwords and we do not index them.
  5. It converts everything to lower case. Solr does not have to do this. We apply a “LowerCaseFilterFactory” with the assumption that our users will not need to place emphasis or relevancy on case in searches.
  6. We apply an “AsciiFoldingFilterFactory” that converts alphabetic, numeric, and symbolic Unicode characters which are not in the “Basic Latin” Unicode block into their ASCII equivalents, if they exist. So, for example, a search on “Munoz” will match on “Muñoz.”
  7. We apply the “PorterStemFilter” to the Title field.  This filter applies an algorithm that essentially truncates words based on assumptions about endings. In the example above, “describing” becomes “describ” and “battle” becomes “battl.”
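
Here is a rough Python approximation of that chain, using NLTK’s PorterStemmer as a stand-in for Solr’s PorterStemFilter (an assumption for illustration; the real filters run inside Solr):

    # Rough approximation of the analysis chain above. NLTK's PorterStemmer
    # stands in for Solr's PorterStemFilter; the real filters run inside Solr.
    import re
    from nltk.stem.porter import PorterStemmer  # assumes NLTK is installed

    STOPWORDS = {"by", "to"}  # per our StopFilterFactory configuration
    stemmer = PorterStemmer()

    def analyze(text):
        # Steps 1-3: tokenize and drop punctuation; steps 4-5: remove
        # stopwords and lowercase; step 7: stem what remains.
        tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
        return [stemmer.stem(t) for t in tokens if t not in STOPWORDS]

    title = ("Letter by Truman M. Hawley to his brother describing Civil War "
             "battle. Includes envelope, September 24, 1864.")
    print(" ".join(analyze(title)))
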
What we are left with is indexing on the following terms:

letter truman m hawlei hi brother describ civil war battl includ envelop septemb 24 1864

This means that I could run the query q=Title:(truman AND civil AND describ AND battl) in Solr and receive this letter as a hit.  Solr still allows for phrase queries (“Letter by Truman M. Hawley”) and for wildcard searches (Truman AND Hawl*). Our implementation of Solr currently assumes a boolean “OR” as the default operator in a search string. So, if I thought to myself, “I am interested in looking for content having to do with the Civil War in the month of September,” I might type something like “civil september” into a search box. Based on our configuration, this translates to “Search the Title field for anything containing the term ‘civil’ OR ‘septemb.’” Here are just a few examples out of my over 300 results:

  • The Greek beginning, Classical civilization
  • The Classical age, Classical civilization
  • Ancient civilizations The Vikings
  • Ancient civilizations The Aztecs
  • Ancient civilizations The Mayans
  • The Civil War in Maryland Collection
  • The Celts, Ancient civilizations
  • Acts of faith, Jewish civilization in Spain
  • Brick by brick: a civil rights story

How is this possible? Well, if I investigate how our PorterStemFilter analyzes “civilization,” I discover that it becomes “civil.” Also, as a user, in my brain I am thinking that I want results that have to do both with the Civil War AND September, but Solr is returning results that have to do with either. If I manually adjust my search to be a boolean “AND” search – Title:(civil AND september) – I see only three results, all of them relevant. This might lead me to believe that we should instantly change our default search operator from “OR” to “AND,” since obviously, if I type two terms into a search box, I want to see records containing both of them. (Indeed, the current default in our public interface is already “AND.”) And it might also lead me to believe that we should turn off the PorterStemFilter, because all of those “civilization” hits are annoying; if I want to search for “Civil*,” I will search for “Civil*.”
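
You can check this kind of analysis and query behavior directly. Solr’s field analysis handler shows each filter’s output for a given input, and the q.op parameter overrides the default operator for a single query. In the sketch below, the host and core name (“fedora”) are hypothetical, the /analysis/field handler must be registered in solrconfig.xml (it is in the stock Solr 4 configuration), and spaces in queries are written as “+”.

    # See how the umd_default field type analyzes "civilization"
    http://localhost:8983/solr/fedora/analysis/field?analysis.fieldtype=umd_default&analysis.fieldvalue=civilization

    # With a default operator of OR, these two queries are equivalent:
    http://localhost:8983/solr/fedora/select?q=Title:(civil+september)
    http://localhost:8983/solr/fedora/select?q=Title:(civil+OR+september)

    # Overriding the default with q.op returns only records matching both terms:
    http://localhost:8983/solr/fedora/select?q=Title:(civil+september)&q.op=AND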

But is it so simple? What is best for the user? What default settings will be most useful for our users? This is a different discussion, and I will be working with my colleagues on the Collections side of things to try to answer some of these questions. Solr is so robust, and can be adapted to so many different situations, that configuring it in the most effective way is overwhelming, but also exciting.

Where is all of our digital stuff?

I like to think that we, at the University of Maryland, are not unlike other university libraries, in that we have a lot of digital content and, just like with books, we have it in a lot of different places. Unfortunately, unlike our dependable analog collections, this digitized content can be unwieldy to keep track of. One of my big goals is to reach the point where an inventory of these digital collections can provide me with the equivalent of a “shelf location” and statistics at the push of a button. One project I have been working on involves documenting and locating all of the UMD Libraries’ digital content, as a first step toward this goal. I am focusing right now on things that we create or own outright, as opposed to content that comes to us in the form of a subscription database, which is a whole issue in itself. We don’t have one repository to rule them all in a physical sense. Rather, I like to think of our “repository” at present as an “ecosystem.” Here are some parts of our digital repository ecosystem.

DRUM (DSpace) http://drum.lib.umd.edu

Stats: Close to 14,000 records.  Approximately 8,800 of these are University of Maryland theses and dissertations.

DRUM is the Digital Repository at the University of Maryland. Currently, there are three types of materials in the collections: faculty-deposited documents, a Library-managed collection of UMD theses and dissertations, and collections of technical reports. As a digital repository, DRUM maintains files for the long term. Descriptive information on the deposited works is distributed freely to search engines. Unlike the Web, where pages come and go and addresses to resources can change overnight, repository items have a permanent URL, and the UMD Libraries are committed to maintaining the service into the future. In general, DRUM is format-agnostic: it preserves only the bitstreams submitted to it, stored on a file system, along with the metadata in a Postgres database. DSpace requires the maintenance of a Bitstream Format Registry, but this serves merely as a method to specify allowable file formats for upload; it does not guarantee things like display, viewers, or emulation. DSpace does provide some conversion services, for example, conversion of PostScript format to PDF. DRUM metadata may be OAI-PMH harvested, and portions of it are sent to OCLC via the Digital Collections Gateway. A workflow exists to place thesis and dissertation metadata into OCLC. Most of DRUM is accessible via Google Scholar.
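
For those unfamiliar with it, OAI-PMH harvesting is just a matter of HTTP requests using a handful of standard verbs. A harvest of DRUM’s Dublin Core records would look something like the requests below; the /oai/request path is a common DSpace default, but the exact endpoint depends on the DSpace version and configuration.

    # Confirm the endpoint and repository details
    http://drum.lib.umd.edu/oai/request?verb=Identify

    # Harvest all records as unqualified Dublin Core
    http://drum.lib.umd.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc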

Digital Collections (Fedora) http://digital.lib.umd.edu

Stats: 21,000 bibliographic units representing over 220,000 discrete digital objects.

Digital Collections is the portal to digitized materials from the collections of the University of Maryland Libraries. It is composed primarily of content digitized from our analog holdings in Special Collections and other departments. The University of Maryland’s Digital Collections support the teaching and research mission of the University by facilitating access to digital collections, information, and knowledge. Content is presently limited to image files (TIFF/JPG), TEI, EAD, and streaming audio and video. Fedora manages the descriptive metadata, the technical metadata, and the access derivative file. While Fedora can be developed to accept any format, our implementation currently readily accepts only TIFF and JPG images and TEI- or EAD-encoded XML documents. We are not currently using Fedora to inventory or keep track of our preservation TIFF masters. Audiovisual records are essentially metadata pointers to an external streaming system. Fedora metadata may be OAI-PMH harvested, and portions of it are sent to OCLC via the Digital Collections Gateway. Google does crawl the site, and many resources are available via a Google search.

Chronicling America (Library of Congress) http://www.chroniclingamerica.loc.gov

Stats: We have submitted approximately 25,000 newspaper pages to the Library of Congress to date, and anticipate a total of 100,000 pages by August 2014.

Chronicling America is the website that provides access to the files created and submitted as part of the National Digital Newspaper Program (NDNP) grants. We submit all files (TIFF, JP2, PDF, ALTO XML) to the Library of Congress, and they archive a copy. We also archive copies locally, in addition to the copies archived by LoC: one complete copy of each batch is sent to UMD’s Division of IT for archiving, and Digital Systems and Stewardship saves a copy of each batch to local tape backup and retains the original batch hard drive in the server room in McKeldin Library.

HathiTrust http://www.hathitrust.org

Stats: Nothing yet! We plan to begin submitting content in 2014.

HathiTrust provides long-term preservation and access services to member institutions.  For institutions with content to deposit, participation enables immediate preservation and access services, including bibliographic and full-text searching of the materials within the larger HathiTrust corpus, reading and download of content where available, and the ability to build public or private collections of materials. HathiTrust accepts TIFF images and OCR files in either ALTO XML or hOCR.  They provide conversion tools to convert TIFF masters into JPEG 2000 for access purposes.

Internet Archive http://www.archive.org

Stats: Almost 4,000 books, with over 840,000 pages.

The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. The UMD Libraries contribute content to the Internet Archive in two ways. First, we submit material to be digitized at a subsidized rate as part of the Lyrasis Mass Digitization Collaborative. The material must be relatively sturdy, and must either be out of copyright or we must be able to prove that we have permission from the copyright holder. We have also been adding content digitized in-house (usually rare or fragile material), uploading the access (PDF) files and metadata to the Internet Archive ourselves. The Internet Archive produces JPEG2000 and PDF files at the time of digitization, including both cropped and uncropped JPEG2000 files for each volume. The UMD Libraries save the cropped JPEG2000 files and the PDFs locally and archive them with the UMD Division of IT.
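
As an illustration of what the self-upload side can look like: the Internet Archive exposes an S3-compatible API, so depositing an access PDF plus basic metadata amounts to an HTTP PUT with metadata headers. Everything in the sketch below is a placeholder (item name, file name, metadata values, and credentials); it illustrates the Archive’s general API rather than our actual workflow.

    # Create a (hypothetical) item and upload an access PDF via the Internet
    # Archive's S3-like API; "accesskey:secret" stands in for real credentials.
    curl --location \
         --header "authorization: LOW accesskey:secret" \
         --header "x-archive-auto-make-bucket:1" \
         --header "x-archive-meta-mediatype:texts" \
         --header "x-archive-meta-title:Example digitized volume" \
         --upload-file example-volume.pdf \
         http://s3.us.archive.org/example-umd-volume/example-volume.pdf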

***

I am already aware of other types of digital content that we will have to track: born-digital records and personal files from our Special Collections and University Archives; eBooks in PDF and other formats that we purchase for the collection and must determine how to serve to the public; publications, such as journals, websites, and databases; and research data. I hope to return to this post in 2020 and smile at how confused, naive, and inexperienced we all were at all of this. Until then, I will keep working to pull everything together as best I can.