Stew of the Month: September 2017

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

Digitization Activities

Historic Maryland Newspapers Project

Rebecca Wack submitted batch cumberland to the Library of Congress and through batch j to the digitization vendor; two more batches will be submitted to the vendor in October.

Wack and Robin Pike submitted the second interim report to the National Endowment for the Humanities and the Library of Congress detailing the grant progress and outreach accomplishments over the past six months.

Wack formed a Twitter group of NDNP state awardees for monthly campaigns, held on the second Tuesday of each month under the hashtag #ChronAmParty. The first campaign, #CreepyNews, will highlight historic Halloween news.

Synergies Among African American History and Culture (AADHum)

Scott Pennington completed quality assurance on the deliverables from batch 1 and the 15 audiotapes from the David C. Driskell Center. He will share the enhanced metadata from all batches with SCUA and Driskell Center staff to enhance current collection guides.

Other Digitization Activities

Rebecca Wack worked with Vin Novara (SCPA) to write and submit a Letter of Inquiry for a Grammy Museum Foundation Preservation Implementation Grant to preserve and digitize a portion of “The Listening Room,” a radio program from the Robert Sherman Collection.

Robin Pike and other Digitization Initiatives Committee members revised the procedures and proposal form for FY19 digitization project proposals to account for an increased emphasis on the staff and financial resources required for preservation activities before digitization. These revisions will be presented at the October 19 Library Assembly meeting, when the call for proposals opens.

Pike shipped the following digitization projects to vendors, beginning the FY18 digitization cycle: Spiro Agnew audio recordings, Athletics videotapes, The Black Explosion student newspaper, Arthur Godfrey films, and serials from the Mass Media and Culture collection area.

Eric Cartier began meeting with SCPA and SCUA collection managers to plan in-house digitization projects for calendar year 2018. He also worked with experienced student assistants to train new and returning student assistants in audio digitization, enabling more requests and projects to be completed in-house.

Cartier and Digitization Assistants also completed digitizing materials for the physical and virtual Labor Exhibit, which opened October 6.

Digital Programs and Initiatives

Digital Collections

Now that the Diamondback Student Newspapers project is well underway (with the FY2016 data loaded and released), Joshua Westgard has begun work on the data handler for the Katherine Anne Porter Correspondence. Because of the modular design of the batchload client developed by DPI and SSDR, the only part of the code that needs modification is the piece that interprets the original data and assembles it into repository objects. Work on the FY2017 Diamondback data continues in parallel with the work on the KAP project.
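The modular design described above, where only the collection-specific "data handler" changes, can be sketched as a plugin interface. This is a minimal Python illustration under stated assumptions: the class names, row fields, and the placeholder repository write are all hypothetical and are not the actual API of the UMD batchload client.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class RepositoryObject:
    """Hypothetical stand-in for an object destined for the repository."""
    identifier: str
    title: str
    files: list[str] = field(default_factory=list)


class DataHandler(ABC):
    """The only collection-specific piece: turns source rows into objects."""

    @abstractmethod
    def objects(self, rows: Iterable[dict]) -> Iterator[RepositoryObject]:
        ...


class NewspaperHandler(DataHandler):
    # Hypothetical input layout: one row per issue with a list of page images.
    def objects(self, rows):
        for row in rows:
            yield RepositoryObject(
                identifier=f"issue-{row['date']}",
                title=f"The Diamondback, {row['date']}",
                files=list(row["pages"]),
            )


class LetterHandler(DataHandler):
    # Hypothetical input layout: one row per letter.
    def objects(self, rows):
        for row in rows:
            yield RepositoryObject(
                identifier=f"letter-{row['id']}",
                title=f"Letter from {row['from']} to {row['to']}",
                files=[row["scan"]],
            )


def batchload(handler: DataHandler, rows: Iterable[dict]) -> list[str]:
    """Shared, collection-agnostic loop reused unchanged for every project."""
    loaded = []
    for obj in handler.objects(rows):
        # Placeholder: a real client would write the object to the repository here.
        loaded.append(obj.identifier)
    return loaded
```

Supporting a new collection means writing one new `DataHandler` subclass; `batchload` itself never changes, which is the benefit the post attributes to the client's design.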

New Additions to DRUM

Almost 300 theses and dissertations from UMD summer 2017 graduates have recently been deposited in DRUM, bringing the total to more than 13,000. Here’s the breakdown of new entries by college:

82 – A. James Clark School of Engineering
21 – College of Agriculture & Natural Resources
41 – College of Arts & Humanities
46 – College of Behavioral & Social Sciences
67 – College of Computer, Mathematical & Natural Sciences
18 – College of Education
3 – College of Information Studies
4 – Philip Merrill School of Journalism
6 – Robert H. Smith School of Business
2 – School of Architecture, Planning, & Preservation
6 – School of Public Health
2 – School of Public Policy

Check out the latest research from UMD grads at the UMD Theses and Dissertations Collection in DRUM.

Reports from the Partnership for Action Learning in Sustainability (PALS) have recently been deposited in DRUM (http://hdl.handle.net/1903/19607). Administered by the National Center for Smart Growth Research & Learning, PALS is designed to provide low-cost assistance to local governments while creating real-world problem-solving experiences for UMD students. Faculty incorporate the jurisdiction’s specific issues into their courses, and students use the classroom concepts to complete these sustainability-focused projects. Students gain experience working with a real client and produce a useful product for the partner city or county. Currently all reports in DRUM are restricted to campus use only, but the access restrictions will be lifted as permissions are obtained.

Open Journal Systems Upgrade Planning

DPI Graduate Assistant Carlos Alvarado is investigating updating our electronic journal publishing platform to the latest software version, in close collaboration with Terry Owen, Josh Westgard, and Kate Dohe.  Open Journal Systems (OJS) 3.0 represents a significant upgrade effort, with substantial changes to the user interface for editors and authors, as well as modernized, responsive journal templates for readers.

Software Development

Fedora Content Repository

The UMD Student Newspapers public interface is now available, with digitized versions of The Diamondback student newspaper from 1910-1971.  This interface is built using Fedora Content Repository, IIIF, Mirador, Solr, and Hippo CMS technologies.
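IIIF is one of the technologies listed above. Its Image API addresses every image through a fixed URL pattern, which is what lets a viewer such as Mirador request exactly the region and size it needs. The sketch below assumes IIIF Image API 2.x syntax and uses a hypothetical server base URL and identifier; the post does not say which API version or image server UMD uses.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "full", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an Image API 2.x URL:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be URI-encoded, including any '/' they contain.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """The image information document a viewer fetches first."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"
```

For example, `size="!800,800"` asks the server for the whole page scaled to fit within 800 by 800 pixels, a typical thumbnail or reading-view request.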

ArchivesSpace

We have deployed ArchivesSpace 2.1 which contains the overhauled Public User Interface.  Planning is now underway for the changes necessary to release this as the preferred public access to Archival Collections beginning in January.

Hippo

Work continued on the Libi staff intranet replacement and the upgrade to Hippo version 11. Hippo 11 is planned for release at the end of October.

Reciprocal Borrowing

We added new features in Reciprocal Borrowing 1.1.1 and 1.1.2, which are currently in user testing. These releases change how member eligibility to participate in the program is checked: instead of Shibboleth affiliation attributes, the application now uses an entitlement attribute specific to reciprocal borrowing.
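The difference between the two eligibility checks can be sketched as follows. This is a hedged illustration only: the entitlement URI and the list of accepted affiliations are hypothetical, and the assumption that multiple attribute values arrive joined by `;` reflects common Shibboleth SP behavior rather than anything stated in the post.

```python
from typing import Optional, Set

# Hypothetical entitlement URI; the program's real value is not given in the post.
RB_ENTITLEMENT = "urn:mace:example.org:entitlement:reciprocal-borrowing"

# Hypothetical affiliation values an affiliation-based check might accept.
ELIGIBLE_AFFILIATIONS = {"faculty", "staff", "student"}


def values(raw: Optional[str]) -> Set[str]:
    """Split a multi-valued attribute (assumed ';'-delimited)."""
    if not raw:
        return set()
    return {v.strip() for v in raw.split(";") if v.strip()}


def eligible_by_affiliation(scoped_affiliation: Optional[str]) -> bool:
    """Older style: infer eligibility from values like 'student@umd.edu'."""
    return any(v.split("@", 1)[0] in ELIGIBLE_AFFILIATIONS
               for v in values(scoped_affiliation))


def eligible_by_entitlement(entitlement: Optional[str]) -> bool:
    """Newer style: the identity provider asserts eligibility explicitly."""
    return RB_ENTITLEMENT in values(entitlement)
```

The design advantage of an entitlement is that the decision about who qualifies lives with the identity provider, so the application no longer has to encode affiliation rules itself.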

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

The CLAS team responded to 122 Aleph Rx submissions and 35 e-resource requests from across the consortium’s libraries in September.

Aleph Session Fixation Issue Resolved

One of the oldest unresolved Aleph Rx tickets, #11311, was resolved in September. This fixed a longstanding session fixation security issue in the Aleph OPAC.
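The post does not describe how the Aleph fix was made. As general background only, the standard defense against session fixation is to issue a fresh session identifier at login and discard the old one. The toy Python store below illustrates that technique; it is not Aleph code.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Toy in-memory session store showing session-ID regeneration at login."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def create(self) -> str:
        sid = secrets.token_urlsafe(32)   # unguessable identifier
        self._sessions[sid] = {}
        return sid

    def get(self, sid: str) -> Optional[dict]:
        return self._sessions.get(sid)

    def login(self, sid: str, user: str) -> str:
        """Authenticate the session and return a NEW identifier.

        If the pre-login ID were kept, an attacker who planted it in the
        victim's browser (session fixation) would share the logged-in session.
        """
        data = self._sessions.pop(sid, {})   # invalidate the old ID
        new_sid = self.create()
        self._sessions[new_sid] = {**data, "user": user}
        return new_sid
```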

Aleph Inventory Functionality

At the request of several USMAI libraries, CLAS has been investigating Aleph’s inventory functionality to determine the feasibility of implementing it and to assess its usefulness for USMAI libraries that wish to conduct inventories of their collections. Following CLAS’ initial investigation, a short-term working group will be formed to review the available functionality, document recommended workflows, and assess any gaps in functionality.

Problem Reporting Forms Migration

CLAS hosts several HTML forms for use by USMAI, mostly for submission of requests and issues by staff at USMAI libraries, but also some end-user forms. Work is underway to migrate these forms to Wufoo, our form system. This will give the forms a new look, allow us to take advantage of more modern form functionality, and simplify the process of modifying the forms. As we migrate them, we’ll also review the forms to make sure they collect the necessary information effectively, and we’ll gather feedback from form users to make sure the forms meet the needs of USMAI.

MD-SOAR

The development of autosuggest functionality for subjects and formats on the MD-SOAR submission form has been completed and released in production. The autosuggest feature will allow users submitting records to choose from already submitted metadata values, which will minimize variations in metadata values, resulting in better discovery.
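The idea behind autosuggest, offering values that already exist so submitters reuse them instead of inventing variants, can be sketched with a sorted list and binary search for prefix matches. This is a generic illustration under assumptions; it is not how DSpace or MD-SOAR actually implements the feature.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List


class Autosuggest:
    """Prefix suggestions drawn from metadata values already in the repository."""

    def __init__(self, existing_values: Iterable[str]) -> None:
        # Collapse case variants, keeping the first spelling seen for display.
        self._display: Dict[str, str] = {}
        for v in existing_values:
            self._display.setdefault(v.casefold(), v)
        self._keys: List[str] = sorted(self._display)

    def suggest(self, prefix: str, limit: int = 5) -> List[str]:
        p = prefix.casefold()
        i = bisect_left(self._keys, p)   # first key >= prefix
        out: List[str] = []
        while i < len(self._keys) and self._keys[i].startswith(p) and len(out) < limit:
            out.append(self._display[self._keys[i]])
            i += 1
        return out
```

Because "Chemistry" and "chemistry" collapse to one suggestion, a submitter who picks from the list produces one consistent value, which is the variation-reducing effect the post describes.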

More batch loads were completed for Salisbury in September. The remainder will be completed in early October. The process was slowed by the discovery of a bug in the DSpace process for creating thumbnails. This bug is expected to be fixed in the upgrade to DSpace v6, currently scheduled to start in mid-October. These new collections and other recently submitted items can be viewed in MD-SOAR’s list of recent submissions.

Staffing

DCMR welcomed two new student assistants. Maggie McCready works for Eric Cartier in the Hornbake Digitization Center, and Maya Reid began working on the Office of Research, Planning and Assessment records digitization project; both are first-semester students in the College of Information Studies specializing in archives.

Conferences, workshops and professional development

David Dahl attended the Maryland Research and Education Network’s annual symposium on September 29th.

Rebecca Wack, Robin Pike, and Doug McElrath (SCUA) attended the National Digital Newspaper Program Awardees Conference September 11-13 in Washington, DC. Pike presented on performing copyright research on newspapers published between 1923 and 1963, and McElrath presented on outreach to genealogical communities.

Kate Dohe’s article with Erin Pappas (University of Virginia Libraries) “The many flavors of ‘yes’: Libraries, collaboration, and improv” was published in the September issue of College & Research Libraries News. During the month of September it was among the most-viewed articles in the online issue.

On Oct. 3-4, several DSS staff members participated in the semi-annual DC Area Fedora User Group meeting held at NASA’s Goddard Space Flight Center in Greenbelt, MD. Ben Wallberg introduced the recently released Diamondback Student Newspapers interface, and Peter Eichman gave a presentation on his work developing RDF content models for OCR text using the W3C Web Annotation Standard.  In addition to being the primary organizer of the meeting, Joshua Westgard presented on three topics: (1) the newly formalized Fedora API, and the API alignment sprints recently undertaken by the Fedora community, (2) a Python-based batchload client developed by UMD, and (3) the import/export feature and tooling developed by the Fedora community.  The meeting highlighted both the recent progress in the Fedora community toward meeting the challenges of building reliable, flexible, and scalable repository services, and also the significant contributions made by the UMD Libraries toward achieving those goals.
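The details of the RDF content models for OCR text are not given here. As a hedged illustration of the W3C Web Annotation Data Model itself, the sketch below represents one OCR word as an annotation whose body is the recognized text and whose target is a rectangular region of a page image. All URIs are hypothetical, and this is not the model presented at the meeting.

```python
import json


def ocr_word_annotation(page_uri: str, word: str, x: int, y: int,
                        w: int, h: int, anno_id: str) -> dict:
    """One OCR word as a Web Annotation (JSON-LD)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        # The body carries the recognized text.
        "body": {"type": "TextualBody", "value": word, "format": "text/plain"},
        # The target is a region of the page image, expressed as a media fragment.
        "target": {
            "source": page_uri,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    anno = ocr_word_annotation("https://example.org/page/1", "Diamondback",
                               10, 20, 300, 40, "https://example.org/anno/1")
    print(json.dumps(anno, indent=2))
```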

David Durden and Kate Dohe attended the Research Data Management Implementations Workshop in Arlington, VA on September 14-15.

David Durden presented on the topic of Data Librarianship to new UMD iSchool students in Beth St. Jean’s course, “Serving Information Needs” (LBSC 602), on October 3, 2017. He introduced concepts and skills typical to data librarian positions, and highlighted iSchool courses that would prepare students for work in data curation and management.

Visits

Fedora 4 Digital Repository Implementation Project

I would like to take this opportunity to formally announce the launch of the Fedora 4 Digital Repository project, which aims to implement a new system to replace our existing 7-year-old, Fedora 2.2.2-based Digital Collections. Fedora has come a long way in the last several years, and we are very excited about the possibilities offered by the newest version. Because our older version and the latest one differ so extensively, this is a more complicated project than a simple upgrade.

An initial project planning group, consisting of Ben Wallberg, Peter Eichman, Bria Parker, and me, has outlined our primary objectives for the project:

  • Leverage repository improvements provided by Fedora 4 application
  • Migrate selected existing services and applications
  • Develop new features

You may read more about Fedora 4 as an application here: http://duraspace.org/node/2394.  Our complete objectives document is also available for reading: Fedora 4 Objectives.

We hope that this new repository will reduce some of the silos in our portfolio and be more than just a place to house metadata and access copies of selected digital assets. We are moving forward with an awareness of the importance of a system that not only houses but also manages our digital assets, and that allows more flexibility over who, what, when, where, and how our staff and our users can work with our content.

At a practical level, some of the changes/improvements we hope to make include:

  • Replacement of existing Administrative Tools interface with a community-developed and -maintained application, such as Islandora
  • Batch ingest mechanisms that can be user-operated and integrated with the Administrative Tools
  • Replacement of current homegrown metadata schemas with standard schemas, such as MODS and PREMIS
  • More advanced content model, allowing description and control of objects down to the node level, rather than at the descriptive record level
  • Enhanced user-generated reporting
  • Flexible authentication and authorization controls
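Two of the goals above, user-operated batch ingest and a content model that reaches below the descriptive record, map naturally onto Fedora 4's REST interface, where each object and each part of an object is its own addressable container. The sketch below only builds (does not send) the HTTP requests for an object with page-level children. It assumes a Fedora 4 instance at the common default `http://localhost:8080/rest`, a hypothetical path layout, and a Dublin Core title as the only metadata; it is not the project's actual ingest design.

```python
from typing import List
from urllib.request import Request


def turtle_with_title(title: str) -> bytes:
    """Minimal Turtle body; '<>' refers to the resource being created."""
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return ("@prefix dcterms: <http://purl.org/dc/terms/> .\n"
            f'<> dcterms:title "{escaped}" .\n').encode("utf-8")


def put_container(url: str, title: str) -> Request:
    return Request(url, data=turtle_with_title(title), method="PUT",
                   headers={"Content-Type": "text/turtle"})


def object_with_pages(base: str, obj_path: str, title: str,
                      pages: List[str]) -> List[Request]:
    """Requests for an object, a 'pages' container, and one child per page.

    Parents come first in the list, so sending the requests in order
    creates each container before its children.
    """
    root = f"{base.rstrip('/')}/{obj_path.strip('/')}"
    reqs = [put_container(root, title), put_container(f"{root}/pages", "Pages")]
    reqs += [put_container(f"{root}/pages/{n}", p)
             for n, p in enumerate(pages, start=1)]
    return reqs
```

Each request could then be sent with `urllib.request.urlopen(req)`; keeping request construction separate from sending makes a batch easy to review before anything is written.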

This is a major project, one that will take approximately a year, although we have yet to set firm milestones or deadlines. In the meantime, we are ceasing major development on the existing Fedora repository, with the exception of crucial maintenance issues. We have noted and categorized the existing outstanding metadata sweeps and will handle them during the migration process. We appreciate your patience as we work on the new system, which will be a most welcome improvement.

Stew of the Month: November-December 2014


Digitization Activities

Robin Pike worked with Joanne Archer, at UMD Special Collections, to coordinate sending 40 wire recordings from the Arthur Godfrey Collection, 160 open reel audio tapes from the WAMU Archives, and 229 volumes from the Mid-Atlantic Regional Archives Conference (MARAC) archives, labor collections, university publications, broadcasting serials, political serials, and Maryland state documents to multiple vendors for digitization. Eric Cartier and students in the Hornbake Digitization Center digitized and uploaded a batch of 83 volumes of the MARAC newsletter to the Internet Archive.

Software Development

DSS has joined as a development partner in the creation of the DuraSpace-supported Fedora 4, which we will use to replace our existing Fedora 2-based architecture for Digital Collections. Though we are late arrivals to the multi-year development effort, we intend to participate in the ongoing development of the core Fedora 4 platform in parallel with our own implementation. Fedora 4.0.0 was released on November 27. We have begun setting up our own development server and investigating the technology options available to us.

User and System Support

2014 was a very productive year for User and System Support (USS). This year, 7,155 service requests were created in SysAid. USS accomplished the following projects during the year:

  1. During this period, USS was involved in major projects ranging from the creation of the Makerspace and the Laptop Bar to the replacement of more than 100 staff and public access computers.
  2. USS supported the Terrapin Learning Commons (TLC) spaces in various branch libraries, from specifying equipment to purchasing it, including video cameras, Google Glass, and Oculus Rift. Working with library TLC staff, USS increased the number of loaner laptops from 45 to more than 100, which has significantly reduced student wait times.
  3. USS converted a one-button studio created by the staff of Princeton University into a one-button cart for UMD. The one-button cart is a portable recording station that students and faculty can use to create videos. It can be used anywhere by plugging it into a power outlet; once it is plugged in, students can use it without assistance.
  4. Last year, USS experimented with a 3D printer from MakerBot. USS then expanded its horizons and worked with other departments, such as Public Services, to open 3D printing services to the student community. At first, students sent in requests to print souvenirs such as shot glasses, but they are now using the 3D printers for class assignments and projects. This year alone, USS has successfully printed more than 300 items, which equates to more than 2,274 hours of printing. USS staff also provided more than 25 consultations to students who needed help creating and printing their items. Our next task is to master 3D scanning and learn how to support patrons who need help scanning 3D objects.
  5. USS also compiled statistics from SysAid for 2014. As previously mentioned, 7,155 service requests were created in SysAid; this number also includes requests from other departments that use SysAid. The service requests ranged from installation and troubleshooting to resolving problem reports for services such as Research Port, the catalog, and various online databases. Of the 7,155 requests opened, USS closed 5,665, or 79% of all service requests opened this year. In comparison to 2013, that is a 28% increase.
  6. In 2014, USS also expanded its community outreach initiatives. On Maryland Day (April 26, 2014), USS showcased many of our new gadgets, including the 3D printer and Google Glass, in the Presidential Suite. Students and alumni were very excited and engaged by the opportunity to see and experiment with the newest technology offered by the Libraries. For UMD’s homecoming, we were selected to showcase some of the Libraries’ newest technology available to the campus community. We were a big hit among attendees, who showed a lot of interest and excitement about our various services.
  7. On December 13, 2014, USS hosted ProjectCSGirls, a national nonprofit dedicated to closing the tech gender gap by cultivating a love for technology and introducing computer science to girls starting in adolescence. The program attracted more than 55 girls of various ages from different schools. USS staff provided technical support that helped the program run smoothly, and the event was a success.

We want to thank everyone for their support, and we look forward to an impactful, collaborative, and innovative 2015 as we move USS to the forefront of library services.

I would like to thank all User and System Support staff for their hard work in accomplishing the projects listed above.

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

OLE status: The Consortial Library Applications Support (CLAS) team put a lot of effort into the OLE project in the period from mid-October to mid-November, and at the CLD meeting on November 20 delivered a detailed report of what the team had learned and accomplished to that point. Work on OLE continues: Mark Hemhauser has focused on identifying the data elements needed for requisitions to advance to encumbered purchase orders and for invoices to appear as paid in the ledger/budget summary report; he has also examined budget structure functionality and begun an initial investigation of consortial use of acquisitions. Hans Breitenlohner and David Steelman have been working closely with the systems librarians to analyze and address problems as they are uncovered. David has been collaborating with Mark to document problems with purchase orders and invoices, and has taken the initiative to file a number of specific issue reports in the Kuali organization’s JIRA feedback portal. David and Hans have both worked on the problem reported by Linda Seguin regarding display of bibliographic records with Hebrew characters.

Following up from the November CLD meeting, Heidi Hanson and Ben Wallberg (DSS) met with Lea Messman-Mandicott and Betty Landesman of the USMAI Next Generation ILS Working Group, along with Chuck Thomas and David Dahl, to discuss strategies that will support the USMAI in understanding and evaluating the OLE system. Specifically, we are looking into how to give members of the Next-Gen ILS Working Group (and/or its sub-groups) access to our local “OLE sandbox” for testing at some time early in 2015.

Aleph support: From mid-October to mid-December, David Wilt has responded to requests for 20 ad hoc reports for 8 different campuses, 4 parameter change/notice text changes for 3 different campuses, and a request for a RapidILL extract for College Park. David also wrote specifications for multiple recurring reports for College Park, which Hans has now added to the reports schedule.

Linda Seguin worked on a number of requests related to bibliographic record loading and clean-up. For brittle Hebraica items that College Park is having digitized for HathiTrust, Linda created a new item process status (IPS), modified the HathiTrust extract program, updated items, loaded bibs, and suppressed holdings/bibs as appropriate. For Health Sciences (HS), Linda loaded Springer ebook records using the old special loader for those records. Because practices have changed since the last time this loader was used, considerable data cleanup was needed post-load. Because HS reported that this would be their second-to-last load of Springer records, we decided it was not worth updating the loader program itself. Linda also worked on Ebrary record cleanup for Towson, deleting all Ebrary PDA records that were for unpurchased titles. Catching up on a backlog of updates to the Ebrary Academic Complete collection, Linda loaded 27 files of new records and processed 22 files of deleted records. A complex deletion specification had to be developed in order to avoid deleting ebook titles that TU also holds in other packages.

Mark Hemhauser created a licensing database report for USMAI licenses, modified the serials claim letter address for College Park, continued to advise Towson and UMBC on their move to shelf-ready processing and the related loader issues, and updated the USMAI page on the shelf-ready loader. Mark also did some maintenance support for the College Park journal review web tool.

David Steelman, responding to an Aleph Rx request from UMBC, created a new version of the Equipment Availability page that would allow UMBC to generate their own page with any equipment that they want, by providing the system numbers for the equipment in the page request.

ResearchPort, SFX (FindIt), EZproxy support: In November, we upgraded EZproxy to the newly released version 5.7.44, which allowed us to disable SSL v3, a protocol vulnerable to POODLE attacks (the security exploit, not the dog). Ingrid Alie corrected the A-Z targets list for the Center for Environmental Science (CE) Research Port journal section so that it is in alphabetical order. Ingrid also fixed the ScienceDirect database cross search for Health Sciences, Towson, UM Eastern Shore, Bowie, UM College Park, Salisbury, UM Law, Morgan State, and UMBC, because Elsevier is retiring its Federated Search platform. Cross search is now working for all of these campuses. Ingrid also generated a list of all database configurations from the proxy server for Towson.

Support for USMAI Groups and Committees: Linda established 13 new Listserv email lists in support of USMAI advisory groups and subgroups, and communities of interest/practice.

Mark Hemhauser served as Chair and Heidi Hanson served as a member of the search committee for the Director of the CLAS team. We were busy with interviews in October and early November, and were very fortunate to bring the search to a successful conclusion. David Dahl will begin as the new Director for CLAS on January 12, 2015.

CLAS team gets a nod: Elaine Mael at Towson wrote an article about the merger of Baltimore Hebrew University into Towson’s collection. “ITD” (DSS’s former name) gets mentioned quite a bit:

http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=99263274&site=ehost-live

Staffing

Peter Eichman joined SSDR as our new Senior Software Developer. He is a UMD alumnus (B.A.s in Linguistics and Philosophy) and also holds an M.A. in Philosophy from USC. He is very familiar with UMD, not only as a student but also as a staff member, having worked for the ARHU Computing Services office and the National Foreign Language Center as a web application developer. Peter’s first project will be heading the major Digital Collections upgrade to Fedora 4.

Solr System

We are in the process of integrating Apache Solr with our Fedora-based digital repository on the back end. I am not going to pretend that I know and understand all of the technical details about Solr listed on its home page. My layperson’s interpretation of its features is as follows:

  1. Solr is a standalone enterprise search server with a REST-like API. I think this means that Solr runs on its own and can be accessed via a URL in a web browser.
  2. You put documents in it (called “indexing”) via XML, JSON, CSV, or binary over HTTP. At UMD, our “documents” are the FOXML files in which Fedora stores our metadata.
  3. You query it via HTTP GET and receive XML, JSON, or CSV results. We can use a web browser and a URL to construct queries.
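The three points above, documents sent in over HTTP and queries run as plain GET URLs, can be sketched in a few lines of Python. The core URL and the `Title` field are assumptions for illustration (the field name follows the example later in this post); the sketch posts JSON, one of the formats listed above, and does not show how UMD transforms FOXML into Solr documents.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import Request

SOLR_CORE = "http://localhost:8983/solr/fedora"  # hypothetical core URL


def index_request(docs: List[dict]) -> Request:
    """Indexing: POST a JSON array of documents to the update handler."""
    return Request(f"{SOLR_CORE}/update?commit=true",
                   data=json.dumps(docs).encode("utf-8"), method="POST",
                   headers={"Content-Type": "application/json"})


def select_url(q: str, extra: Optional[Dict[str, str]] = None) -> str:
    """Querying: an ordinary GET URL, so it also works pasted into a browser."""
    params = {"q": q, "wt": "json"}
    params.update(extra or {})
    return f"{SOLR_CORE}/select?{urlencode(params)}"


def titles(response_body: str) -> List[str]:
    """Pull the Title field out of a Solr JSON response."""
    docs = json.loads(response_body)["response"]["docs"]
    return [d.get("Title") for d in docs]
```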

At UMD, we ingest content into our Fedora repository in two ways: through a home-grown, web-based administrative interface for adding images, and through batch ingest (which currently requires developer assistance). We use the administrative interface to manage the metadata for our digital objects. However, the administrative interface has always lacked robust reporting capabilities. Solr includes a robust administrative interface of its own that allows for the construction of complex queries and reporting outputs. For me, as a user, this is Solr’s greatest benefit. Our Software Systems Development and Research team tries whenever possible to put as much knowledge as it can into the hands of users. It is a win-win situation: for the developers, it eliminates having to investigate and answer really basic questions for me, and for me, it lets me get results and do my work without having to depend on others.
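One common way to get reports out of Solr is faceting: ask for zero documents and a count of records per value of some field. The sketch below builds such a query with standard Solr faceting parameters and parses the response. The core URL and the `collection` field are hypothetical; which fields UMD's schema actually exposes is not stated here.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def facet_report_url(core_url: str, field: str, q: str = "*:*") -> str:
    """rows=0 skips the documents; only per-value counts of `field` come back."""
    params = {"q": q, "rows": 0, "facet": "true", "facet.field": field,
              "facet.limit": -1, "facet.mincount": 1, "wt": "json"}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def facet_counts(response_body: str, field: str) -> Dict[str, int]:
    """Solr returns facet_fields as a flat list: [value1, count1, value2, count2, ...]."""
    flat = json.loads(response_body)["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[::2], flat[1::2]))
```

Changing `q` narrows the report, for example to records added by a particular batch, without anyone having to write new reporting code.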

Solr first requires a schema, which is essentially a file that describes what we want to index and how. Learning to read and interpret the schema is the first step toward understanding how Solr works. You start by defining fields, and these fields correspond to our metadata. In a simple example, a “Title” field in Solr is an index on the <title> tag in our Fedora metadata. For each field, we can define how it behaves. For example, we have defined a field type called “umd_default” that runs a series of filters on our data. These filters are the key to understanding how searching works in Solr. I’m going to use the following piece of correspondence as an example: Letter by Truman M. Hawley to his brother describing Civil War battle. Includes envelope, September 26, 1862. When Solr indexes this title, it does a number of things. Many of these steps are customizable, and this is what is important to understand.

  1. It separates and analyzes each word and assigns it a position. “Letter” is in position 1 and spans characters 0-5 (Solr reports this as start offset 0 and end offset 6, because the end offset points one past the last character; the trailing space is not part of the word).
  2. It determines the type of word (is it alphanumeric? Or just a number? 1864 is just a number)
  3. It removes punctuation. Finally, a place where no one cares about commas.
  4. It removes stopwords. We apply a “StopFilterFactory” filter to remove stopwords, and the list can be customized. In our system, “by” and “to” are considered stopwords, so we do not index them.
  5. It converts everything to lower case. Solr does not have to do this. We apply a “LowerCaseFilterFactory”  with the assumption that our users will not need to place emphasis or relevancy on case in searches.
  6. We apply an “ASCIIFoldingFilterFactory” that converts alphabetic, numeric, and symbolic Unicode characters that are not in the “Basic Latin” Unicode block into their ASCII equivalents, if one exists. So, for example, a search on “Munoz” will match “Muñoz.”
  7. We apply the “PorterStemFilterFactory” to the Title field. This filter applies an algorithm that truncates words based on assumptions about their endings. In the example above, “describing” becomes “describ” and “battle” becomes “battl.”
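The steps in this list can be imitated in a short Python sketch to make the chain concrete. It is a simplified emulation under stated assumptions, not Solr: a regular expression stands in for Solr's tokenizer, lowercasing happens before the stopword check so that a capitalized "By" would also be dropped, the stopword list is hypothetical apart from "by" and "to", accent stripping only approximates ASCIIFoldingFilter, and Porter stemming is left out entirely.

```python
import re
import unicodedata
from typing import List, Tuple

# Hypothetical stopword list; only "by" and "to" are confirmed for UMD's setup.
STOPWORDS = {"a", "an", "and", "by", "of", "the", "to"}


def fold_to_ascii(term: str) -> str:
    """Approximate ASCII folding: decompose, then drop combining accents.
    (Solr's filter also maps characters such as 'ø' that this does not.)"""
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> List[Tuple[str, int, int, int, str]]:
    """Return (term, position, start_offset, end_offset, type) tuples.

    End offsets are one past the last character, as in Solr. Positions are
    counted before stopword removal, so removed words leave gaps.
    """
    text = unicodedata.normalize("NFC", text)
    tokens = []
    # Splitting on runs of word characters also discards punctuation.
    for position, m in enumerate(re.finditer(r"\w+", text), start=1):
        raw = m.group()
        token_type = "<NUM>" if raw.isdigit() else "<ALPHANUM>"
        term = raw.lower()                 # lowercase filter
        if term in STOPWORDS:              # stop filter
            continue
        term = fold_to_ascii(term)         # ASCII folding (approximation)
        tokens.append((term, position, m.start(), m.end(), token_type))
    return tokens
```

Running `analyze("Letter by Truman M. Hawley to his brother describing Civil War battle.")` yields unstemmed terms beginning with `("letter", 1, 0, 6, "<ALPHANUM>")` and `("truman", 3, 10, 16, "<ALPHANUM>")`; position 2 is skipped because "by" was removed. Adding a Porter stemmer as a final step would turn "describing" into "describ" and "battle" into "battl," as described above.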
What we are left with is indexing on the following terms:

letter truman m hawlei hi brother describ civil war battl includ envelop septemb 24 1864

This means that I could run the query q=Title:(truman AND civil AND describ AND battl) in Solr and receive this letter as a hit. Solr still supports phrase queries (“Letter by Truman M. Hawley”) and wildcard searches (Truman AND Hawl*). Our implementation of Solr currently uses a boolean “OR” as the default operator in a search string. So if I were interested in content having to do with the Civil War in the month of September, I might type something like “civil september” into a search box. Based on our configuration, this translates to: search the Title field for anything containing the term “civil” OR the term “septemb.” Here are just a few examples from more than 300 results:

  • The Greek beginning,Classical civilization
  • The Classical age,Classical civilization
  • Ancient civilizations The Vikings
  • Ancient civilizations The Aztecs
  • Ancient civilizations The Mayans
  • The Civil War in Maryland Collection
  • The Celts,Ancient civilizations
  • Acts of faith,Jewish civilization in Spain
  • Brick by brick: a civil rights story

How is this possible? Well, if I investigate how our PorterStemFilter analyzes “civilization,” I discover that it becomes “civil.” Also, as a user, I am thinking that I want results that have to do with both the Civil War AND September, while Solr is returning results that have to do with either. If I manually adjust my search to be a boolean “AND” search, Title:(civil AND september), I see only three relevant results. This might lead me to believe that we should immediately change our default search to “AND” instead of “OR,” since if I type two terms into a search box, I obviously want to see records with both of them. (Our public interface, in fact, already defaults to “AND.”) It might also suggest that we should turn off the PorterStemFilter, because all of those “civilization” hits are annoying: if I want to search for “Civil*,” I will search for “Civil*.”
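The effect of the default operator can be shown with a toy matcher over already-analyzed terms. The Hawley letter's terms are taken from the indexed output above (numbers omitted); the terms for the "Aztecs" title are hand-approximated stems for illustration. The `solr_params` dictionary shows how the same choice is expressed to Solr's standard query parser, where `q.op` sets the default operator and `df` the default field.

```python
from typing import Dict, List, Set


def matches(doc_terms: Set[str], query_terms: List[str], op: str = "OR") -> bool:
    """Boolean match of analyzed query terms against a document's indexed terms."""
    hits = [t in doc_terms for t in query_terms]
    return all(hits) if op == "AND" else any(hits)


docs: Dict[str, Set[str]] = {
    "Hawley letter": {"letter", "truman", "m", "hawlei", "hi", "brother", "describ",
                      "civil", "war", "battl", "includ", "envelop", "septemb"},
    # Hand-approximated stems; "civilizations" is indexed as "civil".
    "Ancient civilizations The Aztecs": {"ancient", "civil", "aztec"},
}
query = ["civil", "septemb"]  # "civil september" after analysis

or_hits = [name for name, terms in docs.items() if matches(terms, query, "OR")]
and_hits = [name for name, terms in docs.items() if matches(terms, query, "AND")]

# The same choice expressed as Solr request parameters.
solr_params = {"q": "civil september", "df": "Title", "q.op": "AND"}
```

With "OR," both titles match because each contains "civil"; with "AND," only the letter matches, mirroring the drop from hundreds of hits to a handful.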

But is it so simple? What is best for the user? What default settings will be most useful for our users? This is a different discussion and I will be working with my colleagues on the Collections side of things to try to answer some of these questions. Solr is so robust, and can be used to fit so many different situations, that truly configuring it in the most effective way is overwhelming, but also exciting.

Where is all of our digital stuff?

I like to think that we at the University of Maryland are not unlike other university libraries, in that we have a lot of digital content and, just as with books, we keep it in a lot of different places. Unfortunately, unlike our dependable analog collections, this digitized content can sometimes be unwieldy to keep track of. One of my big goals is to reach the point where an inventory of these digital collections can provide me with the equivalent of a “shelf location” and statistics at the push of a button. As a first step toward this goal, one project I have been working on involves documenting and locating all of the UMD Libraries’ digital content. I am focusing right now on content that we create or own outright, as opposed to content that comes to us in the form of a subscription database, which is a whole issue in itself. We don’t have one repository to rule them all in a physical sense. Rather, I like to think of our “repository” at present as an “ecosystem.” Here are some parts of our digital repository ecosystem.

DRUM (DSpace) http://drum.lib.umd.edu

Stats: Close to 14,000 records.  Approximately 8,800 of these are University of Maryland theses and dissertations.

DRUM is the Digital Repository at the University of Maryland. Currently, the collections contain three types of materials: faculty-deposited documents, a Library-managed collection of UMD theses and dissertations, and collections of technical reports. As a digital repository, DRUM maintains files for the long term, and descriptive information about the deposited works is distributed freely to search engines. Unlike the Web, where pages come and go and addresses to resources can change overnight, repository items have a permanent URL, and the UMD Libraries are committed to maintaining the service into the future. In general, DRUM is format-agnostic: it strives only to preserve the submitted bitstreams, in a file system, and their metadata, in a Postgres database. DSpace requires the maintenance of a Bitstream Format Registry, but this serves merely as a way to specify allowable file formats for upload; it does not guarantee display, viewers, or emulation. DSpace does provide some conversion services, for example, conversion of PostScript to PDF. DRUM metadata may be harvested via OAI-PMH, and portions of it are sent to OCLC via the Digital Collections Gateway. A workflow exists to place thesis and dissertation metadata into OCLC. Most of DRUM is accessible via Google Scholar.
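Because DRUM exposes its metadata over OAI-PMH, a harvester needs only a few protocol arguments: `verb=ListRecords`, a `metadataPrefix` such as `oai_dc`, and the `resumptionToken` a repository returns when a result set spans several responses. The Python sketch below shows that paging loop using only the standard library. It is an illustration, not a DRUM or DSpace tool; `fetch` is injected so the loop can be exercised without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
          "dc": "http://purl.org/dc/elements/1.1/"}


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_records(base_url: str, prefix: str = "oai_dc",
                 fetch: Callable[[str], bytes] = http_get
                 ) -> Iterator[tuple[str, list[str]]]:
    """Yield (identifier, titles) pairs, following resumptionTokens until exhausted."""
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for record in root.iterfind(".//oai:record", OAI_NS):
            identifier = record.findtext("oai:header/oai:identifier", "", OAI_NS)
            titles = [t.text or "" for t in record.iterfind(".//dc:title", OAI_NS)]
            yield identifier, titles
        # An empty or absent resumptionToken marks the end of the list.
        token = root.findtext(".//oai:resumptionToken", "", OAI_NS).strip()
        if not token:
            break
        # Per OAI-PMH, a follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Usage would look like `for identifier, titles in list_records("https://repository.example.edu/oai/request"): ...`, where the URL is a hypothetical placeholder; check the repository's own documentation for its actual OAI-PMH base URL.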

Digital Collections (Fedora) http://digital.lib.umd.edu

Stats: 21,000 bibliographic units representing over 220,000 discrete digital objects.

Digital Collections is the portal to digitized materials from the collections of the University of Maryland Libraries. It is composed primarily of content digitized from our analog holdings in Special Collections and other departments. The University of Maryland’s Digital Collections support the teaching and research mission of the University by facilitating access to digital collections, information, and knowledge. Content is presently limited to image files (TIFF/JPG), TEI, EAD, and streaming audio and video. Fedora manages the descriptive metadata, the technical metadata, and the access derivative files. While Fedora can be developed to accept any format, our implementation currently accepts only TIFF and JPG images and TEI- or EAD-encoded XML documents without additional work. We are not currently using Fedora to inventory or keep track of our preservation TIFF masters. Audiovisual records are essentially metadata pointers to an external streaming system. Fedora metadata may be harvested via OAI-PMH, and portions of it are sent to OCLC via the Digital Collections Gateway. Google does crawl the site, and many resources are available via a Google search.
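The format limits described above lend themselves to a simple pre-ingest check. The Python sketch below is hypothetical and is not part of Fedora or of our ingest workflow: it recognizes TIFF and JPEG files by their leading signature bytes and identifies XML as TEI or EAD by the local name of the root element, returning `None` for anything else.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Leading bytes ("magic numbers") for the two accepted image formats.
SIGNATURES = {
    b"II*\x00": "TIFF",      # little-endian TIFF
    b"MM\x00*": "TIFF",      # big-endian TIFF
    b"\xff\xd8\xff": "JPEG",
}
# Root element local names for the accepted XML encodings (namespace ignored).
XML_ROOTS = {"TEI": "TEI", "TEI.2": "TEI", "ead": "EAD"}


def classify(data: bytes) -> Optional[str]:
    """Return 'TIFF', 'JPEG', 'TEI', or 'EAD' if `data` looks acceptable, else None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return None  # neither a known image signature nor well-formed XML
    local_name = root.tag.rsplit("}", 1)[-1]  # strip "{namespace}" if present
    return XML_ROOTS.get(local_name)
```

A caller would pass a file's full contents, for example `classify(Path(path).read_bytes())`; the XML branch needs the whole document, although the image signatures only need the first few bytes.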

Chronicling America (Library of Congress) http://www.chroniclingamerica.loc.gov

Stats: To date, we have submitted approximately 25,000 newspaper pages to the Library of Congress, and we anticipate a total of 100,000 pages by August 2014.

Chronicling America is the website that provides access to the files created and submitted as part of the National Digital Newspaper Program (NDNP) grants. We submit all files (TIFF, JP2, PDF, ALTO XML) to the Library of Congress, and they archive a copy. We also archive copies locally, in addition to the copy archived by LoC. One complete copy of each batch is sent to UMD’s Division of IT for archiving. In addition, Digital Systems and Stewardship saves a copy of each batch to local tape backup and retains the original batch hard drive in the server room in McKeldin Library.
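Keeping several copies of each batch (with LoC, the Division of IT, on tape, and on the original drive) helps only if we can confirm that the copies still match. This post does not describe a fixity procedure, so the Python sketch below is an assumption about how one might work: it builds a SHA-256 manifest for a batch directory and reports files that are missing, extra, or changed in another copy.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFF and JP2 files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(batch_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the batch root) to its SHA-256 digest."""
    return {
        p.relative_to(batch_dir).as_posix(): sha256_of(p)
        for p in sorted(batch_dir.rglob("*"))
        if p.is_file()
    }


def compare(reference: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from, unexpected in, or altered in the copy."""
    return {
        "missing": sorted(reference.keys() - copy.keys()),
        "extra": sorted(copy.keys() - reference.keys()),
        "changed": sorted(k for k in reference.keys() & copy.keys()
                          if reference[k] != copy[k]),
    }
```

For example, `compare(build_manifest(Path("/mnt/original/batch")), build_manifest(Path("/mnt/restored/batch")))` (hypothetical paths) would flag any divergence between the original drive and a copy restored from tape. Using relative paths as keys lets the same manifest be checked against copies mounted at different locations.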

HathiTrust http://www.hathitrust.org

Stats: Nothing yet! We plan to begin submitting content in 2014.

HathiTrust provides long-term preservation and access services to member institutions.  For institutions with content to deposit, participation enables immediate preservation and access services, including bibliographic and full-text searching of the materials within the larger HathiTrust corpus, reading and download of content where available, and the ability to build public or private collections of materials. HathiTrust accepts TIFF images and OCR files in either ALTO XML or hOCR.  They provide conversion tools to convert TIFF masters into JPEG 2000 for access purposes.
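ALTO XML, one of the OCR formats HathiTrust accepts, stores each recognized word as a `String` element whose `CONTENT` attribute holds the text, grouped into `TextLine` elements. The Python sketch below pulls plain text out of an ALTO file by matching local element names, so it does not depend on which ALTO namespace version a file declares. It is an illustration only, not a HathiTrust or Library of Congress tool, and it ignores refinements such as hyphenation markup.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_bytes: bytes) -> str:
    """Join ALTO String/CONTENT values into lines, one per TextLine element."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for element in root.iter():
        if local(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local(child.tag) == "String"]
            lines.append(" ".join(words))
    return "\n".join(lines)
```

Plain text extracted this way is useful for spot-checking OCR quality before submission or for feeding a local full-text index.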

Internet Archive http://www.archive.org

Stats: Almost 4,000 books, with over 840,000 pages

The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. The UMD Libraries contribute content to the Internet Archive in two ways. First, we submit material to be digitized at a subsidized rate as part of the Lyrasis Mass Digitization Collaborative. The material must be relatively sturdy, and it must either be out of copyright or we must be able to prove that we have permission from the copyright holder. Second, we add content digitized in-house (usually rare or fragile material), uploading the access (PDF) files and metadata to the Internet Archive ourselves. For material it digitizes, the Internet Archive produces JPEG2000 and PDF files at the time of digitization, including both cropped and uncropped JPEG2000 files for each volume. The UMD Libraries save the cropped JPEG2000 files and the PDFs locally and archive them with the UMD Division of IT.
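The goal of a “shelf location” and statistics at the push of a button could start from something as modest as a structured inventory of the systems described above. The Python sketch below is one hypothetical shape for such an inventory: the `Platform` fields are assumptions, and the counts are the approximate figures quoted in this post, not live statistics.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """One part of the repository ecosystem (field names are hypothetical)."""
    name: str
    software: str
    url: str
    units: int    # approximate count, in whatever unit `unit` names
    unit: str
    preservation_copies: list[str] = field(default_factory=list)


INVENTORY = [
    Platform("DRUM", "DSpace", "http://drum.lib.umd.edu", 14_000, "records"),
    Platform("Digital Collections", "Fedora", "http://digital.lib.umd.edu",
             220_000, "digital objects"),
    Platform("Chronicling America", "Library of Congress",
             "http://www.chroniclingamerica.loc.gov", 25_000, "newspaper pages",
             ["Library of Congress", "UMD Division of IT", "local tape",
              "original batch drive"]),
    Platform("HathiTrust", "HathiTrust", "http://www.hathitrust.org", 0, "volumes"),
    Platform("Internet Archive", "Internet Archive", "http://www.archive.org",
             840_000, "pages", ["local copy", "UMD Division of IT"]),
]

# "Shelf location" lookup: platform name -> where and how it is held.
BY_NAME = {p.name: p for p in INVENTORY}


def report() -> str:
    """One line per platform: name, approximate size, and access URL."""
    return "\n".join(f"{p.name:<20} {p.units:>8,} {p.unit}  ({p.url})"
                     for p in INVENTORY)


print(report())
```

Recording preservation copies alongside access URLs keeps the two questions an inventory must answer, where users find something and where its safe copies live, in one place.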

***

I am already aware of other types of digital content that we will have to track: born-digital records and personal files from our Special Collections and University Archives; eBooks in PDF and other formats that we purchase for the collection and must decide how to serve to the public; publications, such as journals, websites, and databases; and research data. I hope to return to this post in 2020 and smile at how confused, naive, and inexperienced we all were at all of this. Until then, I will keep working to pull everything together as best I can.