Fedora 4 Update

On behalf of DSS, I’m pleased to announce that we have passed a major milestone with our digital repository upgrades: Fedora 4, our next-generation repository, is now officially “in production,” meaning we can begin adding digital resources to it for management. DPI is already working on a plan for adding materials to the new repository, and individual stakeholders will hear more soon about their collections.
What does Fedora 4 mean for UMD Libraries?
To end users, this upgrade is essentially invisible. Fedora 4’s release is an architectural improvement: it is the new foundation on which we can build first-class digital collections and efficient workflows for asset management and preservation. Implementing Fedora 4 gives us:
  • Flexible, standardized data modeling. We will be able to handle a wider array of simple and complex content types, as well as a greater range of file formats.
  • Scalability. We’re not far away from thinking of our digital assets in terms of petabytes of data; Fedora 4 will enable us to manage those assets responsibly.
  • The potential for increased automation. Fedora 4’s application “hooks” and workflow triggers give us the ability to develop new automation scripts and integrations.
  • New technology options to eventually improve the experience of both internal library users and repository visitors. Two exciting next steps with Fedora 4 include selecting and evaluating a new administrative interface for staff, and implementing a new image viewer for newspaper content (based on IIIF, a framework for speedy, flexible image delivery backed by a number of high-profile libraries).
  • Increased participation in a robust, open community of institutions using Fedora. Rather than creating our own special, customized installation (which would become difficult to maintain over time), our team contributed code enhancements and feedback to the Fedora project, taking an active role in shaping the software platform.
What’s next?
 
DSS is already working on a few key repository projects:
  • Preparing the repository for the Diamondback ingest. This will be the first substantial collection loaded into Fedora 4, and to prepare for it, we are working on methods for batch loading, as well as implementing a new and improved image viewer.
  • Selecting, testing, and implementing an administrative interface for staff. We are researching our options for a new staff interface for Fedora 4 items, and will have more to share in the fall.
  • Planning for scaling our storage to meet our needs. A small task force will evaluate options and costs for high-capacity storage this fall, and we should begin implementing recommended improvements in 2017.
  • Organizing the backlog of materials for ingest, planning for digital preservation, researching public user interfaces, and migrating assets from our old Fedora repository. This work will continue throughout 2016 and into 2017.
Finally, please join me in thanking the many team members from DSS involved in this release. In particular, Josh Westgard, Mohamad Abdul Rasheed, Peter Eichman, and Ben Wallberg spent untold hours sweating the details, squashing bugs, questioning assumptions, and drawing on whiteboards to get UMD Libraries to this point, and they all deserve a hearty congrats.

Fedora 4 Digital Repository Implementation Project

I would like to take this opportunity to formally announce the launch of the Fedora 4 Digital Repository project, which aims to implement a new system to replace our existing 7-year-old Fedora 2.2.2-based Digital Collections. Fedora has come a long way in the last several years and we are very excited about the possibilities offered by the newest version. Because the differences between our older version and the latest are so extensive, this is a more complicated project than a simple upgrade.

An initial project planning group, consisting of Ben Wallberg, Peter Eichman, Bria Parker, and me, has outlined our primary objectives for the project:

  • Leverage repository improvements provided by the Fedora 4 application
  • Migrate selected existing services and applications
  • Develop new features

You may read more about Fedora 4 as an application here: http://duraspace.org/node/2394.  Our complete objectives document is also available for reading: Fedora 4 Objectives.

It is important to note that we are hoping that this new repository will reduce some silos in our portfolio, and be more than just a place to house metadata and access copies of select digital assets.   We are moving forward with an awareness of the importance of a system to not just house, but manage, our digital assets, and to allow for more flexibility over who, what, when, where, and how our staff and our users can work with our content.

At a practical level, some of the changes/improvements we hope to make include:

  • Replacement of the existing Administrative Tools interface with a community-developed and maintained application, such as Islandora
  • Batch ingest mechanisms that can be user-operated and integrated with the Administrative Tools
  • Replacement of current homegrown metadata schemas with standard schemas, such as MODS and PREMIS
  • More advanced content model, allowing description and control of objects down to the node level, rather than at the descriptive record level
  • Enhanced user-generated reporting
  • Flexible authentication and authorization controls

This is a major project, one that will take approximately a year, although we have yet to set firm milestones or deadlines. In the meantime, we are ceasing any major development on the existing Fedora repository, with the exception of crucial maintenance issues. We have noted and categorized the outstanding metadata sweeps and will handle those during the migration process. We appreciate your patience as we work on the new system, which will be a most welcome improvement.

Stew of the Month: August 2014

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

New Technologies

This past summer, User Services and Systems (USS) initiated a project with the Public Services Division to convert the former Reference desk space in the front of McKeldin Library into a “Laptop Bar” to provide seating and power for students using their personal laptops in the library.  USS acquired power surge protectors in the shape of pyramids to be placed on the tables for student use. PSD acquired bar-style chairs for the area. The Laptop Bar was completed by the beginning of the Fall 2014 semester and has been a major success. Students started using the space immediately. Below are before and after photos:

Before:

[Three photos of the former Reference desk space]

After:

[Four photos of the completed Laptop Bar]

Collection-building

Statistics

In the process of gathering our ARL statistics for FY2014, we can note the following increases in our Digital Collections and DRUM holdings since June 30, 2013 (2013 numbers in brackets):

  • Images/Manuscript records in Digital Collections: 17,376  [13,990]
  • Film Titles in Digital Collections: 2,673 [2,232]
  • Audio Titles in Digital Collections: 356 [200]
  • Internet Archive titles: 4,382 [3,906]
  • Prange Digital Children’s Book Collection: 7,936 [4,450]
  • DRUM (e-theses and dissertations): 9,511
  • DRUM (technical reports & other): 5,581
  • DRUM TOTAL: 15,092

Those numbers are the result of hard work from staff throughout DSS, as well as content selectors and creators from throughout the Libraries.

ArchivesSpace

ArchivesSpace is the open source archives information management application for managing and providing web access to archives, manuscripts and digital objects. The UMD Libraries has been running a sandbox version of ArchivesSpace for use by Special Collections and University Archives for many months. In August, DSS completed a Service Level Agreement for the production version of ArchivesSpace, and Paul Hammer (SSDR) converted the existing sandbox server to a production instance.

Prange Digital Children’s Book Collection

We are proud to announce that all of the Prange Digital Children’s Books (8,082 of them) have been loaded into our Fedora Digital Collections repository. However, as is often the case, the final cleanup takes the longest. Paul Hammer (SSDR) and Jennie Levine Knies (DPI) worked together with Amy Wasserstrom and Kana Jenkins in the Prange Collection to troubleshoot the final 200 books that had load issues. Graduate Assistant Alice Prael (DPI) also assisted in cleaning up duplicates and comparing data lists in order to help identify the problem records.

Aeon

On August 1, Special Collections and University Archives officially began using a hosted version of Atlas Systems’ Aeon software. Aeon is automated request and workflow management software specifically designed for special collections, libraries and archives. Jennie Knies and Paul Hammer worked with Special Collections staff to implement request buttons in both ArchivesUM and Digital Collections that pass metadata to Aeon forms, automating the patron request process.

Digitization Activities

Robin Pike worked with vendors and collection managers to solidify digitization contracts for materials that will be sent to digitization vendors during FY15. The formats represented in the digitization projects include books, serials, pamphlets, photographs, microfilm, open reel audio tape, wire recordings, VHS tape, and 16mm film. The collection areas represented in the projects include Special Collections and University Archives (labor collections, university archives, mass media and culture, rare books, Prange collection materials), Special Collections in Performing Arts, Library Media Services, and Hebrew language materials from the general collection.

Digitization assistants completed projects for the campus community. Audrey digitized Athletics media guide covers that will be used to produce posters, which will be gifts for an upcoming alumni event. Several assistants digitized photos of Terrapin football players, which will be used in the new Terrapins in the Pros interactive exhibit at the Gossett Team House.

Abby digitized Mid-Atlantic Regional Archives Conference programs. Additional MARAC publications will be digitized this year, both in-house and through the Internet Archive, making this regional resource more available to archivists everywhere.

Software Development

Working with the Web Advisory Committee, Shian Chang and Cindy Zhao completed a refresh of the Libraries’ Website interface. The update includes three changes: the addition of the new UMD responsive wrapper, as required by a new campus brand integrity program (see http://brand.umd.edu/websitepresentation.cfm); a change of the main menus seen on every page to a new “mega menu” dropdown style, enabling users to view more options with integrated explanatory text; and a new social media image bar at the bottom of the homepage. This refresh is part of a general plan for constant, iterative improvements to the website and a specific plan to ultimately convert the entire site to a responsive design.

SSDR has been planning to add Solr client capabilities to Hippo CMS for some time, but discovered recently that Hippo CMS 7.8 comes with a Solr Integration feature out-of-the-box, supporting both index/search for internal Hippo documents and search for external documents. Mohamed Abdul Rasheed reviewed the functionality and determined that the external search feature is capable of handling our needs. He started work migrating our existing Digital Collections interfaces (Digital Collections, Jim Henson Works, World’s Fair) to the new Solr-based search, as well as adding new database searches for the Special Collections in Performing Arts (SCPA) scores and recordings databases. The databases will continue to be maintained by SCPA staff in FileMaker Pro but will be exported to CSV, imported into Solr, and exposed through the Libraries’ Website for search and discovery.
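The CSV-to-Solr step described above can be pictured with a minimal Python sketch. This is not SSDR's actual import code: the function name, the sample data, and the field names (id, title, composer) are hypothetical, chosen only for illustration. It turns rows of a CSV export into a JSON array of documents, which is one of the formats Solr's update handler accepts.

```python
import csv
import io
import json


def csv_to_solr_docs(csv_text, id_field):
    """Convert CSV text into a list of dicts shaped like Solr documents."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only named columns with non-blank string values, so that
        # empty cells in the export do not become empty fields in the index.
        doc = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key and isinstance(value, str) and value.strip()
        }
        # Every document needs a unique key; skip rows that lack one.
        if id_field not in doc:
            continue
        docs.append(doc)
    return docs


# Hypothetical sample export; real column names would come from the database.
sample = (
    "id,title,composer\n"
    "1,Symphony No. 5,Beethoven\n"
    "2,Untitled Score,\n"
    ",Missing ID,Anon\n"
)

docs = csv_to_solr_docs(sample, "id")
payload = json.dumps(docs, indent=2)
print(payload)
```

In a real pipeline, a payload like this would be sent to the update handler of the relevant Solr core, and the field names would have to match that core's schema.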

Services

USMAI (University System of Maryland and Affiliated Institutions Consortium)

Kuali OLE (Open Library Environment) implementation: Consortial Library Applications Support (CLAS) team members have been participating in weekly teleconferences with University of Pennsylvania staff who are working on UPenn’s OLE implementation. Both groups are discovering that key implementation documentation necessary for bringing up a test instance is missing. At present, we have OLE software installed on a local server, but it is populated with demo data. We have not yet been able to load our own data for testing. We are hopeful that forthcoming teleconferences will provide the information and guidance we need to proceed.

USMAI Advisory Groups: As interim Chair of the Digital Services Advisory Group, Mark Hemhauser completed a first meeting with the Reporting and Analytics Subgroup and the Metadata Subgroup, where he shared the information from CLD about Advisory Group funds and reporting plans. Mark also shared information on membership terms and the group chairs with the USMAI Executive Director. The CLAS team also compiled a list of current email lists and reflectors supporting USMAI communications and sent it to the Executive Director. Linda Seguin revised the Groups page on the USMAI staff web site, added new group pages, and created and distributed editing logins to each advisory group/subgroup.

SFX support: Linda revised SFX parsers to get both Romanized and vernacular text in Aeon request form for College Park’s Prange collection. Linda revised the Aleph Source Parser to get publication information from the new(ish) MARC 264 field for use in SFX linking. Linda and Ingrid Alie added the HathiTrust local target to Salisbury University’s and the UM Health Sciences and Human Services library’s SFX instances.

Circulation support for USMAI: David Wilt set up new Item Statuses in Aleph for the University of Baltimore and College Park; produced ad hoc reports for Frostburg, Bowie, Towson, University of Baltimore, College Park, Saint Mary’s, and UMBC; and completed a patron load for Eastern Shore. David also worked on setting up the booking function in Aleph for Shady Grove.

Acquisitions/serials support for USMAI: Mark exported data from the USMAI licensing database for College Park’s licensing evaluation project; produced a variety of subscription reports for College Park as part of a database clean-up project; produced a special claims report for Morgan State; and helped staff at the University of Baltimore identify a problem with dirty order data after fiscal rollover and provided training on order closing procedures and order clean-up. Mark also flipped the budget code to make corrections on 75 orders, saving UB staff a lot of manual effort.

Aleph database support for USMAI: Linda and Hans Breitenlohner ran a new extract of College Park holdings for their participation in HathiTrust. Linda sent a sample file of book records to RapidILL for UMBC. Linda also deleted withdrawn/purged items for UMBC, College Park and Health Sciences, and with assistance from Heidi Hanson, loaded bibliographic record sets for UMBC, the Center for Environmental Science, and Health Sciences.

Aleph system support: The CLAS team and DSS staff are monitoring a recent pattern of Aleph slowdowns that have been occurring this month. We are currently restarting the Aleph server manually when slowness is reported.

Staffing

Peter Eichman joined DSS as a Contingent-I Systems Analyst in SSDR, providing broad software development support for UMD and Consortial applications. Peter is a UMD alumnus (B.A.s in Linguistics and Philosophy), and has also worked for the ARHU Computing Services office and the National Foreign Language Center as a web application developer.   Peter started on August 19 and is currently working on improvements to Aleph Rx, the DSS issue tracking tool for Aleph.

On August 22, Josh Westgard, graduate assistant in DPI, graduated from the iSchool’s MLS program in Curation and Management of Digital Assets.

Ann Levin, the DSS Project Manager, left the UMD Libraries in August. Ann made a significant impact during her time with DSS, developing documentation procedures and working on several projects, most notably the Prange Digital Children’s Book Collection.

Amrita Kaur joined the DSS staff as the Coordinator. Amrita has worked for the University Libraries for many years, and was most recently in the Architecture Library. Welcome, Amrita!

Events

The Historic Maryland Newspapers Project hosted UMD Libraries’ first public Wikipedia edit-a-thon on August 18. Twenty-four people attended, either in person or virtually through an Adobe Connect meeting (recording available here: https://webmeeting.umd.edu/p37wtrvy3iw/). We invited speakers from Wikimedia DC and the Library of Congress, as well as our own Doug McElrath, Jennie Knies, and Donald Taylor, to share information about resources to be used during the editing portion of the event. Participants enhanced and added articles related to Maryland newspapers and Wikimedia DC’s Summer of Monuments project and uploaded digitized images from our National Trust Library Historic Postcards Collection to WikiCommons.

Conferences and Workshops

Trevor Muñoz, Karl Nilsen, Ben Wallberg, and Joshua Westgard attended the Code4Lib DC 2014 conference at George Washington University on August 11-12. Josh Westgard led a session on spreadsheets. He had suggested the topic at the start of the unconference planning, so the unconference protocol was for him to moderate the discussion. The participants in the session talked about strategies and tools for managing data stored in spreadsheets, or data that must pass through a spreadsheet while migrating from one storage location to another. One highlight of the discussion was the description of csvkit (https://csvkit.readthedocs.org), a Python-based suite of command-line tools for the cleanup and manipulation of data stored in CSV files. Later in the conference, a breakout group split off to begin learning csvkit.

Josh Westgard attended a one-day workshop on “Building Data Apps with Python” offered by District Data Labs (http://www.districtdatalabs.com).  The workshop covered application set up, best practices for application design and development, and the basics of building a matrix factorization application.

Jennie Knies, Liz Caringola, Robin Pike and Eric Cartier attended the Society of American Archivists annual conference in Washington, DC on August 11-16. Robin currently serves as the chair and Eric serves on the steering committee of the Recorded Sound Roundtable. Robin chaired and presented on the panel session Audiovisual Alacrity: Managing Timely Access to Audiovisual Collections. Eric contributed audiovisual clips from UMD’s collections for the first AV Archives Night, a networking event featuring content from attendees’ repositories, hosted by Audiovisual Preservation Solutions at the Black Cat. Liz Caringola was a panel speaker for the session “Taken for ‘Grant’ed: How Term Positions Affect New Professionals and the Repositories That Employ Them.” Karl Nilsen gave a talk on database curation and preservation as a part of a panel on stewarding complex objects. Download the slides from DRUM: http://hdl.handle.net/1903/15573. His talk was based on Research Data Services’ efforts to curate and preserve the Extragalactic Distance Database, an online data collection that was created by astronomers at UMD and other institutions.

Liz Caringola attended one of the weeklong Humanities Intensive Learning and Teaching (HILT) workshops offered by MITH, “Crowdsourcing Cultural Heritage.” Karl Nilsen completed the HILT digital forensics course.

Why Hippo CMS?

In the Spring of 2011, Software Systems Development and Research (SSDR) and the Web Advisory Committee (WAC) were in the process of implementing a new Content Management System (CMS) to run the Libraries’ Website. Babak Hamidzadeh, newly-hired Associate Dean for Digital Systems and Stewardship (DSS), and I discussed the difficulty of supporting our then-diverse technology stack and possible alternatives.  At the same time, WAC was expressing dissatisfaction with the CMS in progress, a proprietary ASP.NET based application, which had been chosen somewhat unilaterally.  So we decided along with WAC to scrap that CMS and begin a joint selection and review process for a new CMS.

The selection process began in June with the creation of a requirements matrix containing author, editor, and technical requirements and desiderata.  After a process of seeding the matrix with CMS candidates we began with a documentation review, followed by installation and  testing of a subset of candidates, a gradual narrowing of the field, and finally selection of the chosen CMS, Hippo CMS, in October 2011.

We maintain several Drupal-based sites, and there is wide Drupal adoption in the library community. Drupal has features which would have made it a good choice, but we did not select it. Here are some of the reasons why we selected Hippo CMS instead.

Commercial Open Source: Hippo CMS is provided by Hippo B.V. as Commercial Open Source.  They maintain the code under a dual license: the community edition is released under the Apache 2.0 license which is great because we can trial and then run in production the fully functional application without committing any financial resources.  On the other hand an enterprise edition is available which provides add-on functionality and additional support services when we need them.  We originally brought the application into production using the community edition.  Once we established that Hippo CMS was working well for us, both for the developers and users, and that it had become a mission critical application we decided to pay for an enterprise edition support contract to help grow our Hippo based services into the future.

Java Enterprise: In thinking about a standard technology stack, we realized that we already support Java Enterprise-based software like Fedora Commons and DSpace, and in the future Kuali OLE. Since we needed to build a development team with Java experience anyway, we strongly desired to leverage our investment in Java training and hiring. Hippo is implemented as standard Java Web Applications utilizing Apache Wicket for the CMS and the Spring Framework for the Hippo Site Toolkit (HST). Apache Maven is utilized for dependency management, building, and running the local server environment. This means we can leverage our existing development and support environment with Java, Eclipse, Tomcat, etc.

Folder Based Hierarchy: Hippo CMS content is stored as documents within a folder hierarchy. Author and editor permissions are configurable and can apply to a folder and all of its child folders and content. This was a firm user requirement for the CMS and the primary factor which knocked Drupal out of contention. We have some experience implementing this model in our staff intranet using the Node Hierarchy Drupal module. However, it has been kludgy and difficult to support, so we decided that a CMS which natively supported this model would be best. The folder model for content maintenance empowers our users (and reduces their stress) by providing an environment familiar to them from their desktop experience.
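To make the idea of folder-level permissions concrete, here is a small Python sketch of permissions that apply to a folder and everything beneath it. It illustrates the concept only; it is not Hippo's API or security configuration, and the class, folder names, users, and roles are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Folder:
    name: str
    parent: Optional["Folder"] = None
    # Roles granted directly on this folder, keyed by user name.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, user: str, role: str) -> None:
        """Grant a role on this folder (and, implicitly, on its descendants)."""
        self.grants.setdefault(user, set()).add(role)

    def roles_for(self, user: str) -> Set[str]:
        """Collect roles granted on this folder or on any ancestor folder."""
        roles: Set[str] = set()
        node: Optional["Folder"] = self
        while node is not None:
            roles |= node.grants.get(user, set())
            node = node.parent
        return roles


# Hypothetical folder tree: content / special-collections / prange
root = Folder("content")
special = Folder("special-collections", parent=root)
prange = Folder("prange", parent=special)

special.grant("alice", "editor")  # applies to special-collections and below
root.grant("bob", "author")       # applies to the whole tree

print(sorted(prange.roles_for("alice")))
print(sorted(root.roles_for("alice")))
```

A grant made once on a parent folder reaches every child without per-document configuration, which is the behavior our users asked for.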

Multi-site, Multi-channel, Multi-lingual: Hippo CMS provides a highly configurable and scalable architecture. All content is stored as documents (not pages), and the HST mapping system allows that content to be routed and transformed to various domains, websites, publication modes (published vs. preview), and languages. We use this mechanism to deliver the same content to standard desktop websites and mobile-specific websites. The out-of-the-box support for multi-lingual distribution was important because our Gordon W. Prange Collection website had previously existed in a custom JSP-based webapp which was difficult to support and difficult for users to maintain. Now they can easily maintain the English and Japanese versions of the site.

Development Lifecycle: Our preferred development platform is the developer’s workstation, standardized for us on Mac OS X; from there we promote code to Test, Assurance, and Production servers running RHEL with Apache Tomcat and Apache HTTP Server. The Java Enterprise features mentioned above make this an easy fit for us. In addition, Hippo provides a nice content bootstrap mechanism whereby repository content (documents, assets, and configuration) can be serialized to XML files. These bootstrap files are used both to rebuild a repository from scratch, done frequently in development, and to update an existing repository, as when changes are promoted to server environments. This also means that content and configuration changes can be included under version control instead of being trapped in the database and requiring manual recreation in each environment through an administrative interface.
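The general idea behind bootstrap files (serialize repository content to XML, keep it under version control, rebuild from it) can be sketched in a few lines of Python. This is a simplified illustration, not Hippo's actual bootstrap format or tooling; the element names, paths, and properties are hypothetical.

```python
import xml.etree.ElementTree as ET


def serialize(documents):
    """Serialize {path: {property: value}} to an XML string.

    Paths and property names are sorted so that repeated exports of the
    same content produce identical text, which keeps diffs small.
    """
    root = ET.Element("bootstrap")
    for path in sorted(documents):
        node = ET.SubElement(root, "document", {"path": path})
        for name in sorted(documents[path]):
            prop = ET.SubElement(node, "property", {"name": name})
            prop.text = documents[path][name]
    return ET.tostring(root, encoding="unicode")


def load(xml_text):
    """Rebuild the {path: {property: value}} mapping from the XML string."""
    documents = {}
    for node in ET.fromstring(xml_text).findall("document"):
        documents[node.get("path")] = {
            prop.get("name"): prop.text or ""
            for prop in node.findall("property")
        }
    return documents


# Hypothetical repository content.
repository = {
    "/content/hours": {"title": "Library Hours", "state": "published"},
    "/content/news": {"title": "Library News", "state": "preview"},
}

xml_text = serialize(repository)
rebuilt = load(xml_text)
print(xml_text)
```

Because the serialized form is plain text, it can be committed alongside code and replayed in each environment rather than re-entered by hand.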

Of course, any technology has trade-offs and areas which are not as good a fit. First, Hippo CMS is not a trivial application and requires a lot of overhead to learn. Unlike Drupal, Hippo implementation requires a software developer and cannot be accomplished by less technical staff. Second, Hippo CMS is not as widely known and used as other CMSs, so we have little chance of hiring a developer with previous Hippo experience; everyone we hire will require the full course of training and lead time before they can begin contributing significantly. In the same vein, Hippo utilizes the Apache Jackrabbit JCR implementation for its repository, rather than a standard relational database, which means we have a much smaller ecosystem of tools and support available and therefore increased overhead in learning and using the repository.

Despite these drawbacks we did select Hippo CMS.  Work on the new website began immediately in October 2011 and resulted in a soft release of the new home page, Library News blog, and Library Hours in May 2012.  From May until November  we ran the old and new sites in parallel, migrating content, and ending in the complete shutdown of the old site.  Since then we have realized the promise of Hippo CMS by providing a platform for diverse library staff to comfortably create and maintain content and by bringing online new features, such as:

Look for additional features coming in 2014.