Stew of the month: April 2015

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

Digitization Activities

We have received files for the 159 1/4″ open reel audiotape recordings from the WAMU Archives that were digitized by a vendor as part of the DIC digitization project proposal process; quality assurance will be completed over the next month and the recordings will be uploaded to UMD Digital Collections.

More than 300 recordings from Arthur Godfrey’s 1949-1950 radio shows are now available online in UMD Digital Collections (restricted to campus or VPN log-in from off campus due to copyright restrictions). These recordings are a small part of the Arthur Godfrey collection held by Special Collections, Mass Media and Culture. In 2012, Robin Pike worked with Chuck Howell, Carla Montori, and other support staff to prepare the wire recordings for digitization by a vendor; Joanne Archer, her GAs, and Bria Parker enhanced the minimal metadata after all the files were received in 2014; and Eric Cartier and Josh Westgard recently completed the ingest. The same vendor is currently digitizing 40 additional recordings as part of the FY15 DIC digitization project proposal process.

Digitization assistants digitized and provided images for the student posters presented in the Hornbake Library lobby on Maryland Day, as part of an assignment about the history of campus for the “MAC to Millennium: History of the University of Maryland” class taught by Anne Turkos and Jason Speck.

Historic Maryland Newspapers Project

The Historic Maryland Newspapers Project sent its first production batch to the digitization vendor earlier this month. The Catoctin Clarion, first published in 1871 in Mechanicstown (modern-day Thurmont), Maryland, is the first title to be digitized during this grant cycle. We will digitize the run ending in 1922.

Several representatives from DSS and Doug McElrath from Special Collections met with staff at the Maryland State Archives on April 6 to discuss the future of the Historic Maryland Newspapers Project and to begin making plans for digitizing content outside of the current National Digital Newspaper Program (NDNP) grant.

Digital Programs and Initiatives

Software Development

The project to update the Libraries’ Website to a Responsive Web Design based interface is progressing through the initial design phase.  We have completed wireframing and are now creating static HTML mockups using the Unify template.  These mockups are used as prototypes to select and refine the features and layout of the new site, in close coordination with the Web Advisory Committee.  You can follow our progress on the Website RWD Mockups page hosted in GitHub.

The first of two sprints to refactor the Exhibit website is complete.  We are converting the Beyond the Battle: Bladensburg Rediscovered special collections exhibit into a generic Exhibit template that can be used to create multiple websites.  New, hosted websites for the Library of American Broadcasting Foundation and the Roshan Initiative in Persian Digital Humanities project are scheduled for release on June 1 using the new template.

After reviewing the technical limitations of Wufoo that we encountered in the online student application project, we have decided that a Wufoo workaround would be too costly to create and maintain, so we will implement the form in Drupal.  The disadvantages of this implementation are the increased developer time necessary to create the form and the inability of Human Resources staff to update the form at will.  This technical problem has put the project behind schedule, so to make up time we will pull additional developers off of the Fedora 4 implementation.  Release of the production Fedora 4 instance will be delayed until June.  We did, however, fulfill our commitment to participate in community development of the new Fedora 4 Audit Service core feature.

User and System Support

Victoria Quartey with 3D printer

User & Systems Support (USS) staff volunteered on Maryland Day 2015, showing Library visitors the “Maker” services that are available in the Libraries. In the lobby of McKeldin Library, USS demonstrated 3D printing and 3D scanning. Visitors entering McKeldin were greeted by the sight of miniature Testudos printing on a 3D printer.  Many visitors were amazed and wanted to learn more about 3D printing. The printed Testudos were handed out to visitors, which brought huge smiles to both parents and children. Many students were amazed that 3D printing is available in the Libraries. While some students started thinking about what they could send to be printed, others were eager to learn how to get certified to use the Library 3D printers on their own.

3D scanning demo with Preston Tobery.

The 3D scanning demo in the lobby was also very popular with the visitors of the library. Using a laptop and Xbox Kinect camera, approximately 80 visitors had 3D scans taken of them. Visitors were able to watch how the 3D scans were made, in real-time, on one of the lobby’s TV screens. Each visitor that was 3D scanned will receive a copy of their 3D scanned file through email. Another TV screen in the lobby featured a short video on the process of creating a 3D printed replica of the Jim Henson & Kermit statue that’s outside Stamp Student Union. A huge 3D printed model of the statue was displayed for all visitors to see.

USS staff were also present in the John & Stella Graves Makerspace, on the 2nd floor of McKeldin Library, which was open from 10am to 4pm during Maryland Day. Approximately 90 visitors stopped by that day. Many visitors were undergraduates in varying majors, such as special education, mechanical engineering, digital media and computer science. The diverse crowd of students and other visitors reinforced the idea that the Libraries offer an inclusive environment. The visitors were also interested in the other technologies and services the Libraries offer to students and the local community. There were discussions about the vinyl cutter, desktop 3D scanner, 3D printer and Oculus Rift in the Makerspace. Other services, such as the TLC Loaner Program, were also discussed to let visitors know that students can rent laptops, camcorders, iPads, and other equipment for whatever purpose they need. Even though the Google Glass wasn’t included in the planned showcase, many visitors were still interested in it and wanted to try the device on. Many were amazed by the opportunity. Since the 3D printer in the Makerspace was printing miniature Testudos, some visitors were treated to a small training session and demo on the 3D printer. USS staff briefly showed them how to unload and reload the plastic filament, how to use the MakerBot desktop application, and how 3D prints are removed from the build plate.

The USS volunteers said that they enjoyed showing these Library services on Maryland Day, and the Library visitors seemed to enjoy it as well. Throughout the day, visitors and alumni expressed surprise not only that the Libraries have these 3D Maker services, but also that the services are currently available to all students. One visitor who works at a library in California was surprised to see how advanced our Library is. And one alum even said, “I wish I would have stayed in school longer.”

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

MD-SOAR

The setup of the Maryland Shared Open Access Repository (MD-SOAR) continues to progress. DSS created a DSpace “sandbox” for the MD-SOAR institutional partners to begin getting familiar with the DSpace repository solution. Initial configuration of the production instance of DSpace was completed after two weeks of work by software developers. The completed repository is expected to be ready for individual institutions by June 15th for their own local “launches” of the repository. This is a 2-year pilot project that will provide a repository solution to many of the libraries within the consortium plus a few other Maryland academic libraries. It is very encouraging to see so many libraries working together in support of providing access to their collections in an open environment and for DSS to be able to support this kind of initiative based on our technical expertise and DSpace experience.

Support to USMAI

The CLAS team responded to 88 Aleph Rx submissions and 27 e-resource requests from across the consortium’s libraries in April. Among the service requests were continued work on setting up shelf-ready orders from YBP and work with Morgan State to configure EBSCO Discovery Service for use on its campus.

Kuali OLE

Six members of USMAI libraries were nominated, and graciously agreed, to help with testing and evaluating Kuali OLE. The testers met with CLAS on April 23rd to begin discussing OLE and creating a plan for evaluating it on behalf of the consortium. The group will prepare a report to be presented to, and discussed with, the Council of Library Directors at their September meeting. Thank you to Audrey Schadt, Austin Smith, Betty Landesman, Conrad Helms, Virginia Williams, and Vicki Sipe for their participation.

CLAS continues to monitor and contribute to the progress of version 1.6 of OLE. Seven OLE tickets are currently assigned to members of the team. Once the new version is released, the team will install and configure it for local testing. Team members continue to attend weekly implementation meetings with other OLE partners as those institutions move closer to implementing OLE. There is a lot to be learned from these shared experiences!

Conferences, workshops and professional development

ETDG news: Eric Cartier will be rotating off as co-chair in May, having completed his one-year term. Eric and Liz solicited self-nominations for the next co-chair and will announce the new co-chair at the May 20 ETDG meeting.

Edible Book Fair: Hornbake Plaza

Eric helped to organize the 3rd Annual Maryland Edible Book Festival on April Fools’ Day. The popular event took place in front of Hornbake Library. DCMR staff contributed the following edible books: Things Fall Apart, Beer and Loafing in Las Vegas, and The Pound and the Curry.

Heidi Hanson attended a program sponsored by USMAI’s User Experience subgroup. The program featured EBSCO’s VP of User Experience Kate Lawrence. She discussed UX tips and observations based on EBSCO’s ethnographic research on college students. Among the observations was “results are the new black”. Ask Heidi for details…

David Dahl presented at a Google Analytics program sponsored by the USMAI Reporting & Analytics subgroup. The program was well-received and also included a good discussion and lightning presentations from several others in the consortium. There is a lot of interest in making better use of web analytics amongst the consortium’s libraries.

Fedora 4 Digital Repository Implementation Project

I would like to take this opportunity to formally announce the launch of the Fedora 4 Digital Repository project, which aims to implement a new system to replace our existing 7-year-old Fedora 2.2.2-based Digital Collections.  Fedora has come a long way in the last several years and we are very excited about the possibilities offered by the newest version.  Because the differences between our older version and the latest are so extensive, this is a more complicated project than a simple upgrade.

An initial project planning group consisting of Ben Wallberg, Peter Eichman, Bria Parker, and me has outlined our primary objectives for the project:

  • Leverage repository improvements provided by Fedora 4 application
  • Migrate selected existing services and applications
  • Develop new features

You may read more about Fedora 4 as an application here: http://duraspace.org/node/2394.  Our complete objectives document is also available for reading: Fedora 4 Objectives.

It is important to note that we are hoping that this new repository will reduce some silos in our portfolio, and be more than just a place to house metadata and access copies of select digital assets.   We are moving forward with an awareness of the importance of a system to not just house, but manage, our digital assets, and to allow for more flexibility over who, what, when, where, and how our staff and our users can work with our content.

At a practical level, some of the changes/improvements we hope to make include:

  • Replacement of existing Administrative Tools interface with a community-developed and maintained application, such as Islandora.
  • Batch ingest mechanisms that can be user-operated and integrated with the Administrative Tools
  • Replacement of current homegrown metadata schemas with standard schemas, such as MODS and PREMIS
  • More advanced content model, allowing description and control of objects down to the node level, rather than at the descriptive record level
  • Enhanced user-generated reporting
  • Flexible authentication and authorization controls

This is a major project, one that will take approximately a year, although we have yet to set firm milestones or deadlines. In the meantime, we are ceasing any major development on the existing Fedora repository, with the exception of crucial maintenance issues. We have noted and categorized existing outstanding metadata sweeps and will handle those during the migration process.  We appreciate your patience as we work on the new system, which will be a most welcome improvement.

Knight News Challenge: Libraries. Our application…

The Knight Foundation recently issued a news challenge: How might we leverage libraries as a platform to build more knowledgeable communities? Here at the University of Maryland Libraries, we felt that we had an idea.

Improving Discovery in Digital Newspapers through Crowdsourcing the Development of Semantic Models

“We will develop tools that enable users of digitized newspapers to intuitively create connections between the concepts, people, places, things, and ideas written about in the newspaper pages, which will facilitate further discovery and analysis by researchers at all levels.”
The process of working on this application was fun and inspiring.  Our Associate Dean for Digital Systems and Stewardship, Babak Hamidzadeh, had the original vision. He enlisted me (Jennie Knies) and Liz Caringola, our Maryland Historic Newspapers librarian, to help flesh out some of the ideas.  The UMD Libraries’ Communications director, Eric Bartheld, and our Director of Development, Heather Foss, also contributed. Ed Summers (MITH) and Dr. Ira Chinoy (Journalism) provided excellent feedback and encouragement. Rebecca Wilson, the UMD Libraries’ graphic designer, created this compelling graphic under a very tight deadline.
 KnightProposalImage
The application itself had very strict word/character requirements, which was a fascinating challenge in itself.  750 characters (that includes spaces!) to communicate the entire idea?
We think that we are uniquely positioned to develop these types of tools: we have the enthusiasm, the content (thanks to the Maryland Historic Newspapers project and to Chronicling America), and the resources and expertise to make this a reality.  Fingers crossed that we get a lot of “applause!” There are a lot of amazing proposals for the Knight Foundation to choose from, but I hope ours will be among those selected.

UMD Libraries Join BitCurator Consortium as Charter Member

The University of Maryland Libraries are in the midst of working on policies, procedures, and workflows for managing born-digital content.  3 1/2″ and 5 1/4″ floppy disks, along with Zip disks, CD-ROMs, and DVDs, already live within the archival and manuscript collections in Special Collections and University Archives.  The challenges involved in preserving these media and the content stored on them are numerous.  Often, the equipment or software necessary to use older disks is obsolete or unavailable.  The disks themselves may become damaged due to misuse or, simply, time. Law enforcement agencies that need to read hard drives and other media for forensic research have been at the forefront of developing hardware, software and other tools to work with older media.  Funded by the Andrew W. Mellon Foundation, BitCurator is a tool designed specifically for libraries and archives.  It is a fully contained system with easy-to-use interfaces for some of the standard activities necessary for copying, reading, and curating digital media. For the University of Maryland Libraries, the existence of BitCurator has saved us from having to reinvent the wheel when it comes to beginning our born-digital activities.  Our main installation lives in Hornbake Library, on our Forensic Recovery of Evidence Device (FRED).  This fall, two graduate assistants, Amy Wickner (Special Collections and University Archives) and Alice Prael (Digital Programs and Initiatives), will pick up where the UMD Libraries’ Born-Digital Working Group left off earlier this year to finalize some of our basic born-digital workflows.

The BitCurator Consortium operates as an affiliated community of the Educopia Institute, a non-profit organization that advances cultural, scientific, and scholarly institutions by catalyzing networks and collaborative communities to facilitate collective impact. The University of Maryland Libraries have signed on as a charter member and are delighted to be involved in this endeavor.

“Managing born-digital acquisitions is becoming a top concern in research libraries, archives, and museums worldwide,” shares co-founder Dr. Christopher (Cal) Lee. “The BCC now provides a crucial hub where curators can learn from each other, share challenges and successes, and together define and advance technical and administrative workflows for born-digital content.” Co-founder Dr. Matthew Kirschenbaum adds: “Tools without actively invested communities wither on the vine, become dead bits. The BCC is not just an extension of BitCurator, in a very real sense it will now become BitCurator.”

Institutions responsible for the curation of born-digital materials are invited to become members of the BCC. New members will join an active, growing community of practice and gain entry into an international conversation around this emerging set of practices. Other member benefits include:

•    Voting rights
•    Eligibility to serve on the BCC Executive Council and Committees
•    Professional development and training opportunities
•    Subscription to a dedicated BCC member mailing list
•    Special registration rates for BCC events


Cool Tools: High Performance Sound Technologies for Access and Scholarship (HiPSTAS!)

I was delighted and intrigued to read an article in the March 26, 2014 web edition of the Chronicle of Higher Education: Scholars Collaborate to Make Sound Recordings More Accessible.  It described a project spearheaded by Tanya Clement, former University of Maryland employee, creator of In Transition: Selected Poems by the Baroness Elsa von Freytag-Loringhoven, and now assistant professor at the University of Texas at Austin.

I am always on the lookout for “cool tools” that we may consider using some day for our own work, and there are a lot out there. The HiPSTAS Research and Development with Repositories (HRDR) project is funded by an NEH Institute for Advanced Topics in the Digital Humanities grant to develop and evaluate a computational system for librarians and archivists for discovering and cataloging sound collections.  From the HiPSTAS blog:

The HRDR project will include three primary products: (1) a release of ARLO (Automated Recognition with Layered Optimization) that leverages machine learning and visualizations to augment the creation of descriptive metadata for use with a variety of repositories (such as a MySQL database, Fedora, or CONTENTdm); (2) a Drupal ARLO module for Mukurtu, an open source content management system, specifically designed for use by indigenous communities worldwide; (3) a white paper that details best practices for automatically generating descriptive metadata for spoken word digital audio collections in the humanities.

 

I, for one, am looking forward to the output of this project, and to the prospect of a faster way to increase access to our fragile sound recordings.

Solr System

We are in the process of integrating Apache Solr to work with our Fedora-based digital repository on the back-end.  I am not going to pretend that I know and understand all of the technical details about Solr, as listed on its home page.  My layperson interpretation of its features is as follows:

  1. Solr is a standalone enterprise search server with a REST-like API. I think this means that Solr runs on its own and can be accessed via a URL in a web browser.
  2. You put documents in it (called “indexing”) via XML, JSON, CSV or binary over HTTP. At UMD, our “documents” are the FOXML XML files where Fedora stores our metadata.
  3. You query it via HTTP GET and receive XML, JSON, or CSV results. We can use a web browser and a URL to construct queries.
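
As a rough illustration of point 3, here is a minimal Python sketch that builds the kind of URL one might paste into a browser. The host, port, and core name (`fedora`) are placeholders I made up, not our actual configuration; `/select` is Solr's standard search handler, and `q`, `rows`, and `wt` are its standard request parameters.

```python
from urllib.parse import urlencode

# Hypothetical values: the real host and core name are not given in this post.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "fedora"


def build_select_url(query, rows=10, wt="json"):
    """Return a GET URL for Solr's /select handler with the query URL-encoded."""
    params = {"q": query, "rows": rows, "wt": wt}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


# A fielded query against the Title field, requesting JSON results.
url = build_select_url("Title:(truman AND civil AND describ AND battl)")
print(url)
```

Changing `wt` to `xml` or `csv` asks Solr for the same results in a different format, which is all that "receive XML, JSON, or CSV results" amounts to from the user's side.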

At UMD, we ingest content into our Fedora repository via two methods: a home-grown web-based administrative interface for adding images and via batch ingest (which currently requires developer assistance). We use the administrative interface to manage the metadata for our digital objects. However, the administrative interface has always lacked robust reporting capabilities. Solr includes a robust administrative interface of its own that allows for the construction of complex queries and reporting outputs. For me, as a user, this is Solr’s greatest benefit for us. Our Software Systems Development and Research team tries whenever possible to put as much knowledge as possible in the hands of the users.  It is a win-win situation.  For them, it eliminates having to investigate and answer really basic questions for me, and for me, it means I can get results and do my work without having to depend on others.

Solr requires first the development of a schema, which is essentially a file that explains what we want to index and how.  Understanding how to read and interpret the schema is a first step to understanding how Solr works. First, you define fields, and these fields are related to our metadata.  In a simple example, a “Title” field in Solr is an index on the <title> tag in our Fedora metadata.  Within a field, we can define how the field acts.  For example, we have defined a field type of “umd_default” that runs a series of filters on our data.  These filters are the key to understanding how searching works in Solr. I’m going to use the following piece of correspondence as an example: Letter by Truman M. Hawley to his brother describing Civil War battle. Includes envelope, September 24, 1864. When Solr indexes this title it does a number of things.  Many of these things are customizable, and this is what is important to understand.

  1. It separates and analyzes each word and assigns locations to them. “Letter” is in location 1 and takes up spaces 0-5 (the space at the end of the word is included in the word)
  2. It determines the type of word (is it alphanumeric? Or just a number? 1864 is just a number)
  3. It removes punctuation. Finally, a place where no one cares about commas.
  4. It removes stopwords. We apply a “StopFilterFactory” filter to remove stopwords. These can be customized. In our system, “by,” and “to,” are considered stopwords and we do not index them.
  5. It converts everything to lower case. Solr does not have to do this. We apply a “LowerCaseFilterFactory”  with the assumption that our users will not need to place emphasis or relevancy on case in searches.
  6. We apply an “ASCIIFoldingFilterFactory” that converts alphabetic, numeric, and symbolic Unicode characters which are not in the “Basic Latin” Unicode block into their ASCII equivalents, if they exist. So, for example, a search on “Munoz” will match on “Muñoz.”
  7. We apply the “PorterStemFilter” to the Title field.  This filter applies an algorithm that essentially truncates words based on assumptions about endings. In the example above, “describing” becomes “describ” and “battle” becomes “battl.”
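
To make the first six steps more concrete, here is a loose imitation in plain Python. It is only a sketch of the idea, not Solr's implementation: the names `analyze`, `fold_to_ascii`, and `STOPWORDS` are my own, the stopword list contains just the two words mentioned above, and the Porter stemming in step 7 is deliberately left out.

```python
import re
import unicodedata

# Illustrative stopword list; only "by" and "to" are named in this post.
STOPWORDS = {"by", "to"}


def fold_to_ascii(text):
    """Strip diacritics so that, for example, 'Muñoz' becomes 'Munoz'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Return (position, term) pairs after tokenizing, stopword removal,
    lowercasing, and ASCII folding. No stemming is applied."""
    terms = []
    # Splitting on runs of letters and digits also discards punctuation.
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        token = match.group().lower()
        if token in STOPWORDS:
            continue  # the stopword's position is skipped, leaving a gap
        terms.append((position, fold_to_ascii(token)))
    return terms


print(analyze("Letter by Truman M. Hawley to his brother"))
print(analyze("Muñoz"))
```

Because stemming is omitted, this sketch keeps words such as “hawley” and “his” whole, whereas the real filter chain goes on to shorten them.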

What we are left with is indexing on the following terms:

letter truman m hawlei hi brother describ civil war battl includ envelop septemb 24 1864

This means that I could run the query q=Title:(truman AND civil AND describ AND battl) in Solr and receive this letter as a hit.  Solr still allows for phrase queries (“Letter by Truman M. Hawley”) and for wildcard searches (Truman AND Hawl*). Our implementation of Solr currently assumes a boolean “OR” as the default operator in a search string. So, if I thought to myself, I am interested in looking for content having to do with the Civil War in the month of September, I might type into a search box something like “civil september.” Based on our configuration, this translates to: “Search the Title field for anything containing the term ‘civil’ OR ‘septemb.’” Here are just a few examples out of my over 300 results:

  • The Greek beginning,Classical civilization
  • The Classical age,Classical civilization
  • Ancient civilizations The Vikings
  • Ancient civilizations The Aztecs
  • Ancient civilizations The Mayans
  • The Civil War in Maryland Collection
  • The Celts,Ancient civilizations
  • Acts of faith,Jewish civilization in Spain
  • Brick by brick: a civil rights story

How is this possible? Well, if I investigate how our PorterStemFilter analyzes “civilization,” I discover that it becomes “civil.”  Also, as a user, I am thinking that I want results that have to do with both the Civil War AND September, and Solr is returning results that have to do with either.  If I manually adjust my search to be a boolean “AND” search – Title:(civil AND September) – I see only three relevant results. This might lead me to believe that we should instantly change our default search to “AND” instead of “OR,” since obviously, if I type a search with two terms into a box, I want to see records with both of those terms.  (The current default in our public interface is, in fact, “AND.”) It might also suggest that we should turn off the PorterStemFilter, because all of those “civilization” hits are annoying. If I want to search for “Civil*” I will search for “Civil*.”
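
The difference between the two default operators can be shown with a toy model. This is not how Solr works internally, and the data is made up for illustration: each title is paired with a hand-written, partial set of index terms (only the stems relevant here), and the last title is hypothetical.

```python
# Partial, hand-written index terms. "civilization(s)" stems to "civil" and
# "September" to "septemb", as described above. The last entry is hypothetical.
INDEX = {
    "Ancient civilizations The Vikings": {"civil"},
    "The Civil War in Maryland Collection": {"civil", "war"},
    "Brick by brick: a civil rights story": {"civil"},
    "Hypothetical Civil War letter, September": {"civil", "war", "septemb"},
}


def search(terms, operator="OR"):
    """Return titles matching any term (OR) or every term (any other operator)."""
    match = any if operator == "OR" else all
    return [
        title
        for title, indexed in INDEX.items()
        if match(term in indexed for term in terms)
    ]


query_terms = ["civil", "septemb"]
print(len(search(query_terms, "OR")))   # 4: every title contains "civil"
print(search(query_terms, "AND"))       # only the hypothetical letter
```

With “OR,” the stemmed term “civil” alone is enough to pull in every title, including the ones about civilizations; with “AND,” only a title containing both terms survives.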

But is it so simple? What is best for the user? What default settings will be most useful for our users? This is a different discussion and I will be working with my colleagues on the Collections side of things to try to answer some of these questions. Solr is so robust, and can be used to fit so many different situations, that truly configuring it in the most effective way is overwhelming, but also exciting.

Experimenting with 3D Printing

The University of Maryland’s student newspaper, the Diamondback, recently reported that the UMD Libraries have installed a 3D printer in the Terrapin Learning Commons.  Before new tools like this are installed, User and Systems Support (USS) conducts extensive research and testing. In this case, USS obtained a MakerBot 3D Printer.  The relatively small piece of desktop equipment is one of the most exciting we have seen in years.   To use a 3D printer, you feed a 3D design into a computer program, which then sends the information to the printer. The printer builds the object from the bottom up, depositing a plastic (PLA) filament in horizontal layers onto a build platform, resulting in an actual object that can be used however intended.

Libraries are increasingly making 3D printers available to patrons – they are excellent ways to create models or other products necessary for school work and design.  While USS staff have been having their own fun, they have also been experimenting with useful designs and thinking about ways to use the 3D printer to produce supplies, such as cable organizers:

Will and his purple mug, created with the MakerBot 3D printer
Uche with a nameplate and a UMD terrapin, printed using the MakerBot 3D printer!