Reusing Newspaper Data from Chronicling America

The National Digital Newspaper Program’s (NDNP) goal in digitizing U.S. newspapers from microfilm isn’t to simply create digital copies of the film—it’s to make the content of the digitized newspapers more usable and reusable. This is made possible through the creation of different kinds of metadata during digitization. (You can read my post from 2013 for the nitty gritty details of NDNP metadata, or go straight to the source.) The addition of robust metadata means that the Library of Congress’ Chronicling America website isn’t just a digital collection of newspapers—it’s a rich data set—and our project’s contributions to Chronicling America represent Maryland in this data.

Newspaper data is being used in exciting ways by scholars, students, and software developers. Here are a few of my favorite examples:

Data Visualization: Journalism’s Journey West
Bill Lane Center for the American West, Stanford University

Map of Maryland showing newspapers that were publishing in the 1790s.

This visualization plots the 140,000+ newspapers that are included in Chronicling America’s U.S. Newspaper Directory. Read about the history of newspaper publication in the U.S., and watch as newspapers spread across the country from 1690 through the present.

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
Virginia Tech

Excerpt from newspaper reads

The 1918 influenza pandemic, or Spanish flu, killed 675,000 in the U.S. and 50 million worldwide. An Epidemiology of Information used two text-mining methods to examine patterns in how the disease was reported in newspapers and the tone of the reports (e.g., alarmist, warning, reassuring, explanatory). Visit the project website for more information, or read the project’s January 2014 article in Perspectives on History.


Bookworm
The Cultural Observatory, Harvard University

Graph that shows the occurrence of the word

Bookworm is a tool that allows you to “visualize trends in repositories of digitized texts,” including Chronicling America. In the graph above, Tom Ewing of the aforementioned Epidemiology of Information project used Bookworm to visualize instances of the word “influenza” in the New York Tribune between 1911 and 1921. You can create your own visualizations of Chronicling America data using this tool.
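To give a rough sense of what sits behind a graph like this, here is a minimal sketch that counts how often a word appears per year in a set of OCR pages. It is not Bookworm's code. The input format (pairs of a YYYYMMDD date string and page text) and the function name are assumptions made for illustration.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z]+")


def yearly_term_frequency(pages, term):
    """Return {year: occurrences of term per 1,000 words} for OCR pages.

    `pages` is an iterable of (date, text) pairs, where date is a
    YYYYMMDD string (a hypothetical input format for this sketch).
    """
    hits, totals = Counter(), Counter()
    term = term.lower()
    for date, text in pages:
        year = int(date[:4])
        words = WORD_RE.findall(text.lower())
        totals[year] += len(words)       # all words seen that year
        hits[year] += words.count(term)  # exact matches of the term
    return {year: 1000 * hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}
```

Dividing by the total word count keeps a year with more digitized pages from looking like a spike in usage.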

Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
NULab for Texts, Maps, and Networks, Northeastern University

A visualization of the networks that exist between newspapers based on how the poem

In the 19th century, the content published in newspapers was not protected by copyright as it is today. As a result, newspaper editors often “borrowed” and reprinted content from other papers. This project seeks to uncover why particular news stories, works of fiction, and poetry “went viral” using the Optical Character Recognition (OCR) text of the newspapers in Chronicling America and magazines in Cornell University Library’s Making of America.
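One simple way to spot possible reprints is to compare the overlapping word sequences ("shingles") that two texts share. The sketch below is an illustration only and is not the Viral Texts project's method. The function names and the five-word window are assumptions.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=5):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # one text is shorter than n words
    return len(sa & sb) / len(sa | sb)
```

A score near 1.0 suggests that two passages are largely the same text. Exact matching like this is brittle when the OCR contains errors, so real projects need more forgiving comparisons.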

Everyone is welcome to use Chronicling America as a dataset for their research. There’s no special key or password needed. Information about the Chronicling America API can be found here. For additional projects and tools that use Chronicling America data, see this list compiled by the Library of Congress.
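For readers who want to try the API, here is a minimal sketch of a full-text page search. The endpoint URL, the query parameter names (andtext, state, dateFilterType, date1, date2, rows, format), and the response keys (totalItems, items, date, title) reflect my understanding of the API and should be treated as assumptions. Check them against the current API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the page-search endpoint; verify against the API docs.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(term, state=None, year_from=None, year_to=None, rows=20):
    """Build a full-text search URL that asks for JSON output."""
    params = {"andtext": term, "format": "json", "rows": rows}
    if state:
        params["state"] = state
    if year_from and year_to:
        params.update({"dateFilterType": "yearRange",
                       "date1": year_from, "date2": year_to})
    return SEARCH_URL + "?" + urlencode(params)


def summarize_results(payload):
    """Return (total, [(date, title), ...]) from a decoded search response."""
    items = payload.get("items", [])
    pages = [(item.get("date", ""), item.get("title", "")) for item in items]
    return payload.get("totalItems", 0), pages


def search(term, **kwargs):
    """Fetch one page of results over the network (no key or password)."""
    with urlopen(build_search_url(term, **kwargs)) as response:
        return summarize_results(json.load(response))
```

Calling search("influenza", state="Maryland", year_from=1918, year_to=1919) would request matching Maryland newspaper pages from those two years, provided the endpoint behaves as assumed.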

If you reuse Chronicling America data, especially from Maryland newspapers, in your research, please leave a comment or drop us a line. We’d love to hear from you!

Stew of the month: July 2015

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

Digitization Activities

Robin Pike submitted a National Endowment for the Humanities (NEH) Humanities Collections and Reference Resources Foundations grant proposal to hire a consulting firm to survey and assess all of the audiovisual materials in the Libraries. Dan Mack (CSS), Gary White (PSD), Steve Henry (MSPAL), Laura Schnitker (SCUA), and Trevor Munoz contributed to the narrative and final application. Tonita Smith Brooks provided much assistance in preparing the application for submission through the Office of Research Administration. The results of the survey will help us prioritize financial resources when planning future digitization projects. We will find out in March 2016 whether we received the grant.

DCMR staff performed quality assurance on the remaining vendor projects funded through the DIC project proposal process: Hebraica (300 Hebrew and Yiddish volumes) and the Diamondback Photo Morgue. They hope to complete the work in August.

Digitization assistants Rachel Dook and Caroline Hayden digitized the Victor E. Delnore Papers, a manuscript collection within the Gordon W. Prange Collection. Prange staff and DCMR undertook the project to commemorate the 70th anniversary of the bombing of Nagasaki, Japan, where Lt. Colonel Delnore oversaw rebuilding efforts as a commander of the U.S. Occupational Forces. 

Digitization assistant Audrey Lengel digitized materials for Associate Professor of English and Associate Director of MITH Dr. Matthew Kirschenbaum’s upcoming book Track Changes: The Literary History of Word Processing. Kirschenbaum used some of the images in his plenary session at the Archival Education and Research Institute.

The Historic Maryland Newspapers Project staff began quality review of the newspaper titles sent to the digitization vendor earlier this year. They continued to add content to the Pinterest boards.

Alice Prael reviewed usage statistics on UMD Digital Collections gathered through Google Analytics with the goal of using the data to make informed decisions about the prioritization of digital projects and promoting our holdings; a report is forthcoming.

Digital Programs and Initiatives

New DRUM Interface

Thanks to everyone in Software Systems Development and Research, DRUM has been upgraded and now has a new interface. Frequent users of DRUM have access to the same features as before but in a different layout. All of the navigation appears in the right column, and the structure for individual records has been completely revamped. Take it for a test drive; we hope you enjoy the new layout.

Spring 2015 ETDs Now Available in DRUM
All of the electronic theses and dissertations from the spring 2015 semester have been loaded into DRUM. Researchers now have access to 10,460 UMD theses and dissertations dating back to 2003. Of the 405 documents deposited in DRUM from the spring semester, 196 students (48%) requested either a 1-year or 6-year embargo, an all-time high since we started tracking embargo requests in 2006. On average, 39% of UMD students have requested an embargo since 2006. Subject librarians can contact Terry Owen for a breakdown of embargo requests for their departments.

Gemstone Projects
Eleven Gemstone projects from the spring 2015 semester have recently been added to DRUM, bringing the total to 84. Many of our subject librarians provided support to the Gemstone teams throughout the 4-year project. More information about the Gemstone program is available here. Check out some of their current research:
Evaluating the Feasibility of Implementing a Green Roof Retrofit on Pitched Residential Roofs
A Kinect Based Indoor Navigation System for the Blind
Fabrication of Poly (D,L-Lactic-Co-Glycolic Acid) microparticles for Improved Human Papillomavirus Vaccine Delivery

Open Access Fund
The UMD Libraries Open Access Publishing Fund closed out another successful year in June. We funded 31 articles with an average cost of $1,240 per article. A majority of the 2014-2015 applicants were faculty and all disciplines were represented. Most of the applicants were from either the School of Public Health or the College of Computer, Mathematical & Natural Sciences. Depending on the availability of funds for 2015-2016, we anticipate that the fund will reopen in September.

CRMS Project Update
Last year, the University of Maryland Libraries joined a prestigious group of institutions to assist in making copyright determinations for books in HathiTrust. Using the Copyright Review Management System (CRMS) developed at the University of Michigan, UMD has determined the copyright status of more than 1,850 books since January 2015. We would like to thank the library staff who volunteered to participate in the program: Paul Bushmiller, Leigh Ann DePope, Donna King, Yeo-Hee Koh, Audrey Lengel, Terry Owen, and Loretta Tatum. Special thanks also go to Tonita Brooks for processing monthly reports for the grant.

Software Development

The Database Finder feature on the Libraries’ website has been updated to include database categories. For now, the categories are visible only on the database detail page and can be used as search terms. Planned enhancements will use the categories to improve discovery: a) faceted browsing by category and sub-category; and b) context-sensitive linking to Subject Specialists.

The migration of DRUM to the new XMLUI/Mirage2 theme has been completed and installed in production. The theme uses Responsive Web Design, which makes DRUM viewable and usable on devices of all sizes. It also allows us to more easily add newer DSpace features that were unavailable in the old interface implementation.

We’ve made progress in upgrading Hippo CMS to version 7.9 and plan to begin user testing and promotion to production in August. The major new features for users are a much-improved HTML Editor and a new Channel Manager feature for previewing your pages in desktop, tablet, and phone sizes, which will be important as we implement the new Responsive Web Design for the Libraries’ website this fall.

Improvements have been completed for the Online Student Application system, based on initial staff feedback.  User testing and promotion to production will take place in August.

User and Systems Support

User and Systems Support (USS) participated in several 3D printing events. On July 20, 2015 and July 30, 2015, the Discovering Engineering Summer Program visited the John & Stella Graves Makerspace, where USS demonstrated the different equipment that’s available. The two groups were a mixture of rising 11th and 12th grade students attending a week-long program to learn more about the University of Maryland and the Clark School of Engineering. A total of 60 students attended the two events. They all had a strong interest in engineering, and some have even been involved in engineering-related coursework, research, and extracurricular activities. Each group that came through had lots of really great questions about 3D printing and seemed to thoroughly enjoy learning about the equipment.

On July 28, 2015, USS participated in the “LKA’s Teens in Technology Workshop Series” program. The program was organized by LKA Computer Consultants and held at LKA’s office. It was limited to 15 teens, with the goal of exposing them to the world of C++ programming, cyber security, web development, and project management. The program also introduced the teens to desktop fabrication (3D printing). Sandra, Victoria, and Preston took three 3D printers and a handheld 3D scanner to the workshop. The students were shown how 3D scanning works by scanning one of the teens in real time. They were also guided through the creation of their very own 3D nametag model using an online program called Tinkercad. From start to finish, the students were very engaged in creating their nametag models. Soon after, they were split into small groups and started 3D printing small models on the provided printers. Victoria’s group strategically picked small shurikens so that each group member would be able to get one. Preston’s group printed the Batman symbol, and Sandra’s group printed a red and black Porsche. The prints from Preston’s and Sandra’s groups were each raffled off to a teen in the respective group. The students were clearly excited to see the entire process, from a 3D model on a computer to a printed object they could hold in their hands, in very little time. At the end of the event, the teens were left inspired and grateful for the opportunity to learn and to create 3D prints that they were able to leave with.

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

Support to USMAI

The CLAS team responded to 122 Aleph Rx submissions and 43 e-resource requests from across the consortium’s libraries in July. Who said summer was slow?

The consortium began subscribing to two new resources this fiscal year: Academic Search Complete and ProQuest Dissertations & Theses. CLAS configured EZproxy, ResearchPort, and SFX to work with both of these collections.

The team has also been reviewing the submitted responses for the Aleph enhancements initiative and will be meeting with the USMAI Shared Platforms & Applications subgroup in early August to discuss the list and develop recommendations for the Council of Library Directors.

Kuali OLE

CLAS is finishing its work with the consortium’s testing group. The combined groups have started working on a report for the Council of Library Directors about their testing experiences. While that work develops, CLAS has also continued to attend weekly implementation meetings with other Kuali OLE adopters.


Partners in the Maryland Shared Open Access Repository (MD-SOAR) were given the “green light” this month to begin loading content into the shared repository. Each institution has the flexibility to establish its own implementation timeline. Many campuses have set up their repository structure and begun loading materials. MD-SOAR is a 2-year pilot funded by the consortium. DSS is the service provider for the repository, making use of our DRUM experience and DSpace expertise to help the consortium build out its vision for a shared institutional repository.


Alice Prael leaves in August; her last day in the office is August 13th. She will be leaving for Boston to join the National Digital Stewardship Residency at the JFK Presidential Library.

Conferences, workshops and professional development

Robin Pike attended the International Council on Archives-Section on University Archives (ICA-SUV) conference from July 13-15 in Chapel Hill, NC with Bria Parker (MSD) and Vin Novara (SCPA); they presented a paper titled “‘Is This Enough?’ Digitizing Liz Lerman Dance Exchange Archives Media.”

Robin gave a guest lecture for The Catholic University of America’s CLSC 747 “Special Collections” on July 20 titled “Management of Digital Programs in Special Collections.”

Eric Cartier attended the seventh annual Archival Education and Research Institute (AERI) from July 13-17, an international conference held at the UMD Libraries.

Robin and Eric wrote articles for the Society of American Archivists Recorded Sound Roundtable newsletter Recorded Sound

The Historic Maryland Newspapers Project team visited colleagues around the state to speak about the project at regional meetings organized by Digital Maryland. Attendees at these meetings were from a variety of institutions that hold cultural heritage materials, including public libraries, local historical societies, museums, and churches. The goal of the meetings was to share information about digital initiatives across the state and to hear what the priorities and needs are for making collections available digitally. Liz Caringola attended meetings in Ellicott City and Hagerstown, and Doug McElrath (SCUA) attended the meeting in Easton. Two additional meetings in Aberdeen and Prince Frederick are scheduled in August.


Eric Cartier gave three tours of the Hornbake Digitization Center to 20 attendees during the Archival Education and Research Institute (AERI).