The National Digital Newspaper Program’s (NDNP) goal in digitizing U.S. newspapers from microfilm isn’t simply to create digital copies of the film—it’s to make the content of the digitized newspapers more usable and reusable. This is made possible through the creation of different kinds of metadata during digitization. (You can read my post from 2013 for the nitty-gritty details of NDNP metadata, or go straight to the source.) The addition of robust metadata means that the Library of Congress’ Chronicling America website isn’t just a digital collection of newspapers—it’s a rich data set—and our project’s contributions to Chronicling America represent Maryland in this data.
Newspaper data is being used in exciting ways by scholars, students, and software developers. Here are a few of my favorite examples:
Data Visualization: Journalism’s Journey West
Bill Lane Center for the American West, Stanford University
This visualization plots the 140,000+ newspapers included in Chronicling America’s U.S. Newspaper Directory. Read about the history of newspaper publication in the U.S., and watch as newspapers spread across the country from 1690 through the present.
An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
The 1918 influenza pandemic, or Spanish flu, killed 675,000 people in the U.S. and 50 million worldwide. An Epidemiology of Information used two text-mining methods to examine patterns in how the disease was reported in newspapers and the tone of the reports (e.g., alarmist, warning, reassuring, explanatory). Visit the project website for more information, or read the project’s January 2014 article in Perspectives on History.
Bookworm
The Cultural Observatory, Harvard University
Bookworm is a tool that allows you to “visualize trends in repositories of digitized texts,” including Chronicling America. In the graph above, Tom Ewing of the aforementioned Epidemiology of Information project used Bookworm to visualize instances of the word “influenza” in the New York Tribune between 1911 and 1921. You can create your own visualizations of Chronicling America data using this tool.
Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
NULab for Texts, Maps, and Networks, Northeastern University
In the 19th century, the content published in newspapers was not protected by copyright as it is today. As a result, newspaper editors often “borrowed” and reprinted content from other papers. This project seeks to uncover why particular news stories, works of fiction, and poetry “went viral” using the Optical Character Recognition (OCR) text of the newspapers in Chronicling America and magazines in Cornell University Library’s Making of America.
Everyone is welcome to use Chronicling America as a dataset for their research. There’s no special key or password needed. Information about the Chronicling America API can be found here. For additional projects and tools that use Chronicling America data, see this list compiled by the Library of Congress.
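As a minimal sketch of what getting started looks like, the snippet below builds a JSON search URL against the Chronicling America page-search endpoint. The `andtext`, `state`, `rows`, and `format` parameters are my reading of the public API documentation; treat this as a starting point rather than a definitive client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public page-search endpoint for Chronicling America (no API key required).
BASE = "https://chroniclingamerica.loc.gov/search/pages/results/"

def build_search_url(term, state=None, rows=5):
    """Build a page-search URL that asks for JSON results.

    Parameter names (andtext, state, rows, format) follow my reading
    of the Chronicling America API docs; adjust if they change.
    """
    params = {"andtext": term, "format": "json", "rows": rows}
    if state:
        params["state"] = state
    return BASE + "?" + urlencode(params)

url = build_search_url("influenza", state="Maryland")
print(url)
# To actually fetch results (network call), uncomment:
# data = json.load(urlopen(url))
# print(data["totalItems"])
```

From there, the JSON response can be paged through and fed into whatever text-mining or visualization tool you prefer.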
If you reuse Chronicling America data, especially from Maryland newspapers, in your research, please leave a comment or drop us a line. We’d love to hear from you!