Leave a comment

Reusing Newspaper Data from Chronicling America

The National Digital Newspaper Program’s (NDNP) goal in digitizing U.S. newspapers from microfilm isn’t to simply create digital copies of the film—it’s to make the content of the digitized newspapers more usable and reusable. This is made possible through the creation of different kinds of metadata during digitization. (You can read my post from 2013 for the nitty gritty details of NDNP metadata, or go straight to the source.) The addition of robust metadata means that the Library of Congress’ Chronicling America website isn’t just a digital collection of newspapers—it’s a rich data set—and our project’s contributions to Chronicling America represent Maryland in this data.

Newspaper data is being used in exciting ways by scholars, students, and software developers. Here are a few of my favorite examples:

Data Visualization: Journalism’s Journey West
Bill Lane Center for the American West, Stanford University
http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

This visualization plots the 140,000+ newspapers that are included in Chronicling America’s U.S. Newspaper Directory. Read about the history of newspaper publication in the U.S., and watch as newspapers spread across the country from 1690 through the present.

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
Virginia Tech
http://www.flu1918.lib.vt.edu/

Excerpt from newspaper reads

The 1918 influenza pandemic, or Spanish flu, killed 675,000 in the U.S. and 50 million worldwide. An Epidemiology of Information used two text-mining methods to examine patterns in how the disease was reported in newspapers and the tone of the reports (e.g., alarmist, warning, reassuring, explanatory). Visit the project website for more information, or read the project’s January 2014 article in Perspectives on History.

Image from http://www.flu1918.lib.vt.edu/wp-content/uploads/2012/11/NLM-Presentation-Ewing-30April2013.pdf

Bookworm
The Cultural Observatory, Harvard University
http://bookworm.culturomics.org/ChronAm/

Bookworm is a tool that allows you to “visualize trends in repositories of digitized texts,” including Chronicling America. In the graph above, Tom Ewing of the aforementioned Epidemiology of Information project used Bookworm to visualize instances of the word “influenza” in the New York Tribune between 1911 and 1921. You can create your own visualizations of Chronicling America data using this tool.

Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
NULab for Texts, Maps, and Networks, Northeastern University
http://viraltexts.org/

In the 19th century, the content published in newspapers was not protected by copyright as it is today. As a result, newspaper editors often “borrowed” and reprinted content from other papers. This project seeks to uncover why particular news stories, works of fiction, and poetry “went viral” using the Optical Character Recognition (OCR) text of the newspapers in Chronicling America and magazines in Cornell University Library’s Making of America.

Everyone is welcome to use Chronicling America as a dataset for their research. There’s no special key or password needed. Information about the Chronicling America API can be found here. For additional projects and tools that use Chronicling America data, see this list compiled by the Library of Congress.

If you reuse Chronicling America data, especially from Maryland newspapers, in your research, please leave a comment or drop us a line. We’d love to hear from you!

About liz_caringola

Liz Caringola is the Historic Maryland Newspapers Librarian at the University of Maryland Libraries. Prior to her current position, she managed digitization projects for the Department of Anthropology at Kenyon College and was a technician with the NARA/Ancestry digitization partnership at the National Archives in College Park, MD.

Discuss!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: