Archelon 1.0 Release

We are pleased to announce the 1.0 release of Archelon, the new staff-only administrative interface for content in the UMD Libraries’ Fedora 4 repository. The Fedora 4 repository was released in production last August and Archelon is the first staff interface for content in Fedora 4.

The 1.0 release enables library staff in Special Collections and University Archives, Digital Conversion and Media Reformatting, Digital Programs and Initiatives, and other departments to search, browse, and download content in Fedora 4. In addition, a basic, embedded newspaper and image viewer is included in the 1.0 release. Archelon will be developed in an agile fashion, with many incremental releases over time, and upcoming releases will include more features in the newspaper viewer. Eventually, full content add/update capabilities will be available to users, beginning with version 2.0.

Digital Content in Archelon

The launch of Archelon coincides with the import of the digitized Diamondback student newspapers, which will be the first collection available on the new platform. Batch loading of 130,000 objects is currently underway, representing 3,500 issues spanning 1910 to 1971.  Working closely with stakeholders in Special Collections and University Archives, we have established that it is a priority to load content that is currently not available online, such as Katherine Anne Porter correspondence and Diamondback photos.  We will be working on loading this content through 2017 as well as migrating content from the existing Digital Collections repository.

Archelon’s Technology

Archelon 1.0 is built using Ruby on Rails and the Blacklight discovery interface. The supporting infrastructure behind Archelon includes the previously released Fedora 4 repository and several new components: a newspaper batch loader, an IIIF image server using Loris, a search/index service using Apache Solr 6, and an IIIF manifest server using pcdm-manifests. The newspaper viewer is built using the Mirador IIIF image viewer.
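For readers unfamiliar with IIIF, here is a minimal sketch of how a viewer such as Mirador asks an IIIF image server for an image. It follows the URL pattern of the IIIF Image API 2.x (identifier, region, size, rotation, quality, format). The server address and the identifier below are hypothetical placeholders, not Archelon's actual endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="full",
                   rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API 2.x request URL."""
    # Percent-encode the identifier so that any slashes inside it
    # are not mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
url = iiif_image_url("https://iiif.example.org/images",
                     "newspapers/issue-1/page-1", size="600,")
print(url)
```

Here the size value "600," requests an image scaled to 600 pixels wide, which is the kind of request a viewer makes when it needs a page at reduced resolution.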

What’s Up with the Name?

Archelon is named for the ancient genus of giant sea turtles Archelon, whose name means “ruler turtle” in Greek. Archelon lived approximately 80.5 million years ago in the shallow seas that covered most of North America at the time. It is the largest ever recorded species of turtle by size, and second largest by weight.

Acknowledgements

The Fedora 4 repository implementation team would like to thank everyone in Digital Systems and Stewardship and throughout the University Libraries whose technical and administrative support made this effort possible.

National Digital Newspaper Program: 2016-2018 Selection

Introduction

The UMD Libraries were awarded a $250,000 grant from the National Endowment for the Humanities (NEH) for the third phase of the Historic Maryland Newspapers Project, beginning September 1, 2016. Between 2016 and 2018, the project will digitize approximately 100,000 pages of newspapers published in the State of Maryland, adding to the more than 200,000 pages from Maryland already in Chronicling America, the Library of Congress digitized newspaper database. The state partners contributing content for the third grant are the Maryland State Archives, also a partner on the second grant, and Frostburg State University Library. UMD’s theme for the third award is greater diversity: the selection includes one Polish-language paper and several labor papers, as well as newspapers whose political viewpoints contrast with those digitized during the first two grant cycles.

Title Selection

Project staff consulted with the Advisory Board to draw up the list of titles that may be digitized during the 2016-2018 phase:

  • The Baltimore County Union (1865-1909), Towsontown, MD
  • Catoctin Clarion (1923), Mechanicstown, MD
  • The Citizen (1895-1922), Frederick, MD
  • Czas Baltimorski (1940-1941), Baltimore, MD
  • Democratic Messenger (1881-1922), Snow Hill, MD
  • Evening Capital, Evening Capital and Maryland Gazette (1884-1922), Annapolis, MD
  • Frostburg Mining Journal (1871-1917), Frostburg, MD
  • The Frostburg Forum (1897-19??), Frostburg, MD
  • The Frostburg Gleaner (1899-19??), Frostburg, MD
  • The Frostburg Herald (1903-19??), Frostburg, MD
  • The Frostburg News (1897-18??), Frostburg, MD
  • The Frostburg Spirit (1913-1915), Frostburg, MD
  • Greenbelt Cooperator (1937-1943), Greenbelt, MD
  • Maryland Independent (1874-1934), Port Tobacco, MD
  • The Midland Journal (1885-1946), Rising Sun, MD
  • Voice of Labor (1938-1942), Cumberland, MD
  • Worcester Democrat and Ledger-Enterprise (1921-1953), Pocomoke City, MD

The list may be modified if the project’s student assistants collate the microfilm and discover that the images are of too poor a quality for digitization.

Mutilated pages from the Maryland Independent

Copyright Research

In July, NEH announced the expansion of the date range for the NDNP program to 1690-1963. For newspapers published between 1923 and 1963, project staff need to perform copyright research to determine whether each newspaper issue was registered with the Copyright Office and, if it was registered, whether the copyright was renewed 28 years later, as the law required. Project staff decided to use the resources available through the Copyright Office to determine whether these titles are in the public domain:

  • Catoctin Clarion (1923), Mechanicstown, MD
  • Czas Baltimorski (1940-1941), Baltimore, MD
  • Greenbelt Cooperator (1937-1943), Greenbelt, MD
  • Maryland Independent (1874-1934), Port Tobacco, MD
  • The Midland Journal (1885-1946), Rising Sun, MD
  • Voice of Labor (1938-1942), Cumberland, MD
  • Worcester Democrat and Ledger-Enterprise (1921-1953), Pocomoke City, MD

With guidance from the Library of Congress on how to perform copyright research, Doug McElrath (SCUA) and Robin Pike developed instructions for Doug, Robin, Judi Kidd, and Amy Wickner (SCUA) to perform the research and track their results, providing evidence to the Library of Congress and NEH that the titles are in the public domain. The project staff will primarily be searching in the pre-1978 Catalog of Copyright Entries, but may also have to search in the Copyright Catalog (1978-Present) for renewed registrations. Unlike a book, which is a single entity, newspapers are copyrighted by the issue, so project staff will have to run title searches across the entire date range of publication to confirm that the issues are in the public domain.
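The date arithmetic behind this research can be sketched in a few lines. This is a simplified illustration, not the project's actual procedure: it assumes the only inputs are a title's first and last publication years, uses the 1923-1963 range and the 28-year renewal figure mentioned above, and treats 1978 as the cutoff between the two catalogs. The function names are made up for this example.

```python
RESEARCH_START, RESEARCH_END = 1923, 1963  # date range named in this post
RENEWAL_TERM = 28                          # "renewed 28 years later"
ONLINE_CATALOG_START = 1978                # Copyright Catalog (1978-Present)


def years_needing_research(first_year, last_year):
    """Return the publication years of a title that fall within 1923-1963."""
    start = max(first_year, RESEARCH_START)
    end = min(last_year, RESEARCH_END)
    return list(range(start, end + 1))  # empty list when there is no overlap


def renewal_catalog(publication_year):
    """Name the catalog where a renewal would be expected to appear."""
    renewal_year = publication_year + RENEWAL_TERM
    if renewal_year >= ONLINE_CATALOG_START:
        return "Copyright Catalog (1978-Present)"
    return "Catalog of Copyright Entries (pre-1978)"


# Date ranges taken from the title list in this post.
titles = {
    "The Baltimore County Union": (1865, 1909),
    "Maryland Independent": (1874, 1934),
    "Worcester Democrat and Ledger-Enterprise": (1921, 1953),
}

for title, (first, last) in titles.items():
    years = years_needing_research(first, last)
    if not years:
        print(f"{title}: published outside 1923-1963")
        continue
    catalogs = sorted({renewal_catalog(y) for y in years})
    print(f"{title}: research {years[0]}-{years[-1]}; "
          f"renewals in {', '.join(catalogs)}")
```

Under these assumptions, only issues published in 1950 or later would have renewals recorded in the 1978-Present catalog, which is why most of the searching falls in the pre-1978 Catalog of Copyright Entries.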

You’re Invited to the Historic Maryland Newspapers Wikipedia Edit-a-thon on May 2!

Today’s post is by Amy Wickner, student assistant and iSchool field study for the Historic Maryland Newspapers Project.

As part of an ongoing initiative to connect digital collections with Wikipedia, the Historic Maryland Newspapers Project (HMNP) will co-host a Wikipedia Edit-a-thon (May 2, 1-4pm) focusing on Maryland newspapers. We’ve set up an event page and advance registration form (strongly recommended) with all the details.

Photo from HMNP’s last edit-a-thon on August 18, 2014, at UMD Libraries.

Liz Caringola and I are working with special collections staff at the Maryland State Archives in Annapolis, who have been kind enough to provide space, computers, and guided tours of their collections. Maria Day and Allison Rein from MSA will highlight historic newspapers in their collections, while Liz will introduce edit-a-thon participants to Chronicling America and HMNP’s ongoing work. I’ll give short tutorials on editing Wikipedia and adding images to Wikimedia Commons. We’re hoping to draw participants from across the state and DC / Baltimore metro areas. All are welcome, and word-of-mouth promotion would be much appreciated.

Many edit-a-thon pages have a Goals section, conventionally a list of articles needing to be drafted, added, or improved. Our page has such a list, but we’d also like to help participants depart with at least some impulse to continue editing Wikipedia. (We’ll have a day-of participant survey of some kind to get at what brings people to our event.) Sparking a lifelong passion for editing Wikipedia using archival material as evidence would of course be fire, but growing sustainable participation more realistically involves a lot of small steps. Which is why it’s exciting to see that this is just one of many DC-area Wikipedia events this spring, with themes ranging from accessibility to labor to #ColorOurHistory.

Chronicling America surpasses 10 million pages!


The University of Maryland Libraries joins the Library of Congress and the National Endowment for the Humanities in celebrating a major milestone for Chronicling America, a free, searchable database of historic U.S. newspapers. The Library of Congress announced on October 7 that more than 10 million pages have been posted to the site. This number includes 117,082 pages of Maryland newspapers digitized by the Historic Maryland Newspapers Project and its content partners, the Maryland State Archives and Maryland Historical Society.

Titles are added on a rolling basis, so check back often, or subscribe to Chronicling America’s RSS feed to receive alerts when new titles are added.

For more information about the Historic Maryland Newspapers Project, please visit our website: http://ter.ps/newspapers.

Reusing Newspaper Data from Chronicling America

The National Digital Newspaper Program’s (NDNP) goal in digitizing U.S. newspapers from microfilm isn’t to simply create digital copies of the film—it’s to make the content of the digitized newspapers more usable and reusable. This is made possible through the creation of different kinds of metadata during digitization. (You can read my post from 2013 for the nitty gritty details of NDNP metadata, or go straight to the source.) The addition of robust metadata means that the Library of Congress’ Chronicling America website isn’t just a digital collection of newspapers—it’s a rich data set—and our project’s contributions to Chronicling America represent Maryland in this data.

Newspaper data is being used in exciting ways by scholars, students, and software developers. Here are a few of my favorite examples:

Data Visualization: Journalism’s Journey West
Bill Lane Center for the American West, Stanford University
http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

Map of Maryland showing newspapers that were publishing in the 1790s.
Image from http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

This visualization plots the 140,000+ newspapers that are included in Chronicling America’s U.S. Newspaper Directory. Read about the history of newspaper publication in the U.S., and watch as newspapers spread across the country from 1690 through the present.

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
Virginia Tech
http://www.flu1918.lib.vt.edu/

Excerpt from newspaper reads

The 1918 influenza pandemic, or Spanish flu, killed 675,000 in the U.S. and 50 million worldwide. An Epidemiology of Information used two text-mining methods to examine patterns in how the disease was reported in newspapers and the tone of the reports (e.g., alarmist, warning, reassuring, explanatory). Visit the project website for more information, or read the project’s January 2014 article in Perspectives on History.

Image from http://www.flu1918.lib.vt.edu/wp-content/uploads/2012/11/NLM-Presentation-Ewing-30April2013.pdf

Bookworm
The Cultural Observatory, Harvard University
http://bookworm.culturomics.org/ChronAm/

Graph that shows the occurrence of the word “influenza” in the New York Tribune between 1911 and 1921
Image from https://twitter.com/1918FluSeminar/status/577082239479115776

Bookworm is a tool that allows you to “visualize trends in repositories of digitized texts,” including Chronicling America. In the graph above, Tom Ewing of the aforementioned Epidemiology of Information project used Bookworm to visualize instances of the word “influenza” in the New York Tribune between 1911 and 1921. You can create your own visualizations of Chronicling America data using this tool.

Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
NULab for Texts, Maps, and Networks, Northeastern University
http://viraltexts.org/

A visualization of the networks that exist between newspapers based on how the poem
Image from http://networks.viraltexts.org/1836to1860-Inquiry/

In the 19th century, the content published in newspapers was not protected by copyright as it is today. As a result, newspaper editors often “borrowed” and reprinted content from other papers. This project seeks to uncover why particular news stories, works of fiction, and poetry “went viral” using the Optical Character Recognition (OCR) text of the newspapers in Chronicling America and magazines in Cornell University Library’s Making of America.

Everyone is welcome to use Chronicling America as a dataset for their research. There’s no special key or password needed. Information about the Chronicling America API can be found here. For additional projects and tools that use Chronicling America data, see this list compiled by the Library of Congress.
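As a starting point, here is a minimal sketch of building a search request for the Chronicling America page search. The parameter names (andtext, state, dateFilterType, date1, date2, format) are the ones I understand the site's search URLs to use; treat them as assumptions and check them against the API documentation before relying on them. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Page-search endpoint on the Chronicling America site.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(text, state, first_year, last_year):
    """Build a URL that searches newspaper pages for a word within a year range."""
    params = {
        "andtext": text,                # word(s) that must appear in the OCR text
        "state": state,                 # limit results to one state
        "dateFilterType": "yearRange",  # interpret date1/date2 as years
        "date1": first_year,
        "date2": last_year,
        "format": "json",               # ask for JSON rather than an HTML page
    }
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("influenza", "Maryland", 1918, 1919)
print(url)
```

The resulting URL can be pasted into a browser or fetched with any HTTP client to retrieve matching pages.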

If you reuse Chronicling America data, especially from Maryland newspapers, in your research, please leave a comment or drop us a line. We’d love to hear from you!

Knight News Challenge: Libraries. Our application…

The Knight Foundation recently issued a news challenge: How might we leverage libraries as a platform to build more knowledgeable communities? Here at the University of Maryland Libraries, we felt that we had an idea.

Improving Discovery in Digital Newspapers through Crowdsourcing the Development of Semantic Models

“We will develop tools that enable users of digitized newspapers to intuitively create connections between the concepts, people, places, things, and ideas written about in the newspaper pages, which will facilitate further discovery and analysis by researchers at all levels.”

The process of working on this application was fun and inspiring. Our Associate Dean for Digital Systems and Stewardship, Babak Hamidzadeh, had the original vision. He enlisted me (Jennie Knies) and Liz Caringola, our Historic Maryland Newspapers librarian, to help flesh out some of the ideas. The UMD Libraries’ Communications director, Eric Bartheld, and our Director of Development, Heather Foss, also contributed. Ed Summers (MITH) and Dr. Ira Chinoy (Journalism) provided excellent feedback and encouragement. Rebecca Wilson, the UMD Libraries’ graphic designer, created a compelling graphic for the proposal under a very tight deadline.

The application had very strict word/character requirements, which was a fascinating challenge in itself. 750 characters (that includes spaces!) to communicate the entire idea?

We think that we are uniquely positioned to develop these types of tools: we have the enthusiasm, the content (thanks to the Historic Maryland Newspapers Project and to Chronicling America), and the resources and expertise to make this a reality. Fingers crossed that we get a lot of “applause!” There are a lot of amazing proposals for the Knight Foundation to choose from, but I hope ours gets to be one of those selected.

Historic Maryland Newspapers Project receives funding for Phase 2

It’s our pleasure to announce that the Historic Maryland Newspapers Project at the University of Maryland Libraries has received funding for Phase 2 and will continue through August 2016 thanks to a generous $290,000 National Digital Newspaper Program (NDNP) grant from the National Endowment for the Humanities.

The Historic Maryland Newspapers Project was first awarded an NDNP grant in 2012 to digitize 100,000 pages of newsprint published between 1836 and 1922. To date, approximately 107,375 pages of Maryland newspapers have been digitized and nearly 86,000 are available on the Library of Congress database Chronicling America. The bulk of these pages is from the prominent German-language Baltimore paper Der Deutsche Correspondent. The time frame of the digitized Correspondent spans 1858 to 1913. The following titles were also digitized during Phase 1 of the project:

Baltimore

  • The American Republican and Baltimore Daily Clipper, 1844-1846
  • The Baltimore Commercial Journal, and Lyford’s Price-Current, 1847-1849
  • Baltimore Daily Commercial, 1865-1866
  • The Daily Exchange, 1858-1861
  • The Pilot and Transcript, 1840-1841

Western Maryland

  • Civilian and Telegraph (Cumberland), 1859-1865
  • The Maryland Free Press (Hagerstown), 1862-1868

During Phase 2, we will complete digitization of Der Deutsche Correspondent (1914-1918) and will digitize a variety of English-language papers that reflect the regional diversity of Maryland. We look forward to collaborating with our colleagues at the Maryland State Archives during the second phase of the project.

See the press release from NEH: http://www.neh.gov/news/press-release/2014-07-21.