You’re Invited to the Historic Maryland Newspapers Wikipedia Edit-a-thon on May 2!

Today’s post is by Amy Wickner, student assistant and iSchool field study student for the Historic Maryland Newspapers Project.

As part of an ongoing initiative to connect digital collections with Wikipedia, the Historic Maryland Newspapers Project (HMNP) will co-host a Wikipedia Edit-a-thon (May 2, 1-4pm) focusing on Maryland newspapers. We’ve set up an event page and advance registration form (strongly recommended) with all the details.

Photo from HMNP’s last edit-a-thon on August 18, 2014, at UMD Libraries.

Liz Caringola and I are working with special collections staff at the Maryland State Archives in Annapolis, who have been kind enough to provide space, computers, and guided tours of their collections. Maria Day and Allison Rein from MSA will highlight historic newspapers in their collections, while Liz will introduce edit-a-thon participants to Chronicling America and HMNP’s ongoing work. I’ll give short tutorials on editing Wikipedia and adding images to Wikimedia Commons. We’re hoping to draw participants from across the state and DC / Baltimore metro areas. All are welcome, and word-of-mouth promotion would be much appreciated.

Many edit-a-thon pages have a Goals section, conventionally a list of articles needing to be drafted, added, or improved. Our page has such a list, but we’d also like to help participants depart with at least some impulse to continue editing Wikipedia. (We’ll have a day-of participant survey of some kind to get at what brings people to our event.) Sparking a lifelong passion for editing Wikipedia using archival material as evidence would of course be fire, but growing sustainable participation more realistically involves a lot of small steps. Which is why it’s exciting to see that this is just one of many DC-area Wikipedia events this spring, with themes ranging from accessibility to labor to #ColorOurHistory.

Chronicling America surpasses 10 million pages!


The University of Maryland Libraries joins the Library of Congress and the National Endowment for the Humanities in celebrating a major milestone for Chronicling America, a free, searchable database of historic U.S. newspapers. The Library of Congress announced on October 7 that more than 10 million pages have been posted to the site. This number includes 117,082 pages of Maryland newspapers digitized by the Historic Maryland Newspapers Project and its content partners, the Maryland State Archives and Maryland Historical Society, from the following titles:

Titles are added on a rolling basis, so check back often, or subscribe to Chronicling America’s RSS feed to receive alerts when new titles are added.

For more information about the Historic Maryland Newspapers Project, please visit our website: http://ter.ps/newspapers.

Reusing Newspaper Data from Chronicling America

The National Digital Newspaper Program’s (NDNP) goal in digitizing U.S. newspapers from microfilm isn’t to simply create digital copies of the film—it’s to make the content of the digitized newspapers more usable and reusable. This is made possible through the creation of different kinds of metadata during digitization. (You can read my post from 2013 for the nitty gritty details of NDNP metadata, or go straight to the source.) The addition of robust metadata means that the Library of Congress’ Chronicling America website isn’t just a digital collection of newspapers—it’s a rich data set—and our project’s contributions to Chronicling America represent Maryland in this data.

Newspaper data is being used in exciting ways by scholars, students, and software developers. Here are a few of my favorite examples:

Data Visualization: Journalism’s Journey West
Bill Lane Center for the American West, Stanford University
http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

Map of Maryland showing newspapers that were publishing in the 1790s.
Image from http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

This visualization plots the 140,000+ newspapers that are included in Chronicling America’s U.S. Newspaper Directory. Read about the history of newspaper publication in the U.S., and watch as newspapers spread across the country from 1690 through the present.

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic
Virginia Tech
http://www.flu1918.lib.vt.edu/

Excerpt from newspaper reads

The 1918 influenza pandemic, or Spanish flu, killed 675,000 in the U.S. and 50 million worldwide. An Epidemiology of Information used two text-mining methods to examine patterns in how the disease was reported in newspapers and the tone of the reports (e.g., alarmist, warning, reassuring, explanatory). Visit the project website for more information, or read the project’s January 2014 article in Perspectives on History.

Image from http://www.flu1918.lib.vt.edu/wp-content/uploads/2012/11/NLM-Presentation-Ewing-30April2013.pdf

Bookworm
The Cultural Observatory, Harvard University
http://bookworm.culturomics.org/ChronAm/

Graph that shows the occurrence of the word
Image from https://twitter.com/1918FluSeminar/status/577082239479115776

Bookworm is a tool that allows you to “visualize trends in repositories of digitized texts,” including Chronicling America. In the graph above, Tom Ewing of the aforementioned Epidemiology of Information project used Bookworm to visualize instances of the word “influenza” in the New York Tribune between 1911 and 1921. You can create your own visualizations of Chronicling America data using this tool.

Viral Texts: Mapping Networks of Reprinting in 19th-Century Newspapers and Magazines
NULab for Texts, Maps, and Networks, Northeastern University
http://viraltexts.org/

A visualization of the networks that exist between newspapers based on how the poem
Image from http://networks.viraltexts.org/1836to1860-Inquiry/

In the 19th century, the content published in newspapers was not protected by copyright as it is today. As a result, newspaper editors often “borrowed” and reprinted content from other papers. This project seeks to uncover why particular news stories, works of fiction, and poetry “went viral” using the Optical Character Recognition (OCR) text of the newspapers in Chronicling America and magazines in Cornell University Library’s Making of America.
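One simple way to flag candidate reprints in noisy OCR text is to compare overlapping word n-grams ("shingles") between two passages. The sketch below is only an illustration of that general idea, not the Viral Texts project's actual method; the function names, the choice of three-word shingles, and the sample strings are all assumptions made for the example.

```python
import re


def shingles(text, n=3):
    """Return the set of word n-grams in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two shingle sets: shared shingles / all shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reprint_score(text_a, text_b, n=3):
    """Score from 0.0 (no shared n-grams) to 1.0 (identical n-gram sets)."""
    return jaccard(shingles(text_a, n), shingles(text_b, n))
```

A high score between passages from two different papers suggests that one may have reprinted the other. Real OCR errors would lower the score, so any threshold would need to be tuned against known reprints.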

Everyone is welcome to use Chronicling America as a dataset for their research. There’s no special key or password needed. Information about the Chronicling America API can be found here. For additional projects and tools that use Chronicling America data, see this list compiled by the Library of Congress.
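As a rough sketch of what a keyless request might look like, the Python below builds a page-search URL and summarizes a JSON response. The endpoint path, the parameter names (`andtext`, `state`, `dateFilterType`, `date1`, `date2`, `rows`, `format`), and the response fields (`items`, `title`, `date`) are assumptions based on the search interface as we understand it, and the helper names are hypothetical; please check the API documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed page-search endpoint; verify against the current API documentation.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(text, state=None, year_from=None, year_to=None, rows=20):
    """Build a search URL for pages containing all of the words in text."""
    params = {"andtext": text, "format": "json", "rows": rows}
    if state:
        params["state"] = state
    if year_from and year_to:
        # Assumed parameters for restricting results to a range of years.
        params.update(
            {"dateFilterType": "yearRange", "date1": year_from, "date2": year_to}
        )
    return SEARCH_URL + "?" + urlencode(params)


def summarize_hits(response):
    """Return (title, date) pairs from a decoded JSON response dictionary."""
    return [
        (item.get("title"), item.get("date"))
        for item in response.get("items", [])
    ]
```

For example, `build_search_url("influenza", state="Maryland", year_from=1918, year_to=1919)` produces a URL that could be fetched with any HTTP client, and the decoded JSON could then be passed to `summarize_hits`.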

If you reuse Chronicling America data, especially from Maryland newspapers, in your research, please leave a comment or drop us a line. We’d love to hear from you!

Stew of the Month: March 2015

Welcome to a new issue of Stew of the Month, a monthly blog from Digital Systems and Stewardship (DSS) at the University of Maryland Libraries. This blog provides news and updates from the DSS Division. We welcome comments, feedback and ideas for improving our products and services.

Digitization Activities

We have received files for the remaining volumes of the University of Maryland Schedule of Classes that were digitized from microfilm; quality assurance will be completed over the next month and they will be uploaded to the Internet Archive. Eric Cartier uploaded 24 volumes of the AFL-CIO News (see photo below for an example with interesting metadata) and 29 volumes of the University of Maryland Schedule of Classes to the Internet Archive. Both titles are held in Special Collections and University Archives (SCUA); they were digitized from print and received back last month. These digitization projects were funded through the DIC proposal process.

Jerry Lewis presenting a plaque to AFL-CIO president George Meany in 1967

Elizabeth Caringola submitted the sample batch of digitized microfilm for the 2014-2016 NDNP grant. After this sample is approved, she will start production batches of around 10,000 pages. Babak and Liz also submitted the first grant report for the 2014-2016 cycle to NEH and the Library of Congress detailing the progress with the project.

Liz has also been working with her students to promote interesting digital images from digitized Maryland newspaper pages available on Chronicling America by starting a Pinterest board.

Robin worked with Joanne Archer, Anne Turkos, and other SCUA staff to ship 3,446 photographs from the Diamondback newspaper photo morgue to a digitization vendor. This shipment is the first half of the first phase of the two-year project to digitize nearly 18,000 photographs. The project is funded through the DIC proposal process.

Digital Programs and Initiatives

Alice Prael has begun work on updating the current Best Practices for Digital Collections. The new Best Practices will improve the organization and functionality by moving from a standard document to a wiki platform and will be updated to include our newest projects, initiatives, and processes.

Early in the month, Josh Westgard attended the DuraSpace summit in Washington, DC, where discussion focused on DuraSpace’s three main products, Fedora, DSpace, and VIVO, all of which are of interest to, or currently in use by, the Libraries. He also participated in the community-driven Fedora 4 development process, including helping to draft the requirements for an audit service, and attending, along with colleagues from SSDR and Metadata Services, the DC Area Fedora Users Group meeting at the National Agricultural Library.

Software Development

Development of the new online student application submission form and supervisor database has continued. We have hit a technical snag in our new Wufoo form, caused by a limit of 100 fields per form and the way that “fields” are counted, so we will need to create a workaround. Implementation has begun on the supervisor database and workflow in the Staff Intranet, Libi, which is built in Drupal.

Working with the Library Web Advisory Committee, we have established high-level objectives and major milestones for the Responsive Web Design (RWD) project for the Libraries’ Website. The timeline calls for planning during the Spring, implementation over the Summer, final testing and content updates in the Fall, and release in January 2016. We have selected Bootstrap as the RWD framework and Unify as our starting template, based in part on our successful use of both tools in the Beyond the Battle: Bladensburg Rediscovered special collections exhibit. The next step, creating wireframes for key page layouts, is in progress.

Hippo CMS received improvements to its Solr Database feature, currently used only by the SCPA Scores Database, laying the groundwork for several new databases, such as SCPA Recording, Maryland Digitized Newspapers, and Plant Patents. In general, databases are chosen for dissemination through this feature when they have simple metadata and little to no content requirements. This is a lighter-weight alternative to full ingest into Digital Collections.

We are finalizing preparations for bringing online the new Fedora Commons Repository version 4. This soft release will target minimal services only, with no data migrated from the existing Fedora 2. By bringing the service up in production well before the full release, we will be able to incrementally test and add new procedures. This will increase reliability and confidence in the service when it comes time to bear the full weight of our digital collections.

User and System Support

In late February, the John and Stella Graves MakerSpace was asked to assist with making a few 3D printed items for an exhibit at the Shady Grove (Priddy) Library in March. Eileen Harrigton requested 3D printed models of human and hominid skulls as part of an interactive exhibit on evolution. Because the models were printed from actual scans of the fossils, attendees were able to pick them up and get a better and closer look at the skulls.

Interestingly, archeology and 3D printing/scanning have some things in common. Both require careful planning when removing unwanted material from an item. With a 3D printed item, supports are sometimes printed and need to be removed after printing is finished, much like the removal of debris and dirt around fossils.

Preston removing supports and rough edges on the 3D printed skull

3D scanning is also used at archeological dig sites. It is used to quickly record accurate positional details and measurements before an item is removed, and to make full 3D scans after the item is removed from the ground.

A technician 3D scanning a human skeleton using a handheld 3D scanner
The actual 3D scan of the skeleton above

After the scan is complete, it can be imported into a modeling program like Autodesk Design to clean up the scan and make it ready for 3D printing. After the initial cleanup, the file can be exported to a .stl file (stereolithography) and printed.

A 3D scanned Homo erectus skull being processed in Autodesk Design

The files that were requested came from http://africanfossils.org/, a website that has many 3D scanned fossils. The models took approximately 20 hours in total to print and one hour to do finishing details like support removal.

The finished 3D printed skulls for the event. From left: Homo sapiens, Homo habilis, Homo erectus.

USMAI (University System of Maryland and Affiliated Institutions) Library Consortium

DSS has been working on an exciting opportunity with the consortium and a few other Maryland academic libraries to put together a shared institutional repository (IR). DSS presented a proposal to the consortium for a 2-year pilot, which was accepted. The IR will be named Maryland Shared Open Access Repository (MDSoar, for short). The partners of the shared IR will rely on DSS’ 10+ years of experience managing DRUM. Similar to DRUM, MDSoar will use DSpace as its repository platform. DSS staff are currently working with the IR partners to configure the IR with an anticipated launch date of June 15, 2015.

The CLAS team continues its work on the Kuali OLE initiative, participating in weekly meetings with other OLE implementation partners from around the world and developing a sandbox environment in support of College Park’s and USMAI’s testing and evaluation of OLE. In April, the team will welcome six consortium volunteers to test and evaluate OLE for its potential as the next ILS for the consortium.

Members of the team also attended the USMAI Next Gen ILS Working Group meeting on March 11th to discuss OLE, Aleph, and the next steps for moving to a new ILS over the course of the next several years.

The CLAS team responded to 101 Aleph Rx submissions and 32 e-resource requests. Additionally, members of the team have worked with campuses on such initiatives as implementing single sign-on at Salisbury, enhancing workflows for reporting library fines and fees to the Bursar’s Office at University of Baltimore, and assisting with UMBC’s transition to shelf-ready orders from YBP.

Staffing

Mark Hemhauser’s last day in the office was March 13th. He is heading to University of California at Berkeley to fill a role as Head of Acquisitions. We wish him the best and hope that he’ll send some good weather our way!

Conferences, workshops and professional development

Eric Cartier was interviewed by the hosts of Lost in the Stacks, “the one and only Research Library Rock’n’Roll show” on WREK 91.1 FM at Georgia Tech. The episode, which discusses audio digitization, the WMUC radio station and digitization project, and personal digital archiving, aired on April 3.

Robin Pike co-proposed a pre-conference workshop called “Managing Audiovisual Digitization Projects” with consultant Joshua Ranger from AV Preserve and vendor George Blood from George Blood Audio, Video, and Film to the Society of American Archivists. She received confirmation that the workshop will be held on Monday, August 17, 2015 in Cleveland, OH as part of the annual conference pre-conference program.

Graduate Assistants Alice Prael (Digital Programs and Initiatives) and Amy Wickner (SCUA) found out they will be presenting their student poster “Getting to Know FRED: Introducing Workflows for Born Digital Content” at the Society of American Archivists annual conference in August.

Liz Caringola recently achieved certification as a Digital Archives Specialist, a program administered by the Society of American Archivists. Over the past two years, Liz has taken a variety of workshops and webinars on different aspects of digital archives, and she sat for the cumulative exam on February 24 in College Park.

Peter Eichman, Bria Parker, Ben Wallberg, and Joshua Westgard attended the Washington D.C. Fedora User Group Meeting on March 31 and presented to the group on the status of our Fedora 4 implementation.

Eric Cartier and Liz Caringola attended the Spring 2015 MARAC/NEA Joint Meeting in Boston from March 19-21.

David Dahl attended the ACRL 2015 Conference in Portland, OR from March 25-28. He presented as part of a panel entitled “A Tree in the Forest: Using Tried-and-True Assessment Methods from Other Industries”.

 

Historic Maryland Newspapers Project receives funding for Phase 2

It’s our pleasure to announce that the Historic Maryland Newspapers Project at the University of Maryland Libraries has received funding for Phase 2 and will continue through August 2016 thanks to a generous $290,000 National Digital Newspaper Program (NDNP) grant from the National Endowment for the Humanities.

The Historic Maryland Newspapers Project was first awarded an NDNP grant in 2012 to digitize 100,000 pages of newsprint published between 1836 and 1922. To date, approximately 107,375 pages of Maryland newspapers have been digitized and nearly 86,000 are available on the Library of Congress database Chronicling America. The bulk of these pages is from the prominent German-language Baltimore paper Der Deutsche Correspondent. The time frame of the digitized Correspondent spans 1858 to 1913. The following titles were also digitized during Phase 1 of the project:

Baltimore

  • The American Republican and Baltimore Daily Clipper, 1844-1846
  • The Baltimore Commercial Journal, and Lyford’s Price-Current, 1847-1849
  • Baltimore Daily Commercial, 1865-1866
  • The Daily Exchange, 1858-1861
  • The Pilot and Transcript, 1840-1841

Western Maryland

  • Civilian and Telegraph (Cumberland), 1859-1865
  • The Maryland Free Press (Hagerstown), 1862-1868

During Phase 2, we will complete digitization of Der Deutsche Correspondent (1914-1918) and will digitize a variety of English papers that reflect the regional diversity of Maryland. We look forward to collaborating with our colleagues at the Maryland State Archives during the second phase of the project.

See the press release from NEH: http://www.neh.gov/news/press-release/2014-07-21.

Now Hiring: Wikipedian-in-Residence

The Historic Maryland Newspapers Project is hiring a Wikipedian-in-Residence for the summer months. Our overall goal in bringing a seasoned Wikipedian on board is to improve the quality of Wikipedia articles by increasing the number of relevant citations and links to the rich newspaper content of Chronicling America.

This position will be a little different from the typical Wikipedian-in-Residence gig. Most Wikipedians are brought into an organization in order to teach the staff how to edit Wikipedia, to edit and upload content to Wikipedia or Wikimedia, or to hold edit-a-thons–at least this is what I’ve gleaned while perusing other Wikipedian job listings. Our Wikipedian may do a little of this, but their work will mostly be research-based and will result in a written report of recommendations for our project and other National Digital Newspaper Program (NDNP) awardees to implement.

First, our Wikipedian will complete an analysis of how Chronicling America is currently being represented in Wikipedia. Linkypedia is one tool that could be used during the analysis. It will be important that our Wikipedian can utilize this and other tools–perhaps even tweak these tools–in order to gather relevant statistics.

The next step will be analyzing these statistics. This step is crucial because the conclusions drawn will guide the Wikipedian’s most significant responsibility–to explore different scenarios, tools, or methods for how we might effectively increase Chronicling America’s presence on Wikipedia. For example, the solutions may be as simple and low tech as authoring a comprehensive guide for NDNP awardees to start editing Wikipedia; or they could require developers to add some code to the open source application behind Chronicling America in order to automatically generate the wiki markup needed to cite a newspaper page in Wikipedia. (The National Library of Australia has built this functionality into their digital repository, Trove.)

Screencap of a newspaper page from Trove, showing the site's ability to generate wiki markup to cite the newspaper page.
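To make the wiki-markup idea described above a little more concrete, here is a minimal Python sketch that assembles a `{{cite news}}` footnote from a page's metadata. It is not the code behind Trove or Chronicling America; the function name, its arguments, and the example values are hypothetical, and the template parameters used (`title`, `newspaper`, `location`, `date`, `page`, `url`, `via`) should be checked against Wikipedia's template documentation.

```python
def _escape(value):
    """Escape pipe characters, which would otherwise end a template field."""
    return str(value).replace("|", "{{!}}")


def cite_news_markup(title, newspaper, date, url, page=None, location=None,
                     via="Chronicling America"):
    """Return a <ref> footnote containing a {{cite news}} template.

    All arguments are illustrative; optional fields are skipped when None.
    """
    fields = [
        ("title", title),
        ("newspaper", newspaper),
        ("location", location),
        ("date", date),
        ("page", page),
        ("url", url),
        ("via", via),
    ]
    parts = [
        name + "=" + _escape(value)
        for name, value in fields
        if value is not None
    ]
    return "<ref>{{cite news |" + " |".join(parts) + "}}</ref>"
```

Calling the function with a headline, a newspaper title, a date, and the page's URL returns a string that an editor could paste directly into a Wikipedia article, which is the kind of convenience a built-in "cite this page" feature would offer.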

The Wikipedian will also have to investigate the cost and resources needed to realize their proposed solutions. The Wikipedian will prioritize and make recommendations for which tools should be implemented in upcoming months based on their feasibility and estimated effectiveness.

In order to accomplish all this in four short months, the Wikipedian will have to have experience conducting research and analyzing data; knowledge of existing tools and APIs for Wikipedia; and a firm understanding of the written–and more importantly, the unwritten–rules of editing Wikipedia. This is a part-time, paid position and cannot be performed remotely.

To view the complete job posting and apply, see https://ejobs.umd.edu/postings/25127. We hope to hear from you soon!