NewberryProposal

From Digital Scholarship Group

Participants

  • Doug Knox

Project description

I am the director of a new NEH-funded project to create a digital transcription of the Chicago Foreign Language Press Survey, a 1930s selection and compilation of articles translated from ethnic newspapers published in Chicago from the 1860s to the 1930s. The collection consists of (images of) approximately 120,000 sheets of 5x8" paper representing around 50,000 articles. We are working with a vendor who will transcribe the digital images over the summer to produce simple structural TEI that represents the text at the paragraph level and enables us to capture essential article-level metadata. The Newberry project will conduct quality control and essential editing tasks (investigating missing pages, correcting evident metadata errors in the original, etc.). In this phase of the project we have committed only to making the information available through an article database and full-text search, and we do not have funding to create a full index of contextual information to the degree the resource merits.

But this is rich historical material that will be enormously more useful to many researchers once we mark up personal names, organizational names, geographic administrative entities around the world, and street addresses within Chicago. The method for such markup would require a careful balance of human editorial judgment and automated support for bulk editing, review, and correction. These texts may be well served by judicious automated exploration to find areas where contextual markup would be beneficial and cost-effective. For instance, we could begin by finding patterns in mentions of what we already know to be major streets, such as Halsted Street. Contextual markup will bring together articles in ways that the existing metadata would miss, and will also provide points of connection with other information resources, such as historical maps and biographical dictionaries.
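
As a rough illustration of the street-name exploration described above, the following is a minimal sketch only, not a committed method. It assumes the paragraph text has already been extracted from the TEI files as plain strings. The seed list, the pattern, and the function names are hypothetical; only Halsted Street is named in this proposal. The output is a list of candidate mentions for an editor to review, not finished markup.

```python
import re
from collections import Counter

# Hypothetical seed list of known major streets; an editor would extend it.
KNOWN_STREETS = ["Halsted"]

# A known street name, optionally followed by a street-type word.
STREET_PATTERN = re.compile(
    r"\b(?P<name>" + "|".join(re.escape(s) for s in KNOWN_STREETS) + r")"
    r"(?:\s+(?P<kind>Street|St\.|Avenue|Ave\.))?",
    re.IGNORECASE,
)


def find_street_mentions(paragraphs):
    """Return (paragraph_index, matched_text, start, end) for each candidate."""
    mentions = []
    for index, text in enumerate(paragraphs):
        for match in STREET_PATTERN.finditer(text):
            mentions.append((index, match.group(0), match.start(), match.end()))
    return mentions


def count_by_street(paragraphs):
    """Count candidate mentions per street name, ignoring case differences."""
    counts = Counter()
    for text in paragraphs:
        for match in STREET_PATTERN.finditer(text):
            counts[match.group("name").title()] += 1
    return counts
```

Recording character offsets alongside each match is a deliberate choice in this sketch: it would let a later step wrap an approved mention in markup without searching the text again, while a human editor still decides which candidates are genuine street references.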

Background and expertise

I taught myself SGML/XML in the late 1990s in order to deliver the text of the Encyclopedia of Chicago in electronic form for both print and electronic publishing. Though our schema was not TEI, I got a lot out of the Gentle Introduction and became interested in TEI. I became a subscriber for a year, attended the 2002 members meeting in Chicago, and purchased (and read) a paper copy of the P4 guidelines. Although I played with marking up some documents of interest to me, I had not found a way to start a TEI project formally within the Newberry Library until this year, when the Press Survey project received funding. I attended the TEI workshop at Northwestern in January 2009.