WUSTLProposal

From Digital Scholarship Group
Revision as of 23:37, 18 May 2014 by Stomaykopeters (talk | contribs) (→‎Background and expertise)

Participants

  • Andrew Rouner, Director of the Digital Library
  • Tim Lepczyk, Metadata Librarian
  • Shannon Showers, Digital Access Librarian
  • Perry Trolard, Assistant Director, Humanities Digital Workshop
  • Steve Pentecost, Programmer Analyst, Humanities Digital Workshop

Project description

The Washington University Libraries and the School of Arts & Sciences' Humanities Digital Workshop are the primary collaborators on an IMLS Advancing Digital Resources grant application that is currently under review. We believe the training offered in this workshop would fit closely with one major emphasis (the "third layer" below) of the proposed project. While the project is contingent on IMLS funding, we plan to move forward with a scaled-down version if external funding is not available, so the training from this workshop would be relevant regardless of the outcome of the grant application.

The goal of the IMLS project is to remediate and significantly expand, both in content and functionality, the St. Louis Circuit Court Historical Records collection (which currently consists of page images and metadata). The majority of this collection is made up of the little-known but legally and historically significant "freedom suits" brought by enslaved persons in the St. Louis Circuit Court, the most famous of which were brought by Dred and Harriet Scott. The project has several "layers" of deliverables, the most fundamental of which is the transcription and encoding in TEI XML of all the documents, along with an expansion of the collection through the addition of related legal materials and contextual data (e.g., city directories).

The idea for the project arose largely from the experience of the DLS in attempting to move one part of the St. Louis Circuit Court Historical Records project (the Dred Scott collection, which had already been transcribed) from HTML to TEI XML. As part of this remediation, we wanted to encode the documents with markup that would reflect the structure and function of the legal documents, but found no suitable standard. Although an ad hoc solution was developed, we became convinced we had found a significant gap in markup standards. The second layer of the IMLS proposal is therefore the development of extensions to the TEI for the encoding of legal documents, which will then be used to encode the legal documents in the collection.

The third layer of the project is the one most directly related to the workshop on contextual information. While it would be valuable to researchers to identify, for instance, persons in their legal roles (e.g., as litigants, judges, or witnesses) in this collection via the legal encoding, we believe it would be even more valuable to provide a means of representing relationships between the named entities of persons, places and organizations. We plan to use named entity recognition (NER) software to identify these named entities in the majority of the resources as a basis for additional encoding. This data will then be exported into RDF/OWL files, in which relationships between all persons in the records can be represented. Our understanding is that the TEI P5 revision of elements such as <persName> was informed by possible semantic applications. Our hope is that the workshop on contextual information will address this and related issues in ways that would help us to assess and implement this aspect of the project.
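As a rough illustration of the kind of pipeline described above, the following sketch reads <persName> elements from a small TEI fragment and writes RDF statements in N-Triples form. It is not the project's implementation: the sample document, the placeholder names, the http://example.org/ base URI, and the coOccursWith property are all assumptions made for the example, and only the Python standard library is used. In practice the <persName> markup would come from NER output reviewed by an encoder, and the relationships would be drawn from a proper ontology rather than simple co-occurrence.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical base URI and relationship property, chosen only for this sketch.
BASE = "http://example.org/"
CO_OCCURS = BASE + "vocab/coOccursWith"

# Well-known vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Invented sample document with placeholder names (not from the collection).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName ref="#plaintiff1" role="plaintiff">Jane Doe</persName> brings suit against
       <persName ref="#defendant1" role="defendant">John Roe</persName>.</p>
  </body></text>
</TEI>"""


def extract_persons(tei_xml):
    """Return {local id: display name} for each persName with a local @ref."""
    root = ET.fromstring(tei_xml)
    persons = {}
    for el in root.iterfind(".//tei:persName", TEI_NS):
        ref = el.get("ref", "")
        if not ref.startswith("#"):
            continue  # skip names that do not point at a local identifier
        # Collapse internal whitespace in the element's text content.
        name = " ".join("".join(el.itertext()).split())
        persons[ref[1:]] = name
    return persons


def to_ntriples(persons):
    """Serialize persons, plus pairwise co-occurrence links, as N-Triples lines."""
    lines = []
    for pid, name in sorted(persons.items()):
        subj = f"<{BASE}person/{pid}>"
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{subj} <{RDF_TYPE}> <{FOAF_PERSON}> .")
        lines.append(f'{subj} <{FOAF_NAME}> "{escaped}" .')
    # One statement per unordered pair of persons found in the same document.
    for a, b in combinations(sorted(persons), 2):
        lines.append(f"<{BASE}person/{a}> <{CO_OCCURS}> <{BASE}person/{b}> .")
    return lines


if __name__ == "__main__":
    for line in to_ntriples(extract_persons(SAMPLE_TEI)):
        print(line)
```

Keying each person on the @ref value rather than on the name string means that variant spellings of a name in different documents can resolve to the same RDF resource, provided the encoding points them at the same identifier.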

Background and expertise

All applicants have multiple years (ranging from 2 to 9) of experience working with SGML and XML, and specifically with the TEI. Two of the applicants have worked with TEI on previous IMLS-funded projects; one, Perry Trolard, worked on the grant-funded project to develop the TEI "Tite" customization. Previous institutional affiliations for team members in relation to TEI work include the University of Virginia Library (Electronic Text Center), the University of Richmond Library (in collaboration with the Perseus Project) and the University of Tennessee Library. The Humanities Digital Workshop is currently engaged in several TEI-based projects, including an NEH-funded project to create a digital archive of the works of Edmund Spenser, and the Race and Children's Literature in the Gilded Age project. DLS is similarly engaged in several projects, many of which (including the Unreal City, the Eyes on the Prize transcripts, and the previously mentioned Revised Dred Scott project) are available on the Washington University Digital Gateway website (http://digital.wustl.edu/). Other projects currently in development include the Red Brush project, which is an archive of the original Chinese (encoded in TEI) of the translations found in the book Red Brush: Women Writers of Imperial China (Harvard Asia Center, 2004) by Wilt L. Idema and Beata Grant (http://digital.wustl.edu/r/red/index_eng.html); the Eyes on the Prize Transcripts II project; and an archive of Chinese legal texts treated by Professor Robert Hegel in his recent book, True Crimes.