Routine Encoding Procedures

From Digital Scholarship Group
This page documents routine encoding procedures. If there is a procedure that you would like documented, simply add it to the list below and we'll write up a brief set of instructions.

Acquiring a new text to encode

When you are ready to begin encoding a new text, visit the Text Tracking Trello Board (you may need to log in to see the board). There you will find a list of texts at all stages of transcription, proofreading, and review. Texts that are available for first capture will be listed in the first column, titled "Unencoded Texts," and will have a green label.


To claim a given text, add yourself as a "member" of the card, change the label to blue, and move the card to the appropriate column (in most cases, the "Capture" column). Now you can begin your work!

Document analysis

Before you start encoding the text, skim through it to familiarize yourself with its overall structure and contents. Then, on a piece of scratch paper or on the computer, do the following:

  • Sketch the overall structure of the text in outline form, including any front and back matter.
  • In the front matter section, list all of the main sections (title page, table of contents, errata, prefatory sections, etc.)
  • In the body section, list all of the main divisions and think about what type of division each would be (chapters? poems? letters? See the WWP internal documentation for a list of options). Do they contain sub-divisions?
  • In the back matter section (if present), list all of the main sections (advertisements? index? colophon?)

If this is the first text you've encoded, go over your outline with Julia before you start encoding the text.

Starting the encoding process

To begin encoding your text, find the "tadpole" file for the text in the under_construction directory on the server. (A "tadpole" is like a stub for an encoded text: it has a TEI header but not much body.) In the Subversion client, "update" the tadpole file. Then open the tadpole in Oxygen, and begin your transcription and encoding of the text. A good approach is often to:

  • start by encoding the overall structure of the text, using <front>, <body>, <back>, and <div>, without actually transcribing the words
  • insert the page break and milestone information (which is easier at this stage)
  • then start from the beginning of the main text, transcribing the content
  • leave the table of contents, title page, and other exceptional stuff for last

Be sure to validate your text regularly and save often.

When you complete your work each day:

  • enter a <change> element in the <revisionDesc> for your file, with a brief description of what you did. This enables us to transfer the text to someone else if you stop working on it. 
  • in the Subversion client, "commit" your file, remembering to add your encoder key and a brief description of changes made. Committing your changes makes them official and will also avoid conflicts with changes made by other people.

Finishing the transcription and encoding process

Once you have finished encoding your text, you should run through a series of checks designed to catch typographical errors, misnumbered pages, and a variety of other problems that can creep into the transcription process. Following this checklist should help you produce a clean, relatively error-free text for proofreading.

  1. Add genre information to the <catRef> element in the <teiHeader>. To do this, remove the id="G-NONE.fix-me" attribute and change the value of the target attribute from G-NONE.fix-me to the appropriate value listed in the file ~/tb/distribution/taxonomy.xml. If your text falls into more than one generic category, you may use multiple <catRef> elements, with type="other" to indicate these additional genres.
  2. Validate your file (Command-shift-v in Oxygen, C-c C-v in emacs). Fix all remaining validation errors before proceeding.
  3. Remove comments that are in your file, with the exception of the comment at the very beginning of the document (the one that begins <!-- $Header ...). This comment should NOT be removed. All other comments, including those that are part of the <teiHeader> that comes in your tadpole file, should be removed.
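
Steps 1 and 3 above can be spot-checked mechanically before you commit. The following is a minimal Python sketch, not an existing WWP tool: the function names are hypothetical, and it assumes the placeholder is spelled exactly G-NONE.fix-me and that the comment to keep begins with $Header. It uses plain regular expressions rather than an XML parser so that it can also be run on a file that does not yet validate.

```python
import re

# Matches any XML comment, including comments that span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def leftover_placeholders(text):
    """Return the 1-based line numbers that still mention G-NONE.fix-me."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if "G-NONE.fix-me" in line
    ]


def leftover_comments(text):
    """Return every comment whose content does not begin with $Header.

    Assumption: any comment starting with $Header is the one to keep;
    this sketch does not check that it is the first thing in the file.
    """
    found = []
    for match in COMMENT_RE.finditer(text):
        content = match.group(0)[4:-3].strip()  # drop "<!--" and "-->"
        if not content.startswith("$Header"):
            found.append(match.group(0))
    return found
```

Calling both functions on the full text of your file and getting two empty lists back suggests that steps 1 and 3 are done. This does not replace validation in step 2.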

Some additional error-checking tools exist for emacs but are not used in Oxygen; see the list if you need to refer to them. Once you have completed each of these tasks, you are finished with the transcription and encoding process for your text. Create a change log entry in your text that indicates you have completed validation, general cleanup, and supravalidation of your file. Then check your file back in. You're all done!

Important: Be sure to indicate that you have completed capture for your text on the Trello Text Tracking Board, by simply dragging and dropping your text's card into the next column. Trello will automatically record the date and time of this move. Finally, change your text's Trello card label to green, which indicates that the text is now free for other encoders to pick up. This is very important, because until you indicate that you have finished, it is assumed that you are still working on your text.

Proofreading

All encoded texts are proofread multiple times before they are published. Generally, we perform two proofreading rounds, though in some cases three or more are required to ensure a given text has been accurately encoded and transcribed. Proofreading requires meticulous attention to detail; when proofreading, your priority should be accuracy rather than speed.

To locate a text to proofread, look for cards labeled green in the "1st Proof" or "2nd Proof" columns in the Text Tracking Trello Board. To claim a text, simply change the label color to blue and add a short note on the card (e.g., "Edie Guarez began proofing."). Trello will automatically record the date and time of each note you create.

Important: Except in extremely rare circumstances, you should never proofread a text that you were responsible for encoding! If the only texts available for proofreading are texts that you encoded, check with a WWP staff member before claiming one of them.

A complete printout of every text ready for proofreading can be found in the basket on top of the filing cabinet in the encoding room (to your right as you enter the room). The printout will have a cover sheet that indicates the status of the file (e.g. first or second proofing), the author's name, the title, and the OT number. Before beginning to proofread, be sure to fill in the "Proofreader" field on the cover sheet with your name.

When noting errors, try to use standard proofreading marks. A list of common proofreading marks and symbols can be found in the attachments section of this page.

First proofing round

The first proofreading stage should cast a wide net, looking for errors in both transcription and encoding throughout the document, including the <teiHeader> and all encoded content. This means that you should be looking not just for typographical errors in the transcription, but also for more systematic errors and omissions in the way the document has been encoded: in other words, everything from renditional information to structural markup.

Once you have claimed a text to proofread, you should retrieve the printout and the OT that was used to encode it. You will use the OT throughout the proofreading process as the master text against which to compare the printout.

When you encounter an error, you should mark the printout in red pen at the point where the error occurs. You should also indicate what correction you believe should be made. This applies to both transcription and encoding errors.

For example, consider the following line as it appears in the original OT compared to the printout:

WHO has seen and has not admired our beautiful Savan­
<p><hi rend="case(smallcaps)">Who</hi> has seen and has not admired our beautyful <placeName>Savan­

In this instance, the word "beautiful" has been incorrectly transcribed as "beautyful." The error should be marked and corrected in the printout.

Similarly, encoding errors need to be marked and corrected as well. Consider the case of a <head> element that is center-aligned and rendered in capital letters in the OT, but that has been encoded as follows:

<head>The British Partizan,<lb/>A tale of the times of old</head>

Clearly this encoding does not capture the proper rendition for the <head> element. The first thing to do in a case like this is to check whether a renditional default has been set in the <tagsDecl> within the <teiHeader>. If there is no renditional default specifying that all headings should be center-aligned and capitalized, you should mark the printout and indicate the proper rendition.
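
If you have the electronic file at hand as well as the printout, the <tagsDecl> lookup described above can be sketched in Python. This is an illustration only, not an existing WWP tool: the function names are hypothetical, and it assumes that defaults are recorded as <tagUsage> elements carrying a gi attribute, which is standard TEI but may not match how a given WWP file records them. A file that relies on DTD-defined entities will not parse this way.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix so the sketch works with or without one."""
    return tag.rsplit("}", 1)[-1]


def tag_usage_for(xml_text, gi):
    """Return the attributes of each <tagUsage> for element `gi` in <tagsDecl>.

    An empty list means no entry was found for that element, so any
    rendition has to be given on the element itself.
    """
    root = ET.fromstring(xml_text)
    results = []
    for decl in root.iter():
        if local_name(decl.tag) != "tagsDecl":
            continue
        for usage in decl.iter():
            if local_name(usage.tag) == "tagUsage" and usage.get("gi") == gi:
                results.append(dict(usage.attrib))
    return results
```

For the example above you would call tag_usage_for(text, "head") and inspect whatever attributes come back. Interpreting those attributes still requires the WWP internal documentation.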

For the sake of consistency in proofreading and corrections entry, you should use standard proofreading marks when they are appropriate. The list of proofreading marks you can use can be found in the attachments section of this page.

For errors that appear consistently throughout an encoded text (e.g. personal names that have been encoded only with <name>, the frequent use of <emph> where <mcr> would be more appropriate, etc.) you should make a note on the proofreading cover sheet indicating that this is a global problem. You should continue to mark the error wherever you find it in the printout, but you need not add a correction every time, since whoever enters corrections later will see from the cover sheet that she/he will need to fix these particular errors wherever they occur.
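
When you suspect a global problem of this kind, a rough tally of element names in the electronic file can help confirm it before you write the cover-sheet note. The following Python sketch is an illustration under stated assumptions, not part of the WWP workflow: count_elements is a hypothetical helper, and it counts start tags with a regular expression, so tags that appear inside comments are counted too.

```python
import re
from collections import Counter

# Captures the element name of each start tag or empty-element tag.
# Closing tags, comments, and processing instructions are not matched
# because "<" must be followed immediately by a letter.
START_TAG_RE = re.compile(r"<([A-Za-z][\w.:-]*)")


def count_elements(text, names):
    """Return how many times each element in `names` is opened in `text`."""
    counts = Counter(START_TAG_RE.findall(text))
    return {name: counts[name] for name in names}
```

A file with many <name> elements and no <persName> at all, for instance, would be worth a note on the cover sheet. The counts only point you to places to look; whether each use is actually wrong remains a proofreading judgment.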

In addition to looking for errors in the encoding on the page, you should also pay attention to what isn't on the page: the things that might not have been encoded at all but should have been (catch words, page numbers, or signature marks that were omitted; names that were not encoded using <name>, <persName>, or <placeName>; renditional information that was not captured, etc.).

Once you have finished proofreading the entire transcribed text, indicate that you have completed proofreading on the cover sheet (be sure to include the date!). Enter any final comments or notes in the "Notes" field on the cover sheet, then enter your initials and the date in the appropriate field on the Text Tracking page. Place the printout of the file in the basket marked "Corrections Entry" on the top of the file cabinet, and return the OT to its proper location.

Second proofing round

Like the first proofing round, the second round of proofreading aims to catch all errors in transcription and encoding, with the added burden that this is, in many cases, the last full round of proofreading that will take place. For that reason it is especially important to be as careful and thorough as possible. The accuracy of our published texts depends on the accuracy of this second proofreading round!

In recent years, we have experimented with "tagless" printing for the second proofreading round: that is, printing encoded files so that they look as they will appear on the Web following publication, without visible XML tags. This sometimes makes it easier to spot typographical errors and problems with the encoding of rendition. At the same time, it means that there is no way to check that the basic structural encoding of a file is correct.

For the time being, this means that you may encounter both "tagless" and "tagged" second proofing files, and that you should be prepared to proofread either kind.

The basic procedure for second proofreading is the same as that for the first proofreading round. In general, texts that enter the second proofing round should be fairly clean, with minimal errors. If you encounter a second proofing text that seems to have extensive errors, it's probably a good idea to check with John to see if there is something special about that text.

Corrections entry (first and second rounds)

Corrections entry (a.k.a. "correx entry" or just "correx") is the process of fixing the errors uncovered during each proofreading round. There is one round of corrections entry immediately following every round of proofreading.

Important: Except in extremely rare circumstances, you should never enter corrections for a text that you were responsible for proofreading! If the only texts available for corrections entry are texts that you proofread, check with a WWP staff member before claiming one of them.

To claim a text to enter corrections, simply add your initials and the date to the appropriate field, marked with a green star, in the table on the Text Tracking page. The printout from which you will be entering corrections can be found in the filing cabinet closest to the printer, in the second drawer from the bottom. Corrections/proofing printouts are typically in the right section of the drawer.

Once you have claimed your text, open the corresponding file. Enter whatever corrections are marked on the proofreading printout or proofreading cover page, paying particular attention to any notes or comments that the proofreader may have left. Once you have entered all of the corrections, validate and save your file, enter a change-log comment, and commit your changes. Then enter your name and the date in the appropriate column of the Text Tracking page.

When you have finished entering all corrections, return the OT to its normal location and then file the proofreading printout in the bottom drawer of the right-hand file cabinet (closest to the printer).

Renovation

Renovation is the process for updating files that were originally encoded using markup that is no longer valid in the present era of P4/P5 TEI. Renovation typically involves examining a file for obsolete or deprecated encoding, changing this encoding in keeping with our current practices, and then validating and supravalidating the file to catch additional errors.

As of September 2008, we are no longer actively renovating texts.