Difference between revisions of "Routine Encoding Procedures"

From Digital Scholarship Group

Revision as of 09:53, 27 June 2014

This page documents routine encoding procedures. If there is a procedure that you would like documented, simply add it to the list below and we'll write up a brief set of instructions.

Acquiring a new text to encode

When you are ready to begin encoding a new text, visit the Text Tracking Trello Board (you may need to log in to see the board). There you will find a list of texts at all stages of transcription, proofreading, and review. Texts that are available for first capture will be listed in the first column, titled "Unencoded Texts," and will have a green label.


To claim a given text, add yourself as a "member" of the card, change the label to blue, and move the card to the appropriate column (in most cases, the "Capture" column). Now you can begin your work!

Document analysis

Before you start encoding the text, skim through it to familiarize yourself with its overall structure and contents. Then, on a piece of scratch paper or on the computer, do the following:

  • Sketch the overall structure of the text in outline form, including any front and back matter.
  • In the front matter section, list all of the main sections (title page, table of contents, errata, prefatory sections, etc.)
  • In the body section, list all of the main divisions and think about what type of division they would be (chapters? poems? letters? look in the WWP internal documentation for a list of options). Do they contain sub-divisions?
  • In the back matter section (if present), list all of the main sections (advertisements? index? colophon?)

If this is the first text you've encoded, go over your outline with Julia before you start encoding the text.

Starting the encoding process

To begin encoding your text, find the "tadpole" file for the text in the under_construction directory on the server. (A "tadpole" is like a stub for an encoded text: it has a TEI header but not much body.) In the Subversion client, "update" the tadpole file. Then open the tadpole in Oxygen, and begin your transcription and encoding of the text. A good approach is often to:

  • start by encoding the overall structure of the text, using <front>, <body>, <back>, and <div>, without actually transcribing the words
  • insert the page break and milestone information (which is easier at this stage)
  • then start from the beginning of the main text, transcribing the content
  • leave the table of contents, title page, and other exceptional stuff for last
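As a rough illustration of the first two steps, a structure-only skeleton might look something like the sketch below. The type values and page numbers are placeholders chosen for illustration, not WWP conventions; consult the WWP internal documentation for the actual division types and page-break attributes.

```xml
<text>
  <front>
    <!-- hypothetical division types and page numbers throughout -->
    <div type="preface"><pb n="3"/></div>
  </front>
  <body>
    <div type="chapter"><pb n="5"/></div>
    <div type="chapter"><pb n="21"/></div>
  </body>
  <back>
    <div type="advertisement"><pb n="40"/></div>
  </back>
</text>
```

With the skeleton in place, transcription becomes a matter of filling in each division in turn.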

Be sure to validate your text regularly and save often.

When you complete your work each day:

  • enter a <change> element in the <revisionDesc> for your file, with a brief description of what you did. This enables us to transfer the text to someone else if you stop working on it. 
  • in the Subversion client, "commit" your file, remembering to add your encoder key and a brief description of changes made. Committing your changes makes them official and will also avoid conflicts with changes made by other people.
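For illustration, an end-of-day change entry might look roughly like this. The attribute names follow generic TEI P5 usage, and the date, encoder key, and wording are invented. The existing entries in your tadpole's <revisionDesc> show the form the project actually expects, so copy that form rather than this one.

```xml
<revisionDesc>
  <!-- hypothetical entry: the date, who value, and description are placeholders -->
  <change when="2014-06-27" who="#encoder-key">Transcribed pp. 1-12; added page breaks for chapters 1-3.</change>
</revisionDesc>
```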

Finishing the transcription and encoding process

Once you have finished encoding your text, you should run through a series of checks designed to catch typographical errors, misnumbered pages, and a variety of other problems that can creep into the transcription process. Following this checklist should help you produce a clean, relatively error-free text for proofreading.

  1. Add genre information to the <catRef> element in the <teiHeader>. To do this, remove the id="G-NONE.fix-me" and change the target= attribute from G-NONE.fix-me to the appropriate value listed in the file ~/tb/distribution/taxonomy.xml. If your text falls into more than one generic category, you may use multiple <catRef> elements, with type="other" to indicate these additional genres.
  2. Validate your file (Command-shift-v in Oxygen, C-c C-v in emacs). Fix all remaining validation errors before proceeding.
  3. Remove comments that are in your file, with the exception of the comment at the very beginning of the document (the one that begins <!-- $Header ...). This comment should NOT be removed. All other comments, including those that are part of the <teiHeader> that comes in your tadpole file, should be removed.
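
Step 1 might be sketched as follows. The id and target values in the "before" line come from the instructions above; the replacement targets are hypothetical stand-ins for values you would look up in taxonomy.xml, and your tadpole's <catRef> may carry other attributes that should be left alone.

```xml
<!-- before: placeholder as supplied in the tadpole -->
<catRef id="G-NONE.fix-me" target="G-NONE.fix-me"/>

<!-- after: id removed, target set (both target values here are hypothetical) -->
<catRef target="G.primary-genre"/>
<catRef type="other" target="G.additional-genre"/>
```

The explanatory comments in this sketch are for the reader only; per step 3, they would not remain in a finished file.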

Some additional error-checking tools exist for emacs that are not used in Oxygen; see the list if you need to refer to these. Once you have completed each of these tasks, you are finished with the transcription and encoding process for your text. Create a change log entry in your text that indicates you have completed validation, general cleanup, and supravalidation of your file. Then check your file back in. You're all done!

Important: Be sure to indicate that you have completed capture for your text on the Trello Text Tracking Board, by simply dragging and dropping your text's card into the next column. Trello will automatically record the date and time of this move. Finally, change your text's Trello card label to green, which indicates that the text is now free for other encoders to pick up. This is very important, because until you indicate that you have finished, it is assumed that you are still working on your text.

Proofreading

All encoded texts are proofread multiple times before they are published. Generally, we perform two proofreading rounds, though in some cases three or more are required to ensure a given text has been accurately encoded and transcribed. Proofreading requires meticulous attention to detail; when proofreading, your priority should be accuracy rather than speed.

To locate a text to proofread, look for cards labeled green in the "1st Proof" or "2nd Proof" columns in the Text Tracking Trello Board. To claim a text, simply change the label color to blue and add a short note on the card (e.g. Edie Guarez began proofing). Trello will automatically record the date and time of each note you create.

Important: Except in extremely rare circumstances, you should never proofread a text that you were responsible for encoding! If the only texts available for proofreading are texts that you encoded, check with a WWP staff member before claiming one of them.

A complete printout of every text ready for proofreading can be found in the top drawer of the short filing cabinet in the WWP main office. The printout will have a cover sheet that indicates the status of the file (e.g. first or second proofing), the author's name, the title, and the OT number. Before beginning to proofread, be sure to fill in the "Proofreader" field on the cover sheet with your name.

When noting errors, try to use standard proofreading marks. A list of common proofreading marks/symbols can be found here.

First proofing round

The first proofreading stage should cast a wide net, looking for errors in both transcription and encoding throughout the document, including the <teiHeader> and all encoded content. This means that you should be looking not just for typographical errors in the transcription, but also for more systematic errors and omissions in the way the document has been encoded: in other words, everything from renditional information to structural markup.

Once you have claimed a text to proofread, you should retrieve the printout and the OT that was used to encode it. You will use the OT throughout the proofreading process as the master text against which to compare the printout.

When you encounter an error, you should mark it using a red pen (available in the WWP main office on top of the short filing cabinet) at the point where it occurs. You should also indicate what correction you believe should be made. This applies to both transcription and encoding errors.

For example, consider the following line as it appears in the original OT compared to the printout:

WHO has seen and has not admired our beautiful Savan­
<p><hi rend="case(smallcaps)">Who</hi> has seen and has not admired our beautyful <placeName>Savan­

In this instance, the word "beautiful" has been incorrectly transcribed as "beautyful." The error should be marked and corrected in the printout.

Similarly, encoding errors need to be marked and corrected as well. Consider the case of a <head> element that is center-aligned and rendered in capital letters in the OT, but that has been encoded as follows:

<head>The British Partizan,<lb/>A tale of the times of old</head>

Clearly this encoding does not capture the proper rendition for the <head> element. The first thing to do in a case like this is to check whether a renditional default has been set in the <tagsDecl> within the <teiHeader>. If there is no renditional default specifying that all headings should be center-aligned and capitalized, you should mark the printout and indicate the proper rendition.
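
If no such default exists, the corrected encoding might look something like the following. The align(center) and case(allcaps) keywords are assumptions modeled on the case(smallcaps) pattern in the transcription example earlier in this section; check the WWP internal documentation for the exact rendition keywords before writing them on a printout.

```xml
<!-- rend keywords are assumed, not confirmed WWP values -->
<head rend="align(center)case(allcaps)">The British Partizan,<lb/>A tale of the times of old</head>
```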

For the sake of consistency in proofreading and corrections entry, you should use standard proofreading marks when they are appropriate.

For errors that appear consistently throughout an encoded text (e.g. personal names that have been encoded only with <name>, the frequent use of <emph> where <mcr> would be more appropriate, etc.) you should make a note on the proofreading cover sheet indicating that this is a global problem. You should continue to mark the error wherever you find it in the printout, but you need not add a correction every time, since whoever enters corrections later will see from the cover sheet that they will need to fix these particular errors wherever they occur.

In addition to looking for errors in the encoding on the page, you should also pay attention to what isn't on the page -- the things that might not have been encoded at all but should have been (catch words, page numbers, or signature marks that were omitted; names that were not encoded using <name>, <persName>, or <placeName>; renditional information that was not captured, etc.).

Once you have finished proofreading the entire transcribed text, indicate that you have completed proofreading on the cover sheet (be sure to include the date!). Enter any final comments or notes in the "Notes" field on the cover sheet. Next, add a note to the text's card on Trello, move the card to the appropriate "Correx" column on the Trello board, and change the card's label to green to indicate the text is now available for other encoders. Place the printout of the file in the short filing cabinet in the correct section ("Correx 1" or "Correx 2"), and return the OT to its proper location.

Second proofing round

Like the first proofing round, the second round of proofreading aims to catch all errors in transcription and encoding -- with the added burden that this is, in many cases, the last full round of proofreading that will take place. For that reason it is especially important to be as careful and thorough as possible. The accuracy of our published texts depends on the accuracy of this second proofreading round!

In recent years, we have experimented with "tagless" printing for the second proofreading round -- that is, printing encoded files so that they look like they will appear on the Web following publication, without visible XML tags. This sometimes makes it easier to spot typographical errors and problems with the encoding of rendition. At the same time, it means that there is no way to check that the basic structural encoding of a file is correct.

For the time being, this means that you may encounter both "tagless" and "tagged" second proofing files, and that you should be prepared to proofread either kind.

The basic procedure for second proofreading is the same as that for the first proofreading round. In general, texts that enter the second proofing round should be fairly clean, with minimal errors. If you encounter a second proofing text that seems to have extensive errors, it's probably a good idea to check with a WWP staff member to see if there is something special about that text.

Corrections entry (first and second rounds)

Corrections entry (a.k.a. "correx entry" or just "correx") is the process of fixing the errors uncovered during each proofreading round. There is one round of corrections entry immediately following every round of proofreading.

Important: Except in extremely rare circumstances, you should never enter corrections for a text that you were responsible for proofreading! If the only texts available for corrections entry are texts that you proofread, check with a WWP staff member before claiming one of them.

To find a text whose corrections need entering, look for texts in the "1st Correx" or "2nd Correx" columns in Trello that have a green label. To claim a text, simply add a note to the card (e.g. Billy Johnson began correx). Trello will automatically record the date and time of the note. The printout from which you will be entering corrections can be found in the short filing cabinet in the WWP main office in the appropriately labeled sections.

Once you have claimed your text, open the corresponding file. Enter whatever corrections are marked on the proofreading printout or proofreading cover page, paying particular attention to any notes or comments that the proofreader may have left. Once you have entered all of the corrections, validate and save your file, enter a change-log comment, and commit your changes. Then move the Trello card to the next column and change the label to green to indicate that the text is available for other encoders to pick up.

When you have finished entering all corrections, return the OT to its normal location, and then file the proofreading printout in the same filing cabinet in the section labeled "Completed."

Checking Round (first and second rounds)

Select a text from Trello that is currently in the Checking Round for either the first or second round. Compare the handwritten proofing marks with the electronic version of the text to ensure that they match. Be sure to keep track of how far you have read on the proofing cover sheet. When you are finished with this round, move the text to the next phase on Trello, and place the physical proofread copy in the appropriate place in the filing cabinet.

Renovation

Renovation is the process for updating files that were originally encoded using markup that is no longer valid in the present era of P4/P5 TEI. Renovation typically involves examining a file for obsolete or deprecated encoding, changing this encoding in keeping with our current practices, and then validating and supravalidating the file to catch additional errors.

As of September 2008, we are no longer actively renovating texts.