Routine Encoding Procedures

From Digital Scholarship Group
This page documents routine encoding procedures. If there is a procedure that you would like documented, simply add it to the Ideas board and we'll write up a brief set of instructions.

Acquiring a new text to encode

When you are ready to begin encoding a new text, visit the Text Tracking Trello Board (you may need to log in to see the board). There you will find a list of texts at all stages of transcription, proofreading, and review. Texts that are available for first capture will be listed in the first column, titled "Unencoded Texts," and will have a green label.


To claim a given text, add yourself as a "member" of the card, change the label to blue, and move the card to the appropriate column (in most cases, the "Capture" column). Now you can begin your work!

Document analysis

Before you start encoding the text, skim through it to familiarize yourself with its overall structure and contents. Then, on a piece of scratch paper or on the computer, do the following:

  • Sketch the overall structure of the text in outline form, including any front and back matter.
  • In the front matter section, list all of the main sections (title page, table of contents, errata, prefatory sections, etc.)
  • In the body section, list all of the main divisions and think about what type of division they would be (chapters? poems? letters? look in the WWP internal documentation for a list of options). Do they contain sub-divisions?
  • In the back matter section (if present), list all of the main sections (advertisements? index? colophon?)

If this is the first text you've encoded, go over your outline with the project manager before you start encoding the text.

Starting the encoding process

To begin encoding your text, find the "tadpole" file for the text in the under_construction directory on the server. (A "tadpole" is like a stub for an encoded text: it has a TEI header but not much body.) In the Subversion client, "update" the tadpole file. Then open the tadpole in Oxygen, and begin your transcription and encoding of the text. A good approach is often to:

  • start by encoding the overall structure of the text, using <front>, <body>, <back>, and <div>, without actually transcribing the words
  • insert the page break and milestone information (which is easier at this stage)
  • then start from the beginning of the main text, transcribing the content
  • leave the table of contents, title page, and other exceptional stuff for last
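As a rough illustration of the first two steps, a structural skeleton might look something like the sketch below. The type values, page numbers, and signature value are placeholders chosen for this example, not WWP conventions, and the fragment is not expected to validate as it stands; check the internal documentation for the values to use in your own text.

```xml
<text>
  <front>
    <!-- title page, table of contents, and prefatory sections: transcribe these last -->
    <div type="preface"/>
  </front>
  <body>
    <pb n="1"/>
    <milestone unit="sig" n="B1r"/>
    <div type="chapter">
      <!-- transcription of the first chapter will go here -->
      <pb n="2"/>
    </div>
    <div type="chapter">
      <!-- transcription of the second chapter will go here -->
    </div>
  </body>
  <back>
    <div type="advertisement"/>
  </back>
</text>
```

Laying out the divisions and page breaks first gives you a frame to transcribe into, and makes it easier to notice a missing page or division before much text has been entered.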

Be sure to validate your text regularly and save often.

When you complete your work each day:

  • in the Subversion client, "commit" your file, remembering to add your encoder key and a brief description of changes made. Committing your changes makes them official and will also avoid conflicts with changes made by other people.

When you reach a major milestone in encoding:

  • enter a <change> element in the <revisionDesc> for your file, with a brief description of what you did. This enables us to transfer the text to someone else if you stop working on it.
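A change log entry could look roughly like the sketch below. It assumes the standard TEI P5 attributes @when and @who; the date, the encoder key "abc", and the wording are made up for illustration. The entries already present in your tadpole's header show the form the project actually uses, so follow those if they differ.

```xml
<revisionDesc>
  <!-- the date, key, and description here are placeholders -->
  <change when="2020-09-15" who="#abc">Completed capture of the front matter and the first three chapters.</change>
</revisionDesc>
```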

Looking up a language code for @xml:lang

Short answer: look up codes (which are called “subtags”) in the official registry.

Long answer: @xml:lang is an attribute defined by the XML specification, and described for us by the TEI. Thus to find out how to use this attribute, you could read its documentation in the TEI Guidelines. But that documentation points out that, for TEI (and thus for us), the datatype of the value of @xml:lang is teidata.language. And the documentation for that datatype actually tells you how to construct a proper language tag. So this is probably the best place to look.
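For orientation, here are a few well-formed language tags in use. The subtags shown ("fr" for French, "la" for Latin, "en" plus the region "GB" for British English) are all in the official registry; the elements and sample text are only illustrative, so check our internal documentation for which elements should carry @xml:lang.

```xml
<!-- a primary language subtag on its own -->
<foreign xml:lang="fr">je ne sais quoi</foreign>
<foreign xml:lang="la">ad infinitum</foreign>

<!-- a language subtag followed by a region subtag, joined with a hyphen -->
<p xml:lang="en-GB">A paragraph in British English.</p>
```

When a tag has more than one subtag, the language comes first, then an optional script, then an optional region. In most cases the shortest tag that identifies the language is the right choice.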

Finishing the transcription and encoding process

Once you have finished encoding your text, you should run through a series of checks designed to catch typographical errors, misnumbered pages, and a variety of other problems that can creep into the transcription process. Following this checklist should help you produce a clean, relatively error-free text for proofreading.

  1. Validate your file (Command-shift-v in Oxygen). Fix all remaining validation errors before proceeding.
  2. Remove comments that are in your file, with the exception of the comment at the very beginning of the document (the one that begins <!-- $Header ...). This comment should NOT be removed. All other comments, including those that are part of the <teiHeader> that comes in your tadpole file, should be removed.

Some additional error-checking tools exist for Emacs but are no longer used now that encoding is done in Oxygen; see the list if you need to refer to these. Once you have completed each of these tasks, you are finished with the transcription and encoding process for your text. Create a change log entry in your text that indicates you have completed validation and general cleanup of your file. Then check your file back in. You're all done!

Important: Be sure to indicate that you have completed capture for your text on the Trello Text Tracking Board, by simply dragging and dropping your text's card into the next column. Trello will automatically record the date and time of this move. Finally, change your text's Trello card label to green, which indicates that the text is now free for other encoders to pick up. This is very important, because until you indicate that you have finished, it is assumed that you are still working on your text.

Digital Proofing processes

In the fall of 2020, the WWP is moving to digital proofing processes, at least for the duration of the pandemic. The section below documents these digital processes; it should be read in combination with the sections that follow on best practices for proofreading, corrections entry, and checking more broadly. As we get a better sense of how the digital proofing goes, we may update our processes and documentation more permanently.

Digital Proofreading

  • When you start, put in a change log entry, i.e. a <change> element in the <revisionDesc>.
  • Switch to “Author” mode with the buttons at the bottom center of your Oxygen window. Make sure that you are viewing “Full Tags with Attributes”; you can check this by clicking on the yellow triangles in the left of your screen, just above the edit window and document tabs (click here for screenshot).
  • Minor changes (i.e., typos or anything short that’s related just to the content, not the markup) can be made directly in the document.
  • Other changes, ones related to the markup or significant in nature, should be indicated with comments:
    • To insert a comment, use the <soCalled>insertion pop-up</soCalled> which can be accessed on Mac OS X and GNU/Linux by just typing ENTER or RETURN; then select “comment” from the pop-up menu.
    • preface each comment with your initials
    • place the comment on the right of the text, on the same line as the change you are marking
  • If you’re not sure whether something is a major or minor change, it’s always better to err on the side of marking it with a comment.
  • Any systematic changes (for example, “review all <hi> elements and see if any need to be <mcr>”) can be described in comments near the top of the document, right after the <text> start tag.
  • See more on our proofreading processes in the sections below; please do read through these carefully, especially the tips and best practices. There are two key changes to our proofing from what is described below: first, our new proofing is done electronically and second, we typically now do just one round of proofing.
  • When you’re done with a session, mark your progress in the document with a comment that says: <!-- PROOFED2HERE -->
  • You should also make sure to indicate your progress in your Subversion commit messages (e.g. “proofed to XML line 24,601”). Please also specify in your messages when you start and finish proofing. This is how the correx entry person will know which versions of the text to review. And, of course, don’t forget your key!
  • When you finish proofing the whole document, put in another change log entry.
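Put together, the comment conventions above might look something like this in a file. The initials "ABC", the element content, and the div type are invented for the example.

```xml
<text>
  <!-- ABC: review all <hi> elements and see if any need to be <mcr> -->
  <body>
    <div type="chapter">
      <p>A paragraph that has been proofed. <hi>Memorandum</hi> <!-- ABC: should this be <mcr>? --></p>
      <!-- PROOFED2HERE -->
      <p>A paragraph that has not been proofed yet.</p>
    </div>
  </body>
</text>
```

Tag names can appear inside a comment without causing problems, but an XML comment may not contain two hyphens in a row, so avoid typing a double hyphen in your notes.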

Digital Correx Entry

The correx entry person first reviews the minor changes.

  • When you start, put in a change log entry, i.e. a <change> element in the <revisionDesc>.
  • You can work in the usual text view (no need to work in Author mode).
  • Go to Subversion, select your file, then switch to the “History” tab.
  • In the “History” view, click on the most recent commit from the proofer and then control/command-click (Mac users, be sure to command-click) on the last commit from the encoder (usually this will say something about having completed Supravalidation). You might need to read the commit messages carefully to make sure you have the right ones—you're looking for the last changes made by both the encoder and the proofer.
  • With both commits selected, click on the ‘compare’ button near the middle of the top bar (click here for screenshot).
  • This will bring up a comparison view in the bottom of the window.
  • Before you begin comparing, if you haven’t done so before, you will want to tell the comparison tool to ignore comments and whitespace:
    • Hit the gear near the top left (click here for screenshot).
    • At the top, make sure “Ignore Whitespaces” is selected; scroll down to the XML Diff section and also make sure that processing instructions and comments are selected under the “Ignore” list (click here for screenshot).
    • After you hit “OK” you will need to re-run the comparison with the red play triangle in the box two items over from the gear (click here for screenshot). It might also tell you that you need to quit and re-launch Oxygen; if you need to do this, you can just follow the steps above to get back to the comparison and it should then ignore comments.
  • You can use the arrows in the middle of the comparison window to navigate through the changes. If you agree with the changes, there’s nothing you need to do, but if there are any you disagree with, you should go back into the file and add a comment on why you don’t agree with the change. Remember to preface any comments you add with your initials.
  • When you're done with a session, mark your progress in the document with a comment that says: <!-- CORREX2HERE -->
  • Also, for all of your correx entry, make sure to indicate your progress in your Subversion commit messages (e.g. "correx entry to XML line 314"). And, of course, don't forget your key!

The correx entry person then reviews and resolves the major changes marked in comments.

  • First, check for any systematic/global comments at the top of the document and resolve those.
  • Search through the file for comments and resolve each one.
  • If you have additional thoughts or questions yourself, add a comment, prefaced with your initials.
  • Bring questions to encoding meetings, as usual.
  • You can also resolve any issues you’d marked from the initial review of the minor changes; if any seem complex or in need of discussion, bring those to a meeting.
  • When you finish, put in a change log entry.

Digital Checking

The checking person looks at the commits from the correx entry person and makes sure that they agree with all the changes and that changes were made correctly.

  • When you start, put in a change log entry.
  • Go to Subversion, and click on your file, then click on the “History” tab.
  • Whereas you want to ignore comments during corrections entry, you do want to see them during checking, so see the section above for how to turn the ignoring of comments off and on (you always want to ignore whitespace, though).
  • In the History view, click on the most recent commit from the correx person and then control/command-click (Mac users, be sure to command-click) on the last commit from the proofing person.
  • With both commits selected, click on the ‘compare’ button near the middle of the top bar (click here for screenshot).
  • In the comparison view, use the arrows to navigate between the changes and make sure that all of the major corrections were input properly and that you agree with all of the revisions.
  • If you see any small errors, go ahead and fix them directly in the file; for things that need discussion, mark them with a comment (make sure to preface with your initials).
  • If for some reason there are a lot of irrelevant changes between the proofing and correx entry commits, you can also check each of the correx commits separately, by control-clicking on each one in the history view (go down to the specific file) and then selecting “Compare with previous version.” (Click here for screenshot.)
  • When you're done with a session, mark your progress in the document with a comment that says: <!-- CHECKED2HERE -->
  • And also, make sure to indicate your progress in your Subversion commit messages (e.g. "checked to XML line 42"). And, of course, don't forget your key!
  • When you finish, put in a change log entry.

Proofreading

All encoded texts are proofread multiple times before they are published. Generally, we perform two proofreading rounds, though in some cases three or more are required to ensure a given text has been accurately encoded and transcribed. Proofreading requires meticulous attention to detail; when proofreading, your priority should be accuracy rather than speed.

To locate a text to proofread, look for cards labeled green in the "1st Proof" or "2nd Proof" columns in the Text Tracking Trello Board. To claim a text, simply change the label color to blue and add a short note on the card (e.g. Edie Guarez began proofing.). Trello will automatically record the date and time of each note you create.

Important: Except in extremely rare circumstances, you should never proofread a text that you were responsible for encoding! If the only texts available for proofreading are texts that you encoded, check with a WWP staff member before claiming one of them.

A complete printout of every text ready for proofreading can be found in the top drawer of the short filing cabinet in the WWP main office. The printout will have a cover sheet that indicates the status of the file (e.g. first or second proofing), the author's name, the title, and the OT number. Before beginning to proofread, be sure to fill in the "Proofreader" field on the cover sheet with your current name.

When noting errors, try to use standard proofreading marks. A list of common proofreading marks/symbols can be found here.

Getting started with proofing

First Steps, Paper Proofing

  • The first thing you need to do is check out the OT and the proofing printout folders (and find a red pen!)
  • Keep track of your progress on the proofing on the cover page (for digital proofing, use comments)

First Steps, General

  • Look for the renditional defaults and write them on a notecard or post-it for reference as you are proofing
  • Take a minute or two to get a sense of the text so that you are going into your proofing informed
  • You are looking for: typos, misused mark-up, and missing mark-up


Best Practices, Paper Proofing

  • Be consistent in the way you identify errors in the text (arrows, stars, circles)
  • In longer texts, it is helpful to circle the page numbers (in the top corner) to help the checker identify which pages contain errors
  • Make sure to circle any line numbers with edits (particularly with smaller or less-obvious ones)
  • It is also helpful to sign your name on proofing edits that are a little more complicated or confusing so that later encoders can find you if they have questions or can see where multiple people are commenting on the text
  • Be very unambiguous when you cross out or change your mind about a proofing edit
  • For consistent and repeated errors, mark them on the cover sheet and then circle the errors in the text rather than writing out the correction over and over
  • Write out rend defaults on back of proofing sheet
  • Leave clear notes for the corrections entry person and checker on the cover page
  • Try to use the same ink color throughout your proofing; consistency makes things clearer and easier to follow for correx entry and checking
  • Make sure if you circle or underline something, it is clear what change needs to be made
  • As the proofer, avoid making changes in the file at all. Once something has been marked in the printout, it is best not to fix it in the XML file as well, because doing so adds a layer of confusion for correx entry and checking.

Best Practices, General

  • Err on the side of being overly detailed with your edits, so that it is very clear for the checker
  • Pay attention to what the rendition is saying and make sure that is actually reflected in the text, especially where this is complex
  • There's no need to write out whole missing lines; just indicate which lines are missing, for example: "2-3 lines missing here; see OT"


Common Issues

  • Incorrect spacing around and inside of elements
  • Missing lines
  • Incorrect line breaks
  • Watch out for texts with an unusual divisional structure; keep track of where divisions begin and end
  • Watch out for pages that have a lot of renditional information like title pages

First proofing round

The first proofreading stage should cast a wide net, looking for errors in both transcription and encoding throughout the document. This means that you should be looking not just for typographical errors in the transcription, but also for more systematic errors and omissions in the way the document has been encoded—in other words, everything from renditional information to structural markup.

For example, consider the following line as it appears in the original OT compared to the printout:

WHO has seen and has not admired our beautiful Savan­
<p><hi rend="case(smallcaps)">Who</hi> has seen and has not admired our beautyful <placeName>Savan­

In this instance, the word "beautiful" has been incorrectly transcribed as "beautyful." The error should be marked and corrected in the printout.

Similarly, encoding errors need to be marked and corrected as well. Consider the case of a <head> element that is center-aligned and rendered in capital letters in the OT, but that has been encoded as follows:

<head>The British Partizan,<lb/>A tale of the times of old</head>

Clearly this encoding does not capture the proper rendition for the <head> element. The first thing to do in a case like this is to check whether a renditional default has been set in the <tagsDecl> within the <teiHeader>. If there is no renditional default specifying that all headings should be center-aligned and capitalized, you should mark the printout and indicate the proper rendition.
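If no default applies, the corrected encoding might look something like the sketch below. It assumes that the keywords align(center) and case(allcaps) follow the same keyword(value) pattern as the case(smallcaps) value in the earlier example; confirm the exact keywords against the WWP internal documentation before marking the correction.

```xml
<!-- the rend keywords here are assumed, not confirmed against the internal documentation -->
<head rend="align(center)case(allcaps)">The British Partizan,<lb/>A tale of the times of old</head>
```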

For the sake of consistency in proofreading and corrections entry in paper proofing, you should use standard proofreading marks when they are appropriate.

For errors that appear consistently throughout an encoded text (e.g. personal names that have been encoded only with <name>, or the frequent use of <emph> where <mcr> would be more appropriate), you should make a note on the proofreading cover sheet or add a comment at the top of the document indicating that this is a global problem.

In addition to looking for errors in the encoding on the page, you should also pay attention to what isn't on the page -- the things that might not have been encoded at all but should have been (catch words, page numbers, or signature marks that were omitted; names that were not encoded using <name>, <persName>, or <placeName>; renditional information that was not captured, etc.).

For paper proofing: once you have finished proofreading the entire transcribed text, indicate that you have completed proofreading on the cover sheet (be sure to include the date!). Enter any final comments or notes in the "Notes" field on the cover sheet. For digital proofing: add all your notes as comments at the top of the document, just after the <text> start tag, and make sure that you're marking that you've finished both in the change log and in the commit message.

Next, add a note to the text's card on Trello, move the card to the appropriate "Correx" column on the Trello board, and change the card's label to green to indicate the text is now available for other encoders. For paper proofing: place the printout of the file in the short filing cabinet in the correct section ("Correx 1" or "Correx 2"), and return the OT to its proper location.

Second proofing round

Like the first proofing round, the second round of proofreading aims to catch all errors in transcription and encoding—with the added burden that this is, in many cases, the last full round of proofreading that will take place. For that reason it is especially important to be as careful and thorough as possible. The accuracy of our published texts depends on the accuracy of this second proofreading round!

The basic procedure for second proofreading is the same as that for the first proofreading round. In general, texts that enter the second proofing round should be fairly clean, with minimal errors. If you encounter a second proofing text that seems to have extensive errors, it's probably a good idea to check with a WWP staff member to see if there is something special about that text.

Corrections entry (first and second rounds)

Corrections entry (a.k.a. "correx entry" or just "correx") is the process of fixing the errors uncovered during each proofreading round. There is one round of corrections entry immediately following every round of proofreading.

Important: Except in extremely rare circumstances, you should never enter corrections for a text that you were responsible for proofreading! If the only texts available for corrections entry are texts that you proofread, check with a WWP staff member before claiming one of them.

To find a text whose corrections need entering, look for texts in the "1st Correx" or "2nd Correx" columns in Trello that have a green label. To claim a text, simply add a note to the card (e.g. Billy Johnson began correx). Trello will automatically record the date and time of the note. The printout from which you will be entering corrections can be found in the short filing cabinet in the WWP main office in the appropriately labeled sections.

Once you have claimed your text, open the corresponding file. Enter whatever corrections are marked on the proofreading printout or proofreading cover page, paying particular attention to any notes or comments that the proofreader may have left. Once you have entered all of the corrections, validate and save your file, enter a change-log comment, and commit your changes. Then move the Trello card to the next column and change the label to green to indicate that the text is available for other encoders to pick up.

When you have finished entering all corrections, return the OT to its normal location, and then file the proofreading printout in the same filing cabinet in the section labeled "Completed."

Checking (first and second rounds)

Select a text from Trello that is currently in the Checking Round for either the first or second round. Compare the handwritten proofing marks with the electronic version of the text to ensure that they match. Be sure to keep track of how far you have read on the proofing cover sheet. When you are finished with this round, move the text to the next phase on Trello, and place the physical proofread copy in the next section in the filing cabinet.

Renovation

Renovation is the process for updating files that were originally encoded using markup that is no longer valid in the present era of P4/P5 TEI. Renovation typically involves examining a file for obsolete or deprecated encoding, changing this encoding in keeping with our current practices, and then validating and supravalidating the file to catch additional errors.

As of September 2008, we are no longer actively renovating texts.