Getting started with encoding

From Digital Scholarship Group

The Encoding Process

  • Document Analysis should be the first thing you do
  • Encoding Steps
    • Update your working directory in Subversion:
      • launch Oxygen
      • in the Tools menu, choose “SVN Client”
      • Control-click on the WWPtextbase/ directory and choose “Update” from the menu that pops up
      • The SVN client should report either “no incoming changes” or “Operation successful”.
    • Find your tadpole
      • navigate to your ~/Documents/WWPtextbase/ folder
      • in the under_construction/ directory, locate your tadpole
      • open it in Oxygen (either control-click on the file in the SVN client and choose to open it with Oxygen, or navigate to the file in your Documents folder and open it from there)
    • Encode the basic document structure:
      • if your text doesn’t have any front or back matter, delete <front> and <back>
      • enter the divisions and other major structural elements inside the <front>, <body>, and <back> elements as appropriate
    • Start transcribing
      • omit the title page and table of contents for now
      • enter paragraphs and container elements like <quote> or <lg> before typing in their contents
      • fix any validation errors as soon as you see them
      • save often
    • Commit your changes
      • at the end of your session, check to make sure your document is well-formed
      • if you're at a milestone, fill in a change log
      • then, go back into the SVN client
      • update the WWPtextbase directory again
      • navigate to your tadpole file
      • control-click on it and choose “Refresh” (⌘-R); you should see a little star appear next to your file
      • control-click on your file again and choose “Commit” (⌘-M); enter a message that includes at least your personal key, and ideally a short description of what you changed, and then approve the dialog box.
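
As a rough illustration of the “basic document structure” and “containers first” steps above, a freshly structured text might look something like the sketch below. It assumes standard TEI element names; the type values and the placeholder text are hypothetical, so check the internal documentation for the values we actually use.

```xml
<text>
  <front>
    <!-- hypothetical front matter; delete <front> if your text has none -->
    <div type="preface">
      <head>Heading of the preface</head>
      <p>First paragraph of the preface.</p>
    </div>
  </front>
  <body>
    <div type="chapter">
      <head>Heading of the first chapter</head>
      <!-- enter the empty containers first, then type their contents -->
      <p>A paragraph that introduces a quotation:
        <quote>Contents of the quotation.</quote>
      </p>
      <lg>
        <l>First line of verse.</l>
        <l>Second line of verse.</l>
      </lg>
    </div>
  </body>
  <back>
    <!-- hypothetical back matter; delete <back> if your text has none -->
    <div type="notes">
      <p>Contents of the back matter.</p>
    </div>
  </back>
</text>
```

Notice that every element sits one step further to the right than its parent, which makes the layers easy to see at a glance.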

Encoding Tips

  • Thinking about Encoding
    • Know that encoding can get very detailed: you’re marking up not just the basic organization of your text, but also a lot of other phenomena; you can expect that there will be some things you didn’t know you needed to encode that you will have to add in once you learn about them
    • Think about what you’re doing as building a set of layers, rather than writing a stream; XML is really a set of enclosures, not a linear flow of details—and know that getting your brain wrapped around this way of thinking can be challenging!
    • Don’t be afraid to say “I have no idea what I’m doing”
    • Map out the book you’re working on before you get started; revisit document analysis as needed
    • Keep an eye out for the tendency to miss spaces around elements
    • Pay attention to the ways that punctuation should be used around elements; know that it might feel weird/different from non-encoded texts you’re used to working with
    • Try to get some genre variety as you’re starting to encode; don’t be stuck in prose forever
  • Best Practices
    • Read through the documentation once you feel like you have your feet under you; rather than treating the internal documentation only as something you go to when you have a problem, treat it as something you can read through just to see what’s there and build on your knowledge
    • Look at other people’s texts as well, and think about why they made the decisions that they made
    • Continually validate (⌘-shift-V). It is OK to check in an invalid file, but problematic to check in an ill-formed file, so check well-formedness (⌘-shift-W) before you commit. [That said, if you really have to leave before you can fix an ill-formedness error, go ahead and check the file in. We try to avoid checking in ill-formed files, but leaving a file not checked in at all is usually worse.]
    • Don’t forget to fill out your change logs for major milestones in encoding or proofing
    • Left margin tidiness makes things much easier to read; the convention is that documents go in/right as you get further into the hierarchy
  • Process Suggestions
    • Skip the title pages if it’s your first text. You can do those later, when you’re more comfortable with encoding.
    • It’s helpful to paste in a bunch of <lb>s at once and then arrow down for each new line
    • In the same way, you can fill in groups of <l>s with poetry
    • Make yourself a template for page breaks and then copy-paste that in as you need it; it’s often easiest to set yourself up with a bunch of page breaks at once, rather than interrupting your encoding. Your template might look something like this:
<mw type="sig"></mw>
<mw type="catch"></mw>
<milestone unit="sig" n=""/>
<mw type="pageNum"></mw>
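
The same paste-first approach for line breaks and verse lines might look something like the sketch below. Writing the line break as an empty element (<lb/>), placing it at the start of each printed line, and the placeholder text are all assumptions made for illustration; follow the internal documentation and recently encoded files for our actual practice.

```xml
<div>
  <!-- Step 1: paste a block of empty line breaks -->
  <p>
    <lb/>
    <lb/>
    <lb/>
  </p>

  <!-- Step 2: arrow down and type each printed line next to its <lb/> -->
  <p>
    <lb/>The first printed line of the paragraph,
    <lb/>the second printed line,
    <lb/>and the third.
  </p>

  <!-- The same idea for poetry: paste empty <l>s, then fill them in -->
  <lg>
    <l></l>
    <l></l>
    <l></l>
    <l></l>
  </lg>
</div>
```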

What to do if you don't know what to do

  • Look for error messages. For tips on how to read error messages, see: this guide
  • Check the internal documentation
  • Search in files to find example documents that have similar situations (knowing that it’s best to look at recently-encoded files, since some practices have changed). Start with the list of texts that were added since the WWP came to Northeastern
  • Put your problem in a comment so you’re not stuck if you can’t figure out what to do. To surround with a comment, select the text you want and hit: ⌘-shift-, (command-shift-comma)
  • Bring your question to a meeting
  • Ask your mentor
  • Check the listserv to see if your question has been discussed in the past (DSGTAG-L)
  • Know that you can email people with questions; you don’t have to wait to grab someone in person
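
A comment that parks a question might look something like the sketch below. The wording and layout of the note are hypothetical, not an established convention. One thing to watch for: an XML comment cannot contain two hyphens in a row, so avoid typing “--” inside it.

```xml
<p>The committee met on the
  <!-- QUESTION (add your key here): the next word is smudged in my copy.
       Is it "third" or "thirtieth"? I have transcribed my best guess for now. -->
  third of May.</p>
```

Because the question lives in the file itself, anyone who opens the text later can see exactly where the problem is.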

Let Oxygen help you work

  • Pay attention to color in Oxygen. Element start and end tags are blue, attributes are orange, values are brown, comments are bright green, regular text is black, and errors are red.
  • To add a new element, type a left pointy bracket (<) and wait a second. Oxygen will give you a list of elements to choose from (only ones that would be valid in your current location in the document). You can start typing the name of the element you want to narrow down the list and then hit enter or double-click to pick the right element.
  • To add an attribute to an element, put your cursor in the start tag, but not inside the element name or within another attribute, and hit the spacebar. You’ll get a list that works the same way as the element list, and you can choose the attribute you want from it.
  • To surround text you already have with start and end tags, highlight the text you want and then hit ⌘-E. Again, Oxygen is smart enough to only give you options that are valid.
  • A lot of the time, you can get what you want from Oxygen by typing the beginning of what you need and then giving the program a second to think. So, if you want to add a comment, type <! and wait a second, then choose the “XML Comment” option from the drop down.
  • Check out the Oxygen tips for more information.

Workflow and Tracking Processes

  • Trello
    • The Text Tracking board in Trello shows you all of the texts we are currently working on, who has been working on them, and their status in the encoding & publication processes.
    • When you start working on a text:
      • make sure to add yourself as a member to its card (click on the card, click “Members” from the right-hand list of options, then click on your initials/icon)
      • change the label to “Text is in the Hands of an Encoder” (click on “Labels” right below “Members” and click on the label you want; to turn a label off, just click it a second time)
      • if necessary, move the text to the appropriate list (so, if you are the first person to claim a text, move it from “Unencoded Texts” to “Capture” when you begin encoding)
      • You can use the comments and checklist options to record questions that you have about your text or anything that you think will be useful to people who work on it later.
    • When you’re done with a text, take yourself off as a member of the card and move it over to the next list. You should also change the label (to green/“unassigned” for texts moving into corrections or checking; or to red/“needs admin attention” if a text is moving into proofing or final review). It’s helpful, though not necessary, to add a comment requesting that the text be printed, if printing is needed.
    • There are also projects available for people to claim on the left-hand side of the Text Tracking Board.
    • We also have an “Ideas” board on Trello, which you can use to add cards for ideas that you have or to see (and vote on) the ideas that other people have suggested. These are things like workshops people would like to have or requested changes to the internal documentation.
  • Change Logs
    • Record major milestones directly in the TEI file; these are <change>s and they go in the <revisionDesc> in the <teiHeader>. Each change log should include the date of the change on the @when attribute (YYYY-MM-DD), the person making the change on the @who attribute (use your key), and a quick description of the change.
    • You don’t need to add one every time you do any encoding, but they’re an important part of our record-keeping, so please remember to complete them when they’re needed (for example, when you begin encoding, when you finish encoding, when you finish entering corrections, etc.).
    • Here is an example entry: <change when="2014-02-10" who="personography.xml#jflanders.lfw">Began encoding.</change>
  • Sign-out Sheets
    • If you take any office texts or proofing printouts out of the DSC, please fill out the appropriate sign-out sheet (the sheets live in a folder on the top of the left-hand short filing cabinet, where the proofing printouts are). The office texts are not precisely irreplaceable, but they can be expensive to get, so please be careful with them if you do take them out of the DSC.
  • Cover Sheets
    • Once you get to the point where you’re working on proofing, you should use the cover sheets on the proofing printout to track your progress and record any global issues you discover.
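
To show where the change logs described above sit in the file, here is a minimal sketch of a <revisionDesc> with two entries. The second key, its date, and the order of the entries are hypothetical; use your own key and follow the order of the entries already in your file.

```xml
<teiHeader>
  <!-- the rest of the header is omitted from this sketch -->
  <revisionDesc>
    <change when="2014-02-10" who="personography.xml#jflanders.lfw">Began encoding.</change>
    <!-- "yourkey.abc" is a placeholder, not a real key -->
    <change when="2014-03-21" who="personography.xml#yourkey.abc">Finished encoding.</change>
  </revisionDesc>
</teiHeader>
```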