EmacsTools

From Digital Scholarship Group

Tools for error checking and correction in Emacs

These tools are no longer used now that our encoding is done in oXygen, but this list is kept for reference.

1. Check rendition ladders by issuing the emacs command C-c v or by selecting PostProc > Validation > Check Rendition from the emacs menu. This will generate output in a new buffer that looks something like this:

renditionchecker:unknown.changingscenes2.xml:88:90:E: keyword )

renditionchecker:unknown.changingscenes2.xml:88:90:E: bestow keyword without gi list

If you see errors like these, be sure to correct them.
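This page does not document the syntax the rendition checker actually enforces. As a rough illustration only, here is a minimal Python sketch, not the PostProc script. It makes two assumptions: a ladder is a run of `keyword(value)` steps written with no whitespace between them, as in `align(center)break(no)`, and ladders are stored in `rend` attributes. `check_ladder` and `check_file` are hypothetical names, and real ladders may use syntax this pattern rejects.

```python
import re

# Assumed ladder grammar (not taken from the PostProc scripts): one or more
# keyword(value) steps, written back to back with no whitespace between them.
STEP = re.compile(r"[a-z][a-z-]*\([^()\s][^()]*\)")
# Assumption: ladders are stored in rend="..." attributes.
REND_ATTR = re.compile(r'\brend="([^"]*)"')


def check_ladder(ladder: str) -> list[str]:
    """Return the problems found in one rendition ladder (empty list if none)."""
    pos = 0
    while pos < len(ladder):
        step = STEP.match(ladder, pos)
        if step is None:
            # Stop at the first bad spot, since everything after it is suspect.
            return [f"unexpected text at position {pos}: {ladder[pos:]!r}"]
        pos = step.end()
    return []


def check_file(path: str) -> None:
    """Print path:line: problem for every ladder that fails the check."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for attr in REND_ATTR.finditer(line):
                for problem in check_ladder(attr.group(1)):
                    print(f"{path}:{lineno}: {problem}")
```

Calling `check_file("yourfile.xml")` prints one line for each ladder that breaks the assumed pattern, such as an extra space between steps or a missing closing parenthesis.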

2. Check page numbers by selecting PostProc > check page #'s from the emacs menu. This will look for duplicated or skipped numbers within the content of <mw type="pageNum">. If the script finds an error, you will need to correct the page number and then run the script again to continue checking the remaining pages. When you reach the end of the file, all page numbers have been checked.
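The page-number check amounts to comparing each number with the one before it. Here is a minimal Python sketch of that idea. It is not the Emacs script itself, `check_page_numbers` is a hypothetical name, and it assumes page numbers are plain digits inside `<mw type="pageNum">`.

```python
import re

# Assumption: each page number is plain digits inside <mw type="pageNum">...</mw>.
# Roman numerals or numbers wrapped in other markup are not matched.
PAGE_NUM = re.compile(r'<mw\b[^>]*\btype="pageNum"[^>]*>\s*(\d+)\s*</mw>')


def check_page_numbers(xml_text: str) -> list[str]:
    """Compare each page number with the one before it and report problems."""
    problems = []
    previous = None
    for match in PAGE_NUM.finditer(xml_text):
        current = int(match.group(1))
        if previous is not None:
            if current == previous:
                problems.append(f"duplicated page number {current}")
            elif current != previous + 1:
                problems.append(f"expected {previous + 1}, found {current}")
        previous = current
    return problems
```

The menu command stops at each problem, but this sketch reports everything in one pass. It compares each number only with its predecessor, so one wrong number in the middle of a run usually yields two reports: one on the way in and one on the way out.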

3. Check for typographical errors that may have been introduced during transcription. You can do this by selecting PostProc > typo-check from the emacs menu. If you are asked whether you want to begin at the beginning of the buffer, choose "Yes." The typo-check script looks for a number of possible signs that words have been misspelled or otherwise typed incorrectly, but it is not always correct in the assumptions that it makes. For this reason, you need to read each of its warnings carefully and check the appropriate point in the file. Output from the script might look like this:

typocheck:unknown.changingscenes2.xml:47:55:W: (p. ) Letter on its own?

should nevertheless be tagged using the q element for the

The first line of output indicates the nature of the error (an unusual single letter), and the second line of output reproduces the line in the text where the error occurred. In this case, the typo-checker is noting that the letter "q" appears by itself in a line of text. Here's another example:

typocheck:unknown.changingscenes2.xml:1739:39:W: (p. 56) Mistakenly doubled consonant?

<lb/>nipotent, though ofttimes the leaders of peo

In this instance, the script is noting that the letter "t" appears twice in a row in the word "ofttimes," and is asking you to check the spelling against the original text (OT).

Note: Just because typo-check thinks it has found an error does not mean there is an error in your transcription of the text, nor does it mean that "misspellings" in the source text should be corrected with <sic>. In many cases, variant spellings and standard early modern spellings (e.g. "compleat") are not, in fact, errors, and should be transcribed without comment exactly as they appear in the OT.
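This page does not give the typo-check rules themselves, so the following Python sketch is only a guess at how the two warnings shown above could be produced. It strips tags first. It then flags single letters other than "a", "I", and "O", and flags doubled consonants that follow another consonant (the pattern in "ofttimes"). The second rule is especially an assumption: flagging every doubled consonant would bury you in words like "letter". `typo_warnings` is a hypothetical name. As the note above says, every hit still has to be checked against the OT, not automatically "corrected".

```python
import re

TAG = re.compile(r"<[^>]*>")
# A single letter with no letter, digit, or apostrophe on either side, other than
# the one-letter words "a", "A", "I", and "O".
LONE_LETTER = re.compile(r"(?<![\w'])(?![aAIO])[A-Za-z](?![\w'])")
# Assumed rule: a doubled consonant right after another consonant, as in "ofttimes".
# Plain doubles such as "letter" are too common to flag on their own.
DOUBLED_AFTER_CONSONANT = re.compile(
    r"[b-df-hj-np-tv-z]([b-df-hj-np-tv-z])\1", re.IGNORECASE
)


def typo_warnings(line: str) -> list[str]:
    """Return heuristic warnings for one line of transcription, ignoring markup."""
    text = TAG.sub(" ", line)  # so that tags like <lb/> or <p> cannot trigger warnings
    warnings = [f"Letter on its own? {m.group()!r}" for m in LONE_LETTER.finditer(text)]
    warnings += [
        f"Mistakenly doubled consonant? {m.group()!r}"
        for m in DOUBLED_AFTER_CONSONANT.finditer(text)
    ]
    return warnings
```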

4. Supravalidate your file using C-c C-x or by selecting PostProc > Validation > Supravalidate from the emacs menu. Supravalidation is a process that checks many different aspects of your file for common encoding errors, rendition problems, inconsistent sequencing of page signatures, catchwords, and a slew of other potential problems. Because it checks for so many different things, supravalidation output is difficult to summarize, but the following are examples of a few things you will routinely see:

onsgmls:unknown.changingscenes2.xml:509:20:E: general entity "inverted1" not defined and no default entity

Warning 'nxmls' (i.e., 'onsgmls') reports that this file is invalid.

You can generally disregard any error that lists "onsgmls" as its source, since we are no longer using "onsgmls" to validate files (the default is currently "xmllint").

supraValidate.perl:unknown.changingscenes2.xml:77-78:E: invalid format as content of EXTENT

This is a complaint about the value that has been supplied in the <teiHeader> for the <extent> element, the element that allows you to specify the size of the original source text. In the case of this particular file, the value "20 cm." is actually correct and requires no change.

supraValidate.perl:unknown.changingscenes2.xml:1926-1928:W: Blank or open-paren immediately before <p> end-tag.

This error is essentially the result of extra (and unnecessary) whitespace in your file; in this case, the content of a paragraph ends on one line, followed by a carriage return, and then the </p>. This can be fixed simply by removing the carriage return so that the content is followed immediately by the paragraph end-tag. In general, you will find that many of the errors you see have to do with extra whitespace. Although most of these errors will be false positives (that is, not actual errors in your file), you will need to check all of them to be certain.
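The warning describes a simple textual pattern, so a regular expression can reproduce it. The Python sketch below is not the supravalidation code, and its function names are hypothetical. It lists the lines where `</p>` directly follows whitespace or an open parenthesis. It also offers a fix, but only for the whitespace case, because a stray "(" needs a human decision.

```python
import re

# A blank (any whitespace, including a line break) or "(" directly before </p>.
BEFORE_P_END = re.compile(r"[\s(]</p>")


def blank_before_p_end(xml_text: str) -> list[int]:
    """Return the 1-based line numbers of </p> tags preceded by a blank or '('."""
    return [xml_text.count("\n", 0, m.end()) + 1 for m in BEFORE_P_END.finditer(xml_text)]


def strip_blank_before_p_end(xml_text: str) -> str:
    """Delete whitespace immediately before </p>; a '(' is left for a human to judge."""
    return re.sub(r"\s+</p>", "</p>", xml_text)
```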

supraValidate.perl:unknown.changingscenes2.xml:W: No catchwords?

If your text does not include catchwords, you can safely disregard this error. However, if your printed text does have catchwords, you will need to check their appearance in the encoded text. If you have forgotten to encode the catchwords, you will need to do so now.

Here are your idealized signatures; check them by eye:

1(1)r, 1(1)v, 1(2)r, 1(2)v

2(1)r, 2(1)v, 2(2)r, 2(2)v

etc...

Near the end of its output, the supravalidation script will generate a formatted list of all the idealized signatures for your text. (Remember, you should have an idealized signature value, encoded using the <milestone> element, for every single page in your text.) Quickly scan this list to see if there are any errors in the sequence of milestones.

Here are your rendition ladders; check their syntax yourself or have someone help you:

31 align(center)

5 align(center)break(no)

1 align(center)case(mixed)slant(italic)

etc...

This is a list of every rendition ladder that appears in your encoded text. You should inspect each item on the list carefully to make sure that you haven't introduced any errors into your rendition ladders (e.g. extra spaces, non-existent renditional keywords, misspelled keywords).
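You can approximate a list like the one above by counting how often each distinct ladder occurs in the file. The Python sketch below makes several assumptions. Ladders are taken to live in `rend` attributes. Keywords are checked against a deliberately tiny, hypothetical whitelist containing only the four keywords in the sample output, so you would need the project's real keyword list before trusting its warnings. It uses the standard library's `difflib.get_close_matches` to suggest the keyword that was probably intended when one looks misspelled. `ladder_report` and `print_report` are hypothetical names.

```python
import re
from collections import Counter
from difflib import get_close_matches

# Assumption: ladders are stored in rend="..." attributes.
REND_ATTR = re.compile(r'\brend="([^"]*)"')
KEYWORD = re.compile(r"([A-Za-z-]+)\(")
# Hypothetical keyword list: only the keywords visible in the sample output.
# A real check would need the project's full list of rendition keywords.
KNOWN_KEYWORDS = ["align", "break", "case", "slant"]


def ladder_report(xml_text: str):
    """Count each distinct ladder and flag keywords missing from KNOWN_KEYWORDS."""
    counts = Counter(REND_ATTR.findall(xml_text))
    unknown = {}
    for ladder in counts:
        for keyword in KEYWORD.findall(ladder):
            if keyword not in KNOWN_KEYWORDS:
                # Suggest the closest known keyword, if any is similar enough.
                guess = get_close_matches(keyword, KNOWN_KEYWORDS, n=1)
                unknown.setdefault(ladder, []).append((keyword, guess[0] if guess else None))
    return counts, unknown


def print_report(xml_text: str) -> None:
    """Print ladders most-frequent first, then any unknown keywords."""
    counts, unknown = ladder_report(xml_text)
    for ladder, n in counts.most_common():
        print(n, ladder)
    for ladder, problems in unknown.items():
        for keyword, guess in problems:
            hint = f" (did you mean {guess}?)" if guess else ""
            print(f"unknown keyword {keyword!r} in {ladder}{hint}")
```

For instance, the sketch reports a misspelled `slnat(italic)` step and suggests `slant` as the intended keyword.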

Note: Supravalidation distinguishes major errors from warnings using the letter "E" for errors and the letter "W" for warnings (these will appear immediately after the line number where the error occurs). As a general rule, errors are problems that must be addressed before you finish work on the file. Because warnings do not always signal actual problems with your file, you may sometimes find that there are multiple warnings remaining even after you have fixed all the problems in your file. So long as you have checked each one carefully, this is not a cause for concern.
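Every report line shown on this page follows the pattern tool:file:location:E-or-W: message, so the output can be sorted mechanically. The Python sketch below separates errors from warnings. Following the advice about onsgmls above, it drops that tool's reports by default. The line pattern is inferred from the sample lines on this page, not from any documentation of the scripts, so it skips lines in any other shape, such as the signature and ladder listings. `triage` and `Report` are hypothetical names.

```python
import re
from typing import NamedTuple

# Inferred shape: tool:file[:location]:E|W: message
LINE = re.compile(
    r"^(?P<tool>[^:]+):(?P<file>[^:]+):(?:(?P<loc>[\d:-]+):)?(?P<level>[EW]):\s*(?P<msg>.*)$"
)


class Report(NamedTuple):
    tool: str
    location: str
    level: str
    message: str


def triage(output: str, ignore_tools=("onsgmls",)):
    """Split checker output into (errors, warnings), skipping ignored tools."""
    errors, warnings = [], []
    for raw in output.splitlines():
        m = LINE.match(raw.strip())
        if not m or m["tool"] in ignore_tools:
            continue
        report = Report(m["tool"], m["loc"] or "", m["level"], m["msg"])
        (errors if report.level == "E" else warnings).append(report)
    return errors, warnings
```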