Finding examples in the WWP textbase

From Digital Scholarship Group

When you’re not sure how to handle a particular encoding question, it can be helpful to look for previous examples in the textbase. However, as you look through the collection, it’s important to keep a few things in mind. First, some of our encoding practices have shifted over time, so it’s better to look at files that have been published more recently. You can see a list of the recently published texts here.

And second, despite the herculean efforts of WWP staff, there are sometimes inconsistencies among the encoded texts—or between the encoding and the documentation. So, it’s better to look for overall patterns than individual instances, and if you do encounter any inconsistencies, please bring those up at an encoding meeting. It’s definitely better to flag and discuss any cases where you aren’t sure what to do—and if you find that the documentation isn’t answering your questions, that’s probably a place where we could be adding more content.

With that long disclaimer—and because prowling around the textbase is fun and educational—here are a few tips for locating examples in the textbase.

Keyword Searching

  • Sometimes a simple keyword search will do the trick. So, for example, if you’re wondering if you should encode “New Testament” with <title>, you can look for previous cases where that phrase came up. A few notes on keyword searching:
    • To search across a set of files, go to the “Find” menu in Oxygen, select “Find/Replace in Files” and then go down to “Specified Path” and navigate to the folder you want. It’s usually a better idea to start with the files in distribution, rather than under_construction, since those have been proofed.
    • Remember to factor in potential long esses (ſ); there is probably a clever way to check both at once, but I usually just run the search twice.
    • Remember all of the other helpful options that the Oxygen search gives you. For example, you might want to choose “Enable XML searching options” and then choose to search only in “Element contents.” Just remember to turn this option back off afterward, or your later searches can be very confusing!
    • The same goes for case sensitivity; it can definitely narrow things down in helpful ways, but don’t forget that you have it checked the next time you search.
    • Some really simple XPath can be helpful here. So, for example, if I did run a simple search for “New Testament,” I’d find that it’s encoded with <rs> and an @type of “title”. But what if I had inconsistent results? I might want to see how many cases of “New Testament” are in <rs> and how many aren’t. So, I’d run another search, this time adding //rs to the “Restrict to XPath” box, and compare that count against the total. There are a lot of other useful ways that you might use XPath to narrow down a search: see the XPath searches page for more of these.
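
On the long-s question above, one possible way to check both spellings at once is a regular expression with a character class. The Python sketch below is only an illustration: the helper name long_s_pattern and the sample strings are made up for this example and are not part of any WWP tooling. A pattern like Te[sſ]tament might also work in Oxygen’s “Find/Replace in Files” box if regular expression searching is turned on, but that is worth testing before relying on it.

```python
import re


def long_s_pattern(phrase: str) -> re.Pattern:
    """Build a regex matching `phrase` with either s or long s (ſ).

    Hypothetical helper for illustration only.
    """
    parts = []
    for ch in phrase:
        if ch in "sſ":
            # Accept either form wherever a lowercase s appears.
            parts.append("[sſ]")
        else:
            # Escape everything else so it is matched literally.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


pattern = long_s_pattern("New Testament")

# Made-up sample lines, not taken from the textbase.
samples = ["the New Testament", "the New Teſtament", "the Old Testament"]
matches = [line for line in samples if pattern.search(line)]
print(matches)
```

This only covers the s/ſ variation; other spelling variation in the texts would still need separate searches.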

Looking for Specific Elements

  • In other cases, you might know which element you need to use and just want to see a few examples of encoding with that element. XPath is very useful here as well.
    • For an initial search, I often look for all cases of the element I’m interested in, just to get a sense of what the range of uses is. I usually like to cast a wide net at first, especially because there may be aspects of the encoding I haven’t anticipated yet. So, I’ll start by typing, say, //docAuthorization in the XPath search box in Oxygen, using the set of files I’m interested in as my working set (for more on configuring working sets and using the XPath search box, see this page).
    • Then I might want to get more precise; so, for example, say I have a copyright statement on a title page, so I want to take other document authorization encodings out of my results set. I just tell the XPath search box to look for any <docAuthorization> elements that are anywhere in <titleBlock>s: //titleBlock//docAuthorization
    • Or, I might want to look at how another element is encoded when it’s in a <docAuthorization>; maybe I have the name of a printer inside of the copyright statement, and I want to see if it gets any encoding to indicate that role. So, I add the specification that I’m looking for <docRole> inside of <docAuthorization>: //titleBlock//docAuthorization//docRole
    • And, sometimes, your search won’t have any results and you’ll have to make sure you’re looking in the right element. For example, you might be working with a copyright statement that is formatted as a letter, with an <opener>, <dateline>, <closer>, etc. Adding these elements inside of the <docAuthorization> element will immediately give you a validity error; you can confirm that this encoding is incorrect by searching for //docAuthorization//closer and seeing that there are no results. So, what do you do? This is where switching to a more general keyword search can help: go to the “Find/Replace in Files” box (searching for, say, “docAuthorization”) to see if there are other ways of encoding document authorizations that you haven’t anticipated. You can use the “Enable XML searching options” box to restrict your search to element names, attribute values, and attribute names, so you’re only looking in the markup, not the contents of the texts. Scanning down the results will show you that we also use <div> with an @type of “docAuthorization” for cases exactly like this one.
    • It’s worth noting that the search above wouldn’t actually be necessary, since all of this information is already in the internal documentation. In fact, if you have to do a search like this one because what you’re looking for isn’t in the documentation, that’s probably a case where the documentation should be improved.
    • This kind of searching can also be useful if you know one of the elements that you’ll need to use, but want to see what else you need to be thinking about. So, you might just want to look through a few cases of encoded drama. There are a lot of ways you might do this, but one of the simplest is to look for the <sp> element (or any other element that is common in drama) across the textbase. This will give you a lot of results (and the search will take a while), but it’s a place to start. Or, in a more useful (and likely) case, you might look for //floatingText to see how those are encoded. You could even narrow your search down to letters with: //floatingText[@type="letter"]
    • As that last example shows, it's often helpful to specify attributes and values as well as elements—so, if you're encoding an index, you can start by looking for: //div[@type="index"]
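
If you want to see how the narrowing steps above behave outside of Oxygen, here is a minimal Python sketch that runs the same XPath expressions against a tiny made-up document. Everything in SAMPLE is hypothetical and heavily simplified: it is not a real WWP file, and it leaves out namespaces, which real files may declare. The leading "." on each path is a requirement of Python’s ElementTree module, not something you would type into Oxygen’s XPath box.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; not taken from the textbase.
SAMPLE = """
<text>
  <front>
    <titleBlock>
      <docAuthorization>Entered according to Act of Congress by
        <docRole>A. Printer</docRole></docAuthorization>
    </titleBlock>
  </front>
  <body>
    <div type="docAuthorization"><closer>Clerk of the District</closer></div>
    <floatingText type="letter"><body><p>Dear Madam</p></body></floatingText>
    <floatingText type="poem"><body><p>A verse</p></body></floatingText>
    <div type="index"><p>Index entries</p></div>
  </body>
</text>
"""

root = ET.fromstring(SAMPLE)

# The searches from the list above, from widest to narrowest.
paths = [
    ".//docAuthorization",
    ".//titleBlock//docAuthorization",
    ".//titleBlock//docAuthorization//docRole",
    ".//docAuthorization//closer",        # expected to find nothing
    ".//div[@type='docAuthorization']",   # the letter-like case
    ".//floatingText",
    ".//floatingText[@type='letter']",
    ".//div[@type='index']",
]

# Count the matching elements for each path.
results = {path: len(root.findall(path)) for path in paths}

for path, hits in results.items():
    print(f"{hits:2d}  {path}")
```

A count of zero, as with the <closer> search here, is the cue to step back and try a wider search, just as described above.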