Conversion task chart

From Digital Scholarship Group

TTD for P5 Migration

conversion of instances

Last column, "scope", indicates:

  • req = simple, straightforward, required, automatable conversion
  • req+ = required, possibly automatable conversion that may not be so simple (e.g., what to do w/ attributes of an element that is being deleted?). The number of plus signs is a (very) rough indication of difficulty
  • maybe = not required by P4 → P5 conversion, but other encoding projects are likely to be interested
  • WWP = a WWP-specific change, generally not useful to others


task automatable pre-hand hand post-hand scope
id= to xml:id= TEI_id2uri.xslt req
id= of <language> to ident=[3] TEI_id2uri.xslt req
method= of <normalization>: "tags"→"markup" TEI_generic.xslt req
TEI.2 to TEI root_ns.xslt req
namespace root_ns.xslt req
update <editionStmt> WWP_specific.xslt WWP
ensure <floatingText> in proper environ maybe?
<change> TEI_generic.xslt req+
IDREF to URI bare name TEI_id2uri.xslt req
remove <imprint>s in <bibl> TEI_generic.xslt req+
<namespace> in <tagsDecl> TEI_generic.xslt req
<ent> to <name type="ent"> TEI_generic.xslt req
<desc> not desc= no occurrences[6] req
anchored=yes/no to true/false TEI_generic.xslt req
add ref= (or key= temporarily) to all names and <rs> no hand WWP
embedded <text> to <floatingText> TEI_generic.xslt req
eliminate part= on <quote> and except in poetry WWP_specific.xslt WWP
convert part= to next/prev on <quote> and in poetry WWP_departed.xslt fix-up WWP
move from child of <lg> to children of <l> WWP_metQuot.xslt fix-up WWP
convert <sic> encoding to <choice> TEI_generic.xslt req
convert <orig> encoding to <vuji> WWP_specific.xslt WWP
convert encoding to <choice>, with <am> and <ex> for letter-level [8] req+
duplicate <lb> and other milestone elements within <choice> [7] post maybe
eliminate play-specific portion of who= prefix no hand WWP
eliminate <docTitle> WWP_specific.xslt WWP
change &ornament; encoding to "pre(deco)" and "post(deco)" or similar WWP_specific.xslt [5] WWP
change half-titles to <head> where permitted yes[2] WWP
eliminate use of <ptr> in TOCs that lack page numbers WWP_specific.xslt WWP
move target= attribute in TOCs from <ref> to enclosing <item> WWP_specific.xslt WWP
change <mcr> within name elements to <hi> WWP_specific.xslt WWP
<handList> → <handNotes> TEI_generic.xslt req
<hand> → <handNote> TEI_generic.xslt req++
<ps> → <postscript> WWP_specific.xslt WWP
<xref>, <xptr> → <ref>, <ptr> not worth[1] syd[1] req+++
<figure> yes? req
date attributes partially hand req++
lang= to xml:lang= TEI_id2uri.xslt done req++
move 's inside name elements partially hand WWP
move punctuation outside name elements? partially WWP
require <docImprint>, add where necessary partially hand WWP
add cit= attribute to <quote> yes WWP
add name encoding for titles of nobility no hand WWP
add <placeName> encoding inside <persName> no hand WWP
add <rs type="properAdjective"> no hand WWP
add metaRef= to metaphorical names no yes WWP
review all name encoding for category correctness no hand WWP
convert long narrative quotations to <floatingText> no yes maybe
add type= to <floatingText> no yes maybe
add <docRole> with type= encoding for all roles no hand WWP
eliminate <publisher>, <printer> if present, convert to <docRole> [4] WWP
add type="referenceList" as a possible kind of <list> no hand WWP
delete our subtitles in headers WWP_specific.xslt WWP
alter how <extent> is handled WWP_specific.xslt maybe
<hi type="dic"> → <hi rend="type(#DIC)"> WWP_specific.xslt WWP
use rendition= not rend= iff PUA char WWP_PUA_chars.xslt [9] WWP
to= → spanTo= on <addSpan> & <delSpan> TEI_generic.xslt req
type= → type=, subtype= of <lg> WWP_specific.xslt maybe
fix errata list notes & refs in cowley.dramas subfiles no hand
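
Three of the simplest "req" rows in the chart (id= to xml:id=, lang= to xml:lang=, and anchored=yes/no to true/false) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the contents of TEI_id2uri.xslt or TEI_generic.xslt; the function name and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def convert_attributes(root):
    """Apply three 'req' attribute conversions from the chart, in place."""
    for el in root.iter():
        # id= to xml:id=
        if "id" in el.attrib:
            el.set(f"{{{XML_NS}}}id", el.attrib.pop("id"))
        # lang= to xml:lang=
        if "lang" in el.attrib:
            el.set(f"{{{XML_NS}}}lang", el.attrib.pop("lang"))
        # anchored=yes/no to true/false
        anchored = el.get("anchored")
        if anchored in ("yes", "no"):
            el.set("anchored", "true" if anchored == "yes" else "false")
    return root


# Hypothetical P4-style fragment, for illustration only.
sample = ET.fromstring(
    '<text id="t1" lang="en"><note id="n1" anchored="yes">x</note></text>'
)
convert_attributes(sample)
print(ET.tostring(sample, encoding="unicode"))
```

The sketch leaves out the "IDREF to URI bare name" row, which needs to know which attributes are IDREFs before a "#" can be prepended to their values.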

Questionable

remove highlighting of 's where it exists?

Remaining to be decided

values for type= on <text> and <floatingText>
use <text> for poems??
encoding of <tables>?
whitespace rendition?

Larger and prerequisite tasks

Harvest placenames and orgnames and create place/orgography
Harvest bibls and create bibliography
Harvest persnames and supplement personography

post-conversion

  • do stuff in chart above
  • generate DTD (/)
  • compile DTD (/)
  • ensure Emacs/psgml invokes proper DTD on TB files (/)
    • currently done by using DOCTYPE declaration
    • investigate using some other method, so we can drop DOCTYPE
  • find place for custom documentation (i.e., HTML from ODD) (?) (/opt/local/share/doc/wwpstore/, but this is not web-accessible --- still need to symlink or copy it to web area)
  • find place for customization (i.e., ODD file) (/) (/opt/local/share/xml/wwpstore/odd/)
  • finish writing custom documentation (i.e., prose of ODD)
  • update C-c C-v to validate files properly (/)
  • add Schematron validation (to C-c C-v?) (/)
  • update Emacs registers, if needed (/)
  • update wwp-smart-return-context-alist and any needed parts of wwp-smart-return-default-functions (/) --- seems OK, but not thoroughly tested
  • update C-c C-L (/)
  • update wwp-ignore-markup-regexp, if needed (/) (not needed)
  • add '_' to list of NAME characters in our SGML declaration, because P5 DTD uses it in parameter entity names (/)
  • update run-xslt-on-TB (low priority, as it currently works)
  • update create_WWP_tb_and_corpus.bash
  • overhaul supraValidate.perl
  • decide on proper place to store schemas & move them there
  • create a TB version of gxp.bash?
  • consider adding well-formedness check to save-buffer
  • rewrite transcriptionData2wwp-store.xslt and if needed update tadpole.perl
  • rewrite validate_these_files.bash --- maybe use a Makefile, instead?
  • strip P4 stuff out of internal documentation
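
For the "consider adding well-formedness check to save-buffer" item, the check itself might look roughly like the sketch below. It is in Python with only the standard library; the function name is hypothetical, and how such a check would be called from Emacs on save is not addressed here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed XML, else a short message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None


print(well_formedness_error("<TEI><text/></TEI>"))
print(well_formedness_error("<TEI><text></TEI>"))
```

One caveat, stated as an assumption: this parser does not read an external DTD, so a file that still relies on entities declared there may be reported as an error even if it is otherwise fine.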

Notes

[1] Only 2 files use <xref>, and only 1 of 'em has extended pointers. I will probably just convert these by hand using query-replace-regexp, rather than trying to automate.

[2] There are very few of these. See e-mail "half Titles --> headings" from Syd to JF & JM, 2008-09-13 17:04. As there has been no further discussion, we will be deferring any such change until after P5 conversion.

[3] Pretty silly thing to do, as we just nuked all <langUsage> and <language> in TEI_generic.xslt, anyway.

[4]  There are no <printer> elements in the textbase at all, and all of the <publisher> elements are within the <teiHeader> or in a <bibl>, and thus should not be changed. So there is nothing to do. See e-mail exchange on "<publisher> --> <docRole>" of 2008-10-26.

[5]  At least, WWP_specific.xslt handles all those that are in <mw> by spitting out the <mw> with a type=border-ornamental. As for those not in <mw>, I hope to hear back from John soon.

[6]  other than the 865 on <gap> and 38 on <unknown> that we've decided will remain desc=.

[7]  This is, in theory, automatable. But it would be quite hard, and there are only 16 occurrences, so we're just going to do them by hand.

[8]  This is essentially the same problem pre- and post- P5, non-essential, and quite hard to do. Thus I'm deferring this for now.

[9]  We may want to go through the resulting generated <rendition> elements looking for redundancies and resolving them. This is probably something that could be automated, but would be faster to do by hand, as there are only some 287 occurrences in 7 files.
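
If the redundancy check in note [9] were automated, one possible approach is to group <rendition> elements by their whitespace-normalized content and list the groups with more than one member. The sketch below is in Python with only the standard library; the function name and the sample content ("pre(x)", "post(y)") are hypothetical and not taken from the textbase.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def redundant_renditions(root):
    """Map each repeated <rendition> content to the xml:id values that share it."""
    groups = defaultdict(list)
    for el in root.iter():
        # Compare on the local name so this works with or without a namespace.
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == "rendition":
            content = " ".join("".join(el.itertext()).split())
            groups[content].append(el.get(XML_ID))
    return {content: ids for content, ids in groups.items() if len(ids) > 1}


# Hypothetical <tagsDecl> fragment, for illustration only.
sample = ET.fromstring(
    "<tagsDecl>"
    '<rendition xml:id="r1">pre(x)</rendition>'
    '<rendition xml:id="r2">post(y)</rendition>'
    '<rendition xml:id="r3"> pre(x) </rendition>'
    "</tagsDecl>"
)
print(redundant_renditions(sample))
```

Resolving a duplicate would still mean repointing every rendition= reference at the surviving element, which this sketch does not attempt.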

WWP_departed.xslt is basically the same as huntingGathering_to_pointing_quotations.xslt, except that it looks for or <quote> only when a descendant of an <lg>.

The root_ns.xslt program is basically the same as the Dot-two.xslt on the TEI wiki, except that it spits out a document in the WWP textbase storage namespace (instead of the main TEI namespace).
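
The effect described for root_ns.xslt (rename TEI.2 to TEI and put every element into a target namespace) can be illustrated with the following Python sketch, standard library only. It is not the stylesheet itself; the function name is hypothetical and the namespace URI is a placeholder, not the actual WWP storage namespace.

```python
import xml.etree.ElementTree as ET

# Placeholder URI, assumed for illustration; substitute the real namespace.
TARGET_NS = "http://example.org/ns/placeholder"


def into_namespace(root, ns=TARGET_NS):
    """Rename TEI.2 to TEI and move un-namespaced elements into ns, in place."""
    for el in root.iter():
        # Skip anything that already has a namespace.
        if not isinstance(el.tag, str) or el.tag.startswith("{"):
            continue
        local = "TEI" if el.tag == "TEI.2" else el.tag
        el.tag = f"{{{ns}}}{local}"
    return root


sample = ET.fromstring("<TEI.2><teiHeader/><text><body/></text></TEI.2>")
into_namespace(sample)
print(sample.tag)
```

Attributes are left untouched, since un-prefixed attributes are not in any namespace.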