Share WWP XML

From Digital Scholarship Group

How to bundle up the WWP’s WWO XML for export or sharing

At the moment (2018-10-08) this does not work entirely on wwp-test.

Saxon 9.4 on wwp-test will not load an .xq file from GitHub, so the word lists are not generated. (The rest is OK, but that means the README lies.) Upgrading Saxon to 9.8 or later may be all it takes to fix this.

Anyway…

These instructions are for creating an archive of distribution/ that works, i.e., one in which the paths to the schemas are correct, etc. This is useful if you want a “local” copy of the textbase (i.e., all files in distribution/ and their supporting files) either to play with on your own system or to send to someone interested in our texts.

  1. Log in to wwp-test.
  2. Issue /var/www/html/WWP/utils/bin/createWWPcorpus.bash.
  3. Read the lovely progress messages as you wait perhaps 3 full minutes.
  4. Done. Your output is in /tmp/WWO.tar.gz.

Note: it is likely you will get the error messages:

mkdir: cannot create directory `/tmp/WWO': File exists
ERROR: could not create temporary directory /tmp/WWO.

because I rarely erase that directory once I've created it. To delete it, issue rm -fr /tmp/WWO/ and then re-run createWWPcorpus.bash. Be very careful with the rm -fr command, by the way: it doesn't ask, it just deletes.
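A slightly more cautious way to do that cleanup can be sketched as a small shell function. This is only an illustration: remove_stale_workdir is a hypothetical helper, not part of the WWP utilities, and the default path /tmp/WWO is taken from the error message above.

```shell
# Remove a stale work directory before re-running createWWPcorpus.bash.
# remove_stale_workdir is a hypothetical helper, not part of the WWP utilities.
remove_stale_workdir() {
  local dir="${1:-/tmp/WWO}"
  # Strip trailing slashes so "/tmp/WWO/" and "/tmp/WWO" are treated alike.
  while [ "${dir}" != "/" ] && [ "${dir%/}" != "${dir}" ]; do
    dir="${dir%/}"
  done
  # Refuse the obviously catastrophic targets.
  case "${dir}" in
    "/"|"/tmp")
      echo "refusing to remove '${dir}'" >&2
      return 1
      ;;
  esac
  if [ -d "${dir}" ]; then
    rm -rf -- "${dir}"
    echo "removed ${dir}"
  else
    echo "nothing to remove at ${dir}"
  fi
}
```

Calling remove_stale_workdir with no argument targets /tmp/WWO; after it succeeds, re-run createWWPcorpus.bash as in step 2.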

When createWWPcorpus.bash is done, the directory /tmp/WWO/ has the textbase-to-go in it, and the file /tmp/WWO.tar.gz is a compressed archive thereof. Send the latter to your friends & family. Actually, it's pretty big (~18 MiB), so it would be much better to put it somewhere on the web and send your friends & family a pointer to it.

P.S. To decompress the archive and extract the files from it, cd to the directory where you put the archive file and issue tar xzvf WWO.tar.gz.
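If you would rather unpack into a directory of your choosing instead of the current one, something like the following sketch may help. The function name unpack_archive and the destination argument are assumptions made for illustration; only standard tar options (t, x, z, f, -C) are used.

```shell
# Check, then extract, a gzip-compressed tar archive into a chosen directory.
# unpack_archive is a hypothetical helper name used only for this illustration.
unpack_archive() {
  local archive="$1" dest="${2:-.}"
  if [ ! -f "${archive}" ]; then
    echo "no such archive: ${archive}" >&2
    return 1
  fi
  mkdir -p "${dest}" || return 1
  # t = list contents, z = gunzip, f = read the named file.
  # Listing first confirms the archive is readable before anything is written.
  tar tzf "${archive}" > /dev/null || return 1
  # x = extract; -C changes to the destination directory before extracting.
  tar xzf "${archive}" -C "${dest}"
}
```

For example, unpack_archive WWO.tar.gz ~/textbase would put the files under ~/textbase (a made-up destination) rather than next to the archive.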

Getting a different set of files

You can either specify the directories to use or supply a list of the actual files to use. So, if you want all the textbase files, including under_construction/ and on_deck/ in addition to distribution/, specify DIRS on the command line with

$ DIRS="/path/to/TB/distribution/ /path/to/TB/under_construction/ /path/to/TB/on_deck/" /var/www/html/WWP/utils/bin/createWWPcorpus.bash

(It does not matter a whit whether you include the trailing slash or not.) To build a corpus of only Eleanor Davies's texts, specify files on the command line with

$ files="/path/to/TB/distribution/davies.*.xml" /var/www/html/WWP/utils/bin/createWWPcorpus.bash

Note that you cannot mix and match. If both files= and DIRS= are specified, files= wins and DIRS= is ignored. If you specify a directory in the value of files=, it is ignored. (Actually, slightly worse than ignored: an empty file with the name of the directory is created and included.) I do not know what happens if you specify a file or glob in the value of DIRS=, but it is probably not good.
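Since these mistakes are easy to make, a pre-flight check along the following lines could catch them before the script runs. This is a sketch under assumptions: check_corpus_selection is a hypothetical helper that never calls createWWPcorpus.bash, and it assumes that both values are whitespace-separated lists in which globs are expanded, as the examples above suggest.

```shell
# Pre-flight check for the files= and DIRS= values described above.
# check_corpus_selection is a hypothetical helper; it only inspects its
# two arguments and does not run createWWPcorpus.bash.
check_corpus_selection() {
  local files_value="$1" dirs_value="$2" entry status=0
  if [ -n "${files_value}" ] && [ -n "${dirs_value}" ]; then
    echo "warning: both files= and DIRS= are set; DIRS= will be ignored"
  fi
  # Unquoted on purpose (an assumption about how the script reads the value):
  # split on whitespace and expand globs such as davies.*.xml.
  for entry in ${files_value}; do
    if [ -d "${entry}" ]; then
      echo "error: files= contains a directory: ${entry}"
      status=1
    fi
  done
  for entry in ${dirs_value}; do
    if [ ! -d "${entry}" ]; then
      echo "error: DIRS= contains a non-directory: ${entry}"
      status=1
    fi
  done
  return "${status}"
}
```

One way to use it is to call check_corpus_selection "$files" "$DIRS" first and to invoke createWWPcorpus.bash only if it returns success.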