2015-10-15

From Digital Scholarship Group

WWP have released MARC records, made from their TEI, very slick. Have own OCLC code! Will write blog post about process.

Update on DRS: new projects coming soon, hopefully this semester before December. Ideally these are all completed before component upgrade in Spring.

Group display has been improved: it now shows the actual label, not just the ID number. People who upload and edit can add embargo dates on their own and don't have to contact Sarah. Metadata display now includes subtitle for results. Sarah can upload files larger than 500MB -- no limits! Do not ask her to upload TB.

Statistics. Have been recording stats, wanting to make sure they were "true" statistics. IRis counted bot visits the same as person visits, inflating counts. DRS will distinguish between bot visits and person visits. Bot visits = being indexed, but that's different from counting as person visits.

IRis has OAI built in, with people actively using it. Not a native part of Hydra, so had to develop/plug in.

File sizes now recorded, used internally for billing and space planning. ?????

Sets will be opened up to group creators. Existing spreadsheet loaders for MarComm and EMSA pull in images en masse and grab metadata from ITCP. Working on Catskills loader, and next an arbitrary loader that can handle any MODS.

Library RADS metadata folks update 1-30 items per day, using XML editor.

Started on one server; through process of blowing up boxes, now have 5.
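A minimal sketch of the bot/person distinction described above, assuming visits are classified by user-agent string. The pattern and function names are hypothetical and are not the DRS implementation; a real list of crawler signatures would be much longer.

```python
import re
from collections import Counter

# Hypothetical pattern: substrings that commonly appear in crawler user agents.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)


def classify_visit(user_agent):
    """Return 'bot' or 'person' for one user-agent string."""
    return "bot" if BOT_PATTERN.search(user_agent or "") else "person"


def count_visits(user_agents):
    """Tally visits by kind so bot traffic does not inflate person counts."""
    return Counter(classify_visit(ua) for ua in user_agents)
```

Keeping both tallies, rather than discarding bot visits, preserves the indexing signal while reporting person visits separately.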

Some of the features we've developed in Hydra will be rolled into Fedora 4. We were ahead of the game.

193 gems in Rails -- modules we use that others wrote. 76 gem dependencies -- can't upgrade any of those 76 without affecting others. What we would call tightly coupled.

Developers at Stanford and Penn State are on the bleeding edge of Hydra gems. We are a few versions behind for some major gems. We are starting to encounter problems that have been solved elsewhere, but we can't take advantage of the fixes because we aren't on the newest versions of Hydra, Fedora, Rails, Blacklight, Bootstrap.

Will try to do big upgrade January to July 2016.

Want to do this now, when we have stability and growing user base, before really opening the gates.

Fedora 4: better RDF implementation. Moves to Portland Common Data Model. File projection: this file lives somewhere, pretend like you own it.

How do we do this? Our setup involves multiple servers, so we currently have to do a feature freeze: we will fix bugs but add no new features until the upgrade is done. Feature freeze from January to July just to get system ready. Fedora 4 migration after July.

Are there differences that will be perceptible to the user? Especially with move to RDF? Not much difference to user. RDF will provide easier data for visualizations, re-use of data.

Providing raw XML to edit has worked well for human editors (RADS, students, etc.); RDF is less human-editable.

RDF and SPARQL provide a much faster way to query large datasets, as opposed to long "or" statements with Solr.
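To illustrate the contrast in query shape only, here is a rough sketch that builds both kinds of query string. The field name, the predicate URI, and the function names are hypothetical, and nothing here is taken from the DRS codebase.

```python
def solr_or_query(field, values):
    """Build a Solr-style query that ORs together every wanted value."""
    terms = " OR ".join(f'"{v}"' for v in values)
    return f"{field}:({terms})"


def sparql_members_query(collection_uri):
    """Build a SPARQL query that follows a relationship instead of listing IDs.

    The predicate below is a made-up example URI, not a real vocabulary term.
    """
    return (
        "SELECT ?item WHERE { "
        f"?item <http://example.org/isMemberOf> <{collection_uri}> . "
        "}"
    )
```

The Solr string grows with every identifier added, while the SPARQL string stays the same length however many items belong to the collection.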

Portland Common Data Model = Fedora/Hydra agreement on standards for relationships between objects. Makes things much more interoperable re: gems, Hydra creation.

With spreadsheet loader, is there a size limit? How do, say, giant video files work?

Individual file upload and download no longer in server memory, no longer causes server to crash.
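One common way to keep a transfer out of server memory is to copy it in fixed-size chunks. The sketch below shows that general technique under assumed names and an assumed chunk size; the notes do not say how DRS actually does it.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an assumed value


def copy_in_chunks(source, destination, chunk_size=CHUNK_SIZE):
    """Copy a binary stream piece by piece and return the bytes written.

    Only one chunk is held in memory at a time, whatever the file size.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

Both arguments are ordinary binary file objects, for example files opened with mode "rb" and "wb".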

Derivative creation is also now done on a separate server, which helps. For video, may look into other offline-ish ways to create derivatives, putting derivative creation on the user.

May need to define a window where large items have right of way. Secondary processing box provides queued jobs, so can develop orderings that balance server load.
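The "right of way" idea could be sketched as a priority queue in which large jobs jump ahead only while a designated window is open. The class name, the threshold, and the priority values are all assumptions made for illustration, not the behavior of the actual processing box.

```python
import heapq
import itertools


class JobQueue:
    """Order jobs so large items run first only during their window."""

    def __init__(self, large_threshold_bytes=500 * 1024 * 1024):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order
        self.large_threshold = large_threshold_bytes  # assumed cutoff

    def add(self, name, size_bytes, large_window_open=False):
        """Queue a job; a lower priority number runs sooner."""
        if size_bytes >= self.large_threshold:
            priority = 0 if large_window_open else 2
        else:
            priority = 1
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def pop(self):
        """Return the name of the next job to run."""
        return heapq.heappop(self._heap)[2]
```

Outside the window, small jobs go first so routine work is not stuck behind a large video; inside it, large jobs are served first.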

For next round of Pilots: Toolkit lives with API, so any API work can continue. As long as we choose to stick with API we're fine. Any large changes are grant funded and wouldn't get funding and start until after Fedora 4 anyway.

Sets and loaders are partially on the list because they are vital to Toolkit.

Bulk downloads in CSV format (as Bahare needs for GIS work) are internal-only for right now. Eli has done this for Bahare using the API. Think about limits where people could request CSV formats, possibly put a more automated request process into place.

Want to develop "curator's workbench" where depositors (Sarah and Joey, say) get more technical info like stats, types of file formats, that would make inventory and digital preservation easier. File format queries can currently be run on bespoke basis.

Unicode: Hydra works with XML through the Nokogiri gem, plus other components. All support UTF-8, but for some reason Hydra was stripping out the prolog (the declaration specifying UTF-8); without an explicit statement of UTF-8, Nokogiri would display HTML, screwing things up. Basic foundation -- Fedora itself, Solr, Hydra -- all support Unicode. It's the combo of gems that can cause trouble. Now translate smart quotes to simple quotes -- dumbs it down -- to make unified, avoid other translation problems.
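The two workarounds mentioned above, straightening smart quotes and making the UTF-8 declaration explicit, can be sketched roughly as follows. The function names are hypothetical, and the real fix lives in the Ruby/Hydra stack rather than in Python.

```python
# Map the four common curly quotes to their plain ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
})

XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>'


def straighten_quotes(text):
    """Replace curly single and double quotes with simple quotes."""
    return text.translate(SMART_QUOTES)


def ensure_utf8_declaration(xml_text):
    """Prepend an XML declaration if the document does not start with one."""
    if xml_text.lstrip().startswith("<?xml"):
        return xml_text
    return XML_DECLARATION + "\n" + xml_text
```

Other non-ASCII characters pass through untouched, so only the quotes are "dumbed down".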

With specialized Cherokee needs: as long as we can download the specialized font and provide for users, we can display it. Worst case: we may need to purchase and serve it up.

GIS update from Bahare: busy with outreach. Gave presentation to information visualization class, gave one full workshop for urban design class, and upcoming presentation for Open Access October.

Toolkit news: Archives seems pleased with system, launching sites for Arader, Sadow, Archives soon.