TextAnalysis
Revision as of 04:02, 24 February 2016
Resources for Exploring Text Analysis
R
- Matthew Jockers, Text Analysis with R for Students of Literature (PDF available for download via the Northeastern University (NEU) Library)
- Download and install R
- Download and install RStudio
- RSeek (search tool for finding resources on R)
- Simple data types in R
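The resources above work in R. As a language-neutral illustration (written in Python rather than R, and not taken from Jockers' book), here is a minimal sketch of a common first step in text analysis: tokenizing a text and computing relative word frequencies. The regular-expression tokenizer is a simplifying assumption; real projects usually need more careful handling of hyphens, numbers, and non-English characters.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens: list[str]) -> dict[str, float]:
    """Map each word type to its share of all tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}


sample = "The cat sat on the mat. The dog sat too."
tokens = tokenize(sample)
freqs = relative_frequencies(tokens)

# Print the most frequent words first.
for word, share in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{word}\t{share:.2f}")
```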
Topic Modeling
- Megan R. Brett's "Basic Introduction"
- Scott Weingart's "Guided Tour"
- Ben Schmidt's post on the limitations of latent Dirichlet allocation (LDA)
- MALLET (an open-source, Java-based LDA package)
- GUI tools that use MALLET
- Stanford Topic Modeling Toolbox
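To make the LDA model behind these tools more concrete, here is a toy collapsed Gibbs sampler in Python. It is a sketch for small examples, not MALLET's implementation; the hyperparameter defaults (alpha=0.1, beta=0.01) and the function names lda_gibbs and top_words are illustrative assumptions. Each token's topic is resampled in proportion to (n_dk + alpha)(n_kw + beta)/(n_k + V*beta), where n_dk counts the tokens in document d assigned to topic k, n_kw counts how often word w is assigned to topic k, n_k is the total count for topic k, and V is the vocabulary size.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to `docs` (lists of tokens) with collapsed Gibbs sampling.

    Returns (topic_word counts, doc_topic counts, vocab).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # n_dk
    topic_word = [[0] * V for _ in range(K)]   # n_kw
    topic_total = [0] * K                      # n_k

    # Start with a random topic for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional for each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=3):
    """List the n highest-count words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


docs = [
    "whale ship sea harpoon whale sea".split(),
    "ship sea whale captain harpoon".split(),
    "court judge law trial judge".split(),
    "law trial court verdict judge".split(),
]
topic_word, doc_topic, vocab = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word, vocab))
```

With four tiny documents the output depends on the random seed; real corpora need far more text, more iterations, and usually stopword removal.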
word2vec
- Ben Schmidt's blog post on vector space models, which links to his R wrapper package for word2vec
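word2vec learns dense word vectors with a shallow neural network. The sketch below does not do that: it swaps in a simpler, count-based vector space model (co-occurrence counts within a fixed window) to show the idea the two share, comparing words by the cosine of the angle between their vectors. The window size and the function names cooccurrence_vectors, cosine, and nearest are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(word, vectors, n=3):
    """Return the n words whose vectors are most similar to `word`'s."""
    target = vectors.get(word, Counter())  # .get avoids inserting into the defaultdict
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:n]


tokens = "the king rules the land the queen rules the land".split()
vectors = cooccurrence_vectors(tokens)
print(nearest("king", vectors))
```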
Miscellaneous text analysis tools
- Voyant Tools
- Laurence Anthony's AntConc, a GUI concordancing and text analysis toolkit
- David McClure's TextPlot (produces a force-directed network of the words in a text, based on kernel density estimates of where each word appears)
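Concordancers such as AntConc display keyword-in-context (KWIC) lines: every occurrence of a search term, centered between a fixed amount of surrounding text. Here is a minimal Python sketch of the idea; it is not AntConc's implementation, and the kwic function and its width parameter are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per whole-word match of `keyword`, with `width` characters of context on each side."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        # Collapse whitespace (including newlines) so each hit fits on one line.
        left = " ".join(text[max(0, m.start() - width):m.start()].split())
        right = " ".join(text[m.end():m.end() + width].split())
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


sample = "The whale rose. A whale! No whales here."
for line in kwic(sample, "whale", width=12):
    print(line)
```

The word-boundary pattern means "whales" is not counted as a hit for "whale"; a real concordancer would also let you search by lemma or wildcard.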
Corpus building
Some places to get text
Plain text
- Project Gutenberg
- Early English Books Online (EEBO; some texts TEI-encoded)
- Early Caribbean Digital Archive (ECDA)
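Plain-text sources such as Project Gutenberg typically wrap each book in header and license text that should be removed before analysis. The sketch below loads a folder of .txt files and trims that material. The marker patterns are an assumption based on the common "*** START OF THE PROJECT GUTENBERG EBOOK" and "*** END OF THE PROJECT GUTENBERG EBOOK" lines; check them against your own files, since older downloads vary, and texts from other archives need their own cleanup. The folder name "texts" and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed Gutenberg marker lines; older files say "THIS" instead of "THE".
START = re.compile(r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                   re.MULTILINE | re.IGNORECASE)
END = re.compile(r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$",
                 re.MULTILINE | re.IGNORECASE)


def strip_gutenberg(text: str) -> str:
    """Keep only the text between the start and end markers, if both are present."""
    start, end = START.search(text), END.search(text)
    if start and end and start.end() < end.start():
        return text[start.end():end.start()].strip()
    return text.strip()


def load_corpus(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` into a {filename: cleaned text} dict."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        corpus[path.name] = strip_gutenberg(text)
    return corpus


if Path("texts").is_dir():  # hypothetical folder of downloaded plain-text files
    for name, text in load_corpus("texts").items():
        print(name, len(text.split()), "words")
```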