Difference between revisions of "TextAnalysis"
Revision as of 14:28, 2 March 2016
Resources for Exploring Text Analysis
- "Where to Start," courtesy of Ted Underwood
- Stanford's Introduction, from Tooling Up for Digital Humanities
- Brendan O’Connor, et al., "Computational Text Analysis for Social Science"
- Paul Baker, Using Corpora in Discourse Analysis (soon to be available in the NEU Library stacks); covers Corpus Building, Frequency and Dispersion, Concordance, and Collocation
- seeking suggestions for a web-based resource
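
For readers who want a concrete sense of two of the topics listed for Baker's book, frequency and concordance, here is a minimal Python sketch using only the standard library. It is not taken from the book or from any tool on this page. The function names (`tokenize`, `word_frequencies`, `concordance`), the sample sentence, and the simple regular-expression tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context lines with up to `width` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, chosen only to show the output format.
sample = "The whale surfaced. The crew watched the whale dive, and the whale was gone."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(2))
for line in concordance(tokens, "whale", width=2):
    print(line)
```

A real corpus would call for a more careful tokenizer (hyphens, Unicode, contractions), but the shape of the output is the same as what GUI concordancers display.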
Python
- Folgert Karsdorp, Python Programming for the Humanities
- Charles Severance, Python for Informatics, an applied but comprehensive introductory Python text with sections on text parsing
- Download and install Python
- Download and install PyCharm, an Integrated Development Environment (IDE) for Python
R
- Matthew Jockers, Text Analysis With R for Students of Literature (PDF available for download via the NEU Library)
- Download and install R
- Download and install RStudio, an Integrated Development Environment (IDE) for R
- RSeek, a search tool for finding resources on R
- Simple data types in R
Topic Modeling
- JDH's Special Issue on Topic Modeling (2012)
- Megan R. Brett's "Basic Introduction" (conceptual)
- Ted Underwood, "Topic modeling made just simple enough"
- Scott Weingart's "Guided Tour" (comprehensive, lots of links)
- Ben Schmidt's article on the limitations of Latent Dirichlet allocation (LDA), also from the JDH special issue
Tools
- MALLET, an open-source and Java-based Latent Dirichlet allocation (LDA) package
- Shawn Graham, Scott Weingart, and Ian Milligan's tutorial for setting up a command line environment for using MALLET
- Ben Schmidt's R package wrapping MALLET
- GUI tools that use MALLET
- Stanford Topic Modeling Toolbox (an alternative to MALLET)
word2vec
- Ben Schmidt's blog post on vector space models, which links to his R package wrapping word2vec (word2vec itself is written in C)
Miscellaneous text analysis tools
- Voyant Tools, a simple web-based analysis and visualization tool
- Lexos, a tool for scrubbing, chunking, and tokenizing text, as well as performing modest analysis and visualizing clusters
- Laurence Anthony's AntConc, a GUI concordancing and text analysis toolkit
- CasualConc, a Mac OS X-native toolkit (AntConc's Mac version is a port of the PC version and has some bugs)
- David McClure's TextPlot, a Python package that produces a force-directed network of the words in a text, with nodes clustered using kernel density estimates
- Bookworm, a customizable corpus trend visualization tool
- Word Tree, a tool that creates word trees from a block of text
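
The scrubbing, tokenizing, and chunking steps mentioned for Lexos can also be approximated by hand. The sketch below is a rough standard-library illustration of what those three terms usually mean. It is not Lexos's implementation, and the function names, the choice to strip ASCII punctuation and digits, and the sample sentence are all assumptions.

```python
import re
import string


def scrub(text):
    """Lowercase, remove ASCII punctuation and digits, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split already-scrubbed text on whitespace."""
    return text.split()


def chunk(tokens, size):
    """Cut the token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Hypothetical sample text for illustration.
raw = "Call me Ishmael. Some years ago, never mind how long precisely..."
tokens = tokenize(scrub(raw))
chunks = chunk(tokens, 4)

for number, piece in enumerate(chunks, start=1):
    print(number, " ".join(piece))
```

Chunking by a fixed token count is only one option. Splitting by chapter, by line count, or into a fixed number of equal parts follows the same pattern.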
Corpus building
Some places to get texts
Plain text
- Project Gutenberg
- Early English Books Online (EEBO) (some texts TEI-encoded)
- Early Caribbean Digital Archive (ECDA)