Difference between revisions of "TextAnalysis"

From Digital Scholarship Group
 
(39 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
*[http://toolingup.stanford.edu/?page_id=981 Stanford's Introduction, from ''Tooling Up for Digital Humanities'']
 
*[http://people.cs.umass.edu/~wallach/workshops/nips2011css/papers/OConnor.pdf Brendan O’Connor, et al., "Computational Text Analysis for Social Science"]
 
* Paul Baker, ''Using Corpora in Discourse Analysis'' (soon to be available via the NEU Library), covers Corpus Building, Frequency and Dispersion, Concordance, and Collocation
+ * Paul Baker, ''Using Corpora in Discourse Analysis'' (soon to be available in the NEU Library stacks), which covers Corpus Building, Frequency and Dispersion, Concordance, and Collocation
+ ** seeking suggestions for a web-based resource
  
 
==Python==
 
Line 10: Line 11:
 
* [https://www.python.org/downloads/ Download and install Python]
 
* [https://www.jetbrains.com/pycharm/ Download and install PyCharm], an Integrated Development Environment (IDE) for Python
 
+ * [https://ipython.org/ Download and install IPython], an interactive shell for Python
  
 
==R==
 
Line 20: Line 22:
  
 
==Topic Modeling==
 
*[http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/ Megan R. Brett's "Basic Introduction"] (conceptual)
+ *[http://journalofdigitalhumanities.org/2-1/pacing-scholarly-conversations/ ''Journal of Digital Humanities'' (JDH) special issue] on Topic Modeling (2012)
+ **[http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/ Megan R. Brett's "Basic Introduction"] (conceptual)
+ *[http://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/ Ted Underwood, "Topic modeling made just simple enough"]
 
*[http://www.scottbot.net/HIAL/?p=19113 Scott Weingart's "Guided Tour"] (comprehensive, lots of links)
 
*[http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/ Ben Schmidt's article about Latent Dirichlet allocation's (LDA's) limitations]
+ *[http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/ Ben Schmidt's article on the limitations of Latent Dirichlet allocation (LDA)] (also from the JDH special issue)
  
 
===Tools===
 
*[http://mallet.cs.umass.edu/topics.php MALLET] (An open-source, Java-based LDA package)
+ *[http://mallet.cs.umass.edu/topics.php MALLET], an open-source, Java-based Latent Dirichlet allocation (LDA) package
** [https://github.com/bmschmidt/RMallet Ben Schmidt's R package wrapping MALLET]
+ ** [http://programminghistorian.org/lessons/topic-modeling-and-mallet Shawn Graham, Scott Weingart, and Ian Milligan's tutorial] on setting up a command-line environment for MALLET
+ ** [https://github.com/bmschmidt/RMallet Ben Schmidt's R package] wrapping MALLET
 
** GUI Tools that use MALLET
 
*** [https://code.google.com/p/topic-modeling-tool/ Topic Modeling Tool] (hosted on Google Code)
 
Line 34: Line 40:
 
==word2vec==
 
* [http://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html Ben Schmidt's Blog Post on Vector Space Models]
 
** which links to his [https://github.com/bmschmidt/wordVectors R wrapper package] for word2vec
+ ** which links to his [https://github.com/bmschmidt/wordVectors R package wrapping word2vec] (word2vec is written in C)
  
 
==Miscellaneous text analysis tools==
 
* [http://voyant-tools.org/ Voyant Tools]
+ * [http://voyant-tools.org/ Voyant Tools], a simple web-based analysis and visualization tool
+ * [http://lexos.wheatoncollege.edu/upload Lexos], a tool for scrubbing, chunking, and tokenizing text, as well as performing modest analysis and visualizing clusters
+ ** [http://scottkleinman.net/blog/2014/07/25/how-to-create-topic-clouds-with-lexos/ Scott Kleinman's blog post] on "How to Create Topic Clouds with Lexos"
 
* [http://www.laurenceanthony.net/software/antconc/ Laurence Anthony's AntConc], a GUI concordancing and text analysis toolkit
 
* [https://sites.google.com/site/casualconc/ CasualConc], a Mac OSX-native toolkit (AntConc's Mac version is ported)
+ * [https://sites.google.com/site/casualconc/ CasualConc], a Mac OS X-native toolkit (AntConc's Mac version is ported from the PC and has some bugs)
* David McClure's TextPlot, a Python package that produces force-directed network of words in a text, based on estimated kernel densities
+ * David McClure's TextPlot, a Python package that produces a force-directed network of the words in a text, the nodes of which are clustered using estimated kernel densities
** [http://dclure.org/essays/mental-maps-of-texts/ Blog post explaining concept]
+ ** [http://dclure.org/essays/mental-maps-of-texts/ Blog post explaining concept of TextPlot]
** [http://dclure.org/tutorials/textplot-refresh/ Blog post to download and set up]
+ ** [http://dclure.org/tutorials/textplot-refresh/ Blog post on downloading and setting up TextPlot]
** [http://dclure.org/logs/tuning-textplot/ Blog post explaining parameters]
+ ** [http://dclure.org/logs/tuning-textplot/ Blog post explicating TextPlot's parameters]
* [http://bookworm.culturomics.org/ Bookworm]
+ * [http://bookworm.culturomics.org/ Bookworm], a customizable corpus trend visualization tool
+ * [https://www.jasondavies.com/wordtree/ Word Tree], a tool that creates [http://betterevaluation.org/evaluation-options/wordtree word trees] from a block of text
  
 
=Corpus building=
 
+ *Amanda Rust's [http://subjectguides.lib.neu.edu/textdatamining Subject Guide] on "Text and Data Mining Library Databases" (Northeastern University Libraries)
  
==Some places to get text==
+ ==Some places to get texts==
 
===Plain text===
 
*[https://www.gutenberg.org/ Project Gutenberg]
 

Latest revision as of 04:14, 16 March 2016

Resources for Exploring Text Analysis

Python

R

Topic Modeling


Tools

word2vec

Miscellaneous text analysis tools

Corpus building

  • Amanda Rust's Subject Guide on "Text and Data Mining Library Databases" (Northeastern University Libraries)

Some places to get texts

Plain text

TEI-Encoded