Difference between revisions of "TextAnalysis"

From Digital Scholarship Group
 
(21 intermediate revisions by the same user not shown)
* [https://www.python.org/downloads/ Download and install Python]
* [https://www.jetbrains.com/pycharm/ Download and install PyCharm], an Integrated Development Environment (IDE) for Python
* [https://ipython.org/ Download and install IPython], an interactive shell for Python

==R==

==Topic Modeling==
*[http://journalofdigitalhumanities.org/2-1/pacing-scholarly-conversations/ ''Journal of Digital Humanities'' special issue] on topic modeling (2012)
**[http://journalofdigitalhumanities.org/2-1/topic-modeling-a-basic-introduction-by-megan-r-brett/ Megan R. Brett's "Basic Introduction"] (conceptual)
*[http://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/ Ted Underwood, "Topic modeling made just simple enough"] (interpreting the results)
*[http://www.scottbot.net/HIAL/?p=19113 Scott Weingart's "Guided Tour"] (comprehensive, lots of links)
*[http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/ Ben Schmidt's article on the limitations of latent Dirichlet allocation (LDA)] (also from the ''Journal of Digital Humanities'' special issue)

===Tools===
 
*[http://mallet.cs.umass.edu/topics.php MALLET], an open-source, Java-based toolkit that includes latent Dirichlet allocation (LDA) topic modeling
** [http://programminghistorian.org/lessons/topic-modeling-and-mallet Shawn Graham, Scott Weingart, and Ian Milligan's tutorial] on setting up and using MALLET from the command line
** [https://github.com/bmschmidt/RMallet Ben Schmidt's R package] wrapping MALLET
** GUI tools that use MALLET
*** [https://code.google.com/p/topic-modeling-tool/ Topic Modeling Tool] (hosted on Google Code)

==Miscellaneous text analysis tools==
* [http://voyant-tools.org/ Voyant Tools], a simple web-based analysis and visualization tool
* [http://lexos.wheatoncollege.edu/upload Lexos], a tool for scrubbing, chunking, and tokenizing text, which can also perform modest analysis and visualize clusters
** [http://scottkleinman.net/blog/2014/07/25/how-to-create-topic-clouds-with-lexos/ Scott Kleinman's blog post] "How to Create Topic Clouds with Lexos"
* [http://www.laurenceanthony.net/software/antconc/ Laurence Anthony's AntConc], a GUI concordancing and text analysis toolkit
* [https://sites.google.com/site/casualconc/ CasualConc], a native Mac OS X toolkit (AntConc's Mac version is ported from the PC version and has some bugs)
 
** [http://dclure.org/tutorials/textplot-refresh/ Blog post on downloading and setting up TextPlot]
** [http://dclure.org/logs/tuning-textplot/ Blog post explaining TextPlot's parameters]
* [http://bookworm.culturomics.org/ Bookworm], a customizable corpus trend visualization tool
* [https://www.jasondavies.com/wordtree/ Word Tree], a tool that creates [http://betterevaluation.org/evaluation-options/wordtree word trees] from a block of text

=Corpus building=
*Amanda Rust's [http://subjectguides.lib.neu.edu/textdatamining Subject Guide] on "Text and Data Mining Library Databases" (Northeastern University Libraries)

==Some places to get texts==

Latest revision as of 04:14, 16 March 2016

Resources for Exploring Text Analysis

Python

R

Topic Modeling


Tools

word2vec

Miscellaneous text analysis tools

Corpus building

  • Amanda Rust's Subject Guide on "Text and Data Mining Library Databases" (Northeastern University Libraries)

Some places to get texts

Plain text

TEI-Encoded