Digital Humanities and Digital Scholarship: Text analysis

Digital humanities services, tools, and selected bibliography from WSU librarians

Getting Started

Voyant

Voyant is an online text analysis tool.  It lets users upload or paste text and returns visualizations and analysis of that text.  It includes standard text analysis features such as topic modeling, word frequency, collocation of phrases, and more.
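To give a sense of what features such as word frequency and collocation compute, here is a minimal Python sketch. It is not how Voyant itself is implemented; the function names and the sample sentence are made up for illustration, and "collocation" is simplified here to counting adjacent word pairs.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how many times each word occurs."""
    return Counter(tokens)


def adjacent_pairs(tokens):
    """Count pairs of words that appear next to each other (a crude stand-in for collocation)."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, used only for illustration.
sample = "The whale surfaced. The white whale dived, and the white whale was gone."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(adjacent_pairs(tokens).most_common(2))
```

Real tools usually go further, for example by ignoring very common words and by scoring pairs statistically rather than by raw counts.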

JSTOR Data for Research

This tool allows free exploration of a subset of materials from JSTOR.  It includes standard text analysis features such as topic modeling, word frequency, collocation of phrases, and more.  You can read more about it here.

TAPoR 3

Text Analysis Portal for Research (TAPoR) is a portal for discovering text analysis tools.

Bookworm

Tool that allows users to view word frequencies in large, pre-loaded text corpora, including Open Library digitized books.

Examples and Inspiration

Text analysis, sometimes referred to as text analytics or text mining, is the process of deriving new information from a body of text.  It is one of the more established areas of digital humanities, but it continues to grow in capability and scope.

Intro to LDA image

Image credit: "Topic Modeling and Network Analysis", Scott Weingart

Learning More

The following are more advanced tools for text analysis.  They often require some installation or a small amount of programming experience.

Python Natural Language Toolkit (NLTK)

NLTK is a library that can be used in Python applications for importing, cleaning, preparing, and analyzing text.  Many text analysis systems rely on it for this kind of preparatory work on textual materials.
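The cleaning and preparing step usually means lowercasing the text, splitting it into tokens, and dropping very common "stop words". The sketch below shows that idea using only the Python standard library. It is a stand-in rather than NLTK's own API: the `prepare` function and the short `STOPWORDS` set are hypothetical, and NLTK provides its own tokenizers and much fuller stop word lists.

```python
import re

# A tiny, hypothetical stop word list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "it"}


def prepare(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and remove stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [token for token in tokens if token not in stopwords]


# Hypothetical sample sentence, used only for illustration.
raw = "It was the best of times, it was the worst of times."
print(prepare(raw))
```

The cleaned list of tokens is what later steps, such as frequency counts or topic modeling, would typically take as input.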

Gensim: Topic Modelling for Humans

Another Python library.  Slightly higher-level than NLTK above, gensim is a library for more advanced text analysis, including topic modeling with Latent Dirichlet Allocation (LDA) models.  Tutorials can be found here.

Stanford Named Entity Recognizer (NER)

Java-based tool for extracting proper nouns (people, places, events, etc.) from a body of text.

MALLET

MALLET is another Java-based suite of tools used for text analysis, including document classification, sequence tagging, and topic modeling.