
Digital Humanities and Digital Scholarship: Data Wrangling

Digital humanities services, tools, and selected bibliography from WSU librarians

Additional Resources

Places for Storing Data

Cloud Storage

Cloud storage is a convenient way to store and manage data over the long term, as it allows for easy sharing, automatic backup, and versioning (automatic saving of copies at different points in time), and can sometimes even be connected directly to DH tools.  Google Drive and Dropbox are two simple, popular options.  Amazon S3 has a steeper learning curve, but underpins many DH projects.

Your Computer

For many, their own computer is the best option for storing data!  Just remember, if you're going to store and manage data locally, come up with a plan.  Consider making a "projects" or "DH" folder dedicated to DH work (this can help you keep things together).  Other simple tricks, like avoiding spaces or special characters in filenames, can make your life easier.

Overview

Data "wrangling" is a critical part of the workflow for any Digital Humanities (DH) project.  Many, if not most, DH projects rely on some kind of corpus of data (see more information here).  Before this data can be analyzed, whether through text analysis, image processing, data visualization, mapping, or something else, it needs to be massaged into a form that is appropriate for the analysis software or process.  Finding data, saving data, managing data, manipulating data, and providing access to your data are all important parts of any DH project.

Data wrangling is more of an art than a science, and involves finding workflows that work for you!  The library is happy to help brainstorm tools and workflows that fit your project.

 

Scripting and Programming

Python

Python is one of the most popular programming languages around, and is particularly well suited for "wrangling" data.  It can be used to rename, convert, or move files, among many other things.  Here is just one example of the file management you can do with Python.
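As a minimal sketch of this kind of file management, the script below renames files so that spaces and special characters become underscores.  The function names (safe_name, rename_all) and the sample filename are made up for illustration, and the script only touches a temporary folder that it creates itself.  It does not check for name collisions, so treat it as a starting point rather than a finished tool.

```python
import re
import tempfile
from pathlib import Path


def safe_name(filename):
    """Return a filename with spaces and special characters replaced by underscores."""
    path = Path(filename)
    # Keep letters, digits, hyphens, and underscores; replace each run of
    # anything else with a single underscore, then trim stray underscores.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    return stem + path.suffix.lower()


def rename_all(folder):
    """Rename every file in `folder` to its safe name and return the new names."""
    new_names = []
    for item in sorted(folder.iterdir()):
        if item.is_file():
            target = item.with_name(safe_name(item.name))
            # Assumption: no two files map to the same safe name.
            if target != item:
                item.rename(target)
            new_names.append(target.name)
    return sorted(new_names)


if __name__ == "__main__":
    # Demonstrate on a throwaway folder so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "My Letter (draft #2).TXT").write_text("hello")
        print(rename_all(folder))
```

To try it on your own files, you would call rename_all with the path to a project folder, ideally after making a backup copy first.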

Common File Formats

.csv - Comma Separated Values

Comma- and tab-delimited files are very similar to Excel spreadsheets, but instead of being laid out in visual "columns," the values in each row are separated by commas or tabs.  They are one of the most common formats for getting, organizing, and sharing data, because they are "plain" text files that almost any computer can read.
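To make the format concrete, here is a short sketch that reads comma-delimited text with Python's built-in csv module.  The sample rows and the read_rows helper are invented for illustration; a real project would open a file on disk rather than a string.

```python
import csv
import io

# Hypothetical sample data: the first line is the header row.
sample = (
    "title,author,year\n"
    "Frankenstein,Mary Shelley,1818\n"
    "Dracula,Bram Stoker,1897\n"
)


def read_rows(text, delimiter=","):
    """Parse delimited text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = read_rows(sample)
for row in rows:
    # Every value comes back as text, including the year.
    print(row["title"], row["year"])
```

The same helper can read tab-delimited text if you pass delimiter="\t".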

JSON (JavaScript Object Notation)

JSON has become one of the most widely used formats for storing and sharing data.  You can recognize a JSON file by its abundance of curly brackets, "{}".  JSON is replacing XML in many platforms and systems as the preferred data format.
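As a rough illustration, the sketch below loads a small JSON record with Python's built-in json module and writes it back out in an indented, easier-to-read form.  The record itself is a made-up example, not data from any particular project.

```python
import json

# Hypothetical record; note the curly brackets around the whole object.
record_text = (
    '{"title": "Frankenstein", "author": "Mary Shelley", '
    '"year": 1818, "tags": ["gothic", "novel"]}'
)

# json.loads turns JSON text into ordinary Python dictionaries and lists.
record = json.loads(record_text)
print(record["author"])
print(record["tags"][0])

# json.dumps turns Python data back into JSON text; indent=2 adds line breaks.
pretty = json.dumps(record, indent=2)
print(pretty)
```

Unlike a .csv file, JSON keeps numbers as numbers and can nest lists inside a record, which is part of why many tools prefer it.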

XML - Extensible Markup Language

Until recently, XML was the go-to format for storing and sharing structured information.  JSON has replaced XML in many places, but XML remains a popular format.
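For comparison, here is a small sketch that reads XML with Python's built-in xml.etree.ElementTree module.  The element and attribute names (library, book, title, author, year) are invented for this example; real XML files will use whatever names their creators chose.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: information is wrapped in named, nested tags.
xml_text = """<library>
  <book year="1818"><title>Frankenstein</title><author>Mary Shelley</author></book>
  <book year="1897"><title>Dracula</title><author>Bram Stoker</author></book>
</library>"""

root = ET.fromstring(xml_text)

# Collect each <book> element into a plain Python dictionary.
books = [
    {
        "title": book.findtext("title"),
        "author": book.findtext("author"),
        "year": book.get("year"),  # attributes are always read as text
    }
    for book in root.iter("book")
]

for entry in books:
    print(entry["title"], entry["year"])
```

Once the records are in dictionaries like these, they can be written out as JSON or .csv if another tool needs a different format.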

Contact us - We can help!

If you would like a one-on-one consultation about a project (or potential project), DH technology, support resources available in the library, or the WSU and greater SE Michigan DH community, or if you just want to bounce around ideas or find inspiration, call or email one of our knowledgeable, friendly WSU librarians:

  • Alexandra Sarkozy, Liaison Librarian, 313-577-8672, ff2662@wayne.edu
  • Damecia Donahue, Immersive Tech Librarian (The Bunker and The Vault), 313-577-5811, dnd@wayne.edu

We are happy to help, and look forward to hearing from you!