Preparing texts for network visualization

  • lmrhody 

When I presented at MSA 13 earlier this month, I was unsatisfied with my methods for creating network visualizations of texts.  I knew that automatic preprocessing would not work yet, since I have yet to identify precisely how I want to designate nodes across larger bodies of poems.  What I’ve been looking for is a way to mark texts up descriptively, using some form of markup language (XML, TEI), that would be uniform enough to render data that could be meaningfully displayed, and then to find a visualization software package with an algorithm that would “work” the way I wanted it to.  The problem, of course, is that when you’re a rogue DH scholar out in the world borrowing tools and using whatever falls your way, you can’t be sure how each tool works (unless you have a CS or social science degree that includes learning about network algorithms, which I do not), and that uncertainty detracts from the validity of what you say about your object of study.

On the flip side, tools and text analysis software are becoming more widely available, so doing what I did, which is to say Googling “discourse network tool” and finding Philip Leifeld’s “Discourse Network Analyzer,” is actually possible.  What is remarkable about DNA, a GUI text-processing program, is that it is designed as an interpretive tool for marking texts up in XML so that they can be displayed using free network visualization software such as Visone, Ucinet, or Netdraw.  The designed purpose of Leifeld’s DNA software is to collect articles on a topic area and to use those articles to create network visualizations of agreement and disagreement between individuals and groups.

For example, the sample dataset used for a tutorial on the software comes from Dana R. Fisher at the University of Maryland (I have no idea who she is… but I’m definitely going to look her up!), who marked up articles, testimony, and other texts about climate change.  Essentially, she could input each text into the DNA software and create a basic XML document with very minimal encoding (document type, author, dates, title) and then use DNA to select portions of text that make a “statement” about climate change.  By tagging the speaker, the organization the speaker is affiliated with, the content type (a restricted list of terms created by the user to describe the topic being discussed), and whether the speaker agreed or disagreed with the topic, she could create networks of statements made about climate change that also included the individuals involved in the climate change debate and their organizations.  Such a visualization helps us to understand how much any two groups (say, the Senate and the EPA) agree with one another, to identify the issues on which they agree and disagree, and to understand affiliations (which speakers are affiliated with which climate change debates).
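To make the data model concrete, here is a minimal sketch in Python of how coded statements (speaker, organization, category, agreement) could be turned into network edges.  This is not DNA’s actual export format or algorithm; the `Statement` class, the function names, and the sample data are all hypothetical, invented only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Statement:
    """One coded statement (hypothetical structure, not DNA's own schema)."""
    speaker: str
    organization: str
    category: str
    agrees: bool


# Made-up sample data for illustration only.
statements = [
    Statement("Speaker A", "Org 1", "Emissions should be capped", True),
    Statement("Speaker B", "Org 2", "Emissions should be capped", True),
    Statement("Speaker C", "Org 2", "Emissions should be capped", False),
    Statement("Speaker A", "Org 1", "Warming is human-caused", True),
    Statement("Speaker B", "Org 2", "Warming is human-caused", True),
]


def affiliation_edges(statements):
    """Speaker-to-category edges, each labelled with agreement (True/False)."""
    return sorted({(s.speaker, s.category, s.agrees) for s in statements})


def congruence_edges(statements):
    """Speaker-to-speaker edges weighted by the number of shared stances."""
    stances = defaultdict(set)  # (category, agrees) -> speakers taking that stance
    for s in statements:
        stances[(s.category, s.agrees)].add(s.speaker)

    weights = defaultdict(int)
    for speakers in stances.values():
        for a, b in combinations(sorted(speakers), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    for edge in affiliation_edges(statements):
        print(edge)
    print(congruence_edges(statements))
```

In this sketch, two speakers are linked whenever they take the same position on the same category, and the edge weight counts how many positions they share.  Swapping `speaker` for `organization` in `congruence_edges` would give the group-level view instead.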

This isn’t *exactly* what I had in mind, but it’s really darn close.  The power of this particular piece of software is that I can be in charge of what constitutes an article (a poem), what constitutes a speaker (the poetic speaker, the author, the third person omniscient… all of them), and the “content” to be described.  Granted, the “organization” classification is less helpful to me, but in the instance of “The Venus Hottentot (1825)” I could use this feature to differentiate speakers in the first section of the poem from those in the second.  Using the software this way does not begin to utilize its real power, which is to read topics and speakers across large corpora of texts in similar ways.  For now, I’m looking at one poem; however, I could see a future in which I take this poem and situate it in a larger public discourse about black female subjectivity.  I could import, for example, Sander Gilman’s article “Black Bodies, White Bodies: Toward an Iconography of Female Sexuality in Late Nineteenth-Century Art, Medicine, and Literature,” which we know Elizabeth Alexander read before writing the poem.  I could also bring in articles by Sadiah Qureshi on “Displaying Sara Baartman” or Terri Francis’s “I and I: Elizabeth Alexander’s Collective First-Person Voice, the Witness and the Lure of Amnesia,” or chapters from Deborah Willis’s Hottentot Venus 2010, and demonstrate how Alexander’s poem participates in a larger act of social recovery.

There are, as with any tool, limitations.  So far, the only way I have found to create the visualizations is from the speakers, organizations, and categories, with directional lines indicating agreement or disagreement.  I have not found a way of creating networks of “statements.”  In other words, I have not found a way to pull a category and then visualize the network of statements about that category and how they relate to each speaker; then again, I have only begun the process of creating visualizations.  Another complication is that I have only found ways to associate a statement with one category.  I’m fairly certain I can find a workaround for that, but for the moment I haven’t worked one out.  I will say that having to choose among a fixed set of category designations (ones of my own creation) made me very attuned to my assumptions about the text.  That process helped me to realize how my visualizations of these networks will always be limited, and it reminded me that I need to make those limitations transparent when I write about what the visualization actually visualizes.
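The kind of view I am missing can at least be sketched outside the tool.  The Python below assumes the coded statements have already been collected into a simple list; the `CodedStatement` class, the `statements_for` function, and the placeholder data are all hypothetical, and nothing here reflects how DNA stores or exports its data.  It lets one statement carry several categories and then pulls every statement attached to a single category, along with its speaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedStatement:
    """A coded passage that may belong to more than one category (hypothetical)."""
    text: str
    speaker: str
    categories: frozenset
    agrees: bool


def statements_for(category, coded):
    """Return (speaker, statement text, agrees) edges for one category."""
    return [
        (s.speaker, s.text, s.agrees)
        for s in coded
        if category in s.categories
    ]


# Placeholder data for illustration only; not drawn from any real markup.
coded = [
    CodedStatement("passage 1", "Speaker 1", frozenset({"science", "display"}), True),
    CodedStatement("passage 2", "Speaker 2", frozenset({"display"}), False),
    CodedStatement("passage 3", "Speaker 2", frozenset({"memory"}), True),
]

if __name__ == "__main__":
    for edge in statements_for("display", coded):
        print(edge)
```

Each returned tuple is one edge between a speaker and a statement, so the list for a single category could be handed to any network package as an edge list.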

In the meantime, even though I am not teaching right now, I’m really excited about what this kind of software could mean for my students.  In the English 101 courses at the University of Maryland, students write three linked assignment papers on a self-selected research topic.  These are position papers, where students must make purposeful arguments for what they believe in and respond to the discourse of the field in which their selected debate is ongoing.  We generally assign an annotated bibliography as the first part of that linked assignment as a way of getting students to read the work and then explain who agrees with whom on particular points and who disagrees.  The hard part of this assignment is that each entry is generally two paragraphs long, the bibliography includes only 8-10 sources, and getting the students to actually compare arguments, identifying points of agreement and disagreement, is difficult.  However, suppose the assignment were to use the Discourse Network Analyzer to import each article and then go through it tagging “statements,” “speakers,” “organizations,” and “categories” (for example, are the speakers arguing that a particular action should be taken or that one event causes another…), as well as “agreement” or “disagreement” with each statement.  Students might begin to see how their readings create a network of ideas, and by understanding who agrees and what they agree upon, they might be better able to situate themselves within the discourse of that issue.  It’s an intriguing idea to me, and at some point when I’m teaching again, I think I’m going to make use of this technology.