Teaching LDA with the Topic Modeling Game

In February, I visited Matthew Kirschenbaum’s #ENGL668K Introduction to Digital Humanities course at the University of Maryland, and I brought to class an activity that I had been mulling over for a long time: the Topic Modeling Game. The game is designed to teach the basic principles of topic modeling with LDA (latent Dirichlet allocation) through engaged, constructivist, problem-based techniques.

As I was learning about LDA myself, I realized that I was essentially playing this game in my head over and over again: following through how I thought LDA worked, learning where I made mistakes, revising my assumptions, and playing the game all over again. When I went to write about the results of my topic modeling experiments for the Revising Ekphrasis project in my dissertation, I discovered that I needed a way to explain topic modeling so that someone with no knowledge of the methodology could read the results of my experiments and trust my conclusions.

That process led to a written explanation of LDA that will appear in a future article in the Journal of Digital Humanities.  In the essay, I create a hypothetical situation to explain the assumptions LDA makes about natural language texts in order to produce its results.  The example walks readers through the process of figuring out what produce is available at a farmers’ market that they have never been to themselves and asks the reader to consider the problem from a quantitative perspective.  When I created that explanation, I did it by playing this game in my head.  So, I thought, perhaps this could be an effective way of teaching LDA, as well.
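For readers who think in code, here is a minimal sketch of the kind of assumption LDA makes: that each document is a mixture of topics, and each topic is a weighted list of words. This is not the essay's example or the game itself. The topic names, words, and probabilities are all made up for illustration, and the function `generate_document` is hypothetical. Real LDA works in the other direction, inferring topics from documents it is given.

```python
import random

# Hypothetical topics, invented for illustration only.
# Each topic maps words to the probability of drawing that word.
TOPICS = {
    "fruit": {"apple": 0.5, "pear": 0.3, "plum": 0.2},
    "greens": {"kale": 0.6, "chard": 0.4},
}


def generate_document(topic_weights, length, rng):
    """Draw `length` words: pick a topic by weight, then a word from that topic."""
    topic_names = list(topic_weights)
    weights = [topic_weights[name] for name in topic_names]
    words = []
    for _ in range(length):
        # Step 1: choose which topic this word comes from.
        topic = rng.choices(topic_names, weights=weights)[0]
        # Step 2: choose a word according to that topic's word probabilities.
        vocab = TOPICS[topic]
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return words


# A document that is assumed to be 70% "fruit" and 30% "greens".
rng = random.Random(0)
doc = generate_document({"fruit": 0.7, "greens": 0.3}, 10, rng)
print(doc)
```

Playing the game by hand amounts to running this story backwards: given only the words in each document, guess which groupings of words and which mixtures could have produced them.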

After I tweeted about the Topic Modeling Game, other DH instructors requested copies of it. I absolutely wanted to share, but I also wanted to learn from other teachers’ experiences. More importantly, I wanted new teachers to benefit from the experience of those who had already tried it.

Meanwhile, there’s been a lot of conversation about the value of public, open repositories of data.  ProfHacker has had several recent posts about using GitHub to revise documents (see Getting Started with a GitHub Repository and Forks and Pull Requests in GitHub).  Also, Matt Burton presented a helpful introduction to Git at MLA 2013 in the Scaling and Sharing: Data Management in the Humanities special session #s586.

What better way to share a lesson plan with peers, I figured, than to create a repository for it on GitHub, to invite others to use it and to ask that they share their results and add their changes and revisions back to the repository?

As a result, I created a public GitHub repository for the Topic Modeling Game. Currently, the repository contains two Word documents. One includes instructions and background information for teachers. The other is a rudimentary handout for starting the game as an in-class (face-to-face) assignment. The instructor document includes a list of materials that could be used for the lesson, but I have not uploaded sheets of “sample words” to use, at least not yet.

There is plenty of room to edit, improve, revise, innovate, and share.  The one thing that I do ask is that if you download and use the Topic Modeling Game, you contribute to the repository by adding lessons you have learned, revisions you made, and suggestions for improvement.

So, bring your forks (again, you may want to read Konrad Lawson’s recent post on forking and pulling with Git if you’re unfamiliar with the process) and dig on in. I’m looking forward to hearing back from those who try it.

