The following is a talk I gave with my former professor and adviser, Dorothy Kim, and my colleague at Vassar, Laura Lebow, at BABEL 2012 in Boston, about our work digitizing the manuscript Bodleian Laud 108. Our panel was called “Parts, Wholes, and the New.”
Ooh, Aah, OOO: Hacking Havelok the Dane
This session asks the participants to juggle several moving parts: disciplinary difference, experimental poetics, the relation between parts and wholes. We’re a collaborative trio working on digitally editing a 13th-century Middle English manuscript, Oxford, Bodleian MS Laud 108. So this talk is in three parts: ooh, aah, and OOO…
Some of the questions we are trying to juggle in this triad include: How do you create a framework for all potential disciplines? What sort of potential catastrophes do you then run into? How specific must the detail of the parts be? How general must you make it to bring people under one umbrella to interact and play?
Part I: Ooh: Fun with Digital (Jill)
We have to begin by talking about what it means to digitize Havelok the Dane. So in this way, we are actually going to explain the insides of a digital edition–i.e. what’s under the hood of the pretty car.
The medieval romance, Havelok the Dane, is found in Laud 108. One of the medium-specific traits of medieval manuscripts is that, unlike modern texts that are mass-produced but whose ideas are copyrighted and protected, medieval manuscripts’ texts are in the commons–they predate copyright laws. But the actual THING of the manuscript, the vellum the words are inked onto, is singular and therefore access is restricted.
There is an excellent student’s TEAMS edition of Havelok the Dane, which is great because then you can CTRL-F (or Apple-F, whatever) to search for specific words. But the format it’s in, HTML, is not the most stable form. The TEAMS edition also doesn’t take full advantage of the medium specificity of computers. We were interested in looking at how computers afford us new access points and modes of understanding for Havelok the Dane, and how we can play with these.
We started by putting it in XML, or ‘extensible markup language.’ XML is basically a language for describing what the parts of a text are. Paired with a stylesheet, it can also tell a browser how to present your text, how to lay it out like a printer once laid out movable type on a printing press. It is also more updateable and archivable than existing alternatives like HTML. You can think of XML as similar to type O blood, the universal donor, which can be understood and processed by nearly everything. HTML, on the other hand, is more like type AB blood, which can only be accepted by others like it.
And not only can you tell the browser how to format things with XML, but you can also ‘tag’ various elements of your text to mark them in useful ways, because once you tag an element, you can scoop the data and reimagine it differently. This permits you to examine the various objects or ‘parts’ in the text as ‘wholes’ unto themselves. For example, a linguist could go through Havelok and tag every word with a French root with the element name <frenchword></frenchword>. Or a biologist could tag every ‘animal’ that appears. Once the data is tagged, it’s manipulable in nonlinear ways that it simply wasn’t when it was on a page.
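To make the idea of tagging concrete, here is a minimal sketch in Python using the standard library's XML parser. It is an illustration only, not the project's actual markup: the sentence is an invented modern-English placeholder rather than a quotation from Laud 108, and the <poem> and <line> elements and the "n" attribute are assumptions made for the example. Only the <frenchword> element name comes from the talk.

```python
import xml.etree.ElementTree as ET

# Invented placeholder text, NOT a quotation from the manuscript.
# <poem>, <line> and the "n" attribute are hypothetical names;
# <frenchword> is the element name suggested in the talk.
SAMPLE = """
<poem>
  <line n="1">The <frenchword>baron</frenchword> sent the <frenchword>traitor</frenchword> to <frenchword>prison</frenchword>.</line>
  <line n="2">The king rode home with his men.</line>
</poem>
"""


def tagged_words(xml_text, tag):
    """Return (line number, word) pairs for every element named `tag`."""
    root = ET.fromstring(xml_text)
    found = []
    for line in root.iter("line"):
        for element in line.iter(tag):
            found.append((line.get("n"), element.text))
    return found


# Pull the tagged 'parts' out of the text and treat them as a whole.
french = tagged_words(SAMPLE, "frenchword")
print(french)
```

The text itself is untouched by the tags: a reader still sees the same sentence, while a program can now lift out just the marked words along with the line each one came from.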
You could put all the French root words next to each other. You can generate a pie chart. You can use ArcGIS and make your own maps, timelines, etc. Our goal for the project is to allow people from any discipline to have access to Havelok and ‘play with it’ according to the rules of their field, to borrow a metaphor from the Uptons’ talk yesterday. So XML is a framework, a means of telling the computer how to distinguish elements and thereby making it easy to pull them out and treat them as parts of a whole.
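As a rough illustration of "putting the French root words next to each other" and getting the numbers behind a pie chart, the sketch below groups tagged words by element name and computes each group's share of the total. The sample text is invented, and the <poem> and <line> wrappers and the helper names `group_by_tag` and `shares` are hypothetical. The <frenchword> and <animal> elements echo the linguist and biologist examples above.

```python
import xml.etree.ElementTree as ET

# Invented placeholder text; element names other than <frenchword> and
# <animal> are assumptions made for this example.
SAMPLE = """
<poem>
  <line n="1">The <frenchword>baron</frenchword> hunted the <animal>boar</animal> and the <animal>hart</animal>.</line>
  <line n="2">A <animal>hound</animal> ran beside them.</line>
</poem>
"""


def group_by_tag(xml_text, tags):
    """Collect the text of every tagged element, grouped by element name."""
    root = ET.fromstring(xml_text)
    groups = {tag: [] for tag in tags}
    for element in root.iter():
        if element.tag in groups:
            groups[element.tag].append(element.text)
    return groups


def shares(groups):
    """Turn the grouped words into proportions, e.g. for a pie chart."""
    total = sum(len(words) for words in groups.values())
    if total == 0:
        return {tag: 0.0 for tag in groups}
    return {tag: len(words) / total for tag, words in groups.items()}


groups = group_by_tag(SAMPLE, ("frenchword", "animal"))
pie = shares(groups)
for tag, words in groups.items():
    print(tag, words, f"{pie[tag]:.0%}")
```

The proportions could be handed to any charting or mapping tool. The point is only that once the parts are tagged, regrouping them is a few lines of work rather than a rereading of the poem.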
So that’s the theory of XML, of interacting with a text in a digital-native way, and that sounds great, but turning it into something practical was a little more difficult. And that’s when the excited “Ooh” became an aggravated “Aah” (pronounced UGH).
Part II: Aaah! XML in action (Laura)
It seemed like a straightforward venture: tag person, place, and thing. We were familiar with the story and the language, so we thought that tagging a few nouns would be simple. Soon after we opened the file, however, we realized we were wrong: in XML, a basic part of speech becomes terrifying. Do we tag every noun? What do we do with phrases that encompass more than one category? What constitutes a thing? (And to make it worse, digital humanists don’t yet have a standard “thing” element.) How do we answer these questions in a consistent and comprehensible way that would be accessible and augmentable for students and scholars in a variety of disciplines?
In order to tag efficiently, Jill and I needed a concrete definition of “thing.” We turned to Ian Bogost and adapted his explanation of OOO for our first attempt at tagging.
Armed with what we considered a solid methodology, we expected an easy course. Naturally, we had no such luck. We encountered a succession of issues; for the sake of time, I will only sample a few. Characters often invoked saints, but Jill and I were hesitant to tag these religious figures with just the “person” tag. We wanted our tagging to encompass all possibilities in the medieval world: a saint could be a person or an object (i.e., a body part in a reliquary). And think about this: when people were referred to by their occupation (knight, king, cook) or by their identities (man, woman, father, mother), did that constitute a ‘thing’ or a ‘person’?
At this point, our heads were spinning, so we sought out an alternate approach. We turned to the Historical Thesaurus of the Oxford English Dictionary, which presented us with a comprehensive taxonomy of things. The Thesaurus divides nouns into three categories: the external world, the mental world, and the social world. It literally encompasses every ‘thing.’ A taxonomy of things according to the OED meant a lot more tagging and a lot more work than if we stuck to Bogost’s more theoretical (and more vague) definition, but it was also the only way to even approach objectivity and comprehensiveness.
For example, it effectively solved the problem of saints: religious entities such as God and Jesus Christ, whom we had been tagging as ‘person,’ were also to be classified as ‘thing,’ falling under the “Mental World” category of the Thesaurus taxonomy. Using the Historical Thesaurus provides a foundation for the next group of researchers, who will be able to take the mechanism we have used to define ‘thing’ and narrow it using a methodology that best fits their projects. And, of course, if researchers do not have a more specific taxonomy in mind, our comprehensive Thesaurus-based system is already in place.
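One way to picture how a Thesaurus-based tag can be both comprehensive and narrowable is sketched below in Python. Everything about the encoding is an assumption for illustration: the "category" attribute, its three values, and the nesting of <person> inside <thing> are hypothetical, and the sentences are invented. Apart from God falling under the mental world, which the talk states, the category assignments are placeholders that have not been checked against the Thesaurus.

```python
import xml.etree.ElementTree as ET

# The three top-level divisions named in the talk. The attribute name
# "category" and these short values are hypothetical.
CATEGORIES = ("external", "mental", "social")

# Invented placeholder text. Only "God" -> mental comes from the talk;
# the other assignments are illustrative guesses.
SAMPLE = """
<poem>
  <line n="1">The <thing category="social">king</thing> prayed to <thing category="mental"><person>God</person></thing>.</line>
  <line n="2">He held a <thing category="external">sword</thing> and a <thing category="external">ring</thing>.</line>
</poem>
"""


def things_by_category(xml_text):
    """Group every <thing> by its top-level category."""
    root = ET.fromstring(xml_text)
    grouped = {category: [] for category in CATEGORIES}
    for element in root.iter("thing"):
        category = element.get("category")
        if category not in grouped:
            raise ValueError(f"unknown category: {category!r}")
        # itertext() also reads words wrapped in a nested tag such as <person>.
        grouped[category].append("".join(element.itertext()))
    return grouped


def narrow(xml_text, category):
    """Keep only the things a later project cares about."""
    return things_by_category(xml_text)[category]


grouped = things_by_category(SAMPLE)
print(grouped)
print(narrow(SAMPLE, "mental"))
```

A researcher with a narrower definition of ‘thing’ could start from the full set and filter it down, which is the sense in which the broad taxonomy serves as a foundation rather than a final answer.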
Jill and I started using this mantra: “when in doubt, tag.”
Our tagging philosophy, then, is similar to the computing method known as “brute force,” wherein a problem is solved by throwing the computer’s raw processing power at it and trying every possibility, instead of using a more intricate or creative method of problem-solving. Instead of going with a more refined taxonomy such as OOO, for example, we used a “brute force” methodology, tagging everything that might possibly fall under the capacious umbrella of “thing.”
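The talk does not say that any of the tagging was automated, so the following is purely an illustration of the “when in doubt, tag” principle, not a description of how the work was done. It assumes a hypothetical lookup table from words to categories and wraps every word found in that table in a <thing> element, with no attempt at finer judgment. The lookup entries, the "category" attribute, and the function name `tag_line` are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical word-to-category lookup; the entries are placeholders.
LOOKUP = {"king": "social", "god": "mental", "sword": "external"}


def tag_line(text, lookup):
    """Wrap every word found in `lookup` in a <thing> element."""
    line = ET.Element("line")
    last = None  # most recently created <thing>, if any
    # Split into runs of word characters and runs of everything else,
    # so spacing and punctuation are preserved exactly.
    for token in re.findall(r"\w+|\W+", text):
        category = lookup.get(token.lower())
        if category is not None:
            last = ET.SubElement(line, "thing", category=category)
            last.text = token
            last.tail = ""
        elif last is None:
            line.text = (line.text or "") + token
        else:
            last.tail += token
    return line


tagged = ET.tostring(tag_line("The king drew his sword.", LOOKUP), encoding="unicode")
print(tagged)
```

The approach is deliberately blunt: it never asks whether a given ‘king’ is a person, an office, or a chess piece. It simply tags, leaving refinement to whoever uses the data next.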
So what are the ramifications?
Part III: OOO: What does it all mean? praxis, theory, experiment (Dorothy)
It means that in choosing the Historical Thesaurus of the OED as the rubric, object-oriented ontology as a system or theory cannot live in the hardware of a database. Or in a more refined way, sure, we could create a database using OOO, but then it would be Dorothy, Jill, and Laura’s personal database of objects rather than an object database of Havelok. Theoretically we thought OOO should function as a baseline for object tagging, but in practice, it was too insanity-producing and not efficient. In the end, we realized practically that an organizational stemma (more indebted to Carl Linnaeus’s organizational bent in the 18th century)–the thing we so want to run away from–is the most logical, the most sane, and the most streamlined way to create baseline tags for a database. That leaves a lot of cascading questions about the potential impurities created by the practicalities of disciplinary mixing, bumping, and dancing.
More abstractly–or perhaps more concretely–putting Havelok into a digital format made us think about Havelok the Dane itself as an object, or rather a series of interrelated objects. First is the actual manuscript, a corporeal object existing in the Bodleian Library. But the text itself, the system of letters, words, phrases and ideas, is also an object in a digital rather than analogue sense.
Finally, we want to end this little journey from praxis, through theory, into experimentation. One of the potential uses of tagging is the possibility of scooping data for research visualization. But this also permits us to experiment with aesthetics. We come full circle back to aesthetics; back to viewing the world, the poem, poetry as an act of aesthetics. We’ve taken this text, pulled it apart into its taxonomic parts, and now we’ve visually recreated it as a new art object. [Where, apparently, the most important node is the word “ne”]