One of the most interesting sessions by far, from my humble perspective, was 1209: Whodunnit? Literary Forensics and Authorship Attribution for the Middle Ages. Three Middle Dutch scholars, all of whom work on questions of authorship and transmission, all spoke on different uses of statistical analysis in looking at textual variants. Before recapping each paper, allow me to talk for a bit about the interesting ideas and issues the session raised.
Firstly: computational linguistics. This method of linguistic analysis rests on the fact that individual speakers of a common language have distinct linguistic markers. These markers are not topic-specific, but show up in really common words (articles, conjunctions, subordinators) and grammatical patterns. Put simply, you can tell the difference between a post by Magistra and a post by me by the fact that Magistra talks about early medieval history and I talk about high medieval sex; but a computational linguist would run our two anonymised posts through a computer program and discover that I use certain conjunctions far more than she does, and she uses some particular grammatical structure a lot more than I do.
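To make that concrete, here's a toy sketch of the principle (my own illustration, not anything from the session): tally how often each writer uses a handful of common function words, as relative frequencies so texts of different lengths compare fairly. The word list and the two sample "posts" are invented.

```python
from collections import Counter
import re

# A small, invented list of common function words to profile on.
FUNCTION_WORDS = ["the", "and", "but", "of", "that", "which", "because"]

def function_word_profile(text):
    """Relative frequency of each function word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    total = len(words)
    return {w: counts[w] / total for w in FUNCTION_WORDS}

# Two invented snippets standing in for two different writers.
post_a = "The abbey and the bishop argued and the king intervened because of that."
post_b = "Marriage law, which changed slowly, shaped practice; but custom mattered."

profile_a = function_word_profile(post_a)
profile_b = function_word_profile(post_b)
# Writer A leans on "and"/"the"; writer B reaches for "which"/"but".
```

The point is that these little words are hard to fake or suppress, which is why they survive changes of topic.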
Now, I seem to have a lot more faith in computational linguistics than many literary scholars – I think this is because I got taught the basic principles (although by no means how to do it) in first year, thank you Craig Ronalds. I know, for example, that this business about individual language markers has been rigorously tested on modern speakers from different language backgrounds. I know the method has been used to expose cases of police interfering with witness testimony (police members as a group show certain linguistic traits that are not shared by the general population, as a result of their training). I know its uses for humanities scholars haven’t been fully explored or tested yet, but I also suspect that a lot of the distrust people have for evidence drawn from computational linguistics is to do with the unfamiliar kind of evidence. Computational linguistics relies on data and statistical analysis and sciencey kinds of things: I get the feeling that a lot of humanities scholars don’t trust that (it’s repeatable, sure, but you can’t go through your edition and mark it up and SEE the evidence right there). Our discipline trains us to check everything against the text, rather than checking it for thorough and repeatable experimental process: maybe we’re not so willing to trust people who branch out into other kinds of evidence.
With that said, it must also be stated that I don’t know enough about computational linguistics for my bullshit detector to work properly when I hear about it. So I have no way of knowing if an individual scholar is doing their computational linguistics Rong. Given that the application of computational linguistics to literary scholarship is a relatively new field, one risk would be that there aren’t enough trained bullshit-detectors around, but that can only change with time and the increasing usefulness of computational techniques.
So what are some of the uses of computational linguistics to medievalists?
Rombert Stapel has been using computational linguistics to determine how much of Hendrick Gerardsz van Vianen (sp?)’s Croniken van der Duytcher Order, a late 15th c. chronicle of the Teutonic Order with specific focus on the area around Utrecht, was written by the said Hendrick. Several segments are easily identified as being from other sources – the prologue claims to be by a 12th century bishop who certainly wasn’t in Acre when he said he was; and the Bailiwick chronicle for Utrecht seems separate from the main body of the text.
Traditional philological analysis would look at unusual words, and has been of some use to Rombert Stapel, but in the absence of original source texts it’s hard to tell where emendation has been happening. Instead, he took samples from the privileges written by the said Hendrick in his capacity as secretary to the Lands Commander Johan von Drongen. The samples were not just written at a different time to the Croniken, they’re also in a completely different style – something which would usually override philologically distinct vocabulary features, but doesn’t usually override the grammatical data used in computational linguistics.
The full set of samples which he fed into the program (Delta, by someone named Burrows – it’s free, and apparently easy to use) were:
- 2 sets of samples from the Croniken where traditional philological evidence (comparisons to original sources, I believe) shows Hendrick left traces as author.
- The privileges mentioned above
- The Sachenspiegel, known to have been copied by Hendrick
- 2 unrelated texts of the same period and genre – one hagiography and one chronicle.
After testing that Delta could distinguish between the unrelated texts and the Hendrick texts, he then compared the samples to the entire rest of the Croniken, and pulled up several sections clearly not by Hendrick, including the first half of the prologue (but not the second); the Bailiwick chronicle; and some formulaic documents – privileges and court pleadings. The rest appears to be either by Hendrick or substantially modified by him.
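For the curious: the Delta measure itself (due to John Burrows) is simple enough to sketch. Over the most frequent words of a reference corpus, each word’s relative frequency in a text is converted to a z-score, and Delta between two texts is the mean absolute difference of their z-scores. What follows is a toy sketch with invented Middle Dutch function-word frequencies – not Stapel’s actual data or the actual Delta software.

```python
import statistics

def delta(freqs_a, freqs_b, corpus_freqs):
    """Mean absolute z-score difference between two texts.

    freqs_a, freqs_b: {word: relative frequency} for the two texts compared.
    corpus_freqs: list of such dicts, one per reference-corpus text, used to
    estimate each word's mean frequency and spread.
    """
    words = list(corpus_freqs[0].keys())
    total = 0.0
    for w in words:
        vals = [f[w] for f in corpus_freqs]
        mu = statistics.mean(vals)
        sigma = statistics.stdev(vals)
        za = (freqs_a[w] - mu) / sigma
        zb = (freqs_b[w] - mu) / sigma
        total += abs(za - zb)
    return total / len(words)

# Invented relative frequencies of two common words across three corpus texts.
corpus_freqs = [
    {"ende": 0.030, "die": 0.050},
    {"ende": 0.040, "die": 0.045},
    {"ende": 0.050, "die": 0.040},
]
# A disputed sample whose profile sits close to the first corpus text.
sample = {"ende": 0.031, "die": 0.049}

near = delta(sample, corpus_freqs[0], corpus_freqs)  # small: similar habits
far = delta(sample, corpus_freqs[2], corpus_freqs)   # larger: different habits
```

A low Delta against the known-Hendrick samples, and a high one against the control texts, is roughly the shape of evidence the paper described.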
Rombert then argued that Hendrick’s strong presence across the Croniken suggests that he was both author and compiler at once; noting the existence of other Teutonic Order chronicle texts in this period in the Lowlands, he says this points to a strong, self-aware hagiographical tradition in the bailiwicks, away from the administrative centre of the Order.
Note: I’ve probably got the author/scribe’s name spelled wrong, but I’m pretty sure Croniken was on the slides, with a C not a K.