Detecting the real authors of books and papers
The researchers say the system is now freely available for others to use and could be adapted for different types of text and for languages other than English.
Led by Professor Derek Abbott of the University of Adelaide’s school of electrical and electronic engineering, the six-member team used advanced software techniques to analyse author styles based on commonly used words and their frequencies in the text.
“We applied our new authorship detection technique to two hotly debated texts, the Federalist Papers and the Letter to the Hebrews,” Abbott said.
“The Federalist Papers are a collection of 85 influential political essays written in the late 1700s in the lead-up to the US Constitution. Their authorship was originally a guarded secret but scholars now accept that Alexander Hamilton, James Madison and John Jay were the authors.”
He said that although Hamilton and Madison each eventually provided a list of the papers they had written, 12 of the essays were claimed by both men as solely their own work. Some studies have suggested that a 13th essay, normally attributed to Jay, was written by Madison.
“We’ve shown that one of the disputed texts, Essay 62, was indeed written by James Madison with a high degree of certainty,” Abbott said. “But the other 12 essays cannot be allocated to any of the three authors with a similarly strong likelihood.
“We believe they are probably the result of a certain degree of collaboration between the authors, which would also explain why there hasn’t been scholarly consensus to date.”
The researchers also investigated the Letter to the Hebrews, traditionally attributed to St Paul but debated since the third century AD, with scholars suggesting Barnabas, Luke and Clement of Rome as alternatives.
The analysis was carried out on the original common Greek texts, comparing the letter against writings by the four possible authors, the three other gospel authors (Matthew, Mark and John) and one further possible author, Ignatius of Antioch.
“We found the Letter to the Hebrews is indeed closer to [the writings of] St Paul than to any of these other authors,” Abbott said. “But the sting in the tail is that this positive result had only a weak likelihood weighting.
“There are two possibilities: Luke was the second closest match, so there may have been some collaboration between the two, for example if Paul wrote the letter in Aramaic (the Hebrew language) and then Luke translated it into Greek.
“Or it may simply mean we have yet to find the true author! If the Vatican were to agree to supply us with more extra-canonical texts that we haven't tried, we would be happy to do more exhaustive tests.”
The new detection system was initially tested on English short stories by seven authors whose authorship is undisputed, including Charles Dickens and Sir Arthur Conan Doyle, and attributed them with greater than 90% accuracy. Abbott said the system, which uses multidimensional analysis of the frequencies of commonly used words, could be adapted for other languages.
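To make the idea of a word-frequency "fingerprint" concrete, here is a minimal sketch in Python. It is not the researchers' software: the function name frequency_profile and the six-word list are hypothetical, chosen only for illustration, and the actual word list used in the study is not given in this article.

```python
from collections import Counter

# Hypothetical list of common words, for illustration only.
# The researchers' actual word list is not stated in this article.
COMMON_WORDS = ["the", "of", "and", "to", "a", "in"]


def frequency_profile(text, vocabulary=COMMON_WORDS):
    """Return the relative frequency of each vocabulary word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    if total == 0:
        # An empty text has no measurable style; return an all-zero profile.
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


# Each text becomes one point in a space with one dimension per common word.
profile = frequency_profile("the cat and the dog went to the park")
```

Texts by the same author would be expected to land near one another in this space, which is what a classifier can then exploit.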
In a paper published in the journal PLoS ONE, the researchers describe how they developed two automated authorship attribution schemes, one based on multiple discriminant analysis and the other on a support vector machine using classification features based on word frequencies in the text.
“We adopted an approach of pre-processing each text by stripping it of all characters except a-z and space to increase the portability of the software to different types of texts,” they write. “We tested the methodology on a corpus of undisputed English texts, and used leave-one-out cross validation to demonstrate classification accuracies in excess of 90%.
“We further tested our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. And finally, we applied our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identified where some of the limitations lie, motivating a number of open questions for future work.”
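The pre-processing and leave-one-out steps described in the paper can be sketched roughly as follows. This is an illustration under stated assumptions, not the published implementation: a simple nearest-centroid classifier stands in for the multiple discriminant analysis and support vector machine the researchers used, lowercasing and whitespace normalisation before stripping are assumptions, and all function names and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Keep only a-z and space. Lowercasing first and turning other
    whitespace into spaces are assumptions, not stated in the article."""
    text = re.sub(r"\s+", " ", text.lower())
    return re.sub(r"[^a-z ]", "", text)


def features(text, vocabulary):
    """Relative frequency of each vocabulary word in the cleaned text."""
    words = preprocess(text).split()
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero on empty texts
    return [counts[word] / total for word in vocabulary]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def leave_one_out_accuracy(samples, vocabulary):
    """samples is a list of (author, text) pairs. Each text is held out in
    turn and attributed to the author whose remaining texts have the
    nearest centroid (a stand-in classifier, not MDA or an SVM)."""
    vectors = [(author, features(text, vocabulary)) for author, text in samples]
    correct = 0
    for i, (true_author, held_out) in enumerate(vectors):
        by_author = {}
        for j, (author, vector) in enumerate(vectors):
            if j != i:
                by_author.setdefault(author, []).append(vector)
        predicted = min(
            by_author,
            key=lambda a: math.dist(held_out, centroid(by_author[a])),
        )
        correct += predicted == true_author
    return correct / len(vectors)


# Toy corpus, invented for illustration only.
toy_samples = [
    ("A", "The the the cat."),
    ("A", "The dog, the the!"),
    ("B", "Of of mice of."),
    ("B", "Of men of of?"),
]
accuracy = leave_one_out_accuracy(toy_samples, ["the", "of"])
```

Holding each text out before building the author centroids means the score reflects texts the classifier has not seen, which is the point of leave-one-out cross validation.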
The researchers say an open source implementation of the methodology is freely available for use.