Friday, August 29, 2008
Text Linguistics and Statistics
As Mark Twain put it, crediting Disraeli: "There are three kinds of lies: lies, damned lies, and statistics." Thanks to Rick Brannan, I found this quote from Matthew Brook O'Donnell:
"It seems unlikely that by simply counting words it is possible to differentiate between authors. While a particular author may have a core or base vocabulary, as well as an affinity for certain words (or combination/collocation of words), there are many factors, for instance, age, further education, social setting, rhetorical purpose and so on, that restrict or expand this core set of lexical items. In spite of this, New Testament attribution studies and many commentaries (sadly, some rather recent ones at that) have placed considerable weight on counting the number of words found in one letter but not found in a group of letters assumed to be authentic" (Corpus Linguistics and the Greek of the New Testament, 388).
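To make concrete what O'Donnell is criticizing, here is a toy sketch of the raw count he describes: the words found in one letter but never found in a group of letters assumed to be authentic. This is an illustration only, not O'Donnell's method or that of any particular commentary. The function names and the simple regex tokenizer are assumptions; a real study of the Greek would at least lemmatize inflected forms rather than compare surface spellings.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and keep runs of letters (Unicode-aware, so Greek works).
    # Real attribution studies would lemmatize instead of comparing raw forms.
    return re.findall(r"[^\W\d_]+", text.lower())


def words_absent_from_corpus(letter: str, corpus: list[str]) -> Counter:
    """Count word forms in `letter` that never occur in any text of `corpus`."""
    corpus_vocab: set[str] = set()
    for text in corpus:
        corpus_vocab.update(tokenize(text))
    letter_counts = Counter(tokenize(letter))
    # Keep only the forms the "authentic" group never uses.
    return Counter({w: c for w, c in letter_counts.items() if w not in corpus_vocab})


if __name__ == "__main__":
    # Hypothetical toy data, not real New Testament text.
    letter = "grace and peace to you in faith and hope"
    corpus = ["grace to you and peace", "faith hope and love"]
    print(words_absent_from_corpus(letter, corpus))
```

Note how little the resulting number says on its own: it shifts with the letter's length, its topic, and the size of the comparison group, and it is blind to the factors O'Donnell lists (age, education, social setting, rhetorical purpose).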