Abstract
Previous PAN workshops have offered us the opportunity to explore three different approaches using basic statistics of stopword pairs for author verification. In this PAN, we were able to select our ‘best’ approach and explore the question of how authors writing about different subjects would necessarily adapt to term lengths specific to the subject. The adaptation required is, essentially, a redistribution of frequency: where longer terms occur. We introduce the notion of a ‘topic cost’ which increases the propensity for matching. Results show AUC and C1 scores of 0.51, 0.46 and 0.59 for Dutch, Greek and Spanish respectively. The English results are not yet available, as the evaluation system was unable to run the approach due to as yet unknown reasons.