This paper presents the approach we developed for the Authorship Link Ranking and Complete Author Clustering task at the PAN 2016 competition. Given a document collection, the task is to group documents written by the same author, so that each cluster corresponds to a different author. The task can also be viewed as establishing authorship links between documents. We combine classification and agglomerative clustering, using a rich set of features such as average sentence length, function-word ratio, type-token ratio, and part-of-speech tags.
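To make the feature set and the clustering step concrete, here is a minimal Python sketch, not the authors' implementation. It computes three of the features named above and groups documents with a plain average-linkage agglomerative procedure. The function-word list, the tokenization rules, the Euclidean distance, the stopping `threshold`, and all function names are assumptions made for illustration. The classification step and the part-of-speech features are omitted.

```python
import re

# Tiny illustrative function-word list (an assumption, not the paper's list).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or",
                  "but", "to", "is", "was", "it", "that"}


def stylometric_features(text):
    """Return (average sentence length, function-word ratio, type-token ratio)."""
    # Naive sentence split on runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive tokenization: lowercase alphabetic runs (apostrophes allowed).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        return (0.0, 0.0, 0.0)
    avg_sentence_length = len(tokens) / len(sentences)
    function_word_ratio = sum(t in FUNCTION_WORDS for t in tokens) / len(tokens)
    type_token_ratio = len(set(tokens)) / len(tokens)
    return (avg_sentence_length, function_word_ratio, type_token_ratio)


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering over feature vectors.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical tuning parameter).
    Returns a list of clusters, each a list of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                total = sum(euclidean(vectors[p], vectors[q])
                            for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

In practice the features would be scaled before computing distances, since average sentence length lives on a much larger scale than the two ratios. Each resulting cluster is read as one hypothesized author, and pairs of documents within a cluster correspond to authorship links.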
|Number of pages||6|
|Journal||CEUR Workshop Proceedings|
|Publication status||Published - 1 Jan 2016|
ASJC Scopus subject areas
- Computer Science(all)