In this article, recently accepted by the journal PLOS ONE and available as open access, the authors propose a framework for reconstructing text evolutionary trees, that is, for recovering the history of modifications that a set of related documents has gone through.
A series of distinct combinations of dissimilarity measures and reconstruction strategies is evaluated in extensive experiments, which include a set of artificial near-duplicate documents and documents collected from Wikipedia. Moreover, two potential application areas of the proposed framework are discussed: plagiarism and stemmatology.
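To give a rough idea of what pairing a dissimilarity measure with a reconstruction strategy can look like, here is a minimal Python sketch. It is not the method from the article: the word-level measure built on `difflib.SequenceMatcher`, the Prim-style tree growth, the function names `dissimilarity` and `reconstruct_tree`, and the assumption that the root document is known in advance are all illustrative choices made for this example.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def dissimilarity(a: str, b: str) -> float:
    """Word-level dissimilarity in [0, 1]; 0 means identical token sequences.

    Illustrative measure only (assumption), based on difflib's matching ratio.
    """
    matcher = SequenceMatcher(None, a.split(), b.split(), autojunk=False)
    return 1.0 - matcher.ratio()


def reconstruct_tree(docs: List[str], root: int = 0) -> List[Optional[int]]:
    """Return parent[i] for every document; the root has parent None.

    Prim-style growth (assumption): repeatedly attach the document not yet
    in the tree that is closest to any document already in the tree.
    """
    n = len(docs)
    # Pairwise dissimilarity matrix between all documents.
    d = [[dissimilarity(docs[i], docs[j]) for j in range(n)] for i in range(n)]
    parent: List[Optional[int]] = [None] * n
    in_tree = {root}
    while len(in_tree) < n:
        # Cheapest edge from a tree node u to an outside node v.
        _, u, v = min(
            (d[u][v], u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        parent[v] = u
        in_tree.add(v)
    return parent


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",    # original
        "the quick brown fox leaps over the lazy dog",    # edited from 0
        "the quick brown fox leaps over the sleepy dog",  # edited from 1
        "a quick brown fox jumps over the lazy dog",      # edited from 0
    ]
    print(reconstruct_tree(docs))
```

In this toy example each document differs from its source by a single word, so the nearest-neighbour growth recovers the intended parent of every document. Real collections are much noisier, which is why comparing several measures and strategies, as the article does, matters.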
Marmerola GD, Oikawa MA, Dias Z, Goldenstein S, Rocha A (2016) On the Reconstruction of Text Phylogeny Trees: Evaluation and Analysis of Textual Relationships. PLOS ONE 11(12): e0167822. doi: 10.1371/journal.pone.0167822