Library and Information Science Paper (ID LIS054001)
- Authorship Attribution by Data Compression Program
- No.54, p.1-18
- Issue date
Benedetto et al. recently confirmed the validity of a method for measuring similarity using data compression software. Despite its potential, this method has not yet been applied in the field of information science. The present study proposes CIR, a modified method that uses an improved compression ratio, and describes two experiments on authorship attribution using data from modern Japanese literature. The first experiment compares CIR with Benedetto’s method on test collections of modified (fixed-length) data, using a procedure similar to that described by Matsuura et al. The second experiment uses the original (variable-length) data.
In the first experiment, CIR achieved an average precision rate of 97.7%, compared with 90.5% for Benedetto’s method. CIR also improved on the best method described by Matsuura et al. The second experiment confirmed the effectiveness of CIR, with an average precision rate of 95.7%.
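
The general idea of compression-based authorship attribution can be illustrated with a minimal Python sketch. It follows the commonly described Benedetto-style procedure: append a fragment of unknown authorship to each candidate author's reference text, and pick the author for whom the compressed size grows the least. This is an illustration under stated assumptions, not the procedure used in the paper: the function names (`compressed_size`, `compression_delta`, `attribute`) are hypothetical, `zlib` stands in for whatever compression program the study used, and the CIR ratio is not reproduced because the abstract does not define it.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Return the size in bytes of `data` after zlib compression (level 9)."""
    return len(zlib.compress(data, 9))


def compression_delta(reference: str, fragment: str) -> int:
    """Extra compressed bytes needed to encode `fragment` after `reference`.

    A small delta suggests the compressor could reuse patterns already
    seen in the reference, i.e. the two texts are similar.
    """
    ref = reference.encode("utf-8")
    frag = fragment.encode("utf-8")
    return compressed_size(ref + frag) - compressed_size(ref)


def attribute(references: Mapping[str, str], fragment: str) -> str:
    """Return the candidate author whose reference text yields the smallest delta.

    `references` maps a hypothetical author label to that author's known text.
    """
    return min(references, key=lambda author: compression_delta(references[author], fragment))


if __name__ == "__main__":
    # Toy data for illustration only; real use would need long reference texts.
    refs = {
        "author_a": "the quick brown fox jumps over the lazy dog " * 20,
        "author_b": "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20,
    }
    print(attribute(refs, "the quick brown fox jumps over the lazy dog"))
```

Note that zlib only looks back over a 32 KB window, so with long reference texts only the tail of the reference influences the delta. A sketch like this would therefore need either short references or a compressor with a larger window.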