Crossing Linguistic Barriers: Authorship Attribution in Sinhala Texts

Sarwar, Raheem, Perera, Maneesha, Teh, Pin Shen, NAWAZ, Raheel and Hassan, Muhammad Umair (2024) Crossing Linguistic Barriers: Authorship Attribution in Sinhala Texts. ACM Transactions on Asian and Low-Resource Language Information Processing, 23 (5). pp. 1-14. ISSN 2375-4699

3655620.pdf - Publisher's typeset copy
Available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Official URL: http://dx.doi.org/10.1145/3655620

Abstract or description

Authorship attribution involves determining the original author of an anonymous text from a pool of potential authors. The authorship attribution task has applications in several domains, such as plagiarism detection, digital text forensics, and information retrieval. While these applications extend beyond any single language, existing research has predominantly centered on English, posing challenges for application in languages such as Sinhala due to linguistic disparities and a lack of language processing tools. We present the first comprehensive study on cross-topic authorship attribution for Sinhala texts and propose a solution that can effectively perform the authorship attribution task even if the topics within the test and training samples differ. Our solution consists of three main parts: (i) extraction of topic-independent stylometric features, (ii) generation of a small candidate author set with the help of similarity search, and (iii) identification of the true author. Several experimental studies were carried out to demonstrate that the proposed solution can effectively handle real-world scenarios involving a large number of candidate authors and a limited number of text samples for each candidate author. © 2024 Copyright held by the owner/author(s). Publication rights licensed to ACM.
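
The three-part structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which features, similarity measure, or final classifier the paper uses, so everything below is an assumption chosen for illustration. The feature set (word-length distribution plus punctuation rates), the cosine similarity, the mean-similarity decision rule, and the names `stylometric_features`, `candidate_authors`, and `attribute` are all hypothetical. Tokens are split on whitespace rather than with a `\w` pattern, since Sinhala vowel signs are combining marks that such a pattern would split on.

```python
import math
from collections import Counter, defaultdict

# Assumed punctuation inventory; the paper's actual feature set is not given here.
PUNCTUATION = ".,;:!?\"'()-"


def stylometric_features(text):
    """Step (i), illustrative: a style vector that ignores which words are used."""
    tokens = [t.strip(PUNCTUATION) for t in text.split()]
    words = [t for t in tokens if t]
    n_words = max(len(words), 1)
    n_chars = max(len(text), 1)
    # Share of words with length 1..9 code points, with 10+ pooled in the last bin.
    lengths = Counter(min(len(w), 10) for w in words)
    length_dist = [lengths[i] / n_words for i in range(1, 11)]
    # Rate of each punctuation mark per character of text.
    punct_rates = [text.count(p) / n_chars for p in PUNCTUATION]
    return length_dist + punct_rates


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_authors(query_vec, samples, k=3):
    """Step (ii), illustrative: keep the k authors with the best-matching sample."""
    best = {}
    for author, vec in samples:
        best[author] = max(best.get(author, -1.0), cosine(query_vec, vec))
    return sorted(best, key=best.get, reverse=True)[:k]


def attribute(query_text, training, k=3):
    """Step (iii), illustrative: pick the shortlisted author with the highest
    mean similarity over all of that author's samples."""
    samples = [(author, stylometric_features(text)) for author, text in training]
    query_vec = stylometric_features(query_text)
    shortlist = candidate_authors(query_vec, samples, k)
    scores = defaultdict(list)
    for author, vec in samples:
        if author in shortlist:
            scores[author].append(cosine(query_vec, vec))
    return max(shortlist, key=lambda a: sum(scores[a]) / len(scores[a]))


# Toy data (invented for this sketch, not from the paper's corpus).
training = [
    ("A", "a b, c d, e f, g h."),
    ("A", "i j, k l, m n."),
    ("B", "wonderful! extraordinary! magnificent!"),
    ("B", "tremendous! spectacular!"),
]
print(attribute("x y, z w, q r.", training, k=2))  # prints A
```

Shortlisting before the final decision is what lets a scheme of this shape scale to many candidate authors: the more expensive comparison in step (iii) only runs over the `k` authors that survive the similarity search.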

Item Type: Article
Faculty: Executive
Depositing User: Raheel NAWAZ
Date Deposited: 11 Sep 2024 15:28
Last Modified: 11 Sep 2024 16:01
URI: https://eprints.staffs.ac.uk/id/eprint/8438
