Explore open access research and scholarly works from STORE - University of Staffordshire Online Repository

AGI-P: A Gender Identification Framework for Authorship Analysis Using Customized Fine-Tuning of Multilingual Language Model

Sarwar, Raheem, An Ha, Le, Teh, Pin Shen, Sabah, Fahad, NAWAZ, Raheel, Hameed, Ibrahim A. and Hassan, Muhammad Umair (2024) AGI-P: A Gender Identification Framework for Authorship Analysis Using Customized Fine-Tuning of Multilingual Language Model. IEEE Access, 12. pp. 15399-15409. ISSN 2169-3536

AGI-P_A_Gender_Identification_Framework_for_Authorship_Analysis_Using_Customized_Fine-Tuning_of_Multilingual_Language_Model.pdf - Publisher's typeset copy
Available under License Type Creative Commons Attribution 4.0 International (CC BY 4.0).

Official URL: http://dx.doi.org/10.1109/ACCESS.2024.3358199

Abstract or description

In this investigation, we propose AGI-P, a solution for the author gender identification task. This task has real-world applications across many fields, such as marketing and advertising, forensic linguistics, sociology, recommendation systems, language processing, historical analysis, education, and language learning. We created a new dataset to evaluate the proposed method. The dataset consists of 1944 samples in total and is balanced by gender using a random sampling method. We use accuracy as the evaluation measure and compare the performance of AGI-P against state-of-the-art machine learning classifiers and fine-tuned pre-trained multilingual language models such as DistilBERT, mBERT, XLM-RoBERTa, and Multilingual DeBERTa. We also propose a customized fine-tuning strategy that improves the accuracy of the pre-trained language models on the author gender identification task. Our extensive experimental studies reveal that AGI-P outperforms the well-known machine learning classifiers and the fine-tuned pre-trained multilingual language models, reaching an accuracy of 92.03%. Moreover, the pre-trained multilingual language models fine-tuned with the proposed customized strategy outperform the same models fine-tuned with an out-of-the-box strategy. The codebase and corpus can be accessed on our GitHub page at: https://github.com/mumairhassan/AGI-P © 2013 IEEE.
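
As an illustration only, the two evaluation ingredients named in the abstract (a gender-balanced dataset obtained by random sampling, and accuracy as the measure) could be sketched as follows. This is not the authors' code: the function names, the undersampling approach, the fixed seed, and the toy data are all assumptions made for the example, and the actual procedure may differ from what is shown here.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=0):
    """Return a class-balanced copy of `samples` (hypothetical helper).

    `samples` is a list of (text, label) pairs. Each label group is
    randomly sampled down to the size of the smallest group. Random
    undersampling is an assumption; the abstract only says that
    "a random sampling method" was used.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy data (invented for illustration): 6 "female" and 4 "male" samples.
toy = [(f"text {i}", "female") for i in range(6)]
toy += [(f"text {i}", "male") for i in range(6, 10)]

balanced = balance_by_undersampling(toy)
label_counts = defaultdict(int)
for _, label in balanced:
    label_counts[label] += 1
```

On the toy data, each label ends up with the same number of samples, and `accuracy` can then be applied to any classifier's predictions on a held-out split of the balanced set.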

Item Type: Article
Uncontrolled Keywords: Business analytics, gender identification, language models, tourism industry.
Faculty: Executive
Depositing User: Raheel NAWAZ
Date Deposited: 11 Sep 2024 15:34
Last Modified: 11 Sep 2024 16:00
URI: https://eprints.staffs.ac.uk/id/eprint/8446
