STORE - Staffordshire Online Repository

Construction and Performance Analysis of a Groomed Polarity Lexicon Derived from Product Review Source Datasets

COLLEY, Derek and ASADUZZAMAN, Md (2021) Construction and Performance Analysis of a Groomed Polarity Lexicon Derived from Product Review Source Datasets. In: 11th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS 2021). IEEE. (In Press)

paper5.pdf: Author's Accepted Version (default)
Restricted to Repository staff only until 1 March 2023.
Available under License All Rights Reserved.

File size: 437 kB

Abstract or description

Using a large, publicly available dataset [1], we extract over 51 million product reviews. We split each review comment into words, associate each word with the review's score, and store the resulting 3.7 billion word-score pairs in a relational database. We cleanse the data by grooming it against a standard English dictionary, then build an aggregation model based on the distribution of each word's occurrences across review scores. This yields a model dataset of words, each carrying an overall positive or negative polarity score derived from the star ratings, which we then correct and normalise across the set. To test how well the dataset supports sentiment classification, we ingest a secondary, cross-domain public dataset of free-form text and run sentiment analysis against it. We then compare our model's performance with human classification by enlisting human volunteers to rate the same data samples. We find that our model emulates human judgement reasonably well, reaching correct conclusions in 56% of cases, albeit with significant variance when classifying at a coarse grain. At a fine grain, our model tracks human judgement to within a 7% margin in some cases. We discuss potential improvements to our method, further applications, and the limitations of the lexicon-based approach in cross-domain, big-data environments.
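The abstract does not state the exact aggregation, correction or normalisation formulas, so the following is only a minimal sketch of one plausible reading of the pipeline, not the authors' implementation. It assumes that each word's raw polarity is its mean star rating mapped linearly from 1 to 5 stars onto -1 to +1. It further assumes that "correct and normalise" means centring on the lexicon-wide mean and rescaling into [-1, 1], and that a text is labelled by the sign of the mean polarity of the lexicon words it contains. The function names (`word_score_pairs`, `build_lexicon`, `classify`), the toy reviews and the tiny dictionary are hypothetical, and an in-memory dictionary of counts stands in for the relational database.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z']+")

def word_score_pairs(reviews):
    """Split each review into lowercase words and pair each word with the review's star rating."""
    for text, stars in reviews:
        for word in TOKEN_RE.findall(text.lower()):
            yield word, stars

def build_lexicon(reviews, dictionary):
    """Aggregate word-score pairs into one polarity score per word (assumed scheme)."""
    # counts[word][stars] = occurrences of the word in reviews with that rating
    counts = defaultdict(lambda: defaultdict(int))
    for word, stars in word_score_pairs(reviews):
        if word in dictionary:  # "grooming": keep only dictionary words
            counts[word][stars] += 1
    raw = {}
    for word, dist in counts.items():
        total = sum(dist.values())
        mean_stars = sum(s * n for s, n in dist.items()) / total
        raw[word] = (mean_stars - 3) / 2  # map 1..5 stars onto -1..+1
    if not raw:
        return {}
    # Assumed correction: centre on the lexicon-wide mean, then rescale into [-1, 1]
    centre = sum(raw.values()) / len(raw)
    centred = {w: v - centre for w, v in raw.items()}
    scale = max(abs(v) for v in centred.values()) or 1.0
    return {w: v / scale for w, v in centred.items()}

def classify(text, lexicon):
    """Label text by the sign of the mean polarity of its words found in the lexicon."""
    scores = [lexicon[w] for w in TOKEN_RE.findall(text.lower()) if w in lexicon]
    if not scores:
        return "neutral"
    mean = sum(scores) / len(scores)
    if mean > 0:
        return "positive"
    return "negative" if mean < 0 else "neutral"

# Hypothetical toy data: (review_text, star_rating) pairs and a stand-in dictionary
REVIEWS = [
    ("Great product, works great", 5),
    ("Terrible quality, broke quickly", 1),
    ("Good value but slow delivery", 4),
    ("Slow and terrible", 2),
]
DICTIONARY = {"great", "product", "works", "terrible", "quality", "broke",
              "quickly", "good", "value", "slow", "delivery", "and"}  # "but" omitted

lexicon = build_lexicon(REVIEWS, DICTIONARY)
print(classify("great value", lexicon))        # positive
print(classify("terrible and slow", lexicon))  # negative
print(classify("unknown words", lexicon))      # neutral
```

In this toy run, "slow" appears once at 4 stars and once at 2 stars, so its raw score is exactly 0. The assumed centring step shifts it slightly negative because the toy corpus skews positive. This shows why some correction across the set matters when star ratings are unevenly distributed.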

Item Type: Book Chapter, Section or Conference Proceeding
Faculty: School of Digital, Technologies and Arts > Computer Science, AI and Robotics
Depositing User: Derek COLLEY
Date Deposited: 12 Oct 2021 10:34
Last Modified: 17 Nov 2021 04:30
URI: https://eprints.staffs.ac.uk/id/eprint/7029


Staffordshire University, College Road, Stoke-on-Trent, Staffordshire ST4 2DE t: +44 (0)1782 294000