Construction and Performance Analysis of a Groomed Polarity Lexicon Derived from Product Review Source Datasets
COLLEY, Derek and ASADUZZAMAN, Md (2021) Construction and Performance Analysis of a Groomed Polarity Lexicon Derived from Product Review Source Datasets. In: 11th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS 2021). IEEE. (In Press)
Text: paper5.pdf, author's accepted version (437kB). Available under License: All Rights Reserved.
Abstract or description
Using a large, publicly available dataset [1], we extract over 51 million product reviews. We split each review comment into words, associate each word with the review score, and store the resulting 3.7 billion word-score pairs in a relational database. We cleanse the data, grooming the dataset against a standard English dictionary, and create an aggregation model based on word count distributions across review scores. This yields a model dataset of words, each associated with an overall positive or negative polarity sentiment score based on star rating, which we correct and normalise across the set. To test the efficacy of the dataset for sentiment classification, we ingest a secondary, cross-domain public dataset containing freeform text and perform sentiment analysis against it. We then compare our model's performance against human classification performance by enlisting human volunteers to rate the same data samples. We find our model emulates human judgement reasonably well, reaching correct conclusions in 56% of cases, albeit with significant variance when classifying at a coarse grain. At the fine grain, we find our model can track human judgement to within a 7% margin for some cases. We consider potential improvements to our method, further applications, and the limitations of the lexicon-based approach in cross-domain, big data environments.
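The pipeline described in the abstract (split reviews into words, pair each word with the star rating, groom against a dictionary, aggregate per-word count distributions into a polarity score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `build_lexicon` and `score_text`, the in-memory dictionaries standing in for the relational database, the 1 to 5 star scale, and the choice of polarity as the count-weighted mean rating rescaled to [-1, 1] are all assumptions made for illustration. The abstract does not specify the correction and normalisation actually used.

```python
import re
from collections import defaultdict


def build_lexicon(reviews, dictionary, min_count=1):
    """Build a word -> polarity map from (text, stars) pairs.

    Assumptions (not from the paper): stars are integers 1..5, and
    polarity is the count-weighted mean star rating of a word,
    rescaled so that 1 star -> -1.0, 3 stars -> 0.0, 5 stars -> +1.0.
    """
    # Per-word count distribution across the five review scores.
    counts = defaultdict(lambda: [0] * 5)
    for text, stars in reviews:
        for word in re.findall(r"[a-z]+", text.lower()):
            # "Grooming": keep only words found in the reference dictionary.
            if word in dictionary:
                counts[word][stars - 1] += 1

    lexicon = {}
    for word, dist in counts.items():
        total = sum(dist)
        if total < min_count:
            continue  # skip words that are too rare to score
        mean_stars = sum((i + 1) * c for i, c in enumerate(dist)) / total
        lexicon[word] = (mean_stars - 3.0) / 2.0
    return lexicon


def score_text(text, lexicon):
    """Score freeform text as the mean polarity of its known words."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w in lexicon]
    if not words:
        return 0.0  # no lexicon words found: treat as neutral
    return sum(lexicon[w] for w in words) / len(words)


if __name__ == "__main__":
    toy_reviews = [
        ("Great product", 5),
        ("Terrible product", 1),
        ("Great value", 4),
    ]
    toy_dictionary = {"great", "terrible", "product", "value"}
    lexicon = build_lexicon(toy_reviews, toy_dictionary)
    print(lexicon)
    print(score_text("great value, terrible packaging", lexicon))
```

In this toy data "product" appears equally in 1-star and 5-star reviews, so it scores as neutral, which shows why an aggregate over the whole score distribution is used rather than a single observation per word.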
Item Type: | Book Chapter, Section or Conference Proceeding |
---|---|
Faculty: | School of Digital, Technologies and Arts > Computer Science, AI and Robotics |
Depositing User: | Derek COLLEY |
Date Deposited: | 12 Oct 2021 10:34 |
Last Modified: | 01 Mar 2023 01:38 |
URI: | https://eprints.staffs.ac.uk/id/eprint/7029 |