User Bias Removal in Fine Grained Sentiment Analysis
Rahul Wadbude*, Vivek Gupta*, Dheeraj Mekala, Janish Jindal, Harish Karnick
*Equal contribution
The Crux
A major problem for current sentiment classification models is noise caused by user bias in review ratings, i.e., different users map the same expressed sentiment to different star ratings.
We propose two simple statistical methods that remove this user-bias noise and improve fine-grained sentiment classification.
We apply our methods to the SNAP-published Amazon Fine Food Reviews dataset and to two major categories, Electronics and Movies & TV, of the Amazon e-commerce reviews dataset.
After removing user bias, we obtain improvements on the standard evaluation metric (RMSE) for three commonly used feature representations, compared to models trained without bias removal, on the fine-grained sentiment analysis task; a minimal illustration of such bias follows.
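To make the notion of user bias concrete, here is a minimal sketch that measures each user's mean rating offset from the global mean. It assumes a pandas DataFrame with hypothetical `user_id` and `rating` columns; a consistently positive or negative offset is exactly the per-user noise the methods below aim to remove.

```python
import pandas as pd

# Hypothetical schema: one row per review, with a user id and a 1-5 star rating.
reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "rating":  [5, 5, 4, 2, 1, 3],
})

global_mean = reviews["rating"].mean()

# Per-user bias = how far a user's average rating sits from the global average.
user_bias = reviews.groupby("user_id")["rating"].mean() - global_mean
print(user_bias)
# u1 rates leniently (positive offset), u2 harshly (negative offset).
```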
Models
User Bias Removal - I (UBR-I)
User Bias Removal - II (UBR-II)
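The poster does not spell out the two formulations (see the paper for the exact definitions of UBR-I and UBR-II). As an illustrative assumption only, the sketch below shows one common way to factor out a per-user offset: train a regressor on the residual rating after subtracting each user's mean, then add the offset back at prediction time. The function names, column names, and the choice of Ridge regression are all hypothetical, not the paper's prescribed setup.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

def fit_debiased_regressor(train_df: pd.DataFrame):
    """Illustrative bias-removal pipeline (not necessarily the exact UBR-I/UBR-II).

    Learns to predict the residual rating left after removing a per-user mean
    offset; the offset is restored when predicting for a known user.
    """
    user_mean = train_df.groupby("user_id")["rating"].mean()
    global_mean = train_df["rating"].mean()

    # Residual target: rating minus that user's average rating.
    residual = train_df["rating"] - train_df["user_id"].map(user_mean)

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(train_df["text"])
    model = Ridge().fit(X, residual)
    return model, vectorizer, user_mean, global_mean

def predict_ratings(model, vectorizer, user_mean, global_mean, test_df: pd.DataFrame):
    X = vectorizer.transform(test_df["text"])
    # Unseen users fall back to the global mean offset.
    offsets = test_df["user_id"].map(user_mean).fillna(global_mean).to_numpy()
    # Predicted rating = text-based residual + the user's offset, clipped to 1-5 stars.
    return np.clip(model.predict(X) + offsets, 1, 5)
```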
Results and Analysis
Classification Results
- We report results on three datasets: Amazon Fine Food Reviews, Amazon Electronics reviews, and Amazon Movies & TV reviews.
- Review text is encoded with three feature representations (tf-idf, LDA, and doc2vec/PV-DBOW) and evaluated with the standard RMSE metric. The tables below show the performance of different classification methods under each representation; a minimal feature-extraction and evaluation sketch follows this list.
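The snippet below is a hedged sketch of how such an evaluation can be run: build tf-idf features (unigrams + bigrams, as in the "bigram" rows) and report RMSE for a direct, non-debiased baseline. The toy texts, ratings, and the Ridge regressor are placeholders, not the experimental configuration behind the tables.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Toy corpus; in the experiments these would be the review texts and star ratings.
texts = ["great food loved it", "terrible battery died fast",
         "okay movie nothing special", "excellent quality would buy again",
         "worst purchase ever", "pretty decent overall"]
ratings = np.array([5, 1, 3, 5, 1, 4])

texts_tr, texts_te, y_tr, y_te = train_test_split(
    texts, ratings, test_size=0.33, random_state=0)

# tf-idf representation with unigrams and bigrams.
tfidf = TfidfVectorizer(ngram_range=(1, 2))
X_tr, X_te = tfidf.fit_transform(texts_tr), tfidf.transform(texts_te)

pred = Ridge().fit(X_tr, y_tr).predict(X_te)
rmse = np.sqrt(mean_squared_error(y_te, pred))
print(f"RMSE: {rmse:.3f}")

# An LDA representation can be built analogously from CountVectorizer output via
# sklearn's LatentDirichletAllocation, and PV-DBOW vectors via gensim's Doc2Vec(dm=0);
# both yield document vectors that feed the same regressor.
```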
FOOD REVIEWS (RMSE, lower is better)
Methods | tf-idf | LDA | PV-DBOW |
---|---|---|---|
Majority Voting | 1.535 | 1.535 | 1.535 |
User Mean | 0.599 | 0.599 | 0.599 |
User Mode | 2.557 | 2.557 | 2.557 |
Product Mean | 1.140 | 1.140 | 1.140 |
Product Mode | 1.746 | 1.746 | 1.746 |
Direct | 0.888 | 1.494 | 1.06 |
Direct(bigram) | 0.737 | - | - |
UBR-I | 0.546 | 0.597 | 0.56 |
UBR-I(bigram) | 0.529 | - | - |
UBR-II | 0.669 | 0.778 | 0.71 |
UBR-II(bigram) | 0.642 | - | - |
ELECTRONICS REVIEWS (RMSE, lower is better)
Methods | tf-idf | LDA | PV-DBOW |
---|---|---|---|
Majority Voting | 1.417 | 1.417 | 1.417 |
User Mean | 1.022 | 1.022 | 1.022 |
User Mode | 1.278 | 1.278 | 1.278 |
Product Mean | 1.095 | 1.095 | 1.095 |
Product Mode | 1.358 | 1.358 | 1.358 |
Direct | 0.932 | 1.434 | 1.1 |
Direct(bigram) | 0.805 | - | - |
UBR-I | 0.815 | 0.988 | 0.86 |
UBR-I(bigram) | 0.763 | - | - |
UBR-II | 0.821 | 1.011 | 0.9 |
UBR-II(bigram) | 0.761 | - | - |
MOVIES & TV (RMSE, lower is better)
Methods | tf-idf | LDA | PV-DBOW |
---|---|---|---|
Majority Voting | 1.494 | 1.494 | 1.494 |
User Mean | 1.005 | 1.005 | 1.005 |
User Mode | 1.258 | 1.258 | 1.258 |
Product Mean | 1.066 | 1.066 | 1.066 |
Product Mode | 1.347 | 1.347 | 1.347 |
Direct | 0.936 | 1.273 | 1.08 |
Direct(bigram) | 0.853 | - | - |
UBR-I | 0.818 | 0.959 | 0.87 |
UBR-I(bigram) | 0.783 | - | - |
UBR-II | 0.814 | 0.982 | 0.87 |
UBR-II(bigram) | 0.775 | - | - |
Conclusion
Our experiments show that user bias is a genuine problem that needs to be handled in fine-grained sentiment analysis.
The proposed methods remove user bias and improve fine-grained sentiment analysis performance.
They also work well with commonly used text feature representations (tf-idf, LDA, and PV-DBOW).