
Cross-Domain Sentiment Analysis: Evaluating Model Robustness on Combined Review Datasets from Amazon

Anuj Tiwari & Akash Saraswat

Sentiment analysis has become an essential tool for understanding consumer opinion, particularly in product reviews. This study examines cross-domain sentiment analysis by evaluating the robustness of sentiment classification models trained on multiple review datasets combined into one. The primary objective is to assess how different machine learning models perform when trained on diverse review data, here Amazon product reviews, and how well they generalize across domains. Five classical machine learning algorithms are compared on the combined dataset: Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). The workflow begins with data preprocessing, in which raw review texts are cleaned of noise such as URLs, special characters, and stopwords. Features that capture the sentiment expressed in the reviews are then extracted from the processed text, and each classifier is trained on the resulting feature set. Key evaluation metrics, including accuracy and F1-score, are used to assess how well each model predicts sentiment, and the results are analyzed for statistical significance. Models trained on the combined dataset show varying levels of performance, with some algorithms outperforming others in both accuracy and robustness. Naive Bayes and Logistic Regression, in particular, are more stable across different subsets of the test data, which suggests they are well suited to real-world sentiment classification tasks.
The paper also analyzes the factors that contribute to model performance, including the impact of domain-specific vocabulary and the variability of review content. Through detailed performance metrics and model comparison, it offers insight into the practical challenges and opportunities of applying sentiment analysis when data comes from multiple sources. The findings contribute to the broader field of Natural Language Processing (NLP) by highlighting the strengths and limitations of cross-domain sentiment analysis and by offering practical guidelines for selecting machine learning models for sentiment classification in e-commerce and beyond.
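
The preprocessing and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopword list, the regular expressions, and the function names `clean_review`, `accuracy`, and `f1_score` are assumptions chosen for illustration, and the abstract does not state which tokenizer, stopword list, or F1 averaging the study used.

```python
import re

# Assumed, deliberately tiny stopword list; the paper does not name the one it used.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "or", "to", "of", "i", "was"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # http(s) links and bare www links
NON_ALPHA_RE = re.compile(r"[^a-z\s]")          # anything except lowercase letters and whitespace


def clean_review(text):
    """Lowercase a review, drop URLs and special characters, then remove stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(clean_review("This phone is GREAT!!! See https://example.com/x now"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]), f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

The stopword list here deliberately leaves out negations such as "not", since removing them can flip the sentiment of a review; whether the study made the same choice is not stated.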

 

