Online reviews provide consumers with valuable information about products and services. Because reviews can promote or harm the brand of a product or service, buying and selling fake reviews can be a profitable business and a serious threat. Previous approaches to spammer detection relied on reviewer behaviour, text similarity, rating patterns or reviewing response time. In practice, however, some spammers imitate the behaviour of genuine reviewers and therefore cannot be detected by these techniques. Here, we analyse common behaviours of spammers to derive unexpected rules that help us build an opinion spam detection model.
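As an illustration of one of the signals named above, text similarity, here is a minimal sketch that flags near-duplicate reviews using Jaccard similarity over word sets. It is not the detection model used in this project. The function names, the tokenisation and the 0.8 threshold are hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lower-case word tokens of a review (hypothetical, deliberately simple tokenisation)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(reviews: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every review pair whose similarity is >= threshold.

    The 0.8 default threshold is an assumption, not a value from this project.
    """
    toks = {rid: tokens(text) for rid, text in reviews.items()}
    pairs = []
    # Compare every unordered pair of reviews exactly once.
    for (id_a, ta), (id_b, tb) in combinations(toks.items(), 2):
        score = jaccard(ta, tb)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


reviews = {
    "r1": "Great location, friendly staff and a lovely pool!",
    "r2": "Great location, friendly staff and lovely pool.",
    "r3": "Room was small and the air-con was noisy.",
}
print(near_duplicates(reviews))  # [('r1', 'r2', 0.875)]
```

Comparing all pairs is quadratic in the number of reviews; for large crawls, approximate methods such as MinHash are commonly used instead.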
Reviews of some of the major hotels in Singapore were scraped from Tripadvisor.com to build the analysis for this project. The directory structure is as follows:
Python-Codes/ - This directory contains the Python code used to crawl reviews from Tripadvisor.com. The web scraping uses BeautifulSoup, a Python library for pulling data out of HTML content.
R-Codes/ - This directory contains R code for performing the statistical analysis of the crawled reviews.
Doc.pdf - This file contains the complete project documentation, describing the analysis performed to identify spam reviews.
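The BeautifulSoup-based scraping can be illustrated with a minimal sketch that extracts reviewer, rating and text from review blocks. It assumes the `beautifulsoup4` package is installed and parses an inline HTML sample rather than a live page, so the HTTP fetching step is omitted. The markup, class names and `parse_reviews` function are hypothetical: real Tripadvisor pages use different, frequently changing markup, and the crawler in `Python-Codes/` may work differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not Tripadvisor's real structure.
SAMPLE_HTML = """
<div class="review" data-reviewer="alice">
  <span class="rating">5</span>
  <p class="review-text">Excellent service and a great view.</p>
</div>
<div class="review" data-reviewer="bob">
  <span class="rating">2</span>
  <p class="review-text">Overpriced for what you get.</p>
</div>
"""


def parse_reviews(html: str) -> list[dict]:
    """Extract reviewer, numeric rating and text from each hypothetical review block."""
    soup = BeautifulSoup(html, "html.parser")  # stdlib-backed parser, no lxml needed
    reviews = []
    # class_="review" matches the whole class token, so "review-text" is not picked up here.
    for block in soup.find_all("div", class_="review"):
        reviews.append({
            "reviewer": block.get("data-reviewer"),
            "rating": int(block.find("span", class_="rating").get_text(strip=True)),
            "text": block.find("p", class_="review-text").get_text(strip=True),
        })
    return reviews


for review in parse_reviews(SAMPLE_HTML):
    print(review)
```

In an actual crawler, the `html` argument would come from a fetched page (for example, `requests.get(url).text`), and the selectors would need to match the site's current markup.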