According to Michele Walters, Co-Founder of Origin World Labs, hotel data science seems to have a plethora of untapped, solvable problems.
"This shortage will hit the hospitality industry especially hard as it tends to be at the bottom of the totem pole for attracting analytical and technical talent. Unfortunately, hospitality is still perceived as an industry where soft skills are overwhelmingly more important than hard skills."
"Companies such as Marriott and Disney have realized that hospitality is a data-intensive business and that there is a wealth of creative strategies and tactics that can be found when the data is analyzed by professionals. Yet, even for these big brands the single biggest obstacle to making data-driven progress is their inability to find enough qualified talent to fill their analytics positions."
Anonymized hotel_id data was obtained by Professor Hongning Wang.
He and co-authors have accompanying machine learning papers discovering latent aspects in the rating.
Overview of data:
2232 Hotels, 37181 Reviews, 34187 Reviewers, 96.5 Avg Len, [3.92-1.23, 3.929+1.23] Rating.
"Latent Aspect Rating Analysis on Review Text Data: A Rating Regression Approach", paper, slides
"Latent Aspect Rating Analysis without Aspect Keyword Supervision", paper, slides
Clustering on four aspect ratings Value, Room, Location, Cleanliness
(the features)
from the table H_normed
in the database app.db
available in the web-app
folder.
Store data from [Data page] into SQL databases. Group ratings by hotel_id and average the predictions for the aspect ratings.
- Scrape hotel data from Yelp using their API and have non-anonymized data.
- Create LARA, a combination of regression and maximum likelihood estimation, code for python to get prediction for the four aspects listed above in Model.
- Use word2vec on the keywords extracted from LARA to build an alternative model for unsupervised learning.