This is a group project for CS229, categorized in Natural Language Process.
More details would be in this poster.
While there are a lot of recommendation systems for movies, audios, books, or restaurants, surprisingly there is
hardly any for dishes. As food lovers, we intend to address this issue by exploring models for recommending Chinese dishes to users.
We scraped data from a popular Chinese recipe website/app called Xia Chu Fang Cooking Recipe, and
implemented word embedding algorithms and recommendation systems algorithms to build our model.
Specifically, the input to our algorithm are users from Xia Chu Fang and dish names they stored in their favourite list.
We then use Word2Vec and collaborative filtering to build our recommendation system.Specifically,
we explore the Skip-Gram model in Word2Vec to calculate dish similarity, and apply the Non-NegativeMatrix Factorization
and Singular Value Decomposition methods in collaborative filtering to output predicted ratings on dishes,
which allows generation of top recommendations for users based on predicted ratings.Limiting our dishes to Chinese
cuisine allows us to take on a more tailored recommendation system, while still maintain a high practical value given
the rising popularity and diversity of Chinese cuisines. Through this project, we hope to tackle a less touched topic
of using Machine Learning for dish recommendation and promote Chinese food.
chef_spider.py
- Utilize parallel crawling and proxies to fetch data more efficiently
- We run this spider on Google Cloud, which created two virtual machines and each contains 8vCPUs.
preprocess.py
- Map users' favorite recipes into a fixed-length dictionary, which has 1871 unique keys representing recipe names.
- This dictionary also has corresponding english translation of each dish's chinese recipe name.
This content-based method could tackle the issue of a cold start, and intuitively gives us recommendations based on
- your input keywords
- other users appetites in our database
word2vec.py
Usage
- Suppose you have downloaded our repo including the models in the
models
directory. - Then you just need to execute this script by typing
python word2vec.py
Tips
-
You may need to install some dependencies by run
pip install -r requirements.txt
. -
If you really don't want to see any warnings, you can simply use this command
python -W ignore word2vec.py
. -
The visualization by T-SNE is in the
res
folder, you can see the two figures (here and here) which generated by different seeds.Also, we have a sample only trained by part data.
- NMF (Non-negative Matrix Factorization)
- SVD (Singular Value Decomposition)
We implement this idea mainly by matrix factorization, which gives us more quantified way to measure our model by recall, see more details here.
TL;DR;
Dev Set RMSE | Test Set RMSE | Dev Set Recall | Test Set Recall | |
---|---|---|---|---|
NMF | 0.4574 | 0.5851 | 0.5081 | 0.5393 |
SVD | 0.3317 | 0.3634 | 0.9173 | 0.9301 |