This repository contains our solution to the FARFETCH Fashion Recommendations Challenge, which achieved 3rd place.
The aim is to predict the products clicked by a user from a list of selected recommendations. First, we use Cleora, our graph embedding method, to represent products as a directed graph and learn their vector representations. Products are embedded in two relations:
- Only clicked products in a given session (`clicked-modality`)
- Viewed products in a given session (`viewed-modality`)
As a result, we obtain two embeddings. Not all products were clicked or viewed, so for a small number of products we do not have a vector representation from Cleora. Next, we apply EMDE to predict the clicked product based on previously clicked and viewed products. We also add some features associated with each user.
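The two relations above can be sketched as follows. This is a minimal illustration, not the repository's preprocessing code: the `(session_id, product_id, clicked)` event tuples, the function name `build_cleora_lines`, and the one-line-per-session, space-separated output format are all assumptions made for the example. Check the Cleora documentation for the input format it actually expects.

```python
from collections import defaultdict


def build_cleora_lines(events, only_clicked):
    """Group product IDs by session, producing one text line per session.

    `events` is a hypothetical list of (session_id, product_id, clicked)
    tuples; every event is assumed to count as a view.
    """
    sessions = defaultdict(list)
    for session_id, product_id, clicked in events:
        if only_clicked and not clicked:
            continue  # the clicked relation ignores products that were only viewed
        if product_id not in sessions[session_id]:
            sessions[session_id].append(product_id)
    return [" ".join(products) for products in sessions.values()]


events = [
    ("s1", "p1", False),
    ("s1", "p2", True),
    ("s2", "p3", False),
    ("s2", "p1", True),
    ("s2", "p3", False),
]
viewed_lines = build_cleora_lines(events, only_clicked=False)   # ["p1 p2", "p3 p1"]
clicked_lines = build_cleora_lines(events, only_clicked=True)   # ["p2", "p1"]
```

In this toy data `p3` is never clicked, so it appears only in the viewed relation. This mirrors why some products end up without a vector in one of the two embeddings.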
The model takes 4 sketches as input:
- Sketch of all products clicked in the previous sessions (`clicked-modality`)
- Sketch of all products clicked in the current session, apart from the current query (`clicked-modality`)
- Sketch of all products viewed in the current session, apart from the current query (`viewed-modality`)
- Sketch of all products displayed in the current query (`viewed-modality`)
The model returns a sketch of the product clicked in the current query. The output sketch comes from `viewed-modality`, as it contains more product embeddings/codes. The output sketch is then scored against all product sketches from `viewed-modality` to obtain the click probability.
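To make the idea of sketches and scoring more concrete, here is a toy illustration in plain Python. It is a sketch under stated assumptions, not the model in this repository: the function names, the tiny hand-written `codes` table, the geometric-mean scoring rule, and the normalisation into probabilities are all assumptions for the example. In the real pipeline the output sketch is produced by the trained network rather than built directly.

```python
def build_sketch(product_ids, codes, n_depth, n_buckets):
    """Sum per-product bucket codes into one count sketch (n_depth x n_buckets)."""
    sketch = [[0.0] * n_buckets for _ in range(n_depth)]
    for pid in product_ids:
        for depth, bucket in enumerate(codes[pid]):
            sketch[depth][bucket] += 1.0
    return sketch


def score_products(output_sketch, candidates, codes):
    """Score candidates against a sketch and normalise the scores to sum to 1.

    The geometric mean over depths is an assumed scoring rule.
    """
    scores = {}
    for pid in candidates:
        product = 1.0
        for depth, bucket in enumerate(codes[pid]):
            product *= output_sketch[depth][bucket]
        scores[pid] = product ** (1.0 / len(codes[pid]))
    total = sum(scores.values())
    return {pid: (s / total if total > 0 else 0.0) for pid, s in scores.items()}


# Hypothetical LSH codes: one bucket index per depth for each product.
codes = {"a": [0, 1], "b": [0, 2], "c": [3, 1]}
output_sketch = build_sketch(["a", "b"], codes, n_depth=2, n_buckets=4)
probabilities = score_products(output_sketch, ["a", "b", "c"], codes)
```

Here products `a` and `b` share the mass of the sketch equally, while `c` scores zero because one of its buckets is empty in the output sketch.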
- Download the binary Cleora release and add execution permission to run it. Refer to the Cleora GitHub page for more details about Cleora.
- Python 3.7
- Install requirements: `pip install -r requirements.txt`
- GPU for training
- Create a `data` directory in the `src` folder: `mkdir src/data`
- Put `train.parquet`, `validation.parquet` and `test.parquet` into the `src/data` folder
- Change directory to `src`: `cd src`
- Transform all parquet files to CSV in a sequential-like form. This also creates the input files for Cleora: `python transform_to_sequential_data.py --data-dir data`
  This script will create three CSV files: `data/train_original_processed_reproducing.csv`, `data/val_original_processed_reproducing.csv` and `data/test_original_processed_reproducing.csv`, and two input files for the Cleora algorithm: `data/cleoraInput_sessionIdGrouped_viewed` and `data/cleoraInput_sessionIdGrouped_onlyClicked`. The script also creates a dict with products2attributes data and saves it at `data/products_dict_reproducing`.
- Create datapoints for running the model: `python create_datapoints.py --data-dir data`
  This script will create three files: `data/train_datapoints_sequential_reproducing`, `data/validation_datapoints_sequential_reproducing` and `data/test_datapoints_sequential_reproducing`.
- Compute product sketches using Cleora and EMDE: `python encode.py --data-dir data`
  This script will create LSH codes for each product from `viewed-modality` and `clicked-modality`. Codes are saved to `data/codes_viewed` and `codes_clicked`.
- Run training: `python train.py --data-dir data`
  Logs are saved to `src/logs/runs`.
- Download the trained model checkpoint: https://drive.google.com/file/d/1vnuKZGdEGHzGkBrVUx7JNbyOcqE-OK5o/view?usp=sharing
- Run the test: `python test.py --data-dir data`
  Use the `--checkpoint-path` flag to specify the trained model path (`model_trained.ckpt` by default) and the `--subset-to-use` flag to specify whether to use the `validation` or `test` subset (`test` by default).
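The steps above can also be chained from a small driver script. The following is only a convenience sketch: the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical and not part of the repository, and the `--checkpoint-path` spelling assumes the flag takes the usual double-dash form. The script names and `--data-dir` argument are taken from the steps listed above.

```python
import subprocess


def build_pipeline_commands(data_dir="data", checkpoint_path=None, subset=None):
    """Return the command lines for each step, in the order listed above."""
    scripts = [
        "transform_to_sequential_data.py",
        "create_datapoints.py",
        "encode.py",
        "train.py",
    ]
    commands = [["python", script, "--data-dir", data_dir] for script in scripts]

    test_command = ["python", "test.py", "--data-dir", data_dir]
    if checkpoint_path is not None:
        test_command += ["--checkpoint-path", checkpoint_path]  # assumed flag spelling
    if subset is not None:
        test_command += ["--subset-to-use", subset]
    commands.append(test_command)
    return commands


def run_pipeline(commands, cwd="src"):
    """Run each command from the src directory, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


# Example: build (but do not run) the commands for evaluating on validation.
commands = build_pipeline_commands(subset="validation")
```

Calling `run_pipeline(commands)` from the repository root would then execute the steps one after another. Training is included in the list, so drop `train.py` from `commands` when using the downloaded checkpoint.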