The data directory is structured as follows:
.
├── data
│   ├── features
│   │   └── xxx.bin
│   ├── labels
│   │   └── xxx.meta
│   ├── knns
│   │   └── ...
- `features` currently supports binary files. (We plan to support `np.save` files in the near future.)
- `labels` supports plain text, where each line holds the label of the corresponding feature in the feature file.
- `knns` is not necessary, as it can be built with the provided functions.
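As a minimal sketch of how these two files might be read: it assumes the binary file is a headerless dump of `float32` values stored row by row and that the feature dimension is known in advance (256 is used here as a placeholder). Neither assumption is stated above, and `load_features` and `load_labels` are hypothetical helper names, not functions from this repository.

```python
import numpy as np


def load_features(path, feature_dim=256, dtype=np.float32):
    """Read a raw binary feature file into an (N, feature_dim) array.

    Assumption: the file is a flat dump of `dtype` values with no header.
    """
    flat = np.fromfile(path, dtype=dtype)
    if flat.size % feature_dim != 0:
        raise ValueError(
            f"{path}: {flat.size} values is not a multiple of {feature_dim}"
        )
    return flat.reshape(-1, feature_dim)


def load_labels(path):
    """Read a plain-text meta file with one integer label per line."""
    with open(path) as f:
        # Blank lines are skipped; int() tolerates the trailing newline.
        return np.array([int(line) for line in f if line.strip()], dtype=np.int64)
```

If the dimension or dtype is wrong for your file, the reshape either fails the divisibility check or silently produces meaningless rows, so it is worth confirming both against the feature extractor you used.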
Take MS-Celeb-1M (Part0 and Part1) as an example. The data directory is as follows:
data
├── features
│   ├── part0_train.bin                  # acbbc780948e7bfaaee093ef9fce2ccb
│   └── part1_test.bin                   # ced42d80046d75ead82ae5c2cdfba621
├── labels
│   ├── part0_train.meta                 # class_num=8573, inst_num=576494
│   └── part1_test.meta                  # class_num=8573, inst_num=584013
├── knns
│   ├── part0_train/faiss_k_80.npz       # 5e4f6c06daf8d29c9b940a851f28a925
│   └── part1_test/faiss_k_80.npz        # d4a7f95b09f80b0167d893f2ca0f5be5
└── pretrained_models
    ├── pretrained_gcn_d_ms1m.pth        # 213598e70ddbc50f5e3661a6191a8be1
    ├── pretrained_gcn_s_ms1m.pth        # 3251d6e7d4f9178f504b02d8238726f7
    ├── pretrained_gcn_d_iop_ms1m.pth    # 314fba47b5156dcc91383ad611d5bd96
    ├── pretrained_gcn_v_ms1m.pth        # 020236d4e8dbff975360f08cb47109c0
    ├── pretrained_gcn_e_ms1m.pth        # 315ff08f28f14bc494dd36158c11e900
    └── pretrained_lgcn_ms1m.pth         # 97fc6e52d1b5e907eabeb01e7b0825f9
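The 32-character hex strings in the comments above look like MD5 digests; this is an assumption, since the hash function is not named here. Under that assumption, a downloaded file could be checked with a sketch like the following, where `md5sum` and `verify` are hypothetical helper names.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5sum(path) == expected_hex.strip().lower()
```

For example, `verify("data/features/part0_train.bin", "acbbc780948e7bfaaee093ef9fce2ccb")` would return `True` for an intact download if the strings are indeed MD5 digests.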
To experiment with a custom dataset, you need to provide extracted features and labels. For training, the number of features must equal the number of labels. For testing, the F-score is evaluated if labels are provided; otherwise, only clustering results are generated.
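The training requirement can be checked before launching a run. The sketch below assumes headerless `float32` features (4 bytes per value) of a known dimension, with 256 as a placeholder, and one label per line; `count_features`, `count_labels`, and `check_pair` are hypothetical names rather than part of the provided code.

```python
import os


def count_features(feat_path, feature_dim=256, bytes_per_value=4):
    """Infer the number of feature vectors from the file size alone."""
    row_bytes = feature_dim * bytes_per_value
    size = os.path.getsize(feat_path)
    if size % row_bytes != 0:
        raise ValueError(
            f"{feat_path}: size {size} is not a multiple of {row_bytes}"
        )
    return size // row_bytes


def count_labels(label_path):
    """Count the non-empty lines in a plain-text meta file."""
    with open(label_path) as f:
        return sum(1 for line in f if line.strip())


def check_pair(feat_path, label_path, feature_dim=256):
    """Return the instance count, or raise if features and labels disagree."""
    n_feat = count_features(feat_path, feature_dim)
    n_label = count_labels(label_path)
    if n_feat != n_label:
        raise ValueError(f"{n_feat} features but {n_label} labels")
    return n_feat
```

Because the feature count is derived from the file size, the check does not need to load the features into memory.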
Note that `labels` is only required for training the clustering model; it is not mandatory for clustering unlabeled data.
There are two ways to cluster unlabeled data without a meta file.
(1) Do not pass `label_path` in the config file. No loss or evaluation results will be generated.
(2) Make a pseudo meta file, e.g., by setting all labels to -1, and ignore the resulting loss and evaluation results.
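Option (2) can be sketched as follows. The helper name `write_pseudo_meta` is hypothetical, and the output uses the one-label-per-line plain-text layout described for label files.

```python
def write_pseudo_meta(path, num_instances, placeholder=-1):
    """Write a meta file with `num_instances` lines, each holding `placeholder`."""
    if num_instances < 0:
        raise ValueError("num_instances must be non-negative")
    with open(path, "w") as f:
        for _ in range(num_instances):
            f.write(f"{placeholder}\n")
```

`num_instances` should match the number of feature vectors in the corresponding feature file; the file name you pass is up to you.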
The supported datasets are listed below.
- MS-Celeb-1M
    - Part1 (584K): GoogleDrive or BaiduYun (passwd: geq5)
    - Benchmarks (5.21M): GoogleDrive or OneDrive
    - Precomputed KNN: OneDrive
    - Image Lists: GoogleDrive or OneDrive
    - Original Images: OneDrive. We re-align MS1M-ArcFace with our own face alignment model.
    - Pretrained Face Recognition Model: GoogleDrive. To use the model for feature extraction, please check the code and try it on the sample data.
- YouTube-Face: GoogleDrive or BaiduYun (passwd: aper)
- DeepFashion: GoogleDrive or BaiduYun (passwd: 8fai)
You can download the datasets from the links above or with the script below:
python tools/download_data.py
Now, you can switch to README.md to train and test the model.