A PyTorch trainer with a range of choices for backbones, losses, and augmentations. Plus wandb experiment tracking and sweeps!
- Easy to use with your own datasets, based on `torchvision.datasets.ImageFolder`; supports combining multiple datasets as well as the pre-existing datasets from PyTorch;
- Several backbones available, based on the awesome `timm`;
- Losses are a mix of what you can find in official PyTorch and in PyTorch Metric Learning;
- Experiments can be tracked with Weights & Biases, and sweeps are integrated; check the section about that.
- Create and activate the conda environment (it should take a few minutes):
  ```
  conda env create --name miyagi --file=environment.yml
  conda activate miyagi
  ```
- (Optional) Log in with your wandb account:
  ```
  wandb login
  ```
- Run a training (e.g. CIFAR10, mobilenet_v3, CE loss):
  ```
  python miyagi_trainer/train.py --resize_size 224 --train_datasets CIFAR10 --val_datasets CIFAR10 --backbone mobilenetv3_large_100
  ```
That should download everything you need and start training. If you did step 2, you're surely interested in tracking this experiment. For that, set the following args:
```
--track_experiment --experiment_group miyagi-test --experiment_name test1 --wandb_user my_wandb_user
```
You always need to set a size to which all your images are resized (that makes it easier to accommodate different datasets). For that, set the `--resize_size` param, such as `--resize_size 224`.
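Conceptually, the resize is just the first step of the input pipeline. A minimal torchvision sketch of what `--resize_size 224` amounts to (the trainer's actual pipeline also normalizes and, depending on `--augmentation`, augments):

```python
from torchvision import transforms

# Pin every image to a fixed size before anything else (a sketch,
# not the trainer's exact pipeline):
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```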
To be up and running in no time, the trainer supports the official PyTorch datasets (examples are CIFAR10, MNIST, ImageNet, etc.). Any dataset in that list should work fine:
```
--train_datasets CIFAR10 --val_datasets CIFAR10
```
Of course, you can also use your own data to train. For that we use `torchvision.datasets.ImageFolder`. As per the documentation, you should put the images for each class inside a folder named after that class, e.g.:
```
custom_dataset_folder
│
└───train
│   │
│   └───class1
│   │       file001.png
│   │       file002.png
│   │       ...
│   └───class2
│           file003.jpg
│           file004.png
│           ...
└───val
    └───class1
            file005.jpeg
            ...
```
For reasons that should become clear, you need to choose a name for your custom dataset and add it to the `CUSTOM_DATASETS` dict in `datasets.py`:
```python
CUSTOM_DATASETS = {
    "custom_dataset": "data/custom_dataset_folder/"
}
```
Then you can reference it by name. It's always expected that you have `train` and `val` folders as the primary subfolders in your path (check the tree above). You can also combine multiple datasets using `+`, like this:
```
--train_datasets CIFAR10+CIFAR100 --val_datasets CIFAR10
```
I implemented this with `+` inside a single string because otherwise sweeps would not work with it. A cool thing about this feature is that one dataset can contain only a subset of the classes of another and it will still work just fine (see the implementation for the details).
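For intuition, here's a minimal sketch of how a `+`-joined spec could be resolved into a single dataset. The helper below is hypothetical, not the trainer's actual code; the real logic, including the label remapping that makes subset-of-classes combos work, lives in `datasets.py`:

```python
import os

import torchvision
from torch.utils.data import ConcatDataset
from torchvision.datasets import ImageFolder

# Folder-based datasets registered by name (see datasets.py).
CUSTOM_DATASETS = {"custom_dataset": "data/custom_dataset_folder/"}

def resolve_datasets(spec: str, split: str = "train"):
    """Turn a spec like 'CIFAR10+custom_dataset' into one dataset."""
    parts = []
    for name in spec.split("+"):
        if name in CUSTOM_DATASETS:
            # Custom data: expects train/ and val/ subfolders (see tree above).
            parts.append(ImageFolder(os.path.join(CUSTOM_DATASETS[name], split)))
        else:
            # Official datasets looked up by class name; the simple ones
            # (CIFAR10, MNIST, ...) share this constructor signature.
            cls = getattr(torchvision.datasets, name)
            parts.append(cls(root="data", train=(split == "train"), download=True))
    return ConcatDataset(parts) if len(parts) > 1 else parts[0]
```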
The lib uses version 0.5.4 of the `timm` package. You can always update to the newest version if you want. It also works with torchvision models, but if you choose a backbone that exists in both, priority is always given to `timm`. You can control whether you want a pre-trained model (the default) or not (`--no_transfer_learning`). You specify a backbone like this:
```
--backbone mobilenetv3_large_100
```
A nice tip: you can also search the timm backbones easily, for example:
```python
>>> import timm
>>> timm.list_models("mobilenet*")
['mobilenetv2_035', 'mobilenetv2_050', 'mobilenetv2_075', 'mobilenetv2_100', 'mobilenetv2_110d', 'mobilenetv2_120d', 'mobilenetv2_140', 'mobilenetv3_large_075', 'mobilenetv3_large_100', 'mobilenetv3_large_100_miil', 'mobilenetv3_large_100_miil_in21k', 'mobilenetv3_rw', 'mobilenetv3_small_050', 'mobilenetv3_small_075', 'mobilenetv3_small_100']
```
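Under the hood, building a timm backbone boils down to something like the following (a sketch; `num_classes` here is just illustrative):

```python
import timm

# Pretrained weights are the default; --no_transfer_learning would
# correspond to pretrained=False.
model = timm.create_model("mobilenetv3_large_100", pretrained=True, num_classes=10)
```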
Since we're dealing with classification problems, cross entropy is the default loss. You can also apply label smoothing by setting `--ce_loss_label_smoothing [LABEL SMOOTHING FACTOR]`. The PyTorch Metric Learning package is also included; right now you can only use `--loss angular` from it (we should include more in the near future, but if you want, it's quite easy to add others in `losses.py`).
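Roughly how the two loss options map onto their libraries (a sketch; note that `label_smoothing` on `CrossEntropyLoss` needs PyTorch 1.10+):

```python
import torch.nn as nn
from pytorch_metric_learning import losses

# --ce_loss_label_smoothing 0.1 maps to something like:
ce_loss = nn.CrossEntropyLoss(label_smoothing=0.1)

# --loss angular comes from PyTorch Metric Learning:
angular_loss = losses.AngularLoss()
```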
We don't use a specific lib for augmentation, but we support some nice options. You just need to set the `--augmentation` parameter to one of these:
- `no_aug`: resize the image according to the `resize_size` parameter and normalize it;
- `simple`: some crops, flips and rotations;
- `random_erase`: erase parts of the images randomly;
- `rand-m9-n3-mstd0.5`, `rand-mstd1-w0`, etc.: RandAugment from the timm library. Usually the best option; check the official doc for details (see also the sketch below).
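Those `rand-*` strings are standard timm RandAugment config strings. A minimal sketch of turning one into a training transform with timm's own factory (`input_size` here is illustrative):

```python
from timm.data import create_transform

# The auto_augment string selects and configures RandAugment.
train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment="rand-m9-n3-mstd0.5",
)
```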
We have a few other options that you can set as well. For the full list, just run `python miyagi_trainer/train.py --help`. Here are the most important ones:
- `weight_decay` (default `1e-4`): weight decay is nowadays considered to be a good idea almost always. We set a default, but you could try to find the sweet spot for your problem; you can check this paper to see for yourself whether that's worth doing.
- `optimizer` (default `sgd`): the optimizer for training; you can also set it to `adam` or `adamp`. I usually leave it at SGD because we use `CosineAnnealingWarmRestarts` as the LR scheduler by default and I had some problems using other optimizers with it (a sketch of the combination follows this list).
- Others: `batch_size` (default `64`), `n_epochs` (default `30`).
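Roughly how the default optimizer and scheduler combine (a sketch; the stand-in model and the `lr`, `momentum` and `T_0` values are illustrative, not the trainer's defaults):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model, just for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
# T_0 is the number of epochs until the first warm restart.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=10)
```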
As mentioned in the quickstart, you have a few options if you want to enable experiment tracking in the trainer, check them out:
- `track_experiment`: this boolean param must be set to enable experiment tracking;
- `wandb_user`: your user from Weights & Biases;
- `experiment_group` (default `miyagi-pytorch-trainer`): the name you use to group experiments and compare them; it's the `project` arg of `wandb.init()`. See the documentation, and the sketch below.
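Roughly how those args map onto the wandb API (a sketch, reusing the quickstart values):

```python
import wandb

wandb.init(
    project="miyagi-test",   # --experiment_group
    name="test1",            # --experiment_name
    entity="my_wandb_user",  # --wandb_user
)
```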
Do not forget to run `wandb login` before your experiment. If everything works out, you should have experiment tracking and can analyze your losses and accuracies. There's even an analysis of how much system resources you're using. Also, associated with each experiment is the full original configuration used for it.
Almost every time you're training a model, you want to try out a bunch of different models, configs and hyperparameters. Wandb sweeps can help you a lot with that, and they're quite simple to use. Basically, you need to register a sweep, which defines which options/intervals you want to explore. This info is kept in the cloud, and then you can open multiple machines (agents) to run this "queue" of experiments in parallel.
There is an example in the `sweeps/` folder, called `cifar10_full_sweep.yaml`, which tests different options, such as backbones, augmentations, weight decay values, etc. To run this sweep, use the following command:
```
$ wandb sweep sweeps/cifar10_full_sweep.yaml
wandb: Creating sweep from: sweeps/cifar10_full_sweep.yaml
wandb: Created sweep with ID: xxxxxxx
wandb: View sweep at: https://wandb.ai/wandb_user/miyagi_pytorch_trainer/sweeps/xxxxxx
wandb: Run sweep agent with: wandb agent wandb_user/miyagi_pytorch_trainer/xxxxxx
```
Once you've done this, you need to run agents on the machines you have previously set up:
```
wandb agent wandb_user/miyagi_pytorch_trainer/xxxxxx
```
This example will run a training for every combination of options (grid search) specified in the yaml file. That should take a couple of days, but you end up with a very cool set of experiments.
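In case you haven't written one before, a wandb grid-sweep config has this general shape (a hypothetical excerpt with illustrative values; see `sweeps/cifar10_full_sweep.yaml` for the real file):

```yaml
# Hypothetical excerpt, not the actual contents of cifar10_full_sweep.yaml.
program: miyagi_trainer/train.py
method: grid
parameters:
  backbone:
    values: [mobilenetv3_large_100, resnet18]
  weight_decay:
    values: [0.0001, 0.001]
  augmentation:
    values: [simple, rand-m9-n3-mstd0.5]
```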
- export PyTorch models (maybe ONNX);
- resume training from a checkpoint.
For anything else, open an issue.