SlapDash-Net (SdP-Net) is a not-so-serious weekend project: a playful variation on the ViT architecture. We use encoder-type transformer layers together with some register tokens, and before each encoder layer we slot in a few convolution layers in a highly slapdash manner. The models are trained on the ImageNet-1k/22k datasets.
- No promise to get very high accuracy,
- No claim of novelty; the same idea may well have been used elsewhere,
- No attempt to tweak hyperparameters more than needed,
- We would like to hybridize things,
- We do bizarre combinations for one simple reason: because we would like to!!!,
- In SdP-Net we trust!
The setup is as follows:
(Patcher + Embedding Layer) + 5 x CLS_tokens --> N x (SDP_NeT_Blocks) --> Encoder_Layer + CLS_tks --> CLS_tks.mean(-2) --> MLP --> Logits
where SDP_NeT_Blocks = 2 x (DW_Conv + MLP) + Transformer_Encoder_Blocks. The CLS tokens attend to the image only through the attention blocks; they can also attend to each other.
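A minimal PyTorch sketch of what one SDP_NeT_Block could look like under the description above (the class names, layer sizes, and norm choices here are illustrative assumptions, not the actual implementation):

```python
import torch
import torch.nn as nn

class ConvMLP(nn.Module):
    """One DW_Conv + MLP stage, applied on the patch grid (assumed layout: B, C, H, W)."""
    def __init__(self, dim, conv_size=7, mlp_ratio=4):
        super().__init__()
        self.dw_conv = nn.Conv2d(dim, dim, conv_size, padding=conv_size // 2, groups=dim)
        self.norm = nn.BatchNorm2d(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        x = x + self.dw_conv(x)          # depthwise spatial mixing
        x = x + self.mlp(self.norm(x))   # pointwise channel mixing
        return x

class SDPNetBlock(nn.Module):
    """2 x (DW_Conv + MLP) followed by a standard transformer encoder block.
    CLS/register tokens skip the conv part and join only for the attention step,
    so they see the image exclusively through attention and can attend to each other."""
    def __init__(self, dim, conv_size=7, num_heads=8):
        super().__init__()
        self.convs = nn.Sequential(ConvMLP(dim, conv_size), ConvMLP(dim, conv_size))
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, dim_feedforward=4 * dim,
            activation="gelu", batch_first=True, norm_first=True,
        )

    def forward(self, patches, cls_tokens):
        # patches: (B, C, H, W) grid of patch embeddings; cls_tokens: (B, 5, C)
        B, C, H, W = patches.shape
        patches = self.convs(patches)
        seq = patches.flatten(2).transpose(1, 2)             # (B, H*W, C)
        seq = self.encoder(torch.cat([cls_tokens, seq], 1))  # joint attention over CLS + patches
        cls_tokens, seq = seq[:, :5], seq[:, 5:]
        patches = seq.transpose(1, 2).reshape(B, C, H, W)
        return patches, cls_tokens
```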
Size | #Params | #Blocks | Patch_Size | Conv_Size | Embed_Dim | Top-1 Acc |
---|---|---|---|---|---|---|
XXS | 55M | 7 | 16 | 7 | 128 | ? |
S | 76M | 12 | 16 | 7 | 512 | ? |
M | 86M | 12 | 16 | 7 | 768 | ? |
L | 86M | 12 | 16 | 7 | 768 | ? |
XL | 101M | 17 | 14 | 7 | 768 | 82.1 |
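For reference, the table rows can be written down as plain configs (a sketch; the field and dict names are made up here, and the values simply mirror the table, where "M" and "L" are listed with identical settings):

```python
from dataclasses import dataclass

@dataclass
class SDPNetConfig:
    num_blocks: int
    patch_size: int
    conv_size: int
    embed_dim: int

SDP_NET_VARIANTS = {
    "XXS": SDPNetConfig(num_blocks=7,  patch_size=16, conv_size=7, embed_dim=128),
    "S":   SDPNetConfig(num_blocks=12, patch_size=16, conv_size=7, embed_dim=512),
    "M":   SDPNetConfig(num_blocks=12, patch_size=16, conv_size=7, embed_dim=768),
    "L":   SDPNetConfig(num_blocks=12, patch_size=16, conv_size=7, embed_dim=768),
    "XL":  SDPNetConfig(num_blocks=17, patch_size=14, conv_size=7, embed_dim=768),
}
```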
Bitter lesson: the biggest model reaches 82.2 top-1 accuracy on ImageNet-1k, and the EMA of this model does slightly better, trained for only 200 epochs. The weights of the trained models can be made available upon request.
AdamW: lr = 0.001875 (= 0.001 * batch_size / 512), weight decay 0.05, cosine annealing with warm restarts, plus 5 warm-up epochs.
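A rough sketch of that optimizer/schedule setup in PyTorch (the batch size and the restart period `T_0` are assumptions, and the warm-up ramp is not shown):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)               # stand-in for the actual SdP-Net model
batch_size = 960                      # assumption: 0.001 * 960 / 512 = 0.001875
base_lr = 0.001 * batch_size / 512    # linear lr scaling rule from the line above

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

# Cosine annealing with warm restarts; the 5 warm-up epochs would linearly ramp
# the lr before handing over to this scheduler.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=50)
```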
RandAugment + Random Erasing + Random Resized Crop + CutMix + MixUp + Dropout(0.2) (applied only to the FFN parts of the attention blocks).
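A hedged sketch of that augmentation recipe using torchvision/timm components (the magnitudes, probabilities, and alphas below are guesses, not the values actually used):

```python
from torchvision import transforms
from timm.data import Mixup

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),   # random resize + crop
    transforms.RandomHorizontalFlip(),
    transforms.RandAugment(),            # RandAugment
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    transforms.RandomErasing(p=0.25),    # random erase (operates on tensors)
])

# CutMix + MixUp are applied per batch, e.g. via timm's Mixup helper.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0, num_classes=1000)
```

The Dropout(0.2) on the FFN parts lives inside the transformer blocks themselves rather than in the data pipeline.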
# TODO
- Gating mechanism in the FFN?
- LayerScale --> This will be needed for deeper networks! (A rough sketch of both ideas appears after this list.)
- Neighborhood embedding --> See the ConvEmbedding layer in layers (a larger embedding lookup dictionary is used, and for an individual patch a neighbourhood of embeddings is averaged!)
- Use the KeLü activation instead of GELU (KeLü is implemented but may not be fully optimized!)
- Use BCE loss.
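As a starting point for the first two TODO items, here is a rough sketch of a gated FFN (SwiGLU-style) and LayerScale; none of this is in the current code, it is just one common way these ideas are implemented:

```python
import torch
import torch.nn as nn

class GatedFFN(nn.Module):
    """SwiGLU-style gated feed-forward: the value branch is modulated by a gating branch."""
    def __init__(self, dim, hidden_dim, dropout=0.2):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim)
        self.value = nn.Linear(dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, dim)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        x = nn.functional.silu(self.gate(x)) * self.value(x)  # gating
        return self.drop(self.out(x))

class LayerScale(nn.Module):
    """Per-channel learnable scaling of a residual branch, initialized small
    so that deep networks start close to the identity mapping."""
    def __init__(self, dim, init_value=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x):
        return self.gamma * x

# Usage inside a block: x = x + layer_scale(gated_ffn(norm(x)))
```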