GitHub - NovTi/Vision-Transformer-Without-Flattening

ViT Without Flattening

An experiment in adding convolutional layers to replace the flattening operation of ViT.

The inputs to the ViT are 2-D feature maps rather than 1-D vectors. This differs from the paper CvT: Introducing Convolutions to Vision Transformers.

My ViT performed poorly on my small dataset (3k train, 1k test). A CNN preserves the 2-D structure of the image and performs well on this small dataset, so I want to preserve the 2-D structure of the ViT's input as well.
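
To make the shape difference concrete, here is a minimal sketch in plain Python that compares the token layout of a standard flattening patch embedding with the 2-D feature map a convolutional layer would produce. It is not taken from this repository's code. The function names and the example sizes (224x224 RGB input, patch size 16, 768 output channels) are assumptions chosen for illustration, and the convolution output size uses the usual formula floor((size + 2 * padding - kernel) / stride) + 1.

```python
def flattened_patch_tokens(height, width, channels, patch):
    """Standard ViT embedding: cut the image into patch x patch tiles
    and flatten each tile into one 1-D vector (hypothetical helper)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    num_tokens = (height // patch) * (width // patch)
    token_dim = patch * patch * channels
    return (num_tokens, token_dim)


def conv_feature_map(height, width, out_channels, kernel, stride, padding=0):
    """Convolutional embedding: the output keeps a 2-D grid, returned as
    (channels, rows, cols) (hypothetical helper)."""
    rows = (height + 2 * padding - kernel) // stride + 1
    cols = (width + 2 * padding - kernel) // stride + 1
    return (out_channels, rows, cols)


# Assumed example sizes, not values from this repository.
flat = flattened_patch_tokens(224, 224, channels=3, patch=16)
grid = conv_feature_map(224, 224, out_channels=768, kernel=16, stride=16)

print(flat)  # (196, 768): a sequence of 196 flat vectors
print(grid)  # (768, 14, 14): the same 196 positions, kept as a 14 x 14 grid
```

Both layouts cover the same 196 spatial positions. The difference is only whether the row and column arrangement survives into the input of the attention layers.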

Folders and files (history: 3 commits)

__pycache__
dataset
Classify.py
ConvBlock.py
ImageEmbed.py
README.md
SA1.py
SA2.py
SA3.py
SA4.py
SA5.py
SelfAttention.py
ViT.py
ViTEncoder.py
dataloader.py
train.py

Current Progress: Fixing the gradient issue
