Geometric Parametrization for Inf-CLIP #4

Open
zer0int opened this issue Oct 30, 2024 · 0 comments
Dear researchers, thank you very much for your paper & code!

I am keen to hear your thoughts on implementing Geometric Parametrization (GmP) with Inf-CLIP.
I have previously implemented GmP for 'classic' CLIP fine-tuning. In a nutshell:

GmP CLIP MLP:

(mlp): Sequential(
  |-(c_fc): GeometricLinear()
  | (gelu): QuickGELU()
|-}-(c_proj): GeometricLinear()
| | 
| |-- visual.transformer.resblocks.0.mlp.c_fc.r
| |-- visual.transformer.resblocks.0.mlp.c_fc.theta
| |-- visual.transformer.resblocks.0.mlp.c_fc.bias
|
|---- visual.transformer.resblocks.0.mlp.c_proj.r
|---- visual.transformer.resblocks.0.mlp.c_proj.theta
|---- visual.transformer.resblocks.0.mlp.c_proj.bias

(Same for [text] transformer.resblocks)
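To make the parameter names above concrete, here is a minimal plain-Python sketch of one plausible reading of the decomposition: each output row of a linear layer's weight is split into a magnitude `r` and a direction `theta`, and the forward pass rebuilds the weight as `r * theta / ||theta||`. The names `r`, `theta` and `bias` come from the list above; the helper functions, the shapes, and the conversion itself are assumptions for illustration, not the code from the linked repository.

```python
import math

def to_geometric(weight):
    """Split each row of a weight matrix into a magnitude r and a unit direction theta.

    Assumes every row has a non-zero norm.
    """
    r, theta = [], []
    for row in weight:
        norm = math.sqrt(sum(w * w for w in row))
        r.append(norm)
        theta.append([w / norm for w in row])
    return r, theta

def geometric_linear(x, r, theta, bias):
    """Compute y_i = r_i * <theta_i / ||theta_i||, x> + bias_i for each output i."""
    out = []
    for r_i, t_i, b_i in zip(r, theta, bias):
        # Re-normalise theta on every call, so only its direction matters.
        t_norm = math.sqrt(sum(t * t for t in t_i))
        dot = sum(t * v for t, v in zip(t_i, x))
        out.append(r_i * dot / t_norm + b_i)
    return out

def plain_linear(x, weight, bias):
    """Reference: an ordinary linear layer y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

# Hypothetical toy weights, only to check that both forms agree.
weight = [[3.0, 4.0], [1.0, -2.0]]
bias = [0.5, -1.0]
x = [2.0, 1.0]

r, theta = to_geometric(weight)
y_gmp = geometric_linear(x, r, theta, bias)
y_ref = plain_linear(x, weight, bias)
```

At conversion time the two forms give the same output; they only diverge during fine-tuning, because gradients then flow to the magnitude and the direction separately.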

I was able to achieve a marked improvement over pre-trained OpenAI CLIP ViT-L/14 with this technique (dataset: COCO-SPRIGHT-40k). The model was fine-tuned on 1x RTX 4090 with a batch size of 40 (!).

[Image: GmP-results]

Evals:
github.com/LAION-AI/CLIP_benchmark
objectnet.dev/mvt/

Code to reproduce results / fine-tune:
github.com/zer0int/CLIP-fine-tune

Models (dataset is linked) + further results (retrieval, multimodal gap):
huggingface.co/zer0int/CLIP-GmP-ViT-L-14

CLIP GmP was inspired by this paper:
ReLU Characteristic Activation Analysis


I have forked your Inf-CLIP and provided an initial implementation of GmP:
https://github.com/zer0int/Inf-CLIP

I am unable to test it due to being 'GPU-poor' (a single RTX 4090, as above); however, I'd be curious to see whether GmP provides additional benefits for Inf-CLIP or, conversely, whether GmP + Inf-CLIP runs into problems.

Also, I am in the process of further modifying your code to implement a "fake" distributed backend that computes the 'tiles' sequentially on 1 GPU. Any tips (from anybody who happens to read this) with regard to handling the data exchange (which would inevitably involve the CPU) are welcome. Again, thank you for your work!
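For anyone thinking along, here is a rough plain-Python sketch of what I mean by sequential tiles, under my own assumptions and not taken from the Inf-CLIP code: the log-sum-exp over one row of the similarity matrix is accumulated tile by tile, so only a running maximum and a running sum have to survive between tiles. The function names, the tile size, and the logits are made up for illustration.

```python
import math

def split_into_tiles(values, tile_size):
    """Cut a list of logits into consecutive tiles of at most tile_size items."""
    return [values[i:i + tile_size] for i in range(0, len(values), tile_size)]

def tiled_logsumexp(tiles):
    """Accumulate log(sum(exp(v))) over non-empty tiles, one tile at a time."""
    running_max = -math.inf
    running_sum = 0.0
    for tile in tiles:
        new_max = max(running_max, max(tile))
        # Rescale the sum collected so far to the new maximum, then add this tile.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(v - new_max) for v in tile)
        running_max = new_max
    return running_max + math.log(running_sum)

# Hypothetical logits for one image against six texts.
logits = [0.3, -1.2, 2.5, 0.0, 1.1, -0.7]

tiled = tiled_logsumexp(split_into_tiles(logits, 2))
full = math.log(sum(math.exp(v) for v in logits))
```

If this carries over to the real loss, the state to park on the CPU between tiles would be two scalars per row rather than the tile itself, at least for the forward pass.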
