xiaoen0/Awesome-MM-Learning

Awesome-MM-Learning

Image-Language

  • Denoising Diffusion Probabilistic Models
    [paper] [blog post]
    Also known as: DDPMs or simply diffusion models; closely related to score-based generative models (the paper connects its training objective to denoising score matching).

  • 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
    [paper] [code]

  • MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
    [ICLR 2024] [paper] [project page]

  • GILL: Generating Images with Multimodal Language Models
    [NeurIPS 2023] [paper] [code]
    Captioning loss: the standard next-token cross-entropy (negative log-likelihood) of the ground-truth caption tokens, conditioned on the image, over the training data.
    Single-stage training

  • BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
    [paper]

  • Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge
    [paper]

  • Robust Multimodal Learning via Representation Decoupling
    [paper]
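
The captioning loss listed under GILL above can be sketched as follows. This is a minimal, framework-free illustration, not GILL's code: the function names `log_softmax` and `captioning_loss` are hypothetical, and it assumes the model has already produced one vocabulary logit vector per caption position (conditioned on the image and the preceding caption tokens). In practice a framework loss such as PyTorch's `torch.nn.functional.cross_entropy` would be used instead.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax: shift by the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def captioning_loss(step_logits: Sequence[Sequence[float]],
                    target_ids: Sequence[int]) -> float:
    """Mean token-level cross-entropy of a ground-truth caption (hypothetical helper).

    step_logits[t]: vocabulary logits predicted at caption position t,
                    assumed already conditioned on the image and tokens < t.
    target_ids[t]:  ground-truth token id at position t.
    """
    if not target_ids or len(step_logits) != len(target_ids):
        raise ValueError("need one logit vector per target token")
    nll = 0.0
    for logits, target in zip(step_logits, target_ids):
        # Negative log-probability the model assigns to the correct token.
        nll -= log_softmax(logits)[target]
    return nll / len(target_ids)


if __name__ == "__main__":
    # Uniform logits over a 4-token vocabulary give a loss of log(4).
    print(captioning_loss([[0.0] * 4, [0.0] * 4], [1, 3]))
```

Averaging over tokens (rather than summing) keeps the loss scale independent of caption length; either convention is common.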

Image/Video-Text-Audio-Depth-Thermal-IMU

  • PandaGPT: One Model To Instruction-Follow Them All
    [TLLM 2023] [paper] [project page] [code]
    Tasks:
    (1) Image-text: image description generation
    (2) Video-text: writing stories inspired by videos
    (3) Audio-text: answering questions about audio clips
    Data: 160k image-language instruction-following examples released by LLaVA and MiniGPT-4
    Model: ImageBind (multimodal encoder) + Vicuna (LLM)

  • Enhance the Robustness in Text-Centric Multimodal Alignments
    [paper]
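
The PandaGPT entry above pairs ImageBind with Vicuna. One common way to connect such a frozen encoder to an LLM is a learned linear projection that maps the encoder's embedding into the LLM's input-embedding space, where it is consumed like an extra token. The sketch below illustrates only that idea under stated assumptions: the toy dimensions and the helper names `linear_project` and `build_llm_inputs` are hypothetical, and simply prepending the projected vector is a simplification (the actual prompt template and any other trained components, such as LoRA adapters, are not modeled here).

```python
from typing import List, Sequence

Vector = List[float]


def linear_project(x: Sequence[float],
                   weight: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> Vector:
    # y = W x + b, where W has shape (llm_dim, encoder_dim).
    if len(weight) != len(bias) or any(len(row) != len(x) for row in weight):
        raise ValueError("shape mismatch between x, weight and bias")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_llm_inputs(modality_embedding: Sequence[float],
                     weight: Sequence[Sequence[float]],
                     bias: Sequence[float],
                     text_token_embeddings: Sequence[Sequence[float]]) -> List[Vector]:
    # Hypothetical assembly: the projected image/audio/video embedding becomes one
    # "soft token" placed before the instruction's token embeddings.
    soft_token = linear_project(modality_embedding, weight, bias)
    return [soft_token] + [list(t) for t in text_token_embeddings]


if __name__ == "__main__":
    # Toy sizes (assumption): encoder_dim = 2, llm_dim = 3.
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [0.0, 0.0, 0.5]
    seq = build_llm_inputs([2.0, 3.0], W, b, [[0.1, 0.2, 0.3]])
    print(seq)  # first entry is the projected embedding, then the text token
```

Because every modality shares ImageBind's embedding space, a single projection like this can in principle serve all input modalities, which is the design choice that lets one model follow instructions across them.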