This repository provides a collection of examples demonstrating the capabilities of multimodal AI using the Gemini model. We explore various modalities including image, video, audio, and text, showcasing how to effectively combine these inputs for diverse applications.
The repository is organized into the following directories:
- gemini: Examples specifically utilizing the Gemini model for multimodal tasks.
- image_and_video: Demonstrations of multimodal AI with image and video data.
- audio: Examples focused on audio-based multimodal AI.
- embeddings: Code for generating multimodal embeddings using Gemini.
- Image and Video:
  - Image captioning: Generate descriptive captions for images.
  - Video object detection: Identify and locate objects within videos.
  - Video summarization: Create concise summaries of video content.
  - Image-to-text generation: Generate descriptive text for a given image or video.
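As a taste of the image examples, here is a minimal captioning sketch using the google-generativeai SDK. The model name (`gemini-1.5-flash`), the prompt wording, and the helper names are illustrative choices, not fixed by this repository.

```python
# Minimal image-captioning sketch. Model name and prompt are illustrative.

def build_caption_request(image, style: str = "concise") -> list:
    """Assemble the multimodal parts list passed to generate_content:
    a text prompt followed by the image."""
    prompt = f"Write a {style} descriptive caption for this image."
    return [prompt, image]

def caption_image(path: str, api_key: str) -> str:
    """Send an image to Gemini and return the generated caption.
    Requires the google-generativeai and Pillow packages."""
    import google.generativeai as genai
    from PIL import Image

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(build_caption_request(Image.open(path)))
    return response.text
```

Keeping the request-building step separate from the API call makes the prompt easy to test and reuse across the image and video examples.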
- Audio:
  - Audio transcription: Convert spoken language into text.
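Audio files are sent through the Gemini File API before being referenced in a request. The sketch below assumes that flow; the supported-extension set and the transcription prompt are illustrative assumptions, while `genai.upload_file` is the SDK's upload call.

```python
# Audio-transcription sketch. The extension set and prompt are illustrative.
import os

SUPPORTED_AUDIO = {".mp3", ".wav", ".flac", ".aac", ".ogg"}  # assumed subset

def is_supported_audio(path: str) -> bool:
    """Cheap client-side extension check before uploading an audio file."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_AUDIO

def transcribe_audio(path: str, api_key: str) -> str:
    """Upload an audio file via the File API and ask Gemini to transcribe it.
    Requires the google-generativeai package."""
    import google.generativeai as genai

    if not is_supported_audio(path):
        raise ValueError(f"Unsupported audio format: {path}")
    genai.configure(api_key=api_key)
    audio_file = genai.upload_file(path)
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(
        ["Transcribe the spoken language in this audio file.", audio_file]
    )
    return response.text
```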
- Embeddings:
  - Image embeddings: Generate numerical representations of images.
  - Video embeddings: Create numerical representations of videos.
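A common way to obtain image and video embeddings in the Gemini ecosystem is Vertex AI's `multimodalembedding@001` model; the sketch below assumes that model and SDK, which may differ from what this repo's examples use. The cosine-similarity helper shows the typical way the resulting vectors are compared.

```python
# Embedding sketch. The Vertex AI model name is an assumption; the
# similarity helper is plain vector math.
import math

def cosine_similarity(a, b):
    """Compare two embedding vectors; returns a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def image_embedding(path: str):
    """Return the numerical representation of an image (a 1408-dim vector
    for multimodalembedding@001). Requires google-cloud-aiplatform."""
    from vertexai.vision_models import Image, MultiModalEmbeddingModel

    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    result = model.get_embeddings(image=Image.load_from_file(path))
    return result.image_embedding
```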
- PDF:
  - PDF understanding: Attach PDFs to Gemini requests to perform tasks that require understanding their contents.
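Attaching a PDF works much like other media: upload it, then include the returned file handle in the request parts. This sketch assumes that File API flow; the model name, prompt, and helper names are illustrative.

```python
# PDF-understanding sketch. Prompt and model name are illustrative.

def build_pdf_request(pdf_part, question: str) -> list:
    """Pair an uploaded PDF part with a question about its contents."""
    return [pdf_part, question]

def ask_about_pdf(path: str, question: str, api_key: str) -> str:
    """Upload a PDF via the File API and query its contents.
    Requires the google-generativeai package."""
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    pdf_file = genai.upload_file(path)  # PDFs are uploaded like other media
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(build_pdf_request(pdf_file, question)).text
```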
We welcome contributions to this repository! If you have any improvements, new examples, or bug fixes, please feel free to open a pull request.
Made with ❤ by jggomez.
Copyright 2024 Juan Guillermo Gómez
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.