jggomez/multimodal-ai-gemini

Multimodal AI with Gemini

Overview

This repository collects examples of multimodal AI built on the Gemini model. The examples cover image, video, audio, and text inputs and show how to combine them for different applications.

Structure

The repository is organized into the following directories:

  • gemini: Examples specifically utilizing the Gemini model for multimodal tasks.
  • image_and_video: Demonstrations of multimodal AI with image and video data.
  • audio: Examples focused on audio-based multimodal AI.
  • embeddings: Code for generating multimodal embeddings using Gemini.

Examples

  • Image and Video:
    • Image captioning: Generate descriptive captions for images.
    • Video object detection: Identify and locate objects within videos.
    • Video summarization: Create concise summaries of video content.
    • Image-text generation: Generate descriptive text for given images or videos.
  • Audio:
    • Audio transcription: Convert spoken language into text.
  • Embeddings:
    • Image embeddings: Generate numerical representations of images.
    • Video embeddings: Create numerical representations of videos.
  • PDF:
    • PDF understanding: Attach PDFs to Gemini requests for tasks that require understanding their contents.
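
A request that combines a text prompt with an image or PDF, as in the examples listed above, can be sketched roughly as follows. This is a minimal illustration and is not taken from the repository's code: `build_request` is a hypothetical helper, and the JSON layout (`contents`, `parts`, `inline_data`, `mime_type`, `data`) is assumed to follow the Gemini REST API's `generateContent` request body. The sketch only builds the payload; it does not call the API.

```python
import base64
import mimetypes
from pathlib import Path


def build_request(prompt: str, file_path: str) -> dict:
    """Build a request body with one text part and one inline file part.

    Hypothetical helper: the field names are assumed to match the
    Gemini REST API's generateContent request format.
    """
    path = Path(file_path)

    # Infer the MIME type from the file extension (e.g. .pdf, .png).
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        raise ValueError(f"Cannot infer MIME type for {path.name}")

    # Inline file data is sent as a base64-encoded string.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")

    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }
```

The resulting dictionary can be serialized with `json.dumps` and sent with any HTTP client, or the same prompt and file can be passed through an official SDK. Inline data suits small files; larger videos or PDFs generally need a file upload step instead.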

Contributions

We welcome contributions to this repository! If you have any improvements, new examples, or bug fixes, please feel free to open a pull request.

Made with ❤ by jggomez.

License

Copyright 2024 Juan Guillermo Gómez

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
