🚀 Description: 🚀 Google Gemini Vision Pro 📸 is a tool that scans images, generates descriptions of them with the Gemini Pro Vision API, and reads those descriptions aloud 🗣️. It can also capture images from your webcam 🖥️.
Google Gemini Vision Pro is a versatile application that combines image processing 🖼️, speech recognition 🎤, and text-to-speech capabilities 📢. With this application, you can capture images using your webcam 📷, convert spoken words to text 📝, generate image descriptions 📚, and even have the descriptions spoken back to you 📣.
git clone https://github.com/haseeb-heaven/Gemini-Vision-Pro
cd Gemini-Vision-Pro
pip install -r requirements.txt
streamlit run script.py
- Obtain a Google AI Studio API key for the Gemini API.
- Visit Google AI Studio.
- Click the Create API Key button.
- Copy the generated key and paste it into the application settings.
- The application cannot work without this key. Keep it safe and do not share it with anyone.
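Keeping the key safe is easier when it never appears in source code. A minimal sketch, assuming the key is exported as an environment variable (the name `GOOGLE_API_KEY` and the helper names are assumptions, not something this project defines), that loads the key and masks it before it is ever logged:

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it.

    The default variable name is an assumption; use whatever your setup exports.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; create a key in Google AI Studio and export it"
        )
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Return a log-safe version of the key showing only its last few characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


if __name__ == "__main__":
    try:
        api_key = load_api_key()
        print("Using API key", mask_key(api_key))
    except RuntimeError as err:
        print(err)
```

If the app talks to Gemini through the `google-generativeai` package, the loaded key would then be passed to `genai.configure(api_key=...)`.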
The core AI sections of this project include:
- 📷 Webcam detection using WebRTC, OpenCV, and PIL
- 🗣️ Speech-to-text conversion using the Google Cloud Speech-to-Text API
- 🎙️ Text-to-speech conversion using the Google Cloud Text-to-Speech API
- 📸 Image processing using the Gemini Pro Vision API
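These components form a capture, describe, speak flow, with an optional spoken question in the middle. As a rough sketch (not the project's actual code; the class, field names, and signatures are hypothetical), the orchestration can be written with injected callables so each stage can be swapped for the real webcam, Gemini, and audio code or stubbed out:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VisionPipeline:
    """Hypothetical wiring of the stages; each field is a plain callable."""

    capture: Callable[[], bytes]            # e.g. grab a webcam frame as JPEG bytes
    describe: Callable[[bytes, str], str]   # e.g. send image + prompt to the vision model
    speak: Callable[[str], None]            # e.g. play the text with a TTS engine
    listen: Optional[Callable[[], str]] = None  # optional speech-to-text prompt
    default_prompt: str = "Describe this image."

    def run(self) -> str:
        image = self.capture()
        # Use the spoken question if there is one, otherwise the default prompt.
        spoken = self.listen() if self.listen is not None else ""
        prompt = spoken.strip() or self.default_prompt
        description = self.describe(image, prompt)
        self.speak(description)
        return description


if __name__ == "__main__":
    # Stub stages so the flow can be exercised without a webcam or network.
    pipeline = VisionPipeline(
        capture=lambda: b"fake-jpeg-bytes",
        describe=lambda img, prompt: f"{prompt} ({len(img)} bytes received)",
        speak=print,
    )
    pipeline.run()
```

Injecting the stages keeps the Gemini, webcam, and audio dependencies out of the orchestration logic, so the flow itself can be tested with stubs.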
The main features are:
- 📷 Webcam detection with real-time image capture
- 🗣️ Speech-to-text conversion for spoken words
- 🎙️ Text-to-speech for generating spoken descriptions
- 📸 Image processing using AI to provide detailed descriptions
- 📝 Logging using Python's logging module
- ⚙️ Error handling with Python exceptions
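A minimal sketch of how logging and error handling can fit together around the image-description step, assuming a hypothetical `describe` callable that wraps the Gemini request (the logger name and fallback message are also illustrative): failures are logged with a full traceback and the user gets a fallback message instead of a crashed page.

```python
import logging
from typing import Callable

logger = logging.getLogger("gemini_vision_pro")  # hypothetical logger name

FALLBACK_MESSAGE = "Sorry, I could not describe this image."


def describe_safely(describe: Callable[[bytes], str], image: bytes) -> str:
    """Call a describe function, logging failures instead of crashing the UI."""
    if not image:
        logger.warning("No image captured; skipping description")
        return FALLBACK_MESSAGE
    try:
        description = describe(image)
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Image description failed")
        return FALLBACK_MESSAGE
    logger.info("Generated description (%d characters)", len(description))
    return description


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

    def broken(_: bytes) -> str:
        raise TimeoutError("simulated network timeout")

    print(describe_safely(lambda img: "A cat on a sofa.", b"\x89PNG"))
    print(describe_safely(broken, b"\x89PNG"))
```

Catching the broad `Exception` is deliberate only at this UI boundary, where any failure should degrade to a message; lower layers would normally catch narrower exception types.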
This project relies on various Python packages, including:
- Streamlit - A web app framework used to build the application
- Streamlit Webrtc - Used for capturing images from the webcam
- OpenCV - Utilized for webcam image capture
- PIL (Pillow) - Used for image processing and conversion
- gTTS (Google Text-to-Speech) - Converts text to speech
- SpeechRecognition - Converts speech to text
- google.cloud.speech - Google Cloud client library for speech-to-text conversion
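Some of these package names differ from the names you import (OpenCV is imported as `cv2`, Pillow as `PIL`, SpeechRecognition as `speech_recognition`). A small standard-library sketch that reports which modules are missing before you run the app; the mapping below is an assumption based on the list above, not read from the project's requirements.txt:

```python
import importlib.util
from typing import List

# Package label from the list above -> module name used in `import` statements.
REQUIRED_MODULES = {
    "Streamlit": "streamlit",
    "Streamlit Webrtc": "streamlit_webrtc",
    "OpenCV": "cv2",                                # pip: opencv-python
    "PIL (Pillow)": "PIL",                          # pip: Pillow
    "gTTS": "gtts",
    "SpeechRecognition": "speech_recognition",
    "google.cloud.speech": "google.cloud.speech",   # pip: google-cloud-speech
}


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be found on the import path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # For dotted names, find_spec imports the parent packages first
        # (e.g. "google.cloud") and raises if one of them is missing.
        return False


def missing_packages() -> List[str]:
    return [label for label, module in REQUIRED_MODULES.items() if not is_installed(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All dependencies are installed.")
```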
- Version 1.0: Initial release
We welcome contributions! Please follow our Contribution Guidelines to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
- HeavenHM
- Date: 17-12-2023