VoiceProtect allows users to gauge whether audio files or live audio streams have been generated with AI. As deepfake audio scams become more prevalent, reliable deepfake audio detectors will become increasingly valuable. This app leverages the Tortoise-TTS library, specifically its AudioMiniEncoderWithClassifierHead class, along with a classification model publicly available from Tortoise-TTS on HuggingFace (saved as classifier.pth in the root of this repository).
The original intent of this app is a real-time deployment on iOS/Android call data, which is not accessible via public APIs. The pyaudio live-recording input acts as a prototype for the planned feature of live scam-call detection on call data.
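For reference, the core detection step boils down to loading that classifier and taking a softmax over its two output classes. Below is a minimal sketch, assuming the standard tortoise-tts package layout (tortoise.models.classifier, tortoise.utils.audio) and hyperparameters matching the published classifier.pth; the file path and sample rate are illustrative placeholders, not the app's exact code.

```python
# Minimal sketch of the classification step (not the app's exact code).
import torch
import torch.nn.functional as F
from tortoise.models.classifier import AudioMiniEncoderWithClassifierHead
from tortoise.utils.audio import load_audio


def classify_audio_clip(clip: torch.Tensor) -> float:
    """Return the estimated probability that `clip` is AI-generated audio."""
    # Hyperparameters are assumed to match the published classifier.pth
    # checkpoint; load_state_dict will fail if they do not.
    classifier = AudioMiniEncoderWithClassifierHead(
        2, spec_dim=1, embedding_dim=512, depth=5, downsample_factor=4,
        resnet_blocks=2, attn_blocks=4, num_attn_heads=4, base_channels=32,
        dropout=0, kernel_size=5, distribute_zero_label=False)
    state_dict = torch.load("classifier.pth", map_location=torch.device("cpu"))
    classifier.load_state_dict(state_dict)
    clip = clip.cpu().unsqueeze(0)               # add a batch dimension
    probs = F.softmax(classifier(clip), dim=-1)  # probabilities over 2 classes
    return probs[0][0].item()                    # index 0 assumed = AI-generated


if __name__ == "__main__":
    audio = load_audio("sample.mp3", 24000)      # placeholder path; 24 kHz assumed
    print(f"Probability of AI-generated audio: {classify_audio_clip(audio):.2%}")
```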
Below, the set-up process for hosting VoiceProtect on your local machine is listed. Be sure to install both the library requirements and the system requirements.
To run this project, upgrade pip to the latest version and install the system requirements listed below.
- Download ffmpeg: https://ffmpeg.org/download.html (used by pydub)
- portaudio19-dev: macOS, see below; Windows should install it implicitly with pyaudio
- Upgrade pip:
  `pip install --upgrade pip`
- macOS only:
  `brew install portaudio`
- Clone the repo:
  `git clone https://github.com/Shivamkak19/Deepfake-Detector.git`
- Switch to the tortoise_tts folder:
  `cd tortoise_tts`
- Install dependencies:
  `pip install -r requirements.txt`
- Deploy the Streamlit app on a local server:
  `streamlit run voiceProtect_app.py`
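If the audio dependencies are set up correctly, a short check like the following (illustrative, not part of the repo) should run cleanly; it confirms that pydub can find ffmpeg and that pyaudio can see a default input device.

```python
# Illustrative post-install check (not part of the repo).
import pyaudio
from pydub.utils import which

# pydub shells out to ffmpeg for mp3 <-> wav conversion.
assert which("ffmpeg") is not None, "ffmpeg not found on the system PATH"

# pyaudio needs portaudio plus at least one input device for live recording.
pa = pyaudio.PyAudio()
try:
    info = pa.get_default_input_device_info()
    print(f"Default input device: {info['name']}")
except IOError:
    print("No default input device found; live recording will not work.")
finally:
    pa.terminate()
```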
Use the local VoiceProtect deployment to analyze the likelihood that an input audio file or live audio recording contains audio created with generative AI. Results appear once the Streamlit app has finished processing (indicated in the product pictures). The accuracy of this identification system is based on the preset tortoise-tts models and functions described in the main description above.
Make sure to launch the file ./tortoise_tts/voiceProtect_app.py. The app must be launched from within the tortoise_tts folder, because tortoise_tts must run in the main thread to resolve signal issues with the atlastk library (see issues.txt).
Uploaded file audio classification:
Live audio stream classification:
**Additionally, the live Streamlit deployment of VoiceProtect is currently facing issues with detecting an input device for audio recording with pyaudio. Check back here for updates.**
- Clone Tortoise-tts library locally
- Collect .mp3 file path
  - Load file, convert to tensor object with torchaudio (see the file-handling sketch after this list)
- Utilize functions in classifier.py to classify deepfake audio input (sketched in the description above)
- Set up Streamlit GUI
- Utilize pyaudio to accept live audio stream via default device microphone (see the recording sketch after this list)
  - Convert files to/from wav as needed with pydub
- Generate waveform plot from input audio file or recorded audio (see the plotting sketch after this list)
  - Convert files to/from wav as needed with pydub
  - Configure ffmpeg on system, add to system path
- Export requirements.txt
- Troubleshoot various threading issues:
  - Matplotlib backend GUI issues
  - Tortoise-tts must be called from the main thread
- TODO Issues - Streamlit Deploy:
  - pyaudio senses no default mic in hosted Streamlit environment
  - Utilize streamlit-webrtc library (see the sketch after this list)
  - Streamlit deploy does not have access to the system file uploader
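The file-handling steps above amount to roughly the following. This is an illustrative sketch (file names are placeholders): pydub converts uploads to wav where needed, and torchaudio loads the result as a tensor.

```python
# Illustrative sketch of the file-handling steps (file names are placeholders):
# convert an uploaded mp3 to wav with pydub, then load it as a tensor.
import torchaudio
from pydub import AudioSegment

# pydub calls ffmpeg under the hood, so ffmpeg must be on the system PATH.
segment = AudioSegment.from_file("upload.mp3", format="mp3")
segment.export("converted.wav", format="wav")

# torchaudio returns a (waveform tensor [channels, samples], sample rate) pair.
waveform, sample_rate = torchaudio.load("converted.wav")
print(waveform.shape, sample_rate)
```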
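Live recording with pyaudio follows the usual open-read-write pattern; the sketch below is an approximation (the chunk size, rate, and duration are assumptions, not the app's exact settings).

```python
# Approximate sketch of live recording with pyaudio (parameters are assumptions).
import wave
import pyaudio

RATE, CHUNK, SECONDS = 44100, 1024, 5

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()

# Save the captured chunks to a wav file that can be fed to the classifier.
with wave.open("recording.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pa.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))

pa.terminate()
```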
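The waveform plot can be produced with Matplotlib and handed to Streamlit via st.pyplot; forcing the non-interactive "Agg" backend sidesteps the GUI/threading issues noted above. A minimal sketch with a placeholder file path:

```python
# Minimal sketch of the waveform plot step (placeholder file path).
# The non-interactive "Agg" backend avoids Matplotlib GUI/threading issues
# when plotting from inside a Streamlit script.
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import streamlit as st
import torchaudio

waveform, sample_rate = torchaudio.load("converted.wav")

fig, ax = plt.subplots()
ax.plot(waveform[0].numpy())                  # first audio channel
ax.set_xlabel(f"Samples ({sample_rate} Hz)")
ax.set_ylabel("Amplitude")
st.pyplot(fig)
```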
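For the streamlit-webrtc item in the TODO list, one possible direction is to capture audio in the browser instead of relying on a server-side microphone. The sketch below is untested and based on streamlit-webrtc's documented audio-receiving mode; treat the exact calls as assumptions rather than the app's implementation.

```python
# Untested sketch of the streamlit-webrtc direction from the TODO list:
# receive audio frames from the browser rather than a server-side microphone.
import queue

import numpy as np
import streamlit as st
from streamlit_webrtc import WebRtcMode, webrtc_streamer

ctx = webrtc_streamer(
    key="voiceprotect-audio",
    mode=WebRtcMode.SENDONLY,                  # browser sends, server receives
    media_stream_constraints={"audio": True, "video": False},
)

if ctx.audio_receiver:
    try:
        frames = ctx.audio_receiver.get_frames(timeout=1)  # list of av.AudioFrame
    except queue.Empty:
        frames = []
    if frames:
        samples = np.concatenate([f.to_ndarray() for f in frames], axis=1)
        st.write(f"Received {samples.shape[1]} audio samples from the browser")
```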
See the open issues for a full list of proposed features (and known issues).
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/newFeature`)
- Commit your Changes (`git commit -m 'Add some new feature to Deepfake-Detector'`)
- Push to the Branch (`git push origin feature/newFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- Shivam Kak: [email protected]
- Project Link: https://github.com/Shivamkak19/Deepfake-Detector
- AI Anytime, for tutorials on the Tortoise-TTS library, useful function calls, and integration with other relevant libraries (torchaudio, librosa, etc.)
- AI Anytime YouTube Channel