This webui is designed to train Beatrice v2 models, which are compatible with w-okada's real-time voice changer client v2: https://github.com/w-okada/voice-changer
The code to train Beatrice models is adapted from: https://huggingface.co/fierce-cats/beatrice-trainer
At the time of writing this readme, the latest version of w-okada's voice changer is 2.0.61-alpha.
Requirements:
- Nvidia GPU (any 3rd to 4th gen should be sufficient)
- The Ultimate Vocal Remover - https://github.com/Anjok07/ultimatevocalremovergui
As with most of my packages/repos, official support is for Windows only. Linux should work without major issues, though some path changes may be necessary. Pull requests are welcome; however, I won't be able to actively maintain any Linux additions.
The package version will be available to YouTube channel members at the Supporter (Package) level: https://www.youtube.com/channel/UCwNdsF7ZXOlrTKhSoGJPnlQ/join
- After downloading the zip file, unzip it.
- Launch the webui with launch_webui.bat
For a manual install, you will need:
- Python 3.11 - https://www.python.org/downloads/release/python-3119/
- git - https://git-scm.com/downloads
-
Install FFmpeg. It is required by this repo, and it's a good tool to have in general.
-
Clone the repository
git clone https://github.com/JarodMica/beatrice_trainer_webui.git
-
Navigate into the repo
cd .\beatrice_trainer_webui\
-
Set up a virtual environment using Python 3.11
py -3.11 -m venv venv
-
Activate the venv. If you've never activated a venv in Windows PowerShell before, you will need to set the ExecutionPolicy to RemoteSigned: https://learn.microsoft.com/en-us/answers/questions/506985/powershell-execution-setting-is-overridden-by-a-po
.\venv\Scripts\activate
-
Install the packages listed in requirements.txt
pip install -r .\requirements.txt
-
Uninstall and reinstall torch manually. Other packages install torch without CUDA support; to enable CUDA, you need the prebuilt CUDA wheels.
torch 2.4.0 causes issues with CTranslate2 (which in turn breaks whisperx), so make sure you don't skip this step.
pip uninstall torch
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
-
Initialize the submodules
git submodule init
git submodule update --remote
-
Install the submodules into the venv
pip install .\modules\beatrice_trainer\
pip install .\modules\gradio_utils\
-
Grab the assets from the original Beatrice Hugging Face repo at this commit hash: https://huggingface.co/fierce-cats/beatrice-trainer/tree/be628e89d162d0d1aa038f57f19e1f578b7e6328
The easiest way is to clone the repo, check out that specific hash, then copy the assets folder into the root folder of the beatrice trainer webui:
git clone https://huggingface.co/fierce-cats/beatrice-trainer.git
cd beatrice-trainer
git checkout be628e89d162d0d1aa038f57f19e1f578b7e6328
cd ..
The folder structure should look like this:
beatrice_trainer_webui\assets
-
Run the webui
python webui.py
-
(Optional) Create a .bat file that runs webui.py without having to activate the venv manually each time. How to: https://www.windowscentral.com/how-create-and-run-batch-file-windows-10
call venv\Scripts\activate
python webui.py
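Once everything is installed, a quick check can catch the most common setup problems from the steps above: the wrong Python version, a CPU-only torch build, FFmpeg missing from PATH, or no assets folder in the repo root. This is a minimal sketch; the file name `check_setup.py` and all function names are hypothetical and not part of the repo. It assumes torch reports its CUDA build through the local version tag (e.g. `2.3.1+cu121`), which is how the cu121 wheels are labeled.

```python
# check_setup.py -- hypothetical helper, not part of beatrice_trainer_webui.
# Run from the repo root with the venv activated: python check_setup.py
import shutil
import sys
from pathlib import Path


def python_ok(version_info=sys.version_info) -> bool:
    """The manual install targets Python 3.11."""
    return tuple(version_info[:2]) == (3, 11)


def torch_version_ok(version: str) -> bool:
    """Check for torch 2.3.1 with a CUDA local tag, e.g. "2.3.1+cu121".

    On Windows, the default PyPI wheel is typically CPU-only ("2.3.1+cpu"),
    which is what the manual reinstall step replaces. Linux wheels may carry
    no suffix at all, so torch.cuda.is_available() is the more reliable test.
    """
    base, _, local = version.partition("+")
    return base == "2.3.1" and local.startswith("cu")


def missing_items(root: Path) -> list[str]:
    """List setup problems that can be detected without importing torch."""
    problems = []
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not (root / "assets").is_dir():
        problems.append(f"assets folder missing: {root / 'assets'}")
    return problems


if __name__ == "__main__":
    print("Python 3.11:", python_ok())
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch", torch.__version__, "->", torch_version_ok(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())
    for problem in missing_items(Path.cwd()):
        print("Problem:", problem)
```

If the torch line reports `False` or CUDA is unavailable, repeating the uninstall/reinstall torch step is the first thing to try.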
There are 3 tabs: Create Dataset, Train, and Settings.
The Create Dataset tab is where you create your dataset. Follow the steps below to get a feel for the process.
- Obtain audio data to train on the speaker
- This can be a podcast, audiobook, youtube video, etc. Basically, anything that has audio (even songs, but not recommended)
- One large audio file is recommended, but several smaller files can be used too.
- In your file explorer, navigate into the WebUI's datasets folder. Create a new folder in there and name it whatever you want the final model to be called. Open this now-empty folder.
- Decide how many speakers you want in this Beatrice model (Beatrice can be multi-speaker), then create a new folder for each speaker. Place the audio files of each speaker into their respective folders.
- For example, say that in step 2 you want the model to be called elden_ring, and you have audio files for two speakers, melina and ranni.
- The folder structure would look like this:
elden_ring\ranni\<many audio files of ranni>
elden_ring\melina\<many audio files of melina>
- Now launch the training webui. In the Dataset to Process dropdown, select the freshly created dataset from steps 1-3 (if you don't see it, click Refresh Datasets Available).
- If you run into any errors here, you may have set up the folder structure incorrectly.
- Click Begin Process and it will start curating a dataset. The output will be placed in your training folder.
- Manual install users will incur an additional download for the Whisper model that is used to split the dataset.
- After some time, you should see something like Dataset creation completed successfully in the Progress Console window.
- Congrats, your first dataset has been completed!
I haven't run into any issues at this step, so if you do, please open an issue on GitHub.
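Since errors in the Dataset to Process step usually point to an incorrect folder structure, it can help to check the layout before clicking Begin Process. The sketch below assumes the datasets\<model>\<speaker>\<audio files> layout described in the steps above; the script name, the `check_dataset` function, and the set of audio extensions are hypothetical and not taken from the webui, whose own file filter may differ.

```python
# check_dataset.py -- hypothetical helper, not part of beatrice_trainer_webui.
# Usage: python check_dataset.py datasets\elden_ring
import sys
from pathlib import Path

# Assumed set of audio extensions; the webui's own filter may differ.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def is_audio(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS


def check_dataset(model_dir: Path) -> dict[str, int]:
    """Return {speaker: number of audio files}, or raise ValueError.

    Expected layout: <model_dir>/<speaker>/<audio files>
    """
    if not model_dir.is_dir():
        raise ValueError(f"{model_dir} is not a folder")
    loose = [p.name for p in model_dir.iterdir() if is_audio(p)]
    if loose:
        # Audio placed directly in the model folder instead of a speaker folder.
        raise ValueError(f"move these into a speaker folder: {loose}")
    counts = {}
    for speaker in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        n = sum(1 for f in speaker.iterdir() if is_audio(f))
        if n == 0:
            raise ValueError(f"speaker folder '{speaker.name}' has no audio files")
        counts[speaker.name] = n
    if not counts:
        raise ValueError(f"no speaker folders found in {model_dir}")
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for name, n in check_dataset(Path(sys.argv[1])).items():
            print(f"{name}: {n} audio file(s)")
```

For the elden_ring example, this would print one line per speaker folder (melina and ranni) with its audio file count, or stop with a message describing the first layout problem it finds.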
Complete the Create Dataset step before proceeding here. If you don't see anything in the dropdown menu, click Refresh Training Datasets Available and then choose the dataset to train on.
You could just click Start Training and use the defaults, but I would adjust some of the settings based on the guidance shown in the webui.
Dark Mode - Toggles dark mode on or off
Toggle Custom Theme - Toggles the custom theme on or off
This would not be possible without w-okada and his contributors. Huge thanks to them for creating this powerful open-source tool: https://github.com/w-okada/voice-changer
Everything I've coded is licensed under MIT. Check w-okada's repo for any licenses covering his tools (the voice changer client and Beatrice).
The audio files used here come directly from LibriTTS-R: https://www.openslr.org/141/, which is licensed under CC BY 4.0.