docs: update readme

gpustack · Nov 27, 2024 · bccbab1
1 parent 50149fb
commit bccbab1
Showing 1 changed file with 95 additions and 12 deletions.
diff --git a/README.md b/README.md
@@ -26,20 +26,103 @@ vox-box start --model --huggingface-repo-id Systran/faster-whisper-small --data-
 - --model-scope-model-id: Model scope model id for the model.
 - --data-dir: Directory to store downloaded model data. Default is OS specific.
 
-## Supported Backends
+## Supported Models
 
-The project supports the following backends:
+| Model                           | Type           | Link                                                                                                                                                                                        |
+| ------------------------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Faster-whisper-large-v3         | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-large-v3), [ModelScope](https://www.modelscope.cn/models/iic/Whisper-large-v3)                                                 |
+| Faster-whisper-large-v2         | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-large-v2)                                                                                                                      |
+| Faster-whisper-large-v1         | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-large-v1)                                                                                                                      |
+| Whisper-large-v3-turbo          | speech-to-text | [ModelScope](https://www.modelscope.cn/models/iic/Whisper-large-v3-turbo)                                                                                                                   |
+| Faster-whisper-medium           | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-medium)                                                                                                                        |
+| Faster-whisper-medium.en        | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-medium.en)                                                                                                                     |
+| Faster-whisper-small            | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-small)                                                                                                                         |
+| Faster-whisper-small.en         | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-small.en)                                                                                                                      |
+| Faster-distil-whisper-large-v3  | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-distil-whisper-large-v3)                                                                                                               |
+| Faster-distil-whisper-large-v2  | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-distil-whisper-large-v2)                                                                                                               |
+| Faster-distil-whisper-medium.en | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-distil-whisper-medium.en)                                                                                                              |
+| Faster-whisper-tiny             | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-tiny)                                                                                                                          |
+| Faster-whisper-tiny.en          | speech-to-text | [Hugging Face](https://huggingface.co/Systran/faster-whisper-tiny.en)                                                                                                                       |
+| Paraformer-zh                   | speech-to-text | [Hugging Face](https://huggingface.co/funasr/paraformer-zh), [ModelScope](https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch) |
+| Paraformer-zh-streaming         | speech-to-text | [Hugging Face](https://huggingface.co/funasr/paraformer-zh-streaming), [ModelScope](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online)     |
+| Paraformer-en                   | speech-to-text | [Hugging Face](https://huggingface.co/funasr/paraformer-en), [ModelScope](https://www.modelscope.cn/models/iic/speech_paraformer-large-vad-punc_asr_nat-en-16k-common-vocab10020)           |
+| Conformer-en                    | speech-to-text | [Hugging Face](https://huggingface.co/funasr/conformer-en), [ModelScope](https://modelscope.cn/models/iic/speech_conformer_asr-en-16k-vocab4199-pytorch)                                    |
+| Qwen-Audio                      | speech-to-text | [Hugging Face](https://huggingface.co/Qwen/Qwen-Audio)                                                                                                                                      |
+| Qwen-Audio-Chat                 | speech-to-text | [Hugging Face](https://huggingface.co/Qwen/Qwen-Audio-Chat)                                                                                                                                 |
+| SenseVoiceSmall                 | speech-to-text | [Hugging Face](https://huggingface.co/FunAudioLLM/SenseVoiceSmall), [ModelScope](https://www.modelscope.cn/models/iic/SenseVoiceSmall)                                                      |
+| Bark                            | text-to-speech | [Hugging Face](https://huggingface.co/suno/bark)                                                                                                                                            |
+| Bark-small                      | text-to-speech | [Hugging Face](https://huggingface.co/suno/bark-small)                                                                                                                                      |
+| CosyVoice-300M-Instruct         | text-to-speech | [Hugging Face](https://huggingface.co/FunAudioLLM/CosyVoice-300M-Instruct), [ModelScope](https://modelscope.cn/models/iic/CosyVoice-300M-Instruct)                                          |
+| CosyVoice-300M-SFT              | text-to-speech | [Hugging Face](https://huggingface.co/FunAudioLLM/CosyVoice-300M-SFT), [ModelScope](https://modelscope.cn/models/iic/CosyVoice-300M-SFT)                                                    |
+| CosyVoice-300M                  | text-to-speech | [Hugging Face](https://huggingface.co/FunAudioLLM/CosyVoice-300M), [ModelScope](https://modelscope.cn/models/iic/CosyVoice-300M)                                                            |
+| CosyVoice-300M-25Hz             | text-to-speech | [ModelScope](https://modelscope.cn/models/iic/CosyVoice-300M-25Hz)                                                                                                                          |
 
-- FunASR
-- Faster-Whisper
-- Bark
-- CosyVoice
+## Supported APIs
 
-All models supported by these backends can be deployed with this project.
+### Create Speech
 
-### Supported Models
+**Endpoint**: `POST /v1/audio/speech`
 
-- [FunASR](https://github.com/modelscope/FunASR?tab=readme-ov-file#model-zoo)
-- [Faster-Whisper](https://huggingface.co/Systran)
-- [Bark](https://huggingface.co/suno)
-- [CosyVoice](https://modelscope.cn/collections/CosyVoice-1a4baea39a135)
+Generates audio from the input text. Compatible with the [OpenAI audio/speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech).
+
+**Example Request**:
+```bash
+curl http://localhost/v1/audio/speech \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "cosyvoice",
+    "input": "Hello world",
+    "voice": "English Female"
+  }' \
+  --output speech.mp3
+```
+
+**Response**:
+The audio file content.
+
+### Create Transcription
+
+**Endpoint**: `POST /v1/audio/transcriptions`
+
+Transcribes audio into the input language. Compatible with the [OpenAI audio/transcription API](https://platform.openai.com/docs/api-reference/audio/createTranscription).
+
+**Example Request**:
+```bash
+curl http://localhost/v1/audio/transcriptions \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -H "Content-Type: multipart/form-data" \
+  -F file="@/path/to/file/audio.mp3" \
+  -F model="whisper-large-v3"
+```
+
+**Response**:
+```json
+{
+  "text": "Hello world."
+}
+```
+
+### List Models
+
+**Endpoint**: `GET /v1/models`
+
+Returns the currently running models.
+
+### Get Model
+
+**Endpoint**: `GET /v1/models/{model_id}`
+
+Returns the currently running model identified by `model_id`.
+
+### Get Voices
+
+**Endpoint**: `GET /v1/voices`
+
+Returns the voices supported by the currently running model.
+
+### Health Check
+
+**Endpoint**: `GET /health`
+
+Returns the health check result of Vox Box.
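
Reviewer note: the endpoints added in this diff could also be exercised from Python. The block below is a minimal sketch using only the standard library, not part of the README or of Vox Box. The base URL `http://localhost` and the request body are copied from the curl example. The helper names (`build_speech_request`, `build_get_request`, `synthesize`, `fetch_json`, `is_healthy`) are hypothetical. It also assumes that the `GET` endpoints return JSON and that a healthy server answers `/health` with HTTP 200; the diff confirms neither.

```python
import json
import urllib.request

# Assumption: same host as the curl examples in the README diff.
BASE_URL = "http://localhost"


def build_speech_request(text, model="cosyvoice", voice="English Female",
                         base_url=BASE_URL, api_key=None):
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = json.dumps({"model": model, "input": text, "voice": voice})
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mirrors the Authorization header used in the curl example.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=payload.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def build_get_request(path, base_url=BASE_URL):
    """Build a GET request for /v1/models, /v1/voices or /health."""
    return urllib.request.Request(f"{base_url}{path}", method="GET")


def synthesize(text, out_path="speech.mp3", **kwargs):
    """Send the speech request and write the returned audio bytes to disk."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


def fetch_json(path, base_url=BASE_URL):
    """GET an endpoint and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_get_request(path, base_url)) as resp:
        return json.load(resp)


def is_healthy(base_url=BASE_URL, timeout=5):
    """Return True if GET /health answers with HTTP 200 (assumed contract)."""
    try:
        req = build_get_request("/health", base_url)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

With a server running, `synthesize("Hello world")` would write `speech.mp3` as the curl example does, and `fetch_json("/v1/models")` or `fetch_json("/v1/voices")` would return the decoded listings. The transcription upload is left out because `multipart/form-data` encoding needs noticeably more code with the standard library alone.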