Skip to content

Latest commit

 

History

History
167 lines (128 loc) · 4.01 KB

File metadata and controls

167 lines (128 loc) · 4.01 KB

VietGPT-VoiceBot-A-Vietnamese-Speech-Recognition-Chatbot

VietGPT VoiceBot: Chatbot automatically recognizes Vietnamese voice and uses the ChatGPT API for natural language interaction.

Installation

  1. PyAudio:
  • Use pip to install PyAudio:
    pip install pyaudio
  1. Requests:
  • Use pip to install Requests:
    pip install requests
  1. gTTS (Google Text-to-Speech):
  • Use pip to install gTTS:
    pip install gtts
  1. Keyboard:
  • Use pip to install Keyboard:
    pip install keyboard
  1. Pygame:
  • Use pip to install pygame
    pip install pygame
  1. Transformers:
  • Use pip to install Transformers
    pip install transformers

Record audio

I use python's pyaudio library to record voice and write it to an input .wav file

  • record_audio

    def record_audio(duration=5):
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    
    p = pyaudio.PyAudio()
    
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    
    print("Hold Space bar to start recording...\n")
    
    frames = []
    is_recording = False
    while True:
        if keyboard.is_pressed('space'):
            if not is_recording:
                mixer.quit()
                is_recording = True
        else:
            if is_recording:
                time.sleep(2)
                break
    
        if is_recording:
            data = stream.read(CHUNK)
            frames.append(data)
    
    stream.stop_stream()
    stream.close()
    p.terminate()
    
    wf = wave.open(filename, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

Speech auto Recognition

I use Phowhisper's transformers library developed by VinAI Company to listen and understand Vietnamese or English language, from which I can output the words you say.

  • Transcrible language

      transcriber = pipeline("automatic-speech-recognition", model="vinai/PhoWhisper-small")
      output = transcriber(filename)['text']

Connect openAI

I use the API provided by openAI to chat, interact and ask questions with chatGPT.

  • Connect with openAI
      def GetResultFromOpenAI(text):
      openai_api_key = "openai_api_key"
      url = "https://api.openai.com/v1/chat/completions"
          
      headers = {
          "Content-Type": "application/json",
          "Authorization": f"Bearer {openai_api_key}"
      }
    
      data = {
          "model": "gpt-3.5-turbo",
          "messages": json_data_list,
          "temperature": 1.1
      }
      response = requests.post(url, headers=headers, json=data)
    
      if response.status_code == 200:
          result = response.json()['choices'][0]['message']['content']
          return result
      else:
          print("Error:", response.status_code, response.text)

Text to speech with Google

I use the gTTS library made by Google to convert text to audio and play it to listener

  • Text to speech
    def text_to_speech_vietnamese(text):
    mixer.quit()
    if os.path.exists(output_file):
        os.remove(output_file)
        
    tts = gTTS(text=text, lang='vi')
    tts.save(output_file)
    
    mixer.init()
    mixer.music.load(output_file)
    mixer.music.play()
    

Authors

Donations

Support the ongoing development and improvement of SeleniumSupport: