Text to Speech Research Notes

jdxiao edited this page Sep 27, 2023 · 2 revisions

Text to Speech General Overview

  • this page serves as a basic tutorial and overview, intended as a starting point
  • researched and tested the pyttsx3 text-to-speech library
  • pyttsx3 suits our purposes because it is cross-platform (useful in our initial stages of development and testing) and works offline (useful for development and testing as well as for practical use)
  • speaks text aloud directly instead of first saving it as an audio file
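
A minimal sketch of the basic usage described above, assuming pyttsx3 is installed (pip install pyttsx3). The speak helper and the sample sentence are illustrative names, not part of the project; init, say, and runAndWait are the pyttsx3 calls it relies on.

```python
def speak(text, engine=None):
    """Speak `text` aloud and block until playback finishes.

    `engine` is optional so that an existing engine (or a test double)
    can be passed in; otherwise a default pyttsx3 engine is created.
    """
    if engine is None:
        # Imported lazily so this module still loads where pyttsx3 is absent.
        import pyttsx3
        engine = pyttsx3.init()  # selects the platform's default speech driver
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # process the queue; returns once speech is done
    return engine


if __name__ == "__main__":
    # Hypothetical input standing in for the project's generated response.
    speak("This is our generated response.")
```

Returning the engine lets a caller reuse it for later utterances instead of initializing a new one each time.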

Text to Speech Library Testing

  • takes a string as input; in the context of our project, this is our generated response
  • tested with various essays and pieces of literature as input to measure the average speed of speech. The findings are below:
Number of Words    Time Elapsed (seconds)    Words per Second
126                37.49                     3.361
97                 30.96                     3.133
312                95.42                     3.270
Overall Average:                             3.255
  • the average speaking rate is in the range of 150-160 wpm, which is equivalent to 2.5-2.67 wps, so the measured 3.255 wps (about 195 wpm) is noticeably faster. The rate may need to be lowered to aid comprehension; the library supports this through the engine.setProperty('rate', newVoiceRate) function
  • unable to pronounce non-English phonemes
  • does not distinguish the tone indicated by punctuation; punctuation only produces pauses
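
The arithmetic behind these findings, and the rate adjustment mentioned above, can be sketched as follows. The trial values are copied from the table; apply_rate and the target of 155 wpm are illustrative assumptions, chosen only because 155 falls inside the 150-160 wpm range. pyttsx3 expresses its 'rate' property in words per minute, with a documented default of 200.

```python
# (word count, elapsed seconds) pairs copied from the findings table
TRIALS = [(126, 37.49), (97, 30.96), (312, 95.42)]


def words_per_second(words, seconds):
    """Speaking speed for one trial."""
    return words / seconds


def wps_to_wpm(wps):
    """Convert words per second to words per minute."""
    return wps * 60


def apply_rate(engine, wpm):
    """Set the engine's speaking rate; pyttsx3 expects an integer wpm value."""
    rate = int(round(wpm))
    engine.setProperty("rate", rate)
    return rate


# Average of the per-trial speeds, as in the "Overall Average" row
rates = [words_per_second(w, s) for w, s in TRIALS]
average_wps = sum(rates) / len(rates)

if __name__ == "__main__":
    print(f"average: {average_wps:.3f} wps = {wps_to_wpm(average_wps):.0f} wpm")

    import pyttsx3  # assumed to be installed
    engine = pyttsx3.init()
    apply_rate(engine, 155)  # hypothetical target inside the 150-160 wpm range
    engine.say("Testing a slower speaking rate.")
    engine.runAndWait()
```

Whether 155 wpm is actually the most comprehensible setting is untested here; the value is only a starting point for the adjustments noted above.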

Text to Speech Library Considerations

  • this library does not convey emotion (e.g., 'yes.', 'yes!', and 'yes?' are all spoken the same way). This is unlikely to cause an issue in RTA mode, but it may interfere with the way the user interacts with the project in practice mode
  • also unable to pronounce sounds that have no phonetic equivalent in English; however, too few words were tested to draw conclusions about how this affects commonly used English words
  • additional testing/research may be needed on speaking rate comprehension and flow of conversation for RTA mode
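
One possible shape for the additional speaking-rate testing is a small harness that speaks the same passage at several rates and times each run. This is only a sketch: time_utterance, sweep_rates, the sample text, and the candidate rates are all hypothetical, and it is not necessarily how the timings above were collected. It assumes a single pyttsx3 engine can be reused across runs, which may not hold on every platform or driver.

```python
import time


def time_utterance(engine, text):
    """Speak `text` and return (word count, elapsed seconds, words per second)."""
    start = time.perf_counter()
    engine.say(text)
    engine.runAndWait()  # blocks until the utterance has been spoken
    elapsed = time.perf_counter() - start
    words = len(text.split())  # whitespace-separated word count
    wps = words / elapsed if elapsed > 0 else float("inf")
    return words, elapsed, wps


def sweep_rates(engine, text, rates_wpm):
    """Time the same text at each candidate rate (in words per minute)."""
    results = {}
    for wpm in rates_wpm:
        engine.setProperty("rate", wpm)
        results[wpm] = time_utterance(engine, text)
    return results


if __name__ == "__main__":
    import pyttsx3  # assumed to be installed

    sample = "The quick brown fox jumps over the lazy dog."  # placeholder text
    engine = pyttsx3.init()
    for wpm, (words, elapsed, wps) in sweep_rates(engine, sample, [150, 160, 200]).items():
        print(f"{wpm} wpm setting: {words} words in {elapsed:.2f} s ({wps:.3f} wps)")
```

Timing alone does not measure comprehension, so listener feedback at each rate would still be needed alongside these numbers.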