Text to Speech Research Notes
jdxiao edited this page Sep 27, 2023 · 2 revisions
- using this page as a basic tutorial and overview for a starting point
- researched and tested the `pyttsx3` text-to-speech library
- the `pyttsx3` library is ideal for our purposes as it is cross-platform (ideal for our initial stages of development and testing) and functions offline (ideal for development and testing, and also for practical use)
- speaks text aloud directly instead of saving it as an audio file
- takes a string as input; in the context of our project, this is our generated response
- tested by using various essays and pieces of literature as input to assess the average speed of speech. The findings are below:
| Number of Words | Time Elapsed (seconds) | Words per Second |
|---|---|---|
| 126 | 37.49 | 3.361 |
| 97 | 30.96 | 3.133 |
| 312 | 95.42 | 3.270 |
| Overall Average | | 3.255 |
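The arithmetic behind the table can be reproduced with a short sketch. The helper names `words_per_second` and `time_speech` are hypothetical and were not part of the original testing. It is also an assumption that timing was done by wrapping `engine.say` and `engine.runAndWait` in a wall-clock timer; the notes do not say how the elapsed time was measured.

```python
import time
from typing import Tuple


def words_per_second(word_count: int, elapsed_seconds: float) -> float:
    """Speaking rate for one sample, in words per second."""
    return word_count / elapsed_seconds


def time_speech(text: str) -> Tuple[int, float]:
    """Speak `text` aloud and return (word count, elapsed seconds).

    pyttsx3 is imported lazily so the arithmetic above works without it.
    Assumption: elapsed time is wall-clock time around runAndWait().
    """
    import pyttsx3

    engine = pyttsx3.init()
    start = time.perf_counter()
    engine.say(text)
    engine.runAndWait()  # blocks until the queued text has been spoken
    elapsed = time.perf_counter() - start
    return len(text.split()), elapsed


# (word count, elapsed seconds) pairs taken from the table above
samples = [(126, 37.49), (97, 30.96), (312, 95.42)]
rates = [words_per_second(words, seconds) for words, seconds in samples]
average = sum(rates) / len(rates)

for (words, seconds), rate in zip(samples, rates):
    print(f"{words} words in {seconds} s: {rate:.3f} wps")
print(f"overall average: {average:.3f} wps")
```

Note that the overall average here is the mean of the three per-sample rates, which matches the 3.255 figure in the table.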
- the average speaking rate is between 150-160 wpm, which is equivalent to 2.5-2.67 wps, so the measured average of 3.255 wps (about 195 wpm) is noticeably faster. The rate may need to be lowered in order to facilitate comprehension, which is supported by the `engine.setProperty('rate', newVoiceRate)` function
- unable to pronounce non-English phonetics
- unable to distinguish between punctuation indications for tone beyond pausing
- this library does not differentiate emotion (e.g., 'yes.', 'yes!', and 'yes?' are all spoken the same way). While this is unlikely to cause an issue in RTA mode, it may interfere with the way the user interacts with the project in practice mode
- also unable to pronounce sounds that don't have phonetic equivalents in English, but too few words were tested to judge whether this affects commonly used English words
- may have to conduct additional testing/research for speaking rate comprehension and flow of conversation for RTA mode
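As a starting point for that additional testing, the sketch below slows the voice toward the 150-160 wpm range. The `rate` property of `pyttsx3` is expressed in words per minute. The helper names `wps_to_wpm` and `speak_at_rate` and the 155 wpm target (the midpoint of the range) are assumptions for illustration, not settings that were validated in these notes.

```python
def wps_to_wpm(words_per_second: float) -> int:
    """Convert words per second to the nearest whole words per minute."""
    return round(words_per_second * 60)


def speak_at_rate(text: str, words_per_minute: int) -> None:
    """Speak `text` aloud at the requested rate (hypothetical helper).

    pyttsx3 is imported lazily so the conversion helper works without it.
    """
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty('rate', words_per_minute)  # rate is in words per minute
    engine.say(text)
    engine.runAndWait()


measured_wpm = wps_to_wpm(3.255)  # overall average from the findings table
TARGET_WPM = 155                  # assumed midpoint of the 150-160 wpm range

print(f"measured: {measured_wpm} wpm, target: {TARGET_WPM} wpm")

if __name__ == "__main__":
    # Requires pyttsx3 and a working audio output device.
    speak_at_rate("This sentence is spoken at the slower target rate.", TARGET_WPM)
```

The actual spoken pace may not match the requested value exactly, so it is worth re-running the timing test after changing the rate.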