Text to Speech Research Notes

Text to Speech General Overview

using this page as a basic tutorial and overview for a starting point
researched and tested the pyttsx3 text to speech library
the pyttsx3 library suits our purposes: it is cross-platform (useful during our initial stages of development and testing) and works offline (useful for development and testing as well as for practical use)
speaks text instead of saving as audio file
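
A minimal sketch of the basic pyttsx3 usage described above, for reference. The helper name speak and its optional engine parameter are assumptions made for illustration (they are not part of pyttsx3); only pyttsx3.init(), say(), and runAndWait() are library calls.

```python
def speak(text, engine=None):
    """Speak `text` aloud and block until playback has finished.

    `engine` is a hypothetical optional parameter: pass an existing
    pyttsx3 engine to reuse it, or leave it as None to create one.
    """
    if engine is None:
        # Imported lazily so this helper can be loaded on machines
        # without pyttsx3 or an audio backend installed.
        import pyttsx3
        engine = pyttsx3.init()
    engine.say(text)       # queue the string to be spoken
    engine.runAndWait()    # process the queue and wait for speech to end
```

Calling speak("This is our generated response.") should read the string aloud through the default system voice, assuming pyttsx3 and a platform speech driver are available.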

takes a string as input; in the context of our project, this is our generated response
tested by using various essays and pieces of literature as input to assess the average speed of speech. The findings are below:

the average speaking rate is between 150 and 160 wpm, which is equivalent to 2.5-2.67 wps. The rate may need to be adjusted to facilitate comprehension; the library supports this through the engine.setProperty('rate', newVoiceRate) function
unable to pronounce non-English phonetics
unable to distinguish tone indicated by punctuation, beyond pausing
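
The wpm-to-wps conversion and the rate adjustment mentioned in the findings can be sketched as follows. The function names wpm_to_wps and set_speaking_rate are hypothetical helpers for illustration; setProperty('rate', ...) and getProperty('rate') are the pyttsx3 calls, and the rate value is assumed to be in words per minute.

```python
def wpm_to_wps(wpm):
    """Convert words per minute to words per second."""
    return wpm / 60.0


def set_speaking_rate(engine, wpm):
    """Set the engine's speaking rate and return the value it reports.

    `engine` is expected to be a pyttsx3 engine (or any object with the
    same setProperty/getProperty methods).
    """
    engine.setProperty('rate', wpm)
    return engine.getProperty('rate')
```

For the measured range, wpm_to_wps(150) gives 2.5 and wpm_to_wps(160) gives roughly 2.67. With a real engine, usage would look like engine = pyttsx3.init() followed by set_speaking_rate(engine, 140); the value 140 is only an example, not a tested recommendation.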

this library does not differentiate emotion (e.g., 'yes.', 'yes!', and 'yes?' are all spoken the same way). While this is unlikely to cause an issue in RTA mode, it may interfere with the way the user interacts with the project in practice mode
also unable to pronounce sounds that don't have phonetic equivalents in English, but testing was not carried out on enough words to assess any widespread implications for commonly used English words
may have to conduct additional testing/research for speaking rate comprehension and flow of conversation for RTA mode
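
For the additional speaking-rate testing, one possible approach is to time how long the engine takes to speak a passage and derive words per minute from that. This is a rough sketch, not the method used for the findings above: measure_wpm, count_words, and the clock parameter are hypothetical names, words are counted by whitespace splitting, and the timing will include any engine start-up overhead.

```python
import time


def count_words(text):
    """Count whitespace-separated words in `text`."""
    return len(text.split())


def measure_wpm(text, engine, clock=time.perf_counter):
    """Speak `text` and return the observed rate in words per minute.

    `engine` is expected to be a pyttsx3 engine; `clock` is a hypothetical
    hook that allows the timer to be swapped out when testing.
    """
    start = clock()
    engine.say(text)
    engine.runAndWait()  # blocks until the passage has been spoken
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return count_words(text) * 60.0 / elapsed
```

Running this over several passages at different 'rate' settings would give comparable wpm figures for judging comprehension and conversation flow.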

Test Notes

Bi-Weekly Core Meetings

Other Meeting Notes