Mark Niemann-Ross edited this page Jan 26, 2017 · 8 revisions

Architecture of Text-to-speech

Text-to-speech on the fish involves several steps:

  1. Obtain a string to speak
  2. Convert the string to an audio file
  3. Play the audio file

These steps should happen asynchronously. That is:

  1. Any program, at any time, should be able to request speech
  2. Converting text to an audio file shouldn't block anything else
  3. Playing back the sound should be FIFO. The second sound in the queue shouldn't play until the first is finished.
  4. There should be a way to dump the queue or set a priority (to be added if necessary).

To implement this, I'm using SQLite. A record looks like:

  • UID - unique id
  • creation date of record - second sort key
  • priority - first sort key
  • string to be spoken
  • BLOB - wav file to be spoken
  • Result or Error code
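
As a rough illustration of the record above, here is a minimal sketch of a matching table using Python's built-in sqlite3 module. The table name `phrases`, the column names, and the helpers `open_queue` and `playback_order` are all hypothetical, since this page lists the fields but not their names. The sketch also assumes that a lower priority number plays first.

```python
import sqlite3

# Hypothetical table and column names: the page lists the fields, not their names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS phrases (
    uid      INTEGER PRIMARY KEY AUTOINCREMENT,        -- unique id
    created  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- second sort key
    priority INTEGER NOT NULL,                         -- first sort key
    phrase   TEXT NOT NULL,                            -- string to be spoken
    audio    BLOB,                                     -- wav data, NULL until converted
    result   TEXT                                      -- result or error code
)
"""


def open_queue(path=":memory:"):
    """Open the database and create the table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def playback_order(conn):
    """Return (uid, phrase) rows sorted by priority, then creation date.

    Assumption: a lower priority number plays first. uid breaks ties
    between records created within the same second.
    """
    return conn.execute(
        "SELECT uid, phrase FROM phrases ORDER BY priority, created, uid"
    ).fetchall()
```

Sorting on priority first and creation date second gives FIFO playback within each priority level.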

The process is broken into three code chunks:

bmbb_fish.fishSays()

A common function that queues speech requests in SQLite. It takes two arguments: the priority and the string to speak.
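
A queueing function along these lines might look like the sketch below. This is not the actual bmbb_fish.fishSays() code. The name `fish_says`, the extra `conn` argument, and the `phrases` table layout are assumptions made for illustration.

```python
import sqlite3

# Hypothetical minimal table; the real database layout may differ.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS phrases (
    uid      INTEGER PRIMARY KEY AUTOINCREMENT,
    created  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    priority INTEGER NOT NULL,
    phrase   TEXT NOT NULL,
    audio    BLOB,
    result   TEXT
)
"""


def fish_says(conn, priority, phrase):
    """Queue a phrase for speech and return the id of the new record.

    The audio column is left NULL so that the conversion process can
    find the record later.
    """
    conn.execute(CREATE_SQL)
    cur = conn.execute(
        "INSERT INTO phrases (priority, phrase) VALUES (?, ?)",
        (priority, phrase),
    )
    conn.commit()
    return cur.lastrowid
```

Because the function only inserts a row and returns, the caller never waits for conversion or playback.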

do_TTS.py &

  1. Sweeps SQLite for phrases that need to be converted to audio, handles the conversion, then inserts the BLOB into the record.
  2. If there is a problem with the conversion service, the return code is stored in the record.
  3. do_TTS loops constantly through the records, selecting the strings that have no BLOB yet.
  4. Requests are throttled to 20 per minute.
  5. Bing requires an authentication token, which expires every ten minutes. Every six minutes, do_TTS checks that its token is still valid and, if not, gets a new one.
  6. Voice gender is currently chosen by one of the front panel switches.
  7. Phrases are broken into multiple sentences.
  8. Questions are wrapped in <prosody pitch="high">.
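
The text-handling and pacing steps above can be sketched without touching the Bing service. The helper names below are hypothetical and the sentence splitter is deliberately naive. This is an illustration of the idea, not the code in do_TTS.py.

```python
import re
from xml.sax.saxutils import escape

MIN_INTERVAL = 60.0 / 20       # 20 requests per minute: one request every 3 seconds
TOKEN_CHECK_INTERVAL = 6 * 60  # check the token every six minutes (it expires after ten)


def split_sentences(phrase):
    """Naively split a phrase after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", phrase.strip())
    return [p for p in parts if p]


def to_ssml_fragment(sentence):
    """Escape a sentence for XML and raise the pitch if it is a question."""
    text = escape(sentence)
    if sentence.endswith("?"):
        return '<prosody pitch="high">{}</prosody>'.format(text)
    return text


def seconds_to_wait(last_request, now, min_interval=MIN_INTERVAL):
    """How long to sleep before the next request to stay under the throttle."""
    return max(0.0, min_interval - (now - last_request))


def token_check_due(last_check, now):
    """True when at least six minutes have passed since the last token check."""
    return now - last_check >= TOKEN_CHECK_INTERVAL
```

For example, `[to_ssml_fragment(s) for s in split_sentences("Hello there. Are you hungry?")]` leaves the first sentence unchanged and wraps the second in the prosody element. The timing helpers take timestamps as arguments, so a loop can pass in `time.time()` and sleep for whatever `seconds_to_wait` returns.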

speakNextPhrase.py &

  1. Pulls the next phrase from SQLite and plays it.
  2. Triggers movement of the head and tail.
  3. When the phrase has finished, deletes it from SQLite and gets the next phrase.
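
One pass of that loop might look like the sketch below. The function name, the `phrases` table with its columns, and the `play` and `move` callbacks are assumptions. The sketch also assumes that records without audio are skipped rather than waited for, and that `play` blocks until the sound has finished.

```python
import sqlite3


def speak_next_phrase(conn, play, move=lambda: None):
    """Play the next converted phrase, then delete it from the queue.

    Assumes a hypothetical `phrases` table with uid, created, priority,
    phrase and audio columns. Returns the spoken text, or None when no
    converted phrase is waiting.
    """
    row = conn.execute(
        "SELECT uid, phrase, audio FROM phrases "
        "WHERE audio IS NOT NULL "
        "ORDER BY priority, created, uid LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    uid, phrase, audio = row
    move()       # trigger head and tail movement (hardware-specific, stubbed here)
    play(audio)  # assumed to block until playback ends, which keeps the queue FIFO
    conn.execute("DELETE FROM phrases WHERE uid = ?", (uid,))
    conn.commit()
    return phrase
```

A background script could call this in a loop and sleep briefly whenever it returns None.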

Notes on Microsoft Bing Text-to-speech

Microsoft Bing Text-to-speech

subscription keys here

github sample code

Attempts with espeak

sudo apt-get install espeak

If you are using Python 2:

wget https://pypi.python.org/packages/source/p/pyttsx/pyttsx-1.1.tar.gz
gunzip pyttsx-1.1.tar.gz
tar -xf pyttsx-1.1.tar
cd pyttsx-1.1/
sudo python setup.py install

If you are using Python 3:

sudo apt-get install python3-pip   # this may not be necessary
sudo pip3 install pyttsx

or

wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python3
wget https://pypi.python.org/packages/source/p/pyttsx/pyttsx-1.1.tar.gz
gunzip pyttsx-1.1.tar.gz
tar -xf pyttsx-1.1.tar
cd pyttsx-1.1/
sudo python3 setup.py install