
Some investigations in STT Services #11

Open
hailiang-wang opened this issue Jan 17, 2017 · 1 comment


Description

In order to find a stable and fast STT vendor, I tried IBM's and Google's solutions.
Here are some outputs.

IBM

Service Portal - https://console.ng.bluemix.net
After logging in, navigate to the Watson cloud, provision a Speech to Text service, then get the credentials.

#! /bin/bash 
###########################################
# recognize text with voice file
# http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/tutorial.shtml
###########################################

# constants
baseDir=$(cd `dirname "$0"`;pwd)
. $baseDir/../watson_rc
# testFile=$baseDir/0001.flac
testFile=$baseDir/icomefromchina.wav
# testFileType=flac
testFileType=wav

# functions

# main 
[ -z "${BASH_SOURCE[0]}" -o "${BASH_SOURCE[0]}" = "$0" ] || return

cd $baseDir

curl -u $sttUserName:$sttPassword -X POST \
--header "Content-Type: audio/$testFileType" \
--header "Transfer-Encoding: chunked" \
--data-binary @$testFile \
"https://stream.watsonplatform.net/speech-to-text/api/v1/recognize?continuous=true"

watson_rc
export sttUserName=YOUR_USERNAME
export sttPassword=YOUR_PASSWD
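
The recognize call above returns JSON with a `results`/`alternatives` structure. A minimal sketch (in Node.js, assuming that documented response shape) for pulling out the top transcript; `extractTranscript` is an illustrative helper name:

```javascript
// Extract the top transcript from a Watson Speech to Text
// /v1/recognize response object. Returns null when nothing
// was recognized.
function extractTranscript (response) {
  if (response && response.results && response.results.length > 0) {
    return response.results[0].alternatives[0].transcript
  }
  return null
}
```

The same `results[0].alternatives[0].transcript` path shows up in the Google snippet further down, so the helper applies to both responses.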

TTS

IBM also provides a TTS API; in my personal opinion, it is state of the art.
http://www.ibm.com/watson/developercloud/doc/text-to-speech/index.shtml

<voice-transformation type="Custom" glottal_tension="-80%" rate="x-slow">body </voice-transformation>
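
As a sketch, the transformation can be applied by wrapping the plain text before handing it to the synthesizer. `withVoiceTransformation` below is a hypothetical helper, not part of the Watson SDK; the default attribute values mirror the example above:

```javascript
// Hypothetical helper: wrap plain text in a Watson
// voice-transformation SSML element.
function withVoiceTransformation (text, opts) {
  opts = opts || { glottal_tension: '-80%', rate: 'x-slow' }
  var attrs = Object.keys(opts)
    .map(function (k) { return k + '="' + opts[k] + '"' })
    .join(' ')
  return '<voice-transformation type="Custom" ' + attrs + '>' +
    text + '</voice-transformation>'
}
```

The wrapped string would then be passed as the `text` parameter of the synthesize call below.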


var watson = require('watson-developer-cloud');
var textToSpeech = watson.text_to_speech({
    version: 'v1',
    username: config.watson_tts.username,
    password: config.watson_tts.password,
    url: config.watson_tts.url,
});

function transcript(txt, output) {
    let deferred = Q.defer();
    let params = {
        text: txt,
        voice: 'en-US_AllisonVoice', // Optional voice 
        accept: 'audio/wav'
    };
    // Pipe the synthesized text to a file 
    let tran = textToSpeech.synthesize(params);
    tran.on('error', function (error) {
        logger.error(error);
        deferred.reject();
    });

    tran.on('end', function () {
        logger.debug('done.');
        deferred.resolve();
    });

    tran.pipe(fs.createWriteStream(output));

    return deferred.promise;
}

Google

let googleSpeech = (file, text = null) => {
  let api = `https://speech.googleapis.com/v1beta1/speech:syncrecognize`
  // fs.readFileSync already returns a Buffer, so encode it directly
  let speech = fs.readFileSync(file).toString('base64')
  let postData = {
    'config': {
      'encoding': 'FLAC',
      'sampleRate': 24000,
      'languageCode': 'en-US'
    },
    'audio': {
      'content': speech
    }
  }

  if (text) {
    postData.config.speechContext = {
      "phrases": [text]
    }
  }
  return new Promise((resolve, reject) => {
    // Get the Google access token first.
    get_google_access_token()
      .then((access_token) => {

        superagent
          .post(api)
          .proxy(https_proxy)
          .set('Content-Type', 'application/json')
          .set('Authorization', `Bearer ${access_token}`)
          .send(JSON.stringify(postData))
          .end((err, res) => {
            logger.debug('[speech google]', 'end')
            fs.unlink(file)
            if (err) {
              reject(err)
            }
            else {
              let resObj = JSON.parse(res.text)
              if (resObj.results) {
                resolve(resObj.results[0].alternatives[0].transcript)
              }
              else {
                resolve()
              }
            }
          })
      })
  })
}
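
The request body built inline above can be factored into a small helper. `buildSpeechRequest` is an illustrative name; the config fields mirror the v1beta1 `speech:syncrecognize` payload used in the snippet:

```javascript
// Build the JSON body for the v1beta1 speech:syncrecognize call.
// `audioBase64` is the base64-encoded FLAC content; `phrases` is an
// optional array of speech-context hints.
let buildSpeechRequest = (audioBase64, phrases) => {
  let body = {
    config: {
      encoding: 'FLAC',
      sampleRate: 24000,
      languageCode: 'en-US'
    },
    audio: { content: audioBase64 }
  }
  if (phrases && phrases.length) {
    body.config.speechContext = { phrases: phrases }
  }
  return body
}
```

Keeping the body construction separate makes it easy to unit-test the speech-context branch without touching the network.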

云知声 (Unisound)

Docs: https://github.com/oraleval/http_api_doc/blob/master/eval.md

let unisoundASR = (file) => {
  return new Promise((resolve, reject) => {
    superagent
      .post('http://enasr.edu.hivoice.cn:5858/eval/pcm')
      .set('X-EngineType', 'asr.en_US')
      .set('appkey', 'xxx')
      .attach('voice', file)
      .end((err, res) => {
        if (err) {
          logger.debug('[error]', err)
          reject(err)
        }
        resolve(res)
      })
  })
}

@huan
Member

huan commented Jan 17, 2017

Thanks buddy!
