This repository has been archived by the owner on Apr 13, 2022. It is now read-only.

Docker image awsmediatools/livetranscribe - Update for VocabularyFilterName and VocabularyFilterMethod #46

Open
AustinSnow opened this issue Jun 14, 2021 · 2 comments


@AustinSnow

Hello AWS Labs,
Transcribe word (vocabulary) filtering isn't working with this repository. It seems the awsmediatools/livetranscribe Docker image doesn't include the latest version of the transcribe-to-dynamo-withSDK.js script. The ECS log output from the script shows the following:

...
LanguageCode: 'en-US',
MediaEncoding: 'pcm',
MediaSampleRateHertz: 16000,
RequestId: '307db0c6-c708-4988-8202-fbf1239f4ba3',
SessionId: 'da63e9ee-d547-49bd-b0cf-838fad481faa',
TranscriptResultStream: { [Symbol(Symbol.asyncIterator)]: [Function] },
VocabularyName: undefined
...

It should be like the following:
...
LanguageCode: 'en-US',
MediaEncoding: 'pcm',
MediaSampleRateHertz: 16000,
NumberOfChannels: undefined,
RequestId: 'c31323a8-c00a-455d-a217-0ae202bde502',
SessionId: 'b71d30db-d8d9-481d-a244-19eb2c6bd11b',
ShowSpeakerLabel: false,
TranscriptResultStream:
{ [Symbol(Symbol.asyncIterator)]: [AsyncGeneratorFunction: [Symbol.asyncIterator]] },
VocabularyFilterMethod: 'mask',
VocabularyFilterName: 'filter-words-en-US',
VocabularyName: undefined }
...
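For reference, the two fields missing from the first log are request parameters of Amazon Transcribe streaming (`VocabularyFilterName` and `VocabularyFilterMethod`, where the method is one of `remove`, `mask`, or `tag`). Below is a minimal sketch of how a script could build those parameters. This is not the code from transcribe-to-dynamo-withSDK.js: the function name and the environment variable names (`LANGUAGE_CODE`, `VOCABULARY_FILTER_NAME`, `VOCABULARY_FILTER_METHOD`) are assumptions made up for illustration.

```javascript
// Sketch only. The env var names below are hypothetical, not taken from the repo.
const VALID_FILTER_METHODS = ["remove", "mask", "tag"];

// Build the non-audio request parameters for a streaming transcription call.
function buildTranscribeParams(env) {
  const params = {
    LanguageCode: env.LANGUAGE_CODE || "en-US",
    MediaEncoding: "pcm",
    MediaSampleRateHertz: 16000,
  };

  // Only add the filter fields when a filter name is configured,
  // so an unset filter leaves the request unchanged.
  const filterName = env.VOCABULARY_FILTER_NAME;
  if (filterName) {
    const method = env.VOCABULARY_FILTER_METHOD || "mask";
    if (!VALID_FILTER_METHODS.includes(method)) {
      throw new Error(`Unsupported VocabularyFilterMethod: ${method}`);
    }
    params.VocabularyFilterName = filterName;
    params.VocabularyFilterMethod = method;
  }
  return params;
}
```

With the AWS SDK for JavaScript v3, the returned object would typically be merged with the audio stream, for example `new StartStreamTranscriptionCommand({ ...params, AudioStream })`. If the image ships an older script that never sets these two fields, the filter is not applied, which would match the first log above.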

Can the image be updated?

Thank You
Austin Snow

@AustinSnow
Author

Hello AWS Labs,
I was finally able to get the word filtering working by building an updated Docker image to replace yours (awsmediatools/livetranscribe:v1.1). You may "just" need to update your Docker image. Mine is on Docker Hub as austinsnow/svuedecstranscribe:latest.

There is also an issue with transcribe-to-dynamo-withSDK.js: it doesn't run on the latest Node releases. The following is the error:

/transcriber/node_modules/@aws-sdk/eventstream-handler-node/dist/cjs/EventStreamPayloadHandler.js:66
throw err;
^
[Error: EAGAIN: resource temporarily unavailable, read] {
errno: -11,
code: 'EAGAIN',
syscall: 'read'
}
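Until the root cause of the EAGAIN error is known, one option is to warn at startup when the script runs on a Node version reported as failing here. A minimal sketch follows, assuming only the two versions named in the Dockerfile comment in this post (v12.22.1 and v14.17.1). The list is not exhaustive, and the function name is hypothetical.

```javascript
// Versions reported as failing in this issue; not an exhaustive list.
const KNOWN_BAD_NODE_VERSIONS = ["12.22.1", "14.17.1"];

// Accepts either "v12.22.1" (the process.version format) or "12.22.1".
function isKnownBadNodeVersion(version) {
  const normalized = version.replace(/^v/, "");
  return KNOWN_BAD_NODE_VERSIONS.includes(normalized);
}

// Warn early instead of failing later inside the event stream handler.
if (isKnownBadNodeVersion(process.version)) {
  console.warn(
    `Node ${process.version} was reported to fail with EAGAIN; ` +
      "node:12.14.1-alpine was reported to work."
  );
}
```

Pinning the base image, as the Dockerfile in this post does, avoids the problem without any runtime check. The sketch is only useful if the script is also run outside the container.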

Also, the Load Balancer health check port is missing from your Dockerfile. Below is my Dockerfile.

# Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

# start from a Node.js Docker image based on Alpine Linux
# The transcribe-to-dynamo-withSDK.js script doesn't work with Node v12.22.1 or v14.17.1
FROM node:12.14.1-alpine

# create the application directory
RUN mkdir /transcriber
WORKDIR /transcriber

# Install Build Dependencies for the docker image. 
RUN apk add --no-cache --virtual .gyp \
        python3 \
        make \
        g++ \
        ffmpeg

# install application dependencies
RUN npm install aws-sdk aws-signature-v4 query-string sleep websocket bcrypt @aws-sdk/client-transcribe-streaming@gamma @aws-sdk/eventstream-marshaller @aws-sdk/util-utf8-node 

# copy the application files
COPY transcribe-to-dynamo-withSDK.js healthcheck.py run.sh ./

RUN ["chmod", "+x", "run.sh"]

# Expose the port for UDP
EXPOSE 7950/udp

# Expose the health check
EXPOSE 8080/tcp

# Run this inside the docker container
# CMD ./ffmpeg -re -i video.mp4 -f mpegts udp://localhost:7950

# run it when the container starts -- requires environment vars
CMD sh run.sh

Thank You
Austin Snow

@eggoynes
Contributor

Hello Austin,
Thank you for finding this. We have this in our backlog.
