diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..0a11cce --- /dev/null +++ b/.gitattributes @@ -0,0 +1,3 @@ +*.html linguist-detectable=false +*.css linguist-detectable=false +*.js linguist-detectable=false diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..87b3921 --- /dev/null +++ b/.gitignore @@ -0,0 +1,30 @@ +# Binaries for programs and plugins +.DS_Store +*.exe +*.exe~ +*.dll +*.o +*.a +*.so +*.dylib + +# Folders +_obj +_test +*.test + +# Output of the go coverage tool, specifically when used with LiteIDE +*.out + +# Dependency directories (remove the comment below to include it) +# vendor/ + +# cgo stuff +*.cgo1.go +*.cgo2.c +_cgo_defun.c +_cgo_gotypes.go +_cgo_export.* +_testmain.go +.idea/ +*.iml diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 1dc8f5e..303ec62 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -17,23 +17,23 @@ diverse, inclusive, and healthy community. Examples of behavior that contributes to a positive environment for our community include: -* Demonstrating empathy and kindness toward other people -* Being respectful of differing opinions, viewpoints, and experiences -* Giving and gracefully accepting constructive feedback -* Accepting responsibility and apologizing to those affected by our mistakes, +- Demonstrating empathy and kindness toward other people +- Being respectful of differing opinions, viewpoints, and experiences +- Giving and gracefully accepting constructive feedback +- Accepting responsibility and apologizing to those affected by our mistakes, and learning from the experience -* Focusing on what is best not just for us as individuals, but for the +- Focusing on what is best not just for us as individuals, but for the overall community Examples of unacceptable behavior include: -* The use of sexualized language or imagery, and sexual attention or +- The use of sexualized language or imagery, and sexual attention or advances of any kind -* Trolling, 
insulting or derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or email +- Trolling, insulting or derogatory comments, and personal or political attacks +- Public or private harassment +- Publishing others' private information, such as a physical or email address, without their explicit permission -* Other conduct which could reasonably be considered inappropriate in a +- Other conduct which could reasonably be considered inappropriate in a professional setting ## Enforcement Responsibilities @@ -60,7 +60,7 @@ representative at an online or offline event. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported to the community leaders responsible for enforcement at -Deepgram Developer Relations . +devrel@deepgram.com. All complaints will be reviewed and investigated promptly and fairly. All community leaders are obligated to respect the privacy and security of the @@ -106,7 +106,7 @@ Violating these terms may lead to a permanent ban. ### 4. Permanent Ban **Community Impact**: Demonstrating a pattern of violation of community -standards, including sustained inappropriate behavior, harassment of an +standards, including sustained inappropriate behavior, harassment of an individual, or aggression toward or disparagement of classes of individuals. **Consequence**: A permanent ban from any sort of public interaction within @@ -116,7 +116,7 @@ the community. This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 2.0, available at -https://www.contributor-covenant.org/version/2/0/code_of_conduct.html. +<https://www.contributor-covenant.org/version/2/0/code_of_conduct.html>. Community Impact Guidelines were inspired by [Mozilla's code of conduct enforcement ladder](https://github.com/mozilla/diversity). @@ -124,5 +124,5 @@ enforcement ladder](https://github.com/mozilla/diversity). 
[homepage]: https://www.contributor-covenant.org For answers to common questions about this code of conduct, see the FAQ at -https://www.contributor-covenant.org/faq. Translations are available at -https://www.contributor-covenant.org/translations. +<https://www.contributor-covenant.org/faq>. Translations are available at +<https://www.contributor-covenant.org/translations>. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 4230ea3..f46ceeb 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,105 +1,50 @@ - # Contributing Guidelines -:+1::tada: We :heart: contributions from everyone! :tada::+1: - -It is a good idea to reach out with an issue first if you plan to add any new functionality. Otherwise, bug reports, bug fixes and feedback is always appreciated. Check out the [Contributing Guidelines][contributing] for more information and please follow the [GitHub Flow][githubflow]. - -![contributions welcome][contribadge] - -The following is a set of guidelines for contributing to this project, which are hosted on GitHub. These are mostly guidelines, not rules. Use your best judgment, and feel free to propose changes to this document in a pull request. - -Please take the time to review the [Code of Conduct][coc], which all contributors are subject to on this project. - -[I don't want to read this whole thing, I just have a question!!!](#i-dont-want-to-read-this-whole-thing-i-just-have-a-question) - -[TOC] - -## Reporting Bugs - -This section guides you through submitting a bug report. Following these guidelines helps maintainers and the community understand your report :pencil:, reproduce the behavior :computer: :computer:, and find related reports :mag_right:. - -Before creating bug reports [please check for other issues,](#before-submitting-a-bug-report) as you might find out that you don't need to create one. When you are creating a bug report, please [include as many details as possible](#how-do-i-submit-a-good-bug-report). Fill out the issue, and include any information it asks for to help us resolve issues faster. 
- -> **Note:** If you find a **Closed** issue that seems like it is the same thing that you're experiencing, open a new issue and include a link to the original issue in the body of your new one. - -### Before Submitting A Bug Report - -* **Perform a cursory search** to see if the problem has already been reported. If it has **and the issue is still open**, add a comment to the existing issue instead of opening a new one. - -### How Do I Submit A (Good) Bug Report? - -Bugs are tracked as GitHub issues. Create an issue and provide the following information as required or necessary. - -Explain the problem and include additional details to help maintainers reproduce the problem: - -* **Use a clear and descriptive title** for the issue to identify the problem. -* **Describe the exact steps which reproduce the problem** in as many details as possible. For example, start by explaining how you started. When listing steps, **don't just say what you did, but explain how you did it**. -* **Provide specific examples to demonstrate the steps**. Include links to files or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use [Markdown code blocks][githubcodeblocks]. -* **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior. -* **Explain which behavior you expected to see instead and why.** -* **Include screenshots and animated GIFs** where possible. Show how you follow the described steps and clearly demonstrate the problem. You can use [this tool][licecap] to record GIFs on macOS and Windows, and [this tool][silentcast] on Linux. -* **If the problem wasn't triggered by a specific action**, describe what you were doing before the problem happened and share more information using the guidelines below. -* **Can you reliably reproduce the issue?** If not, provide details about how often the problem happens and under which conditions it normally happens. 
-Include details about your configuration and environment: - -## Suggesting Enhancements - -This section guides you through submitting a suggestion, including completely new features and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion :pencil: and find related suggestions :mag_right:. - -Before creating enhancement suggestions, please check [this list](#before-submitting-an-enhancement-suggestion) as you might find out that you don't need to create one. When you are creating an enhancement suggestion, please [include as many details as possible](#how-do-i-submit-a-good-enhancement-suggestion). Fill out [the required template][featurerequest], the information it asks for helps us resolve issues faster. - -### Before Submitting An Enhancement Suggestion - -* **Perform a cursory search** to see if the enhancement has already been suggested. If it has, add a comment to the existing issue instead of opening a new one. - -### How Do I Submit A (Good) Enhancement Suggestion? - -Enhancement suggestions are tracked as GitHub issues. Create an issue and provide the following information. - -* **Use a clear and descriptive title** for the issue to identify the suggestion. -* **Provide a step-by-step description of the suggested enhancement** in as many details as possible. -* **Provide specific examples to demonstrate the steps**. Include copy/pasteable snippets which you use in those examples, as [Markdown code blocks][githubcodeblocks]. -* **Describe the current behavior** and **explain which behavior you expected to see instead** and why. -* **Explain why this enhancement would be useful** to most users. - -## Your First Code Contribution - -Unsure where to begin contributing? You can start by looking through these `beginner` and `help-wanted` issues on any of our projects. While not perfect, number of comments is a reasonable proxy for impact a given change will have. 
- -## Pull Requests - -Please follow these steps to have your contribution considered by the maintainers: - -1. Follow all instructions in [the template][pullrequest] -2. Adhear the [Code of Conduct][coc] -3. After you submit your pull request, verify that all [status checks][githubstatuschecks] are passing. - -While the prerequisites above must be satisfied prior to having your pull request reviewed, the reviewer(s) may ask you to complete additional design work, tests, or other changes before your pull request can be ultimately accepted. - -# I don't want to read this whole thing I just have a question!!! - -You can join our community for any questions you might have: - -* [Contact our Developer Relations Team][community] -* [Reach out on Twitter][twitter] - * This Twitter is monitored by our Marketing and Developer Relations team, but not 24/7—please be patient! - -Alternatively, you can raise an issue on the project. - -[contribadge]: https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat "Contributions Welcome" - -[coc]: CODE_OF_CONDUCT.md "Code of Conduct" -[contributing]: CONTRIBUTING.md "Contributing" -[license]: LICENSE "MIT License" -[pullrequest]: PULL_REQUEST_TEMPLATE/PULL_REQUEST_TEMPLATE.md "Pull Request template" - -[community]: https://github.com/orgs/deepgram/discussions "Deepgram Community" -[signup]: https://console.deepgram.com/signup "Deepgram Console" -[twitter]: https://twitter.com/DeepgramAI "Deepgram on Twitter" - -[githubcodeblocks]: https://help.github.com/articles/markdown-basics/#multiple-lines "GitHub Markdown Code Blocks" -[githubflow]: https://guides.github.com/introduction/flow/index.html "GitHub Flow" -[githubstatuschecks]: https://help.github.com/articles/about-status-checks/ "GitHub Status Checks" -[licecap]: https://www.cockos.com/licecap/ "LICEcap: animated screen captures" -[silentcast]: https://github.com/colinkeenan/silentcast "Silentcast: silent mkv screencasts and animated gifs" +Want to contribute 
to this project? We ❤️ it! + +Here are a few types of contributions that we would be interested in hearing about. + +* Bug fixes + * If you find a bug, please first report it using GitHub Issues. + * Issues that have already been identified as a bug will be labeled `bug`. + * If you'd like to submit a fix for a bug, send a Pull Request from your own fork and mention the Issue number. + * Include a test that isolates the bug and verifies that it was fixed. +* New Features + * If you'd like to accomplish something in the extension that it doesn't already do, describe the problem in a new GitHub Issue. + * Issues that have been identified as a feature request will be labeled `enhancement`. + * If you'd like to implement the new feature, please wait for feedback from the project maintainers before spending + too much time writing the code. In some cases, `enhancement`s may not align well with the project objectives at + the time. +* Tests, Documentation, Miscellaneous + * If you think the test coverage could be improved, the documentation could be clearer, you've got an alternative + implementation of something that may have more advantages, or any other change we would still be glad to hear about + it. + * If it's a trivial change, go ahead and send a Pull Request with the changes you have in mind. + * If not, open a GitHub Issue to discuss the idea first. +* Snippets + * To add snippets: + * Add a directory in the `snippets` folder with the name of the language. + * Add one or more files in the language directory with snippets. + * Update the `package.json` to include the snippets you added. + +We also welcome anyone to work on any existing issues with the `good first issue` tag. + +## Requirements + +For a contribution to be accepted: + +* The test suite must be complete and pass +* Code must follow existing styling conventions +* Commit messages must be descriptive. Related issues should be mentioned by number. 
+ +If the contribution doesn't meet these criteria, a maintainer will discuss it with you on the Issue. You can still +continue to add more commits to the branch you have sent the Pull Request from. + +## How To + +1. Fork this repository on GitHub. +1. Clone/fetch your fork to your local development machine. +1. Create a new branch (e.g. `issue-12`, `feat.add_foo`, etc) and check it out. +1. Make your changes and commit them. (Did the tests pass? No linting errors?) +1. Push your new branch to your fork. (e.g. `git push myname issue-12`) +1. Open a Pull Request from your new branch to the original repository's `main` branch. diff --git a/LICENSE b/LICENSE index f5bde48..fb8aa61 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2023 Deepgram Templates +Copyright (c) 2024 Deepgram Devs Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal diff --git a/README.md b/README.md index a7b7174..192d47b 100644 --- a/README.md +++ b/README.md @@ -1,24 +1,26 @@ -> Copy the entire contents of https://github.com/deepgram-starters/deepgram-starters-ui to the `./static/` folder. +# Text-to-Speech WebSocket Starter for Go -> The name of the project and repo, is less important than the correct configuration of the `deepgram.toml` file, if you wish for it to be included in future onboarding workflows. +This example app demonstrates how to use the Deepgram Text-to-Speech API over WebSockets with Go. -# [Usecase] [Language] Starter +The flow of this sample is: -> Write an intro for this project +1. A websocket is opened from the UI to the backend Go component +1. Text is sent over a websocket to the backend component +1. If a connection has not been established to Deepgram, create a websocket connection using the Go SDK and send the text to convert to audio +1. 
An audio byte response with synthesized text-to-speech is returned and forwarded back through the WebSocket created by the UI +1. Those audio bytes are then played by the media device contained within your browser -Nifty little into, maybe a screenshot. +A preview of the app -## Sign-up to Deepgram - -> Please leave this section unchanged, unless providing a UTM on the URL. +## What is Deepgram? -Before you start, it's essential to generate a Deepgram API key to use in this project. [Sign-up now for Deepgram](https://console.deepgram.com/signup). +[Deepgram](https://deepgram.com/) is a voice AI company providing speech-to-text and language understanding capabilities to make data readable and actionable by humans or machines. -## Quickstart +## Sign-up to Deepgram -> Detail the manual steps to get started. +Before you start, it's essential to generate a Deepgram API key to use in this project. [Sign-up now for Deepgram and create an API key](https://console.deepgram.com/signup?jump=keys). -e.g. +## Quickstart ### Manual @@ -26,59 +28,54 @@ Follow these steps to get started with this starter application. #### Clone the repository -Go to GitHub and [clone the repository](https://github.com/deepgram-starters/prerecorded-node-starter). +Go to GitHub and [clone the repository](https://github.com/deepgram-starters/go-live-text-to-speech). #### Install dependencies Install the project dependencies. ```bash -npm install +go mod tidy ``` -#### Edit the config file +#### Set your Deepgram API key -> Config file can be any appropriate file for the framework/language. For e.g. -> Node is using a config.json file, while Python is only use .env files +If using bash, this can be done in your `~/.bash_profile` like so: -Copy the code from `sample.env` and create a new file called `.env`. Paste in the code and enter your API key you generated in the [Deepgram console](https://console.deepgram.com/). 
- -```json -DEEPGRAM_API_KEY=%api_key% +```bash +export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY" ``` -#### Run the application +#### Run the Go Application -> to support the UI, it must always run on port 8080 - -The `dev` script will run a web and API server concurrently. Once running, you can [access the application in your browser](http://localhost:8080/). +If you have set your `DEEPGRAM_API_KEY` environment variable, start the Backend go application using this command: ```bash -npm start +go run main.go ``` -## What is Deepgram? - -Deepgram is an AI speech platform which specializes in (NLU) Natural Language Understanding features and Transcription. It can help get the following from your audio. +If you haven't, this could also be done by a simple export before executing your Go application: -- [Speaker diarization](https://deepgram.com/product/speech-understanding/) -- [Language detection](https://deepgram.com/product/speech-understanding/) -- [Summarization](https://deepgram.com/product/speech-understanding/) -- [Topic detection](https://deepgram.com/product/speech-understanding/) -- [Language translation](https://deepgram.com/product/speech-understanding/) -- [Sentiment analysis](https://deepgram.com/product/speech-understanding/) -- [Entity detection](https://deepgram.com/product/speech-understanding/) -- [Transcription](https://deepgram.com/product/transcription/) -- [Redaction](https://deepgram.com/product/transcription/) +```bash +DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY" go run main.go +``` -## Create a Free Deepgram Account +#### Open the UI in a Browser -Before you start, it's essential to generate a Deepgram API key to use in our starter applications. [Sign-up now for Deepgram](https://console.deepgram.com/signup). +To open the Frontend UI, just navigate to `http://localhost:3000` in Chrome. ## Issue Reporting If you have found a bug or if you have a feature request, please report them at this repository issues section. 
Please do not report security vulnerabilities on the public GitHub issue tracker. The [Security Policy](./SECURITY.md) details the procedure for contacting Deepgram. +## Getting Help + +We love to hear from you so if you have questions, comments or find a bug in the project, let us know! You can either: + +- [Open an issue in this repository](https://github.com/deepgram-starters/live-node-starter/issues/new) +- [Join the Deepgram Github Discussions Community](https://github.com/orgs/deepgram/discussions) +- [Join the Deepgram Discord Community](https://discord.gg/xWRaCDBtW4) + ## Author [Deepgram](https://deepgram.com) diff --git a/SECURITY.md b/SECURITY.md deleted file mode 100644 index f898aee..0000000 --- a/SECURITY.md +++ /dev/null @@ -1,5 +0,0 @@ -# Security Policy - -Deepgram's security policy can be found on our main website. - -[Deepgram Security Policy](https://developers.deepgram.com/documentation/security/security-policy/) diff --git a/deepgram.toml b/deepgram.toml index eff3f2d..a1ad32b 100644 --- a/deepgram.toml +++ b/deepgram.toml @@ -1,17 +1,16 @@ [meta] - title = " Starter" # update with usecase and framework - description = "Basic demo for using Deepgram to in " # update with usecase and framework - author = "Deepgram DX Team (https://developers.deepgram.com)" # update for author details - useCase = "Prerecorded" # usecase - language = "Python" # base language - framework = "Flask" # framework if not native + title = "Text-to-speech Go Starter" + description = "This example app demonstrates how to use the Deepgram Text-to-Speech API with Go." 
+ author = "Deepgram DX Team (https://developers.deepgram.com)" + useCase = "TTS" + language = "Go" -[build] # delete if no build/install steps applicable - command = "pip install -r requirements.txt" # automatically install dependencies, delete if not applicable +[build] + command = "go get" [config] - sample = "sample.env" # the example config file - output = ".env" # the file that we will generate using their API + sample = "sample.env" + output = ".env" [post-build] - message = "Run `flask run -p 8080` to get up and running." # message to give users once setup is complete \ No newline at end of file + message = "Run `go run .` to get up and running." diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..bf3c31b --- /dev/null +++ b/go.mod @@ -0,0 +1,20 @@ +module github.com/deepgram-starters/text-to-speech-starter-go + +go 1.19 + +require ( + github.com/deepgram/deepgram-go-sdk v1.6.0-dev.2 + github.com/gorilla/websocket v1.5.3 +) + +require ( + github.com/dvonthenen/websocket v1.5.1-dyv.2 // indirect + github.com/fatih/color v1.15.0 // indirect + github.com/go-logr/logr v1.3.0 // indirect + github.com/gorilla/schema v1.3.0 // indirect + github.com/hokaccha/go-prettyjson v0.0.0-20211117102719-0474bc63780f // indirect + github.com/mattn/go-colorable v0.1.13 // indirect + github.com/mattn/go-isatty v0.0.17 // indirect + golang.org/x/sys v0.6.0 // indirect + k8s.io/klog/v2 v2.110.1 // indirect +) diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..132a98e --- /dev/null +++ b/go.sum @@ -0,0 +1,24 @@ +github.com/deepgram/deepgram-go-sdk v1.6.0-dev.2 h1:ygZgPkdvU/fGLBOI3/eD2F+N48AmWBa3yvKef+oEf5g= +github.com/deepgram/deepgram-go-sdk v1.6.0-dev.2/go.mod h1:il+6HLmvxa47EG12LG6VwzaHcyI8Lo+yfBsOcDq3R8s= +github.com/dvonthenen/websocket v1.5.1-dyv.2 h1:OXlWJJkeHt8k4+MEI0Y8SQjY2ihHYD2z/tI7sZZfsnA= +github.com/dvonthenen/websocket v1.5.1-dyv.2/go.mod h1:q2GbopbpFJvBP4iqVvqwwahVmvu2HnCfdqCWDoQVKMM= +github.com/fatih/color v1.15.0 
h1:kOqh6YHBtK8aywxGerMG2Eq3H6Qgoqeo13Bk2Mv/nBs= +github.com/fatih/color v1.15.0/go.mod h1:0h5ZqXfHYED7Bhv2ZJamyIOUej9KtShiJESRwBDUSsw= +github.com/go-logr/logr v1.3.0 h1:2y3SDp0ZXuc6/cjLSZ+Q3ir+QB9T/iG5yYRXqsagWSY= +github.com/go-logr/logr v1.3.0/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= +github.com/gorilla/schema v1.3.0 h1:rbciOzXAx3IB8stEFnfTwO3sYa6EWlQk79XdyustPDA= +github.com/gorilla/schema v1.3.0/go.mod h1:Dg5SSm5PV60mhF2NFaTV1xuYYj8tV8NOPRo4FggUMnM= +github.com/gorilla/websocket v1.5.3 h1:saDtZ6Pbx/0u+bgYQ3q96pZgCzfhKXGPqt7kZ72aNNg= +github.com/gorilla/websocket v1.5.3/go.mod h1:YR8l580nyteQvAITg2hZ9XVh4b55+EU/adAjf1fMHhE= +github.com/hokaccha/go-prettyjson v0.0.0-20211117102719-0474bc63780f h1:7LYC+Yfkj3CTRcShK0KOL/w6iTiKyqqBA9a41Wnggw8= +github.com/hokaccha/go-prettyjson v0.0.0-20211117102719-0474bc63780f/go.mod h1:pFlLw2CfqZiIBOx6BuCeRLCrfxBJipTY0nIOF/VbGcI= +github.com/mattn/go-colorable v0.1.13 h1:fFA4WZxdEF4tXPZVKMLwD8oUnCTTo08duU7wxecdEvA= +github.com/mattn/go-colorable v0.1.13/go.mod h1:7S9/ev0klgBDR4GtXTXX8a3vIGJpMovkB8vQcUbaXHg= +github.com/mattn/go-isatty v0.0.16/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM= +github.com/mattn/go-isatty v0.0.17 h1:BTarxUcIeDqL27Mc+vyvdWYSL28zpIhv3RoTdsLMPng= +github.com/mattn/go-isatty v0.0.17/go.mod h1:kYGgaQfpe5nmfYZH+SKPsOc2e4SrIfOl2e/yFXSvRLM= +golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.6.0 h1:MVltZSvRTcU2ljQOhs94SXPftV6DCNnZViHeQps87pQ= +golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +k8s.io/klog/v2 v2.110.1 h1:U/Af64HJf7FcwMcXyKm2RPM22WZzyR7OSpYj5tg3cL0= +k8s.io/klog/v2 v2.110.1/go.mod h1:YGtd1984u+GgbuZ7e08/yBuAfKLSO0+uR1Fhi6ExXjo= diff --git a/main.go b/main.go new file mode 100644 index 0000000..ea5b904 --- /dev/null +++ b/main.go @@ -0,0 +1,318 @@ +package main + +import ( + "context" + "encoding/json" + "fmt" + "log" + "net/http" + "sync" + "time" + + 
"github.com/gorilla/websocket" + + msginterfaces "github.com/deepgram/deepgram-go-sdk/pkg/api/speak/v1/websocket/interfaces" + clientinterfaces "github.com/deepgram/deepgram-go-sdk/pkg/client/interfaces" + client "github.com/deepgram/deepgram-go-sdk/pkg/client/speak" +) + +type MyHandler struct { + binaryChan chan *[]byte + openChan chan *msginterfaces.OpenResponse + flushedChan chan *msginterfaces.FlushedResponse + closeChan chan *msginterfaces.CloseResponse + errorChan chan *msginterfaces.ErrorResponse + + wsUI *websocket.Conn +} + +func NewMyHandler(uiWebsocket *websocket.Conn) MyHandler { + handler := MyHandler{ + binaryChan: make(chan *[]byte), + openChan: make(chan *msginterfaces.OpenResponse), + flushedChan: make(chan *msginterfaces.FlushedResponse), + closeChan: make(chan *msginterfaces.CloseResponse), + errorChan: make(chan *msginterfaces.ErrorResponse), + wsUI: uiWebsocket, + } + + go func() { + handler.Run() + }() + + return handler +} + +// GetBinary returns the binary event channels +func (dch MyHandler) GetBinary() []*chan *[]byte { + return []*chan *[]byte{&dch.binaryChan} +} + +// GetOpen returns the open channels +func (dch MyHandler) GetOpen() []*chan *msginterfaces.OpenResponse { + return []*chan *msginterfaces.OpenResponse{&dch.openChan} +} + +// GetMetadata returns the metadata channels +func (dch MyHandler) GetMetadata() []*chan *msginterfaces.MetadataResponse { + return []*chan *msginterfaces.MetadataResponse{} +} + +// GetFlush returns the flushed channels +func (dch MyHandler) GetFlush() []*chan *msginterfaces.FlushedResponse { + return []*chan *msginterfaces.FlushedResponse{&dch.flushedChan} +} + +// GetClose returns the close channels +func (dch MyHandler) GetClose() []*chan *msginterfaces.CloseResponse { + return []*chan *msginterfaces.CloseResponse{&dch.closeChan} +} + +// GetWarning returns the warning channels +func (dch MyHandler) GetWarning() []*chan *msginterfaces.WarningResponse { + return []*chan 
*msginterfaces.WarningResponse{} +} + +// GetError returns the error channels +func (dch MyHandler) GetError() []*chan *msginterfaces.ErrorResponse { + return []*chan *msginterfaces.ErrorResponse{&dch.errorChan} +} + +// GetUnhandled returns the unhandled event channels +func (dch MyHandler) GetUnhandled() []*chan *[]byte { + return []*chan *[]byte{} +} + +// Run starts one receiver goroutine per event channel and blocks until every channel is closed +// golintci: funlen +func (dch MyHandler) Run() error { + wgReceivers := sync.WaitGroup{} + + // open channel + wgReceivers.Add(1) + go func() { + defer wgReceivers.Done() + + for or := range dch.openChan { + fmt.Printf("------------ [OPEN] Deepgram WebSocket connection opened\n") + + // Send metadata to the UI + openJSON, err := json.Marshal(or) + if err != nil { + log.Println("Failed to marshal open to JSON:", err) + continue + } + + fmt.Printf("Open JSON: %s\n", openJSON) + dch.wsUI.WriteMessage(websocket.TextMessage, openJSON) + } + }() + + // flushed channel + wgReceivers.Add(1) + go func() { + defer wgReceivers.Done() + + for fr := range dch.flushedChan { + fmt.Printf("------------ [FLUSHED] Final Binary\n") + + // Send metadata to the UI + flushedJSON, err := json.Marshal(fr) + if err != nil { + log.Println("Failed to marshal flushed to JSON:", err) + continue + } + + fmt.Printf("Flushed JSON: %s\n", flushedJSON) + dch.wsUI.WriteMessage(websocket.TextMessage, flushedJSON) + } + }() + + // binary channel + wgReceivers.Add(1) + go func() { + defer wgReceivers.Done() + + lastTime := time.Now().Add(-5 * time.Second) + + for br := range dch.binaryChan { + if time.Since(lastTime) > 3*time.Second { + fmt.Printf("------------ [Binary Data] Attach header.\n") + + // Add a wav audio container header to the file if you want to play the audio + // using the AudioContext or media player like VLC, Media Player, or Apple Music + // Without this header in the Chrome browser case, the audio will not play. 
+ header := []byte{ + 0x52, 0x49, 0x46, 0x46, // "RIFF" + 0x00, 0x00, 0x00, 0x00, // Placeholder for file size + 0x57, 0x41, 0x56, 0x45, // "WAVE" + 0x66, 0x6d, 0x74, 0x20, // "fmt " + 0x10, 0x00, 0x00, 0x00, // Chunk size (16) + 0x01, 0x00, // Audio format (1 for PCM) + 0x01, 0x00, // Number of channels (1) + 0x80, 0xbb, 0x00, 0x00, // Sample rate (48000) + 0x00, 0xee, 0x02, 0x00, // Byte rate (48000 * 2) + 0x02, 0x00, // Block align (2) + 0x10, 0x00, // Bits per sample (16) + 0x64, 0x61, 0x74, 0x61, // "data" + 0x00, 0x00, 0x00, 0x00, // Placeholder for data size + } + + dch.wsUI.WriteMessage(websocket.BinaryMessage, header) + lastTime = time.Now() + } + + fmt.Printf("------------ [Binary Data] (len: %d)\n", len(*br)) + dch.wsUI.WriteMessage(websocket.BinaryMessage, *br) + } + }() + + // close channel + wgReceivers.Add(1) + go func() { + defer wgReceivers.Done() + + for cr := range dch.closeChan { + fmt.Printf("------------ [Close] Deepgram WebSocket connection closed\n") + + // Send metadata to the UI + closeJSON, err := json.Marshal(cr) + if err != nil { + log.Println("Failed to marshal close to JSON:", err) + continue + } + + fmt.Printf("Close JSON: %s\n", closeJSON) + dch.wsUI.WriteMessage(websocket.TextMessage, closeJSON) + } + }() + + // error channel + wgReceivers.Add(1) + go func() { + defer wgReceivers.Done() + + for er := range dch.errorChan { + fmt.Printf("------------ [Error] ErrCode: %s\n", er.ErrCode) + fmt.Printf("ErrMsg: %s\n", er.ErrMsg) + fmt.Printf("Description: %s\n", er.Description) + + // Send metadata to the UI + errorJSON, err := json.Marshal(er) + if err != nil { + log.Println("Failed to marshal error to JSON:", err) + continue + } + + fmt.Printf("Error JSON: %s\n", errorJSON) + dch.wsUI.WriteMessage(websocket.TextMessage, errorJSON) + } + }() + + // wait for all receivers to finish + wgReceivers.Wait() + + return nil +} + +var upgrader = websocket.Upgrader{ + CheckOrigin: func(r *http.Request) bool { + return true + }, +} + +var 
requestData struct { + Text string `json:"text"` +} + +func handleWebSocket(w http.ResponseWriter, r *http.Request) { + // get the model from the query string + model := r.URL.Query().Get("model") + + conn, err := upgrader.Upgrade(w, r, nil) + if err != nil { + log.Println("Failed to upgrade connection to WebSocket:", err) + return + } + defer conn.Close() + + if model == "" { + fmt.Println("No model specified, using default model") + model = "aura-asteria-en" + } + + // context + ctx := context.Background() + + // client options, if needed + cOptions := clientinterfaces.ClientOptions{} + + // Create a new Deepgram WebSocket client + sOptions := clientinterfaces.WSSpeakOptions{ + Model: model, + Encoding: "linear16", + SampleRate: 48000, + } + callback := NewMyHandler(conn) + + wsClient, err := client.NewWSUsingChan(ctx, "", &cOptions, &sOptions, callback) + if err != nil { + log.Fatalf("Failed to create WebSocket client: %v", err) + return + } + + // Wait for the connection to be established + isConnected := wsClient.Connect() + if !isConnected { + log.Fatalf("Failed to connect to Deepgram") + return + } + + for { + // Read message from the WebSocket connection + msgType, message, err := conn.ReadMessage() + if err != nil { + log.Println("Failed to read message from WebSocket:", err) + break + } + + if msgType == websocket.TextMessage { + err = json.Unmarshal(message, &requestData) + if err != nil { + log.Println("Failed to unmarshal JSON:", err) + continue + } + + if requestData.Text == "" { + log.Println("Text is required in the request") + continue + } + + log.Printf("Text: %s\n", requestData.Text) + + err = wsClient.SpeakWithText(requestData.Text) + if err != nil { + log.Println("Failed to send text to Deepgram:", err) + } + + err = wsClient.Flush() + if err != nil { + fmt.Printf("Error flushing: %v\n", err) + return + } + } + } +} + +func main() { + client.Init(client.InitLib{ + LogLevel: client.LogLevelDefault, // LogLevelDefault, LogLevelFull, 
LogLevelDebug, LogLevelTrace + }) + + fs := http.FileServer(http.Dir("./public")) + http.Handle("/", fs) + http.HandleFunc("/ws", handleWebSocket) + + fmt.Printf("Open the UI at http://localhost:3000\n") + log.Fatal(http.ListenAndServe(":3000", nil)) +} diff --git a/public/assets/dg_favicon.ico b/public/assets/dg_favicon.ico new file mode 100644 index 0000000..bdda536 Binary files /dev/null and b/public/assets/dg_favicon.ico differ diff --git a/public/assets/logo-6ad0fabf.png b/public/assets/logo-6ad0fabf.png new file mode 100644 index 0000000..d470082 Binary files /dev/null and b/public/assets/logo-6ad0fabf.png differ diff --git a/public/assets/preview-starter.png b/public/assets/preview-starter.png new file mode 100644 index 0000000..a98af5f Binary files /dev/null and b/public/assets/preview-starter.png differ diff --git a/public/client.js b/public/client.js new file mode 100644 index 0000000..6cd659d --- /dev/null +++ b/public/client.js @@ -0,0 +1,187 @@ +const PLAY_STATES = { + NO_AUDIO: "no_audio", + LOADING: "loading", + PLAYING: "playing", +}; + +let playState = PLAY_STATES.NO_AUDIO; +let audioPlayer; +const textArea = document.getElementById("text-input"); +const errorMessage = document.querySelector("#error-message"); + +let audioChunks = []; // Array to buffer incoming audio data chunks +let socket; + +// Function to update the play button based on the current state +function updatePlayButton() { + const playButton = document.getElementById("play-button"); + const icon = playButton.querySelector(".button-icon"); + + switch (playState) { + case PLAY_STATES.NO_AUDIO: + icon.className = "button-icon fa-solid fa-play"; + break; + case PLAY_STATES.LOADING: + icon.className = "button-icon fa-solid fa-circle-notch"; + break; + case PLAY_STATES.PLAYING: + icon.className = "button-icon fa-solid fa-stop"; + break; + default: + break; + } +} + +// Function to stop audio +function stopAudio() { + audioPlayer = document.getElementById("audio-player"); + if 
(audioPlayer) { + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + audioPlayer.pause(); + audioPlayer.currentTime = 0; + audioPlayer = null; + } +} + +// Function to handle the click event on the play button +function playButtonClick() { + switch (playState) { + case PLAY_STATES.NO_AUDIO: + sendData(); + break; + case PLAY_STATES.PLAYING: + stopAudio(); + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + break; + default: + break; + } +} + +// Remove error message when the text area has a value +textArea.addEventListener("input", () => { + errorMessage.innerHTML = ""; +}); + +// Function to send data to backend via WebSocket +function sendData() { + const modelSelect = document.getElementById("models"); + const selectedModel = modelSelect.options[modelSelect.selectedIndex].value; + const textInput = document.getElementById("text-input").value; + if (!textInput) { + errorMessage.innerHTML = "ERROR: Please add text!"; + } else { + playState = PLAY_STATES.LOADING; + updatePlayButton(); + + // keep a single WebSocket connection open and reuse it for later requests, + // which is why the socket is only initialized once + if (!socket) { + // create a new WebSocket connection + socket = new WebSocket(`ws://localhost:3000/ws?model=${selectedModel}`); + + // disable the model select + modelSelect.disabled = true; + + socket.addEventListener("open", () => { + const data = { + text: textInput, + }; + socket.send(JSON.stringify(data)); + }); + + socket.addEventListener("message", (event) => { + // console.log("Incoming event:", event); + + if (typeof event.data === "string") { + console.log("Incoming text data:", event.data); + + let msg = JSON.parse(event.data); + + if (msg.type === "Open") { + console.log("WebSocket opened 2"); + } else if (msg.type === "Error") { + console.error("WebSocket error:", msg); + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + } else if (msg.type === "Close") { + console.log("WebSocket closed"); + playState = 
PLAY_STATES.NO_AUDIO; + updatePlayButton(); + } else if (msg.type === "Flushed") { + console.log("Flushed received"); + + // All data received, now combine chunks and play audio + const blob = new Blob(audioChunks, { type: "audio/wav" }); + + // playback below relies on the Web Audio API, so check for AudioContext + if (window.AudioContext) { + console.log('Web Audio API is supported'); + const audioContext = new AudioContext(); + + const reader = new FileReader(); + reader.onload = function () { + const arrayBuffer = this.result; + + audioContext.decodeAudioData(arrayBuffer, (buffer) => { + const source = audioContext.createBufferSource(); + source.buffer = buffer; + source.connect(audioContext.destination); + source.start(); + + playState = PLAY_STATES.PLAYING; + updatePlayButton(); + + source.onended = () => { + // Clear the buffer + audioChunks = []; + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + }; + }); + }; + reader.readAsArrayBuffer(blob); + } else { + console.error('Web Audio API is NOT supported'); + } + + // Clear the buffer + audioChunks = []; + } + } + + if (event.data instanceof Blob) { + // Incoming audio blob data + const blob = event.data; + console.log("Incoming blob data:", blob); + + // Push each blob into the array + audioChunks.push(blob); + } + }); + + socket.addEventListener("close", () => { + console.log("Close received"); + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + }); + + socket.addEventListener("error", (error) => { + console.error("WebSocket error:", error); + playState = PLAY_STATES.NO_AUDIO; + updatePlayButton(); + }); + } else { + const data = { + text: textInput, + }; + socket.send(JSON.stringify(data)); + } + } +} + +// Event listener for the click event on the play button +document + .getElementById("play-button") + .addEventListener("click", playButtonClick); diff --git a/public/index.html b/public/index.html new file mode 100644 index 0000000..8f64596 --- /dev/null +++ b/public/index.html @@ -0,0 +1,70 @@ + + + + + Deepgram Test + + + + + +
+ Text-to-Speech
+ Be sure to check out:
+ • The main branch of this repo to see basic functionality.
+ + + diff --git a/public/style.css b/public/style.css new file mode 100644 index 0000000..4c18378 --- /dev/null +++ b/public/style.css @@ -0,0 +1,203 @@ +/* Import fonts */ +@import url("https://fonts.googleapis.com/css2?family=Arimo:wght@400;600;700"); +@import url("https://fonts.googleapis.com/css2?family=Inter"); + +/* Global styles */ +html { + background: #101014; + color: #fff; + font-family: Inter, sans-serif; +} + +/* Main content */ +main { + margin-left: auto; + margin-right: auto; + display: flex; + flex-direction: column; + min-height: 100vh; + max-width: 1400px; + gap: 1rem; + padding: 1.5rem; +} + +@media (min-width: 640px) { + main { + padding: 2rem; + } +} + +/* Grid container */ +.grid-container { + display: grid; + gap: 1rem; +} + +@media (min-width: 768px) { + .grid-container { + grid-template-columns: repeat(2, minmax(0, 1fr)); + } +} + +/* Text-to-speech section */ +.tts-section { + display: flex; + flex-direction: column; + gap: 1rem; +} + +/* Button container */ +.button-container { + display: flex; + justify-content: space-between; +} + +/* Button icon */ +.button-icon { + color: #00dda2; + font-size: 1.5rem; +} + +/* Spinner animation */ +.fa-circle-notch { + animation: spin 1s ease-in-out infinite; + -webkit-animation: spin 1s ease-in-out infinite; +} + +@keyframes spin { + to { + -webkit-transform: rotate(360deg); + } +} +@-webkit-keyframes spin { + to { + -webkit-transform: rotate(360deg); + } +} + +/* Button styling */ +button { + background: linear-gradient(#000, #000) padding-box, + linear-gradient(90deg, #201cff -91.5%, #13ef95 80.05%) border-box; + height: 48px; + width: 113px; + border: 1px solid transparent; + border-radius: 4px; + cursor: pointer; +} + +/* Error message */ +#error-message { + color: rgb(255, 74, 93); + font-weight: 800; +} + +/* Dropdown select */ +select { + height: 54px; + border-radius: 4px; + border-width: 1px; + border-color: rgba(44, 44, 51, 1); + background-color: rgba(16, 16, 20, 1); + padding: 1rem; + 
font-family: Fira Code, monospace; + box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05); + color: white; +} + +/* Label */ +label { + font-size: 14px; +} + +/* Textarea */ +textarea { + color: white; + display: flex; + min-height: 256px; + border-radius: 4px; + border-width: 1px; + border-color: rgba(44, 44, 51, 1); + background-color: rgba(16, 16, 20, 1); + padding: 1rem; + font-family: Fira Code, monospace; + box-shadow: 0 1px 2px 0 rgba(0, 0, 0, 0.05); + resize: none; +} + +/* Deepgram logo */ +.dg-logo { + padding-top: 0.75rem; + width: 10rem; +} + +/* Title */ +.title { + display: flex; + gap: 0.5rem; + align-items: flex-start; + margin-bottom: 4rem; +} + +@media (min-width: 640px) { + .title { + flex-direction: row; + } +} + +h1 { + color: transparent; + background-clip: text; + background-image: linear-gradient(90deg, #201cff -91.5%, #13ef95 120.05%); + font-size: 1.5rem; + margin: 0; + font-weight: 800; + font-family: Arimo; +} + +@media (min-width: 640px) { + h1 { + font-size: 3.5rem; + } +} + +h2 { + font: 700 2rem / 3.75rem Arimo; + letter-spacing: -0.02em; +} + +/* Information section */ +.information-section { + margin-left: 0; +} + +@media (min-width: 768px) { + .information-section { + margin-left: 120px; + color: rgb(237, 237, 242); + } +} + +/* List */ +ul { + font-family: "Arimo"; + display: flex; + padding-left: 0px; + gap: 16px; + flex-direction: column; +} + +li { + font-size: 20px; + line-height: 1.75rem; +} + +li::marker { + color: #13ef95; +} + +/* Links */ +a { + color: #13ef95; + text-decoration: none; +} diff --git a/sample.env b/sample.env deleted file mode 100644 index d68a2a2..0000000 --- a/sample.env +++ /dev/null @@ -1 +0,0 @@ -DEEPGRAM_API_KEY=%api_key% \ No newline at end of file diff --git a/static/.gitkeep b/static/.gitkeep deleted file mode 100644 index e69de29..0000000