Real-time Extraction of Screen Frames and Speech from Jitsi WebRTC Call #392

Open
4 tasks
tomsmith8 opened this issue May 17, 2024 · 5 comments

@tomsmith8

tomsmith8 commented May 17, 2024

Description

Provide a how-to solution for extracting periodic screen frames and audio (for speech recognition) in real time from a Jitsi WebRTC call. The extracted frames and audio will be sent to another API for further analysis.

Objectives

  • Explain how to capture video frames from the WebRTC stream in real-time.
  • Explain how to capture audio from the WebRTC stream in real time.
  • Explain the implementation and provide an architecture for achieving these tasks.

Suggested Tasks to be reviewed for anything missing:

Access Media Streams

  • Access the video, audio, and screen-recording streams of the Jitsi call.
  • Ensure media streams are correctly identified and accessible.

Capture Video Frames

  • Extract image data from the canvas and prepare it for processing.
  • Implement a function that continuously captures frames (at regular intervals, or in chunks?) for real-time processing while the call or recording is still in progress.

Capture and Process Audio

  • Implement a script processor to capture audio data in real-time?
  • Ensure the audio data is correctly buffered and ready for speech recognition.

Provide a detailed assessment of whether the approach above is correct. Please also provide alternatives or additional notes on how to process audio, video, and screen recordings in real time.

Acceptance Criteria

  • The solution should successfully capture and process video frames in real-time.
  • The audio stream should be captured and ready for processing in real time.
  • Detailed documentation and explanation of the implementation should be provided.
  • A prototype MVP program is a nice-to-have; the bounty will be boosted if one is provided.
@JZ1999

JZ1999 commented May 17, 2024

Hi @tomsmith8, I would like to help out with this!

@gotohigher

gotohigher commented May 20, 2024

  • Accessing Media Streams

The first step is to access the media streams of the Jitsi call. This means tapping into the WebRTC API to reach the video, audio, and screen-recording streams, and making sure each one is correctly identified and accessible.
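
As a rough illustration of the "identified correctly" step, the sketch below sorts tracks into audio, camera, and screen groups. The `TrackInfo` shape is hypothetical: `kind` mirrors the standard `MediaStreamTrack.kind` property, while `videoType` is an assumed hint for telling a screen share apart from a camera (lib-jitsi-meet tracks report comparable information, but the exact field names here are assumptions, not its API).

```typescript
// Hypothetical, minimal description of a track.
// `kind` mirrors MediaStreamTrack.kind; `videoType` is an assumed hint
// that distinguishes a screen share ("desktop") from a camera.
type TrackInfo = {
  id: string;
  kind: "audio" | "video";
  videoType?: "camera" | "desktop";
};

type TrackRole = "audio" | "camera" | "screen";

// Decide which processing pipeline a track belongs to.
// Video tracks without a "desktop" hint are treated as camera tracks.
function classifyTrack(track: TrackInfo): TrackRole {
  if (track.kind === "audio") return "audio";
  return track.videoType === "desktop" ? "screen" : "camera";
}

// Group tracks by role so each pipeline only receives what it needs.
function groupByRole(tracks: TrackInfo[]): Record<TrackRole, TrackInfo[]> {
  const groups: Record<TrackRole, TrackInfo[]> = {
    audio: [],
    camera: [],
    screen: [],
  };
  for (const track of tracks) {
    groups[classifyTrack(track)].push(track);
  }
  return groups;
}
```

With groups like these, the frame extractor would subscribe only to `screen` (and optionally `camera`) tracks, and the speech pipeline only to `audio` tracks.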

  • Capturing Video Frames

Video frames can be captured through a canvas context: we draw the current video frame onto an HTML canvas element, then periodically call getImageData to extract the pixels for real-time processing. Extracted frames would be buffered so that processing does not interfere with the active call.
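
The periodic capture and buffering described above could be sketched roughly as follows. This is not a full implementation: in a browser the pixel data would come from `ctx.drawImage(video, 0, 0)` followed by `ctx.getImageData(0, 0, width, height).data`, which is left out so the logic runs anywhere. `FrameSampler`, `FrameBuffer`, and the `Frame` shape are hypothetical names introduced for illustration.

```typescript
// One captured frame: RGBA bytes in the layout getImageData(...).data uses.
type Frame = {
  timestampMs: number;
  width: number;
  height: number;
  rgba: Uint8ClampedArray;
};

// Decides whether enough time has passed to capture another frame.
class FrameSampler {
  private lastCaptureMs = -Infinity;
  private intervalMs: number;

  constructor(intervalMs: number) {
    this.intervalMs = intervalMs;
  }

  // Returns true at most once per interval; call it from a timer or
  // animation callback with the current time in milliseconds.
  shouldCapture(nowMs: number): boolean {
    if (nowMs - this.lastCaptureMs < this.intervalMs) return false;
    this.lastCaptureMs = nowMs;
    return true;
  }
}

// Bounded buffer: when the consumer falls behind, the oldest frame is
// dropped so that capture never blocks the call.
class FrameBuffer {
  private frames: Frame[] = [];
  private capacity: number;
  dropped = 0;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(frame: Frame): void {
    if (this.frames.length >= this.capacity) {
      this.frames.shift();
      this.dropped++;
    }
    this.frames.push(frame);
  }

  // Hand all buffered frames to the consumer and clear the buffer.
  drain(): Frame[] {
    const out = this.frames;
    this.frames = [];
    return out;
  }
}
```

Dropping the oldest frame is one possible policy; for screen content, where only the latest state usually matters, it keeps memory bounded at the cost of skipping intermediate frames.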

  • Capturing and Processing Audio

We'll use the Web Audio API to capture audio. An AudioWorklet (or the older, now-deprecated ScriptProcessorNode where AudioWorklet is unavailable) can process audio samples in real time. The samples can then be accumulated in a buffer, ready for speech recognition.
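
To make the buffering step concrete, here is a minimal sketch of what could happen to the samples once the audio node hands them over. The Web Audio API delivers samples as `Float32Array` blocks with values in [-1, 1]; many speech recognition services accept 16-bit PCM, so the sketch converts and regroups them into fixed-size chunks. `floatTo16BitPcm`, `PcmChunker`, and the chunk size are hypothetical choices, and resampling to the recogniser's expected rate is not covered.

```typescript
// Convert Web Audio samples (floats in [-1, 1]) to 16-bit signed PCM.
// Out-of-range input is clamped before scaling.
function floatTo16BitPcm(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  input.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample));
    // Negative values scale by 32768, positive by 32767, so both
    // extremes map onto the full Int16 range without overflow.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  });
  return out;
}

// Accumulates PCM samples and emits fixed-size chunks, so the speech
// recogniser receives evenly sized buffers regardless of how the audio
// callback happens to slice the stream.
class PcmChunker {
  private pending: number[] = [];
  private chunkSize: number;

  constructor(chunkSize: number) {
    this.chunkSize = chunkSize;
  }

  // Append samples and return zero or more complete chunks.
  // Leftover samples stay buffered until the next call.
  push(samples: Int16Array): Int16Array[] {
    samples.forEach((s) => this.pending.push(s));
    const chunks: Int16Array[] = [];
    while (this.pending.length >= this.chunkSize) {
      chunks.push(Int16Array.from(this.pending.splice(0, this.chunkSize)));
    }
    return chunks;
  }
}
```

In a browser, `push` would be called with each converted block posted from the audio processing callback, and every returned chunk would be forwarded to the speech recognition API.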

  • As for detailed documentation and an MVP program:
    • I would produce a comprehensive document explaining each step of the implementation. While I can offer insight into best practices and potential challenges to look out for, the actual implementation will depend on specific requirements and constraints.
    • For a prototype MVP, I can provide a basic structure and a development plan. We would strictly define the minimal features needed to capture and process the audio and video, yielding a lean program that serves as a solid base for future enhancements.

@tomsmith8
Author

@JZ1999 Any update on providing a documented solution with Jitsi/Jibri/WebRTC for real-time streaming?

@hkarani

hkarani commented May 31, 2024

@tomsmith8 Can I work on this? My Sphinx username is asterisk32: https://community.sphinx.chat/p/cmv6tnqtu2rk819pr5mg/assigned

@tomsmith8
Author

@hkarani Sure. For this bounty we're looking for a documented solution. Once we have a solution we're happy with, we'll break it out into further implementation bounties.
