CS598ps_project

In this paper we present an approach to reconstructing 3D audio for multiple sources from a single-channel input by detecting and tracking visual cues with supervised learning methods. We also discuss a similar approach to improving speaker classification on a video stream by combining facial and speech likelihoods, that is, multimodal speaker recognition.
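To make the idea of combining facial and speech likelihoods concrete, here is a minimal sketch of weighted late fusion in Python. It is an illustration under stated assumptions, not the code in `src`: the function names, the `face_weight` parameter, and the example scores are all hypothetical, and the project may fuse the two modalities differently.

```python
import math


def fuse_log_likelihoods(face_ll, speech_ll, face_weight=0.5):
    """Combine per-speaker log-likelihoods from two modalities.

    Hypothetical helper: face_ll and speech_ll map speaker name to
    log-likelihood, and face_weight (0..1) is an assumed tuning knob.
    Returns normalised scores that sum to 1 (uniform prior assumed).
    """
    if face_ll.keys() != speech_ll.keys():
        raise ValueError("both modalities must score the same speakers")

    # Weighted sum in the log domain (a weighted product of likelihoods).
    fused = {
        name: face_weight * face_ll[name] + (1.0 - face_weight) * speech_ll[name]
        for name in face_ll
    }

    # Normalise with log-sum-exp for numerical stability.
    peak = max(fused.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in fused.values()))
    return {name: math.exp(v - log_norm) for name, v in fused.items()}


def predict_speaker(face_ll, speech_ll, face_weight=0.5):
    """Return the speaker with the highest fused score."""
    scores = fuse_log_likelihoods(face_ll, speech_ll, face_weight)
    return max(scores, key=scores.get)


# Made-up scores for two speakers; the modalities disagree on purpose.
face = {"alice": math.log(0.6), "bob": math.log(0.4)}
speech = {"alice": math.log(0.3), "bob": math.log(0.7)}
print(predict_speaker(face, speech))
```

Raising `face_weight` trusts the face model more, which could help when the audio is noisy; lowering it favours the speech model when faces are occluded or out of frame.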

Video assets are here:
