Question regarding maximal speakers #6

Open
pajowu opened this issue Jan 12, 2022 · 3 comments

@pajowu

pajowu commented Jan 12, 2022

Hey,

I'm working on a project called audapolis, which is based on speech recognition. For this we also created a library for speaker diarization, which is currently based on your pyBK model and code. First of all, I want to say a giant thank you to everyone who worked on these papers and this code. It was an incredibly easy entry into the world of speaker diarization. The papers were really nice to read, even for someone without experience in the field of audio processing, and the code was easy to adapt to our use case.

I'm currently debugging some crashes and had a quick question regarding this line: 09c34a5#diff-a39bd4af9276ad78ae8d185cab0b9be57af65df0688e966a844900b572d167a3R517 .

Why does the code calculate np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1 instead of np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1? As far as I understand, this would mean that it can never return a solution that has maxNrSpeakers speakers, and never the solution with the most speakers. Or am I missing something here?

@josepatino
Owner

Hello,

thank you for the comments. It is nice to see that you're re-using and curating our code.

Regarding your question, I think you meant np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1 instead of np.maximum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1, could it be? If so, I think I get what you mean. That line has two purposes.

First, it gives you some control over the number of clusters the algorithm outputs, so that it never exceeds a ceiling. If you know a hypothetical maximum number of speakers for your use case, the algorithm won't unnecessarily generate a wrong number of speakers. Say you know for sure there will be at most 4 speakers. If the algorithm picked 5 clusters as the best solution, this line would cap that to 4, which is (probably; some other factors could go into this) closer to the real clustering solution.
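As a minimal sketch of the capping idea only (not the pyBK code itself): the function name and the sample values are made up for illustration, and plain Python built-ins stand in for `np.minimum` and `np.max` so the snippet runs without dependencies.

```python
def cap_speaker_count(nr_speakers_per_solution, max_nr_speakers):
    """Return the largest speaker count that does not exceed the ceiling.

    Hypothetical helper; equivalent in spirit to
    np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution)) without the -1.
    """
    return min(max_nr_speakers, max(nr_speakers_per_solution))


# Candidate solutions with 5, 4, 3 and 2 clusters; at most 4 speakers expected.
print(cap_speaker_count([5, 4, 3, 2], max_nr_speakers=4))

# If no candidate reaches the ceiling, the largest candidate wins instead.
print(cap_speaker_count([3, 2], max_nr_speakers=4))
```

The ceiling only takes effect when some candidate solution exceeds it; otherwise the largest available solution sets the limit.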

Second, it prevents a nuisance in the spectral clustering method, where very small, uninformative eigenvalues generate artificially large eigengaps and consistently produce a wrong solution. This is close to the first reason I just mentioned, but it was found to be particularly important for this clustering method in our system.
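To illustrate the eigengap issue, here is a rough sketch under stated assumptions. It assumes a ratio-based eigengap (each eigenvalue divided by the next smaller one), which is one way a tiny eigenvalue can produce a huge gap; the criterion actually used in pyBK may differ. The function name, the `max_clusters` parameter, and the eigenvalues are invented for the example.

```python
def eigengap_cluster_count(eigenvalues, max_clusters=None):
    """Pick the cluster count at the largest ratio between consecutive eigenvalues.

    Hypothetical helper. Assumes strictly positive eigenvalues and at least
    two of them. If max_clusters is given, only the first max_clusters gaps
    are considered, so the result never exceeds max_clusters.
    """
    vals = sorted(eigenvalues, reverse=True)
    # gaps[i] compares the (i+1)-th and (i+2)-th largest eigenvalues.
    gaps = [vals[i] / vals[i + 1] for i in range(len(vals) - 1)]
    if max_clusters is not None:
        gaps = gaps[:max_clusters]
    # The largest gap after the k-th eigenvalue suggests k clusters.
    return gaps.index(max(gaps)) + 1


# The last eigenvalue is tiny and uninformative, but dominates the ratios.
eigenvalues = [1.0, 0.9, 0.3, 0.25, 0.2, 1e-6]

print(eigengap_cluster_count(eigenvalues))                  # uncapped
print(eigengap_cluster_count(eigenvalues, max_clusters=4))  # capped
```

With these made-up values, the uncapped search lands on the gap next to the tiny eigenvalue, while the capped search never looks that far down the spectrum.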

I hope that helps. You mention some debugging: if you find bugs please do feel free to do a merge request with your proposed solutions and I will be happy to include them in the code.

Best regards,
Jose

@pajowu
Author

pajowu commented Jan 12, 2022

Hey,

thanks for the quick reply. I was wondering about the -1. If I understand it correctly this would actually only produce clusterings smaller than the max number of speakers, right?

For the bug, I just opened a pull request (#7)

Warm greetings,
Karl

@josepatino
Owner

Thanks for the clarification. If I remember well, that -1 has to do with the clustering algorithm being 1-indexed, unlike most of the rest of the code, which is 0-indexed. I will have to run the code again to check your pull request (thanks!) and will get back to you if I have more comments on it.
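One way to read the indexing explanation, sketched under a loud assumption: suppose the candidate solutions sit in a 0-indexed list where position i holds the solution with i+1 speakers. Whether pyBK stores its solutions this way is not confirmed in this thread; the list and the helper name below are purely illustrative.

```python
def solution_index_for_count(speaker_count):
    """Convert a 1-based speaker count into a 0-based list position.

    Hypothetical helper; assumes position i holds the solution with i+1 speakers.
    """
    if speaker_count < 1:
        raise ValueError("speaker_count must be at least 1")
    return speaker_count - 1


# Made-up candidate solutions: position 0 has 1 speaker, position 4 has 5.
solutions = ["1 speaker", "2 speakers", "3 speakers", "4 speakers", "5 speakers"]

max_nr_speakers = 4
cap = min(max_nr_speakers, len(solutions))

# Under this layout, cap - 1 addresses the solution with exactly `cap` speakers.
print(solutions[solution_index_for_count(cap)])
```

Under that layout the -1 is only a count-to-index conversion and the solution with maxNrSpeakers speakers stays reachable. If the result were instead used directly as a speaker count, the effective ceiling would be one lower, which is the behaviour the question above is worried about.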
