Question regarding maximal speakers #6
Comments
Hello, thank you for the comments. It is nice to see that you're re-using and curating our code. Regarding your question I think you meant '

First, it gives you some control over the number of clusters that the algorithm will output, so that it never exceeds a ceiling. If you know a hypothetical maximum number of speakers in your use case, the algorithm won't unnecessarily generate a wrong number of speakers. Say you know for sure there will be a maximum of 4 speakers. If the algorithm found the best solution to be 5 clusters, this line would cap that to 4, which is probably closer to the real clustering solution (other factors could also play into this).

Second, it prevents a nuisance in the spectral clustering method, where very small, uninformative eigenvalues would generate artificially large eigengaps and consistently produce a wrong solution. This is close to the first reason, but it was found to be particularly important for this clustering method in our system.

I hope that helps. You mention some debugging: if you find bugs, please feel free to open a merge request with your proposed solutions and I will be happy to include them in the code.

Best regards,
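To make the two reasons above concrete, here is a minimal sketch, not the pyBK implementation. It assumes the eigenvalues are positive and sorted in descending order, and that the eigengap is measured as the ratio between consecutive eigenvalues. The function name `pick_num_clusters` and the sample values are hypothetical.

```python
import numpy as np

def pick_num_clusters(eigenvalues, max_speakers):
    """Hypothetical helper: choose a cluster count from the largest eigengap,
    looking only at candidates up to max_speakers.

    Assumes positive eigenvalues sorted in descending order.
    """
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # gaps[i] compares eigenvalue i with eigenvalue i+1; a large value at
    # position i suggests i+1 clusters (ratio-based gap, an assumption here).
    gaps = eigenvalues[:-1] / eigenvalues[1:]
    # Only the first `limit` gaps are eligible, so the answer never exceeds the cap.
    limit = min(max_speakers, len(gaps))
    return int(np.argmax(gaps[:limit])) + 1

# Made-up spectrum: a clear drop after the 3rd value, plus a tiny trailing value.
eigs = [1.0, 0.9, 0.85, 0.2, 0.15, 0.001]

capped = pick_num_clusters(eigs, max_speakers=4)
uncapped = pick_num_clusters(eigs, max_speakers=10)
print(capped, uncapped)
```

In this made-up spectrum the tiny last eigenvalue produces the largest ratio, so the uncapped call follows it, while the capped call never looks that far and settles on the earlier drop.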
Hey, thanks for the quick reply.
I was wondering about the
For the bug, I just opened a pull request (#7).
Warm greetings,
Thanks for the clarification. If I remember correctly, that -1 has to do with the clustering algorithm being 1-indexed, unlike most of the rest of the code, which is 0-indexed. I will have to run the code again to check your pull request (thanks!) and will get back to you if I have more comments on it.
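A small sketch of how the -1 could act as a count-to-index conversion. This is an illustration under assumptions, not the actual pyBK code: the thread does not show how the result is used, and the array contents and snake_case names below are hypothetical.

```python
import numpy as np

max_nr_speakers = 4  # hypothetical cap
# Hypothetical: candidate solution at position k-1 has k speakers.
nr_speakers_per_solution = np.array([1, 2, 3, 4, 5])

# The capped value is a speaker count (1-based).
cap = int(np.minimum(max_nr_speakers, np.max(nr_speakers_per_solution)))
# Subtracting 1 turns that count into a 0-based array position.
last_index = cap - 1

# Used as a direct index, the capped solution is still reachable.
reachable = int(nr_speakers_per_solution[last_index])

# Used as an exclusive slice end, the capped solution is left out,
# which is the concern raised in the question.
exclusive = nr_speakers_per_solution[:last_index]
inclusive = nr_speakers_per_solution[:last_index + 1]
print(reachable, exclusive, inclusive)
```

Whether the cap is lost therefore depends on whether the value is consumed as an index or as a slice bound, which is worth checking at the line in question.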
Hey,
I'm working on a project called audapolis, which is based on speech recognition. For this we also created a library for speaker diarization, which is currently based on your pyBK model and code. First of all, I want to say a giant thank you to everyone who worked on these papers and this code. It was an incredibly easy entrance into the world of speaker diarization. The papers were really nice to read, even for someone without experience in the field of audio processing, and the code was easy to adapt to our use case.
I'm currently debugging some crashes and had a quick question regarding this line: 09c34a5#diff-a39bd4af9276ad78ae8d185cab0b9be57af65df0688e966a844900b572d167a3R517 .
Why does the code calculate np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1 instead of np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))? As far as I understand, this would mean that it can never return a solution that has maxNrSpeakers speakers, and never the solution with the most speakers. Or am I missing something here?