Question regarding maximal speakers #6

pajowu · 2022-01-12T02:27:28Z

Hey,

I'm working on a project called audapolis, which is based on speech recognition. For this we also created a library for speaker diarization, which is currently based on your pyBK model and code. First of all, I want to say a giant thank you to everyone who worked on these papers and this code. It was an incredibly easy entrance into the world of speaker diarization. The papers were really nice to read, even for someone without experience in the field of audio processing, and the code was easy to adapt to our use case.

I'm currently debugging some crashes and had a quick question regarding this line: 09c34a5#diff-a39bd4af9276ad78ae8d185cab0b9be57af65df0688e966a844900b572d167a3R517 .

Why does the code calculate np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1 instead of np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1? As far as I understand, this would mean that it can never return a solution that has maxNrSpeakers speakers, and never the solution with the most speakers. Or am I missing something here?


josepatino · 2022-01-12T18:41:01Z

Hello,

thank you for the comments. It is nice to see that you're re-using and curating our code.

Regarding your question, I think you meant np.minimum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1 instead of np.maximum(maxNrSpeakers, np.max(nrSpeakersPerSolution))-1, could it be? If so, I think I get what you mean. That line has two purposes.

First, it gives you some control over the number of clusters that the algorithm will output, so that it never exceeds a ceiling. If you know a hypothetical maximum number of speakers for your use case, the algorithm won't unnecessarily generate a wrong number of speakers. Say you know for sure there will be a maximum of 4 speakers. If the algorithm picked 5 clusters as the best solution, this line would cap that at 4, which is (probably, there are some other factors that could go into this) closer to the real clustering solution.

Second, it avoids a nuisance in the spectral clustering method, where very small, uninformative eigenvalues would generate artificially large eigengaps and consistently produce a wrong solution. It is close to the first reason I just mentioned, but it was found to be particularly important for this clustering method in our system.
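To make the two purposes concrete, here is a minimal sketch of an eigengap heuristic with a ceiling. It is not the pyBK implementation: the function name `pick_num_clusters`, the descending sort, and the toy eigenvalues are all assumptions made for illustration.

```python
import numpy as np

def pick_num_clusters(eigenvalues, max_speakers):
    """Eigengap heuristic with a ceiling (illustrative only, not pyBK's code)."""
    # Assumption: larger eigenvalues are the informative ones, so sort descending.
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    # gaps[i] is the drop between the (i+1)-th and (i+2)-th largest eigenvalue.
    gaps = vals[:-1] - vals[1:]
    # Only look at gaps that correspond to at most `max_speakers` clusters.
    limit = min(max_speakers, len(gaps))
    # argmax is 0-based, so add 1 to turn the index into a cluster count.
    return int(np.argmax(gaps[:limit])) + 1

# Toy eigenvalues: the largest gap sits in the tail, before the smallest value.
toy = [3.0, 2.9, 2.0, 1.95, 1.9, 0.5]
uncapped = pick_num_clusters(toy, max_speakers=10)  # tail gap wins
capped = pick_num_clusters(toy, max_speakers=3)     # tail gap is never considered
```

With these toy values, the uncapped call selects the gap in the tail, while the capped call ignores it and settles on the earlier, more plausible gap.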

I hope that helps. You mention some debugging: if you find bugs, please feel free to open a merge request with your proposed solutions and I will be happy to include them in the code.

Best regards,
Jose

pajowu · 2022-01-12T19:21:41Z

Hey,

thanks for the quick reply. I was wondering about the -1. If I understand it correctly, this would actually only produce clusterings smaller than the max number of speakers, right?

For the bug, I just opened a pull request (#7)

Warm greetings,
Karl

josepatino · 2022-01-13T09:45:54Z

Thanks for the clarification. If I remember correctly, that -1 has to do with the clustering algorithm being 1-indexed, unlike most of the rest of the code, which is 0-indexed. I will have to run the code again to check your pull request (thanks!) and will get back to you if I have more comments on it.
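The indexing question can be illustrated with a small sketch. All values and the snake_case names below are hypothetical, and the thread does not settle which of the two readings matches the actual pyBK code; the sketch only shows why the answer depends on how the `-1` result is used.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the indexing question.
max_nr_speakers = 4
nr_speakers_per_solution = np.array([6, 5, 4, 3, 2])
scores = np.array([0.1, 0.2, 0.3, 0.9, 0.95])  # hypothetical per-candidate scores

# Ceiling on the number of speakers, as in the expression under discussion.
cap = int(np.minimum(max_nr_speakers, np.max(nr_speakers_per_solution)))

# A 1-indexed position k in 1..cap maps to the 0-based index k - 1,
# so the last valid 0-based index for `cap` candidates is cap - 1.
last_index = cap - 1

# Reading 1: cap - 1 is an inclusive index bound, so `cap` candidates remain.
inclusive = scores[: last_index + 1]

# Reading 2: cap - 1 is used directly as an exclusive slice end,
# so the candidate at the cap itself is never considered.
exclusive = scores[:last_index]
```

Under the first reading the `-1` is a plain 1-indexed to 0-indexed conversion; under the second it would exclude the solution at the ceiling, which is the behaviour the question asks about.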
