Improvement suggestions #13

Open
ThibaultCastells opened this issue Aug 3, 2023 · 3 comments

ThibaultCastells (Contributor) commented Aug 3, 2023

Hello,
Thank you for providing this tool! It's great!
If you are interested in improvement suggestions, I would like to suggest the following:

  • When using vggish the model checkpoint is downloaded automatically, but this is not the case with pann.
    Therefore, I get the following error:
    FileNotFoundError: [Errno 2] No such file or directory: 'mypath/.cache/torch/hub/Cnn14_16k_mAP%3D0.438.pth'
    If this is intended, I would suggest updating the readme to explain how to add this model.
  • In the readme, it would be nice to have an indication of how long it takes to generate the FAD with each method.
  • In the readme, more examples of how to use the library would be great (I had to read the code to understand how to reuse an embedding multiple times in different comparisons).
  • When computing the FAD between a data folder and itself, I do not get exactly 0 (I get a near-zero value of -1.1368683772161603e-13 instead). I think it would be better to get exactly 0 in this case.
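To make the last two points concrete, here is a minimal, self-contained sketch of the standard Fréchet distance computation over two embedding matrices. This is not the library's actual API; the function name, the random stand-in data, and the shapes are all hypothetical. It illustrates both how a cached embedding matrix can be reused across several comparisons and why comparing a set against itself yields a tiny nonzero value: the matrix square root of the covariance product accumulates floating-point error.

```python
import numpy as np
from scipy import linalg

def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between two sets of embeddings (rows = samples)."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Matrix square root of the covariance product; floating-point error
    # here is what makes a self-comparison land near, not at, zero.
    covmean, _ = linalg.sqrtm(cov_a @ cov_b, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Reusing one (stand-in) embedding matrix in several comparisons:
rng = np.random.default_rng(0)
baseline = rng.normal(size=(256, 8))           # hypothetical cached embeddings
model_a = rng.normal(loc=0.5, size=(256, 8))   # hypothetical model output
print(frechet_distance(baseline, baseline))    # near 0, but not exactly 0
print(frechet_distance(baseline, model_a))     # clearly positive
```

Computing the embeddings once and then calling a function like this repeatedly avoids re-running the (comparatively slow) embedding model for every pairwise comparison.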

Again, thank you for your work!

gudgud96 (Owner) commented Aug 3, 2023

Hi @ThibaultCastells,

Thank you for all your detailed suggestions. My replies are as follows:

  1. The pann checkpoint should be downloaded automatically if it is not cached; thanks for raising that. I shall address this soon.
  2. How long it takes to generate the FAD depends on factors such as CPU/GPU and the length and amount of audio. In my opinion there isn't much motivation to provide a benchmark, as it varies across use cases. Also, speed has not been an issue for my own use cases so far, but if it is an issue in yours, please feel free to raise it.
  3. I personally care more about the FAD score than the embeddings. From what you raised, I think I can detail more on (1) how to extract the embeddings and (2) how to calculate scores from the extracted embeddings, beyond the one-liner we currently have in the README.
  4. -1e-13 is pretty small to me and passable using e.g. np.allclose. Getting exactly 0 would be a nice-to-have, and might have a lower priority among the above.
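As a minimal illustration of the nice-to-have in point 4, a tiny numerically negative result could be snapped to exactly zero with a tolerance check. This is hypothetical post-processing, not part of the library; the function name and tolerance are placeholders.

```python
import numpy as np

def clean_score(raw_fad: float, atol: float = 1e-9) -> float:
    """Snap numerically-near-zero FAD values to exactly 0."""
    return 0.0 if np.isclose(raw_fad, 0.0, atol=atol) else raw_fad

print(clean_score(-1.1368683772161603e-13))  # 0.0
print(clean_score(0.42))                     # 0.42
```

The tolerance only absorbs floating-point noise; genuinely nonzero scores pass through unchanged.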

Feel free to raise a PR if you are interested in contributing too. Thanks again!

ThibaultCastells (Contributor, Author) commented

Thank you for this quick answer!

> How long it takes to generate the FAD depends on factors such as CPU/GPU and the length and amount of audio. In my opinion there isn't much motivation to provide a benchmark, as it varies across use cases. Also, speed has not been an issue for my own use cases so far, but if it is an issue in yours, please feel free to raise it.

I made this suggestion because it seemed to me that pann takes much longer. What I meant was not a full benchmark, but rather giving an idea of the time difference between the two methods.

> I personally care more about the FAD score than the embeddings. From what you raised, I think I can detail more on (1) how to extract the embeddings and (2) how to calculate scores from the extracted embeddings, beyond the one-liner we currently have in the README.

I made this suggestion because in my case I need to compare generative models' outputs against a baseline model's output. I assume most people would use this tool for research purposes, so this must be a common situation.

Also, I wanted to mention that none of these are real 'issues' for me: I am able to use your tool and am really happy with it. These were simply suggestions to improve its usability, because I think your work deserves to become the go-to FAD tool :)

gudgud96 (Owner) commented Aug 4, 2023

  1. I see your point of view, thanks for raising that. I think it is useful to showcase the latency difference between models; I shall mark that down too.
  2. Sure, I have also received feedback about using only the embedding extraction part. As mentioned above, I shall provide more details on that.

Thanks again for your kind words and these very insightful suggestions. I was constrained by my own view and use case, and am glad to learn about other perspectives on using this tool. Again, thanks for your contribution!
