Thank you for your interest in the paper and our work! That is a very good use of torch_struct; your derivation is very similar to ours, with just a slightly different final equation. As you mentioned in the Colab (I added another version that purely uses our library), the difference may come down to the constants, since what we do with the marginals is virtually the same in both cases. torch_struct is designed to be very efficient, so getting the marginal distribution from there rather than from our implementation is a potential explanation for the comparison. Thanks for pointing this out!
Our version has a loop in its implementation, while your method doesn't. Can you rewrite the code to make them more comparable? (Either vectorize both, or use loops for both.)
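To illustrate why this matters for a fair benchmark, here is a small hypothetical sketch (not the actual paper or torch_struct code) of the same reduction written once with a Python loop and once vectorized; the loop version pays per-iteration interpreter overhead that can dominate the timing even when the asymptotic cost is identical:

```python
import numpy as np

def entropy_loop(ps):
    # Python loop over rows: same arithmetic, but with
    # per-row interpreter and allocation overhead.
    out = []
    for p in ps:
        out.append(-(p * np.log(p)).sum())
    return np.array(out)

def entropy_vec(ps):
    # Single vectorized reduction over the row axis.
    return -(ps * np.log(ps)).sum(axis=1)

rng = np.random.default_rng(0)
ps = rng.dirichlet(np.ones(5), size=100)  # 100 toy distributions

# Both versions compute identical values; only the constant
# factors differ, which is what a loop-vs-vectorized comparison
# would otherwise be measuring.
assert np.allclose(entropy_loop(ps), entropy_vec(ps))
```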
Is it possible that enabling no_grad added a bunch of overhead?
Other thoughts:
It might also be a good idea to precompute the shared work outside the timed region, so that only the part that actually differs between the two methods is being compared.
In order to see the big-O behavior, make a log-log plot: that turns a function like time = C * N^K into a straight line, log time = log C + K * log N. The slope K should be roughly equal for both methods and less than 3. The intercept, log C, reflects the constant-overhead factor.
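As a minimal sketch of that fitting procedure (with synthetic timings standing in for measured wall-clock times), a least-squares line through the log-log points recovers both the exponent K and the overhead term log C:

```python
import numpy as np

# Synthetic timings following time = C * N**K (here C = 1e-7, K = 3),
# standing in for measured wall-clock times at several input sizes.
C, K = 1e-7, 3.0
Ns = np.array([64, 128, 256, 512, 1024], dtype=float)
times = C * Ns**K

# On a log-log scale the model is linear:
#   log(time) = log(C) + K * log(N)
slope, intercept = np.polyfit(np.log(Ns), np.log(times), 1)

assert np.isclose(slope, K)              # slope recovers the exponent K
assert np.isclose(intercept, np.log(C))  # intercept recovers log C
```

With real measurements the points will be noisy, so compare the fitted slopes of the two methods rather than individual timings.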
Hi,
I really enjoyed the paper -- a super interesting read and I learned a lot!
In case you're interested, I wanted to point you to a nonprojective entropy implementation in torch_struct that is quite efficient (should you have future use cases for it). A similar derivation can be used for other additively factorable functions as well.
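The identity behind such derivations can be illustrated on a toy categorical distribution (this is only a one-dimensional stand-in, not the torch_struct tree code): for an exponential-family distribution p(x) ∝ exp(score(x)), the entropy equals log Z minus the expected score, which marginals give you directly:

```python
import numpy as np

# Toy categorical distribution defined by arbitrary scores.
scores = np.array([1.0, 2.0, 3.0, 0.5])

# Partition function and "marginals" (here just softmax probabilities).
log_z = np.log(np.exp(scores).sum())
p = np.exp(scores - log_z)

# Entropy via the additive identity: H = log Z - E[score].
h_from_marginals = log_z - (p * scores).sum()

# Entropy computed directly from the definition.
h_direct = -(p * np.log(p)).sum()

assert np.isclose(h_from_marginals, h_direct)
```

In the structured case the expected score decomposes over edges, so the same quantity falls out of the edge marginals that the inside-outside (or matrix-tree) computation already produces.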
PR: harvardnlp/pytorch-struct#103
Colab comparison: https://colab.research.google.com/drive/1iUr78J901lMBlGVYpxSrRRmNJYX4FWyg?usp=sharing
Best,
Tom