CRF head [WIP] #393
base: master
This PR tries to add a new CRF layer for sequence tagging / prediction, with Viterbi decoding for inference and the negative log-likelihood (NLL) as the loss value.
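For reference, a minimal Viterbi sketch in plain Python rather than Nim, so it is not the code in this PR. It assumes a single sequence, `emissions[t][j]` as the emission score of real tag `j` at step `t`, and a transition matrix `trans[i][j]` scoring `i -> j` that also holds the BOS and EOS rows/columns. The function name and argument layout are hypothetical.

```python
def viterbi(emissions, trans, bos, eos):
    """Return (best_path, best_score) for one sequence.

    Assumed layout: emissions[t][j] for real tags only; trans[i][j] scores the
    transition i -> j and includes the BOS and EOS indices.
    """
    num_tags = len(emissions[0])
    # score[j]: best score of any partial path that ends in tag j.
    score = [trans[bos][j] + emissions[0][j] for j in range(num_tags)]
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(num_tags):
            # Best previous tag for landing on tag j at step t.
            best_i = max(range(num_tags), key=lambda i: score[i] + trans[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + trans[best_i][j] + emissions[t][j])
        score = new_score
        backptr.append(ptrs)
    # Close the path with the transition into EOS.
    last = max(range(num_tags), key=lambda j: score[j] + trans[j][eos])
    best_score = score[last] + trans[last][eos]
    # Follow the back-pointers from the last step to the first.
    path = [last]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, best_score
```

The back-pointer table costs O(seq_len * num_tags) memory, which is the usual trade-off for recovering the path in a single backward sweep.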
Creating the transitions matrix properly takes some logic, so add an initializer function using range + xavier uniform that disallows Any -> BOS and EOS -> Any transitions.
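A rough sketch of that kind of initializer, in plain Python for illustration only. The index layout (real tags first, then BOS, then EOS), the `trans[i][j]` = "from i to j" convention, and the `FORBIDDEN` constant are all assumptions, not taken from the PR.

```python
import math
import random

# Hypothetical constant: a large negative score standing in for "disallowed".
FORBIDDEN = -10000.0

def init_transitions(num_tags, seed=0):
    """Return (trans, bos, eos) where trans[i][j] scores the transition i -> j.

    Assumed layout: real tags at 0..num_tags-1, BOS at num_tags, EOS at num_tags + 1.
    """
    n = num_tags + 2
    bos, eos = num_tags, num_tags + 1
    rng = random.Random(seed)
    # Xavier/Glorot uniform bound for an n x n matrix: sqrt(6 / (fan_in + fan_out)).
    bound = math.sqrt(6.0 / (n + n))
    trans = [[rng.uniform(-bound, bound) for _ in range(n)] for _ in range(n)]
    for i in range(n):
        trans[i][bos] = FORBIDDEN  # Any -> BOS is disallowed
        trans[eos][i] = FORBIDDEN  # EOS -> Any is disallowed
    return trans, bos, eos
```

A large finite negative value is used instead of negative infinity so that later sums and exponentials stay free of NaNs; whether that suits the Nim implementation is a separate question.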
Following other implementations, the forward pass will compute the path scores and the log partition function, which together give the NLL.
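As a sketch of how the two pieces combine: the NLL of a gold path is the log partition function minus that path's unnormalized score. The plain-Python example below handles a single sequence and assumes `emissions[t][j]` covers real tags only, while `trans[i][j]` (scoring `i -> j`) also contains the BOS and EOS indices. All names here are hypothetical.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a non-empty list."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_partition(emissions, trans, bos, eos):
    """Forward algorithm: log of the summed exp-scores of every tag path."""
    num_tags = len(emissions[0])
    # alpha[j]: log-sum of the scores of all partial paths ending in tag j.
    alpha = [trans[bos][j] + emissions[0][j] for j in range(num_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + trans[i][j] for i in range(num_tags)])
            + emissions[t][j]
            for j in range(num_tags)
        ]
    # Close every path with its transition into EOS.
    return logsumexp([alpha[j] + trans[j][eos] for j in range(num_tags)])

def nll(gold_score, emissions, trans, bos, eos):
    """Negative log-likelihood of the gold path, given its unnormalized score."""
    return log_partition(emissions, trans, bos, eos) - gold_score
```

Because the partition function sums over every path, including the gold one, this NLL is never negative, which makes a cheap sanity check while debugging.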
Uses the array passed in and only reshapes it if needed (when the new Tensor has a larger size than the old one). This is needed, or at least should help, when calling index_select repeatedly with subsets of the same size. The example here is selecting batch_size entries for each time step of the CRF emissions.
Ran 'nimpretty' to clean up formatting / long lines, and passed more information to the nnp_crf functions.
Implementation of the forward pass is underway, starting with the scores (unnormalized log probability with emission + transition components).
Fix some bugs in the CRF unnormalized score calculation (mostly making sure index_select does not return a matrix where it shouldn't). Also fix an out-of-bounds bug in the loop over time steps.
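To make the shape and loop-bound concerns concrete, here is a small plain-Python sketch of a batched unnormalized score. It assumes a time-major layout (`emissions[t][b][k]`, `tags[t][b]`), equal-length sequences with no padding mask, and a `trans[i][j]` matrix (scoring `i -> j`) that includes BOS and EOS. None of these names come from the PR.

```python
def batch_scores(emissions, tags, trans, bos, eos):
    """Return one unnormalized score per batch item (a vector, not a matrix).

    Assumed layout: emissions[t][b][k] and tags[t][b], time-major, no padding.
    """
    seq_len, batch = len(tags), len(tags[0])
    # Step 0: transition out of BOS plus the first emission.
    scores = [
        trans[bos][tags[0][b]] + emissions[0][b][tags[0][b]]
        for b in range(batch)
    ]
    # Steps 1..seq_len-1: the loop starts at 1 because each step reads t - 1.
    for t in range(1, seq_len):
        for b in range(batch):
            prev, cur = tags[t - 1][b], tags[t][b]
            scores[b] += trans[prev][cur] + emissions[t][b][cur]
    # Final transition from the last tag into EOS.
    for b in range(batch):
        scores[b] += trans[tags[seq_len - 1][b]][eos]
    return scores
```

Each time step contributes exactly one scalar per batch item, so the running result keeps length `batch` throughout.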
Don't hesitate to ask if you are stuck on a specific thing.
Thanks, appreciate it! I'm going to try and pick this up again tonight/tomorrow and will probably have some better questions once I'm done with the forward pass.
Haven't forgotten about this, just haven't had as much time to work on this as I thought I would.
No problem, I don't have much time myself
I'm going to need some help refining this (especially not sure yet how the backward pass will work), but I think that I can add a CRF head for sequence prediction that should work with (for example) the GRU layer.
I've been mostly following your guide #331
TODO: