LSTM-g #4
Apologies for my ignorance of Julia, but to kickstart this: GatedLayer may need to be an abstract type in order to implement the three/seven types of layers. I think a concrete immutable would need mutable objects covering the data for the union of all layer types, unless the specialization lives in a field/array of another type such as …
For API and implementation guidance, besides my cute library, the quite readable Synaptic is worth a look (may need to search): https://github.com/cazala/synaptic/blob/master/test/synaptic.js is a minimal test, and https://github.com/cazala/synaptic/blob/master/dist/synaptic.js is the five above in one file but not minified (plus irrelevant Bower, NPM, etc. from https://github.com/cazala/synaptic/blob/master/src/synaptic.js).
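A minimal sketch of the abstract-type idea above, written in modern Julia syntax (`abstract type` / `struct`; at the time this thread was written it would have been `abstract` / `immutable`). The type and function names here are hypothetical, not OnlineAI's API:

```julia
# One abstract GatedLayer, with concrete subtypes holding only the data
# each specialization needs, instead of one immutable carrying the union
# of all layer types' fields.
abstract type GatedLayer end

struct InputLayer <: GatedLayer
    activation::Vector{Float64}
end

struct MemoryCellLayer <: GatedLayer
    activation::Vector{Float64}
    state::Vector{Float64}        # the self-connected state
end

# generic code can dispatch on the abstract type
nunits(layer::GatedLayer) = length(layer.activation)
```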
Thanks for the thoughts/links. When you say "3/7 types of layers", what exactly do you mean? My understanding (after a few readings of the paper) is that every LSTM component can be represented by the same type of layer with common math, differentiated solely by the connectivity structure (this is why I'm interested btw). So a memory cell is a memory cell because of the self-connection and the gating of its inward and outward connections. It has the exact same math as a forget gate, but its "G" is the empty set. @dmonner Am I missing something?
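As an illustration of that "same math, different connectivity" view, here is a rough sketch where every unit computes a gated weighted sum and ungated connections simply use a gain of 1. The names are illustrative only, not the paper's notation or OnlineAI's code:

```julia
# Each incoming connection carries a weight, a source activation, and a gate
# gain; for ungated connections the gain is just 1.0.
function net_input(weights::Vector{Float64},
                   activations::Vector{Float64},
                   gains::Vector{Float64})
    sum(weights .* activations .* gains)
end

# memory cell: one of its connections is a gated self-connection
# gate unit:   same function, with all gains equal to 1.0
net_input([0.5, 1.0], [0.2, 0.8], [1.0, 0.9])   # second connection is gated
```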
Sorry for the confusion. You're mostly right, but inputs, outputs, and bias units/connections do need special handling. Storing everything in terms of the specific layers (instead of having to deduce the memory cell layer(s) by their self-connected memory units) complicates things a bit, but is good for performance and keeping the architectures LSTM-like.
I'm not convinced yet. An input layer is a … As for bias, I was expecting that to be a core part of the layer, possibly parameterizing the type if needed.
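One possible shape for that, as a hedged sketch: carry the bias inside the layer and parameterize the type on whether it is used. The `Layer` type and `activate` function below are assumptions for illustration, not the actual OnlineAI types:

```julia
# BIAS is a Bool type parameter; dispatch picks the right activation rule.
struct Layer{BIAS}
    w::Matrix{Float64}   # incoming weights
    b::Vector{Float64}   # bias vector (ignored when BIAS == false)
end

activate(l::Layer{true},  x::Vector{Float64}) = l.w * x .+ l.b
activate(l::Layer{false}, x::Vector{Float64}) = l.w * x

l = Layer{true}(randn(3, 2), zeros(3))
activate(l, [1.0, -1.0])
```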
I'm not 100% on the bias yet... I need to get my hands more dirty with code before I decide.
Actually I think it's the opposite. Having one layer type with consistent math makes everything super simple to reason about, and also doesn't in any way lock one into the LSTM model. For those that want cookie-cutter LSTM, there would be a constructor.
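Purely as a sketch of what such a constructor could look like, under the "one layer type, wiring does the work" assumption. `SimpleLayer`, `Connection`, and `lstm_block` are hypothetical names chosen for illustration:

```julia
struct SimpleLayer
    activation::Vector{Float64}
end
SimpleLayer(n::Int) = SimpleLayer(zeros(n))

struct Connection
    from::SimpleLayer
    to::SimpleLayer
    gated_by::Union{SimpleLayer, Nothing}   # `nothing` means ungated
end

# Build the standard LSTM wiring out of one generic layer type.
function lstm_block(x::SimpleLayer, ncells::Int)
    cells = SimpleLayer(ncells)
    out   = SimpleLayer(ncells)
    igate = SimpleLayer(ncells)
    fgate = SimpleLayer(ncells)
    ogate = SimpleLayer(ncells)
    conns = [
        Connection(x, igate, nothing),
        Connection(x, fgate, nothing),
        Connection(x, ogate, nothing),
        Connection(x, cells, igate),      # input -> cells, gated by input gate
        Connection(cells, cells, fgate),  # self-connection, gated by forget gate
        Connection(cells, out, ogate),    # cell output, gated by output gate
    ]
    (layers = [cells, out, igate, fgate, ogate], connections = conns)
end

lstm_block(SimpleLayer(4), 8)
```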
Your bias idea sounds good. What I meant by layers complicating things is separating the layers by what they do, and you're right that most separation is only relevant to building the network. If you want to do memory unit deduction in the optimization pass, there's plenty of ways to store that info I guess. Inputs and outputs are special only because you write input directly to activations and need to know which units determine output and error. That's all, and Julia's iterators can probably handle that part well enough. Iterating over whatever associates tags with units may have some indirection overhead, but that's just my premature optimizer talking.
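For instance, something as simple as a dictionary from tag to unit indices would cover the "write inputs, read outputs" part; this is only a sketch of the idea, and the names and layout are assumptions:

```julia
activations = zeros(10)
tags = Dict(:input => 1:3, :output => 9:10)

# write input values straight into the activation vector
x = [0.5, -1.0, 2.0]
for (k, i) in enumerate(tags[:input])
    activations[i] = x[k]
end

# the units that determine output (and error) are read the same way
y = activations[tags[:output]]
```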
I've been mulling this over for a couple days. I have a good framework of types/structure without the forward/backward code, but I've been re-deriving the core math from a slightly more generalized perspective, with the hope that my implementation is less "gated connections and memory cells", and more "optionally recurrent layers with multiplicative connections". The hope is that the math is slightly simplified and that there are more network architectures which can be built naturally. If I come up with something good, I may throw together a blog post about it. For now, just be patient.
Thanks for sharing your progress. Do you have multiply-gated connections in mind?
I want to start playing around with hierarchical networks using LSTM-g nodes... let this issue be a discussion point for anyone that wants to collaborate on a pure-Julia implementation as part of OnlineAI.
paper: http://www.overcomplete.net/papers/nn2012.pdf
cc: @dmonner @mrmormon @cazala