SliceOut Layer - enhanced dropout #2145
Comments
/cc @dynamicwebpaige @tanzhenyu is this in your internal roadmap?
This seems like a generic & experimental technique which might be best hosted in Addons (so it's not specific to CV or NLP).
@tanzhenyu And so I suppose also not in standalone Keras/tf.keras, right?
That is correct. If this becomes successful, we should help move it from Addons to tf.keras.
@bhack, @tanzhenyu Thank you for the response. Can I start implementing it in TensorFlow Addons and testing its performance?
@g0lemXIV Is there any reference impl?
@bhack I couldn't find any... It seems the authors didn't share an implementation in any framework.
I think that we need to wait for a sponsor to review and co-maintain this feature. /cc @seanpmorgan
Yeah, I would agree to co-maintain this feature. We will want to benchmark it for performance / accuracy vs. dropout, as is done in the paper. Please proceed with a PR @g0lemXIV
Hello, sorry that I didn't respond sooner. I've tried to read and implement the paper, but I ran into many errors during the implementation. I think it will be hard to implement their structure in TensorFlow because the graph structure changes dynamically. Therefore, I must leave this feature request. Sorry again.
@g0lemXIV you can paste the errors here. Maybe we can help out with that?
It would also be helpful to see your current progress.
Looks to be quite simple for the Dense layer:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Dense
# Private TF internals that the stock Dense layer itself used in the TF 2.x era.
from tensorflow.python.eager import context
from tensorflow.python.keras.layers.ops import core as core_ops


class DenseSliceOut(Dense):
    """Dense layer with SliceOut: keeps a random contiguous slice of units during training."""

    def __init__(self, units, dropout, **kwargs):
        super().__init__(units, **kwargs)
        # Number of units actually kept in each training step.
        self.slice_size = int(units * (1 - dropout))

    def call(self, inputs, training=None):
        if training is None:
            training = K.learning_phase()
        if not training:
            # At inference time, behave exactly like a regular Dense layer.
            return super().call(inputs)
        outputs_shape = self.compute_output_shape(inputs.shape)
        # Random start index of the contiguous block of units to keep.
        begin = tf.random.uniform(
            [], maxval=self.units - self.slice_size + 1, dtype=tf.int32)
        # Matmul only against the sliced kernel columns and bias entries.
        outputs = core_ops.dense(
            inputs,
            tf.slice(self.kernel, [0, begin], [self.kernel.shape[0], self.slice_size]),
            tf.slice(self.bias, [begin], [self.slice_size]),
            self.activation,
            dtype=self._compute_dtype_object)
        # Upscale to preserve the expected activation magnitude, as with inverted dropout.
        outputs = outputs * (self.units / self.slice_size)
        # Pad back to the full width so downstream layers see an unchanged shape.
        outputs = tf.pad(
            outputs,
            [[0, 0]] * (len(outputs_shape) - 1)
            + [[begin, self.units - self.slice_size - begin]])
        if not context.executing_eagerly():
            outputs.set_shape(outputs_shape)
        return outputs

If stacked, inputs could be sliced too. Not sure if complementary things such as tf.pad will hurt performance anyway.
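For context, a minimal usage sketch of the class above (assuming eager TF 2.x; the unit counts and shapes here are arbitrary):

layer = DenseSliceOut(units=128, dropout=0.5, activation="relu")
x = tf.random.uniform([32, 64])

y_train = layer(x, training=True)   # a random contiguous block of 64 units is active, padded back to width 128
y_infer = layer(x, training=False)  # plain Dense forward pass over all 128 units
print(y_train.shape, y_infer.shape)  # both (32, 128)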
TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision. Please consider sending feature requests / contributions to other repositories in the TF community with similar charters to TFA.
Describe the feature and the current behavior/state.
SliceOut is a regularization technique for speedups and memory reduction that drops contiguous sets of units at random. The method preserves the regularization properties of dropout while allowing for a more efficient low-level implementation: training speedups come from fast memory access and matrix multiplication of smaller tensors, and memory savings come from not allocating memory to zeroed units in weight gradients and activations. Despite its simplicity, the method is highly effective.
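For intuition, a rough sketch of the core idea (purely illustrative, not from the paper; the sizes are arbitrary): standard dropout multiplies activations by a random binary mask and keeps the full-width tensor, whereas SliceOut keeps a single random contiguous block of units, so subsequent matrix multiplications can run on a physically smaller tensor.

import tensorflow as tf

units, keep = 8, 6                      # drop rate 0.25: keep 6 of 8 units
h = tf.random.uniform([4, units])       # a batch of activations, shape (4, 8)

# Standard dropout: zero random units; the tensor keeps its full width of 8.
mask = tf.cast(tf.random.uniform([units]) >= 0.25, h.dtype)
dropped = h * mask / 0.75               # inverted-dropout rescaling

# SliceOut-style: keep one random contiguous block; downstream ops see a (4, 6) tensor.
begin = tf.random.uniform([], maxval=units - keep + 1, dtype=tf.int32)
sliced = h[:, begin:begin + keep]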
Relevant information
Which API type would this fall under (layer, metric, optimizer, etc.)
Who will benefit from this feature?
Any other info.