Rebased version of the PR to add support for ConvTranspose layers #844
base: main
Conversation
- add new files for conv1dtranspose resource
- clean up so that conv code is reached. Still need to get the actual implementation matching keras
- implement conv1dtranspose super inefficiently (gets correct answer though)
- try to fix indices to make code work
- make the c code work for conv1dtranspose
- reduce weight dimensions to properly reflect transposed kernel size
- clean up so that transpose filter width is passed around from config
- fix code such that simple transpose layer gets synthesized
- move variables out of loops, optimize slightly and add in alternative method of computation to compute by kernel (that option is not optimized as of now)
- add in conv1d transpose linebuffer format code. seems to work, unsure of if it is optimized yet
- trying to fix stream behavior
- get transpose compilation working mostly as expected. weird jump in latency from reuse 1 to 2 still exists
- initial conv2dtranspose addition. Output is permuted as of now.
- output in correct order. using large array to buffer output though
- fix up conv1dtranspose a bit to pad correctly.
- fix up stream instructions for both 1d and 2d transposes
- fix allowed reuse factors for transpose layers
- update to new conv methods for io_parallel. Still some issues with multiple filters as well as some padding issues
- clean up error with multiple filters and larger kernels
- optimize conv transpose resource to get it working reasonably well. may still have slight optimization left
- fix output to conv1d transpose resource
- add conv2dtranspose io_parallel implementation. Can still be optimized
- small changeup to data storage in conv1d parallel
- fix zero padding pass addition for transpose stream layers
- move transposing of weight matrix to resource_strategy for transpose layers
- change how stream loads in weights to be like parallel for conv transposes. unroll all stride steps completely
- fix output of 1d transpose parallel to be faster
- change 1d transpose weight input to be 2-dimensional (passed from python code)
- change 2d transpose weight input to be 3-dimensional (passed from python code)
- small changes to transposes
- Revert "fix nondefault project name handling (#626)". The commit breaks the Vivado Accelerator workflow, and the fix is unclear to me right now. This reverts commit e8f048a.
- steps towards getting integer inputs to work
Why do we need multidimensional weights here? Passing pointers doesn't work as expected? I like the idea, we'll need it in the future for Vitis, but the …
# (W, F, C) => (F, W, C)
data = np.transpose(data, axes=[1, 0, 2])
# Now split the kernel into stride_width sub-kernels: (F, W, C) -> (S, F, W/S, C).
# Each sub-kernel i_sw collects the taps of the original kernel that land on
# output phase i_sw, i.e. original indices i_sw + m * stride_width.
n_filts, kern_width, n_chan = data.shape
new_weights = np.zeros((self.attributes['stride_width'], n_filts, self.attributes['trfilt_width'], n_chan))
for i_sw in range(self.attributes['stride_width']):
    for i_fw in range(self.attributes['trfilt_width']):
        # Reverse the tap order within each phase so the sub-kernel can be
        # applied like an ordinary Conv kernel.
        filt_ind = i_sw + (self.attributes['trfilt_width'] - i_fw - 1) * self.attributes['stride_width']
        for i_nf in range(n_filts):
            for i_nc in range(n_chan):
                if filt_ind < kern_width:
                    new_weights[i_sw][i_nf][i_fw][i_nc] = data[i_nf][filt_ind][i_nc]
data = new_weights
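As a worked example of the index mapping above (my own illustration, not part of the PR, with hypothetical sizes kern_width = 4, stride_width = 2, trfilt_width = 2):

```python
# filt_ind = i_sw + (trfilt_width - i_fw - 1) * stride_width
stride_width, trfilt_width = 2, 2
for i_sw in range(stride_width):
    taps = [i_sw + (trfilt_width - i_fw - 1) * stride_width for i_fw in range(trfilt_width)]
    print(f"sub-kernel {i_sw} reads original kernel taps {taps}")
# sub-kernel 0 reads original kernel taps [2, 0]
# sub-kernel 1 reads original kernel taps [3, 1]
# i.e. each phase gets a flipped, stride-spaced slice of the original kernel.
```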
@vloncar I think the "need" for the multidimensional weights comes from this transpose. I think this can still be done even if all the weights are kept single dimensional, but maybe it's less "clean" or "obvious" what's going on?
maybe @Jonathan-Shoemaker can also comment more.
and @Jonathan-Shoemaker's reply:
What is the meaning of "keep_dims"?
keep_dims keeps the weight matrix from being entirely flattened, keeping the first "keep_dims" dimensions as is, flattening along all other dimensions.
The reason for this is that the ConvTranspose is computed as the interleaving of "stride" number of Conv layers. The dimensions kept are for indexing into these different Conv layers. The idea was that the weight matrix of a ConvTranspose layer can be thought of as a disjoint set of weight matrices for Conv layers and treating it as such was easier.
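To make that concrete, here is a minimal sketch of the keep_dims idea as described above (names and shapes are illustrative, not the PR's actual API): the first keep_dims axes are preserved and everything after them is flattened, so each of the interleaved Conv phases can still be indexed directly.

```python
import numpy as np

def flatten_keep_dims(weights, keep_dims):
    # Keep the first `keep_dims` axes as-is and flatten all remaining axes.
    return weights.reshape(weights.shape[:keep_dims] + (-1,))

# For Conv1DTranspose weights reordered to (stride, n_filt, trfilt_width, n_chan),
# keep_dims=1 leaves one row of flattened weights per interleaved Conv phase.
w = np.arange(2 * 4 * 3 * 3).reshape(2, 4, 3, 3)  # hypothetical shape
print(flatten_keep_dims(w, 1).shape)  # (2, 36)
```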
Essentially, you want to consistently use disjoint submatrices of the weight matrix for the sub-computations and interleave the results of those sub-computations. My understanding is that the alternative would be to pass large slices of the matrix. I can't remember if I tested this, but I assume it would work; I think I just found it messier.
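For reference, a plain-NumPy sketch of that decomposition (my own illustration, not code from this PR; single channel, no padding, and it assumes stride <= kernel width): a 1D transposed convolution equals `stride` ordinary convolutions with disjoint sub-kernels whose outputs are interleaved.

```python
import numpy as np

def conv1d_transpose_ref(x, w, stride):
    # Direct definition: y[i*stride + k] += w[k] * x[i].
    in_len, kern_width = len(x), len(w)
    y = np.zeros((in_len - 1) * stride + kern_width)
    for i in range(in_len):
        for k in range(kern_width):
            y[i * stride + k] += w[k] * x[i]
    return y

def conv1d_transpose_phases(x, w, stride):
    # Same result, computed as `stride` interleaved 1D convolutions:
    # phase r uses the disjoint sub-kernel w[r::stride] and fills outputs r, r+stride, ...
    in_len, kern_width = len(x), len(w)
    y = np.zeros((in_len - 1) * stride + kern_width)
    for r in range(stride):
        full = np.convolve(x, w[r::stride])  # ordinary 1D convolution for this phase
        y[r : r + stride * len(full) : stride] = full
    return y

x = np.arange(1.0, 6.0)
w = np.array([1.0, -2.0, 3.0, 0.5])
assert np.allclose(conv1d_transpose_ref(x, w, 2), conv1d_transpose_phases(x, w, 2))
```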
I pushed a first version of a test and tried to make initial fixes, though there are many remaining errors.
Looking at the new test, the results seem to be: [output comparison attached to the original comment, not captured in this excerpt]
The output-dimensions inconsistency is for padding="valid". For "same", the output is flattened but has the correct number of values; however, the values still don't match.
I am having trouble getting this to work. I wrote a Python implementation of conv2d_transpose that works, and I think I understand the im2col-style io_parallel implementation that's here, but it's not coming together. The keep_dim weights that are written out actually look strangely reordered. I may branch and try to use the code here while dropping the keep_dim concept.
One purpose of keep_dims is that the weights are deliberately written out in a different order than the original conv_transpose weight matrix. The reason is that the transposed-convolution kernel is better computed as separate kernels of smaller conv layers; those sub-kernels live inside the transpose kernel but are not contiguous in the normal layout of the transpose weight matrix.
With these few changes conv2dtranspose seems to work for io_parallel. @Jonathan-Shoemaker, do you want to take a look at why the other cases in the pytest are failing?
Actually, the conv1dtranspose io_parallel failure was due to the pytest being misconfigured, but the io_stream tests are still failing.
I think the issue is with padding. With padding="same" it seems to work well.
Also needed is checking for valid reuse factors for conv transpose.
Description
This is a rebased version of #644. Please look there for details, but in summary it adds support for both io_stream and io_parallel compilation of Conv1DTranspose and Conv2DTranspose.
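For context, converting a model that uses these layers would look roughly like the following sketch, which uses the standard hls4ml Keras conversion API (it assumes this branch is installed; the layer sizes, io_type, and project directory are illustrative):

```python
import numpy as np
from tensorflow.keras.layers import Conv1DTranspose, Input
from tensorflow.keras.models import Model

import hls4ml

# Toy model containing a transposed convolution (sizes are illustrative).
inp = Input(shape=(8, 3))
out = Conv1DTranspose(4, kernel_size=3, strides=2, padding='same')(inp)
model = Model(inp, out)

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, io_type='io_parallel', output_dir='hls4ml_prj'
)
hls_model.compile()

x = np.random.rand(1, 8, 3)
print(hls_model.predict(x).shape)
```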
Type of change
Tests
Still a to-do.
Checklist
I have run pre-commit on the files I edited or added.