Add Support for ConvTranspose Layers (1D and 2D) #644
Conversation
For the non-default project name handling, it may be good to rebase onto the current main branch. I think those issues have been solved, though of course there are no guarantees.
Force-pushed from 2be0bd7 to 0604fb9.
@Jonathan-Shoemaker since #600 has been merged, can you rebase?
I was wondering about the status of this PR. We'll talk about the code status and release schedule this Friday, and the conv transpose layer is an important layer for us to support.
The PR is no longer waiting on any others. There are still slight issues in the optimization of the 2D transpose. I can clean it up a little bit, rebase, etc.
- add new files for conv1dtranspose resource
- clean up so that conv code is reached. Still need to get the actual implementation matching keras
- implement conv1dtranspose super inefficiently (gets correct answer though)
- try to fix indices to make code work
- make the c code work for conv1dtranspose
- reduce weight dimensions to properly reflect transposed kernel size
- clean up so that transpose filter width is passed around from config
- fix code such that simple transpose layer gets synthesized
- move variables out of loops, optimize slightly and add in alternative method of computation to compute by kernel (that option is not optimized as of now)
- add in conv1d transpose linebuffer format code. seems to work, unsure of if it is optimized yet
- trying to fix stream behavior
- get transpose compilation working mostly as expected. weird jump in latency from reuse 1 to 2 still exists
- initial conv2dtranspose addition. Output is permuted as of now.
- output in correct order. using large array to buffer output though
- fix up conv1dtranspose a bit to pad correctly. fix up stream instructions for both 1d and 2d transposes
- fix allowed reuse factors for transpose layers
- update to new conv methods for io_parallel. Still some issues with multiple filters as well as some padding issues
- clean up error with multiple filters and larger kernels
- optimize conv transpose resource to get it working reasonably well. may still have slight optimization left
- fix output to conv1d transpose resource
- add conv2dtranspose io_parallel implementation. Can still be optimized
- small changeup to data storage in conv1d parallel
- fix zero padding pass addition for transpose stream layers
- move transposing of weight matrix to resource_strategy for transpose layers
- change how stream loads in weights to be like parallel for conv transposes. unroll all stride steps completely
- fix output of 1d transpose parallel to be faster
- change 1d transpose weight input to be 2-dimensional (passed from python code)
- change 2d transpose weight input to be 3-dimensional (passed from python code)
- small changes to transposes
- Revert "fix nondefault project name handling (fastmachinelearning#626)". The commit breaks the Vivado Accelerator workflow, and the fix is unclear to me right now. This reverts commit e8f048a.
- steps towards getting integer inputs to work
Hi @Jonathan-Shoemaker, I squashed your commits, rebased to main, and tried to decouple the unrelated changes on my branch; diff here: main...jmduarte:hls4ml:conv_tr_parallel. Can I push it here so we can proceed to review it?
Sounds good to me. I can work on adding tests.
Force-pushed from 0604fb9 to 970ee1c.
Great! Sounds good. I'll also review what's here soon; I have some minor comments/questions. Also, don't worry about running pre-commit yet; we can run that at the end, after we're done reviewing (to avoid introducing large diffs).
I think we want to support this for version 0.8. I will try rebasing it on the current main. |
What is the meaning of "keep_dims"?
The rebase is at https://github.com/fastmachinelearning/hls4ml/tree/conv_tr_rebase. There were lots of merge conflicts, so please take a look. We can replace this PR with that one, or force-push it.
#844 is the version of this PR based on my rebase attempt. I wanted to make the PR to see how the tests go.
keep_dims keeps the weight matrix from being entirely flattened: the first keep_dims dimensions are kept as-is, and the matrix is flattened along all other dimensions. The reason for this is that the ConvTranspose is computed as the interleaving of "stride"-many Conv layers, and the kept dimensions are used to index into these different Conv layers. The idea is that the weight matrix of a ConvTranspose layer can be thought of as a disjoint set of Conv-layer weight matrices, and treating it as such was easier.
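For illustration, a minimal numpy sketch of that idea is below. The shapes, names, and exact tap grouping are assumptions chosen for illustration, not the actual hls4ml code.

```python
import numpy as np

# Sketch of the keep_dims idea (illustrative assumptions, not hls4ml code).
stride, kernel_width, n_chan, n_filt = 2, 4, 3, 5
weights = np.random.rand(kernel_width, n_chan, n_filt)

# Regroup the transposed-conv kernel into `stride` sub-kernels, one per
# output phase: phase p uses every stride-th tap starting at tap p.
phase_kernels = np.stack([weights[p::stride] for p in range(stride)])
print(phase_kernels.shape)  # (stride, kernel_width // stride, n_chan, n_filt)

# keep_dims=1: keep the leading phase axis, flatten everything else, so the
# HLS code can index the sub-kernel belonging to each interleaved output.
flattened = phase_kernels.reshape(phase_kernels.shape[0], -1)
print(flattened.shape)      # (stride, (kernel_width // stride) * n_chan * n_filt)
```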
Hello, does this mean that Conv2DTranspose from Keras is currently not supported in hls4ml?
Not yet, but hopefully in a few days (in the main branch, not in a release).
I think we moved to the rebased PR (#844), so I will close this one.
Hi @MODISENEISO. It looks like you're using a conda environment. Did you install hls4ml inside that environment? You can also install any branch of hls4ml as follows: pip install git+https://github.com/fastmachinelearning/hls4ml@conv_tr_rebase. In this case, conv_tr_rebase is the branch with these changes. Thanks!
Description
This adds support for ConvTranspose layers. Specifically, it adds support for both io_stream and io_parallel compilation of Conv1DTranspose and Conv2DTranspose layers (as of now, converted from Keras only).
The strategy roughly follows that of non-transposed convolution layers. We treat a conv transpose as a group of `stride_width` by `stride_height` convolutions, with their outputs interlaced. Thus, we essentially do a normal conv implementation where each kernel produces `stride_width * stride_height` outputs. Perhaps the most unintuitive part of how things are currently set up is that the weight matrix is transformed substantially (in the python code). This is done to split up the kernel into what amounts to the `stride_width * stride_height` smaller kernels.

The draft PR depends on PR #600 due to use of that PR's new implementation of io_parallel conv layers. Thus, all of the changes from that PR are currently included in this draft (will change once it is merged).
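As a sanity check of the interleaving idea described above, here is a small numpy sketch. It is purely illustrative and is not the hls4ml implementation; it assumes a "valid"-style 1D transposed convolution with no output cropping.

```python
import numpy as np

# A 1D transposed convolution with stride S equals S ordinary convolutions
# (one per output phase) with their outputs interlaced.
x = np.array([1.0, 2.0, 3.0, 4.0])     # input signal, length L
w = np.array([0.5, -1.0, 2.0, 0.25])   # transposed-conv kernel, width K
S = 2                                   # stride
L, K = len(x), len(w)

# Reference: each input sample scatters a scaled copy of the kernel at i * S.
out_ref = np.zeros((L - 1) * S + K)
for i in range(L):
    out_ref[i * S:i * S + K] += x[i] * w

# Same result from S interleaved ordinary convolutions with phase sub-kernels.
out = np.zeros_like(out_ref)
for p in range(S):
    phase = np.convolve(x, w[p::S])     # phase-p sub-kernel, ordinary conv
    out[p::S][:len(phase)] = phase

np.testing.assert_allclose(out, out_ref)
```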
As of now, both the parallel and stream 1D implementations seem to be working well, with performance matching that of the non-transposed layers. There are slight latency increases in the 2D implementations that may need to be worked out: in both cases the writing of data is a bit slow. In parallel, the implementation seems to have trouble writing to the output in the intended order, and in stream, multiple writes often get queued up, which causes the implementation to take extra cycles.
Type of change
Tests
Still have to add tests to this PR.
Test Configuration:
Testing was done by compiling models consisting of single ConvTranspose layers and comparing the performance of those layers to analogous Conv layers (i.e., a Conv layer that maps the conv transpose output back to its input).
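A hypothetical pytest-style version of such a test might look like the sketch below; the layer sizes, io_type, output directory, and tolerance are illustrative assumptions, not the configuration actually used here.

```python
import numpy as np
from tensorflow import keras
import hls4ml

# Single-layer Conv1DTranspose model (sizes are arbitrary for illustration).
model = keras.Sequential([
    keras.Input(shape=(8, 2)),
    keras.layers.Conv1DTranspose(4, kernel_size=3, strides=2, padding='same'),
])

config = hls4ml.utils.config_from_keras_model(model, granularity='name')
hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    io_type='io_stream',
    output_dir='hls4ml_prj_conv1dtranspose',
)
hls_model.compile()

# Compare the compiled HLS model against Keras on random inputs.
x = np.random.rand(10, 8, 2)
y_keras = model.predict(x)
y_hls = hls_model.predict(x)
np.testing.assert_allclose(y_hls.reshape(y_keras.shape), y_keras, atol=0.05)
```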
Checklist