Depthwise 1D and 2D Resource strategy for io_stream #1079
Conversation
Thanks. I should mention that there is also an effort to develop depthwise (and separable, which gets split) for oneAPI, which will replace the Quartus backend eventually. It's in a PR to the oneAPI PR.
test/pytest/test_depthconv1d.py
Outdated
# ('Vitis', 'io_stream', 'latency'),
('Vivado', 'io_stream', 'resource'),
('Vitis', 'io_stream', 'resource'),
# ('Catapult', 'io_stream', 'latency'),
This is a debugging setup that got committed. We should uncomment the useful checks.
@jmitrevs because of the addition of the reuse factor and input size options, the tests increase exponentially. I could include those options only in the resource strategy tests to reduce the total number of tests, but it won't be easy to maintain. Any ideas?
Do we need all these reuse factor and input size options, i.e. how likely is it that the code would break in different ways for the different cases? I think we will probably need to live with testing only 1 or 2 of these combinations every time, otherwise this would get quite out of hand.
Reduced the number of tests to the minimum in order to cover all the cases of the resource implementation. Tests probably won't break unless something changes in the `rf` validation logic. When I first started the implementation, the `sepconv` layer had a single `rf`, which caused problems with `dense-resource`; now that `sepconv` is split, it works ok.
There was also an idea to only use `rf`s that trigger the 2nd case (rem0) instead of the 3rd one, for better QoR.
test/pytest/test_depthconv2d.py
Outdated
# ('Vitis', 'io_stream', 'latency'),
('Vivado', 'io_stream', 'resource'),
('Vitis', 'io_stream', 'resource'),
# ('Catapult', 'io_stream', 'latency'),
Same issue here. The others should be uncommented.
Updated to the latest main branch since I think it fixes the issue that caused the test failure.
Pipelining over the kernel size might produce better QoR. Testing this hypothesis is still pending.
What is blocking this PR from being merged?
@nghielme It is still marked as a draft, so no one took the time to thoroughly review it 😉. Also, it will come after 1.0.0
With this patch my Vivado HLS 2019.2 fails with the following output, regardless of resource or latency mode:

ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_RESHAPE variable=&weights type=block factor=block_factor' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_RESHAPE variable=&data type=block factor=block_factor' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.

The issue can be solved by removing
Thanks @xtreme8000! Fixed.
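For context, here is a compilable sketch contrasting the pragma spelling from the error log with the older spelling used elsewhere in hls4ml's nnet_utils headers. That the fix consisted of dropping the `&` (and the newer `type=` keyword) is an assumption on my part, and the function below is a made-up placeholder. Outside of an HLS tool the pragma is inert, so any C++ compiler accepts the snippet.

```cpp
// Made-up placeholder function; only the pragma spellings matter here.
template <int N> void accumulate_sketch(const int data[N], int acc[N]) {
    // Spelling from the error log, rejected by Vivado HLS 2019.2:
    //   #pragma HLS ARRAY_PARTITION variable=&acc type=complete
    // Older spelling used across hls4ml, accepted by Vivado HLS 2019.2:
    #pragma HLS ARRAY_PARTITION variable=acc complete
    for (int i = 0; i < N; i++)
        acc[i] += data[i];
}

int main() {
    const int data[4] = {1, 2, 3, 4};
    int acc[4] = {0, 0, 0, 0};
    accumulate_sketch<4>(data, acc); // outside HLS this is just a plain loop
    return acc[3] == 4 ? 0 : 1;
}
```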
Looks very good so far, needs minor changes.
@@ -86,7 +86,7 @@ void separable_conv_1d_cl(hls::stream<data_T> &data, hls::stream<res_T> &res,
     #pragma HLS DATAFLOW

     hls::stream<dw_res_T> depthwise_res;
-    unsigned res_depth = CONFIG_T::depthwise_config::out_width;
+    const unsigned res_depth = CONFIG_T::depthwise_config::out_width;
Does using `const` here bring any benefit? And since this is a newer compiler, why not `constexpr`?
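As an aside, here is a minimal compilable illustration of the question (the `dummy_config` struct is hypothetical, not from hls4ml): both qualifiers give a value usable as a compile-time stream depth, but `constexpr` makes the compile-time evaluation explicit.

```cpp
// Hypothetical stand-in for CONFIG_T::depthwise_config
struct dummy_config {
    static const unsigned out_width = 32;
};

// Both forms can serve as a stream depth; constexpr additionally guarantees
// compile-time evaluation and is directly usable in constant expressions.
const unsigned res_depth_const = dummy_config::out_width;
constexpr unsigned res_depth_constexpr = dummy_config::out_width;

static_assert(res_depth_constexpr == 32, "evaluated at compile time");

int main() { return static_cast<int>(res_depth_const) == 32 ? 0 : 1; }
```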
const int multiplier_limit = DIV_ROUNDUP(nin, multfactor);
const int block_factor = DIV_ROUNDUP(nin, rufactor);

assert((multiplier_limit == block_factor) && "This function is correct only for RF <= N_IN");
Would be great if this message made it more clear what `N_IN` is. Same for `N_OUT` in others. We should update all functions to make it more clear, not just the resource ones.
Should I change the non-resource functions in this PR or open a separate one?
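For concreteness, a small worked example of the arithmetic in the snippet above, with made-up sizes. `DIV_ROUNDUP` is reproduced with what I believe is its hls4ml definition, `multfactor` is assumed to be MIN(RF, N_IN) as in the dense resource code, and the assertion string is only a suggested rewording, not text from the PR.

```cpp
#include <cassert>

// ceil(n / d) for positive integers (believed to match nnet_common.h)
#define DIV_ROUNDUP(n, d) (((n) + (d)-1) / (d))

int main() {
    // Illustrative depthwise sizes: N_IN = kernel_size * n_chan = 3 * 8 = 24
    const int nin = 24;
    const int rufactor = 6; // reuse factor, RF <= N_IN

    // With RF <= N_IN each of the 6 multipliers handles ceil(24 / 6) = 4
    // multiplications, so multiplier_limit equals block_factor.
    const int multfactor = (rufactor < nin) ? rufactor : nin; // assumed MIN(RF, N_IN)
    const int multiplier_limit = DIV_ROUNDUP(nin, multfactor);
    const int block_factor = DIV_ROUNDUP(nin, rufactor);

    // A more self-explanatory message, along the lines the review asks for:
    assert((multiplier_limit == block_factor) &&
           "This function is correct only for RF <= N_IN, where N_IN = kernel_size * n_chan");
    return 0;
}
```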
assert((rufactor > nout) && "This function is correct only for RF > N_IN");

#pragma HLS function_instantiate variable=weights,biases
//#pragma HLS RESOURCE variable=weights core=RAM_2P_BRAM Commenting out the designation HLS seems to choose correctly
We should get rid of these comments across the file.
-void depthwise_product(data_T data[CONFIG_T::kernel_size * CONFIG_T::n_chan], res_T res[CONFIG_T::n_chan],
-                       typename CONFIG_T::weight_t weights[CONFIG_T::kernel_size * CONFIG_T::n_chan],
-                       typename CONFIG_T::bias_t biases[CONFIG_T::n_chan]) {
+void depthwise_product_resource_rf_lt_nchan(data_T data[CONFIG_T::kernel_size * CONFIG_T::n_chan],
For the sake of making this file clearer, I propose we move all depthwise product functions into a separate file, `nnet_depthwise_product.h`. Don't forget that you'll then have to add this file to the sepconv include list in `convolution_templates.py`.
Done. I think changing `convolution_templates.py` is not needed? Tests seem to work. Is there another reason I am missing?
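If it helps to visualize the proposed split, here is a hypothetical skeleton of `nnet_depthwise_product.h`. Only the `depthwise_product_resource_rf_lt_nchan` name and its parameter list are reconstructed from the diff above; the include-guard name and the other two function names are placeholders, not necessarily what the PR uses.

```cpp
#ifndef NNET_DEPTHWISE_PRODUCT_H_
#define NNET_DEPTHWISE_PRODUCT_H_

namespace nnet {

// Declaration reconstructed from the diff above; the implementation lives in the PR.
template <class data_T, class res_T, typename CONFIG_T>
void depthwise_product_resource_rf_lt_nchan(data_T data[CONFIG_T::kernel_size * CONFIG_T::n_chan],
                                            res_T res[CONFIG_T::n_chan],
                                            typename CONFIG_T::weight_t weights[CONFIG_T::kernel_size * CONFIG_T::n_chan],
                                            typename CONFIG_T::bias_t biases[CONFIG_T::n_chan]);

// Placeholder names (assumptions) for the other two reuse-factor cases:
// depthwise_product_resource_rf_geq_nchan_rem0(...)
// depthwise_product_resource_rf_gt_nchan(...)

} // namespace nnet

#endif // NNET_DEPTHWISE_PRODUCT_H_
```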
test/pytest/test_depthconv1d.py
Outdated
@@ -13,6 +13,8 @@
 strides_options = [(1), (2)]
 kernel_options = [(2), (3)]
 bias_options = [False]
+rf_options = [1, 16, 24]  # each rf corresponds to one of the three cases of depthwise resource for io_stream
+input_size_options = [16]
Why 16? This only makes the test run longer. Can we choose a smaller number?
Reduced to 3 and 4 respectively, in order to make sure all 3 cases of the implementation are triggered.
test/pytest/test_depthconv2d.py
Outdated
@@ -8,12 +8,13 @@

 test_root_path = Path(__file__).parent

-padds_options = ['same', 'valid']
+padds_options = ['same']
Why remove `valid`?
The only thing I see lacking here is the use of function pointers for dense instead of the current approach.

After extensive testing of both approaches, I've made a few observations:
Since the issues are specific to a use case, I'm thinking of merging the function pointer changes into this and then merging it to main (there's also a trivial change to the tests to avoid creating a directory within a directory). If the issue appears, it is trivial to manually tweak around it to avoid the use of a function pointer and/or tweak the inline pragma. The 3rd case can also be avoided by the user if they observe that issue. If no one objects, I'll do this tomorrow.
This is now ready. Let's just wait to see if I messed up the tests before we merge.
Description
- Added `const` to the depth variable of the FIFOs between the Depthwise and the Pointwise convolutions to fix a Vitis HLS (2023.2) c-synthesis warning
- Similar to the dense resource implementation, there are 3 separate implementations depending on the reuse factor and number of channels (a simplified software sketch of the indexing follows this list)

The common functionality is similar to the `dense_resource` layer:

- calculate the `block_factor`
- initialize the accumulator array with the biases
- `in_index`: data-weight index, always incremented by the reuse factor
- `out_index`: output index

`rf < n_chan`

- the `out_index` follows the `in_index` and there is just a check for `out_index` if it's out of bounds

`rf >= n_chan` and `rf % n_chan == 0`

- the `out_index` does not change in the `block_factor` loop, it's only incremented in the `rufactor` loop. It should be the least resource-demanding implementation

`rf > n_chan`

- the `out_index` is incremented with the remainder `rf % n_chan` and there is just a check if `out_index` is out of bounds
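Not the HLS code itself: a plain-software sketch with made-up sizes of the index arithmetic described above, showing that all three cases are incremental ways of computing `out_index = in_index % n_chan`.

```cpp
#include <cassert>

int main() {
    // Illustrative sizes only
    const int kernel_size = 3, n_chan = 8, rf = 6; // here rf < n_chan
    const int n_in = kernel_size * n_chan;         // 24 data-weight products
    const int block_factor = (n_in + rf - 1) / rf; // DIV_ROUNDUP(n_in, rf) = 4

    for (int ir = 0; ir < rf; ir++) {
        int in_index = ir;           // always advanced by the reuse factor
        int out_index = ir % n_chan; // output channel handled at this step
        for (int im = 0; im < block_factor; im++) {
            if (in_index >= n_in)
                break; // last block may be partial when rf does not divide n_in

            // Invariant maintained by all three cases:
            assert(out_index == in_index % n_chan);

            in_index += rf;
            // rf < n_chan: step by rf;  rf > n_chan: step by rf % n_chan;
            // rf % n_chan == 0: step is 0, so out_index stays fixed in this loop.
            out_index += rf % n_chan;
            if (out_index >= n_chan)
                out_index -= n_chan; // the "out of bounds" check in the description
        }
    }
    return 0;
}
```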
Type of change
Tests
Checklist
- I have run `pre-commit` on the files I edited or added.