
Depthwise 1D and 2D Resource strategy for io_stream #1079

Merged

Conversation

@steltze (Contributor) commented Oct 11, 2024

Description

Similar to the dense resource implementation, there are three separate implementations, selected by the relationship between the reuse factor and the number of channels.

The common functionality is similar to the dense_resource layer:

  • calculate the block_factor

  • initialize the accumulator array with the biases

  • in_index: data-weight index, always incremented by the reuse factor

  • out_index: output index

  1. rf < n_chan

the out_index follows the in_index; there is just a bounds check on out_index

  2. rf >= n_chan and rf % n_chan == 0

the out_index does not change in the block_factor loop; it is only incremented in the rufactor loop. This should be the least resource-demanding implementation

  3. rf > n_chan and rf % n_chan != 0

the out_index is incremented by the remainder rf % n_chan, with a bounds check on out_index
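The shared index arithmetic of the three cases can be sketched in plain C++. This is a hedged illustration using the names from the description above (rf, n_chan, block_factor), not the actual hls4ml HLS code; the commented multiply-accumulate stands in for the real work:

```cpp
#include <vector>

// Illustrative sketch: returns the sequence of out_index values visited.
// in_index always advances by the reuse factor; out_index advances by
// rf % n_chan, which is 0 in case 2, so a single wrap check suffices.
std::vector<int> out_index_sequence(int rf, int n_chan, int block_factor) {
    std::vector<int> seq;
    for (int ir = 0; ir < rf; ir++) {
        int in_index = ir;            // data/weight index for this pass
        int out_index = ir % n_chan;  // accumulator (output channel) index
        for (int im = 0; im < block_factor; im++) {
            // acc[out_index] += data[in_index] * weights[in_index];
            seq.push_back(out_index);
            in_index += rf;           // always incremented by the reuse factor
            out_index += rf % n_chan; // step is 0 when rf % n_chan == 0 (case 2)
            if (out_index >= n_chan)  // bounds check / wrap (cases 1 and 3)
                out_index -= n_chan;
        }
    }
    return seq;
}
```

Running this with rf = 4, n_chan = 2 shows the case-2 behavior: out_index stays fixed within each block_factor loop and only changes between rufactor iterations.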

Type of change

  • New feature (non-breaking change which adds functionality)

Tests

  • Apart from the numerical correctness checks in the pytests comparing Keras to hls4ml, the C synthesis was executed successfully
  • Checked the numerical correctness and C synthesis of a 10k-parameter UNet that has 20 separable 2D convolutions

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@steltze (Contributor, Author) commented Oct 11, 2024

Some results from vivado synthesis

@jmitrevs (Contributor) commented:

Thanks. I should mention that there is also an effort to develop depthwise (and separable, which gets split) for oneAPI, which will eventually replace the Quartus backend. It's in a PR to the oneAPI PR.

@jmitrevs jmitrevs added the please test Trigger testing by creating local PR branch label Oct 11, 2024
# ('Vitis', 'io_stream', 'latency'),
('Vivado', 'io_stream', 'resource'),
('Vitis', 'io_stream', 'resource'),
# ('Catapult', 'io_stream', 'latency'),
Contributor:

This is a debugging setup that got committed. We should uncomment the useful checks.

Contributor Author:

@jmitrevs because of the addition of the reuse factor and input size options, the number of tests grows multiplicatively. I could include those options only in the resource strategy tests to reduce the total number, but that won't be easy to maintain. Any ideas?

Contributor:

Do we need all these reuse factor and input size options, i.e. how likely is it that the code would break in different ways for the different cases? I think we will probably need to live with testing only 1 or 2 of these combinations every time, otherwise this would get quite out of hand.

Contributor Author:

Reduced the number of tests to the minimum needed to cover all the cases of the resource implementation. Tests probably won't break unless something changes in the rf validation logic. When I first started the implementation, the sepconv layer had a single rf, which caused problems with dense-resource; now that sepconv is split, it works fine.

There was also an idea to use only rfs that trigger the 2nd case (rf % n_chan == 0), instead of the 3rd one, for better QoR.

# ('Vitis', 'io_stream', 'latency'),
('Vivado', 'io_stream', 'resource'),
('Vitis', 'io_stream', 'resource'),
# ('Catapult', 'io_stream', 'latency'),
Contributor:

Same issue here. The others should be uncommented.

@jmitrevs (Contributor) commented:
Updated to the latest main branch since I think it fixes the issue that caused the test failure.

@jmitrevs jmitrevs added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Oct 16, 2024
@steltze steltze changed the title Depthwise 1D and 2D Resource strategy for io_stream Draft: Depthwise 1D and 2D Resource strategy for io_stream Oct 17, 2024
@steltze steltze changed the title Draft: Depthwise 1D and 2D Resource strategy for io_stream Depthwise 1D and 2D Resource strategy for io_stream Oct 17, 2024
@steltze (Contributor, Author) commented Oct 17, 2024

Pipelining over the kernel size might produce better QoR. Testing this hypothesis is pending.

@steltze steltze marked this pull request as draft October 17, 2024 20:39
@nghielme (Contributor) commented:

What is blocking this PR from being merged?

@vloncar (Contributor) commented Nov 21, 2024

@nghielme It is still marked as a draft, so no one took the time to thoroughly review it 😉. Also, it will come after 1.0.0

@xtreme8000 commented:

With this patch my Vivado HLS 2019.2 fails with the following output, regardless of resource or latency mode:

ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_RESHAPE variable=&weights type=block factor=block_factor' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_RESHAPE variable=&data type=block factor=block_factor' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.
ERROR: [HLS 200-70] '#pragma HLS ARRAY_PARTITION variable=&acc type=complete' is not a valid pragma.

The issue can be solved by removing type= from these pragmas as done elsewhere in the code base.
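The fix described here would look roughly as follows: the `type=` keyword is dropped so the pragmas use the positional form accepted by older toolchains. This is a sketch based on the variable names in the error log, not the exact file contents:

```cpp
// Vivado HLS 2019.2 rejects the type= keyword form of these pragmas;
// the positional form below is accepted by both older and newer tools.
#pragma HLS ARRAY_PARTITION variable=acc complete
#pragma HLS ARRAY_RESHAPE variable=weights block factor=block_factor
#pragma HLS ARRAY_RESHAPE variable=data block factor=block_factor
```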

@steltze steltze marked this pull request as ready for review January 9, 2025 15:07
@steltze (Contributor, Author) commented Jan 9, 2025

Thanks @xtreme8000! Fixed.

@JanFSchulte JanFSchulte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jan 10, 2025
@vloncar (Contributor) left a comment:

Looks very good so far; needs minor changes.

@@ -86,7 +86,7 @@ void separable_conv_1d_cl(hls::stream<data_T> &data, hls::stream<res_T> &res,
#pragma HLS DATAFLOW

hls::stream<dw_res_T> depthwise_res;
unsigned res_depth = CONFIG_T::depthwise_config::out_width;
const unsigned res_depth = CONFIG_T::depthwise_config::out_width;
Contributor:

Does using const here bring any benefit? And since this is a newer compiler, why not constexpr?
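A minimal sketch of the constexpr form being suggested, with a stand-in Config type in place of the real CONFIG_T (the names here are illustrative, not the hls4ml templates):

```cpp
// Stand-ins for the real CONFIG_T::depthwise_config used in the PR.
struct DepthwiseConfig { static const unsigned out_width = 8; };
struct Config { using depthwise_config = DepthwiseConfig; };

// constexpr states the compile-time requirement explicitly: if the
// initializer were not a constant expression, compilation would fail
// here rather than silently producing a runtime value.
template <class CONFIG_T> unsigned stream_depth() {
    constexpr unsigned res_depth = CONFIG_T::depthwise_config::out_width;
    static_assert(res_depth > 0, "stream depth must be positive");
    return res_depth;
}
```

With plain const the value may still be compile-time known, but constexpr guarantees it and makes it usable in static_assert and array bounds.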

const int multiplier_limit = DIV_ROUNDUP(nin, multfactor);
const int block_factor = DIV_ROUNDUP(nin, rufactor);

assert((multiplier_limit == block_factor) && "This function is correct only for RF <= N_IN");
Contributor:

It would be great if this message made it clearer what N_IN is. Same for N_OUT in the others. We should update all functions to make this clearer, not just the resource ones.
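A hypothetical version of the assertion with a clearer message, along the lines of what is being requested; the N_IN expansion in the string and the numeric values in the test are illustrative assumptions, not taken from the PR:

```cpp
#include <algorithm>
#include <cassert>

// DIV_ROUNDUP as commonly defined in nnet_utils: ceiling division.
constexpr int div_roundup(int a, int b) { return (a + b - 1) / b; }

// Sketch of the check with a message that spells out what N_IN means.
bool check_rf(int rufactor, int nin) {
    const int multfactor = std::min(nin, rufactor);
    const int multiplier_limit = div_roundup(nin, multfactor);
    const int block_factor = div_roundup(nin, rufactor);
    assert((multiplier_limit == block_factor) &&
           "This function is correct only for RF <= N_IN "
           "(N_IN = kernel_size * n_chan, the flattened input size)");
    return multiplier_limit == block_factor;
}
```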

Contributor Author:

Should I change the non-resource functions in this PR or open a separate one?

assert((rufactor > nout) && "This function is correct only for RF > N_IN");

#pragma HLS function_instantiate variable=weights,biases
//#pragma HLS RESOURCE variable=weights core=RAM_2P_BRAM // Commenting out the designation; HLS seems to choose correctly
Contributor:

We should get rid of these comments across the file.

void depthwise_product(data_T data[CONFIG_T::kernel_size * CONFIG_T::n_chan], res_T res[CONFIG_T::n_chan],
typename CONFIG_T::weight_t weights[CONFIG_T::kernel_size * CONFIG_T::n_chan],
typename CONFIG_T::bias_t biases[CONFIG_T::n_chan]) {
void depthwise_product_resource_rf_lt_nchan(data_T data[CONFIG_T::kernel_size * CONFIG_T::n_chan],
Contributor:

For the sake of making this file clearer, I propose we move all depthwise product functions into a separate file, nnet_depthwise_product.h. Don't forget that you'll then have to add this file to the sepconv include list in convolution_templates.py.

@steltze (Contributor, Author) commented Feb 10, 2025:

Done. I think changing convolution_templates.py is not needed? Tests seem to work. Is there another reason I am missing?

@@ -13,6 +13,8 @@
strides_options = [(1), (2)]
kernel_options = [(2), (3)]
bias_options = [False]
rf_options = [1, 16, 24] # each rf corresponds to one of the three cases of depthwise resource for io_stream
input_size_options = [16]
Contributor:

Why 16? This only makes the test run longer. Can we choose a smaller number?

Contributor Author:

Reduced to 3 and 4 respectively in order to make sure all 3 cases of the implementation are triggered
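The case coverage being discussed can be sketched as follows. This is a hypothetical illustration, not the actual test file: n_chan = 3 is an assumed value, and the classifier function is introduced here purely to show why three reuse factors suffice to hit all three implementations:

```python
# Illustrative n_chan; the PR reduces the input sizes to small values (3 and 4)
# so that a handful of reuse factors covers every depthwise resource case.
n_chan = 3
rf_options = [1, 3, 7]  # rf < n_chan; rf % n_chan == 0; rf > n_chan with remainder

def resource_case(rf, n_chan):
    """Classify which of the three depthwise resource implementations fires."""
    if rf < n_chan:
        return "rf_lt_nchan"
    if rf % n_chan == 0:
        return "rf_gte_nchan_rem0"
    return "rf_gt_nchan"

# Each reuse factor triggers a distinct implementation, so three tests suffice.
cases = {resource_case(rf, n_chan) for rf in rf_options}
assert cases == {"rf_lt_nchan", "rf_gte_nchan_rem0", "rf_gt_nchan"}
```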

@@ -8,12 +8,13 @@

test_root_path = Path(__file__).parent

padds_options = ['same', 'valid']
padds_options = ['same']
Contributor:

Why remove valid?

@steltze steltze requested a review from vloncar February 10, 2025 16:18
@vloncar (Contributor) commented Feb 24, 2025

The only thing I see lacking here is the use of function pointers for dense instead of the current if (strategy == latency) do_latency_dense() else do_resource_dense() approach. We moved the rest of the codebase toward that model, as it allows the dense function to be generated and implemented in special ways (HGQ and resource_unrolled use this). I've made the changes needed and put them in vloncar/stelios_depthwise_resource.

After extensive testing of both approaches, I've made a few observations:

  • In some rare instances the function pointer adds a clock cycle to the main loop of the depthwise function. I didn't manage to figure out why; the schedule viewer shows both approaches resulting in identical sequences of operations, but with the function pointer one having an operation after the depthwise_product function scheduled in the next cycle for no apparent reason. I'm thinking of following this up with AMD.
  • Sometimes the 3rd case (rf > n_chan and rf % n_chan != 0) fails to meet timing. This happens regardless of the approach used. We could consider enforcing an RF that avoids it.
  • Depending on the size of the layer/input, the pragma HLS INLINE recursive just prior to calling depthwise_resource may have a detrimental effect (ranging from added cycles to missed timing). Again, regardless of the approach. FWIW, this also affects regular Conv1D/2D.

Since the issues are specific to particular use cases, I'm thinking of merging the function pointer changes into this and then merging it to main (there's also a trivial change to the tests to avoid creating a directory within a directory). If the issue appears, it is trivial to manually tweak around it to avoid the use of the function pointer and/or tweak the inline pragma. The 3rd case can also be avoided by the user if they observe that issue. If no one objects, I'll do this tomorrow.
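The function-pointer dispatch pattern described above can be sketched as follows. The names, signatures, and stand-in computations are illustrative assumptions, not the actual hls4ml templates; the point is only the shape of the dispatch:

```cpp
// Two implementations with identical signatures; in hls4ml these would be
// the latency and resource dense kernels (stand-in math here).
template <class data_T, class res_T, int N>
void dense_latency(const data_T in[N], res_T out[N]) {
    for (int i = 0; i < N; i++) out[i] = in[i] * 2;
}

template <class data_T, class res_T, int N>
void dense_resource(const data_T in[N], res_T out[N]) {
    for (int i = 0; i < N; i++) out[i] = in[i] * 2; // same math, different schedule
}

// The config selects the implementation once, at compile time, instead of an
// if (strategy == latency) branch; generated/special dense functions (HGQ,
// resource_unrolled) can be plugged in the same way.
struct ConfigLatency {
    static constexpr auto dense = dense_latency<int, int, 4>;
};

template <class CONFIG_T>
void run_dense(const int in[4], int out[4]) {
    CONFIG_T::dense(in, out); // dispatch through the stored function pointer
}
```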

@vloncar vloncar added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Feb 26, 2025
@vloncar (Contributor) commented Feb 26, 2025

This is now ready. Let's just wait to see if I messed up the tests before we merge.

@vloncar vloncar added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Feb 26, 2025
@vloncar vloncar dismissed their stale review February 26, 2025 20:37

Code updated

@vloncar vloncar merged commit 898d53d into fastmachinelearning:main Feb 26, 2025
8 of 10 checks passed
Labels
please test Trigger testing by creating local PR branch
6 participants