Transformer in FINN: Scaled Dot-Product Attention #13

Merged: 90 commits, merged Feb 6, 2025

Commits (90)
9e7a475
Start sketching out the scaled dot-product attention custom op
iksnagreb Jul 9, 2023
7f97332
[Attention] Add __init__ method to custom op
iksnagreb Jul 10, 2023
e77ad2b
[Attention] Add datatype and shape queries to custom op
iksnagreb Jul 10, 2023
c95b397
[Attention] Add stream/bit-width queries to custom op
iksnagreb Jul 10, 2023
4a0e98e
[Attention] Add refactored node attributes matching HLS op template
iksnagreb Aug 4, 2023
c3ea73e
[Attention] Adapt the custom op to the new folding concept
iksnagreb Aug 7, 2023
602f1ca
[Attention] Fix get_ap_int_max_w output and mask stream width
iksnagreb Aug 7, 2023
ad17b1b
[Attention] Start filling some of the HLSCustomOp abstract methods
iksnagreb Aug 7, 2023
0de1bce
[Attention] Fill out includes and defines for C++ code generation
iksnagreb Aug 8, 2023
de9dc73
[Attention] Add IP generation C++ source generation step to test
iksnagreb Aug 8, 2023
f21a47c
[Attention] Add some interface pragmas for C++ code generation
iksnagreb Aug 8, 2023
8e94cfe
[Attention] Add stream declarations for C++ simulation code generation
iksnagreb Aug 8, 2023
03ddfb2
[Attention] Add attention function body to C++ code generation
iksnagreb Aug 8, 2023
295ab25
[Attention] Add C++ simulation code feeding the input streams from files
iksnagreb Aug 8, 2023
b6a26e1
[Attention] Add C++ simulation code saving the output stream to file
iksnagreb Aug 9, 2023
5d800e7
[Attention] Add missing "" to generated C++ strings
iksnagreb Aug 9, 2023
906a8c5
[Attention] Add missing bit width cases to get_ap_int_max_w
iksnagreb Aug 9, 2023
acaa9b2
Some clean up and "# noqa" to calm the IDE
iksnagreb Aug 9, 2023
b41575d
[Attention] Get C++ simulation to compile and prepare inputs
iksnagreb Aug 10, 2023
a718bf6
[Attention] Move dummy model wrapper construction out of custom op
iksnagreb Aug 10, 2023
189a415
[Attention] Refactor the cppsim unit test using thresholds in python sim
iksnagreb Aug 14, 2023
b00c64a
[Attention] Switch to the HLS function-call operator style
iksnagreb Aug 15, 2023
094f920
[Attention] Refactor towards thresholds HLS code generation
iksnagreb Aug 16, 2023
5d2836a
[Attention] Generate HLS code for all three activation thresholds
iksnagreb Aug 17, 2023
76e5e0e
[Attention] Initialize the attention operator using generated thresholds
iksnagreb Aug 17, 2023
b152f23
[Attention] Numpy softmax matching overflow behavior of the HLS operator
iksnagreb Aug 17, 2023
8bd5a20
[Attention] Satisfy attention output type constraint
iksnagreb Aug 21, 2023
65de26d
[Attention] Increase test bitwidth to see some more interesting behavior
iksnagreb Aug 21, 2023
ce1e19b
[Attention] Remove python mode node execution
iksnagreb Aug 21, 2023
7500606
[Attention] Fix cppsim test accumulator bitwidth
iksnagreb Aug 22, 2023
096364c
[Attention] Increase test bitwidth to see some more interesting behavior
iksnagreb Aug 22, 2023
e0f3fcd
[Attention] Add RTL simulation unit test
iksnagreb Aug 23, 2023
969da0a
[Attention] Sketch multi-head splitting and merging custom ops
iksnagreb Nov 24, 2023
6abb537
[Attention] Add cppsim and python tests for head splitting and merging
iksnagreb Nov 28, 2023
eb07a2b
[Attention] Add RTL node execution to head splitting and merging
iksnagreb Nov 29, 2023
81f0e0c
[Attention] Simplify shape inference and extend to rank-3 tensors
iksnagreb Nov 30, 2023
2df11c2
[Attention] Fix code generation, interface names and FIFO depths
iksnagreb Dec 1, 2023
4c60ce4
[Attention] Prevent absorbing thresholds into MVAU after forking matmul
iksnagreb Dec 1, 2023
b94aa7a
[Attention] Add interface name generator to head splitting and merging
iksnagreb Dec 1, 2023
f8af260
[Attention] Add default FIFO depths to head splitting and merging
iksnagreb Dec 4, 2023
0c772a4
[Attention] Refactor node execution to have a separate function per mode
iksnagreb Dec 4, 2023
fe48336
[Attention] Implement python mode node execution
iksnagreb Dec 5, 2023
f241b07
[Attention] Add missing out_bias of thresholds absorbed into attention
iksnagreb Dec 6, 2023
fbfa27f
[Attention] Fix python softmax sum axis
iksnagreb Dec 7, 2023
8155603
[Attention] Introduce ReplicateStream custom operation
iksnagreb Dec 8, 2023
8ecebb6
[Attention] Update dtype attribute of multi-heads and stream replication
iksnagreb Dec 12, 2023
1e932da
[Attention] Set default node execution mode to python
iksnagreb Dec 13, 2023
db412af
[Attention] Introduce 'const' attention mask mode
iksnagreb Dec 15, 2023
da6f663
[Attention] Add python exec of all and HLS exec of causal mask modes
iksnagreb Jan 9, 2024
d64bb8c
[Attention] Add "const" mask mode to input names
iksnagreb Jan 10, 2024
72f4792
[Attention] Add HLS code generation of constant attention masks
iksnagreb Jan 10, 2024
f8c0477
[Attention] Rework minimize_accumulator_width similar to MVAU
iksnagreb Jan 28, 2024
5311b9c
[Attention] Set HLS_CONSTEXPR_ENABLE in code generation templates
iksnagreb Jan 28, 2024
0a95208
[Attention] Improve stability of softmax by subtracting row-wise maximum
iksnagreb Jan 28, 2024
f84e5f4
[Attention] Fix threshold tensor code generation
iksnagreb Feb 7, 2024
c4a456e
Correct test function name fixing copy and paste error
iksnagreb Feb 8, 2024
dba8986
[Refactor] Patch all attention ops to work with new class hierarchy
iksnagreb Apr 4, 2024
fa13f46
[Refactor] Disentangle HWCustomOp and HLSBackend parts of attention ops
iksnagreb Apr 4, 2024
5f1ee21
[Refactor] Fix ScaledDotProductAttention C++ simulation test
iksnagreb Apr 4, 2024
d139c17
[Attention] Fix broadcasting of maximum subtraction in numpy softmax
iksnagreb Apr 4, 2024
f77ba1d
[Refactor] Fix Split/MergeMultiHeads and ReplicateStream C++/RTL tests
iksnagreb Apr 8, 2024
22891f5
[ReplicateStream] Introduce configurable PE parallelism for streams
iksnagreb Apr 8, 2024
117a1b3
[Streamline] Add CollapseRepeatedTranspose transformation
iksnagreb Apr 25, 2024
649087a
[Streamline] Add MoveTransposePastEltwise transformation
iksnagreb Apr 25, 2024
2b29bfc
[Streamline] Add RemoveIdentityReshape and RemoveIdentityTranspose
iksnagreb Apr 25, 2024
28f84a5
[Attention] Add Squeeze "cleanup" transformation
iksnagreb Apr 25, 2024
69dc954
[Attention] Add graph transformations for handling attention operators
iksnagreb Apr 25, 2024
6ae63f9
Add missing ModelWrapper import to reorder.py
iksnagreb Apr 25, 2024
15246c8
[Streamline] Fix eager access to potentially empty successors list
iksnagreb Apr 26, 2024
5eda0f6
[Attention] Implement get_exp_cycles for attention-related HWCustomOps
iksnagreb May 3, 2024
0b00f69
Add support for ReplicateStream_hls as a PE-operation to SetFolding
iksnagreb May 3, 2024
7fba682
[Attention] Add method to get the number of folded inputs
iksnagreb May 10, 2024
6207057
[Attention] Make use of resource type attributes for buffers and MACs
iksnagreb May 16, 2024
174c098
[Attention] Make use of resource type attributes for embedded thresholds
iksnagreb May 16, 2024
4f7072b
[Attention] Add resource attribute for the attention mask in const mode
iksnagreb May 16, 2024
2b9d94b
[Attention] Refactor RAM_STYLES dictionary
iksnagreb May 17, 2024
b5bd0ff
[Attention] Redirect RTL simulation of attention to Python execution
iksnagreb May 21, 2024
aa742c7
[Attention] Add missing constant mask mode to input shape query
iksnagreb Jun 8, 2024
2bf164a
[Attention] Fix Resource::URAM typo
iksnagreb Jun 14, 2024
95f29b0
[Attention] Add data layout checks to InferMultiHeads transformation
iksnagreb Aug 5, 2024
ca6cc33
Fix SplitMultiHeads shape inference if shape is None
iksnagreb Aug 7, 2024
9f90cce
[Streamline] Allow RemoveIdentityReshape for fork-nodes
iksnagreb Aug 7, 2024
a07815c
[Attention] Rework Squeeze to explicitly insert Squeeze operations
iksnagreb Aug 7, 2024
5548b49
[Streamline] Prevent MoveTransposePastEltwise from transposing scalars
iksnagreb Aug 28, 2024
a98f594
Merge remote-tracking branch 'xilinx/dev' into feature/attention
iksnagreb Jan 20, 2025
15963e0
[Deps] Add attention-hlslib dependency to fetch-repos.sh
iksnagreb Jan 20, 2025
6d56c61
Make Squeeze interact properly with Im2Col, Split and initializers
iksnagreb Jan 21, 2025
50544ef
[Streamline] Fix MoveTransposePastEltwise permutation
iksnagreb Jan 21, 2025
6cee1ec
[Deps] Update attention-hlslib dependency
iksnagreb Jan 28, 2025
1fa862c
Merge remote-tracking branch 'eki-project/dev' into feature/attention
iksnagreb Feb 6, 2025
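
The operator this PR introduces computes scaled dot-product attention, softmax(Q K^T / sqrt(d)) V, and several commits above (b152f23, 0a95208, d139c17, fbfa27f) concern its numpy reference behaviour, in particular subtracting the row-wise maximum so the softmax stays numerically stable. The sketch below only illustrates that float reference shape of the computation; the names, the 1/sqrt(d) scaling, and the test values are assumptions for the example, and it is not the quantized, thresholded computation the HLS operator and its python-mode execution actually perform.

```python
# Hedged sketch (not the PR's implementation): float scaled dot-product
# attention with the numerically stable softmax that the commits above
# refer to. All names and the 1/sqrt(d) scaling are assumptions.
import numpy as np


def stable_softmax(x, axis=-1):
    # Subtract the row-wise maximum before exponentiating; keepdims=True
    # makes the maximum broadcast back over the reduced axis.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v):
    # q: (seq_q, d), k: (seq_k, d), v: (seq_k, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return stable_softmax(scores, axis=-1) @ v


# Tiny usage example with made-up shapes.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)  # shape (4, 8)
```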
4 changes: 4 additions & 0 deletions fetch-repos.sh
@@ -39,6 +39,7 @@ XIL_BDF_COMMIT="8cf4bb674a919ac34e3d99d8d71a9e60af93d14e"
RFSOC4x2_BDF_COMMIT="13fb6f6c02c7dfd7e4b336b18b959ad5115db696"
KV260_BDF_COMMIT="98e0d3efc901f0b974006bc4370c2a7ad8856c79"
EXP_BOARD_FILES_MD5="226ca927a16ea4ce579f1332675e9e9a"
ATTENTION_HLSLIB_COMMIT="afc9720f10e551e1f734e137b21bb6d0a8342177"

QONNX_URL="https://github.com/iksnagreb/qonnx.git"
FINN_EXP_URL="https://github.com/Xilinx/finn-experimental.git"
@@ -51,6 +52,7 @@ AVNET_BDF_URL="https://github.com/Avnet/bdf.git"
XIL_BDF_URL="https://github.com/Xilinx/XilinxBoardStore.git"
RFSOC4x2_BDF_URL="https://github.com/RealDigitalOrg/RFSoC4x2-BSP.git"
KV260_BDF_URL="https://github.com/Xilinx/XilinxBoardStore.git"
ATTENTION_HLSLIB_URL="https://github.com/iksnagreb/attention-hlslib.git"

QONNX_DIR="qonnx"
FINN_EXP_DIR="finn-experimental"
@@ -63,6 +65,7 @@ AVNET_BDF_DIR="avnet-bdf"
XIL_BDF_DIR="xil-bdf"
RFSOC4x2_BDF_DIR="rfsoc4x2-bdf"
KV260_SOM_BDF_DIR="kv260-som-bdf"
ATTENTION_HLSLIB_DIR="attention-hlslib"

# absolute path to this script, e.g. /home/user/bin/foo.sh
SCRIPT=$(readlink -f "$0")
@@ -126,6 +129,7 @@ fetch_repo $AVNET_BDF_URL $AVNET_BDF_COMMIT $AVNET_BDF_DIR
fetch_repo $XIL_BDF_URL $XIL_BDF_COMMIT $XIL_BDF_DIR
fetch_repo $RFSOC4x2_BDF_URL $RFSOC4x2_BDF_COMMIT $RFSOC4x2_BDF_DIR
fetch_repo $KV260_BDF_URL $KV260_BDF_COMMIT $KV260_SOM_BDF_DIR
fetch_repo $ATTENTION_HLSLIB_URL $ATTENTION_HLSLIB_COMMIT $ATTENTION_HLSLIB_DIR

# Can skip downloading of board files entirely if desired
if [ "$FINN_SKIP_BOARD_FILES" = "1" ]; then
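The four added lines follow the pattern already used for the other pinned dependencies in fetch-repos.sh: a commit hash, a repository URL, and a checkout directory for attention-hlslib, plus a fetch_repo call with those three values. The fetch_repo helper itself is defined earlier in the script and is not part of this diff; judging by how it is called here, it presumably fetches the repository into the given directory and pins it to the given commit.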
8 changes: 7 additions & 1 deletion src/finn/custom_op/fpgadataflow/__init__.py
@@ -61,8 +61,9 @@ def register_custom_op(cls):

# Import the submodule containing the Unsqueeze operation
import finn.custom_op.fpgadataflow.unsqueeze

from finn.custom_op.fpgadataflow.addstreams import AddStreams
from finn.custom_op.fpgadataflow.attention import ScaledDotProductAttention
from finn.custom_op.fpgadataflow.attention_heads import MergeMultiHeads, SplitMultiHeads
from finn.custom_op.fpgadataflow.channelwise_op import ChannelwiseOp
from finn.custom_op.fpgadataflow.concat import StreamingConcat
from finn.custom_op.fpgadataflow.convolutioninputgenerator import (
@@ -77,6 +78,7 @@ def register_custom_op(cls):
from finn.custom_op.fpgadataflow.lookup import Lookup
from finn.custom_op.fpgadataflow.matrixvectoractivation import MVAU
from finn.custom_op.fpgadataflow.pool import Pool
from finn.custom_op.fpgadataflow.replicate_stream import ReplicateStream
from finn.custom_op.fpgadataflow.split import StreamingSplit
from finn.custom_op.fpgadataflow.streamingdataflowpartition import (
StreamingDataflowPartition,
@@ -116,3 +118,7 @@ def register_custom_op(cls):
custom_op["StreamingEltwise"] = StreamingEltwise
custom_op["StreamingMaxPool"] = StreamingMaxPool
custom_op["UpsampleNearestNeighbour"] = UpsampleNearestNeighbour
custom_op["ScaledDotProductAttention"] = ScaledDotProductAttention
custom_op["SplitMultiHeads"] = SplitMultiHeads
custom_op["MergeMultiHeads"] = MergeMultiHeads
custom_op["ReplicateStream"] = ReplicateStream