-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AC Int Tutorial #1
base: master
Are you sure you want to change the base?
Conversation
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/CMakeLists.txt
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/CMakeLists.txt
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/sample.json
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.cpp
Show resolved
Hide resolved
class SetBitSlice; | ||
|
||
using ac_int4 = ac_intN::int4; | ||
using ac_int14 = ac_intN::int14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These typedefs seem unnecessary, since we already have them in the ac_int header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will let Paul weigh in on this one as well. I added these because they simplify the functions below and because I have seen non oneAPI code samples do it. I don't have a strong preference either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @paul-white-intel please review this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Ajay. I would use typedef
s if you wanted to wrap a particular parameterization of ac_int
, e.g. ac_int<14, false>
, but if you're using the convenience type, then it's kind of redundant. What's the point of the convenience type then? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, the fewer characters looked better. I didn't want to see ac_intN:: everywhere. If it doesn't add any value in your opinion, I will remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then when you typedef
, I think you should use the ac_int<x, b>
parameterization.
accessor x{inp1, h, read_only}; | ||
accessor y{inp2, h, read_only}; | ||
accessor res{result, h, write_only}; | ||
h.single_task<Mult>([=] { res[0] = x[0] * y[0]; }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we instead use one function to demonstrate all of these? I’m worried we’re adding a lot of boilerplate to show something relatively simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps one function per class of operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tanner pointed this out as well. I want to know what Paul thinks about this. One function can be used for the add, sub, mult operations - it could even be just one kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @paul-white-intel please comment on this one as well, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make one function for everything using the right C++ fun. I can chat test something out in Godbolt and send it to Abhishek.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having them separate from an IPA point of view; it makes the reports trivial to understand.
An alternative would be to put each of the functions in a separate CPP file, like I did for each of the parts here: https://github.com/intel-sandbox/whitepau.simple-oneapi
I’m worried we’re adding a lot of boilerplate to show something relatively simple.
this is the nature of SYCL. lots of boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say no to extra files, since it doesn't actually fix the problem, but rather distribute it to different files.
this is the nature of SYCL. lots of boilerplate.
Untrue. Poorly written SYCL has lots of boilerplate. SYCL has more boilerplate than HLS, sure, but that is because it does more than an HLS component. C++ can have lots of boiler plate too, if it is written poorly. We shouldn't avoid using common C++ features because we are worried about our users not understanding C++. If we think there is a gap, we should explain what is happening instead of avoiding a method to improve code reuse.
// create the SYCL device queue | ||
queue q(selector, dpc_common::exception_handler); | ||
|
||
// Initialize two random ints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a fixed srand() value so that we’re not truly random so it makes it easier to reproduce results here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not use rand()
at all... I would want to explicitly show a negative value and a positive value.
Also, why not use ac_int
types for t1
and t2
instead of native int
types? the truncation should happen automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use ac_int types for t1 and t2 instead of native int types?
I think it is to emphasize that the int vs ac_int comparison
How about I replace it with :
int t1 = 624
int t2 = -254
{
int golden = t1 + t2;
ac_int14 a = t1;
ac_int14 b = t2;
ac_int15 c;
TestAdd(q, a, b, c);
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be written as
ac_int14 a = 624;
ac_int14 b = -254;
{
int golden = a + b;
ac_int15 c;
TestAdd(q, a, b, c);
...
}
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/src/ac_int.hpp
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
make fpga | ||
``` | ||
|
||
3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded <a href="https://iotdk.intel.com/fpga-precompiled-binaries/latest/ac_int.fpga.tar.gz" download>here</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we offer precompiled binaries for all the code samples? I don't recall seeing this before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
```cpp | ||
#include <sycl/ext/intel/ac_types/ac_int.hpp> | ||
``` | ||
Additionally, you must use the flag `-qactypes` in order to ensure that the headers are correctly included. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cue the 'You didn't have to do that in HLS' complaints :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't have to for SYCL before either.
I think the change that resulted in this would affect HLS too, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, HLS branches are separate hence HLS is not affected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I am saying, if HLS wasn't being deprecated, it would have been updated too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tyoungsc that... doesn't make it better.
do you at least get a clear actionable error message if you try to use an ac-type without including -qactypes
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd get an include error - it will not be able to find the ac_types header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. does the error literally say, "make sure you compile with -qactypes
"? otherwise, I can see this causing problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately no. But that is clearly highlighted on the first page of the AC types Documentation. Let me see what we can do to generate that error message.
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
- If the user knows that the shift direction and that the shift value is always positive, it is advisable to use an unsigned datatype for the shift value to obtain better QoR. | ||
- Shift values of greater than the width of the datatypes are treated as a shift equal to the width of the datatype. | ||
|
||
Further, the operation can be done more efficiently by specifying the amount to shift with the smallest possible ac_int. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a similar follow-on to your recommendation to use unsigned datatypes for shift operators, so it should be a bullet in the list above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
DirectProgramming/DPC++FPGA/Tutorials/Features/ac_types/ac_int/README.md
Outdated
Show resolved
Hide resolved
## License | ||
|
||
Code samples are licensed under the MIT license. See | ||
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a license for the ac_data_types_ref.pdf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me follow up on this.
std::cout << "\nBitwise Operations:\n"; | ||
// Shift operator | ||
{ | ||
t1 = rand() & ((1 << 13) - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to put the testbench code into the TestXXX()
functions? that would make the main()
function tidier. I believe the main point of separating them in the hls version was to separate the device code from testbench code, but that abstraction doesn't really work in oneAPI anyway, since the TestXXX()
functions already contain host code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I think that makes sense. I'll update it once we reach consensus on whether or not to use templated functions for the TestXXX() functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could make functions that accept accessors and then all of the code in the "single_task" lamda would be a call to a function (passing it accessors). Then your functions would be just device code.
@@ -11,21 +11,21 @@ The [oneAPI Programming Guide](https://software.intel.com/en-us/oneapi-programmi | |||
| OS | Linux* Ubuntu* 18.04/20.04, RHEL*/CentOS* 8, SUSE* 15; Windows* 10 | |||
| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> *__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* | |||
| Software | Intel® oneAPI DPC++ Compiler <br> Intel® FPGA Add-On for oneAPI Base Toolkit | |||
| What you will learn | Using the ac_int data-type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an ac_int number | |||
| What you will learn | Using the `ac_int` data-type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an `ac_int` number | |||
| Time to complete | 20 minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we quantify this 'time to complete'? I think all the code samples are just 20 minutes :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view it as time to go through the README and observe the reports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, it is roughly "time to read to README, not including HW compile time". This is a crude estimate, but gives the reader a decent idea.
Internal review of ac_int tutorial. This is a port of the HLS ac_int/ac_int_basic_ops tutorial.
Testing:
[DONE] Linux Local compile
[TODO] Linux Local compile with CMAKE
[TODO] Linux Regtest
[TODO] Windows Local compile with CMAKE
[TODO] Windows Regtest