Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AC Int Tutorial #1

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

AC Int Tutorial #1

wants to merge 11 commits into from

Conversation

tiwaria1
Copy link
Owner

@tiwaria1 tiwaria1 commented Oct 5, 2021

Internal review of ac_int tutorial. This is a port of the HLS ac_int/ac_int_basic_ops tutorial.

Testing:
[DONE] Linux Local compile
[TODO] Linux Local compile with CMAKE
[TODO] Linux Regtest
[TODO] Windows Local compile with CMAKE
[TODO] Windows Regtest

class SetBitSlice;

using ac_int4 = ac_intN::int4;
using ac_int14 = ac_intN::int14;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These typedefs seem unnecessary, since we already have them in the ac_int header file.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let Paul weigh in on this one as well. I added these because they simplify the functions below and because I have seen non oneAPI code samples do it. I don't have a strong preference either way.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @paul-white-intel please review this part.

Copy link

@whitepau whitepau Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Ajay. I would use typedefs if you wanted to wrap a particular parameterization of ac_int, e.g. ac_int<14, false>, but if you're using the convenience type, then it's kind of redundant. What's the point of the convenience type then? :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the fewer characters looked better. I didn't want to see ac_intN:: everywhere. If it doesn't add any value in your opinion, I will remove them.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then when you typedef, I think you should use the ac_int<x, b> parameterization.

accessor x{inp1, h, read_only};
accessor y{inp2, h, read_only};
accessor res{result, h, write_only};
h.single_task<Mult>([=] { res[0] = x[0] * y[0]; });

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead use one function to demonstrate all of these? I’m worried we’re adding a lot of boilerplate to show something relatively simple.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps one function per class of operation?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tanner pointed this out as well. I want to know what Paul thinks about this. One function can be used for the add, sub, mult operations - it could even be just one kernel.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @paul-white-intel please comment on this one as well, thank you!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make one function for everything using the right C++ fun. I can chat test something out in Godbolt and send it to Abhishek.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like having them separate from an IPA point of view; it makes the reports trivial to understand.

An alternative would be to put each of the functions in a separate CPP file, like I did for each of the parts here: https://github.com/intel-sandbox/whitepau.simple-oneapi

I’m worried we’re adding a lot of boilerplate to show something relatively simple.

this is the nature of SYCL. lots of boilerplate.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say no to extra files, since it doesn't actually fix the problem, but rather distribute it to different files.

this is the nature of SYCL. lots of boilerplate.

Untrue. Poorly written SYCL has lots of boilerplate. SYCL has more boilerplate than HLS, sure, but that is because it does more than an HLS component. C++ can have lots of boiler plate too, if it is written poorly. We shouldn't avoid using common C++ features because we are worried about our users not understanding C++. If we think there is a gap, we should explain what is happening instead of avoiding a method to improve code reuse.

// create the SYCL device queue
queue q(selector, dpc_common::exception_handler);

// Initialize two random ints

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a fixed srand() value so that we’re not truly random so it makes it easier to reproduce results here.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Copy link

@whitepau whitepau Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use rand() at all... I would want to explicitly show a negative value and a positive value.
Also, why not use ac_int types for t1 and t2 instead of native int types? the truncation should happen automatically.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use ac_int types for t1 and t2 instead of native int types?
I think it is to emphasize that the int vs ac_int comparison

How about I replace it with :

int t1 = 624
int t2 = -254
{
  int golden = t1  + t2;
  ac_int14 a = t1;
  ac_int14 b = t2;
  ac_int15 c;
  TestAdd(q, a, b, c);
  ...
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be written as

ac_int14 a = 624;
ac_int14 b = -254;
{
  int golden = a  + b;
  ac_int15 c;
  TestAdd(q, a, b, c);
  ...
}

make fpga
```

3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded <a href="https://iotdk.intel.com/fpga-precompiled-binaries/latest/ac_int.fpga.tar.gz" download>here</a>.
Copy link

@whitepau whitepau Oct 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we offer precompiled binaries for all the code samples? I don't recall seeing this before.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

```cpp
#include <sycl/ext/intel/ac_types/ac_int.hpp>
```
Additionally, you must use the flag `-qactypes` in order to ensure that the headers are correctly included.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cue the 'You didn't have to do that in HLS' complaints :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't have to for SYCL before either.
I think the change that resulted in this would affect HLS too, no?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, HLS branches are separate hence HLS is not affected.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I am saying, if HLS wasn't being deprecated, it would have been updated too.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tyoungsc that... doesn't make it better.

do you at least get a clear actionable error message if you try to use an ac-type without including -qactypes?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd get an include error - it will not be able to find the ac_types header.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. does the error literally say, "make sure you compile with -qactypes"? otherwise, I can see this causing problems.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately no. But that is clearly highlighted on the first page of the AC types Documentation. Let me see what we can do to generate that error message.

- If the user knows that the shift direction and that the shift value is always positive, it is advisable to use an unsigned datatype for the shift value to obtain better QoR.
- Shift values of greater than the width of the datatypes are treated as a shift equal to the width of the datatype.

Further, the operation can be done more efficiently by specifying the amount to shift with the smallest possible ac_int.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a similar follow-on to your recommendation to use unsigned datatypes for shift operators, so it should be a bullet in the list above.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

## License

Code samples are licensed under the MIT license. See
[License.txt](https://github.com/oneapi-src/oneAPI-samples/blob/master/License.txt) for details.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a license for the ac_data_types_ref.pdf?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me follow up on this.

std::cout << "\nBitwise Operations:\n";
// Shift operator
{
t1 = rand() & ((1 << 13) - 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to put the testbench code into the TestXXX() functions? that would make the main() function tidier. I believe the main point of separating them in the hls version was to separate the device code from testbench code, but that abstraction doesn't really work in oneAPI anyway, since the TestXXX() functions already contain host code.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I think that makes sense. I'll update it once we reach consensus on whether or not to use templated functions for the TestXXX() functions.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make functions that accept accessors and then all of the code in the "single_task" lamda would be a call to a function (passing it accessors). Then your functions would be just device code.

@@ -11,21 +11,21 @@ The [oneAPI Programming Guide](https://software.intel.com/en-us/oneapi-programmi
| OS | Linux* Ubuntu* 18.04/20.04, RHEL*/CentOS* 8, SUSE* 15; Windows* 10
| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> *__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*
| Software | Intel® oneAPI DPC++ Compiler <br> Intel® FPGA Add-On for oneAPI Base Toolkit
| What you will learn | Using the ac_int data-type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an ac_int number
| What you will learn | Using the `ac_int` data-type for basic operations <br> Efficiently using the left shift operation <br> Setting and reading certain bits of an `ac_int` number
| Time to complete | 20 minutes
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we quantify this 'time to complete'? I think all the code samples are just 20 minutes :)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I view it as time to go through the README and observe the reports.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, it is roughly "time to read to README, not including HW compile time". This is a crude estimate, but gives the reader a decent idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants