[SYCL] Move Bfloat16 math functions out of experimental #11506
Conversation
Signed-off-by: JackAKirk <[email protected]>
`isnan`, `ceil`, `floor`, `cos`, `sin`, `exp`, `exp2`, `exp10`, `log`, `log2`,
`log10`, `rint`, `sqrt`, `rsqrt` and `trunc` SYCL floating point math functions.
These functions can be used as element-wise operations on matrices, supplementing
the `bfloat16` support in the sycl_ext_oneapi_matrix extension.
This last sentence seems confusing, because this extension does not add `joint_matrix` support to these math functions.
Sure, I can remove that sentence.
@@ -83,9 +83,21 @@
tolerate lower precision. Some implementations may still perform operations
on this type using 32-bit math. For example, they may convert the `bfloat16`
value to `float`, and then perform the operation on the 32-bit `float`.

This extension also adds `bfloat16` support to the `fma`, `fmin`, `fmax`, `fabs`,
`isnan`, `ceil`, `floor`, `cos`, `sin`, `exp`, `exp2`, `exp10`, `log`, `log2`,
`log10`, `rint`, `sqrt`, `rsqrt` and `trunc` SYCL floating point math functions.
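The 32-bit emulation path mentioned above can be sketched in plain C++. This is a hypothetical stand-alone illustration, not the DPC++ implementation: `bfloat16` is modeled as the upper 16 bits of an IEEE-754 `float`, and an emulated operation widens to `float`, computes, and narrows back.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for bfloat16 storage: the upper 16 bits of
// an IEEE-754 binary32 value.
using bf16_bits = std::uint16_t;

// Widen bfloat16 to float by placing the 16 stored bits in the high
// half of a 32-bit pattern.
inline float bf16_to_float(bf16_bits b) {
  std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Narrow float to bfloat16 with round-to-nearest-even on the
// discarded 16 bits; NaNs are truncated to keep them NaN.
inline bf16_bits float_to_bf16(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  if (std::isnan(f)) return static_cast<bf16_bits>(u >> 16);
  std::uint32_t rounding = 0x7FFFu + ((u >> 16) & 1u);
  return static_cast<bf16_bits>((u + rounding) >> 16);
}

// Emulated bfloat16 operation: widen, compute in 32-bit float, narrow.
inline bf16_bits bf16_fma(bf16_bits a, bf16_bits b, bf16_bits c) {
  return float_to_bf16(
      std::fma(bf16_to_float(a), bf16_to_float(b), bf16_to_float(c)));
}
```

Since 2, 3, 1, and 7 are all exactly representable in bfloat16, `bf16_fma` on those inputs yields exactly 7 after narrowing.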
I think we do not need to list each math function here. Instead, let's say something like:

> This extension also adds `bfloat16` support to several of the SYCL math functions, which are listed below in the section "Math Functions".

Duplicating the names here raises the possibility that the list may get out of date in the future.
Sure, sounds good.
-The bfloat16 type is supported on all devices. DPC++ currently supports this
-type natively on Intel Xe HP GPUs and Nvidia GPUs with
+The bfloat16 type and math functions are supported on all devices. DPC++
+currently supports this type natively on Intel Xe HP GPUs and Nvidia GPUs with
 Compute Capability >= SM80. On other devices, and in host code, it is emulated
 in software.
Yikes! How is it that this extension does not have a feature-test macro? There should be a section "Feature test macro" as defined in the template.
Does the implementation define a feature-test macro, and we just forgot to document it?
I'm not sure, but I can add one.
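An application-side check of such a feature-test macro would look like the sketch below. The macro name follows the usual `SYCL_EXT_ONEAPI_*` naming convention for this extension, but treat it as an assumption at the time of this discussion:

```cpp
// Guard bfloat16 math usage behind the extension's feature-test macro
// (name assumed here, following the extension naming convention).
#if defined(SYCL_EXT_ONEAPI_BFLOAT16_MATH_FUNCTIONS)
// Safe to call the bfloat16 overloads of fma, fmin, fmax, isnan, etc.
#else
// Fall back to float math, or report the feature as unavailable.
#endif
```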
template <size_t N>
sycl::marray<bool, N> isnan(sycl::marray<bfloat16, N> x);
} // namespace sycl::ext::oneapi
This synopsis does not match the form we use for the existing `isnan` overloads. That existing form looks like:

template<typename NonScalar>
/*return-type*/ isnan(NonScalar x)

With the following specification:

> Constraints: Available only if all of the following conditions are met: `NonScalar` is `marray`, `vec`, or the `__swizzled_vec__` type; and the element type is `float`, `double`, or `half`.
>
> Returns: If `NonScalar` is `marray`, returns true for each element of `x` only if `x[i]` has a NaN value. If `NonScalar` is `vec` or the `__swizzled_vec__` type, returns -1 for each element of `x` if `x[i]` has a NaN value and returns 0 otherwise.
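The two return conventions quoted above can be illustrated with a plain C++ sketch. Here `std::array` is only a stand-in for `marray`/`vec`, and the function names are illustrative, not the SYCL API:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// marray-style isnan: one bool per element.
template <std::size_t N>
std::array<bool, N> isnan_marray(const std::array<float, N>& x) {
  std::array<bool, N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = std::isnan(x[i]);
  return r;
}

// vec-style isnan: -1 for NaN elements, 0 otherwise, so the result
// can be used directly as a per-element mask.
template <std::size_t N>
std::array<std::int32_t, N> isnan_vec(const std::array<float, N>& x) {
  std::array<std::int32_t, N> r{};
  for (std::size_t i = 0; i < N; ++i) r[i] = std::isnan(x[i]) ? -1 : 0;
  return r;
}
```

The `-1` convention matches how vector relational results are typically consumed as all-bits-set masks.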
I'd like this extension to use the same form, essentially broadening the constraint to allow `NonScalar` to be `marray` with element type `bfloat16`.

I also think we should extend the existing functions in the `sycl` namespace, rather than adding new functions in the `sycl::ext::oneapi` namespace. This is still consistent with the extension naming guidelines because the `bfloat16` type is defined in the extension namespace. Thus, there is no danger of conflict with another extension, and users still know they are using an extension when calling these functions with `bfloat16`.
A complicating factor is that DPC++ is in a state of transition from the old form of these math functions to the new SYCL 2020 conformant form. Currently, DPC++ only defines the functions in the conformant form if the user opts in by predefining a macro. I think the most reasonable path for the bfloat16 versions of these functions is to follow suit. By default, they will have the old non-conformant form, and they will only have the conformant form when the user pre-defines the macro. @steffenlarsen might have some input here, though.
I'm not sure how to handle this in the spec. One option is to add a general disclaimer NOTE saying that DPC++ currently implements the math functions as defined in this extension spec when the application predefines the macro. When the macro is not defined, the math functions accept bfloat16 parameters in a manner that is consistent with the standard types. We can then show an example of a typical function to illustrate the difference.
One other note of interest is that the published SYCL 2020 spec (revision 7) does not document the new form of these functions. The WG changed the specification after revision 7 was published. I plan to publish revision 8 within the next week, and that will show the new forms.
I also have a question about `vec` support. The math functions currently support both `marray` and `vec` whenever a non-scalar is supported. This extension currently only supports `bfloat16` with `marray`. Why are we excluding support for `vec`? I wonder if it would actually be easier to implement if we supported both `marray` and `vec`, since that's how the other types are handled.
This comment applies, obviously, to all the other functions below too.
All makes sense to me. On the `vec` question: I don't think it was deliberately excluded. I didn't add it when creating the first iteration of the bfloat16 math function extension because we did not have a reason to add `vec` when we had `marray` support, at least in the CUDA backend; but I imagined that at some point other backends would likely motivate the addition of `vec` bfloat16 math functions.
Even if they are only added in order to have math functions for scalar, `vec`, and `marray` types that are consistent with the rest of the types, I can see that this could make sense.
I can make such a change to the spec and add a generic implementation of the bfloat16 math functions for the `vec` type in this PR?
Or perhaps I could delay this until later if you think that is better?
I suppose features can still be added to supported extensions so long as they are new additions rather than changes to existing functions/types?
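A generic element-wise implementation of the kind being discussed could look roughly like the sketch below. The toy container types stand in for `sycl::marray`/`sycl::vec` and are purely illustrative:

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Toy stand-ins for sycl::marray and sycl::vec (illustrative only).
template <typename T, std::size_t N> struct toy_marray { std::array<T, N> d; };
template <typename T, int N> struct toy_vec { std::array<T, N> d; };

// Lifting a scalar function element-wise covers both container shapes,
// which is why vec support can be nearly free once the scalar bfloat16
// overload exists: one overload per container, same loop body.
template <typename T, std::size_t N, typename F>
toy_marray<T, N> elementwise(const toy_marray<T, N>& x, F f) {
  toy_marray<T, N> r{};
  for (std::size_t i = 0; i < N; ++i) r.d[i] = f(x.d[i]);
  return r;
}

template <typename T, int N, typename F>
toy_vec<T, N> elementwise(const toy_vec<T, N>& x, F f) {
  toy_vec<T, N> r{};
  for (int i = 0; i < N; ++i) r.d[i] = f(x.d[i]);
  return r;
}
```

With this shape, each bfloat16 math function only needs its scalar overload plus a call into the generic wrapper for each container type.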
After reflection, I think it would make sense to wait until revision 8 is published, and then use the new math function definition patterns for the bfloat16 math functions too.
Since the bfloat16 math functions are new, would it be better to just support them using the new math function definition patterns (forms)? As far as I understand it, the move to the new form doesn't actually change how a user would use the functions in practice, just that the new forms use a mixture of overloading and templating instead of the previous situation that just used templates?
If the above is the case, it would seem overly complex to introduce multiple macros for the bfloat16 math functions, especially if they form part of the extension for the bfloat16 class.
> After reflection, I think it would make sense to wait until revision 8 is published, and then use the new math function definition patterns for the bfloat16 math functions too.
I expect revision 8 to be published soon. Yes, I think it would be good for this extension specification to use the same pattern and specification style that revision 8 has.
> Since the bfloat16 math functions are new, would it be better to just support them using the new math function definition patterns (forms)? As far as I understand it, the move to the new form doesn't actually change how a user would use the functions in practice, just that the new forms use a mixture of overloading and templating instead of the previous situation that just used templates?
Most application code won't be impacted by the move from old to new form, but there are some edge cases that will be visible to applications. I think I may have been premature about the macro. Our plan is to have a macro in the headers, which selects between the old and new forms of the builtin functions, but I think it is not implemented yet.
I agree that in theory the support for bfloat16 could be added with only the new form (not selected by the macro) because we have no legacy code to support with the old form. I was just thinking that it might be hard to implement this way. I assume the current implementation of the math functions has a set of constraints for the `NonScalar` template parameter. Therefore, I was thinking that you would simply broaden these constraints to allow `NonScalar` to be `marray<bfloat16>`. Once we add the macro, wouldn't this mean that the `bfloat16` support was also affected by the macro?
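Broadening a constraint in the way described might look like the following stand-alone sketch. The trait and types are toys, not the actual DPC++ headers:

```cpp
#include <cstdint>
#include <type_traits>

// Toy stand-ins (illustrative; not the real SYCL types).
struct bfloat16 { std::uint16_t bits; };
template <typename T, int N> struct marray {};

// Existing-style constraint: element type must be a standard FP type...
template <typename T>
struct is_math_elem : std::bool_constant<std::is_floating_point_v<T>> {};
// ...broadened with one extra specialization to admit bfloat16.
template <> struct is_math_elem<bfloat16> : std::true_type {};

// A new-form builtin would then constrain its non-scalar argument on
// the trait, so bfloat16 support rides on the same overload set.
template <typename T, int N,
          typename = std::enable_if_t<is_math_elem<T>::value>>
marray<T, N> ceil(marray<T, N> x) {
  return x;  // element-wise ceil body elided in this sketch
}
```

Because `bfloat16` enters through the shared constraint, any macro that switches the overload set between old and new forms would indeed switch the bfloat16 overloads along with it, which is the concern raised above.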
> I assume the current implementation of the math functions has a set of constraints for the `NonScalar` template parameter. Therefore, I was thinking that you would simply broaden these constraints to allow `NonScalar` to be `marray<bfloat16>`. Once we add the macro, wouldn't this mean that the `bfloat16` support was also affected by the macro?
At the moment the bfloat16 math functions support `marray<bfloat16, N>`, where `N` is any positive value of `size_t` type.
So as I understand it this wouldn't affect bfloat16 support.
However, as you pointed out before, there is no `sycl::vec` support for bfloat16. I can only guess at whether Intel etc. will want to introduce this or not. In relation to this question:
- I note that SYCLomatic is currently converting Nvidia's natively supported vec2 bfloat16 type to a `sycl::marray` of size 2: https://github.com/oneapi-src/SYCLomatic/pull/800/files#diff-201c4634f29044b5cca5bcd4a23eddf27ea55f77447c9ecf30e4c38f166a0567R422
- I know that at least Intel CPUs support bfloat16, but it isn't clear to me whether they would prefer `sycl::vec` types for bfloat16.
Although, as far as I can tell, `vec` support could just be an addition that is made later, it is one of a few questions that seem worthwhile trying to address sooner rather than later.
Another issue that is not clear to me is where the bfloat16 functions in the `intel::math` namespace fit in relation to this bfloat16 extension. To be more concrete, here are a few questions I have:
- Will conversion functions that are translated from corresponding SYCL bfloat16 library functions in SYCLomatic ([SYCLomatic] Support migration of 67 bf16 precision conversion APIs. oneapi-src/SYCLomatic#1351) become part of the bfloat16 extension? Note that currently there are no implementations for these functions in the CUDA backend.
- Will e.g. bfloat16 arithmetic functions that do not match `sycl::math` functions, but are presumably important for the target application for which bfloat16 was created (deep learning), be added? I note that there are `intel::math` functions for `half` matching some of these CUDA functions 1:1 (such as `__hfma_relu`, see https://github.com/intel/llvm/blob/sycl/sycl/include/sycl/ext/intel/math/imf_half_trivial.hpp). There are also SYCLomatic mappings for these functions: oneapi-src/SYCLomatic@6b8429a.
- Is the long-term plan that the bfloat16 type and associated functions become a core part of SYCL, or be available as a header-only library?
Although the existing bfloat16 math functions that are detailed in this PR appear to be uncontroversial, maybe it makes sense to get clearer answers to the above before moving any bfloat16 math functions out of experimental.
I can only answer this question:
> Is the long-term plan that the bfloat16 type and associated functions become a core part of SYCL, or be available as a header-only library?
Yes, I think we should consider eventually adding bfloat16 as a core part of SYCL. We don't want it to be a header-only library because it needs some underlying h/w support for efficiency.
We need someone to be the "owner" of the bfloat16 feature, who can decide what functions make sense to add to this extension.
LGTM from the RT source code side. Will approve after Greg's comments are addressed.
Removed overly verbose list of math functions supported in intro. Signed-off-by: JackAKirk <[email protected]>
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days.
This pull request was closed because it has been stalled for 30 days with no activity.
Bfloat16 math functions are generally supported via a software implementation, as described in the bfloat16 extension document.
These APIs are straightforward and should be future proof; as such, it makes sense to move them out of experimental ASAP.
This PR moves the bfloat16 math functions back into the bfloat16 extension, and consequently out of the experimental namespace.
This makes it more straightforward to allow dpct to access bfloat16 optimizations available on Intel and Nvidia hardware: oneapi-src/SYCLomatic#1341