Add x, y, and z shorthands to id and range classes #679

Open: 3 commits to be merged into main

Pennycook
Contributor

The numbering of dimensions in SYCL is aligned with the numbering of dimensions in ISO C++, such that the highest-numbered dimension is the fastest-moving.

This numbering is inconvenient when working with generic functions compatible with one-, two-, or three-dimensional ranges, since developers must account somehow for differences in the numbering of the dimensions. One common solution is to define helper functions called x(), y() and z() to encapsulate this logic. This solution can also assist with the migration of code from other languages (e.g., OpenCL and CUDA), and may provide a simpler mental model for SYCL developers working with images or other forms of graphics interop.

This commit adds x(), y() and z() functions directly to the id and range classes, providing a consistent way for SYCL developers to use this indexing pattern, rather than relying on each SYCL code base to define and maintain compatible index abstractions.
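
As a rough illustration of the proposed members, here is a minimal sketch in plain C++. `Components`, `Id` and `Range` are hypothetical stand-ins, not the real `sycl::id` and `sycl::range`, and the defaults for missing components (0 for an id, 1 for a range) are the ones settled on later in this conversation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sycl::id and sycl::range, NOT the real classes.
// Fill is the value reported for a component the object does not have.
template <int D, std::size_t Fill>
struct Components {
  static_assert(D >= 1 && D <= 3, "SYCL limits Dimensions to 1, 2 or 3");
  std::array<std::size_t, D> c;

  // Existing SYCL-style access by dimension number.
  std::size_t get(int dim) const {
    assert(dim >= 0 && dim < D);
    return c[static_cast<std::size_t>(dim)];
  }

  // x is the fastest-moving component, i.e. the highest-numbered dimension.
  std::size_t x() const { return c[D - 1]; }
  std::size_t y() const {
    if constexpr (D >= 2) return c[D - 2]; else return Fill;
  }
  std::size_t z() const {
    if constexpr (D >= 3) return c[D - 3]; else return Fill;
  }
};

template <int D> using Id = Components<D, 0>;     // missing components read as 0
template <int D> using Range = Components<D, 1>;  // missing components read as 1
```

With these stand-ins, `Range<2>{{4, 8}}.x()` is 8 regardless of whether the range has one, two or three dimensions' worth of neighbours, which is the property generic code wants.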


I've proposed this as a SYCL-Next feature because, like #633, this is merely introducing a shorthand for something that is already supported by all SYCL implementations. Going through the full KHR process for this feature would require either function names like khr_x() or clones of range and id in the khr:: namespace, which both feel like bad solutions to the problem.

I'd also like to point out that use of these shorthands could be combined with the improved group interface being discussed in #638, allowing developers to write things like item.id().x() to access the fastest-moving component of an item's id.

Finally, since there may be some confusion and I want to be very clear: I am not suggesting that we revisit the linearization equation for multi-dimensional SYCL quantities, and this is in no way a breaking change. x(), y() and z() are defined in terms of the existing dimension numbering and are thus fully backwards-compatible.

Contributor

@psalz psalz left a comment

Shouldn't id default to returning 0 instead of 1? Otherwise the linearization equation for some index idx inside a 1D kernel of range rg doesn't work:

linear_idx = idx.z() * rg.y() * rg.x() + idx.y() * rg.x() + idx.x()
           = 1       * 1      * rg.x() + 1       * rg.x() + idx.x()
           = 2 * rg.x() + idx.x()

instead of idx.x().
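
The arithmetic can be checked with a small sketch in plain C++. The function names and the sample values (an index of 5 in a range of 16) are made up for illustration; only the equation itself comes from the comment above.

```cpp
#include <cassert>
#include <cstddef>

// The linearization equation from the comment above, over plain integers.
// (ix, iy, iz) stand for idx.x(), idx.y(), idx.z(); (rx, ry) for rg.x(), rg.y().
inline std::size_t linearize(std::size_t ix, std::size_t iy, std::size_t iz,
                             std::size_t rx, std::size_t ry) {
  assert(rx > 0 && ry > 0);  // ranges are never empty in this sketch
  return iz * ry * rx + iy * rx + ix;
}

// 1D kernel with idx.x() = 5 and rg.x() = 16; rg.y() defaults to 1.
// The only difference is what the missing id components report.
inline std::size_t missing_id_components_as_one() {
  return linearize(5, 1, 1, 16, 1);
}
inline std::size_t missing_id_components_as_zero() {
  return linearize(5, 0, 0, 16, 1);
}
```

With 1 as the id default the result is 2 * 16 + 5 = 37; with 0 it is the expected 5.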

In general I agree that working with ranges/ids of different dimensionalities can be annoying, particularly in generic contexts. That being said, the letters x/y/z may introduce some unwanted semantic associations; at least in 2D, I'd say x is generally considered to be the horizontal axis, and y the vertical. In my experience things can get pretty confusing if the code one is working with has a different notion of what x/y/z means (for example when processing an image in column-major order). Perhaps i() / j() / k() could serve as a more neutral alternative?

@TApplencourt
Contributor

TApplencourt commented Dec 10, 2024

First, I agree it's one of the differences with SYCL that people struggle with.

I personally hate x, y, z: one day we will want to get away from 3D ("nobody" does graphics with Graphics Processing Units anymore...) and will need to introduce w or whatever. We are trying to move away from this hard-coded approach with the new way of dealing with group / item (to avoid subgroup, subsubgroup, subsubsubgroup), so it feels "wrong" to go back to hard-coding here.

I would prefer id.$get_with_not_cpp_but_cuda_opencl_indexing(0) or something.

In order to produce the correct result from the linearization equation, id
components should default to 0 while range components should default to 1.
@Pennycook
Contributor Author

Shouldn't id default to returning 0 instead of 1?

Yes, it should. Good catch! Fixed in d642a37.

That being said, the letters x/y/z may introduce some unwanted semantic associations; at least in 2D, I'd say x is generally considered to be the horizontal axis, and y the vertical.

Why do you say this is unwanted?

I might have misunderstood how images work in SYCL (I've not used them), but I think the situation we're in today is that the coordinates passed to functions like read() and write() are expressed in terms of x and y coordinates (via an instance of vec). Exposing x() and y() from id makes it easier for a developer to directly map from a work-item index to the corresponding coordinates in an image.

In my experience things can get pretty confusing if the code one is working with has a different notion of what x/y/z means (for example when processing an image in column-major order). Perhaps i() / j() / k() could serve as a more neutral alternative?

I'm not strongly opposed to this, but I do think x()/y()/z() is simpler. It's aligned with graphics and memory layout (as you said, x() is assumed to be horizontal, and C/C++ is row-major), and it's also the same as a very popular existing programming model.

I don't think other names are as intuitive: for example, it may not be obvious to everybody that i() is the contiguous dimension, because some developers start loop indices at i and then move to j/k/etc, whereas others will save i for the inner loop.

@Pennycook
Contributor Author

I personally hate x, y, z: one day we will want to get away from 3D ("nobody" does graphics with Graphics Processing Units anymore...) and will need to introduce w or whatever. We are trying to move away from this hard-coded approach with the new way of dealing with group / item (to avoid subgroup, subsubgroup, subsubsubgroup), so it feels "wrong" to go back to hard-coding here.

I wouldn't have a problem with introducing w for 4D. I think that would be logically consistent with these changes.

But if we were to move beyond 4 dimensions, I would propose that x(), y() and z() are only available for specializations of id and range where Dimensions <= 3 (or 4, if we add w). SYCL wouldn't have to keep inventing new names for each dimension, but developers working with three-dimensional data would still have access to the shorthands. I didn't write that down as part of this proposal because the specification currently limits Dimensions to 3.
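
One way such a restriction could look is sketched below in plain C++17. Everything here is hypothetical: `Id` is a stand-in rather than `sycl::id`, a five-dimensional id does not exist in SYCL today, and `enable_if` is just one possible mechanism (a C++20 `requires` clause would be another). The `has_x` trait exists only to show that the shorthand is absent for larger dimensionalities.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for sycl::id that allows more than three dimensions.
template <int D>
struct Id {
  std::array<std::size_t, D> c;

  // Numbered access stays available for every dimensionality.
  std::size_t get(int dim) const {
    assert(dim >= 0 && dim < D);
    return c[static_cast<std::size_t>(dim)];
  }

  // The shorthands take part in overload resolution only when D <= 3.
  template <int N = D, typename = std::enable_if_t<(N <= 3)>>
  std::size_t x() const { return c[D - 1]; }

  template <int N = D, typename = std::enable_if_t<(N <= 3)>>
  std::size_t y() const {
    if constexpr (D >= 2) return c[D - 2]; else return 0;
  }

  template <int N = D, typename = std::enable_if_t<(N <= 3)>>
  std::size_t z() const {
    if constexpr (D >= 3) return c[D - 3]; else return 0;
  }
};

// Detection trait: true when T has a callable x() member.
template <typename T, typename = void>
struct has_x : std::false_type {};

template <typename T>
struct has_x<T, std::void_t<decltype(std::declval<const T&>().x())>>
    : std::true_type {};
```

Here `has_x<Id<3>>::value` is true and `has_x<Id<5>>::value` is false, while `get()` keeps working for both.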

Contributor

@gmlueck gmlueck left a comment

Are you intentionally not adding these to item?

@Pennycook
Contributor Author

Are you intentionally not adding these to item?

Yes, this is deliberate. item encapsulates both an id and a range, so there's no single x component.

As the specification is currently written, the intended use would be item.get_id().x() or item.get_range().x(). If we decide to move forward with integrating #638 into the specification, we could add new member functions to item to enable the shorter item.id().x() and item.range().x() syntax there too.
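
A minimal sketch of that usage in plain C++ follows. `Id2`, `Range2` and `Item2` are hypothetical two-dimensional stand-ins that model only `get_id()`, `get_range()` and the proposed shorthands; `linear_id` is an illustrative helper, not part of the proposal.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical two-dimensional stand-ins for sycl::id<2>, sycl::range<2>
// and sycl::item<2>; only the members used below are modelled.
struct Id2 {
  std::array<std::size_t, 2> c;
  std::size_t x() const { return c[1]; }  // fastest-moving component
  std::size_t y() const { return c[0]; }
};

struct Range2 {
  std::array<std::size_t, 2> c;
  std::size_t x() const { return c[1]; }  // fastest-moving component
  std::size_t y() const { return c[0]; }
};

struct Item2 {
  Id2 id;
  Range2 range;
  Id2 get_id() const { return id; }
  Range2 get_range() const { return range; }
};

// An item carries both an id and a range, so the caller states which x it means.
inline std::size_t linear_id(const Item2& item) {
  assert(item.get_id().x() < item.get_range().x());
  assert(item.get_id().y() < item.get_range().y());
  return item.get_id().y() * item.get_range().x() + item.get_id().x();
}
```

For an item with id (2, 3) in a range of (4, 8), `linear_id` gives 2 * 8 + 3 = 19.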

@psalz
Contributor

psalz commented Dec 12, 2024

I might have misunderstood how images work in SYCL (I've not used them), but I think the situation we're in today is that the coordinates passed to functions like read() and write() are expressed in terms of x and y coordinates (via an instance of vec). Exposing x() and y() from id makes it easier for a developer to directly map from a work-item index to the corresponding coordinates in an image.

Sorry, I meant plain image data, not sycl::image.

In my experience things can get pretty confusing if the code one is working with has a different notion of what x/y/z means (for example when processing an image in column-major order). Perhaps i() / j() / k() could serve as a more neutral alternative?

I'm not strongly opposed to this, but I do think x()/y()/z() is simpler. It's aligned with graphics and memory layout (as you said, x() is assumed to be horizontal, and C/C++ is row-major), and it's also the same as a very popular existing programming model.

I don't think other names are as intuitive: for example, it may not be obvious to everybody that i() is the contiguous dimension, because some developers start loop indices at i and then move to j/k/etc, whereas others will save i for the inner loop.

If I have a 2D SYCL buffer, to get coalesced memory access in a 2D kernel I need to index into the buffer like so: my_accessor[id.y(), id.x()]. I would argue that this looks a bit strange, because we're used to spelling x-y-z in that order. So I'm not sure this is intuitive, and probably needs teaching.

That being said, the letters x/y/z may introduce some unwanted semantic associations; at least in 2D, I'd say x is generally considered to be the horizontal axis, and y the vertical.

Why do you say this is unwanted?

If I for example wanted to do vertical edge detection on an image by computing the horizontal gradient, in a row-major buffer I would do it like so:

gradient = my_image_accessor[id.y(), id.x() + 1] - my_image_accessor[id.y(), id.x() - 1]

If the data is stored in column-major format however,

gradient = my_image_accessor[id.y() + 1, id.x()] - my_image_accessor[id.y() - 1, id.x()]

I suddenly need to do the offset computation on y, even though I'm still computing the horizontal gradient - confusing!

Because the letters x/y/z carry this semantic association, one is easily misled to conflate two unrelated concepts, memory (or thread) layout, and the semantic relationship of the data (or threads) to the real world.
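
The two cases can be reproduced without SYCL. The sketch below uses a flat `std::vector<float>` instead of an accessor; `make_image` and both gradient functions are hypothetical names, and `fast` / `slow` stand in for `id.x()` / `id.y()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Test image with rows x cols pixels whose value is 10 * col + row, so the
// horizontal gradient (right neighbour minus left neighbour) is 20 everywhere.
inline std::vector<float> make_image(std::size_t rows, std::size_t cols,
                                     bool column_major) {
  std::vector<float> img(rows * cols);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      img[column_major ? c * rows + r : r * cols + c] =
          static_cast<float>(10 * c + r);
  return img;
}

// Row-major: fast walks along a row, so horizontal neighbours are fast +/- 1.
inline float horizontal_gradient_row_major(const std::vector<float>& img,
                                           std::size_t cols, std::size_t slow,
                                           std::size_t fast) {
  assert(fast >= 1 && fast + 1 < cols);
  return img[slow * cols + fast + 1] - img[slow * cols + fast - 1];
}

// Column-major: fast walks down a column, so horizontal neighbours are
// slow +/- 1 even though the gradient is still horizontal.
inline float horizontal_gradient_column_major(const std::vector<float>& img,
                                              std::size_t rows,
                                              std::size_t slow,
                                              std::size_t fast) {
  assert(slow >= 1 && (slow + 1) * rows + fast < img.size());
  return img[(slow + 1) * rows + fast] - img[(slow - 1) * rows + fast];
}
```

Both functions return 20 for the pixel at row 1, column 2 of a 3 x 4 image, but the caller passes (slow, fast) = (1, 2) in the row-major case and (2, 1) in the column-major case.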

@Pennycook
Contributor Author

Because the letters x/y/z carry this semantic association, one is easily misled to conflate two unrelated concepts, memory (or thread) layout, and the semantic relationship of the data (or threads) to the real world.

I can see the argument, but I think it's also possible to flip it: if x/y/z carries the semantic association, a developer is more likely to choose to represent their data in row-major format, improving both the readability and performance of their code.

Before I ask this next question, I want to emphasize that I'm not proposing this change to try and turn SYCL into CUDA -- my goal is to make it easier for developers who struggle with the re-numbering of dimensions. But since we know that CUDA uses the x/y/z naming, do you think that CUDA suffers from the same problem?

@psalz
Contributor

psalz commented Dec 17, 2024

I can see the argument, but I think it's also possible to flip it: if x/y/z carries the semantic association, a developer is more likely to choose to represent their data in row-major format, improving both the readability and performance of their code.

Okay sure, if they can freely choose the format, that might be true.

Before I ask this next question, I want to emphasize that I'm not proposing this change to try and turn SYCL into CUDA -- my goal is to make it easier for developers who struggle with the re-numbering of dimensions. But since we know that CUDA uses the x/y/z naming, do you think that CUDA suffers from the same problem?

The issue with the reverse ordering ([z, y, x]) doesn't apply because CUDA doesn't have buffers of course, but otherwise I would say yes.

To be clear, this is not a major issue for me, I just wanted to point out potential downsides of this proposal.

To me the most intuitive way of expressing memory order is always in terms of "fastest/slowest changing index", and I like that so far SYCL has steered clear of using terminology like "row-major", which is rooted in conventions. If we look to C++, as far as I can tell, the C++23 specification also doesn't use the terms "row-major" or "column-major"; mdspan uses the neutral layout_left and layout_right.

The super descriptive but probably too verbose option would be to go with something like id::fastest(), id::middle() and id::slowest(), or shorthands id::f(), id::m() and id::s(). But then we would run into trouble if we ever decide to add another dimension.

@Pennycook
Contributor Author

To be clear, this is not a major issue for me, I just wanted to point out potential downsides of this proposal.

I appreciate the feedback! I'm sorry if I'm coming across as confrontational.

To me the most intuitive way of expressing memory order is always in terms of "fastest/slowest changing index", and I like that so far SYCL has steered clear of using terminology like "row-major", which is rooted in conventions. If we look to C++, as far as I can tell, the C++23 specification also doesn't use the terms "row-major" or "column-major"; mdspan uses the neutral layout_left and layout_right.

I agree that bringing row-major vs column-major into this would be even more confusing.

I like how mdspan handled the layout issue, and I think we should try to follow that precedent with future buffer/accessor extensions. I don't think we would want to introduce the concept of layout to thread numbering, though.

The super descriptive but probably too verbose option would be to go with something like id::fastest(), id::middle() and id::slowest(), or shorthands id::f(), id::m() and id::s(). But then we would run into trouble if we ever decide to add another dimension.

I had a similar thought along these lines: I wondered whether id::fastest(0), id::fastest(1) and so on would work. That would avoid the naming issue for the other dimensions, but I agree it's still pretty verbose.

I'm also worried that it could be more confusing to define id<2>::fastest(2) as returning 0 than to define id<2>::z() as returning 0. The mapping onto x/y/z is a purely conceptual thing, so it's relatively straightforward to explain to somebody why an "unused" component defaults to 0. Explaining what id<D>::fastest(n) means for n >= D seems harder, to me.

Finally, I'm worried that introducing two different numeric ways to reference the same dimensions could get really confusing. Today, if we say "The second dimension of a 3D kernel" it's unambiguous; if we had get(2) and fastest(2), we'll have to start qualifying which direction we're counting in.
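
To make the two counting directions concrete, here is a sketch in plain C++. `Id` is a hypothetical stand-in; `get()` mirrors the existing numbered access, while `fastest()` is only the alternative floated in this thread and is not a proposed or existing SYCL function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sycl::id.
template <int D>
struct Id {
  std::array<std::size_t, D> c;

  // Existing numbering: dimension D - 1 is the fastest-moving one.
  std::size_t get(int n) const {
    assert(n >= 0 && n < D);
    return c[static_cast<std::size_t>(n)];
  }

  // Alternative discussed above: count from the fastest-moving end and
  // report 0 for components the id does not have (n >= D).
  std::size_t fastest(int n) const {
    assert(n >= 0);
    return n < D ? c[static_cast<std::size_t>(D - 1 - n)] : 0;
  }
};
```

For a three-dimensional id holding (10, 20, 30), `get(2)` is 30 but `fastest(2)` is 10, so "dimension 2" names opposite ends of the same object; and `Id<2>::fastest(2)` quietly returns 0.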

@psalz
Contributor

psalz commented Dec 18, 2024

I had a similar thought along these lines: I wondered whether id::fastest(0), id::fastest(1) and so on would work. That would avoid the naming issue for the other dimensions, but I agree it's still pretty verbose.

I'm also worried that it could be more confusing to define id<2>::fastest(2) as returning 0 than to define id<2>::z() as returning 0. The mapping onto x/y/z is a purely conceptual thing, so it's relatively straightforward to explain to somebody why an "unused" component defaults to 0. Explaining what id<D>::fastest(n) means for n >= D seems harder, to me.

Finally, I'm worried that introducing two different numeric ways to reference the same dimensions could get really confusing. Today, if we say "The second dimension of a 3D kernel" it's unambiguous; if we had get(2) and fastest(2), we'll have to start qualifying which direction we're counting in.

Yes, I agree with your concerns. Unfortunately I'm out of ideas; we should probably discuss this with the WG!

@psalz psalz added the Agenda To be discussed during a SYCL committee meeting label Dec 18, 2024
@tomdeakin tomdeakin removed the Agenda To be discussed during a SYCL committee meeting label Jan 9, 2025
@tomdeakin
Contributor

Speaking personally here, the choice of defaulting to 1/0 for the range/id is sensible given how these APIs behave elsewhere (e.g., OpenCL's get_global_id(uint), see https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#work-item-functions, and CUDA).

I can see the value of providing this directly, rather than asking user code to write the template magic needed to pull out the dimension from the template parameter of these objects.
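
For comparison, this is roughly the kind of helper code a project has to carry today. It is a sketch in plain C++17 with hypothetical stand-ins for `sycl::id` and `sycl::range`; `from_fastest` and `linear_index` are made-up names, and a real code base might structure this differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for sycl::id and sycl::range with numbered access only.
template <int D>
struct Id {
  std::array<std::size_t, D> c;
  std::size_t operator[](int i) const {
    assert(i >= 0 && i < D);
    return c[static_cast<std::size_t>(i)];
  }
};

template <int D>
struct Range {
  std::array<std::size_t, D> c;
  std::size_t operator[](int i) const {
    assert(i >= 0 && i < D);
    return c[static_cast<std::size_t>(i)];
  }
};

// Component N counted from the fastest-moving end, or fill if it is absent.
// D is deduced from the argument's template parameter.
template <int N, template <int> class T, int D>
std::size_t from_fastest(const T<D>& v, std::size_t fill) {
  if constexpr (N < D) return v[D - 1 - N]; else return fill;
}

template <int D> std::size_t x(const Id<D>& v) { return from_fastest<0>(v, 0); }
template <int D> std::size_t y(const Id<D>& v) { return from_fastest<1>(v, 0); }
template <int D> std::size_t z(const Id<D>& v) { return from_fastest<2>(v, 0); }
template <int D> std::size_t x(const Range<D>& v) { return from_fastest<0>(v, 1); }
template <int D> std::size_t y(const Range<D>& v) { return from_fastest<1>(v, 1); }

// Generic over 1, 2 or 3 dimensions thanks to the helpers above.
template <int D>
std::size_t linear_index(const Id<D>& idx, const Range<D>& rg) {
  return z(idx) * y(rg) * x(rg) + y(idx) * x(rg) + x(idx);
}
```

The same `linear_index` then works for `Id<1>`, `Id<2>` and `Id<3>`, at the cost of every project maintaining its own copy of these helpers.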

Another aspect here is that, thanks to the CTAD we make use of in SYCL, for a lot of codes you don't really have to know all the C++ mechanics that are going on under the hood. So if this is a way to reduce what would otherwise be friction to adoption when migrating to SYCL, then that is probably a net win.

I think this is just a symptom of aligning the fastest-moving dimension with C++, which means picking out the "last" one is painful, and C++ doesn't give us a solution there either.

5 participants