[RFC] Say something about asynchronous behavior on host #685

etomzak · 2025-01-06T19:36:01Z

The idea that (most practical) SYCL implementations will do some things asynchronously on the host (either by using a worker thread spawned by the SYCL runtime, or by relying on a worker thread owned by a backend, or both) is central to SYCL. The idea is implicit throughout the whole spec, but there isn't a single place where it's authoritatively spelled out.

The first place where the word asynchronous appears is in §3.9.9 Error handling, and IMO this is logically too late in the spec. Error handling shouldn't be used to introduce the idea of asynchronicity; the reader should go into the error handling section already understanding that SYCL has asynchronicity.

I suspect that the idea of asynchronicity is so obvious to everyone who's worked on the SYCL spec that it's been taken for granted, but I don't think we should assume that it will be obvious to all readers.

I've taken a stab at adding a (hopefully uncontroversial) paragraph to §3.7.1. SYCL application execution model just to spell out that yes, some things might happen asynchronously. I think there's scope for folding the paragraph into the previous subsection, or expanding it, or moving it somewhere else entirely. There are a number of places in the spec (like §3.9.9 Error handling) that could be made clearer and more succinct if they were to link to this new paragraph. There could also be scope for tying in the wording on asynchronicity with some of the forward progress wording.

I'm curious to hear people's thoughts.

TApplencourt · 2025-01-06T23:18:28Z

First, thanks for the PR :) I agree this needs some clarification!

But I think we are mixing up 2 things:

One is to clarify the "asynchronicity" / "non-blocking" of the SYCL API. I guess we should converge on one term. We said stuff like "All member functions of the [context] class are synchronous" but also "These calls will be non-blocking on the host, but enqueue operations to the queue that the command group is submitted to."
The second is to give some implementation details, if that helps people understand (the thing you did, which IMO looks good).

Pennycook · 2025-01-07T10:48:37Z

One is to clarify the "asynchronicity" / "non-blocking" of the SYCL API. I guess we should converge on one term. We said stuff like "All member functions of the [context] class are synchronous" but also "These calls will be non-blocking on the host, but enqueue operations to the queue that the command group is submitted to."

I agree we need to be careful here. ISO C++ defines "block" (here) as:

wait for some condition (other than for the implementation to execute the execution steps of the thread of execution) to be satisfied before continuing execution past the blocking operation

The latest ISO C++ draft (C++26) additionally contains a new definition of "asynchronous operation" (here) which is too large to cut-and-paste. But it contains some things that I suspect we might not want to inherit: there's a note that asynchronous operations can execute synchronously in the host thread (which I don't think holds for SYCL commands); and asynchronous operations are defined entirely in terms of senders and receivers.

I've not had a chance to read through the entire draft and I may be misinterpreting these new terms, but I've seen enough to convince myself that we need to pay close attention to what's happening in C++26 before making big changes to the way we describe SYCL's execution model.

etomzak · 2025-01-07T11:35:15Z

* One is to clarify the "asynchronicity"  / "non-blocking" of the SYCL API.  I
  guess we should converge on one term. We said stuff like "All member
  functions of the [context] class are synchronous" but also "These calls will
  be non-blocking on the host, but enqueue operations to the queue that the
  command group is submitted to."

Interesting. I think you're thinking of a slightly different problem than what I had in mind (which is another good reason to try to clarify the terminology).

What I had in mind is this: Assume an OpenCL backend and some OpenCL function, say clSetMemObjectDestructorCallback() (I just picked a function at random). It's an arcane function that a SYCL implementation might use to "manage a backend API resource" for "reasons." What those reasons are and when a SYCL implementation might call the function are both implementation-defined, and the SYCL user shouldn't need to worry about it. A reasonable SYCL implementation could call this function from an application thread or from a SYCL implementation worker thread (or both). That's also implementation-defined. What the SYCL user needs to understand is that a SYCL implementation might be "managing backend resources" (regardless of what exactly that means) from both user threads and implementation worker threads.

In the specific case of error handling, what all this means is that an error reported by clSetMemObjectDestructorCallback() could reach the SYCL user as either a synchronous or an asynchronous exception. That's what §3.9.9 is saying in a roundabout way; I just think the idea that the SYCL runtime is managing backend resources both synchronously and asynchronously needs to be fleshed out earlier in the spec.

The way that the SYCL spec currently uses "member functions are synchronous" and "calls will be non-blocking" is a related but different problem. IMO the salient piece of information the spec should convey is whether a function is blocking or non-blocking. What work a function performs synchronously and what work the function hands off to a worker thread to perform asynchronously should be implementation-defined.

gmlueck · 2025-01-07T14:27:42Z

IMO the salient piece of information the spec should convey is whether a function is blocking or non-blocking.

I agree with this. However, I don't see how the text you are proposing in this PR makes this more clear. It seems like the paragraph you are adding just gives some examples about how an implementation might work. It basically says that an implementation might have an internal thread that does some of the work. I don't see why we should put that in the spec. It's not required for an implementation to have such an internal thread, and I don't see how we could add any sort of CTS to verify the paragraph that you are adding.

FWIW, I do think the spec needs to be more clear about blocking and non-blocking behavior and also about the forward progress guarantees of commands (kernel invocations). At present, it's unclear whether it is legal for a SYCL kernel to synchronize with another kernel or with a host thread via atomic operations (e.g. spinning on a lock). This is related to the blocking behavior of certain host APIs because you need to understand exactly what "blocking" means in these cases in order to avoid deadlock scenarios.

etomzak · 2025-01-07T17:08:25Z

IMO the salient piece of information the spec should convey is whether a function is blocking or non-blocking.

I agree with this. However, I don't see how the text you are proposing in this PR makes this more clear.

That's my comment on the issue Thomas identified, which is different from the one I'd like to address in this PR.

It's not required for an implementation to have such an internal thread, and I don't see how we could add any sort of CTS to verify the paragraph that you are adding.

I'm using the RFC 2119 definition of may. A SYCL implementation is allowed to manage a backend API both synchronously and asynchronously. A SYCL application is not allowed to make assumptions about whether a particular act of backend API management occurs synchronously or asynchronously (unless explicitly specified elsewhere in the spec or in a backend spec). Both of these statements are important, but ATM neither is clear in the spec. If a reader reads the preceding section (§3.7.1.1) on all the things that a SYCL implementation manages and combines that with the "All member functions of the [context] class are synchronous"-type statements, the reader could easily come away with some very confused ideas about what happens synchronously and asynchronously in SYCL.

By far the clearest section of the spec that explains the asynchronous nature of SYCL is §4.13.1. Error handling rules. That information doesn't belong in a dark corner of the programming interface chapter; that information belongs front and center in the architecture chapter.

etomzak · 2025-01-07T17:26:55Z

I've not had a chance to read through the entire draft and I may be misinterpreting these new terms, but I've seen enough to convince myself that we need to pay close attention to what's happening in C++26 before making big changes to the way we describe SYCL's execution model.

I agree that C++26 alignment sounds hairy and we need to tread carefully. As a minimum, though, I think it should be possible to go through the SYCL spec as it is, collect the statements it already makes about asynchronous behavior, and summarize them in the SYCL Application Execution Model section where they belong. Most of the important information is currently spread throughout chapter 4 (in the queue and error handling sections), so a reader needs to know to look for that information and to piece it together themselves.

gmlueck · 2025-01-07T17:54:35Z

A SYCL application is not allowed to make assumptions about whether a particular act of backend API management occurs synchronously or asynchronously (unless explicitly specified elsewhere in the spec or in a backend spec).

I notice you say "act of backend API management". Are you concerned about applications that do interop between SYCL and the underlying backend? If that is the thing you want to clarify, then I think we need to add something to the backend interop specification(s).

If you are instead talking about SYCL APIs, then I disagree. The SYCL spec should make it clear which APIs are synchronous and which are asynchronous. For those that are asynchronous, the spec should clearly state what aspect of the API is asynchronous. Applications should be able to rely on this, assuming that the implementation obeys the synchronous / asynchronous behavior that is specified.

etomzak · 2025-01-07T19:39:14Z

A SYCL application is not allowed to make assumptions about whether a particular act of backend API management occurs synchronously or asynchronously (unless explicitly specified elsewhere in the spec or in a backend spec).

I notice you say "act of backend API management". Are you concerned about applications that do interop between SYCL and the underlying backend? If that is the thing you want to clarify, then I think we need to add something to the backend interop specification(s).

I'm concerned with default assumptions that users and implementers can make in the absence of a specific backend spec. I'm using backend API management in the same way that it's used in §3.7.1.1.

If you are instead talking about SYCL APIs, then I disagree. The SYCL spec should make it clear which APIs are synchronous and which are asynchronous. For those that are asynchronous, the spec should clearly state what aspect of the API is asynchronous. Applications should be able to rely on this, assuming that the implementation obeys the synchronous / asynchronous behavior that is specified.

What are your definitions for synchronous API and asynchronous API? Where are these definitions coming from?

TApplencourt · 2025-01-07T20:52:37Z

asynchronous operations can execute synchronously in the host thread (which I don't think holds for SYCL commands)

I don't know. It goes back to the "forward-progress" guarantee that we are promising.

// Pseudocode: assumes `run` is visible to both the host and the kernel.
bool run = true;
Q.single_task([&] { while (run) {} });
run = false;
Q.wait();

I think we want this code to work... Does this code work because we make some "promise" that asynchronous means it doesn't execute synchronously in the host thread, or does it work "just" because we forbid "eager" execution?
Or do we say nothing, and just state that it should work no matter how the implementation deals with it (but then how can we define what correct code is)?

(Of course, the same question holds if we are using a "normal" command + atomics in some kind of USM scenario, as Greg pointed out.)

In short: It's complicated. At least we should clarify what we mean by "blocking / non-blocking" or "sync / async". As a first step, I think we need only one term and not the two we currently use. 🤷🏽

Pennycook · 2025-01-08T09:41:09Z

// Pseudocode: assumes `run` is visible to both the host and the kernel.
bool run = true;
Q.single_task([&] { while (run) {} });
run = false;
Q.wait();

There are a bunch of corner-cases in this snippet, but I think what you're asking is really just about whether the while loop is guaranteed to start executing before the call to Q.wait().

Just as you said, I believe that the SYCL specification as written forbids "eager" execution in the host thread: Section 3.9.11 says that submission of work does not block the host, which I interpret to mean that the while loop is not allowed to start executing in the host thread (until the host thread calls Q.wait()). It is valid for the while loop to begin executing immediately, but only if it executes in another thread.

From a correctness standpoint, a developer has to assume that the while loop will not begin executing until Q.wait() is called.
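As a rough host-only analogy of the snippet above, here is a sketch in standard C++ rather than SYCL. It assumes the flag is a std::atomic<bool> and that the "kernel" is an ordinary function run on a std::thread; the names spin_until_cleared and run_flag_demo are hypothetical. It illustrates one valid behavior (the loop starts right away, but in another thread) and why eager execution in the submitting thread could never reach the store that clears the flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Stand-in for the kernel body: spins until the host clears the flag.
// Returns the number of iterations spent spinning (for illustration only).
inline unsigned long spin_until_cleared(const std::atomic<bool>& run) {
  unsigned long spins = 0;
  while (run.load(std::memory_order_acquire)) {
    ++spins;
    std::this_thread::yield();
  }
  return spins;
}

// Models "submit, clear the flag, wait". Returns true once the task finished.
inline bool run_flag_demo() {
  std::atomic<bool> run{true};
  std::atomic<bool> finished{false};

  // "Submission": the task runs on another thread, so the caller is not blocked.
  std::thread worker([&] {
    spin_until_cleared(run);
    finished.store(true, std::memory_order_release);
  });

  // Reached only because the loop is not executing in this thread.
  run.store(false, std::memory_order_release);

  worker.join();  // plays the role of Q.wait()
  return finished.load(std::memory_order_acquire);
}
```

Calling run_flag_demo() from main should return true. If the lambda were instead invoked directly in the calling thread at "submission" time, the call would never return.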

The reason I'm worried is that the new definition of "asynchronous operation" in C++26 seems to imply that an asynchronous operation should be allowed to execute in the host thread. If anybody has more information about how the ISO C++ committee decided upon this wording, I'd be interested to hear about it. I'll also ask around to see if I'm misreading anything here.

etomzak · 2025-01-08T12:34:36Z

I think we're getting a little sidetracked. To what extent SYCL should/will align with the new C++ term asynchronous operation is an interesting question, but we don't need to answer it here. The fact that the C++ term asynchronous operation exists doesn't prevent SYCL from using the word asynchronous in a more generic way. For example, the SYCL spec contains this statement:

SYCL applications are asynchronous in the sense that host and device code executions are decoupled from one another except at specific points.

I don't think the introduction of asynchronous operation in C++26 can or should change the meaning of this statement. I think the SYCL spec can continue to use the word asynchronous as it's used in this sentence (in quite a generic way) even when C++ has introduced the specific term asynchronous operation.

I'll give some context for this PR:

I'm working on the spec text for the SYCL SC error model, which needs to do a number of things differently from SYCL on account of functional safety requirements. Consequently, I need to rewrite §4.13. Error handling for SYCL SC. I've discovered that §4.13 provides explanations of asynchronous behavior in SYCL that are not specific to error handling, but apply to SYCL more broadly. I've also discovered that the statements in §4.13 don't exist elsewhere in the spec in as clear and succinct form. This means that when I rewrite §4.13, I can either include those same statements in my rewrite (less preferable) or move those statements to a location in the spec where they logically fit better into the structure of the spec and are easier to link to and find (more preferable).

Since moving the statements is an editorial change that could, in principle, be shared by SYCL and SYCL SC, I'm trying to work out what exactly that could look like. A straight copy-paste is possible, but I think it would be better to state the general design principles in the architecture chapter so that the programming interface chapter can refer back to them.

Ideally, SYCL and SYCL SC will describe host-side asynchronous behavior with the same wording, but if SYCL really wants to explain decoupled and asynchronous execution in the error handling section, then the SYCL SC spec will probably need to diverge in this regard.

Pennycook · 2025-01-08T13:43:23Z

I think the SYCL spec can continue to use the word asynchronous as it's used in this sentence (in quite a generic way) even when C++ has introduced the specific term asynchronous operation.

I don't disagree, I'm just urging caution. The SYCL specification already contains five instances of "asynchronous operation", because that didn't previously mean anything -- if we're not careful, we might accidentally introduce more wording that is intended to clarify behavior but actually makes things more confusing. Colloquial use of "asynchronous" is probably fine, but introducing new terminology tied to "asynchronous" is probably a bad idea at this point.

Since moving the statements is an editorial change that could, in principle, be shared by SYCL and SYCL SC, I'm trying to work out what exactly that could look like. A straight copy-paste is possible, but I think it would be better to state the general design principles in the architecture chapter so that the programming interface chapter can refer back to them.

I agree that moving the statements from Section 4.13 to earlier in the specification makes sense. Personally, I'd prefer that change to introducing new wording about "resource management".

TApplencourt · 2025-01-08T15:14:02Z

Oh, I didn't understand the PR then. The current PR adds implementation details; AFAIK it doesn't move the description of what "asynchronously" means or clarify its meaning. I misunderstood the PR, sorry.

I like the idea of having a special section about "asynchronously" ("resource management," for me, makes me think about buffers / accessors) to describe what we mean by it (or at least putting that in the glossary section).

I guess I agree that we should move "SYCL applications are asynchronous in the sense that host and device code executions are decoupled from one another except at specific points. For example, device code executions often begin when dependencies in the SYCL task graph are satisfied, which occurs asynchronously from host code execution" to an earlier section of the spec.

Are we all agreeing that we are using non-blocking / asynchronously interchangeably?

Pennycook · 2025-01-08T16:31:12Z

Are we all agreeing that we are using non-blocking / asynchronously interchangeably?

I agree that people have a tendency to mix these up, but I don't think they should be used interchangeably.

Whether an API is blocking or non-blocking is about providing a contract to let a developer know whether the calling thread waits within the function for a specified event to occur, or whether the calling thread runs the function and then calls return. When we say that things may be happening asynchronously, we mean something like "they might be happening somewhere else, concurrently".

The fact that q.single_task() is non-blocking just means that the function returns as soon as the task has been added to the queue. Some implementations may choose to start executing work submitted to a queue asynchronously (i.e., the task may execute somewhere else while the host thread is doing other things), but that isn't guaranteed by the fact the function is non-blocking.
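The distinction can be sketched with a toy queue in standard C++. This is not the SYCL API: LazyQueue and lazy_queue_demo are hypothetical names, and the queue deliberately defers all work until wait(). The point is only that submit() can be non-blocking without anything executing asynchronously.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy queue: submit() never blocks, and nothing runs until wait().
class LazyQueue {
 public:
  // Non-blocking: records the task and returns immediately.
  void submit(std::function<void()> task) { pending_.push_back(std::move(task)); }

  // Blocking: runs every pending task in the calling thread before returning.
  void wait() {
    std::vector<std::function<void()>> tasks;
    tasks.swap(pending_);
    for (auto& task : tasks) task();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Returns the counter value seen right after submit() and right after wait().
inline std::pair<int, int> lazy_queue_demo() {
  LazyQueue q;
  int counter = 0;
  q.submit([&counter] { ++counter; });
  const int after_submit = counter;  // the task has not run yet
  q.wait();
  const int after_wait = counter;    // the task has run by now
  return {after_submit, after_wait};
}
```

Here lazy_queue_demo() yields {0, 1}: the submission returned before the task ran, yet no other thread was ever involved.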

TApplencourt · 2025-01-08T16:45:43Z

I see. (This is similar to concurrency versus parallelism, I suppose). Thanks for the clarification!

etomzak · 2025-01-08T17:04:56Z

I agree that moving the statements from Section 4.13 to earlier in the specification makes sense. Personally, I'd prefer that change to introducing new wording about "resource management".

I borrowed the term resource management from the preceding section (§3.7.1.1. Backend resources managed by the SYCL application), but I'm happy to phrase it in some other way. I'm open to suggestions.

What I'd like to achieve in this PR is that by the end of chapter 3, the reader understands that some interactions between the SYCL runtime and a backend happen synchronously and some interactions happen asynchronously (WRT a user thread). The general rule is that the SYCL user cannot make assumptions about when and where a SYCL runtime calls any given backend API function. There can be exceptions to this general rule, particularly when backend specs are involved.

Then the error handling explanation in chapter 4 becomes a simple case of saying something to the effect of: "Some things happen synchronously and some happen asynchronously. This is why there are synchronous and asynchronous exceptions, and a SYCL application needs to deal with both. Except where explicitly specified, it is implementation-defined whether a given exception is thrown synchronously or asynchronously. See [text in chapter 3]."

Is there consensus that §3.7.1. SYCL application execution model is the correct place for this information, even if we're not yet sure of the wording?

etomzak · 2025-01-08T17:31:09Z

Are we all agreeing that we are using non-blocking / asynchronously interchangeably?

I'm trying very hard to not open that can of worms in this PR 😁

Are we all agreeing that we are using non-blocking / asynchronously interchangeably?

I agree that people have a tendency to mix these up, but I don't think they should be used interchangeably.

I completely agree. IMO the terms blocking and non-blocking are the correct ones for describing SYCL functions.

I don't think the terms synchronous and asynchronous provide a meaningful distinction when applied to an API function. SYCL is fundamentally an asynchronous API. Every SYCL API function has some effect synchronously. The majority of SYCL API functions will also have effects that are realized later, asynchronously. E.g., the options that a buffer is constructed with could later influence how a kernel using that buffer is submitted to a backend, so even though a buffer constructor doesn't block, its effects are realized both synchronously and asynchronously. Implementations need latitude to decide what happens synchronously and what happens asynchronously (for performance reasons, to accommodate different backends, etc), so SYCL needs to be careful not to over-specify this.

gmlueck · 2025-01-08T22:37:02Z

I'm working on the spec text for the SYCL SC error model, which needs to do a number of things differently from SYCL on account of functional safety requirements. Consequently, I need to rewrite §4.13. Error handling for SYCL SC. I've discovered that §4.13 provides explanations of asynchronous behavior in SYCL that are not specific to error handling, but apply to SYCL more broadly. I've also discovered that the statements in §4.13 don't exist elsewhere in the spec in as clear and succinct form. This means that when I rewrite §4.13, I can either include those same statements in my rewrite (less preferable) or move those statements to a location in the spec where they logically fit better into the structure of the spec and are easier to link to and find (more preferable).

Can you say in more detail which statements in section 4.13 you think are general in nature and don't belong in that section? This might help understand the motivation for this PR.

etomzak · 2025-01-09T11:14:25Z

Can you say in more detail which statements in section 4.13 you think are general in nature and don't belong in that section? This might help understand the motivation for this PR.

I specifically have the following paragraph in mind (the second in the section). It's phrased WRT exception handling, but it's the most complete description of asynchronous (or "decoupled") behavior I've found in the spec. It's surprising that the SYCL Application Execution Model section doesn't contain this information.

SYCL applications are asynchronous in the sense that host and device code executions are decoupled from one another except at specific points. For example, device code executions often begin when dependencies in the SYCL task graph are satisfied, which occurs asynchronously from host code execution. As a result of this the errors that occur on a device cannot be thrown directly from a host API call, because the call enqueueing a device action has typically already returned by the time that the error occurs. Such errors are not detected until the error-causing task executes or tries to execute, and we refer to these as asynchronous errors.
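The split between synchronous and asynchronous errors described in that paragraph can be sketched with a toy queue in standard C++. This is an analogy under stated assumptions, not the SYCL API: ToyQueue, its Handler type, and error_demo are hypothetical, and std::async stands in for whatever worker a real implementation might use.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy queue, not the SYCL API.
class ToyQueue {
 public:
  using Handler = std::function<void(std::exception_ptr)>;
  explicit ToyQueue(Handler handler) : handler_(std::move(handler)) {}

  // Synchronous error: an invalid request is rejected before submit() returns.
  // Otherwise the task starts on another thread and submit() returns at once.
  void submit(std::function<void()> task) {
    if (!task) throw std::invalid_argument("empty task");
    inflight_.push_back(std::async(std::launch::async, std::move(task)));
  }

  // Asynchronous errors: failures raised while a task ran are reported only
  // here, through the handler, long after the submitting call has returned.
  void wait_and_throw() {
    for (auto& f : inflight_) {
      try { f.get(); } catch (...) { handler_(std::current_exception()); }
    }
    inflight_.clear();
  }

 private:
  Handler handler_;
  std::vector<std::future<void>> inflight_;
};

// Returns "<synchronous errors>/<asynchronous errors>" observed by the caller.
inline std::string error_demo() {
  int sync_errors = 0, async_errors = 0;
  ToyQueue q([&](std::exception_ptr) { ++async_errors; });
  try { q.submit(nullptr); } catch (const std::invalid_argument&) { ++sync_errors; }
  q.submit([] { throw std::runtime_error("device fault"); });  // submit succeeds
  q.wait_and_throw();
  return std::to_string(sync_errors) + "/" + std::to_string(async_errors);
}
```

In this sketch the second submit() cannot throw the task's error, because the task fails on another thread; the error reaches the caller only through the handler at wait_and_throw().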

gmlueck · 2025-01-09T15:48:29Z

I suggest that we instead tweak the introductory paragraph to section 3.7.1. SYCL application execution model. The last two sentences currently say:

When a command group is submitted to a SYCL queue, the requirements of the kernel execution are captured. The implementation can start executing a kernel as soon as its requirements have been satisfied.

Let's change this to:

When a command group is submitted to a SYCL queue, the request to submit the command group returns even before the command is executed. Instead, the requirements of the command group are captured and the command is executed later once these requirements are satisfied.

Pennycook · 2025-01-09T16:02:07Z

When a command group is submitted to a SYCL queue, the request to submit the command group returns even before the command is executed. Instead, the requirements of the command group are captured and the command is executed later once these requirements are satisfied.

I think this is a lot better than what we have. Being really nit-picky, I think the very last sentence here could say something like:

Instead, the requirements of the command group are captured and the command is executed at some point in the future, after these requirements are satisfied.

I read "once" as meaning "as soon as", and there's no guarantee that the command begins executing right away.

gmlueck · 2025-01-09T17:16:23Z

Using "after" instead of "once" seems better. The part about "at some point in the future" seems unnecessarily verbose. How about just:

When a command group is submitted to a SYCL queue, the request to submit the command group returns even before the command is executed. Instead, the requirements of the command group are captured and the command is executed after these requirements are satisfied.

Pennycook · 2025-01-09T17:18:57Z

When a command group is submitted to a SYCL queue, the request to submit the command group returns even before the command is executed. Instead, the requirements of the command group are captured and the command is executed after these requirements are satisfied.

You're right, my proposal was unnecessarily verbose. This looks good to me.

Preliminary wording on asynchronicity

25d87ca

etomzak added the editorial label (Some purely editorial problem) on Jan 6, 2025
