This post is part of a series where we discuss some features of hipSYCL 0.9.1. Today’s topic is interoperability between buffers and USM pointers.
Why it matters
SYCL 2020 features two major memory management models, both of which are supported by hipSYCL:
- The traditional buffer-accessor model that has already been available in the old SYCL 1.2.1. In this model, a task graph is constructed automatically based on access conflicts between access specifications described by
accessorobjects are also used to access data in kernels. The buffer-accessor model provides the SYCL runtime with a lot of information about how much data is used and how it is used. This can help scheduling, and enables automatic optimizations such as overlap of data transfers and kernels.
- The pointer-based USM model that was introduced in SYCL 2020. Here, allocations are managed explicitly and (unless shared allocations are used) data must be copied explicitly between host and device. The USM model provides more control to the user compared to the buffer-USM model, at the cost of requiring the user to do work that the runtime can do automatically in the buffer-accessor model. It also forces the programmer to think in a model of a host-device dichotomy, which may not be an ideal fit when CPUs are targeted. On the other hand, it is usually considerably easier to port existing pointer-based code to SYCL using the USM model compared to the buffer-accessor model.
It is apparent that both models have valid use cases and are complementary. However, in SYCL 2020, there is barely any interoperability between the two. Accessing data that is stored in a buffer using a USM pointer requires launching a custom kernel that explicitly copies all data elements from the buffer into a USM pointer. This is both cumbersome and comes at a performance cost.
Consequently, once a codebase has started using one particular model, it is effectively locked into it. This is problematic for several reasons:
- As the SYCL software ecosystem grows, there is a real danger of ecosystem bifurcation if no mechanisms are provided to cross from USM-land to buffer-land and vice versa. A SYCL library with a USM pointer API will be of little use for a SYCL application that is written using buffers and accessors.
- SYCL is all about taking control when you want it, and letting SYCL do what it thinks is best otherwise. This allows to combine the best of two worlds: Low-level kernel optimizations for critical code paths, and the convenience of a high-level programming model for the remaining program. Consequently, it should be possible to use USM pointers whenever we want detailed low-level control, and move to a more high-level model for other parts of the program. Not having interoperability between them can block potential incremental optimization paths during software development.
- Which model will be better in terms of performance or clarity is not always apparent, and might be different for different parts of the program. As outlined above, both have strengths and weaknesses, and are complementary. We should therefore be able to mix buffers and USM pointers.
To address these issues, hipSYCL 0.9.1 has introduced a comprehensive API for interoperability between USM pointers and buffers. In hipSYCL, you can always construct a buffer on top of existing USM pointers, or extract a USM pointer from a buffer – completely without additional data copies.
hipSYCL is the first SYCL implementation to expose such a feature, and the reason is found easily: Buffer-USM interoperability in a meaningful, convenient and efficient way requires guarantees about the internal buffer behavior and SYCL implementation design that far exceed anything the SYCL specification guarantees.
We have therefore introduced an additional hipSYCL runtime specification that more rigorously defines buffer behavior. In particular hipSYCL makes the following guarantees that are crucial for buffer-USM interoperability:
- Buffers use USM pointers internally. All allocations a buffer performs are USM allocations, and buffers are entirely implemented on top of USM pointers.
- Allocations are persistent. Buffers guarantee that allocations, once they have been made, will remain valid at least until the end of buffer lifetime. Buffers will manage exactly one allocation per (physical) device.
- Buffers allocate lazily. When the buffer is used for the first time on a particular device, it will allocate memory large enough for all of the data such that no reallocations are needed for the lifetime of the buffer.
There are two cases to distinguish for buffer-USM interoperability:
- Temporal composition: Here we just move memory allocations from USM pointers into a buffer or vice versa; at each point in time only either a USM pointer or a buffer exists for a given allocation.
- The more complex case: Simultaneously accessing the same allocation as USM pointer and buffer. This is more complicated as it requires some correctness considerations by the programmer.
Let’s focus on the simple case first: Assume we only want to turn an existing buffer into a USM pointer (or vice versa), but don’t want to use them simultaneously. hipSYCL has a fairly intuitive API for that:
buffer::get_pointer() to extract USM pointers and a special buffer constructor that accepts USM pointers.
Simultaneous USM pointers and buffers
If we want to have both USM pointers and buffers accessing the same allocation simultaneously, things get more complicated. In this scenario, it is crucial to understand that
- Buffers automatically calculate dependencies to other operations by detecting conflicting accessors. If operations use the same allocation but without going through accessors, buffers cannot know about these additional dependencies – the programmer must insert them manually.
- Buffers automatically calculate necessary data transfers by tracking whether data is valid or outdated on a particular device. If data is modified through USM pointers without the buffer knowing of it, the internal data tracking of the buffer is off and no longer reflects reality. This can cause the buffer to emit data transfers that shouldn’t take place, or omit data transfers when they might actually be required. To avoid this, we need to manually update the buffer’s data tracking.
Here’s an example that shows how it’s done.
So far no surprises – we just had to insert dependencies manually as expected. Let’s now look at submitting work to a different device. When submitting USM operations to another device, we need to inform the buffer that there are writes taking place on that device, and that it should consider allocations on other devices as outdated after this point. We again use
handler::update() for this.
In summary, even using buffers and USM pointers simultaneously for the same data is possible, but requires a solid understanding of SYCL and the guarantees that hipSYCL makes specifically.
Remember that buffers cannot know about USM kernels that utilize the same allocations, so always, always make sure to insert correct dependencies. Also, make sure to inform the buffer that an allocation has been modified so that it can correctly emit data transfers when an accessor is used for the buffer on a different device (including the host device). This can be done by constructing a accessor with a suitable access mode – either by using
handler::update(), or by submitting a kernel that uses accessors.
In practice, this might be much simpler. If you are not working with complex task graphs, you could just use a SYCL 2020 in-order queue to avoid having to insert all those dependencies manually. And if you are only working on a single device, your
handler::update() calls might not be required anymore.
For the full API reference, see the hipSYCL documentation.