Notes about VK_EXT_custom_resolve


Vulkan 1.4.333, released on November 14th, includes VK_EXT_custom_resolve, an extension that is closely related to the work we have been doing at Igalia to help Valve release the Steam Frame. It’s an extension that allows you to use a custom fragment shader to specify how to resolve a multisample image into a single-sample image, typically so the latter can be presented on screen. CTS tests for that extension were written by yours truly in close collaboration with Turnip developers, and are now public.

When using classic render passes from Vulkan, the extension only adds a flag that you can use to mark a subpass as a custom resolve one. That subpass will typically use the multisample attachment as an input attachment in the fragment shader, and the single-sample attachment as the output. Nothing you could not already do without the extension.

However, by marking the subpass as a custom resolve, you’re basically promising that, in practice, you will only draw a single full-screen quad in that subpass, and use that draw to resolve the multisample image. On tiling GPUs, as used on mobile devices like the one that powers the Steam Frame, that allows the driver to make sure the whole resolve is processed tile by tile, as efficiently as possible.

Naturally, a custom resolve is not going to be faster than the fixed-function hardware that normally performs resolves, so if you’re only interested in doing a standard resolve operation, do not bother using the extension and programming a fragment shader. However, sometimes applications want to do more than just the resolve. For example, you may be interested in applying tone-mapping to the final resolved image. Without a custom resolve, you’d be forced to wait for the resolve operation to complete and perform tone-mapping in a separate render pass, which implies additional, inefficient tile load and store operations. Mobile vendors always insist that you use subpasses if possible, as that allows them to work more efficiently per tile, avoiding costly loads and stores to and from tile memory. With custom resolves, you can do the resolve and the tone-mapping operations in a single efficient shader, giving the Vulkan driver every hint about what you are doing so that GPU usage is optimal.
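
To make the "resolve plus tone-mapping in one step" idea concrete, here is a minimal CPU-side sketch in C of what a custom resolve fragment shader could compute for each pixel. It is not Vulkan code and not taken from the extension: the function names, the plain-average resolve and the Reinhard-style operator are all assumptions chosen for illustration, and a real shader would read the samples from an input attachment instead of an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tone-mapping operator (simple Reinhard curve). The choice of
 * operator is an assumption made for this sketch. */
float tone_map_reinhard(float hdr)
{
    return hdr / (1.0f + hdr);
}

/* Model of one custom-resolve shader invocation: read every sample of a
 * pixel, average them (the "resolve"), then tone-map the averaged value
 * before it is written to the single-sample output. */
float resolve_and_tone_map_pixel(const float *samples, size_t sample_count)
{
    assert(samples != NULL && sample_count > 0);

    float sum = 0.0f;
    for (size_t i = 0; i < sample_count; ++i)
        sum += samples[i];

    float resolved = sum / (float)sample_count;
    return tone_map_reinhard(resolved);
}

/* Applies the per-pixel function to a whole single-channel image. Samples
 * are stored contiguously per pixel: pixel p owns the sample_count values
 * starting at samples[p * sample_count]. */
void resolve_and_tone_map_image(const float *samples, size_t pixel_count,
                                size_t sample_count, float *out)
{
    for (size_t p = 0; p < pixel_count; ++p)
        out[p] = resolve_and_tone_map_pixel(samples + p * sample_count,
                                            sample_count);
}
```

The point of the sketch is that the resolved value never has to be stored and reloaded between the two steps. Whether to tone-map after averaging, as done here, or per sample before averaging is an application decision that the custom shader leaves open.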

The extension is based on VK_QCOM_render_pass_shader_resolve, but Valve contractors and Mesa contributors Mike Blumenkrantz and Connor Abbott spearheaded an effort to improve it and to support dynamic rendering cases. This moved the extension from being vendor-specific to being a multi-vendor one, with implementations on Turnip, RADV, ANV, lavapipe and NVIDIA, so far. This paves the way for applications to require this extension unconditionally, avoiding the need to have multiple code paths depending on extension support, and making sure their code performs optimally everywhere.

If the application uses dynamic rendering instead of classic render passes, the extension is what makes efficient custom resolves possible in the first place. The way it works with dynamic rendering is interesting and merits a few words here; the essentials are also described in the extension proposal document.

When you start a dynamic render pass, you can specify two image views for each attachment: the main view, which is multisample when doing multisample rendering, and the resolve view (which is null if the main view is already single-sample). The extension adds a new command that you can call in the middle of a dynamic render pass: vkCmdBeginCustomResolveEXT. This command remaps the output attachments of the fragment shader so that the location corresponding to a particular attachment is connected to the single-sample resolve view instead of the usual multisample view.

This remapping does not affect how input attachments work with dynamic rendering: reading from an attachment as an input attachment continues to read from the multisample view. However, writes to it are redirected to the single-sample view. This is how you can easily read values from the multisample view while writing the result directly, using a shader, to the resolve view. Again, you could already do that in Vulkan with a separate dynamic render pass and separate attachments (or descriptors) for the multisample and single-sample views, but that may lead to inefficient loads and stores on mobile GPUs, because it requires multiple render passes.

Now your dynamic render pass is effectively split into two pseudo-subpasses: regular draws come before the begin-custom-resolve command, and resolve draws come after it. As with classic render passes, the resolve part typically contains a single draw for a full-screen quad, and the fragment shader uses it to read from multisample view N as an input attachment while writing to output location N, which is now the single-sample resolve view.
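
The read/write asymmetry described above can be illustrated with a toy CPU model in C. This is only a sketch of the behavior as described in this post: none of the types or functions below exist in Vulkan, every name with a `model_` prefix is hypothetical, and the model covers a single pixel of a single attachment with an assumed sample count of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SAMPLE_COUNT 4 /* assumed sample count for the sketch */

/* One pixel of one color attachment, with the two views a dynamic render
 * pass can specify for it. */
struct attachment_model {
    float multisample_view[SAMPLE_COUNT]; /* main view */
    float resolve_view;                   /* single-sample resolve view */
};

struct render_pass_model {
    struct attachment_model *attachment;
    bool in_custom_resolve; /* false: regular draws, true: resolve draws */
};

/* Hypothetical stand-in for the effect of vkCmdBeginCustomResolveEXT:
 * from here on, output writes target the resolve view. */
void model_begin_custom_resolve(struct render_pass_model *pass)
{
    pass->in_custom_resolve = true;
}

/* Input attachment read: always comes from the multisample view, both
 * before and after the remapping. */
float model_read_input(const struct render_pass_model *pass, size_t sample)
{
    assert(sample < SAMPLE_COUNT);
    return pass->attachment->multisample_view[sample];
}

/* Fragment output write: goes to one sample of the multisample view during
 * regular draws, and to the resolve view during resolve draws (the sample
 * index is ignored in that case). */
void model_write_output(struct render_pass_model *pass, size_t sample,
                        float value)
{
    if (pass->in_custom_resolve) {
        pass->attachment->resolve_view = value;
    } else {
        assert(sample < SAMPLE_COUNT);
        pass->attachment->multisample_view[sample] = value;
    }
}

/* Model of the single resolve draw: read all samples as input, write their
 * average to the same output location, which is now the resolve view. */
void model_resolve_draw(struct render_pass_model *pass)
{
    float sum = 0.0f;
    for (size_t s = 0; s < SAMPLE_COUNT; ++s)
        sum += model_read_input(pass, s);
    model_write_output(pass, 0, sum / (float)SAMPLE_COUNT);
}
```

In this model, a caller would write the samples with `model_write_output`, call `model_begin_custom_resolve` once, and then call `model_resolve_draw`. The multisample view is left untouched by the resolve draw, mirroring the two pseudo-subpasses described above.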
