Vulkan Ray Tracing Resources and Overview

Posted on .

Filed under: igalia

As you may know, I’ve been working on VK-GL-CTS for some time now. VK-GL-CTS is the Conformance Test Suite for Vulkan and OpenGL: a large collection of tests used to verify that implementations of the Vulkan and OpenGL APIs work as intended by the specification. My work has been mainly focused on the Vulkan side of things as part of Igalia's ongoing collaboration with Valve.

Last year, Khronos released the official specification of the Vulkan ray tracing extensions and I had the chance to participate in the final stages of the process by improving test coverage and fixing bugs in existing CTS tests, work that continues to this day, mixed with other tasks in my backlog.

As part of this effort I learned many bits of the new Vulkan Ray Tracing API and even provided some very minor feedback about the spec, which resulted in me being listed as a contributor to the VK_KHR_acceleration_structure extension.

Now that the waters are a bit more calm, I wanted to give you a list of resources and a small overview of the main concepts behind the Vulkan version of ray tracing.

General Overview

There are a few basic resources that can help you get acquainted with the new APIs.

  1. The official Khronos blog published an overview of the ray tracing extensions that explains some of the basic concepts like acceleration structures, ray tracing pipelines (and what their different shader stages do) and ray queries.

  2. Intel’s Jason Ekstrand gave an excellent talk about ray tracing in Vulkan at XDC 2020. I highly recommend watching it if you’re interested.

  3. For those wanting to get their hands on some code, the Khronos official Vulkan Samples repository includes a basic ray tracing sample.

  4. The official Vulkan specification text (warning: very large HTML document), while intimidating, is actually a good source to learn many new parts of the API. If you’re already familiar with Vulkan, the different sections about ray tracing and ray tracing pipelines are worth reading.

Acceleration Structures

The basic idea of ray tracing, as a tool, is that you choose an arbitrary point in space as the ray origin and a direction vector, and ask the implementation whether that ray intersects anything along the way, given a minimum and maximum distance.
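To make that idea a bit more concrete, here’s a tiny C sketch of what a ray is in this context. The types and names are my own, for illustration only, not Vulkan API code:

```c
/* Illustrative only: a ray as used by a ray tracing query. */
typedef struct {
    float origin[3];    /* arbitrary point in space */
    float direction[3]; /* direction vector */
    float tmin, tmax;   /* minimum and maximum distance along the ray */
} Ray;

/* Any candidate intersection point lies at origin + t * direction,
 * and only hits with t inside [tmin, tmax] are reported. */
static void ray_point_at(const Ray *r, float t, float out[3])
{
    for (int i = 0; i < 3; ++i)
        out[i] = r->origin[i] + t * r->direction[i];
}
```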

In a modern computer or console game the number of triangles present in a scene is huge, so you can imagine detecting intersections between them and your ray can be very expensive. The implementation typically needs to organize the scene geometry in a hierarchical tree-like structure that can be traversed more efficiently by discarding large amounts of geometry with some simple tests. That’s what an Acceleration Structure is.
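As an illustration of those simple tests, here’s the classic “slab” ray-versus-box check in plain C. This is not part of the Vulkan API (traversal is the implementation’s job); it just shows how a whole subtree of geometry can be discarded with a handful of comparisons when its bounding box is missed:

```c
#include <stdbool.h>

/* Axis-aligned bounding box enclosing some subtree of the scene. */
typedef struct { float min[3], max[3]; } AABB;

/* Slab test: does the ray, with the given origin, direction and
 * [tmin, tmax] range, intersect the box? If the box around a whole
 * subtree is missed, everything inside it can be skipped without
 * testing a single triangle. */
static bool ray_hits_aabb(const float origin[3], const float dir[3],
                          float tmin, float tmax, const AABB *box)
{
    for (int axis = 0; axis < 3; ++axis) {
        const float inv = 1.0f / dir[axis]; /* assumes dir[axis] != 0 */
        float t0 = (box->min[axis] - origin[axis]) * inv;
        float t1 = (box->max[axis] - origin[axis]) * inv;
        if (inv < 0.0f) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
        if (tmax < tmin) return false; /* slabs don't overlap: miss */
    }
    return true;
}
```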

Fortunately, you don’t have to organize the scene geometry yourself. Implementations are free to choose the best and most suitable acceleration structure format according to the underlying hardware. They will build this acceleration structure for you and give you an opaque handle to it that you can use in your app with the rest of the API. You’re only required to provide the long list of geometries making up your scene.

You may be thinking, and you’d be right, that building the acceleration structure must be a complex and costly process itself, and it is. For this reason, you must try to avoid rebuilding them completely all the time, in every frame of the app. This is why acceleration structures are divided into two types: bottom level and top level.

Bottom level acceleration structures (BLAS) contain lists of geometries and typically represent whole models in your scene: a building, a tree, an object, etc.

Top level acceleration structures (TLAS) contain lists of “pointers” to bottom level acceleration structures, together with a transformation matrix for each pointer.

In the diagram below, taken from Jason Ekstrand’s XDC 2020 talk[1], you can see the blue square representing the TLAS, the orange squares representing BLAS and the purple squares representing geometries.

Picture showing a hand-drawn cowboy, cactus and cow. A blue square surrounds the whole picture. Orange squares surround the cowboy, cactus and cow. Individual pieces of the cowboy, cactus and cow are surrounded by purple squares.

The whole idea behind this is that you may be able to build the bottom level acceleration structure for each model only once as long as the model itself does not change, and you will include this model in your scene one or more times. Each time, it will have an associated transformation matrix that will allow you to translate, rotate or scale the model without rebuilding it. So, in each frame, you may only have to rebuild the top level acceleration structure while keeping the bottom level ones intact. Other tricks you can use include rebuilding the top level acceleration structure at a reduced frame rate compared to the app, or using a simplified version of the world geometry when tracing rays instead of the more detailed model used when rendering the scene normally.
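A small C sketch of the instancing concept. The structs are illustrative, not API types, although the real per-instance data in Vulkan does carry the same kind of 3x4 row-major transform matrix (see VkTransformMatrixKHR):

```c
/* Illustrative only: a 3x4 row-major transform like the one each
 * TLAS instance carries (cf. VkTransformMatrixKHR). */
typedef struct { float m[3][4]; } Transform3x4;

/* Illustrative only: a TLAS entry is basically a "pointer" to a BLAS
 * plus a transform placing that model in the world. The same BLAS can
 * appear in many instances, each with a different transform. */
typedef struct {
    int blas_id;        /* which bottom-level structure to reference */
    Transform3x4 xform; /* per-instance placement in world space */
} Instance;

/* Apply the instance transform to a model-space point: out = M * (p, 1). */
static void transform_point(const Transform3x4 *t, const float p[3],
                            float out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = t->m[row][0] * p[0] + t->m[row][1] * p[1] +
                   t->m[row][2] * p[2] + t->m[row][3];
}
```

Moving an instance only means updating its matrix and rebuilding or updating the TLAS; the BLAS underneath stays untouched.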

Acceleration structures, ray origins and direction vectors typically use world-space coordinates.

Ray Queries

In its most basic form, you can access the ray tracing facilities of the implementation by using ray queries. Before ray tracing, Vulkan already had graphics and compute pipelines. Among the main components of those pipelines are shader programs: application-provided instructions that run on the GPU telling it what to do and, in a graphics pipeline, how to process geometry data (vertex shaders) and calculate the color of each pixel that ends up on the screen (fragment shaders).

When ray queries are supported, you can trace rays from those “classic” shader programs for any purpose. For example, to implement lighting effects in a fragment shader.

Ray Tracing Pipelines

The full power of ray tracing in Vulkan comes in the form of a completely new type of pipeline, the ray tracing pipeline, that complements the existing compute and graphics pipelines.

Most Vulkan ray tracing tutorials, including the Khronos blog post I mentioned before, explain the basics of these pipelines, including the new shader stages (ray generation, intersection, any hit, closest hit, etc) and how they work together. They cover acceleration structure traversal for each ray and how that triggers execution of a particular shader program provided by your app. The image below, taken from the official Vulkan specification[2], contains the typical representation of this traversal process.

Ray Tracing Acceleration Structure traversal diagram showing the ray generation shader initiating the traversal procedure, the miss shader called when the ray does not intersect any geometry and the intersection, any hit and closest hit shaders called when an intersection is found

The main difference between traditional graphics pipelines and ray tracing pipelines is the following. If you’re familiar with the classic graphics pipelines, you know the app decides and has full control over what is being drawn at any moment. Your command stream usually looks like this.

  1. Begin render pass (I’ll be using this depth buffer to discard overlapping geometry on the screen and the resulting pixels need to be written to this image)

  2. Bind descriptor sets (I’ll be using these textures and data buffers)

  3. Bind pipeline (This is what the whole process looks like, including the crucial part of shader programs: let me tell you what to do with each vertex and how to calculate the color of each resulting pixel)

  4. Draw this

  5. Draw that

  6. Bind pipeline (I’ll be using different shader programs for the next draws, thank you)

  7. Draw some more

  8. Draw even more

  9. Bind descriptor sets (The textures and other data will be different from now on)

  10. Bind pipeline (The shaders will be different too)

  11. Additional draws

  12. Final draws (Almost there, buddy)

  13. End render pass (I’m done)

Each draw command in the command stream instructs the GPU to draw an object and, because the app is recording that command, the app knows what that object is and the appropriate resources that need to be used to draw that object, including textures, data buffers and shader programs. Before recording the draw command, the app can prepare everything in advance and tell the implementation which shaders and resources will be used with the draw command.

In a ray tracing pipeline, the scene geometry is organized in an acceleration structure. When tracing a ray, you don’t know, in advance, which geometry it’s going to intersect. Each geometry may need a particular set of resources and even the shader programs may need to change with each geometry or geometry type.

Shader Binding Table

For this reason, ray tracing APIs need you to create a Shader Binding Table or SBT for short. SBTs represent (potentially) large arrays of shaders organized in shader groups, where each shader group has a handle that sits in a particular position in the array. The implementation will access this table, for example, when the ray hits a particular piece of geometry. The index it will use to access this table or array will depend on several parameters. Some of them come from the ray tracing command call in a ray generation shader, and others come from the index of the geometry and instance data in the acceleration structure.

There’s a formula to calculate that index and, while it’s not very complex, it will determine the way you must organize your shader binding table so it matches your acceleration structure, which can be a bit of a headache if you’re new to the process.
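For the hit shader group case, that formula boils down to a simple calculation combining the sbtRecordOffset and sbtRecordStride arguments of the traceRayEXT call with the geometry index and the per-instance SBT record offset stored in the acceleration structure. A C sketch (the helper names are mine, not part of the API):

```c
#include <stdint.h>

/* Hit group index as specified for traceRayEXT:
 *   index = instanceSbtOffset + geometryIndex * sbtRecordStride + sbtRecordOffset
 * instanceSbtOffset comes from the TLAS instance data, geometryIndex is the
 * geometry's position inside its BLAS, and sbtRecordOffset/sbtRecordStride
 * are arguments of the trace call in the ray generation shader. */
static uint32_t sbt_hit_group_index(uint32_t instance_sbt_offset,
                                    uint32_t geometry_index,
                                    uint32_t sbt_record_stride,
                                    uint32_t sbt_record_offset)
{
    return instance_sbt_offset + geometry_index * sbt_record_stride +
           sbt_record_offset;
}

/* Byte address of that record inside the hit region of the SBT. */
static uint64_t sbt_record_address(uint64_t hit_region_base,
                                   uint64_t region_stride, uint32_t index)
{
    return hit_region_base + region_stride * index;
}
```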

I highly recommend taking a look at Will Usher’s Shader Binding Table Tutorial, which includes an interactive SBT builder tool that will let you get an idea of how things work and fit together.

The Shader Binding Table is complemented in Vulkan by a Shader Record Buffer. The concept is that entries in the Shader Binding Table don’t have a fixed size that merely corresponds to the size of a shader group handle identifying what to run when the ray hits that particular piece of geometry. Instead, each table entry can be a bit larger and you can put arbitrary data after each handle. That data block is called the Shader Record Buffer, and can be accessed from shader programs when they run. They may be used, for example, to store indices to resources and other data needed to draw that particular piece of geometry, so the shaders themselves don’t have to be completely unique per geometry and can be reused more easily.
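In practice, this means the stride of each table entry is the size of the shader group handle plus your shader record data, padded to the alignment the implementation requires (reported via shaderGroupHandleSize and shaderGroupHandleAlignment in VkPhysicalDeviceRayTracingPipelinePropertiesKHR). A minimal sketch of that calculation:

```c
#include <stdint.h>

/* Round x up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t alignment)
{
    return (x + alignment - 1) & ~(alignment - 1);
}

/* Stride of one SBT entry: the shader group handle followed by the
 * shader record buffer data, padded so every entry in the array
 * respects the handle alignment required by the implementation. */
static uint64_t sbt_entry_stride(uint64_t handle_size,
                                 uint64_t record_data_size,
                                 uint64_t handle_alignment)
{
    return align_up(handle_size + record_data_size, handle_alignment);
}
```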


As you can see, ray tracing can be more complex than usual but it’s a very powerful tool. I hope the basic explanations and resources I linked above help you get to know it better. Happy hacking!


[1] The Acceleration Structure representation image with the cowboy, cactus and cow is © 2020 Jason Ekstrand and licensed under the terms of CC-BY.

[2] The Acceleration Structure traversal diagram in a ray tracing pipeline is © 2020 The Khronos Group and released under the terms of CC-BY.

Jumped from i3 to Gnome in my personal desktop

Posted on .

According to the blog archives, I had been using the i3 window manager on my personal computer since 2013. However, a few days ago I decided to switch to Gnome, for several reasons. The first one is to avoid further confusion in my muscle memory. For the last couple of years, my work laptop has been running Fedora and, when I decided which variant or environment to set up, I went with a vanilla installation of Fedora Workstation, which uses Gnome as its desktop environment.

Installing Fedora Workstation was a simple decision: I was already using Fedora in the rest of my systems and, for my work laptop, I wanted to use something tested, widely deployed and that would get out of my way. The work laptop needs to just work and I tend to avoid fiddling too much with the computers I use to get my job done. If I have a software problem, I’d prefer it to be a common issue experienced by others so I can find a solution quickly and easily.

So, as I said above, that’s what I’ve been running for the last couple of years, using it 7+ hours per day from Monday to Friday. That’s much more than the amount of time I use my personal desktop nowadays. And I had started to experience some muscle memory confusion when working on i3, like trying to switch windows using the Win+Tab keyboard shortcut that I use all the time in Gnome.

The other reason I switched is that my most common windowing setups in i3 are also easily supported under Gnome. After all these years with i3, I found myself using virtual desktops a lot to easily group windows by task and avoid minimizing applications all the time. Inside each desktop, I normally use i3’s tabbed layout to keep most of my applications full-screen. If not using the tabbed layout, it’s rare for me to use more than two windows side by side. And all of that is more or less easily available on Gnome. Virtual desktops are supported and you can easily cycle between them using Win+PageUp and Win+PageDown. And, if you want to put two windows side by side, that’s also relatively easy to do with Win+Left and Win+Right, which sticks the active window to the left or right half of the screen. Win+Shift+Left, Win+Shift+Right, Win+Shift+PageUp and Win+Shift+PageDown also allow you to move windows to the left or right screens, or to the virtual desktop above or below.

So, all in all, I did learn a few things about window management and discovered some personal tastes in all these years using i3, and I’ll be applying that in my Gnome sessions.

Slimbook Essential 14" Review

Posted on .

At the end of 2020 I decided to buy a laptop that could be used by everyone at home. The reason was my wife and I both have personal computers, but they’re both desktops, sitting in our study together with our work laptops, which is where we prefer to be when we work from home. Due to the pandemic, we were afraid that, at some point, one or both of our children would have to stay confined at home and, at least in the case of my son, would have to do some school-related activities on a computer. I couldn’t imagine one of us working in the same room where my son would be, talking to the other parent and using the computer. We needed a laptop that could be moved and easily carried around the house, and it could also be handy if we had to take a computer with us somewhere in the future.

Because it would be used by children, who are always prone to accidents, I didn’t want it to be too expensive. At the same time, since it was unlikely to be used for heavy-duty tasks, I wanted it to last as long as possible and I wanted to be able to install Linux on it without much trouble. I aimed for it to have 16GB of RAM and a medium-sized SSD. I soon realized every laptop I could find on the market with 16GB of RAM had a premium price tag on it, and those that did not were out of stock.

Those of you living in the US can get very decent laptops with Linux preinstalled from Dell (but Linux models are usually expensive) or from other more specialized shops like System76. Fortunately for me, there’s a small company in Spain called Slimbook that offers affordable and well-built laptops and other computers oriented to Linux users, so I got myself one of the most modest models, the Slimbook Essential 14, with an Intel i5 processor, 16GB of RAM and a 500GB SSD. Its final price, shipping included, was below 700 euros.

It’s very nicely built, has an aluminum body and a comfortable trackpad and keyboard. Connectivity is pretty good too, sporting a couple of USB 3.2 ports (USB-C and the classic USB-A), another extra USB 2.0 type A port, an HDMI connector, audio jack and even a Gigabit Ethernet port (yay!).

On the not-so-good side, because they’re a small shop and work basically on demand, getting your laptop built, tested and delivered to your door usually takes several weeks (normally 4-6, around 6 in my case). The global pandemic has not helped them at all, with some supply issues that sometimes delayed their usual schedule a bit.

Apart from that, the speakers are just there. They’re good enough if you need someone else to listen to something with you, but using headsets or earbuds is preferable, in my opinion. Laptop webcams are usually not very good and, in this model, while the quality is not atrocious, it falls a bit on the lower end. Nevertheless, it’s good enough for videoconferencing, especially if your room is nicely lit and the other end is going to look at you on a phone, tablet or a small-ish square in a larger screen.

On the bright side, apart from the good build quality and connectivity I mentioned above, the bottom cover of the laptop is easily removed and gives you easy access to the RAM and SSD slots, making them easy to upgrade and repair if needed.

All in all, I’m pretty satisfied with the purchase so far, so I’ll wait and see how it evolves as time passes.

Year-end donations round, 2020 edition

Posted on .

There’s no doubt 2020 has been a special year, often for the worse. Due to the ongoing global pandemic, many people have lost their jobs, have reduced healthcare options or have had and will continue to have trouble getting food and feeding their families. And this is on top of many other ongoing problems, which have only gotten worse this year. For those of us lucky enough to have something to spare, donating to NGOs and putting our grain of sand in the pile is critical.

On a personal level, I donate to several “classic” NGOs, so to speak. On a professional level, at Igalia we collaborate with a wide variety of NGOs, in some cases on very specific projects.

However, at the end of the year I always like to make a small round of personal donations to projects and organizations which are also important for our digital lives on a daily basis. This year I’ve selected the following ones:

  • EFF has done a superb job, as usual. Apart from their crucial defense of civil liberties and digital rights, we owe them many things you may be using every day. Let’s not forget they were involved in starting the Let’s Encrypt project and created Privacy Badger and HTTPS Everywhere, among other tools. This year, they went way beyond their call of duty representing the current maintainers of youtube-dl and helped get the project restored on GitHub.

  • Signal is, to me, an essential tool I use every day to communicate with my friends and family. It’s a free and open source software project providing an easy-to-use messaging application that pushes the state of the art in end-to-end encryption for text messages and audio and video calls.

  • Internet Archive is, to me, another essential project. Somewhat connected in spirit to youtube-dl, it’s playing a critical role in cultural preservation and providing free access to millions of works of art and science.

  • Free Software Foundation Europe promotes free software in the European Union, also running campaigns to increase its use in public administrations and their computers, as well as attempting to encourage publicly-funded projects to be released as free software.

I didn’t donate to Wikipedia this year because I prefer to chip in when they run their fundraising campaigns. In the past I’ve also donated to Mozilla but I understand it may be a bit controversial. The best thing you can do for Mozilla and for the open web is to keep using and promoting Firefox, in my humble opinion.

In addition, I encourage you to donate to small free and open source software projects you may be using every day, where the impact of many small donations can be significant. I was about to donate money to uBlock Origin, but they politely reject donations in their README file. However, maybe you develop software professionally on Windows and happen to use WinSCP very frequently, for example. Just think of the free software projects you use every day. Some of them probably don’t have large corporate sponsors behind them. They may also offer support contracts as their main revenue source, and these could be useful for your employer.

Embedding YouTube videos without making your site fatter

Posted on . Updated on .

Making this site lighter and improving load times for my readers has been a priority for some years. I’ve stopped embedding web fonts, I’ve started using Unicode icons instead of relying on Font Awesome and I’ve also started loading Disqus comments on demand, which also has a positive impact on the privacy of anyone reading these pages.

However, on a few occasions I’ve wanted to embed a YouTube video in one of the posts, and I had never realized this can heavily impact page sizes and load times. Take, for example, the following HTML document.

<!DOCTYPE html>
<html lang="en">
<head><title>Embedded YouTube Video</title></head>
<body>
<!-- VIDEO_ID below is a placeholder: the original video identifier is elided. -->
<iframe width="560" height="315"
    src="https://www.youtube.com/embed/VIDEO_ID"
    frameborder="0"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
    allowfullscreen>
</iframe>
</body>
</html>

The iframe code you see above is copied almost verbatim from the code YouTube gives you when you right-click on the video and select “Copy embed code”. If you store that document locally and open it with Firefox using its network inspection tool, you’ll discover it attempts to load, as of the time this text is being written, around 1.84MB of data, and that’s with uBlock Origin blocking some additional requests. The largest piece of that is YouTube’s base JavaScript library.

Firefox network inspection tool showing the base YouTube JavaScript library weighing 1.46MB

On the one hand, it’s likely many people already have that file, and some others, in their browser cache. On the other hand, I don’t feel comfortable making that assumption and throwing my hands up. This prompted me to try to find a way to embed YouTube videos without adding so much data by default, and it turns out other people have found solutions to the problem, which I’ve slightly tuned and am re-sharing here. Instead of using the previous iframe code, I use something slightly more convoluted.

<!DOCTYPE html>
<html lang="en">
<head><title>Embedded YouTube Video</title></head>
<body>
<!-- VIDEO_ID below is a placeholder: the original video identifier is elided. -->
<iframe width="560" height="315"
    src="https://www.youtube.com/embed/VIDEO_ID"
    title="Video: Why Can't You Download Videos on YouTube? How a 20-Year-Old Law Stops youtube-dl Users AND Farmers"
    frameborder="0"
    allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"
    allowfullscreen
    srcdoc="
        <style>
            * {
                padding: 0;
                margin: 0;
                overflow: hidden;
            }
            html, body {
                height: 100%;
            }
            img, span {
                /* All elements take the whole iframe width and are vertically centered. */
                position: absolute;
                width: 100%;
                top: 0;
                bottom: 0;
                margin: auto;
            }
            span {
                /* This mostly applies to the play button. */
                height: 1.5em;
                text-align: center;
                font-family: sans-serif;
                font-size: 500%;
                color: white;
            }
        </style>
        <!-- The whole frame is a link to the proper embedded page with autoplay. -->
        <a href='https://www.youtube.com/embed/VIDEO_ID?autoplay=1'>
            <!-- Video thumbnail used as the background image. -->
            <img src='https://img.youtube.com/vi/VIDEO_ID/hqdefault.jpg'
                alt='Video: Why Cant You Download Videos on YouTube? How a 20-Year-Old Law Stops youtube-dl Users AND Farmers'>
            <!-- Darken preview image laying this on top. Also makes the play icon stand out. -->
            <span style='height: 100%; background: black; opacity: 75%'></span>
            <!-- Play icon. -->
            <span>▶</span>
        </a>
    "></iframe>
</body>
</html>

The first few lines are almost identical up to the src attribute. I’ve only added the title attribute for accessibility reasons. Special care needs to be taken if the video title contains double or single quotes. The src attribute contains the classic URL, but it’s only used as a fallback for browsers that do not support the srcdoc attribute that starts on the next line. srcdoc allows you to specify an inline document that will be used instead of loading the frame from an external URL. Support for it is widespread nowadays. As you can see, the embedded inline document contains a style element followed by an a element pointing to the real embedded iframe, only this time with the autoplay parameter set to 1, so the video will start playing as soon as the frame is loaded.

The provided style sheet makes sure the link fills the entire embedded iframe, so clicking anywhere on it loads the video and starts playing it. Inside the link you’ll find three items. The first one is the video thumbnail in YouTube’s “high quality” version, which is actually lightweight and gets the job done as the background image (my guess is the name predates HD and FHD content). On top of that image I’ve placed a span element with a black background color and 75% opacity that, again, fills the whole iframe and darkens the background image, making the play button stand out. Finally, another span element is laid out on top of those, containing the Unicode character for a triangle pointing to the right in a large font. This serves as the aforementioned play button and gives readers the visual clue they need to click on the video to start playing it.

With those changes and for this particular video, the browser only needs to load around 9KB of data. You can see what it looks like below.

Edit 2020-12-08: The autoplay parameter is ignored by YouTube on mobile and users will need to tap twice to watch the video: once to click on your link and load the iframe and once more to start playing the video from it. Still, I think double tapping is worth saving almost 2MB by default. On an iPad I only need to tap once, so it depends on the exact device and what’s considered “mobile”.