particles-webgpu
This project is a basic particle simulation running fully on the GPU, using the new WebGPU API. Particles evolve independently, following simple gravitational rules. There can be several attraction points at once. You can control one of them with your mouse by pressing the left mouse button.
Chrome 94 brings experimental support for the new WebGPU API. To enable it, a flag is available at `chrome://flags/#enable-unsafe-webgpu`. If you can't find this flag, try using Chrome Canary instead.
This is actually a WebGPU port of my old particles-gpu project. The main differences are that this one uses compute shaders, and stores the particles in a GPUBuffer instead of a texture.
See it live here.
Preview
Notes
This is my first WebGPU project, so I probably made a few mistakes. Also, I am discovering behaviours that surprise me. Here is documentation of most of them. I guess what follows is obvious to anyone familiar with modern APIs such as Vulkan (or even late OpenGL).
Explicit errors
The first thing I noticed is that the browser seems to perform tons of validity checks for each instruction, which makes debugging way easier than in WebGL. Moreover, the messages clearly specify which validity check failed, why these checks were performed, etc.
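As an illustration (a minimal sketch, not code from this project), WebGPU also lets you capture these validation errors programmatically with error scopes, assuming an `async` context:

```javascript
// Push a validation error scope, run some commands, then pop the
// scope to retrieve the first validation error they produced (if any).
device.pushErrorScope("validation");

const buffer = device.createBuffer({
    size: 16,
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
});

const error = await device.popErrorScope();
if (error !== null) {
    console.error("Validation failed:", error.message);
}
```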
var and let
The keywords `var` and `let` don't mean the same thing as in Javascript. In WGSL, `var` declares a mutable object, while `let` declares an immutable one. Specification here.
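A minimal WGSL illustration of the difference (not taken from the project):

```wgsl
fn example() {
    var counter: u32 = 0u;  // 'var': mutable, can be reassigned
    counter = counter + 1u;

    let limit: u32 = 10u;   // 'let': immutable, bound once
    // limit = 20u;         // would be a compilation error
}
```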
Alignment of struct
I encountered an unexpected behaviour in a shader, with a `struct` I used to describe a uniform buffer.
Issue
Here is how I described it in the shader (wgsl):
struct Uniforms {
singleFloat: f32;
vecFloat: vec2<f32>;
};
@group(0) @binding(0) var<uniform> uniforms: Uniforms;
and here is how I first created it and bound it (javascript):
const gpuBuffer = device.createBuffer({
    size: Float32Array.BYTES_PER_ELEMENT * (1 + 2), // one f32 and one vec2<f32>
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    mappedAtCreation: false,
});

device.createBindGroup({
    layout: pipeline.getBindGroupLayout(0),
    entries: [{
        binding: 0,
        resource: { buffer: gpuBuffer },
    }],
});
To my surprise, Chrome output the following error:
Binding size (12) is smaller than the minimum binding size (16).
- While validating entries[0] as a Buffer
- While validating [BindGroupDescriptor] against [BindGroupLayout]
- While calling [Device].CreateBindGroup([BindGroupDescriptor]).
In an attempt to fix it, I naively complied and padded the buffer to make its size 16 bytes. It "fixed" this specific issue, but things still didn't behave as I expected.
// creation
const gpuBuffer = device.createBuffer({
    size: Float32Array.BYTES_PER_ELEMENT * (1 + 2 + 1), // one f32, one vec2<f32> and one padding float
    usage: GPUBufferUsage.UNIFORM | GPUBufferUsage.COPY_DST,
    mappedAtCreation: false,
});
// ... add it to a bindgroup ... //

// writing
const a = ..., b = ..., c = ...;
const floatBuffer = new Float32Array([a, b, c]);
device.queue.writeBuffer(gpuBuffer, 0, floatBuffer.buffer);
// ... bind the bindgroup to a renderpass ... //
During the shader execution, I expected to have:
(uniforms.singleFloat == a) && (uniforms.vecFloat == vec2<f32>(b, c))
but instead got:
(uniforms.singleFloat == a) && (uniforms.vecFloat == vec2<f32>(c, 0.0))
Cause and fix
This issue came from the alignment of types in WGSL. This behaviour is described in the Structure Layout Rules and Alignment and Size sections of the WGSL spec.
Each type has its own alignment requirements. In this case, the relevant info is: `AlignOf(f32) == 4` and `AlignOf(vec2<f32>) == 8`. This means that when I write:
struct Uniforms {
singleFloat: f32;
vecFloat: vec2<f32>;
};
because of the 8-bytes alignment of vec2<f32>
, it is actually translated as:
struct Uniforms {
singleFloat: f32;
unusablePaddingFloat: f32;
vecFloat: vec2<f32>;
};
This explains why:
- Chrome was requesting a minimum binding size of 16,
- and why, when using the buffer `[a, b, c, 0]` as my uniform buffer, `uniforms.vecFloat` was filled with `(c, 0)` instead of `(b, c)`.
Since `f32` is 4-byte aligned and I don't care about the alignment/stride of the `struct` itself, the fix I used is surprisingly simple: swap the two properties:
struct Uniforms {
vecFloat: vec2<f32>;
singleFloat: f32;
};
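With this ordering, the CPU-side write from earlier lines up naturally. A quick sketch of what it becomes (note that because of the 8-byte alignment of the struct itself, the buffer may still need to be 16 bytes to satisfy the minimum binding size):

```javascript
// vecFloat at offset 0, singleFloat at offset 8:
// no implicit padding between the members anymore.
const floatBuffer = new Float32Array([b, c, a]);
device.queue.writeBuffer(gpuBuffer, 0, floatBuffer.buffer);
```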
It is worth noting that one can make the alignment and size of each property explicit by decorating it with the WGSL attributes `@align(X)` and `@size(Y)`.
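For instance, here is a sketch of how the original member order could have been kept by making the padding explicit (assuming `@size` to widen the first member to 8 bytes):

```wgsl
struct Uniforms {
    @size(8) singleFloat: f32; // occupies 8 bytes, so vecFloat starts at offset 8
    vecFloat: vec2<f32>;
};
```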
Usage of Float16
Float16 in WGSL
I was surprised to see that according to the Floating Point Evaluation section of the spec, WebGPU uses Float32 precision for shader computations (and 32-bit types in general). What a difference with WebGL, where some devices only supported `lowp`!
According to the Plain types section of the WGSL spec, the base types are `bool`, `u32`, `i32` and `f32`, but there is no `f16`.
For my computations, storing the positions and velocities as Float16 would be enough. However, since I store them in buffers, I have to use Float32, so each particle takes 16 bytes (one `vec2<f32>` for position, one `vec2<f32>` for velocity), which is a lot.
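For reference, here is a sketch of what this 16-bytes-per-particle layout looks like in a storage buffer (the names are illustrative, not necessarily the ones used in this project):

```wgsl
struct Particle {
    position: vec2<f32>; // 8 bytes
    velocity: vec2<f32>; // 8 bytes
};

struct ParticlesBuffer {
    particles: array<Particle>; // runtime-sized array of 16-byte particles
};

@group(0) @binding(0) var<storage, read_write> data: ParticlesBuffer;
```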
One way to use Float16 for storage would be to use a `rgba16float` texture and to manipulate it texel by texel with `textureLoad` and `textureStore`.
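A rough sketch of what this could look like in a compute shader, assuming a pair of such textures storing position in `xy` and velocity in `zw` (one read from, one written to):

```wgsl
@group(0) @binding(0) var srcParticles: texture_2d<f32>; // rgba16float, read with textureLoad
@group(0) @binding(1) var dstParticles: texture_storage_2d<rgba16float, write>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let coords = vec2<i32>(id.xy);
    var particle = textureLoad(srcParticles, coords, 0); // xy = position, zw = velocity

    // ... update the particle here; for instance, integrate the velocity:
    particle = vec4<f32>(particle.xy + particle.zw, particle.zw);

    textureStore(dstParticles, coords, particle);
}
```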
Float16 in Javascript
One issue I would have if I used a `rgba16float` texture for storing the position/velocity in a texel would be to initialize this texture. One way to fill a texture is to use the `GPUQueue.writeTexture` method, by passing it an `ArrayBuffer`. However, sadly, in Javascript there is no TypedArray for Float16: there are, amongst others, `Uint8Array`, `Uint16Array`, `Uint32Array`, `BigUint64Array`, `Float32Array` and `Float64Array`, but there is no `Float16Array`. I don't know why this type is lacking, but I feel like it would be useful. To avoid this issue, I would have either:
- to craft the Float16 values on the CPU side with bit manipulations (a sketch of this follows the list),
- or to initialize the texture on the GPU. Initial positions are random, and having nice randomness in shaders is a bit tricky. WGSL supports bit operations on `u32`, so I suppose a simple PCG random number generator could be implemented in a compute shader (maybe by using built-ins such as `GlobalInvocationId` as part of the seed?).
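Here is a minimal sketch of the first option: crafting the binary16 bit pattern of a float by hand. It is simplified (it truncates the mantissa instead of rounding and flushes subnormals to zero), and `toFloat16Bits` is a hypothetical helper, not an existing API:

```javascript
const f32Scratch = new Float32Array(1);
const u32Scratch = new Uint32Array(f32Scratch.buffer);

// Convert a JS number to its IEEE 754 binary16 bit pattern.
function toFloat16Bits(value) {
    f32Scratch[0] = value;
    const bits = u32Scratch[0];
    const sign = (bits >>> 16) & 0x8000;           // move sign bit to position 15
    const exponent = ((bits >>> 23) & 0xff) - 112; // re-bias exponent: -127 + 15
    const mantissa = (bits >>> 13) & 0x03ff;       // keep the 10 highest mantissa bits
    if (exponent <= 0) { return sign; }            // too small: flush to +/-0
    if (exponent >= 31) { return sign | 0x7c00; }  // too large: +/-infinity
    return sign | (exponent << 10) | mantissa;
}

// The resulting Uint16Array can then be uploaded with device.queue.writeTexture.
const texels = new Uint16Array([toFloat16Bits(0.5), toFloat16Bits(-1.25)]);
```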
Data packing
WebGPU offers nice data packing built-in functions (see spec here) such as `pack4x8unorm` and `unpack4x8unorm`, which allow compact packing of a `vec4<f32>` into a `u32` (useful for storing colors, for instance).
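For instance, a small sketch of how a color can be packed and unpacked (the function names are illustrative):

```wgsl
// Each channel is clamped to [0, 1] and stored as an 8-bit
// unsigned normalized value, so the whole color fits in a u32.
fn store_color(color: vec4<f32>) -> u32 {
    return pack4x8unorm(color);
}

// Recovers an approximation of the original color
// (8 bits of precision per channel).
fn load_color(packed: u32) -> vec4<f32> {
    return unpack4x8unorm(packed);
}
```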