This summer, I engineered a series of high-performance, real-time particle simulations using Vulkan's compute shader capabilities. The project explores the power of GPU-accelerated computation by moving the entire physics simulation from the traditional vertex shader to a dedicated compute pipeline. This architecture enables complex particle interactions on a massive scale, rendering over 2 million particles in a single simulation with smooth, interactive frame rates.
The simulations include three distinct behaviors:
- Classical Newtonian Physics: A foundational simulation of particles moving with basic velocity and position integration.
- Multi-Frequency Wave Simulation: Creates complex animated surfaces using summed sine waves with independent parameters.
- Boids Flocking Simulation: Simulates the emergent flocking behavior of birds (alignment, cohesion, and separation).
Technical Architecture & Key Features:
1. Dual-Pipeline Vulkan Setup:
- Compute Pipeline: A dedicated pipeline that runs the compute shader for the selected simulation (comp.spv, comp_boid.spv, or comp_gerstner.spv). This shader has exclusive read/write access to the particle data (positions, velocities, colors) stored in Shader Storage Buffer Objects (SSBOs). It executes in parallel workgroups, updating the state of thousands of particles simultaneously every frame.
- Graphics Pipeline: A standard rendering pipeline that consumes the results of the compute shader. It reads the updated particle data from the SSBOs and renders the particles as point primitives using a simple vertex/fragment shader pair (vert.spv, frag.spv).
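As a rough illustration of how a particle count maps onto workgroups, the sketch below computes how many groups to dispatch so that each particle gets one invocation. The helper name `workgroupCount` and the local size of 256 used in the usage note are assumptions for illustration; the project's shaders may declare a different `local_size_x`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the project): number of workgroups needed so
// that every particle is covered by one invocation. localSize is assumed to
// match the local_size_x declared in the compute shader.
constexpr std::uint32_t workgroupCount(std::uint32_t particleCount,
                                       std::uint32_t localSize) {
    // Round up so a partially filled last group still covers the tail.
    return (particleCount + localSize - 1) / localSize;
}
```

With 2,000,000 particles and an assumed local size of 256, `workgroupCount` gives 7813 groups, which would be passed as the X group count to `vkCmdDispatch`. Because the last group is only partly filled, the shader would also need to skip invocations whose index is past the particle count.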
2. Synchronization and Data Flow:
Managing the data race between the compute shader writing to the SSBO and the graphics shader reading from it was a critical challenge. The solution implements a robust synchronization strategy using Vulkan semaphores and fences:
- A computeFinishedSemaphore signals that the compute shader has completed its work and the SSBOs are safe to read.
- The graphics pipeline waits on this semaphore before beginning its rendering pass, ensuring correct frame-to-frame data dependency.
This prevents the graphics queue from drawing particles while the compute queue is still updating them, a classic producer-consumer problem solved at the GPU level.
3. Particle Data Management:
A double-buffering technique with MAX_FRAMES_IN_FLIGHT sets of SSBOs is used. The compute shader reads from the previous frame's SSBO and writes to the current frame's SSBO, enabling accurate simulation steps that depend on the last state (e.g., velocity integration).
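The read/write selection described above can be sketched as a small index calculation. This is a minimal sketch, assuming MAX_FRAMES_IN_FLIGHT is 2 (the write-up only names the constant); `BufferPair` and `selectBuffers` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value; the write-up names the constant but not its value.
constexpr std::size_t MAX_FRAMES_IN_FLIGHT = 2;

struct BufferPair {
    std::size_t readIndex;   // SSBO holding the previous frame's state
    std::size_t writeIndex;  // SSBO receiving the current frame's state
};

// Hypothetical helper: choose which SSBO to read and which to write
// for the frame slot currentFrame.
constexpr BufferPair selectBuffers(std::size_t currentFrame) {
    // Adding MAX_FRAMES_IN_FLIGHT before subtracting 1 avoids unsigned
    // underflow when currentFrame is 0.
    return {(currentFrame + MAX_FRAMES_IN_FLIGHT - 1) % MAX_FRAMES_IN_FLIGHT,
            currentFrame % MAX_FRAMES_IN_FLIGHT};
}
```

For frame slot 0 this reads from buffer 1 and writes to buffer 0, and for frame slot 1 the roles swap, so the compute shader never reads the buffer it is currently writing.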
4. Interactive Camera System:
To allow for immersive inspection of the simulations, I implemented a first-person camera controller using GLFW input callbacks. The system processes mouse movement for looking around and keyboard input (WASD, Space, Shift) for moving through the 3D space. The view and projection matrices are passed to both the graphics and compute pipelines via a Uniform Buffer Object (UBO).
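The core of such a camera controller can be sketched without any windowing code. The version below is an assumption-laden illustration, not the project's implementation: the `Camera` struct, the yaw/pitch convention (yaw of -90 degrees looking down negative Z), the 89-degree pitch clamp, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical camera state; angles are in degrees.
struct Camera {
    Vec3 position{0.0f, 0.0f, 0.0f};
    float yawDeg = -90.0f;  // assumed convention: -90 looks down negative Z
    float pitchDeg = 0.0f;
};

// Convert the yaw/pitch angles into a unit-length view direction.
inline Vec3 frontVector(const Camera& cam) {
    const float kDegToRad = 3.14159265358979f / 180.0f;
    const float yaw = cam.yawDeg * kDegToRad;
    const float pitch = cam.pitchDeg * kDegToRad;
    return {std::cos(yaw) * std::cos(pitch),
            std::sin(pitch),
            std::sin(yaw) * std::cos(pitch)};
}

// Apply a mouse delta. Pitch is clamped so the view cannot flip over the poles.
inline void applyMouseDelta(Camera& cam, float dx, float dy, float sensitivity) {
    cam.yawDeg += dx * sensitivity;
    cam.pitchDeg -= dy * sensitivity;  // screen Y grows downward
    if (cam.pitchDeg > 89.0f) cam.pitchDeg = 89.0f;
    if (cam.pitchDeg < -89.0f) cam.pitchDeg = -89.0f;
}

// Move along the view direction; forwardInput is +1 (W), -1 (S), or 0.
inline void moveForward(Camera& cam, float forwardInput, float speed, float dt) {
    const Vec3 f = frontVector(cam);
    const float step = forwardInput * speed * dt;
    cam.position.x += f.x * step;
    cam.position.y += f.y * step;
    cam.position.z += f.z * step;
}
```

In a GLFW application, `applyMouseDelta` would typically be driven from the cursor position callback and `moveForward` from the per-frame key state, with the resulting direction and position used to build the view matrix that goes into the UBO.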
1. Classical Newtonian Physics
Before moving on to the more complex particle simulations, I started with a basic random particle system that updates the position and velocity of each particle in a compute shader. Each particle is given an initial random position, velocity, and color. When a particle reaches the edge of the simulation space, it is reflected back.
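A CPU-side sketch of the per-particle update may make the integrate-and-reflect step concrete. It assumes explicit Euler integration and a cubic simulation space spanning [-bound, +bound] on each axis; the struct and function names are hypothetical, and the actual shader may handle the boundary differently (for example, by clamping instead of mirroring the overshoot).

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Particle { Vec3 position; Vec3 velocity; };

// Integrate one axis and reflect off the walls at -bound and +bound.
inline void stepAxis(float& pos, float& vel, float bound, float dt) {
    pos += vel * dt;  // explicit Euler position update
    if (pos > bound) {
        pos = 2.0f * bound - pos;  // mirror the overshoot back inside
        vel = -vel;                // reverse direction on this axis
    } else if (pos < -bound) {
        pos = -2.0f * bound - pos;
        vel = -vel;
    }
}

// One simulation step for a single particle (one GPU thread's work).
inline void stepParticle(Particle& p, float bound, float dt) {
    stepAxis(p.position.x, p.velocity.x, bound, dt);
    stepAxis(p.position.y, p.velocity.y, bound, dt);
    stepAxis(p.position.z, p.velocity.z, bound, dt);
}
```

On the GPU, each invocation would run the equivalent of `stepParticle` for the particle at its global invocation index.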


2. Multi-Frequency Wave Simulation
This compute shader simulates a fluid-like vertical wave motion across the particle system using a sinusoidal height function. Each thread updates the Y-position of a single particle by sampling a 2D sine wave moving across the XZ-plane. The approach is a simplification of the classic Gerstner wave equation: it uses plain sine waves and controls only the vertical displacement, which is enough for a visually convincing wave effect.
- Wave Equation Evaluation
  - For each particle:
    - The shader calculates the phase of the wave at the particle's XZ position using a dot product with a directional wave vector.
    - It computes the wave height as a sine function of the spatial phase and simulation time.
    - The result is scaled by a tunable amplitude to produce the vertical offset.
- Original Height Preservation
  - To allow the wave to oscillate around a fixed base Y-position (useful for terrain-following or water level control), each particle's initial Y-position is stored in velocity.y. The calculated wave height is added to this base.
- Parallel Update and Write
  - Updated positions are written to an output buffer, again following a ping-pong SSBO pattern (particlesIn[] to particlesOut[]) to prevent race conditions and enable smooth frame-to-frame motion.
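The height evaluation described above can be sketched on the CPU as follows. This is a minimal sketch under stated assumptions: the `Wave` parameter set (direction, spatial frequency, speed, amplitude) and the function names are hypothetical, and the exact phase formula in the shader may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-wave parameters; the shader's actual uniforms may differ.
struct Wave {
    float dirX, dirZ;  // direction of travel in the XZ-plane (assumed unit length)
    float frequency;   // spatial frequency, radians per unit distance
    float speed;       // temporal frequency, radians per second
    float amplitude;   // peak vertical offset
};

// Sum of sine waves evaluated at position (x, z) and time t.
inline float waveHeight(const std::vector<Wave>& waves, float x, float z, float t) {
    float h = 0.0f;
    for (const Wave& w : waves) {
        // Spatial phase from the dot product with the wave direction,
        // plus a time term that makes the wave travel.
        const float phase = (w.dirX * x + w.dirZ * z) * w.frequency + t * w.speed;
        h += w.amplitude * std::sin(phase);
    }
    return h;
}

// baseY plays the role of the initial Y-position kept in velocity.y.
inline float displacedY(float baseY, const std::vector<Wave>& waves,
                        float x, float z, float t) {
    return baseY + waveHeight(waves, x, z, t);
}
```

Passing several `Wave` entries with different directions and frequencies gives the summed, multi-frequency surface, while a single entry reproduces one travelling sine wave.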



3. Boids Flocking Simulation
To simulate realistic flocking behavior among particles (boids), I implemented a GPU-accelerated algorithm using a compute shader. The algorithm is based on Craig Reynolds’ classic boid model, which consists of three primary behavioral rules: separation, alignment, and cohesion.
Each GPU thread is responsible for updating a single boid. For each frame, the shader performs the following steps:
- Neighborhood Search:
  - Each boid iterates over all other boids to identify neighbors within a specified perception radius. For each neighbor:
    - A separation vector is computed to steer away from nearby boids.
    - An alignment vector accumulates the average velocity of nearby boids.
    - A cohesion vector accumulates the positions of nearby boids to steer toward the group's center.
- Force Calculation:
  - The separation, alignment, and cohesion vectors are weighted using tunable uniform parameters (separationWeight, alignmentWeight, cohesionWeight) and combined to produce an updated velocity.
- Velocity Limiting:
  - The resulting velocity is clamped to a maximum speed to maintain visual and physical stability.
- Position Update:
  - The boid’s position is updated based on its new velocity and the global deltaTime.
- Boundary Wrapping:
  - To keep boids within a finite 3D space, a simple wrap-around boundary condition is applied. Boids that move past one edge of the simulation space reappear on the opposite side, maintaining continuous motion.
- Result Output:
  - Updated position, velocity, and color are written to a second Shader Storage Buffer Object (SSBO) in a ping-pong fashion, which avoids race conditions during simultaneous read/write operations.
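The steps above can be sketched as a CPU reference for the work of one GPU thread. This is an illustration under assumptions, not the shader itself: the `Params` layout, the cubic bounds of half-width `halfExtent`, the unit-vector form of the separation term, and the way the three terms are added to the velocity are hypothetical choices; only the weight names and deltaTime come from the description. Color is omitted for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline float length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

struct Boid { Vec3 position; Vec3 velocity; };

struct Params {
    float perceptionRadius;
    float separationWeight;
    float alignmentWeight;
    float cohesionWeight;
    float maxSpeed;
    float halfExtent;  // assumed: space spans [-halfExtent, +halfExtent] per axis
    float deltaTime;
};

// Wrap one coordinate to the opposite side when it leaves the bounds.
inline float wrap(float v, float h) {
    if (v > h) return v - 2.0f * h;
    if (v < -h) return v + 2.0f * h;
    return v;
}

// Compute the new state of boid i from the read-only input buffer.
// The caller stores the result in a separate output buffer (ping-pong).
inline Boid updateBoid(const std::vector<Boid>& in, std::size_t i, const Params& p) {
    const Boid self = in[i];
    Vec3 sep{0.0f, 0.0f, 0.0f}, velSum{0.0f, 0.0f, 0.0f}, posSum{0.0f, 0.0f, 0.0f};
    int count = 0;
    for (std::size_t j = 0; j < in.size(); ++j) {  // brute-force neighbor search
        if (j == i) continue;
        const Vec3 offset = self.position - in[j].position;
        const float d = length(offset);
        if (d >= p.perceptionRadius || d <= 0.0f) continue;
        sep = sep + offset * (1.0f / d);  // unit vector pointing away from neighbor
        velSum = velSum + in[j].velocity;
        posSum = posSum + in[j].position;
        ++count;
    }
    Vec3 v = self.velocity;
    if (count > 0) {
        const float inv = 1.0f / static_cast<float>(count);
        const Vec3 align = velSum * inv - self.velocity;   // toward average heading
        const Vec3 cohere = posSum * inv - self.position;  // toward group center
        v = v + sep * p.separationWeight + align * p.alignmentWeight
              + cohere * p.cohesionWeight;
    }
    const float speed = length(v);
    if (speed > p.maxSpeed) v = v * (p.maxSpeed / speed);  // clamp to max speed
    Vec3 pos = self.position + v * p.deltaTime;
    pos = {wrap(pos.x, p.halfExtent), wrap(pos.y, p.halfExtent), wrap(pos.z, p.halfExtent)};
    return {pos, v};
}
```

Because every boid scans every other boid, the cost grows with the square of the boid count. Writing each result to a separate output buffer mirrors the ping-pong SSBO scheme, since all reads see the previous frame's state.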



