Compute shaders provide a mechanism for performing general-purpose calculations on the GPU, outside the traditional graphics pipeline. This capability is essential for data-parallel tasks that do not directly map to vertex or fragment processing, such as physics simulations, advanced culling, and post-processing.
Differences Between Compute and Graphics Shaders
The graphics pipeline is a fixed sequence of stages, some programmable (such as vertex and fragment) and some fixed-function, designed to transform geometry into pixels. Compute shaders use a more flexible execution model:
- No Fixed Input/Output: Unlike vertex shaders (which receive attributes) or fragment shaders (which output colors), compute shaders read from and write to arbitrary shader storage buffers (SSBOs) or images.
- Workgroup Execution: Compute work is organized into "workgroups." Each workgroup contains multiple "local invocations" (threads). Invocations in the same workgroup can share data through shared memory and synchronize with barriers.
- Explicit Dispatch: The application must explicitly define the number of workgroups to execute in three dimensions (X, Y, Z), rather than relying on geometry or screen resolution.
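The workgroup arithmetic behind these points can be sketched in plain C. This is an illustrative CPU-side sketch, not part of SBgl: the helper names `group_count` and `global_invocation_id` are hypothetical, and the local size is assumed to match whatever the shader declares. The second helper mirrors the GLSL relation `gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID` along one axis.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of workgroups needed to cover item_count
 * items when each workgroup runs local_size invocations.
 * Ceiling division, written so it cannot overflow. */
uint32_t group_count(uint32_t item_count, uint32_t local_size)
{
    assert(local_size > 0);
    return item_count / local_size + (item_count % local_size != 0);
}

/* Hypothetical helper: global index of one invocation along one axis,
 * given its workgroup index and its index inside that workgroup. */
uint32_t global_invocation_id(uint32_t group_id, uint32_t local_size,
                              uint32_t local_id)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For 1000 items and a local size of 64, `group_count` returns 16, which launches 1024 invocations. The last 24 have no item to process, so a shader dispatched this way would typically compare its global index against the item count and return early.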
Using the Compute API
SBgl provides a streamlined API for creating and executing compute pipelines.
Creating a Compute Pipeline
A compute pipeline is created from a single shader stage. The shader must be loaded with the SBGL_SHADER_STAGE_COMPUTE stage.
sbgl_Shader sbgl_LoadShader(sbgl_Context *ctx, sbgl_ShaderStage stage, const uint32_t *bytecode, size_t size)
Loads a shader from SPIR-V bytecode.
sbgl_ComputePipeline sbgl_CreateComputePipeline(sbgl_Context *ctx, sbgl_Shader shader)
Creates a compute pipeline.
uint32_t sbgl_Shader
Handle for a shader module.
sbgl_ShaderStage value: SBGL_SHADER_STAGE_COMPUTE
uint32_t sbgl_ComputePipeline
Handle for a compute pipeline.
Dispatching Work
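Execution is triggered using sbgl_DispatchCompute, which runs the pipeline most recently bound with sbgl_BindComputePipeline. Its three parameters specify the number of workgroups to launch along the X, Y, and Z axes.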
void sbgl_BindComputePipeline(sbgl_Context *ctx, sbgl_ComputePipeline pipeline)
Binds a compute pipeline for subsequent dispatch calls.
void sbgl_DispatchCompute(sbgl_Context *ctx, uint32_t groupCountX, uint32_t groupCountY, uint32_t groupCountZ)
Dispatches a compute workload.
Memory Synchronization
Because the GPU may execute commands asynchronously, synchronization is required when one operation depends on the results of another. sbgl_MemoryBarrier ensures that memory writes are visible to subsequent operations.
void sbgl_MemoryBarrier(sbgl_Context *ctx, sbgl_BarrierType type)
Injects a memory barrier to synchronize compute and graphics operations.
sbgl_BarrierType values:
SBGL_BARRIER_GRAPHICS_TO_COMPUTE
SBGL_BARRIER_COMPUTE_TO_GRAPHICS
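The calls documented in this section might fit together roughly as follows. This is a sketch under stated assumptions, not SBgl's reference usage. The typedefs, enum values, context struct, and recording stub bodies at the top are stand-ins invented so the example builds without the library; only the function names and signatures come from the entries above. `create_pass` and `run_pass` are hypothetical helper names, the local size of 64 is arbitrary, and the choice of SBGL_BARRIER_COMPUTE_TO_GRAPHICS before drawing is inferred from the enumerator's name. Whether `size` is measured in bytes or 32-bit words is not stated above; the sketch passes the value through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Stand-ins for the SBgl header (ASSUMPTION) ------------------------
 * Function names and signatures follow the reference entries above. The
 * enum values, the context layout, and the recording stub bodies are
 * invented so the sketch builds without the library. */
typedef uint32_t sbgl_Shader;
typedef uint32_t sbgl_ComputePipeline;
typedef enum { SBGL_SHADER_STAGE_COMPUTE = 1 } sbgl_ShaderStage;
typedef enum {
    SBGL_BARRIER_GRAPHICS_TO_COMPUTE = 1,
    SBGL_BARRIER_COMPUTE_TO_GRAPHICS = 2
} sbgl_BarrierType;

typedef struct sbgl_Context {
    char     calls[16];   /* one letter per recorded call, in order */
    size_t   call_count;
    uint32_t groups[3];   /* size of the most recent dispatch */
} sbgl_Context;

static void record(sbgl_Context *ctx, char tag)
{
    assert(ctx->call_count < sizeof ctx->calls);
    ctx->calls[ctx->call_count++] = tag;
}

sbgl_Shader sbgl_LoadShader(sbgl_Context *ctx, sbgl_ShaderStage stage,
                            const uint32_t *bytecode, size_t size)
{
    (void)stage; (void)bytecode; (void)size;
    record(ctx, 'L');
    return 1;
}

sbgl_ComputePipeline sbgl_CreateComputePipeline(sbgl_Context *ctx,
                                                sbgl_Shader shader)
{
    (void)shader;
    record(ctx, 'C');
    return 1;
}

void sbgl_BindComputePipeline(sbgl_Context *ctx, sbgl_ComputePipeline pipeline)
{
    (void)pipeline;
    record(ctx, 'B');
}

void sbgl_DispatchCompute(sbgl_Context *ctx, uint32_t groupCountX,
                          uint32_t groupCountY, uint32_t groupCountZ)
{
    ctx->groups[0] = groupCountX;
    ctx->groups[1] = groupCountY;
    ctx->groups[2] = groupCountZ;
    record(ctx, 'D');
}

void sbgl_MemoryBarrier(sbgl_Context *ctx, sbgl_BarrierType type)
{
    (void)type;
    record(ctx, 'M');
}

/* ---- The sketch itself ------------------------------------------------- */

/* One-time setup: load the compute stage and build the pipeline. */
sbgl_ComputePipeline create_pass(sbgl_Context *ctx, const uint32_t *spirv,
                                 size_t spirv_size)
{
    sbgl_Shader shader =
        sbgl_LoadShader(ctx, SBGL_SHADER_STAGE_COMPUTE, spirv, spirv_size);
    return sbgl_CreateComputePipeline(ctx, shader);
}

/* Per frame: bind, dispatch enough groups to cover item_count, then issue
 * a barrier so later draw calls can see what the shader wrote. */
void run_pass(sbgl_Context *ctx, sbgl_ComputePipeline pipeline,
              uint32_t item_count)
{
    const uint32_t local_size = 64; /* assumed to match the shader's local_size_x */
    uint32_t groups = item_count / local_size + (item_count % local_size != 0);

    sbgl_BindComputePipeline(ctx, pipeline);
    sbgl_DispatchCompute(ctx, groups, 1, 1);
    sbgl_MemoryBarrier(ctx, SBGL_BARRIER_COMPUTE_TO_GRAPHICS);
}
```

Splitting setup from the per-frame path keeps shader loading and pipeline creation out of the render loop. In a real program the stand-in block would be replaced by the SBgl header.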
GPU-Driven Frustum Culling
One of the most powerful applications of compute shaders in SBgl is GPU-driven culling. In traditional engines, the CPU iterates over every object to determine visibility. For scenes with millions of voxels or instances, this becomes a major bottleneck.
SBgl shifts this responsibility to the GPU:
- Visibility Testing: A compute shader tests each chunk or object against the camera's view frustum.
- Indirect Draw Generation: The compute shader populates a buffer with VkDrawIndexedIndirectCommand structures. Only visible objects are added to this buffer.
- Single Draw Call: The CPU issues one vkCmdDrawIndexedIndirect call. The GPU then renders exactly the number of visible instances determined by the compute shader.
This approach minimizes CPU overhead and eliminates the need to transfer visibility results back from the GPU.
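The culling logic can be illustrated with a CPU reference in C, where each loop iteration stands in for one compute invocation. This is a sketch under assumptions, not SBgl's shader: the `Plane` and `Chunk` types, the bounding-sphere test, and the function names are hypothetical, and the command struct is a local field-for-field mirror of Vulkan's VkDrawIndexedIndirectCommand so the example builds without the Vulkan headers. How the resulting count reaches the draw call depends on SBgl internals not described here.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of VkDrawIndexedIndirectCommand (same fields, same order). */
typedef struct {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
} DrawIndexedIndirectCommand;

/* Hypothetical types: a point p is inside a plane when n.p + d >= 0. */
typedef struct { float nx, ny, nz, d; } Plane;
typedef struct {
    float    x, y, z, radius;   /* bounding sphere */
    uint32_t indexCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
} Chunk;

/* A bounding sphere is rejected only if it lies entirely behind a plane. */
int sphere_visible(const Plane planes[6], const Chunk *c)
{
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].nx * c->x + planes[i].ny * c->y +
                     planes[i].nz * c->z + planes[i].d;
        if (dist < -c->radius)
            return 0;
    }
    return 1;
}

/* Appends one draw command per visible chunk and returns how many were
 * written. On the GPU, the `count` increment would be an atomic add on a
 * counter stored in a buffer. */
uint32_t cull_chunks(const Plane planes[6], const Chunk *chunks,
                     uint32_t chunk_count, DrawIndexedIndirectCommand *out,
                     uint32_t out_capacity)
{
    assert(out_capacity >= chunk_count);
    uint32_t count = 0;
    for (uint32_t i = 0; i < chunk_count; ++i) {
        if (!sphere_visible(planes, &chunks[i]))
            continue;
        out[count].indexCount    = chunks[i].indexCount;
        out[count].instanceCount = 1;
        out[count].firstIndex    = chunks[i].firstIndex;
        out[count].vertexOffset  = chunks[i].vertexOffset;
        out[count].firstInstance = i; /* lets a shader look up per-chunk data */
        ++count;
    }
    return count;
}
```

Writing the chunk index into `firstInstance` is one common way to let the vertex shader find per-chunk data such as a transform; other layouts are equally valid.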
Other Applications
The compute API supports various other advanced techniques:
- Particle Systems: Simulating millions of particles with complex physics (gravity, collisions) entirely on the GPU.
- Post-Processing: Implementing sophisticated image filters like Bloom, Depth of Field, or Temporal Anti-Aliasing (TAA) through compute-based image manipulation.
- Terrain Generation: Procedurally generating heightmaps or voxel data on the fly based on player proximity.
- Physics Integration: Resolving collisions and constraints for large-scale simulations directly within the graphics memory space.