SBgl 0.1.0
A graphics framework in C99
Voxel Rendering Architecture

The system uses a "Pure Procedural" GPU-driven pipeline to render massive voxel environments with minimal CPU overhead. By leveraging Multi-Draw Indirect (MDI) and Buffer Device Address (BDA), both available in core Vulkan 1.3, the engine synthesizes geometry mathematically on the GPU rather than streaming it from host memory.

Comparison: Procedural vs. Traditional Meshing

Most voxel engines (like Minecraft) build chunk meshes on the CPU. The procedural approach used in SBgl gives up per-face culling in exchange for lower memory bandwidth and fewer CPU cycles.

Feature        | CPU-Side Meshing (Traditional)          | SBgl Procedural Pipeline
Data Flow      | CPU generates mesh -> GPU uploads VBO   | GPU synthesizes mesh from metadata
Memory Usage   | High (stores vertices for every chunk)  | Near-Zero (only stores height/metadata)
CPU Load       | High (Greedy meshing, face culling)     | Zero (CPU only submits a draw count)
GPU Load       | Low (pre-culled geometry)               | Moderate (renders all cube faces)
Responsibility | CPU manages geometry                    | GPU manages geometry

While the procedural method renders more triangles (as internal faces are not culled by the CPU), modern GPUs are highly efficient at processing vertices. The trade-off is a massive reduction in memory bandwidth and CPU stalls during chunk loading.

Procedural Geometry Synthesis

Traditional rendering requires the CPU to upload vertex and index data for every mesh. The voxel system eliminates this bandwidth bottleneck by synthesizing the cube topology directly in the Vertex Shader using the gl_VertexIndex built-in variable.

Implementation Snippet

The following logic (simplified from voxel.vert) demonstrates how a single index can be decoded into a 3D cube vertex:

// Cube topology defined as a local lookup table
const vec3 CUBE_VERTS[8] = vec3[](
    vec3(-0.5, -0.5, -0.5), vec3( 0.5, -0.5, -0.5), ...
);
const int CUBE_INDICES[36] = int[](
    0,3,2, 2,1,0, // Back face
    ...
);
void main() {
    // Decode the vertex ID into a specific cube and its local vertex
    int cubeId = gl_VertexIndex / 36;
    int vertId = gl_VertexIndex % 36;
    // Map the cubeId to its 2D position within a 32x32 chunk
    int localX = cubeId % 32;
    int localZ = cubeId / 32;
    // Offset by the chunk's world-space origin. chunkOriginX/chunkOriginZ are
    // placeholder names; the full shader derives them from instance metadata.
    float worldX = float(chunkOriginX + localX);
    float worldZ = float(chunkOriginZ + localZ);
    // Fetch the height and synthesize the world position
    float worldY = get_height(worldX, worldZ);
    vec3 localPos = CUBE_VERTS[CUBE_INDICES[vertId]];
    gl_Position = pc.viewProj * vec4(localPos.x + worldX,
                                     localPos.y + worldY,
                                     localPos.z + worldZ, 1.0);
}
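
The index arithmetic in the shader can be checked on the CPU. The following C sketch mirrors the integer decoding only; the struct and function names are illustrative and not part of SBgl, and the constants 36 and 32 are taken from the snippet above.

```c
#include <assert.h>

/* Constants taken from the shader snippet: 36 vertices per cube,
 * 32 cubes along each side of a chunk. */
enum { VERTS_PER_CUBE = 36, CHUNK_DIM = 32 };

/* Hypothetical helper type, not an SBgl structure. */
typedef struct {
    int cube_id;  /* which cube within the chunk */
    int vert_id;  /* which of the cube's 36 vertices */
    int local_x;  /* column within the chunk, 0..31 */
    int local_z;  /* row within the chunk, 0..31 */
} VoxelVertex;

/* Applies the same integer arithmetic the vertex shader applies
 * to gl_VertexIndex. */
static VoxelVertex decode_vertex_index(int vertex_index)
{
    assert(vertex_index >= 0);
    VoxelVertex v;
    v.cube_id = vertex_index / VERTS_PER_CUBE;
    v.vert_id = vertex_index % VERTS_PER_CUBE;
    v.local_x = v.cube_id % CHUNK_DIM;
    v.local_z = v.cube_id / CHUNK_DIM;
    return v;
}
```

For example, vertex index 1193 (36 * 33 + 5) decodes to cube 33, vertex 5, at local position (1, 1).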

Data-Oriented Metadata Passthrough

To maintain performance, the system avoids frequent descriptor set updates. Instead, it utilizes the Instance Storage Buffer as a metadata channel.

  • Chunk Offsets: The current camera chunk coordinates $(X, Z)$ are embedded within the transform field of the first instance entry.
  • Voxel Retrieval: The shader uses a 64-bit GPU virtual address (passed via Push Constants) to read the compact instance indices generated by the Shell Extraction pass. These indices are then decoded into 3D grid coordinates to position the procedural cubes.

Stateful 3D Cached Engine (Voxel3D)

While the procedural heightmap engine is ideal for vast landscapes, SBgl also includes a Stateful 3D Voxel Engine (voxel3D_main) designed for complex, vertically-dense environments (caves, floating islands, buildings).

Architectural Differences

Unlike the heightmap engine, which is stateless, the 3D engine uses a Chunk Pool system:

  • Persistent Caching: Chunks are generated once and stored in a fixed-capacity GPU slot pool (e.g., 256 slots).
  • LRU Recycling: The system uses a Least Recently Used (LRU) policy to recycle GPU memory as the camera moves through the world.
  • Binary Meshing (Shell Extraction): Instead of rendering every voxel, a compute shader (voxel_mesh.comp) merges contiguous vertical runs of voxels into single elongated instances. This reduces draw overhead by up to 90% in dense structures.
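
A fixed-capacity pool with LRU recycling can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the SBgl implementation: the `ChunkPool` type, the `chunk_pool_acquire` function, the frame-counter timestamps, and the linear scan are all hypothetical, and the pool is shrunk to 4 slots to keep the behavior easy to follow.

```c
#include <assert.h>
#include <stdint.h>

enum { SLOT_COUNT = 4 }; /* the text mentions e.g. 256; small here for clarity */

/* Hypothetical pool layout. A zero-initialized pool is empty. */
typedef struct {
    int32_t  x[SLOT_COUNT], y[SLOT_COUNT], z[SLOT_COUNT]; /* chunk coordinates */
    uint64_t last_used[SLOT_COUNT]; /* frame of last access, 0 = never used */
} ChunkPool;

/* Returns the slot that holds chunk (x, y, z). If the chunk was not cached,
 * the least recently used slot is recycled and *needs_generation is set so
 * the caller knows to regenerate the chunk's contents. */
static int chunk_pool_acquire(ChunkPool *pool, int32_t x, int32_t y, int32_t z,
                              uint64_t frame, int *needs_generation)
{
    assert(pool != 0 && needs_generation != 0 && frame > 0);
    int victim = 0;
    for (int i = 0; i < SLOT_COUNT; ++i) {
        if (pool->last_used[i] != 0 &&
            pool->x[i] == x && pool->y[i] == y && pool->z[i] == z) {
            pool->last_used[i] = frame; /* cache hit: refresh timestamp */
            *needs_generation = 0;
            return i;
        }
        if (pool->last_used[i] < pool->last_used[victim])
            victim = i; /* track the oldest (or never used) slot */
    }
    pool->x[victim] = x;
    pool->y[victim] = y;
    pool->z[victim] = z;
    pool->last_used[victim] = frame;
    *needs_generation = 1;
    return victim;
}
```

A linear scan is adequate for a few hundred slots; a larger pool would likely pair the slot array with a hash map keyed on chunk coordinates.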

GPU-Driven Pipeline

The 3D engine employs a three-stage GPU pipeline:

  1. Generation Phase: A compute shader samples 3D noise (FBM) and populates a high-density bitmask representing voxel occupancy.
  2. Meshing (Shell) Phase: A second compute pass scans the bitmask and extracts "shells" (visible vertical strips), packing them into instance data buffers.
  3. Culling Phase: A final culling shader performs frustum and distance culling on the entire slot pool, generating an Indirect Draw Buffer for single-call submission.
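
The run-merging idea behind the meshing phase can be illustrated on the CPU. The sketch below assumes one 32-bit occupancy word per voxel column (bit i set means the voxel at height i is solid) and a hypothetical `VoxelRun` output record; the actual layout used by voxel_mesh.comp is not shown in this document, and a real shell pass would also skip runs hidden by neighboring columns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instance record: one elongated box per vertical run. */
typedef struct {
    uint8_t y_start; /* height of the lowest voxel in the run */
    uint8_t height;  /* number of consecutive solid voxels */
} VoxelRun;

/* Splits one column's occupancy bits into runs of consecutive solid
 * voxels and returns the number of runs written to out. */
static int extract_runs(uint32_t column_mask, VoxelRun *out, int max_runs)
{
    /* A 32-bit column can hold at most 16 separate runs. */
    assert(out != NULL && max_runs >= 16);
    int count = 0;
    int y = 0;
    while (y < 32) {
        if (((column_mask >> y) & 1u) == 0u) {
            ++y; /* empty voxel: keep scanning upward */
            continue;
        }
        int start = y;
        while (y < 32 && ((column_mask >> y) & 1u) != 0u)
            ++y; /* extend the run while voxels stay solid */
        out[count].y_start = (uint8_t)start;
        out[count].height  = (uint8_t)(y - start);
        ++count;
    }
    return count;
}
```

A fully solid column collapses from 32 cube instances to a single run, which is where the reduction in instance count for dense structures comes from.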

Memory & Synchronization Requirements

Because the 3D engine is stateful and high-bandwidth, it requires specific configuration:

  • Managed Heap Size: Large 3D environments require significant GPU memory. SBGL_MANAGED_HEAP_SIZE is set to 512 MB to accommodate the instance and mask buffers for high-radius rendering.
  • Pipeline Barriers: Correct operation depends on strict synchronization between stages.
    • SBGL_BARRIER_COMPUTE_TO_COMPUTE: Used to ensure chunk deactivation (clearing masks) is visible before culling.
    • SBGL_BARRIER_HOST_TO_GRAPHICS: Used to ensure that chunk AABBs updated by the CPU are visible to the vertex shader.

Performance Characteristics

Component          | Traditional Method  | SBgl Voxel Method
CPU Submission     | ~100,000 Draw Calls | Exactly 1 MDI Call
Geometry Bandwidth | ~4.0 MB / frame     | Zero (Shader-Synthesized)
Memory Allocation  | Per-Chunk Buffers   | Zero (Persistent Unit Cube)
Procedural Shading

Surface normals are approximated in the shader by analyzing the local vertex position relative to the cube center. This allows for dynamic coloring (e.g., Grass tops vs. Dirt sides) without storing normal attributes in memory.
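
One common way to derive a face normal from a position inside a unit cube is to pick the axis with the largest absolute coordinate. The C sketch below shows that technique; it is an assumption about how the approximation might work, not a transcription of the SBgl shader. It assumes the position is interpolated across the face (as in a fragment shader), because at an exact cube corner all three components are ±0.5 and the face is ambiguous. Ties are resolved in favor of Y, then X.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Returns the axis-aligned unit normal of the cube face closest to p,
 * where p is a position relative to the cube center in [-0.5, 0.5]^3. */
static Vec3 approximate_normal(Vec3 p)
{
    float ax = abs_f(p.x), ay = abs_f(p.y), az = abs_f(p.z);
    /* Small tolerance for interpolation error. */
    assert(ax <= 0.5001f && ay <= 0.5001f && az <= 0.5001f);
    Vec3 n = { 0.0f, 0.0f, 0.0f };
    if (ay >= ax && ay >= az)
        n.y = (p.y >= 0.0f) ? 1.0f : -1.0f;
    else if (ax >= az)
        n.x = (p.x >= 0.0f) ? 1.0f : -1.0f;
    else
        n.z = (p.z >= 0.0f) ? 1.0f : -1.0f;
    return n;
}

/* Example use: grass on upward-facing surfaces, dirt elsewhere. */
static int is_grass_top(Vec3 p)
{
    return approximate_normal(p).y > 0.5f;
}
```

A point on the top face such as (0.1, 0.5, -0.2) yields the normal (0, 1, 0) and is shaded as grass, while a point on a side face such as (-0.5, 0.0, 0.3) yields (-1, 0, 0) and is shaded as dirt.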

Seamless Tiling & Periodicity

To ensure the procedurally generated world tiles seamlessly across the 2048x2048 heightmap boundaries, the system synchronizes the Perlin noise frequencies with the internal wrapping parameters of the generator.

  • Power-of-Two Frequencies: The base frequency is calculated as $16.0 / \text{WORLD\_SIZE}$. Using a power-of-two numerator ensures that one full traversal of the world covers exactly 16 units in the noise function's coordinate space. This alignment is critical because the underlying noise implementation (STB Perlin) requires power-of-two wrap periods for continuity at the edges.
  • Octave-Specific Wrapping: Each octave in the fractional Brownian motion (FBM) sum doubles the frequency. This halves the feature size in world space, but the world then spans twice as many noise-space units, so the wrap parameter must double with each octave (e.g., 16, 32, 64, 128, 256). The wrap values are capped at 256 to remain within the implementation's internal limits. Global periodicity is preserved because a pattern that repeats every 256 units also repeats every 512, 1024, and so on.
  • Coordinate Mapping Optimization: The world $(X, Z)$ coordinates are mapped to the noise function's $(X, Y)$ slots, leaving the $Z$ slot constant. This mapping uses the most precise coordinate paths in the noise generator, eliminating the sub-pixel precision drift that can occur at high-coordinate wrap boundaries.
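
The relationship between octave, frequency, and wrap period can be written out as a short C sketch. The function names and the `BASE_WRAP`/`MAX_WRAP` constants are illustrative; only the values 16 and 256 and the $16.0 / \text{WORLD\_SIZE}$ base frequency come from the text. In an stb_perlin-based generator, the resulting wrap would plausibly be passed as the `x_wrap` and `y_wrap` arguments of `stb_perlin_noise3`, but that call is omitted here to keep the example self-contained.

```c
#include <assert.h>

enum { BASE_WRAP = 16, MAX_WRAP = 256 };

/* Wrap period for one octave: doubles along with the frequency and is
 * capped at 256. Both values stay powers of two. */
static int octave_wrap(int octave)
{
    assert(octave >= 0 && octave < 16);
    int wrap = BASE_WRAP << octave;
    return wrap > MAX_WRAP ? MAX_WRAP : wrap;
}

/* Maps a world coordinate into noise space for one octave, using the
 * base frequency 16.0 / world_size. */
static float octave_coord(float world, float world_size, int octave)
{
    float base_freq = 16.0f / world_size;
    return world * base_freq * (float)(1 << octave);
}
```

With a world size of 2048, the far edge of the world maps to noise coordinate 16 in octave 0 and 32 in octave 1, matching the wrap values for those octaves, so the noise at the far edge equals the noise at the origin.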

Render Distance Control

Render distance is managed by a synchronized "Grid Radius" system between the application and the vertex shader.

  • Grid Radius ($R$): Defines the number of chunks rendered in every direction from the camera chunk.
  • Instance Count: The total number of chunks submitted is calculated as $(2R + 1)^2$.
  • Shader Synchronization: To ensure data integrity across engine updates, the $R$ value is embedded within the sbgl_InstanceData transform matrix (m[0][2]) rather than push constants.
  • Interactive Control: The render distance can be adjusted in real-time using the SBGL_KEY_EQUAL (+) and SBGL_KEY_MINUS (-) keys.
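
The chunk count formula and the mapping from an instance index to a chunk offset can be sketched as follows. The row-major ordering and the names `grid_chunk_count`, `grid_offset`, and `ChunkOffset` are assumptions made for illustration; the document does not specify how SBgl orders chunks within the grid.

```c
#include <assert.h>

/* Number of chunks in a square grid extending `radius` chunks in every
 * direction from the camera chunk: (2R + 1)^2. */
static int grid_chunk_count(int radius)
{
    assert(radius >= 0);
    int side = 2 * radius + 1;
    return side * side;
}

typedef struct { int dx, dz; } ChunkOffset;

/* Offset of a chunk relative to the camera chunk, assuming instances
 * are laid out row by row across the grid. */
static ChunkOffset grid_offset(int instance, int radius)
{
    int side = 2 * radius + 1;
    assert(instance >= 0 && instance < side * side);
    ChunkOffset o = { instance % side - radius, instance / side - radius };
    return o;
}
```

For a radius of 5 this gives 121 chunks, with instance 0 at offset (-5, -5), instance 60 at the camera chunk (0, 0), and instance 120 at (5, 5).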

Performance & Clipping Considerations

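Increasing the render distance has a quadratic impact on performance. Assuming one cube per column in a 32x32 chunk (1,024 cubes with 36 vertices, or 12 triangles, each), a radius of 5 renders 121 chunks (~4.5 million vertices, ~1.5 million triangles), while a radius of 100 renders 40,401 chunks (~1.5 billion vertices, ~496 million triangles).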

Additionally, the camera's Far Clipping Plane must be adjusted to accommodate the expanded radius. If the grid extends 3200 units ($100 \times 32$) but the far plane is 2000 units, a large portion of the world will be clipped, resulting in empty space or the background clear color being visible.
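
A simple check for the far-plane requirement is sketched below. The function names are hypothetical, and the chunk size of 32 units is taken from the example in the text. The axis-aligned extent matches the $100 \times 32$ figure above; the corners of a square grid lie a factor of $\sqrt{2}$ farther away, so covering them needs a correspondingly larger far plane.

```c
#include <assert.h>

/* Distance from the camera chunk to the grid edge along an axis. */
static float grid_extent(int radius, float chunk_size)
{
    assert(radius >= 0 && chunk_size > 0.0f);
    return (float)radius * chunk_size;
}

/* Distance to the grid corners, which are sqrt(2) times farther away. */
static float grid_corner_extent(int radius, float chunk_size)
{
    return grid_extent(radius, chunk_size) * 1.41421356f;
}

/* Non-zero when the far plane reaches at least the axis-aligned edge. */
static int far_plane_covers_grid(float far_plane, int radius, float chunk_size)
{
    return far_plane >= grid_extent(radius, chunk_size);
}
```

With a radius of 100 and 32-unit chunks, the edge lies at 3200 units and the corners at roughly 4525 units, so a far plane of 2000 units fails the check.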

By aligning the noise frequencies and wrap parameters as described under Seamless Tiling & Periodicity, the engine eliminates the "cliff" artifacts typically found at procedural world boundaries, resulting in a continuous, infinite-feeling terrain.