SBgl 0.1.0
A graphics framework in C99
The system utilizes a "Pure Procedural" GPU-driven pipeline to render massive voxel environments with minimal CPU overhead. By leveraging Vulkan 1.3 features such as Multi-Draw Indirect (MDI) and Buffer Device Address (BDA), the engine can synthesize geometry mathematically on the GPU rather than streaming it from host memory.
Most voxel engines (Minecraft, for example) build their meshes on the CPU. The procedural approach used in SBgl instead gives up per-face culling in exchange for lower memory bandwidth use and fewer CPU cycles.
| Feature | CPU-Side Meshing (Traditional) | SBgl Procedural Pipeline |
|---|---|---|
| Data Flow | CPU generates mesh -> GPU uploads VBO | GPU synthesizes mesh from metadata |
| Memory Usage | High (stores vertices for every chunk) | Near-Zero (only stores height/metadata) |
| CPU Load | High (Greedy meshing, face culling) | Zero (CPU only submits a draw count) |
| GPU Load | Low (pre-culled geometry) | Moderate (renders all cube faces) |
| Responsibility | CPU manages geometry | GPU manages geometry |
While the procedural method renders more triangles (as internal faces are not culled by the CPU), modern GPUs are highly efficient at processing vertices. The trade-off is a massive reduction in memory bandwidth and CPU stalls during chunk loading.
Traditional rendering requires the CPU to upload vertex and index data for every mesh. The voxel system eliminates this bandwidth bottleneck by synthesizing the cube topology directly in the Vertex Shader using the gl_VertexIndex built-in variable.
The following logic (simplified from voxel.vert) demonstrates how a single index can be decoded into a 3D cube vertex:
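The sketch below is not the voxel.vert source. It is a minimal C illustration of one common way to decode a vertex index into a unit-cube corner, assuming 36 vertices per cube (12 triangles, no index buffer). The `Vec3` type, the `CUBE_CORNERS` table, and the function name are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical corner order for the 36 vertices of a cube.
 * Each entry is a corner id 0..7: bit 0 selects x, bit 1 selects y,
 * bit 2 selects z. Triangles are wound counter-clockwise when viewed
 * from outside the cube (right-handed coordinates). */
static const uint8_t CUBE_CORNERS[36] = {
    0, 2, 1,  1, 2, 3,   /* -Z face */
    4, 5, 6,  5, 7, 6,   /* +Z face */
    0, 1, 4,  1, 5, 4,   /* -Y face */
    2, 6, 3,  3, 6, 7,   /* +Y face */
    0, 4, 2,  2, 4, 6,   /* -X face */
    1, 3, 5,  3, 7, 5    /* +X face */
};

/* Decode a running vertex index (the role gl_VertexIndex plays in a
 * shader) into a position on the unit cube [0,1]^3. */
static Vec3 cube_vertex_from_index(uint32_t vertex_index) {
    uint8_t corner = CUBE_CORNERS[vertex_index % 36u];
    assert(corner < 8u); /* sanity check on the table contents */

    Vec3 p;
    p.x = (float)(corner & 1u);
    p.y = (float)((corner >> 1) & 1u);
    p.z = (float)((corner >> 2) & 1u);
    return p;
}
```

Because the position depends only on the index, no vertex or index buffer has to be bound. In a shader, the same lookup would typically be followed by adding the per-instance cube offset.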
To maintain performance, the system avoids frequent descriptor set updates. Instead, it utilizes the Instance Storage Buffer as a metadata channel.
- transform field of the first instance entry.

While the procedural heightmap engine is ideal for vast landscapes, SBgl also includes a Stateful 3D Voxel Engine (voxel3D_main) designed for complex, vertically-dense environments (caves, floating islands, buildings).
Unlike the heightmap engine, which is stateless, the 3D engine uses a Chunk Pool system:
- voxel_mesh.comp) merges contiguous vertical runs of voxels into single elongated instances. This reduces draw overhead by up to 90% in dense structures.

The 3D engine employs a three-stage GPU pipeline:
Because the 3D engine is stateful and high-bandwidth, it requires specific configuration:
- SBGL_MANAGED_HEAP_SIZE is configured to 512MB to accommodate the instance and mask buffers for high-radius rendering.
- SBGL_BARRIER_COMPUTE_TO_COMPUTE: Used to ensure chunk deactivation (clearing masks) is visible before culling.
- SBGL_BARRIER_HOST_TO_GRAPHICS: Used to ensure that chunk AABBs updated by the CPU are visible to the vertex shader.

| Component | Traditional Method | SBgl Voxel Method |
|---|---|---|
| CPU Submission | ~100,000 Draw Calls | Exactly 1 MDI Call |
| Geometry Bandwidth | ~4.0 MB / frame | Zero (Shader-Synthesized) |
| Memory Allocation | Per-Chunk Buffers | Zero (Persistent Unit Cube) |
Surface normals are approximated in the shader by analyzing the local vertex position relative to the cube center. This allows for dynamic coloring (e.g., Grass tops vs. Dirt sides) without storing normal attributes in memory.
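As a rough illustration of the normal approximation described above, the following C sketch picks the axis along which a point lies farthest from the cube center. It assumes a unit cube spanning [0, 1] on each axis and a point that lies on a face (for example, an interpolated position); at exact corners the choice is ambiguous, and this sketch arbitrarily prefers Y, then X. All names are hypothetical, not SBgl identifiers.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static float abs_f(float v) { return v < 0.0f ? -v : v; }

/* Approximate the face normal from a position inside the unit cube
 * by selecting the dominant axis of its offset from the center. */
static Vec3 approx_face_normal(Vec3 local) {
    assert(local.x >= 0.0f && local.x <= 1.0f);
    assert(local.y >= 0.0f && local.y <= 1.0f);
    assert(local.z >= 0.0f && local.z <= 1.0f);

    float dx = local.x - 0.5f;
    float dy = local.y - 0.5f;
    float dz = local.z - 0.5f;
    float ax = abs_f(dx), ay = abs_f(dy), az = abs_f(dz);

    Vec3 n = { 0.0f, 0.0f, 0.0f };
    if (ay >= ax && ay >= az) {
        n.y = dy >= 0.0f ? 1.0f : -1.0f;   /* top or bottom */
    } else if (ax >= az) {
        n.x = dx >= 0.0f ? 1.0f : -1.0f;   /* east or west side */
    } else {
        n.z = dz >= 0.0f ? 1.0f : -1.0f;   /* north or south side */
    }
    return n;
}

/* Example use: a top face could be tinted as grass, other faces as dirt. */
static int is_top_face(Vec3 local) {
    return approx_face_normal(local).y > 0.0f;
}
```

The point of the technique is that the normal is derived from data the shader already has, so no per-vertex normal attribute is stored.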
To ensure the procedurally generated world tiles seamlessly across the 2048x2048 heightmap boundaries, the system synchronizes the Perlin noise frequencies with the internal wrapping parameters of the generator.
- wrap parameter is dynamically adjusted for each octave (e.g., 16, 32, 64, 128, 256). These wrap values are capped at 256 to remain within the implementation's internal limits while preserving global periodicity.

Render distance is managed by a synchronized "Grid Radius" system between the application and the vertex shader.
- sbgl_InstanceData transform matrix (m[0][2]) rather than push constants.
- SBGL_KEY_EQUAL (+) and SBGL_KEY_MINUS (-) keys.

Increasing the render distance has a quadratic impact on performance. For example, a radius of 5 renders 121 chunks (~4.4 million triangles), while a radius of 100 renders 40,401 chunks (~1.5 billion triangles).
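The chunk counts quoted above are consistent with a square grid of side 2r + 1 centered on the camera's chunk. A small C sketch of that arithmetic follows; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunks in a square grid that extends `radius` chunks in
 * every direction from the center chunk: (2r + 1)^2. */
static uint64_t chunks_for_radius(uint32_t radius) {
    assert(radius < (1u << 31)); /* keeps side * side within 64 bits */
    uint64_t side = 2u * (uint64_t)radius + 1u;
    return side * side;
}
```

For the figures in the text, a radius of 5 gives 11 × 11 = 121 chunks and a radius of 100 gives 201 × 201 = 40,401, so raising the radius by a factor of 20 multiplies the work by roughly 334.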
Additionally, the camera's Far Clipping Plane must be adjusted to accommodate the expanded radius. If the grid extends 3200 units ($100 \times 32$) but the far plane is 2000 units, a large portion of the world will be clipped, resulting in empty space or the background clear color being visible.
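The far-plane check can be sketched in C as below. The chunk width of 32 units is taken from the example above; the function names are hypothetical, and the extent is measured along a grid axis only.

```c
#include <assert.h>

/* Distance from the grid center to its edge along one axis. */
static float grid_extent(unsigned radius, float chunk_size) {
    assert(chunk_size > 0.0f);
    return (float)radius * chunk_size;
}

/* Returns 1 when the far plane reaches the edge of the grid, else 0. */
static int far_plane_covers_grid(float far_plane, unsigned radius,
                                 float chunk_size) {
    return far_plane >= grid_extent(radius, chunk_size);
}
```

With a radius of 100 and 32-unit chunks the extent is 3200 units, so a far plane of 2000 fails the check. Chunks at the corners of a square grid sit farther away than this axis-aligned figure, so some extra margin is sensible.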
By aligning the data structures and mathematical parameters in this manner, the engine eliminates the "cliff" artifacts typically found at procedural world boundaries, resulting in a continuous, infinite-feeling terrain.
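The per-octave wrap values mentioned earlier (16, 32, 64, 128, 256, capped at 256) can be reproduced with a doubling rule. The C sketch below assumes that the wrap starts at 16 and doubles with each octave, which matches the example values but is not confirmed by the SBgl sources; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap value for a given noise octave: 16 for octave 0, doubling per
 * octave, and clamped to 256 for every octave from 4 upward. */
static uint32_t octave_wrap(uint32_t octave) {
    const uint32_t base = 16u;  /* assumed starting wrap */
    const uint32_t cap = 256u;  /* cap stated in the text */

    uint32_t wrap = base;
    for (uint32_t i = 0; i < octave && wrap < cap; ++i) {
        wrap *= 2u;
    }
    assert(wrap <= cap);
    return wrap;
}
```

Stopping the loop at the cap also avoids overflow for large octave numbers.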