Tetra is a high-performance voxel renderer. It was inspired by the popular game Minecraft, but exceeds it in graphical fidelity and performance. It is capable of comfortably rendering billions of voxels on screen at once.
Esc - Exit
W - Move forward
A - Move left
S - Move backwards
D - Move right
Shift - Increase speed
Mouse - Turn
Tetra is an application made to generate and display voxel worlds. Voxels are similar to pixels: both are data points spaced at regular intervals, with each point having its own data associated with it. Pixels are aligned on a 2D rectangular grid. Each pixel has an associated color and is represented by a square, and the collection of these colored squares is used to display an image. Voxels, on the other hand, are aligned on a 3D grid rather than a 2D one and are represented as cubes instead of squares. Each voxel can have a texture associated with it rather than just a color, and the collection of voxels is used to approximate a 3D space or object.
Just as pixels are often used to display pictures taken from a camera, voxels are often used to display scans of real-world 3D objects or spaces. Architects use laser scans to visualize buildings and doctors use CT scans to see inside patients’ bodies. The data points of these 3D scans can be loaded and then displayed digitally (rendered) as voxels. Rather than loading an existing scan, another popular use of voxel renderers is to display infinite digital worlds for people to explore. Although this use is more lighthearted, it is also more complicated as the world must be generated by the application itself rather than simply loaded in.
Noise functions are often used when generating voxel worlds because they output data that is random but continuous. They generally work by generating random values and interpolating (smoothly transitioning between) these values. Tetra uses simplex noise because it performs well and is easy to work with for basic terrain generation, but there are other noise functions (such as Perlin noise and Worley noise, which is also known as Voronoi or cellular noise) that can be more appropriate for certain uses.
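The idea of "random values plus interpolation" can be sketched with value noise, a simpler relative of simplex noise. This is an illustration only and not Tetra's noise code: the function name `make_value_noise_2d`, the lattice size, and the smoothstep easing curve are all assumptions made for the example.

```python
import math
import random


def make_value_noise_2d(seed, size=64):
    """Return a noise(x, y) function whose outputs lie in [-1, 1].

    Hypothetical helper: this is value noise, not simplex noise.
    """
    rng = random.Random(seed)
    # One random value per lattice point; the lattice tiles every `size` cells.
    lattice = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(size)]

    def smoothstep(t):
        # Easing curve with zero slope at 0 and 1, so cells join smoothly.
        return t * t * (3.0 - 2.0 * t)

    def lerp(a, b, t):
        return a + (b - a) * t

    def noise(x, y):
        x0, y0 = math.floor(x), math.floor(y)
        tx, ty = smoothstep(x - x0), smoothstep(y - y0)
        ix0, iy0 = x0 % size, y0 % size
        ix1, iy1 = (x0 + 1) % size, (y0 + 1) % size
        # Interpolate along x on both rows, then along y between the rows.
        row0 = lerp(lattice[iy0][ix0], lattice[iy0][ix1], tx)
        row1 = lerp(lattice[iy1][ix0], lattice[iy1][ix1], tx)
        return lerp(row0, row1, ty)

    return noise
```

Sampling `noise` at nearby coordinates gives nearby values, which is the property that makes noise useful for terrain.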
Simplex noise can be generated in any number of dimensions. Below are 2D slices of a 3D simplex noise set where values closer to -1 are represented by black and values closer to 1 are represented by white. Because of the interpolation, there is continuity from one slice to the next, yet the first and last slices are quite different.
There are some inputs to the simplex noise function that can be used to affect its appearance, mainly: the seed (used to create a different random set), frequency (density), and octaves (fractal granularity). Below are examples of different seeds, the frequency being increased, and the octaves being increased, respectively. To generate appealing terrain, Tetra uses many different noise sets, each with a different seed and some changes in frequency and octaves.
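A minimal sketch of how seed, frequency, and octaves typically interact is shown below. It layers a stand-in 1D value noise rather than simplex noise. The names `base_noise` and `fractal_noise` and the `lacunarity` and `gain` parameters are assumptions for illustration, not Tetra's API.

```python
import math
import random


def base_noise(x, seed):
    """Stand-in 1D value noise in [-1, 1] (hypothetical; not simplex)."""
    def lattice(i):
        # Deterministic pseudo-random value for lattice point i and this seed.
        return random.Random(i * 1_000_003 + seed).uniform(-1.0, 1.0)

    x0 = math.floor(x)
    t = x - x0
    t = t * t * (3.0 - 2.0 * t)  # smoothstep easing
    return lattice(x0) + (lattice(x0 + 1) - lattice(x0)) * t


def fractal_noise(x, seed=0, frequency=1.0, octaves=1, lacunarity=2.0, gain=0.5):
    """Sum `octaves` layers of noise; requires octaves >= 1."""
    total = 0.0
    amplitude = 1.0
    norm = 0.0
    for octave in range(octaves):
        # Each octave uses its own seed, a higher frequency, and a lower amplitude.
        total += amplitude * base_noise(x * frequency, seed + octave)
        norm += amplitude
        frequency *= lacunarity
        amplitude *= gain
    # Dividing by the summed amplitudes keeps the result in [-1, 1].
    return total / norm
```

A higher `frequency` packs more variation into the same distance, and each extra octave adds finer detail on top of the broad shape.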
A good place to start with voxel terrain generation is to create a heightmap by adding and multiplying different noise sets together. Look at the above noise sets and imagine white values as high terrain and dark values as low terrain. In Tetra, four noise sets are used to create the heightmap. One set is used to create large-scale changes in the height of the ground. Since this needs to consist of large, smooth variations, it has a low frequency and low octave count. The output of the noise function is then multiplied by 50. Remember the simplex noise function outputs values from -1 to 1, so this multiplication means the height of the terrain will now range from -50 to 50 voxels. Next, a little detail is added to the terrain with a noise set that has higher frequency but whose output is only multiplied by 2. Lastly, hills are added. Hills are rougher and more pronounced, so the octave count is higher and the output is multiplied by 100. However, if we add this as is, the whole terrain is filled with hills. To only have hills in certain places, a final noise set must be created. This “hill biome” noise set has a low frequency and a high octave count. Negative output values for the set are discarded, effectively clamping it to a 0 to 1 range where 0 represents an area with no hills and 1 represents an area with many hills. The hill height value is multiplied by the hill biome value to get the desired result.
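The arithmetic described above can be written out as a small function. The multipliers 50, 2, and 100 come from the text. The function name and argument names are hypothetical, and each argument is assumed to be a sample in the range -1 to 1 taken from the corresponding noise set at one (X, Y) location.

```python
def terrain_height(base, detail, hill, hill_biome):
    """Combine four noise samples (each in [-1, 1]) into a height in voxels."""
    ground = base * 50.0               # large-scale variation: -50 to 50 voxels
    fine = detail * 2.0                # small surface detail: -2 to 2 voxels
    hill_mask = max(0.0, hill_biome)   # discard negatives: 0 = no hills, 1 = many hills
    hills = hill * 100.0 * hill_mask   # hills only appear where the mask is non-zero
    return ground + fine + hills
```

For example, `terrain_height(0.5, 0.0, 1.0, -0.3)` is 25.0, because the negative hill biome value removes the hill contribution entirely.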
To translate this heightmap into the voxel world, the 3D grid of voxels is iterated through, comparing the Z (height) location of a given voxel with the heightmap value at that voxel’s X (width), Y (depth) location. If the voxel’s Z location is less than the heightmap’s value, it is considered to be a visible voxel. If it is above the generated value, the voxel is considered to be empty and is not displayed. Below, you can see the effect each of the four noise sets described above has on the voxel world. At the end we have a nice terrain with mountains and flat meadows in between.
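The comparison described above can be sketched as follows. The nested-list layout `solid[x][y][z]` and the function name are assumptions made for the example. A voxel whose Z location exactly equals the heightmap value is treated as empty here, which the text does not specify.

```python
def fill_from_heightmap(heightmap, depth_z):
    """Turn heightmap[x][y] into solid[x][y][z] booleans.

    A voxel is solid (visible) when its z index is below the column's height.
    """
    return [
        [[z < column_height for z in range(depth_z)] for column_height in row]
        for row in heightmap
    ]
```

For example, a column with a height of 2 in a grid four voxels tall becomes `[True, True, False, False]`.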
2D noise sets allow for manipulation of one axis, such as height. To make more complex features like caves and overhangs, which have variation on all three axes, a 3D noise set must be used. Tetra has some interesting cliff-like overhangs which jut out of the ground thanks to a 3D noise set. This overhang set has a low frequency but a high octave count. To translate a 3D noise set into voxels, the 3D voxel grid is iterated through and the 3D noise set value corresponding with each given voxel’s X, Y, and Z location is retrieved and compared against a cutoff value. If it is above the cutoff value, the voxel is considered to be visible. Otherwise, the voxel is considered to be empty. Raw 3D noise is quite messy, so the maximum height of the overhang set is limited using the hills set. As the plateaus reach their maximum height, the cutoff value decreases to make the overhangs denser as they reach their top. Below you can see the raw 3D plateau noise (flat on top because it is cut off by the world’s height limit), the height-limited overhangs on their own, and finally the height-limited overhangs added to the heightmap described in the previous section. When added to the heightmap, this 3D noise set creates some interesting formations. There are many possibilities for combining noise to create interesting terrain.
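One way to picture the height-dependent cutoff is the sketch below. The text only says that the cutoff decreases toward the maximum height. The linear falloff, the function name, and the values `base_cutoff` and `top_cutoff` are assumptions made for illustration, and `max_height` stands in for the per-column limit derived from the hills set.

```python
def overhang_is_solid(noise_value, z, max_height, base_cutoff=0.3, top_cutoff=0.0):
    """Decide whether a voxel belongs to an overhang.

    noise_value: 3D noise sample in [-1, 1] at this voxel's (x, y, z).
    max_height:  height limit for this column (assumed to come from the hills set).
    """
    if max_height <= 0 or z >= max_height:
        return False  # above the height limit: always empty
    # Fraction of the way up to the limit, from 0 at the bottom to nearly 1 at the top.
    t = max(0.0, z / max_height)
    # Assumed linear falloff: the cutoff shrinks with height, so more voxels pass.
    cutoff = base_cutoff + (top_cutoff - base_cutoff) * t
    return noise_value > cutoff
```

With these assumed defaults, a noise value of 0.2 is rejected at the bottom of a 10-voxel column but accepted at z = 8, so the formation is denser near its top.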
The shape of the terrain is interesting, but it’s looking a bit grey. Luckily, assigning voxels materials and adding details to this base form is relatively simple. For example, a 2D noise “tree density” map can be used to add trees on top of the terrain at random where the output value starts to become greater than a specified threshold. Water can be added by changing all empty voxels below a certain height into water voxels. Nonempty voxels that have empty voxels above them can be grass, and voxels up to three voxels below the grass can be dirt. Visible voxels between a certain depth range that aren’t water can be sand, and all other voxels can be stone. Below is the same scene from the previous section with the rules described above implemented.
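The material rules above can be sketched for a single vertical column of voxels. The thresholds (`sea_level`, `sand_range`, `dirt_depth`) and the decision to let sand take precedence over grass and dirt are assumptions, since the text does not give exact values or an order of precedence. Trees are left out to keep the example short.

```python
EMPTY, WATER, GRASS, DIRT, SAND, STONE = "empty", "water", "grass", "dirt", "sand", "stone"


def assign_column_materials(solid, sea_level=4, sand_range=(3, 5), dirt_depth=3):
    """Assign a material to each voxel of one column (index 0 is the bottom)."""
    materials = [EMPTY] * len(solid)
    # Number of solid voxels since the last empty voxel above; None until one is seen.
    depth = None
    for z in range(len(solid) - 1, -1, -1):  # walk from the top down
        if not solid[z]:
            materials[z] = WATER if z < sea_level else EMPTY
            depth = 0
            continue
        if sand_range[0] <= z <= sand_range[1]:
            materials[z] = SAND    # assumed to override grass and dirt
        elif depth == 0:
            materials[z] = GRASS   # an empty (or water) voxel is directly above
        elif depth is not None and depth <= dirt_depth:
            materials[z] = DIRT    # within dirt_depth voxels of the surface
        else:
            materials[z] = STONE
        if depth is not None:
            depth += 1
    return materials
```

For a column that is solid up to z = 9 with two empty voxels above it, the defaults give three stone, three sand, three dirt, one grass, and two empty voxels from bottom to top.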
We can’t display the entirety of an infinite world at once, so voxel renderers show only a portion of the world. To do this, the world is split into sections, or “chunks”. As the user moves around, chunks that get closer load in and chunks that are too far away are removed. Because loading chunks is an isolated, intensive task, the process is easily multithreaded by having each CPU core load its own chunk. By default, Tetra uses a chunk size of 128³ voxels (a volume of 2,097,152 voxels) and displays 128 chunks at a time. That makes for a total volume of 268,435,456 voxels, which runs well on most laptops. On computers with mid-range GPUs, Tetra can display 2,048 chunks or more at once, making for a grid of 4,294,967,296 or more voxels. Below is an image where the world is in the process of loading in. Some of the lower chunks have loaded in before the chunks on top, giving you an idea of how the world is sectioned.
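A minimal sketch of the bookkeeping behind chunk loading is shown below. Only the chunk edge length of 128 comes from the text. The function names and the cube-shaped load region with a `radius` measured in chunks are assumptions for the example.

```python
CHUNK_SIZE = 128  # voxels along each edge of a chunk


def chunk_of(position):
    """Map a world-space (x, y, z) position to integer chunk coordinates."""
    return tuple(int(c // CHUNK_SIZE) for c in position)


def chunks_in_range(center, radius):
    """All chunk coordinates within `radius` chunks of `center` (a cube)."""
    cx, cy, cz = center
    span = range(-radius, radius + 1)
    return {(cx + dx, cy + dy, cz + dz) for dx in span for dy in span for dz in span}


def update_loaded(loaded, camera_position, radius):
    """Return (to_load, to_unload) for the camera's current position."""
    wanted = chunks_in_range(chunk_of(camera_position), radius)
    return wanted - loaded, loaded - wanted
```

When the camera crosses one chunk boundary with a radius of 1, nine chunks enter the wanted set and nine leave it. The voxel counts in the text follow from the chunk size: `128 ** 3` is 2,097,152, `128 * 128 ** 3` is 268,435,456, and `2048 * 128 ** 3` is 4,294,967,296.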
Computers generally use triangles to render 3D objects. To make a computer’s GPU draw what we want, we first need to give it a series of vertices making up triangles that are properly oriented in 3D space to create a scene. These triangles are called a mesh. Tetra can display a world made of many voxels smoothly thanks to the optimized meshes that are generated when loading each chunk.
First, the voxels in the chunk are separated by texture. For each of these groups, any voxels that are empty or are surrounded on all six sides by nonempty voxels are marked as culled because they are not visible. Next, the vertices for each non-culled voxel are added to the mesh. Finally, the mesh undergoes an optimization process called greedy meshing in which any faces which share the same plane and do not contribute to defining geometry are merged. Example meshes for a 64³-voxel chunk are shown below, first with no optimization, then with culling, and finally with culling and greedy meshing. With these optimizations, the mesh goes from having so many triangles that it almost looks solid, to only having the number of triangles necessary to represent the geometry.
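Greedy meshing can be illustrated on a single 2D slice. In the sketch below, `mask` is assumed to mark the visible faces that share one plane and one texture, and the function merges them into as few rectangles as a simple left-to-right, top-to-bottom sweep allows. This is a common formulation of the technique and not necessarily the one Tetra uses. The function name and output format are hypothetical.

```python
def greedy_mesh_2d(mask):
    """Merge True cells of a 2D grid into rectangles.

    Returns a list of (row, col, height, width) tuples covering every True cell once.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    used = [[False] * cols for _ in range(rows)]
    quads = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or used[r][c]:
                continue
            # Grow the rectangle to the right as far as possible.
            width = 1
            while c + width < cols and mask[r][c + width] and not used[r][c + width]:
                width += 1
            # Grow downward while the whole next row segment is available.
            height = 1
            while r + height < rows and all(
                mask[r + height][c + k] and not used[r + height][c + k]
                for k in range(width)
            ):
                height += 1
            # Mark the covered cells so they are not emitted again.
            for dr in range(height):
                for dc in range(width):
                    used[r + dr][c + dc] = True
            quads.append((r, c, height, width))
    return quads
```

A fully filled 4 by 4 slice collapses from 16 faces to a single rectangle, which would then be emitted as two triangles.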
Along with the mesh, the GPU must be given shaders. Shaders are small programs that run on the GPU. Usually they take meshes and/or images as input and produce an output image. Tetra uses a series of different shaders to transform the input mesh and textures into an image with realistic lighting.
The first shader is the geometry buffer shader. It takes in all the opaque meshes, matrices that transform the mesh to the position of the camera, and all the textures of the different materials. It outputs four images (position, normal, albedo, and specular) describing all the useful information about the geometry to be used in calculations in subsequent shaders. In the position image, each pixel is effectively a 3D vector, with its red, green, and blue values representing the X, Y, and Z position in 3D space of the geometry at the given pixel. The normal image is similar, except the vectors represent the orientation of the geometry’s faces rather than their position in space. The albedo image is simply a rendering of the mesh with texture applied, and the specular image describes the shininess of the textures (brighter red means shiny, darker red means diffuse).
The second shader is the depth map shader. It takes in the mesh and a matrix that transforms the mesh to the orientation it would be in if looked at from the position of the sun. It outputs a texture describing the depth of the geometry. This is used later for calculating shadows.
The third shader is the translucent shader. It takes in all the translucent meshes and the matrices that transform the mesh to the position of the camera and outputs an image of the translucent geometry.
The fourth shader is the screen-space ambient occlusion (SSAO) shader. It takes in the position and normal textures generated by the first shader as well as an image with a random pattern on it used for dithering. It outputs an image that has corners and other close-together spaces darkened, simulating occlusion of light.
The fifth shader takes the SSAO image and blurs it to create a smoother result.
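The blur step can be illustrated on the CPU with a simple box blur. The actual shader runs on the GPU and its kernel is not described here, so the box filter, the `radius` parameter, and the function name are assumptions.

```python
def box_blur(image, radius=1):
    """Blur a 2D list of floats by averaging each pixel with its neighbours.

    Pixels outside the image are ignored, so edges average fewer samples.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            count = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            out[r][c] = total / count
    return out
```

Averaging spreads each occlusion sample over its neighbours, which hides the dithering pattern introduced by the random image in the SSAO pass.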
The sixth shader is the lighting shader. It takes the position, normal, albedo, translucent, SSAO, and depth map images and outputs a realistically lit image with shadows, fog and SSAO.
The seventh shader is the bloom shader. It takes the lit image and outputs an image that has the bright areas blurred over six passes.
The eighth and final shader is the composition shader. It takes the lit image and the bloom image, combines them, and applies tone mapping to produce a final render.
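The composition step can be sketched per pixel as below. The text does not say which tone mapping operator is used, so the Reinhard operator `c / (1 + c)` is a stand-in, and `bloom_strength` and the function name are hypothetical. The real shader runs on the GPU.

```python
def compose_pixel(lit, bloom, bloom_strength=1.0):
    """Combine one lit RGB pixel with its bloom pixel and tone map the result.

    lit, bloom: (r, g, b) tuples of non-negative floats that may exceed 1.0.
    """
    # Additive blend: bloom adds glow on top of the lit image.
    hdr = tuple(l + bloom_strength * b for l, b in zip(lit, bloom))
    # Reinhard tone mapping (an assumed operator) compresses values into [0, 1).
    return tuple(c / (1.0 + c) for c in hdr)
```

For example, `compose_pixel((1.0, 0.0, 3.0), (0.0, 0.0, 0.0))` returns `(0.5, 0.0, 0.75)`: bright values are compressed instead of clipping to white.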