Multi-Pass Forward Rendering

Multi-pass forward rendering is the conventional approach used by the majority of real-time graphics applications. It is simple to implement, flexible and usually quite efficient. Its weakness is the overhead of executing multiple passes: the extra passes exist for the sake of shading, yet each one also repeats the geometry processing.

The key design detail is the multiple-lights-per-pass, multiple-passes method. Dynamic looping in a pixel shader has traditionally been risky because not all GPU architectures handle this type of execution well. However, looping over a set of lights inside a single pass may still be faster than rendering the equivalent number of separate passes.

By using a working set of N lights it is possible to render up to N lights' worth of contributions per pass. If more than N lights are active then multiple passes are required. This should be efficient, and it should also allow for flexible code that scales across all hardware classes.

For example: if the working set per pass is limited to 4 lights then 10 active lights would require 3 passes (4-4-2), but if only 3 lights were active a single pass would suffice. If a larger working set were allowed, say 10 lights, it becomes possible to experiment with other combinations - 8 lights being rendered as 8, 4-4, 2-2-2-2, 1-1-1-1-1-1-1-1 and so on...
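The batching arithmetic in this example can be sketched as a small C++ helper. This is only an illustration of the idea, not code from the project: the function name `splitIntoPasses` and its parameters are assumptions made for the sketch.

```cpp
#include <cassert>
#include <vector>

// Splits `activeLights` into per-pass batches of at most `lightsPerPass`.
// Both names are hypothetical and chosen only for this illustration.
std::vector<int> splitIntoPasses(int activeLights, int lightsPerPass)
{
    assert(activeLights >= 0 && lightsPerPass > 0);

    std::vector<int> passes;
    for (int remaining = activeLights; remaining > 0; remaining -= lightsPerPass)
    {
        // Every pass is full except, possibly, the last one.
        passes.push_back(remaining < lightsPerPass ? remaining : lightsPerPass);
    }
    return passes;
}
```

With a working set of 4, ten active lights yield the batches 4, 4 and 2, while three active lights yield a single batch of 3. Calling the helper with 8 lights and working sets of 8, 4, 2 and 1 reproduces the combinations listed above.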

The exact number of lights per set is yet to be determined and will be limited by the number of per-light attributes that can fit into a Direct3D 10 cbuffer.
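As a rough guide to that upper bound, the sketch below estimates how many lights fit in one constant buffer. Direct3D 10 limits a constant buffer to 4096 four-component constants; everything else here is an assumption for illustration, in particular the function name and the idea of counting each light as a fixed number of float4 registers. The real per-light layout has not been decided.

```cpp
// Direct3D 10 allows at most 4096 four-component (16-byte) constants
// in a single constant buffer.
constexpr int kMaxVectorsPerCBuffer = 4096;

// vectorsPerLight: float4 registers used by one light (hypothetical layout,
//                  must be greater than zero).
// reservedVectors: float4 registers in the same buffer used for non-light
//                  data such as matrices (also hypothetical).
constexpr int maxLightsPerCBuffer(int vectorsPerLight, int reservedVectors)
{
    return (kMaxVectorsPerCBuffer - reservedVectors) / vectorsPerLight;
}
```

For instance, a light described by four float4 values (say position, direction, colour and attenuation) with nothing else in the buffer gives a ceiling of 1024 lights. The practical working set is likely to be far smaller, since shader cost rather than buffer capacity will usually be the limiting factor.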

Flow Diagram

The following Visio diagram encapsulates the architecture of this rendering path as it is currently designed. At the time of writing this architecture has not been implemented and is therefore subject to change!


Expected Results

As described in a previous section, the combinatorial options for lights-per-pass should allow empirical (or even automated) detection of a 'sweet spot' for performance. This could vary from GPU to GPU.
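One way the automated detection might look is sketched below: try each candidate lights-per-pass value, measure its cost, and keep the cheapest. The names `findSweetSpot` and `measureFrameCost` are hypothetical; the callback stands in for whatever timing mechanism the application ends up using, which this page does not specify.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

// Returns the candidate lights-per-pass value with the lowest measured cost.
// `measureFrameCost` is a hypothetical callback that renders with the given
// working-set size and reports a cost (for example, average frame time).
int findSweetSpot(const std::vector<int>& candidates,
                  const std::function<double(int)>& measureFrameCost)
{
    assert(!candidates.empty());

    int best = candidates.front();
    double bestCost = std::numeric_limits<double>::infinity();
    for (int lightsPerPass : candidates)
    {
        const double cost = measureFrameCost(lightsPerPass);
        if (cost < bestCost)
        {
            // Strict comparison: on a tie the earlier candidate is kept.
            bestCost = cost;
            best = lightsPerPass;
        }
    }
    return best;
}
```

Because the result depends on the GPU, such a search would have to run on the target machine, for example once at start-up, rather than being fixed at build time.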

The CPU-geometry route requires a simpler vertex shader (VS) and may therefore perform faster than the instanced route. The initial geometry-generation phase is very expensive for the CPU route, but once the geometry is generated it can be blasted down the pipeline any number of times with little difficulty. The instanced route is almost the opposite: a very low setup cost but a much more expensive vertex shader. Initially the instanced route will be faster because it is entirely GPU-based, but past some point the one-time CPU cost will be outweighed by the accumulated overhead of instanced rendering.
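The trade-off can be expressed as a simple break-even calculation. The sketch below assumes a linear cost model (a one-time setup cost plus a constant cost per submission), which is a simplification, and all names and parameters are hypothetical rather than measured values from this project.

```cpp
#include <cassert>
#include <cmath>

// All costs are hypothetical measurements in the same unit (e.g. milliseconds).
// Returns the first submission count at which the CPU route's total cost is
// strictly lower than the instanced route's, or -1 if that never happens.
long breakEvenSubmissions(double cpuSetupCost,
                          double cpuCostPerSubmission,
                          double instancedCostPerSubmission)
{
    assert(cpuSetupCost >= 0.0);

    const double savingPerSubmission =
        instancedCostPerSubmission - cpuCostPerSubmission;
    if (savingPerSubmission <= 0.0)
    {
        return -1; // The instanced route never falls behind.
    }

    // setup + n * cpu < n * instanced  <=>  n > setup / saving
    return static_cast<long>(std::floor(cpuSetupCost / savingPerSubmission)) + 1;
}
```

As an illustration with made-up numbers, a setup cost of 50 units with per-submission costs of 1 (CPU route) and 1.5 (instanced route) breaks even just after 100 submissions, so the CPU route is ahead from submission 101 onwards.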

Last edited Aug 31, 2007 at 12:07 PM by JHoxley, version 2

