Background
When it comes to dynamic lighting, there are two common approaches:
- loop inside the pixel shader, calculating the contribution of each light;
- render the same geometry multiple times (once per light) using additive blending;
Normally, the number of lights that affect a mesh is limited to something like 4 or 8 to save shader instructions or render passes.
Therefore, for each mesh one must determine which lights can affect it. This can be tricky with large meshes, such as terrain.
When a large number of lights affect the same mesh, there can also be a problem with light flickering (when one light in the limited set is replaced by another).
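To make the per-mesh limit concrete, here is a minimal Python sketch of the light selection a forward renderer has to perform. The `PointLight` type, the `select_lights` helper, and the intensity-over-distance ranking are all hypothetical, chosen only for illustration; real engines use many different heuristics.

```python
import math
from dataclasses import dataclass


@dataclass
class PointLight:
    position: tuple   # world-space (x, y, z)
    radius: float     # distance beyond which the light has no effect
    intensity: float


def select_lights(mesh_center, mesh_radius, lights, max_lights=4):
    """Return at most max_lights lights whose influence sphere overlaps the mesh's bounding sphere."""
    candidates = []
    for light in lights:
        d = math.dist(mesh_center, light.position)
        if d <= mesh_radius + light.radius:  # spheres overlap, so the light may affect the mesh
            # Hypothetical ranking heuristic: brighter and closer lights come first.
            score = light.intensity / max(d, 1e-6)
            candidates.append((score, light))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [light for _, light in candidates[:max_lights]]
```

With a large mesh such as terrain, the bounding sphere overlaps almost every light, so the cut-off at `max_lights` discards most of them. As lights move, the ranking changes and lights swap in and out of the set, which is the flickering described above.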
Deferred shading provides a nice and simple alternative (admittedly at the cost of a different set of problems, described later).
It works by deferring lighting until after all geometry has been rendered, hence the name.
All inputs to the lighting function are rendered into several buffers (in a single pass, using multiple render targets), which are then sampled by the light shader.
Implementation details
In the demo, I’m using four fp16 textures with this layout:
RT0: RGB – unlit diffuse colour, Alpha – specular level
RT1: RGB – light accumulation, Alpha – nothing
RT2: RGB – world-space normal, Alpha – nothing
RT3: RGB – world-space view vector (not normalized), Alpha – nothing
RT1 starts off with the light contribution from the main ambient and directional (sun) sources.
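The geometry pass that fills these four targets can be sketched per pixel in Python. This is a CPU-side illustration, not the demo's shader: `GBufferTexel` and `write_gbuffer` are hypothetical names, the view vector is assumed to point from the camera to the surface, and the initial RT1 value is assumed to be the diffuse colour lit by ambient plus a Lambert-only sun term.

```python
import math
from dataclasses import dataclass


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


@dataclass
class GBufferTexel:
    rt0: tuple  # RGB: unlit diffuse colour, A: specular level
    rt1: tuple  # RGB: light accumulation, A: unused
    rt2: tuple  # RGB: world-space normal, A: unused
    rt3: tuple  # RGB: world-space view vector (not normalized), A: unused


def write_gbuffer(diffuse, spec_level, normal, world_pos, camera_pos,
                  ambient, sun_dir, sun_colour):
    """Fill all four render targets for one pixel, as the single MRT pass would."""
    n = normalize(normal)
    # Assumption: the view vector points from the camera to the surface point.
    view = tuple(p - c for p, c in zip(world_pos, camera_pos))
    # Assumption: RT1 starts as diffuse * (ambient + sun * N.L); sun_dir points towards the sun.
    n_dot_l = max(dot(n, normalize(sun_dir)), 0.0)
    lit = tuple(d * (a + s * n_dot_l) for d, a, s in zip(diffuse, ambient, sun_colour))
    return GBufferTexel(
        rt0=(*diffuse, spec_level),
        rt1=(*lit, 0.0),
        rt2=(*n, 0.0),
        rt3=(*view, 0.0),
    )
```

Storing the unnormalized view vector keeps the distance to the camera, so a later pass can rebuild the world-space position from it.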
Once the scene has been rendered into the MRTs, we can start applying dynamic lights.
To cull unaffected pixels, lights are rendered as convex 3D volumes. Only point lights are shown in the demo; they are rendered as low-poly spheres. Spot lights are also possible, using cones.
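A low-poly sphere built with its vertices on the light radius would cut through the light's sphere of influence between vertices and wrongly cull pixels near the face centres. A minimal sketch, assuming an icosahedron serves as the low-poly sphere (the demo's actual mesh is not specified), computes how much to scale a unit-circumradius mesh so that every face lies outside the light radius. `icosahedron`, `inradius`, and `volume_scale` are hypothetical helpers.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron():
    """Return (vertices, faces) of a regular icosahedron with unit circumradius."""
    raw = []
    for a in (-1.0, 1.0):
        for b in (-PHI, PHI):
            raw += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    circumradius = math.sqrt(1 + PHI * PHI)
    verts = [tuple(c / circumradius for c in v) for v in raw]
    # Every triple of mutually adjacent vertices (edge length 2 before scaling) is a face.
    edge = 2 / circumradius
    faces = [f for f in itertools.combinations(range(len(verts)), 3)
             if all(math.isclose(math.dist(verts[i], verts[j]), edge)
                    for i, j in itertools.combinations(f, 2))]
    return verts, faces


def inradius(verts, faces):
    """Smallest distance from the centre to any face plane."""
    best = math.inf
    for i, j, k in faces:
        a, b, c = verts[i], verts[j], verts[k]
        u = [b[m] - a[m] for m in range(3)]
        v = [c[m] - a[m] for m in range(3)]
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])  # face normal (cross product)
        plane_dist = abs(sum(n[m] * a[m] for m in range(3))) / math.sqrt(sum(x * x for x in n))
        best = min(best, plane_dist)
    return best


def volume_scale(light_radius, verts, faces):
    """Scale for the unit-circumradius mesh so all faces lie outside the light's radius."""
    return light_radius / inradius(verts, faces)
```

For an icosahedron the inradius is about 0.795 of the circumradius, so the volume mesh is scaled to roughly 1.26 times the light radius. The extra pixels this covers still run the pixel shader, but receive no light if attenuation reaches zero at the light radius.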
It is also possible for each light to cast a shadow. I will explore this topic in future articles.
Stencil and Z culling
Each light volume is rendered in two passes:
Pass 1:
- Front faces only;
- Colour write disabled;
- No Z-write;
- Z function = Less/Equal;
- Z-fail writes a non-zero value to the stencil buffer (increment-saturate);
- Stencil pass and stencil fail leave the stencil buffer unchanged;
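This pass marks, in the stencil buffer, the pixels where the front of the light volume is hidden behind scene geometry; only the unmarked pixels, where the light volume is not occluded, can be lit in Pass 2.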
Pass 2:
- Back-faces only;
- Colour write enabled;
- No Z-write;
- Z function = Greater/Equal;
- Stencil function = Equal (stencil ref = zero);
- Always writes zero to stencil (this clears the mask for the next light);
This pass is where lighting actually happens. Every pixel that passes both the Z and stencil tests is shaded, and the result is added to the light accumulation buffer (RT1). The demo uses the standard Phong lighting model.
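The per-pixel work in Pass 2 can be sketched in Python. This is an illustration under stated assumptions, not the demo's shader: RT3 is assumed to hold `world_pos - camera_pos`, attenuation is assumed to fall off linearly to zero at `light_radius`, and `shininess` is a hypothetical constant because the G-buffer layout stores only a specular level. `shade_point_light` and `accumulate` are hypothetical names.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def shade_point_light(rt0, rt2, rt3, camera_pos, light_pos, light_colour,
                      light_radius, shininess=32.0):
    """RGB that one point light adds to RT1 for one pixel, using the Phong model."""
    diffuse, spec_level = rt0[:3], rt0[3]
    n = normalize(rt2[:3])
    # Assumption: RT3 holds world_pos - camera_pos, so the position can be rebuilt.
    world_pos = tuple(c + v for c, v in zip(camera_pos, rt3[:3]))
    to_light = tuple(lp - p for lp, p in zip(light_pos, world_pos))
    dist = math.sqrt(dot(to_light, to_light))
    if dist == 0.0 or dist >= light_radius:
        return (0.0, 0.0, 0.0)  # outside the light (or degenerate): no contribution
    l_dir = tuple(c / dist for c in to_light)          # surface -> light
    v_dir = normalize(tuple(-c for c in rt3[:3]))      # surface -> camera
    n_dot_l = max(dot(n, l_dir), 0.0)
    # Phong reflection vector: reflect(-L, N) = 2 (N.L) N - L
    r = tuple(2.0 * dot(n, l_dir) * nc - lc for nc, lc in zip(n, l_dir))
    spec = spec_level * max(dot(r, v_dir), 0.0) ** shininess if n_dot_l > 0.0 else 0.0
    atten = 1.0 - dist / light_radius  # assumed linear falloff
    return tuple(atten * c * (d * n_dot_l + spec) for c, d in zip(light_colour, diffuse))


def accumulate(rt1, contribution):
    """Additive (ONE, ONE) blend into the light accumulation buffer."""
    return tuple(a + c for a, c in zip(rt1[:3], contribution)) + (rt1[3],)
```

`accumulate` mirrors the additive blend into RT1; since RT1 is an fp16 target, the hardware must support blending in that format.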
The diagram below shows the effects of the Z and stencil tests:
Blue – pixels which passed the Z test in Pass 1 and left the stencil buffer intact.
Red – pixels which passed the Z test in Pass 2.
Green – pixels which passed the stencil test in Pass 2.
- Light 1 is culled by the Z test in Pass 2: its volume lies entirely in front of the scene geometry.
- Light 2 fails the Z test in Pass 1 (its volume is hidden behind the scene geometry), writes to the stencil buffer, and then fails the stencil test in Pass 2.
- Light 3 partially passes both tests, and those pixels go into the pixel shader.
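The two passes reduce to a simple per-pixel rule: a pixel is lit only when the scene depth lies between the light volume's front and back faces. The Python sketch below simulates the render states listed above for one pixel. `PixelState` and `light_volume_passes` are hypothetical names, depths are assumed to increase away from the camera, and the camera is assumed to be outside the light volume so that both a front and a back face cover the pixel.

```python
from dataclasses import dataclass


@dataclass
class PixelState:
    scene_depth: float  # value already in the Z-buffer (smaller = closer to camera)
    stencil: int = 0


def light_volume_passes(pixel, front_depth, back_depth):
    """Apply Pass 1 and Pass 2 to one pixel covered by a light volume; return True if it is lit."""
    # Pass 1: front faces, Z func LessEqual, no colour or Z writes.
    if not front_depth <= pixel.scene_depth:
        pixel.stencil = min(pixel.stencil + 1, 255)  # Z-fail: increment-saturate (8-bit stencil)
    # Pass 2: back faces, Z func GreaterEqual, stencil func Equal with ref 0.
    lit = back_depth >= pixel.scene_depth and pixel.stencil == 0
    pixel.stencil = 0  # Pass 2 always writes zero, so the next light starts from a clear mask
    return lit
```

For a pixel with scene depth 10, a volume spanning depths 3 to 5 behaves like Light 1, one spanning 12 to 14 behaves like Light 2, and one spanning 8 to 12 behaves like Light 3.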
After all lights have been rendered this way, RT1 contains the fully lit scene. It can now be used for post-processing effects (such as bloom).
That’s it.
Downsides
- Transparent geometry cannot be lit this way. All alpha-blended objects must be rendered into RT1 after the deferred dynamic lighting pass and before post-processing, using standard dynamic lighting (a loop in the shader and/or multiple passes).
- No hardware antialiasing.
- High video memory requirements – four fp16 textures at 64 bits per pixel each are very expensive at high resolutions.
- High bandwidth requirements – a consequence of the previous point. Each pixel writes to four render targets, and each light samples from three textures.
This is especially noticeable when the camera is close to a surface covered by a high number (say, 50+) of overlapping lights.
- Requires fp16 blending support.
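The memory and bandwidth costs can be estimated with simple arithmetic. The sketch below assumes RGBA fp16 targets (8 bytes per texel), counts each light as three 8-byte G-buffer reads plus an 8-byte read and an 8-byte write for blending into RT1, and ignores the depth/stencil buffer, caches, and compression; `gbuffer_bytes` and `light_bytes_per_pixel` are hypothetical helpers.

```python
BYTES_PER_TEXEL = 8  # RGBA fp16: 4 channels x 2 bytes = 64 bits
NUM_TARGETS = 4      # RT0..RT3


def gbuffer_bytes(width, height, targets=NUM_TARGETS):
    """Video memory taken by the render targets alone (no depth/stencil buffer)."""
    return width * height * targets * BYTES_PER_TEXEL


def light_bytes_per_pixel(overlapping_lights, sampled_targets=3):
    """Rough traffic per pixel for the lighting passes, ignoring caches and compression.

    Each light reads RT0, RT2 and RT3, then reads and writes RT1 for additive blending.
    """
    per_light = sampled_targets * BYTES_PER_TEXEL + 2 * BYTES_PER_TEXEL
    return overlapping_lights * per_light
```

Under these assumptions, the four targets alone take 40 MiB at 1280x1024, and a pixel covered by 50 overlapping lights moves about 2,000 bytes during the lighting passes.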
Some of these problems can be solved at some cost in quality. I will come back to this in future articles.
Demo – 1024 point lights

Demo controls:
WASD – up/down/left/right
EQ – forward/back
ZX – rotate
Arrows/Left mouse button – look around
Space – animate lights
No source code is available, but the demo is NVPerfHUD-friendly.