Why render hidden objects? Cull them with a
software depth-buffer rasterizer FTW!
Charumathi Chandrasekaran
Graphics Software Engineer
GDC 2013
www.intel.com/software/gdc
Be Bold. Define the Future of Software.
Agenda
• Algorithm overview
• Depth Buffer rasterization
• Depth testing
• Optimizations
• Performance results
Performance
• Occluder size threshold = 1.5
• Occludee size threshold = 0.01

                         No             SSE Multi-threading +   Multi-threading +
                         optimization   Frustum Culling         Frustum Culling +
                                                                Depth test Culling
Frame rate (fps)         7.51           19.56                   70.11
Frame time (ms)          133.15         51.12                   14.26
# of draw calls          23279          7360                    1831
Objects rendered         20802          6494                    1557
Occluders rasterized                                            9
Occludees Culled                                                25468
Depth rasterizer (ms)                                           0.7
Depth test (ms)                                                 0.67
Total Cull Time (ms)                                            1.37
Gain                     -              2X+                     9X+
Sample Screenshot
Algorithm Overview
• Scene objects are split into occluders and occludees
• Occluders:
– Transform vertices to screen space
– Bin triangles
– Rasterize binned triangles to create depth buffer
• Occludees:
– Transform 8 vertices of AABBox to screen space
– Rasterize AABBox triangles and depth test
– Occluded? yes: do not render; no: render
Occluders
Occludees
Software Depth Buffer Rasterization
• Transform the occluder vertices to screen space on the CPU
• Bin the triangles to the frame buffer tiles
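The binning step can be sketched as follows. This is a minimal illustration, not the sample's implementation: the function name `bin_triangles`, the tile grid parameters, and the choice to bin by clamped bounding box are all assumptions made for the example.

```python
def bin_triangles(triangles, width, height, tiles_x, tiles_y):
    """Assign each screen-space triangle to every tile its bounding box overlaps.

    triangles: list of three (x, y) vertices each, already in screen space.
    Returns bins[tile_y][tile_x] -> list of triangle indices (hypothetical layout).
    """
    tile_w = width / tiles_x
    tile_h = height / tiles_y
    bins = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Clamp the triangle's bounding box to the frame buffer.
        min_x, max_x = max(min(xs), 0), min(max(xs), width - 1)
        min_y, max_y = max(min(ys), 0), min(max(ys), height - 1)
        if min_x > max_x or min_y > max_y:
            continue  # bounding box is entirely off screen
        tx0, tx1 = int(min_x // tile_w), int(max_x // tile_w)
        ty0, ty1 = int(min_y // tile_h), int(max_y // tile_h)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[ty][tx].append(index)
    return bins
```

Because every tile owns its own triangle list, tiles can later be rasterized independently, which is what makes binning a natural fit for multithreading.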
Pixel Traversal
• Rasterize the pixels within each tile
• Use bounding box traversal
• Rasterize 2x2 blocks for SSE
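A minimal sketch of bounding box traversal in 2x2 blocks, assuming integer pixel coordinates and tile bounds that start on even pixels. The helper name `blocks_2x2` is hypothetical. In an SSE version, each yielded block would map to the four lanes of one register; here the blocks are only enumerated.

```python
import math


def blocks_2x2(tri, tile_min, tile_max):
    """Yield the top-left pixel of each 2x2 block covering the triangle's
    bounding box, clamped to the inclusive tile bounds tile_min..tile_max."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    # Clamp to the tile, then round the start down to an even pixel so that
    # blocks stay aligned (assumes the tile itself starts on an even pixel).
    x_start = max(math.floor(min(xs)), tile_min[0]) & ~1
    y_start = max(math.floor(min(ys)), tile_min[1]) & ~1
    x_end = min(math.floor(max(xs)), tile_max[0])
    y_end = min(math.floor(max(ys)), tile_max[1])
    for y in range(y_start, y_end + 1, 2):
        for x in range(x_start, x_end + 1, 2):
            yield (x, y)
```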
Line Equation
\[ f(x, y) = Ax + By + C = 0 \]
\[ f(x, y) = (y_0 - y_1)\,x + (x_1 - x_0)\,y + x_0 y_1 - x_1 y_0 = 0 \tag{1} \]
\[ A = y_0 - y_1, \quad B = x_1 - x_0, \quad C = x_0 y_1 - x_1 y_0 \]
\[ f(x, y) > 0:\ \text{In}; \quad f(x, y) = 0; \quad f(x, y) < 0:\ \text{Out} \]
[Figure: line through (x0, y0) and (x1, y1) with normal (A, B); a point p(x, y) lies on the +ve or -ve side of the line]
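The line equation translates directly into code. The sketch below uses hypothetical helper names (`edge_coefficients`, `edge_value`) and computes A, B and C from two points, then evaluates f(x, y) to find which side of the line a point is on.

```python
def edge_coefficients(p0, p1):
    """Return (A, B, C) of the line through p0 and p1, as in equation (1)."""
    x0, y0 = p0
    x1, y1 = p1
    a = y0 - y1
    b = x1 - x0
    c = x0 * y1 - x1 * y0
    return a, b, c


def edge_value(coefficients, x, y):
    """Evaluate f(x, y) = Ax + By + C; the sign tells which side (x, y) is on."""
    a, b, c = coefficients
    return a * x + b * y + c
```

For the line from (0, 0) to (4, 0), points with y > 0 evaluate positive, points with y < 0 evaluate negative, and points on the line evaluate to zero.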
Is pixel inside triangle?
• Triangle edge equations:
\[ \mathrm{line}(P_0 P_1) = A_0 x + B_0 y + C_0 = 0 \]
\[ \mathrm{line}(P_1 P_2) = A_1 x + B_1 y + C_1 = 0 \tag{2} \]
\[ \mathrm{line}(P_2 P_0) = A_2 x + B_2 y + C_2 = 0 \]
• When all 3 edge equations evaluate to >= 0 at 'p', 'p' lies inside the triangle
[Figure: triangle P0 P1 P2 with point p(x, y) on the +ve side of all three edges]
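A small sketch of the inside test, assuming the vertex order P0, P1, P2 makes the interior the positive side of every edge (counter-clockwise with the y axis pointing up). The function names are hypothetical.

```python
def edge(p0, p1, x, y):
    """f(x, y) for the line through p0 and p1."""
    return (p0[1] - p1[1]) * x + (p1[0] - p0[0]) * y + p0[0] * p1[1] - p1[0] * p0[1]


def inside(tri, x, y):
    """True when all three edge equations are >= 0 at (x, y)."""
    p0, p1, p2 = tri
    return (edge(p0, p1, x, y) >= 0
            and edge(p1, p2, x, y) >= 0
            and edge(p2, p0, x, y) >= 0)


def covered_pixels(tri):
    """Bounding box traversal over integer pixels, keeping those inside."""
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    return [(x, y)
            for y in range(min(ys), max(ys) + 1)
            for x in range(min(xs), max(xs) + 1)
            if inside(tri, x, y)]
```

Points exactly on an edge count as inside because the comparison is >=, matching the rule above.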
Incremental pixel evaluation
\[ f(x, y) = Ax + By + C = 0 \]
\[ f(x, y) = (y_0 - y_1)\,x + (x_1 - x_0)\,y + x_0 y_1 - x_1 y_0 = 0 \]
• Compute A, B and C once
• Compute f(x, y) once
\[ f(x + 1, y) = A(x + 1) + By + C = Ax + By + C + A \tag{3} \]
\[ f(x + 1, y) = f(x, y) + A \tag{4} \]
\[ f(x, y + 1) = f(x, y) + B \tag{5} \]
[Figure: 2x2 pixel block (x, y), (x+1, y), (x, y+1), (x+1, y+1)]
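The incremental scheme can be sketched like this: f is evaluated in full once at the starting pixel, and every other pixel is reached by adding A (one step in x) or B (one step in y). The function name and the rectangular region parameters are assumptions for the example.

```python
def edge_values_incremental(p0, p1, x_start, y_start, width, height):
    """Return rows of f(x, y) over a width x height pixel region,
    using only additions after the first evaluation."""
    a = p0[1] - p1[1]
    b = p1[0] - p0[0]
    c = p0[0] * p1[1] - p1[0] * p0[1]
    row_value = a * x_start + b * y_start + c  # the only full evaluation
    rows = []
    for _ in range(height):
        value = row_value
        row = []
        for _ in range(width):
            row.append(value)
            value += a       # f(x + 1, y) = f(x, y) + A
        rows.append(row)
        row_value += b       # f(x, y + 1) = f(x, y) + B
    return rows
```

With integer coordinates the incremental values match direct evaluation exactly; with floating-point inputs, rounding error can accumulate along long rows.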
Triangle Area
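\[ \text{Area} = \frac{1}{2} \begin{vmatrix} x_1 - x_0 & x_2 - x_1 \\ y_1 - y_0 & y_2 - y_1 \end{vmatrix} \tag{4} \]
\[ \text{Area} = \frac{1}{2} \left[ (y_0 - y_1)\,x_2 + (x_1 - x_0)\,y_2 + x_0 y_1 - x_1 y_0 \right] \tag{5} \]
\[ f(x, y) = (y_0 - y_1)\,x + (x_1 - x_0)\,y + x_0 y_1 - x_1 y_0 = 0 \tag{6} \]
\[ \text{Area} = \frac{1}{2} f(x, y) \ \text{at}\ x = x_2,\ y = y_2 \]
[Figure: triangle with vertices P0 (x0,y0), P1 (x1,y1), P2 (x2,y2)]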
Cull back facing triangles
• Consider triangles T1 and T2:
\[ \text{Area}(T_1) = \frac{1}{2} f(x, y) \ \text{at}\ x = x_2,\ y = y_2 \]
\[ \text{Area}(T_2) = \frac{1}{2} f(x, y) \ \text{at}\ x = x_2',\ y = y_2' \]
[Figure: triangles T1 and T2 with vertices P0 (x0,y0), P1 (x1,y1), P2 (x2,y2) and P2' (x2',y2')]
• P2' is outside the triangle.
• f(x,y) will evaluate to a negative value.
• Cull triangles with area <= 0
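A minimal sketch of the area-based cull, with hypothetical function names. The signed area is half the edge function of P0 P1 evaluated at the third vertex, so the same arithmetic used for rasterization also gives the facing test.

```python
def signed_area(p0, p1, p2):
    """Half of f(x2, y2), where f is the edge function of the line P0 P1."""
    f_at_p2 = ((p0[1] - p1[1]) * p2[0]
               + (p1[0] - p0[0]) * p2[1]
               + p0[0] * p1[1] - p1[0] * p0[1])
    return 0.5 * f_at_p2


def is_back_facing(p0, p1, p2):
    """Cull triangles whose signed area is <= 0 (back facing or degenerate)."""
    return signed_area(p0, p1, p2) <= 0
```

Moving the third vertex across the edge P0 P1 flips the sign of the area, and a zero-area (collinear) triangle is culled as well since it covers no pixels.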
Depth computation using Barycentric
coordinates
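\[ \alpha = \frac{A_0}{A}, \quad \beta = \frac{A_1}{A}, \quad \gamma = \frac{A_2}{A} \tag{7} \]
\[ A = A_0 + A_1 + A_2 \]
\[ 0 \le (\alpha, \beta, \gamma) \le 1 \]
\[ \alpha + \beta + \gamma = 1 \]
• Interpolate the depth at p from the depths at the triangle vertices
\[ z_p = \alpha z_0 + \beta z_1 + \gamma z_2 \tag{8} \]
[Figure: point p inside triangle P0 (x0,y0), P1 (x1,y1), P2 (x2,y2) with vertex depths z0, z1, z2; p divides the triangle into sub-areas A0, A1, A2]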
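The interpolation can be sketched by reusing the edge function, since each edge value at p is twice the area of one sub-triangle. The sketch assumes A0 is the sub-triangle opposite P0 (and likewise for A1 and A2), which is what makes the weight of z0 equal to 1 at P0. The function names are hypothetical.

```python
def edge(p0, p1, x, y):
    """f(x, y) for the line through p0 and p1 (twice a signed area)."""
    return (p0[1] - p1[1]) * x + (p1[0] - p0[0]) * y + p0[0] * p1[1] - p1[0] * p0[1]


def interpolate_depth(tri, depths, x, y):
    """Depth at (x, y) from vertex depths, using barycentric weights."""
    p0, p1, p2 = tri
    z0, z1, z2 = depths
    area = edge(p0, p1, p2[0], p2[1])  # twice the triangle area A
    a0 = edge(p1, p2, x, y)            # twice A0, opposite P0 (assumed)
    a1 = edge(p2, p0, x, y)            # twice A1, opposite P1 (assumed)
    a2 = edge(p0, p1, x, y)            # twice A2, opposite P2 (assumed)
    alpha, beta, gamma = a0 / area, a1 / area, a2 / area
    return alpha * z0 + beta * z1 + gamma * z2
```

The factor of two cancels in the ratios, so the areas never need to be halved. The triangle must have non-zero area, which the back-face cull already guarantees.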
CPU Rasterized Depth Buffer
Axis Aligned Bounding Box
• Use object space axis aligned bounding box (AABB)
• All occluders are treated as occludees
• Transform and rasterize the AABB triangles: at most 6 of the 12 are front facing
Depth Testing
• Depth test the rasterized AABB triangles against the CPU-generated depth buffer.
• Assumption:
– If the AABB is visible, the object inside may also be visible.
• AABB depth testing is conservative.
– May have false positives: an occludee can be reported visible even though the object itself is hidden.
• A clipper stage is not implemented.
• Objects clipped by the near clip plane are marked visible.
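A minimal sketch of the depth test, under two assumptions that the slides do not state: the AABB has already been rasterized into (x, y, z) fragments, and a smaller z means closer to the camera. The function name `is_visible` is hypothetical.

```python
def is_visible(fragments, depth_buffer):
    """Return True as soon as one AABB fragment passes the depth test.

    fragments: iterable of (x, y, z) produced by rasterizing the AABB triangles.
    depth_buffer: depth_buffer[y][x] holds the occluder depth at that pixel.
    Assumes smaller z is closer to the camera.
    """
    for x, y, z in fragments:
        if z < depth_buffer[y][x]:
            return True   # one visible pixel is enough: stop early
    return False          # every fragment is behind the occluders
```

The early exit matters for cost: a visible occludee usually stops after a few pixels, while only fully hidden occludees pay for testing every fragment.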
Find near plane clipped objects
• Use the homogeneous coordinate 'w' of the AABB vertices
• For objects in front of the camera, w > 1.0
• If any occludee BB vertex has w < 1.0
– object is clipped by the near clip plane
[Figure: view frustum with near plane at Wnear = 1.0 and far plane]
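The near-plane check can be sketched as below. The helper names, the row-major 4x4 matrix layout, and the `w_near` parameter are assumptions for the example; the slides only state that w is compared against 1.0.

```python
from itertools import product


def aabb_corners(bb_min, bb_max):
    """The 8 corners of an axis aligned box given its min and max points."""
    return [tuple(corner) for corner in product(*zip(bb_min, bb_max))]


def transform(matrix, point):
    """Multiply a row-major 4x4 matrix by (x, y, z, 1); returns (x, y, z, w)."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)


def is_near_clipped(bb_min, bb_max, world_view_proj, w_near=1.0):
    """True if any transformed corner has w below the near-plane value."""
    return any(transform(world_view_proj, corner)[3] < w_near
               for corner in aabb_corners(bb_min, bb_max))
```

An occludee for which `is_near_clipped` returns True would be marked visible without rasterizing its box, which sidesteps the need for a clipper stage.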
Optimizations
• Binning
• Frustum Culling
• Vectorization with SSE
• Multithreading
• Pipelining
• Occluder Size Threshold
• Occludee Size Threshold
Occluder / Occludee size threshold
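• Avoid processing an occluder / occludee if its screen space size is too small
When h = r:
\[ \frac{r_1}{w_1} = \tan\left(\tfrac{1}{2}\,FOV\right) \tag{9} \]
\[ \text{if}\ \frac{r}{w \cdot \tan\left(\tfrac{1}{2}\,FOV\right)} < \text{threshold value},\ \text{then small} = \text{true} \tag{10} \]
[Figure: view frustum with field of view FOV and half-angle 1/2 FOV; bounding radii r1 and r2 at distances w1 and w2]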
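The size test is a one-line comparison. The sketch below assumes the field of view is given in radians, that w is positive, and that the threshold is expressed in the same ratio as the formula. The function name `is_too_small` is hypothetical.

```python
import math


def is_too_small(radius, w, fov_radians, threshold):
    """True when the object's screen space size falls below the threshold.

    radius: bounding radius r of the object.
    w: homogeneous w of the object, i.e. its distance along the view direction.
    """
    return radius / (w * math.tan(0.5 * fov_radians)) < threshold
```

The denominator is the half-height of the view frustum at distance w, so the ratio shrinks as the object moves away or the field of view widens.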
Scene 1
Performance scene 1
• Occluder size threshold = 1.5
• Occludee size threshold = 0.01

                         No             SSE Multi-threading +   Multi-threading +
                         optimization   Frustum Culling         Frustum Culling +
                                                                Depth test Culling
Frame rate (fps)         7.51           19.56                   70.11
Frame time (ms)          133.15         51.12                   14.26
# of draw calls          23279          7360                    1831
Objects rendered         20802          6494                    1557
Occluders rasterized                                            9
Occludees Culled                                                25468
Depth rasterizer (ms)                                           0.71
Depth test (ms)                                                 0.67
Total Cull Time (ms)                                            1.38
Gain                     -              2X+                     9X+
Scene 2
Performance scene 2
• Occluder size threshold = 1.5
• Occludee size threshold = 0.01

                         No             Multi-threading +   Multi-threading +
                         optimization   Frustum Culling     Frustum Culling +
                                                            Depth test Culling
Frame rate (fps)         9.07           9.22                11.67
Frame time (ms)          110.25         108.45              85.68
# of draw calls          19073          18893               14443
Objects rendered         16698          16518               12651
Occluders rasterized                                        11
Occludees Culled                                            14374
Depth rasterizer (ms)                                       1.05
Depth test (ms)                                             0.94
Total Cull Time (ms)                                        1.99
Gain                     -              1.01X+              1.28X+
Future Work
• Compare against a non-binned rasterizer
• Experiment with smaller depth buffer
• Vary number of tiles in the binned version
• Use AVX2 to vectorize rasterizer
• Implement DrawIndexedInstanced() call
Complete Solution
• Automatic occluder simplification
• Fixed memory and bounded CPU time
• Static and dynamic occluders and occludees
• Shadow caster culling
• Streaming
Acknowledgements
• Engineers:
– Doug Mcnabb - Team Tech Lead
– David Houlton - Sr Graphics Software Engineer
• Artist:
– Glen Lewis
– Project Offset artists
• Fabian Giesen
– http://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusionculling-index/
Questions?
• Sample download
– http://software.intel.com/gamecode
– www.intel.com/software/gdc
• Charu Chandrasekaran
– [email protected]
• Doug Mcnabb
– [email protected]
Legal Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR
SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF
INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY
PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.
Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject
matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any
such patents, trademarks, copyrights, or other intellectual property rights.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.
The Intel processor and/or chipset products referenced in this document may contain design defects or errors known as errata which may cause the product to deviate
from published specifications. Current characterized errata are available on request.
All dates provided are subject to change without notice. All dates specified are target dates, are provided for planning purposes only and are subject to change.
Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2012, Intel Corporation. All rights reserved.