| I started writing this very recently, therefore it is
not complete.
However, I find some of the ideas interesting.
I did think a lot on this, and here is the solution I came up
with.
I am assuming that we have T&L hardware, and the world is in
fast memory (AGP or video ram).
Culling for rendering:
Yes, we shall have a very fast culling tool. I am assuming we
have many cellars, dungeons, rooms, small objects, etc., probably
connected to each other using tunnels, doors, whatever. My first
idea was to use an octree or kd-tree on the entire scene. To avoid
problems with blending/multipass algorithms, etc., we have to assure
that a triangle appears in only a single node. To do this, we may
either use a loose-octree ( as described by Thatcher Ulrich ), or
clip triangles and split them to oct-tree bounding boxes, then stop
further dividing when we are below, say, 1000 triangles. Both work
fine (but I believe the second to be better ).
The drawback of this method appeared when I first implemented it.
Gettin the nodes in the frustum and rendering them has several
problems. To see, let's assume that the whole scene has two
different shaders, one for walls, one for floor. Then, possibly many
nodes will have triangles with two different shaders. If we have
1000 nodes in frustum, it means spoiling the texture cache 1000
times a frame, when only once should be necessary.
My nodes stored only indices to vertex data, and I stored the
vertex data in one large array ( better than storing all
vertex data in the node). Then, we may either render each node
alone, or combine the indices. And if, when combining indices
into a indice buffer, we sort the indices by shader, then we can
minimize the state changes. However, that introduces a CPU burden,
especially if we have a very large triangle count.
Then I came up with this idea: Every room, corridor, whatever
should be modelled as seperate meshes. Or, if artists want to
use many small meshes to build up a room, ask them to group them,
and give the group a name like 'room_1'. Then, we may cull these
rooms at object level, which will relieve both the GPU and the CPU.
But how to cull? My first idea was frustum culling again, which
would do. Then I noticed that these rooms are 'snapped' to each
other, and their intersection will only form a polygon. We may use
these polygons to use portal-style culling. Given two meshes,
pre-calculate their nearest point, if their distance is below an
epsilon, the two meshes are neighbors. Calculate all
vertice/edge/surface distances below that epsilon, connect these
edges/vertices, and you have the portal. To avoid cracks on joints,
we may need to check the vertices that overlap and equalize their
indices.
Then, render the world using the portal way. Draw the room you
are in - check if portals in that room are visible - if portal is
visible, render the associated rooms - check portals in the
new rooms - continue until all rooms are rendered and no more
portals are visible.
This method is easy to implement, and reduces the fill-rate
burden of the video card to a reasonable minimum, shader switching
to a great low value, and totally eliminates the CPU overhead.
However, because it is not a true portal engine, we will still need
the z-buffer test.
Culling for collision detection:
Assuming each room may have ten thousands of polygons, we cannot
simply check all polygons in the room the player is in. The
oct-tree/kd-tree seems good enough for this. Depending on your
collision and response method, you may or may not want to split
triangles at box boundries. I used the elipsoid method for
first-person for player-world collision. It is fine, but I had a
hard time getting it work right. It is VERY cpu hungry, so I divided
nodes until I reached a very low polygon count. An alternative to
octrees would be an OBB tree, however, creating them is not as easy
as using them.
Culling for Rendering - additions:
This came into my mind while thinking of a way to reduce cost of
per-vertex setup for cubemap based spotlight effect. I was basically
doing a diffuse+lightmap pass, then a second pass with
glBlendFunc(GL_ONE,GL_ONE). My problem was that calculating the
per-vertex texture coords for the second pass costed just too much,
because of the calculation of very dense and high-poly objects that
were not in the spotlight's cone.
Keeping a octree-style thing made up of split triangles would
have some benefits. In the cases of projective textures, spotlights,
or any attenuated dynamic light, you will probably have to do
multiple passes. For a blended second-pass for a spotlight that will
only illuminate a quarter of one room is not very nice. Most cards
have very hard times doing blending. Basicly, reading from the
framebuffer slows. Things like reading, multiplying, then writing to
the framebuffer will slow you down to a turtle.
A problem that arise, if we split the triangles to nodes, we will
fail on glDepthTest(GL_EQUAL) tests (we can make sure that our
divisions are not perfect). If we dont' split, some triangles will
be on two different nodes, they may be rendered twice, thus blended
twice, introducing artifacts (like making them 'visible').
Ulrich's loose oct-tree seems to me a very good solution. No
duplicate triangles, no splits.
So, if you are rendering a spotlight effect, get the loose-nodes
in the spot cone, and render them with blending. Assuming that there
will not be many nodes inside the cone, this will not slow down too
much.
Still, I think the use octrees for blending depends on the
complexity of scene. If you are only projecting light to the floor
(quake 3 style) using octrees willl speed up, especially if the rest
of the scene is complex.
Combination / Result:
I render the scene using some portal-like culling. If we
need per-vertex calculations or a second pass that will affect only
a little area, I use loose octrees. I use an oct-tree for
filtering out triangles for collision detection.
Maybe not be perfect way, but works fine. The thing I like with
it is that it allows the artists to design very complex worlds, and
no model must be 'closed'. If you remove/change the portal-based
method, it may even work on polygon soups.
|