
                           P3D Version 1.4
       Truecolor polygon rendering for Allegro 3.0 with WIP Oct 25
                          by Calin Andrian
                        <calin@ibd.dbio.ro>
                        <candrian@geocities.com>
            http://www.geocities.com/SiliconValley/Garage/9955/

        Allegro is a game programming library by Shawn Hargreaves.
            http://www.talula.demon.uk/


DISCLAIMER
#include <std_disclaimer.h>    /* you know the drill */


INTRO

(In this file, by "truecolor" I mean all depths > 8 , everything that
does not use a pallete - or is it palette ?.)

Main features - covered below in this doc:
- truecolor rendering for all depths, using MMX
- scene rendering package, with scanline-oriented z-sorting, very fast
- 3d clipping routine

This started as a truecolor extension of the 3D polygon rendering in
Allegro 3.0. As soon as I released it, the request was: why not for
WIP May 30 ? I downloaded WIP and learnt that it already had some
(limited) code for hicolor rendering. I was already on the move and
replaced the existing code (sorry) with mine, which covers all depths
and all rendering styles.

Later, I came to the current form: the routines are packed in a library
(libp3d.a) that replaces the similar routines in allegro. You don't have
to mess with the Allegro sources any more, and the library is now
working whether you have installed WIP May 30 or not.

Working on my own Allegro project I came over the general problem
of rendering whole 3D scenes, with removing hidden things. I made
my homework and realized that a scan-line algorithm is a good
choice. So I added a complete scene rendering engine to the package.


FILES IN THIS ZIP

p3d.txt         this file

makefile        obvious

libp3d.a        the library

p3d.h           public defines

internal.h      non-public defines

polygon.c       2D polygon() and triangle(), with their helpers

poly3d.c        3D polygon3d(), with their helpers

scene3d.c       3D rendering engine

clip3d.c        3D clipper

cpu.c           new version of Allegro's cpu.c. It also detects 3DNow! CPUs

scanline.s      added all the truecolor code and MMX versions. The code
                generated is a replacement for the original scanline.s .

mmxtest.s       test program for MMX-compatible assembler

test.c          added some menus and tests - see below under
                            "TRUECOLOR 3D PERFORMANCE"


INSTALLING

0.   Prerequisites: DJGPP, ALLEGRO 3.0 with WIP Oct 25 installed. It may
     work with earlier versions of Allegro, but I didn't try.
     NOTE: You will need binutils (i.e. GNU assembler version 2.8.1 or
     higher in order to compile the MMX support.
1.   Unzip all files in a dedicated directory (I use /djgpp/allegro/p3d).
2.A. If you are in a hurry, type "make install" and the pre-built library
     libp3d.a and header p3d.h will go to the "lib" and "include" directory
     of your djgpp installation.
2.B. If you want to compile it yourself, type "make clean" to get rid of
     the precompiled library, followed by "make". This will build everything
     (including the new test.exe) and install the library and the header
     to their places. Test.exe will stay put.
       Compiling the package requires two files from the Allegro package:
     - asmdefs.inc from allegro/src. This file is part of the Allegro
       sources.
     - asmdef.inc from allegro/obj/djgpp. This file is created when you
       compile Allegro.
     Both files are copied by "make" from their locations, presuming
     "allegro" is a subdir of the djgpp base directory (ex: c:\djgpp\allegro).
     If this is not true, you can copy the 2 files to the p3d directory
     and "make" it afterwards, or edit the makefile.
     Test.c will not compile if you don't have WIP May 30. If this is your
     case, instead of "make" you should "make lib" and "make install".
     The makefile detects whether your assembler chokes on MMX opcodes.
     If you don't have binutils 2.8.1 or newer you won't be able to
     compile with MMX support.
3.   Include "p3d.h" after "allegro.h" in the source files that have to
     do with 3D. Link your programs with libp3d.a before liballeg.a.
     Example:     "gcc -o foo.exe foo.o   -lp3d -lalleg"


TRUECOLOR 3D RENDERING

The low-level routines for POLYTYPE_FLAT have been replaced with calls
to normal hline(). This way they will take advantage of accelerated
versions of these functions, if available. This change has a side-effect:
POLYTYPE_FLAT is now sensitive to the graphics drawing mode. If you
want to get the same output as before, you'll have to make sure that
you call solid_mode() before rendering. However, the other modes are
interesting also (especially DRAW_MODE_TRANS). Play with test.exe to
see what I mean.

All standard POLYTYPEs work for all the depths (8, 15, 16, 24, 32):
FLAT, GCOL, GRGB, ATEX, PTEX, ATEX_MASK, PTEX_MASK, ATEX_LIT, PTEX_LIT.
There are two more POLYTYPEs for all depths:
POLYTYPE_ATEX_MASK_LIT and POLYTYPE_PTEX_MASK_LIT, which work as you'd
expect from their names.
Mode-X is not directly supported. If you use mode-X, you'll have to
render to a memory bitmap and blit afterwards to the screen.

The GCOL routines for depths > 8 are mapped to GRGB, but they use the
color in the vertex structure as a normal color (fit for the destination
bitmap), rather than fixed RGB mapped on 0xff0000, 0x00ff00, 0x0000ff
as GRGB routines do. This seemed the most consistent with the behaviour
of the 8-bit GCOL and GRGB routines.

If "cpu_mmx" global variable is nonzero, which normally comes from using
check_cpu(), the following truecolor routines try to take advantage of
MMX: GRGB, ATEX*LIT, PTEX*LIT. The 8-bit GRGB routine will also use MMX.
You can disable this feature by zeroing cpu_mmx after check_cpu(). In
fact, you can play with cpu_mmx anywhere in your program, for example
enabling MMX for GRGB but not for the others.
Using MMX for *LIT routines has a side-effect:
Normally (without MMX), these routines use the blender functions used
also for other lighting functions, set with set_trans_blender() or
set_blender_mode(). The MMX versions only use the rgb value passed
to set_trans_blender() and do the linear interpolation themselves.
Therefore a new set of blender functions passed to set_blender_mode()
is ignored.

If "cpu_3dnow" global variable is nonzero, which normally comes from
using check_cpu(), the truecolor PTEX*LIT routines try to take advantage
of 3DNow! CPUs. As with "cpu_mmx", you can disable this feature by zeroing
"cpu_3dnow".

TRUECOLOR 3D PERFORMANCE

Of course, it all comes to the computer you use. I've tested the
routines using test.exe - the standard Allegro test program, to which
I added some new features:
  MMX mode - in Test
  3D rendering profile - in Test
  Triangle3d - in Graphics - Primitives
  Scene_render - in Graphics - Primitives

See also the Polygon3d - in Graphics - Interactive, which shows you
what you can really do. I recomend looking in test.c. Search for "3d".

Here are some conclusions (I only tested on Pentiums):

- the FLAT routines' speeds are limited by destination bitmap's speed,
  therefore are slower at more bits per pixel (especially if you render
  directly to screen).
- the simple pixel-by-pixel ops (GRGB, ATEX, ATEX_MASK) have constant
  speeds at any depth from 8 to 32, except when your video card is too
  slow for your CPU (on a Cx PR166 with S3 Trio 64, speeds flatten at
  250 triangles per second for all funtions but PTEX_LIT, due to low
  bandwidth of the S3 Trio).
- the CPU and FPU intensive ops (like ATEX_LIT and especially PTEX_LIT)
  are limited by CPU power (even on a PII/300). Maybe 3DNow! enhancement
  could speed them up.
- all above conclusions are based on the MMX-using routines. The
  non-MMX routines have heavy penalties (sometimes half the speed).
  With some exceptions! These CPUs ! You can never get it right for all
  of them ! The Cyrix PR166 and especially the AMD K6 (all frequencies)
  seem to take a lot of time to switch between FPU and MMX. Therefore
  you could find out that PTEX_LIT runs faster when you disable MMX on
  these CPUs. I don't know how more recent Cyrix PRs behave. Probably
  better - my Cyrix Media GX is OK. Intel CPUs seem to have better MMX
  performance. (Please read all these as: MY CODE runs differently on
  different CPUs, due to poor programming).


SCENE RENDERING INTRO

There are 4 important and simple enough approaches to removing hidden
objects (or surfaces):

1. Painter's algorithm - you draw far things first, so that when you
   draw near things you overwrite them. It leads to the depth-sort
   algorithm, which is pretty simple. It has some drawbacks: it can't
   solve some kind of ambiguities without breaking some polygons in
   smaller ones; it draws _everything_ - if you have a lot of polygons
   in your line of sight you draw a _lot_; the complexity of the sorting
   problem grows fast with the number of polygons.

2. Z-buffering - every pixel is recorded along with its z coordinate.
   If the new pixel is farther away, it will be skipped. Andrei Ellman
   has written a z-buffering package for Allegro, called AllegroPak.
   Pros: you don't make any polygon sorting - you just draw everything
   and the low-level routines take care that you see what should be
   visible; it's the only algorithm that directly solves penetrating
   shapes; the complexity of the problem grows linear with the number
   of polygons. Cons: the low-level routines are all more complex, and
   therefore slower; you still draw (or at least process every pixel
   individually) all the pixels of all the polygons.

3. Scan-line algorithms - along each scanline in your screen, you keep
   track of what polygons you are "in" and which is the nearest. This
   status changes only where the scanline crosses some polygon edge.
   So you have to juggle an edge list and a polygon list. And you
   have to sort the edges for each scanline (this can be countered by
   keeping the order of the previous scanline - it won't change much).
   The BIG advantage is that you write each pixel only once. If you
   have a lot of overlapping polygons you can get incredibile speeds
   compared to any of the previous algorithms. This is what I used.

4. Area subdivision - the true "divide and conquer" strategy. Too complex
   to tell you about it here. The nicest from these 4, from the
   programmer's point of view - it's juicy, it's challenging. But it
   expects efficient area drawing low-level routines. Because Allegro
   (due to video memory banking) is scan-line oriented, the performance
   expectations are too low to even try.


SCENE RENDERING IMPLEMENTATION

The scene rendering has approx the following steps:
1. Initialize the scene (set the clip area, clear the bitmap, blit a
   background, etc)
2. Call scene_start()
3. Transform all your points to camera space
4. Clip polygons
5. Project with persp_project()
6. "Draw" polygons with scene_polygon3d() and/or scene_polygon3d_f().
   This doesn't do any actual drawing, only initializes tables.
7. Render all the polygons defined in step 3 to the bitmap with
   scene_render().
8. Overlay some non-3D graphics.
9. Show the bitmap (blit it to screen, flip the page, etc).


The package uses a scanline-oriented algorithm (see above). For each
horizontal line in the viewport an x-sorted edge list is used to keep
track of what polygons are "in" and which is the nearest.
 Vertical coherency is used - the edge list for a scanline is sorted
starting from the previous one - it won't change much.
 With ver 1.2 horizontal coherency is used - a dinamic z-sorted list
is used for polygons that overlap.
 I've tried global coherency - when you determine the visibility relation
between two polygons, store it for use in the next scanlines. I gave up
because there is no speed to be gained - the current bottleneck is
somewhere else.
The scene rendering routines use the same low-level asm routines as
normal polygon3d().


Here are the functions:

int scene_start(BITMAP *bmp, int nedge, int npoly);
   Initializes a scene.
   The bitmap bmp is the bitmap you will eventually render on.

   Nedge and npoly are your estimates of how many edges and how many
   polygons you will render (you cannot get over the limit specified
   here). If you use same values in succesive calls, the space will be
   reused (no new malloc()). If you call with nedge=0 or npoly=0, the
   allocated space is freed.
   The memory allocated is a little less than 150 * (nedge + npoly) bytes.

   Returns zero on success, negative numbers if allocations fail.

int scene_polygon3d(int type, BITMAP *texture, int vc, V3D *vtx[]);
int scene_polygon3d_f(int type, BITMAP *texture, int vc, V3D_f *vtx[]);
   Puts a polygon in the rendering list. Nothing is really rendered at
   this moment. Should be called between scene_start() and scene_render().

   Arguments are the same as for polygon3d(), except the bitmap is missing.
   It will be used the one passed to scene_start().

   Unlike polygon3d(), the polygon may be concave or self-intersecting.
   Shapes that penetrate one another may look OK, but they are not
   really handled by this code.
   Note that the texture is stored as a pointer only, and you should
   keep the actual bitmap until scene_render(), where it is used.

   Since the FLAT style is implemented with the low-level hline() funtion,
   the FLAT style is subject to DRAW_MODEs. Only SOLID and TRANS draw
   modes are valid. Along with the polygon, this mode will be stored
   for the rendering moment, and also the pointer to the current color_map
   (for 8 bits) or the _blender_alpha value (passed to set_trans_blender()
   for truecolor).

   The setting of cpu_mmx global variable on entry in this routine
   affects the choice of low-level asm routine that will be used by
   scene_render() for this polygon.

   Returns zero on success, negative numbers if there is no space in
   the edge or polygon tables, positive if it won't be rendered for
   lack of rendering routine.

void scene_render();
   Renders all the specified scene_polygon3d()'s on the bitmap passed
   to scene_start(). Rendering is done one scanline at a time, with
   no pixel being processed more than once.
   Note that between scene_start() and scene_render() you shouldn't
   change the clip rectangle of the destination bitmap. For speed
   reasons, you should set the clip rectangle to the minimum.
   Note also that all the textures passed to scene_polygon3d() are
   stored as pointers only and actually used in scene_render().

extern float scene_gap;
   This number (default value = 100.0) controls the behaviour of the
   z-sorting algorithm. When and edge is very close to another's
   polygon plane, there is an interval of uncertainty inside which
   you cannot tell which object is visible (which z is is smaller).
   This is due to numerical cummulative errors for edges that
   have undergone a lot of transformations and interpolations.

   The default value means that if the 1/z values (in projected space)
   differ by only 1/100 (one percent), they are considered to be
   equal and the x-slopes of the planes are used to find out which
   plane is getting closer when we move to the right.

   Larger values means narrower margins, and increasing the chance of
   missing true adjacent edges/planes. Smaller values means larger
   margins, and increasing the chance of mistaking close polygons for
   adjacent ones. The value of 100 is close to the optimum. However,
   the optimum shifts slightly with resolution, and may be
   application-dependent. It's for you to fine-tune. Mail me if you
   have a better solution.


SCENE RENDERING PITFALLS

The plane's equation is computed from the first 3 vertices of the
polygon. Make sure that they are not colinear, or you will probably
get FPU exceptions.

Unlike polygon3d(), scene_polygon3d() requires valid z coordinates
for all vertices, regardless of rendering style (unlike polygon3d(),
which only uses z coordinate for *PTEX*).

Don't build the colors for GRGB styles with makecol24(), as this routine
builds a hardware-dependent color. GRGB requires a hardware-independent
color with the RGB in this order. Use something like:
    color = (r << 16) | (g << 8) | b
with r, g, b between 0 and 255.

All polygons passed to scene_polygon3d() have to be persp_project()'ed.

The valid DRAW_MODEs that change the behaviour of POLYTYPE_FLAT are only
SOLID and TRANS. After scene_render() the mode is reset to SOLID.


SCENE RENDERING PERFORMANCE

You can play with test.exe and see what happens with simple rendering
styles (FLAT), medium (GRGB) and very CPU-intensive (PTEX_LIT). You
will see that the slow video card, the performance loss due to
truecolor mode, the performance penalty of tough (but sometimes
unavoidable) rendering styles, are all compensated by the fact that
the algorithm doesn't process a pixel twice.
There is another source of speed: the video banks are changed only
once per scanline, not once per scanline per polygon. With VESA 1 for
example this may be important.

On the other hand, using a lot of *MASK* polygons drastically reduces
performance, because when a MASKed polygon is the first in line of
sight, the polygons underneath have to be drawn too. Same applies to
FLAT polygons drawn with DRAW_MODE_TRANS.

Example:
A Cyrix PR166 + S3 Trio renders 186 normal PTEX_LITs per second, in
32 bit mode. This means that you can hope (at the strange resolution
of 512 x 256 that test.exe uses for testing) for 4 frames per second
with 50 triangles per frame. But the scene test shows 717 triangles
per second at 50 triangles per scene (and frame). So you may even
get to 14 frames per second !
On the other hand, the same machine renders 1813 FLATs per second
in 8 bit mode. Scene rendering shows around 1500 no matter how many
triangles there are in a frame. You lose a little because of the
extra processing in sorting and testing edges, compared to straight
FLAT polygons which draw _very_ fast.

Problem:
Let's say your scene is in a room. You know from the start that there
are two polygons (two walls) which are behind everything else, and they
have to be rendered with PTEX_LIT (very plausible). What's best ?
Render the walls the normal way, and start/fill/render the scene
afterwards, or include the walls in the scene with all your furniture
and characters ?
My answer is: there is no unique solution for all hardware out there.
The program should make smart choices, or let the user make his/hers
own dumb ones.


CLIPPING ROUTINE

int clip3d_f(int type, float min_z, float max_z, int vc, V3D_f *vtx[],
             V3D_f *vout[], V3D_f *vtmp[], int outflags[]);
   Clips the polygon given in vtx. The number of vertices is vc. The
   result goes in vout. Vtmp and outflags are needed for internal purposes.
   Vtx, vout, vtmp and outflags must be present and the pointers in vtx,
   vout and vtmp must point to valid V3D_f structures. The size of vout
   vtmp and outflags should be at least vc * 2 (additional vertices may
   appear in the process of clipping).

   The frustum (viewing volume) is defined by -z < x < z, -z < y < z,
   0 < min_z < z < max_z. If max_z <= min_z the z < max_z clipping
   is not done. As you can see, clipping is done in the camera space,
   with perpective in mind. It means that this routine should be called
   after you applied the camera matrix, but before the perspective
   projection.

   The routine will correctly interpolate u, v, and c in the vertex
   structure. However, no provision is made for high/truecolor GCOL.
   Do not use this combination.

   The routine doesn't check for trivial cases. You are supposed to
   do this before calling it. You should at least check whether the
   polygon is totally on one side of the frustum and discard it, and
   whether it is totally inside the frustum and skip calling the
   clipping routine.
   Hint: the 2D renderers are able to clip polygons in the range
   -8000 < x < 8000, -8000 < y < 8000. In camera space the 8000 limit
   corresponds to 8000 / viewport_width and 8000 / viewport_height.
   This means that for example, at 640 x 480 you can skip clipping if
   the polygon is inside the volume: -12 < x < 12, -16 < y < 16,
   z > min_z.

   There is no fixed point version (using V3D 's).


CHANGES
From 1.3 to 1.4:
- Corrected a major problem with the scene rendering: I was only taking
into account the closest polygon at all times. But this meant that you
don't see the polygon in the back through the holes of a MASKed polygon
in the front. In fact the fix made the code more elegant.
From 1.2 to 1.3:
- Changed the FLAT modes to use hline().
- Added back 3DNow! routines.
- Split the distribution in more files.
- Automatically check for MMX-capable assembler.
From 1.1 to 1.2:
- Removed the "parallel" argument to scene_start(). There was no real
support for parallel projection. Some other things (too many) had to
be modified too and I quit. Only perspective is supported. Sorry for
the oscillations - sometimes I'm in open loop.
- Rearranged scene_render() for better performance. Now it takes
advantage of horizontal coherency (if the polygons don't penetrate each
other their z-order doesn't change - so overlapping polygons are kept
in a z-sorted list).
- Solved a strange bug related to z-sorting adjacent polygons. It was
due to numerical approximations (you have to leave an uncertainty zone
around zero when you compare two values that have undergone a lot of
transformations).
- Added 3d clipping.

From 1.0 to 1.1:
- Added the "parallel" argument to scene_start(). Previously the 
parallel (!) mode was used. It had no bad effects for perspective,
unless some shapes got too close to the viewing point.
- Re-packed the whole thing. Now it generates an independent library
and doesn't change any of the original Allegro sources.


TODO
- Add a new rendering style: PGRGB (GRGB with perspective - correct
interpolation). I discovered that the warping effects are very strange
when the polygon is 3D clipped (when extra vertices are inserted by
the clipping routine). I'm afraid though that it could be very sloooow.


THANKS
To DJ Delorie for DJGPP
To Shawn Hargreaves for Allegro
To the Allegroids on the mailing list for their interest


APOLOGIES
To Prezemyslav Podsiadly for (almost) totally replacing his code when
I coded this.


DOWNLOAD FROM

My page at
http://www.geocities.com/SiliconValley/Garage/9955/



Try it and tell me what you think,
Calin Andrian
<calin@ibd.dbio.ro>
<candrian@geocities.com>

1998 Nov 06, 20:54 pm
