This benchmark page is pretty sparse: just three boards, a Trident 9680PCI, an S3-Trio64V+ (PCI), and an old Cirrus Logic GD-5428VLB. I have an old Diamond Stealth64 DRAM (S3-864 + SDAC) lying around that I'll get around to benchmarking.
For those of you interested, here's some background info on VGA architecture. Most DOS games (for example, Quake, Duke Nukem 3D, and Tomb Raider) use your SVGA card as a "dumb framebuffer." (The framebuffer is the RAM reserved for screen display. Except for UMA systems, framebuffer memory always resides on the display adapter.) In a game like Tomb Raider, a frame of animation is prepared by the software prior to output. Once this "snapshot" is ready, the program copies the rendered screen image from the host's main memory to the video card's framebuffer. The application uses standard CPU read/write instructions to move the data between application memory and video-display memory. In the phrase "dumb framebuffer," 'dumb' refers to your SVGA card's passive role; your $200 Windows accelerator looks and behaves like a memory array. Of course, the VGA memory-mapped IO region is special in that changes made to this location are reflected on your CRT.
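To make the "dumb framebuffer" idea concrete, here is a minimal Python sketch of the render-then-copy cycle. It is only a model: `video_memory` is an ordinary bytearray standing in for the card's display RAM, and `render_frame` and `blit` are hypothetical names, not anything a real game or driver exposes.

```python
WIDTH, HEIGHT = 320, 200  # MCGA-style mode, one byte per pixel


def render_frame(tick):
    """Build one complete frame in ordinary main memory (the "snapshot")."""
    frame = bytearray(WIDTH * HEIGHT)
    for y in range(HEIGHT):
        row = y * WIDTH
        for x in range(WIDTH):
            # Arbitrary test pattern; a real game would draw its scene here.
            frame[row + x] = (x + y + tick) & 0xFF
    return frame


def blit(frame, video_memory):
    """Copy the finished frame into display memory with plain memory writes."""
    video_memory[:len(frame)] = frame


# Stand-in for the 64k graphics window; the card itself does no drawing work.
video_memory = bytearray(64 * 1024)
frame = render_frame(tick=1)
blit(frame, video_memory)
```

The card contributes nothing to the rendering in this model, which is why these benchmarks boil down to how fast the copy step can run.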
DOS/SVGA benchmarks like VIDSPEED4 and Scitechsoft's PROFILE measure video-performance in terms of bus-thruput, i.e. how quickly CPU write instructions can store data in the SVGA framebuffer.
Many of you have probably read about the "linear framebuffer" feature of VESA 2.0. To understand the implications of this development, think back to IBM's original PC standard. The original PC ISA architecture could address 1024k of memory. IBM reserved a whopping 128k region (A0000-BFFFF) of this 1mb address-space (the lower 640k was allocated to system RAM) for video usage. The 128k region was divided into two regions for two basic modes of operation: alphanumeric (character) text and bitmap (all-points-addressable) graphics. The reserved graphics-area, which is relevant to gaming applications, totaled 64kbytes (A0000-AFFFF.) As you might imagine, video-adapter configurations quickly outgrew this 128k window. Today, the typical SVGA adapter is equipped with 1mb or 2mb display RAM, roughly 16-32X the size of the original 64k graphics area set aside for PC graphics.
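The address arithmetic above is easy to check. A small Python sketch, using only the address ranges quoted in the text (the helper name `window_size` is made up for illustration):

```python
KB = 1024


def window_size(first, last):
    """Size in bytes of an inclusive address range."""
    return last - first + 1


video_region = window_size(0xA0000, 0xBFFFF)     # whole reserved video area
graphics_window = window_size(0xA0000, 0xAFFFF)  # graphics portion only

print(video_region // KB, "k reserved for video")
print(graphics_window // KB, "k graphics window")

# How many 64k windows fit in a typical SVGA card's display RAM?
for megabytes in (1, 2):
    ratio = (megabytes * 1024 * KB) // graphics_window
    print(f"{megabytes}mb card = {ratio}x the graphics window")
```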
In most game applications, graphics operations are carried out with the framebuffer paradigm. The software (game) performs all graphics calculations, manipulations, and renderings, then copies the complete frame of animation to the graphics adapter's display RAM (A0000-AFFFF.) In MCGA mode, one full screen of graphics represents 64kb of data, a size which barely fits within the A0000-AFFFF VGA-graphics memory address space. But in the case of SVGA modes, for example 640x480x256, 300kb of data obviously does not fit within the 64k VGA-graphics space. This problem is solved with bank-switching registers resident in the SVGA chipset. Bank switching allows the software application to access the entire 1mb/2mb display framebuffer, one "page" of display memory at a time. The price of this arrangement is high overhead. For example, copying a full 640x480x256 screen would require no fewer than 5 bank switch operations.
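Here is a rough Python model of a bank-switched copy, to show where the 5 bank switches come from. The assumptions are mine: a 64k bank size that matches the window, and a bank register that is set once per 64k chunk. `BankedCard` and its methods are hypothetical; a real program would poke chipset registers or call the VESA BIOS instead.

```python
BANK_SIZE = 64 * 1024  # assumed: one bank fills the whole 64k window


class BankedCard:
    """Toy model of an SVGA card seen through a single 64k window."""

    def __init__(self, vram_bytes):
        self.vram = bytearray(vram_bytes)
        self.bank = 0
        self.bank_switches = 0

    def set_bank(self, bank):
        # On real hardware this is the slow part (register write or BIOS call).
        self.bank = bank
        self.bank_switches += 1

    def write_window(self, offset, data):
        # The CPU can only touch the 64k slice selected by the bank register.
        assert offset + len(data) <= BANK_SIZE
        base = self.bank * BANK_SIZE + offset
        self.vram[base:base + len(data)] = data


def copy_frame_banked(card, frame):
    """Copy a frame one 64k chunk at a time, switching banks between chunks."""
    for bank, start in enumerate(range(0, len(frame), BANK_SIZE)):
        card.set_bank(bank)
        card.write_window(0, frame[start:start + BANK_SIZE])


frame = bytes(i & 0xFF for i in range(640 * 480))  # 640x480x256 = 307200 bytes
card = BankedCard(1024 * 1024)                     # 1mb of display RAM
copy_frame_banked(card, frame)
print(card.bank_switches, "bank switches for one frame")
```

307200 bytes spread over 64k banks comes to four full banks plus a partial fifth, hence five switches per frame in this model.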
Modern PCI/VLB video chipsets support an alternate (superior) addressing mode, commonly known as "linear addressing mode" or "linear framebuffer mode." This mode gives the host (CPU) contiguous access to the entire SVGA framebuffer. In practice, the SVGA chipset maps its entire framebuffer (1, 2, 4, or 8mb) into an unused area somewhere in the 32-bit memory-address space. Interestingly enough, some SVGA chipsets have much faster bus-thruput in linear addressing-mode. So in addition to the elimination of the bank switching overhead, byte-for-byte data transfer is quicker. The end-result is turbocharged DOS/SVGA performance. Well, turbocharged when compared to conventional bank-switched SVGA memory access. For example, both the ATI Mach64 and S3 chipsets post much higher DOS/SVGA thruput when tested with linear-addressing enabled (VESA 2.0.) Recent games like Quake, Duke Nukem 3D, Jane's Longbow AH64, and F-22 Lightning II all support linear-framebuffer mode.
(Unfortunately, most legacy DOS games use the older VESA 1.x standard, which did not support linear-addressing. For such games, linear-framebuffer performance is irrelevant.)
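The difference between the two addressing schemes can be sketched in a few lines of Python. In banked mode, every pixel offset has to be split into a bank number and an offset inside the 64k window; in linear mode the same offset is simply added to wherever the chipset mapped its framebuffer. The 64k bank size and the `LFB_BASE` address below are illustrative assumptions, not values from any particular card.

```python
BANK_SIZE = 64 * 1024   # assumed bank size for the banked case
LFB_BASE = 0xE0000000   # hypothetical address where the card maps its RAM
A0000_WINDOW = 0xA0000  # the classic VGA graphics window


def pixel_offset(x, y, pitch=640):
    """Byte offset of a pixel in a one-byte-per-pixel mode (640 wide here)."""
    return y * pitch + x


def banked_address(offset):
    """Banked mode: (bank to select, CPU address inside the 64k window)."""
    bank, within = divmod(offset, BANK_SIZE)
    return bank, A0000_WINDOW + within


def linear_address(offset):
    """Linear mode: one contiguous mapping, no bank register involved."""
    return LFB_BASE + offset


middle = pixel_offset(0, 240)   # first pixel of the middle scanline
last = pixel_offset(639, 479)   # last pixel on screen
print(banked_address(middle), hex(linear_address(middle)))
print(banked_address(last), hex(linear_address(last)))
```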
Finally, performance of a given video board depends on the
interface architecture (VLB/PCI/ISA), the motherboard chipset
(because implementation of the bus architectures varies from chipset
to chipset), the SVGA-chipset's host-interface, and the (SVGA)
board design itself (no two competing video boards are identical.)
I couldn't get Internet Assistant for MS Word95 to co-operate, so I had to remove the boldfaced items. But at least the tables are now in a reasonable font size!
The following benchmarks show the Trident 9680's DOS/SVGA performance at several different DRAM-clock frequencies. These tests were performed on a Microstar SI-1 486PCI motherboard (SiS496/7 chipset, 40MHz PCI clock) with an AMD 5x86-133 @ 160MHz.
Video chipset   display mode        VIDSPEED 4 score    MCLK
                (bits per pixel)    (32bitWRITE/READ)   Settings
----------------------------------------------------------------------
 Tr9680PCI-1mb  800x600x16 @ 75Hz   17327  4788   /0 16 7 1  63.64MHz
 Tr9680PCI-1mb  800x600x16 @ 75Hz   25222  5870   /0 17 6 1  75.17MHz
*Tr9680PCI-1mb  800x600x16 @ 75Hz   25844  6478   /0 44 7 0  76.36MHz
 Tr9680PCI-1mb  640x480x24 @ 60Hz   24528  5266   /0 16 7 1  63.64MHz
 Tr9680PCI-1mb  640x480x24 @ 60Hz   31647  6324   /0 17 6 1  75.17MHz
*Tr9680PCI-1mb  640x480x24 @ 60Hz   31767  6329   /0 44 7 0  76.36MHz
 Tr9680PCI-2mb  800x600x32 @ 75Hz   18074  5095   /0 16 7 1  63.64MHz
 Tr9680PCI-2mb  800x600x32 @ 75Hz   25646  7641   /0 17 6 1  75.17MHz
*Tr9680PCI-2mb  800x600x32 @ 75Hz   26497  7941   /0 44 7 0  76.36MHz
 Tr9680PCI-2mb  1024x768x16@ 75Hz   17996  7267   /0 16 7 1  63.64MHz
 Tr9680PCI-2mb  1024x768x16@ 75Hz   29582  9046   /0 17 6 1  75.17MHz
*Tr9680PCI-2mb  1024x768x16@ 75Hz   31234  9140   /0 44 7 0  76.36MHz
----------------------------------------------------------------------
* maximum stable MCLK frequency for the test board
The Trident 9680's memory-controller executes 2-cycle EDO timing (X-2-2-2).
! Performance limit of test-motherboard. "32000 byte/sec" seems to be
  the PCI performance limit of the Microstar SI-1 486PCI test-motherboard.
  "50000 byte/sec" seems to be the PCI limit of the generic TritonVX
  test-board.
The "Tr9680PCI" test board is a generic video card based on the Trident 9680PCI chipset. First installed megabyte (soldered) is NEC 424210A-60 (EDO SOJ 256x16.) At 800x600, the board was outputting its maximum refresh rates (for the respective configurations, 1mb & 2mb.) The board's power-on default is 75.17MHz (EDO.) UNIVBE does not appreciably affect the Trident 9680's VGA/SVGA performance.
MCLK benchmarks performed with VIDSPEED 4.0 on a 486PCI Microstar SI-1 motherboard: AMD 5x86-133 (overclocked to 160MHz), SiS496/497 B5 PCI chipset, 40MHz PCI clock, 256k WB async cache, 16mb FPM 60ns main memory.
"M13speed" benchmarks performed with VIDSPEED 4.0 on two motherboards: the 486PCI (SiS496/497) motherboard described above, and a generic "i430VX mainboard" (Triton2-VX) with an AMD K5-PR90 @ 75MHz (37.5MHz PCI clock.)
The following benchmarks show MCLK's effect on a generic S3-765
(Trio64V+) video board. The SVGA card was equipped with 2mb of EDO
DRAM (first mb = Siemens HYB514265J-50, 2nd mb = SEC KM416C256BJ-6.)
Tested on a Microstar SI-1 (486PCI SiS496/497 B5) motherboard with
an AMD 5x86 @ 160MHz (40MHz PCI clock.) It looks like the 486PCI
board has a relatively low performance-ceiling on the PCI-bus
(32000 bytes/sec.)
Note: "1-EDO" refers to 1-cycle EDO timing (X-1-1-1)
      "2-EDO" refers to 2-cycle EDO timing (X-2-2-2)
Board     display mode         VIDSPEED 4 score           MCLK
profile#  (bits per pixel)     (32bitWRITE/READ)          Setting
----------------------------------------------------------------------
2A1       800x600x16 @ 75Hz     8763/1218  #20914/3867    60MHz 2-EDO 1mb
2B1       800x600x16 @ 75Hz  ! 15056/2153  #32009/7408    75MHz 2-EDO
1A1       800x600x16 @ 75Hz  ! 12145/1909  #32009/6478    50MHz 1-EDO
1B1       800x600x16 @ 75Hz  ! 18172/2567  #32009/7152    64MHz 1-EDO
2A1       800x600x16 @ 85Hz     6603/0840  #13301/2446    60MHz 2-EDO
2B1       800x600x16 @ 85Hz  ! 29170/2153  #32009/6790    75MHz 2-EDO
2C1       800x600x16 @ 85Hz  ! 15803/2021  #32009/7003    80MHz 2-EDO
1A1       800x600x16 @ 85Hz  ! 10800/1766  #32009/6228    50MHz 1-EDO
1B1       800x600x16 @ 85Hz  ! 16908/2417  #32009/6793    64MHz 1-EDO
(above - 1mb configurations, below - 2mb configurations)
2A        320x200x8  @ 70Hz  ! 32000/4786  #32000/7360    60MHz 2-EDO 2mb
1B        320x200x8  @ 70Hz  ! 32000/4874  #32000/7606    60MHz 1-EDO
2A        1024x768x16@ 75Hz    11100/1830  #31351/6392    60MHz 2-EDO
1A        1024x768x16@ 75Hz  ! 12406/2096  #32009/6539    50MHz 1-EDO
1B        1024x768x16@ 75Hz  ! 16673/2535  #32009/6539    60MHz 1-EDO
2A        800x600x32 @ 85Hz     7065/949   #14989/2709    60MHz 2-EDO
1A        800x600x32 @ 85Hz     9984/1842  #29221/6392    50MHz 1-EDO
2C        800x600x32 @ 85Hz  ! 14459/2147  #32009/7211    80MHz 2-EDO
1B        800x600x32 @ 85Hz  ! 13908/2321  #32000/6974    60MHz 1-EDO
----------------------------------------------------------------------
# S3SPDUP loaded. S3SPDUP seems to make a huge difference. UNIVBE
  offers a similar performance enhancement to S3 SVGA modes.
! Performance limit of test-motherboard. "32000 byte/sec" seems to be
  the PCI performance limit of the 486PCI test-motherboard.
Board profiles       (MCLK commands)      (Description)
----------------------------------------------------------------------
2A1  S3-765B 1mb   /0 65 2 2 /1 2    "59.96MHz 2-cycle EDO" (S3 rated max)
2B1  S3-765B 1mb   /0 124 4 2 /1 2   "75.17MHz 2-cycle EDO" !overclocked
2C1  S3-765B 1mb   /0 87 2 2 /1 2    "79.64MHz 2-cycle EDO" !overclocked
1A1  S3-765B 1mb   /0 54 2 2 /1 0    "50.11MHz 1-cycle EDO" (S3 rated max)
1B1  S3-765B 1mb   /0 70 2 2 /1 0    "64.43MHz 1-cycle EDO" !overclocked
(above - 1mb configurations, below - 2mb configurations)
2A2  S3-765B 2mb   /0 65 2 2 /1 2    "59.96MHz 2-cycle EDO" (S3 rated max)
2B2  S3-765B 2mb   /0 124 4 2 /1 2   "75.17MHz 2-cycle EDO" !overclocked
2C2  S3-765B 2mb   /0 87 2 2 /1 2    "79.64MHz 2-cycle EDO" !overclocked
1A2  S3-765B 2mb   /0 54 2 2 /1 0    "50.11MHz 1-cycle EDO" (S3 rated max)
1B2  S3-765B 2mb   /0 65 2 2 /1 0    "59.96MHz 1-cycle EDO" !overclocked
----------------------------------------------------------------------
* peak PCI performance of test motherboard (486PCI SiS496/497, 40MHz PCI)
The following benchmarks show the results for the same Trio64V+,
this time run on a different motherboard. Tested on a generic "i430VX
Mainboard" Triton 430VX (Triton2-VX) chipset motherboard with an
AMD K5-PR90 @ 75MHz (37.5MHz PCI clock.) It looks like my generic
Triton 430VX motherboard maxes out at 52000bytes/sec. To compare
my setup with "standard" Pentium-class PCI systems (i.e.
66MHz bus-speed), I ran a single VIDSPEED benchmark at PCI=33MHz:
320x200x8 (Trio64V+, no s3spdup) = 42000 Write bytes/sec. My
friend's iPent-100 (TritonFX motherboard) and Stealth 2xx1 Video
(Trio64V+ 2mb EDO) perform nearly identically at the same resolution.
display mode         VIDSPEED 4 score           MCLK
(bits per pixel)     (32bitWRITE/READ)          Setting
----------------------------------------------------------------------
320x200x8  @ 70Hz  ! 47352/4457  #50103/6519    60.11MHz 2-EDO 2mb
1024x768x16@ 75Hz    11093/1802  #35611/6291    60.11MHz 2-EDO 2mb
1024x768x16@ 75Hz  ! 19571/2606  #50218/6818    79.64MHz 2-EDO
1024x768x16@ 75Hz    12406/2055  #44491/6048    50.11MHz 1-EDO
1024x768x16@ 75Hz  ! 15530/2339  #50281/6303    57.27MHz 1-EDO
800x600x32 @ 85Hz     7065/ 913  #19873/2474    60.11MHz 2-EDO 2mb
800x600x32 @ 85Hz     9978/1778  #34726/6094    50.11MHz 1-EDO
800x600x32 @ 85Hz    14661/2109  #39373/7000    79.64MHz 2-EDO
800x600x32 @ 85Hz    12784/2103  #41932/6404    57.27MHz 1-EDO
----------------------------------------------------------------------
# S3SPDUP loaded. S3SPDUP seems to make a huge difference. UNIVBE
  offers a similar performance enhancement to S3 SVGA modes.
! Performance limit of test-motherboard. "50000 byte/sec" seems to be
  the PCI performance limit of the TritonVX test-motherboard.
display mode        VIDSPEED 4 score    MCLK
(bits per pixel)    (32bitWRITE/READ)   Settings
----------------------------------------------------------------------
*800x600x32 @ 75Hz   19779  3614   /0 16 7 0  63.64MHz EDO
 800x600x32 @ 75Hz   28048  5596   /0 17 6 1  75.17MHz EDO
*800x600x32 @ 75Hz   28721  5748   /0 44 7 0  76.36MHz EDO
 1024x768x16@ 75Hz   18192  5324   /0 16 7 1  63.64MHz EDO
 1024x768x16@ 75Hz   28095  6771   /0 17 6 1  75.17MHz EDO
*1024x768x16@ 75Hz   29478  6741   /0 44 7 0  76.36MHz EDO
Mb/video        display mode           VIDSPEED 4 scores
                (bits per pixel)       (32bitW/R)
----------------------------------------------------------------------
SiS496/Tr9680   320x200 MCGA           14411/4178
SiS496/Tr9680   320x200 "M13SPEED"     32000/9430
i430VX/Tr9680   320x200 MCGA           15031/5263
i430VX/Tr9680   320x200 "M13SPEED"     50103/8701
----------------------------------------------------------------------
! Performance limit of test-motherboard. "50000 byte/sec" seems to be
  the PCI performance limit of the TritonVX test-motherboard.
  "32000 byte/sec" seems to be the PCI limit of the SiS496
  test-motherboard.
The following benchmarks show the impact of the Cirrus's power-on default wait-states. Using MCLK, I trimmed the wait-states down to the minimum allowed. I should mention that the GD-5428 (like all GD-542x members) has a 16-bit host-interface. 32-bit host CPU read/writes are transparently handled as two consecutive 16-bit operations.
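As a rough illustration of what "two consecutive 16-bit operations" means, here is a Python sketch that splits one 32-bit write into the two 16-bit bus cycles a 16-bit host interface would see. The function name is made up, and the ordering (low word first, at the lower address) is the usual x86 little-endian layout rather than something taken from Cirrus documentation.

```python
def split_dword_write(address, value):
    """Break one 32-bit write into two 16-bit writes (little-endian order)."""
    low_word = value & 0xFFFF
    high_word = (value >> 16) & 0xFFFF
    # Low word lands at the original address, high word two bytes later.
    return [(address, low_word), (address + 2, high_word)]


for addr, word in split_dword_write(0xA0000, 0x11223344):
    print(f"16-bit write of {word:04X}h to {addr:05X}h")
```

Each 32-bit store therefore costs two trips across the card's host interface, and any per-cycle wait-state is paid twice.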
Video chipset   display mode        VIDSPEED 4 score    MCLK
                (bits per pixel)    (32bitWRITE/READ)   Settings
----------------------------------------------------------------------
 CL GD-5428     640x480x8 @ 72Hz    11050  5518    /0 28 /2 3 /3 3
 CL GD-5428     640x480x8 @ 72Hz    13050  6648    /0 28 /2 0 /3 0
*CL GD-5428     640x480x8 @ 72Hz    13050  6648    /0 32 /2 0 /3 0
 CL GD-5428     800x600x16@ 60Hz    ????   ????    /0 28 /2 0 /3 0
*CL GD-5428     800x600x16@ 60Hz    ????   ????    /0 32 /2 0 /3 0
----------------------------------------------------------------------
* maximum stable MCLK frequency for the test board (57MHz)
"????" The test-board GD-5428 is, uh, in a transitive state of location. As soon as my friend returns it, I'll rerun the Cirrus benchmarks.
The test board is a Cirrus GD-5428 VLB, with 1mb 70ns FPM DRAM.
Power-on defaults include 50.11MHz DRAM-clock (mclk /0 28), and
1T IO RDY delay & 5T write-mem delay. Test system is AMD 486SX2-66
and SiS461 VLB motherboard. SiS471 VLB should perform slightly
faster. Of the chipsets that MCLK supports, the Cirrus GD-5426/8/9
are the only ones with adjustable bus delay cycles. Since the
bus-delay cycles affect write transactions, removing them directly
improves DOS/VGA performance.
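To put a number on that improvement, here is a quick Python calculation using the two 640x480x8 rows from the Cirrus table that share the same MCLK setting (/0 28), with and without the power-on wait-states. The helper `percent_gain` is just for illustration.

```python
def percent_gain(before, after):
    """Relative improvement of a score, in percent."""
    return (after - before) / before * 100.0


# 640x480x8 @ 72Hz, MCLK /0 28: default wait-states vs. minimum wait-states
write_gain = percent_gain(11050, 13050)
read_gain = percent_gain(5518, 6648)

print(f"write: +{write_gain:.1f}%   read: +{read_gain:.1f}%")
```

That works out to roughly 18% on writes and 20% on reads, from wait-state changes alone.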