MCLK benchmark page

DOS/SVGA benchmarks

This benchmark page is pretty sparse: just three boards, the Trident 9680PCI, S3-Trio64V+ (PCI), and an old Cirrus Logic GD-5428VLB. I have an old Diamond Stealth64 DRAM (S3-864 + SDAC) lying around that I'll get around to benchmarking.

For those of you interested, here's some background info on VGA architecture. Most DOS games (for example, Quake, Duke Nukem 3D, and Tomb Raider) use your SVGA card as a "dumb framebuffer." (The framebuffer is the RAM reserved for screen display. Except on UMA systems, framebuffer memory always resides on the display adapter.) In a game like Tomb Raider, each frame of animation is prepared by the software prior to output. Once this "snapshot" is ready, the program copies the rendered screen image from the host's main memory to the video card's framebuffer. The application uses standard CPU read/write instructions to transfer the data between application memory and video-display memory. In the phrase "dumb framebuffer," 'dumb' refers to your SVGA card's passive role; your $200 Windows accelerator looks and behaves like a memory array. Of course, the VGA memory-mapped IO region is special in that changes made to this location are reflected on your CRT.

DOS/SVGA benchmarks like VIDSPEED 4 and Scitechsoft's PROFILE measure video-performance in terms of bus-thruput, i.e. how quickly CPU write instructions can store data in the SVGA framebuffer.

Many of you have probably read about the "linear framebuffer" feature of VESA 2.0. To understand the implications of this development, think back to IBM's original PC standard. The original PC ISA architecture could address 1024k of memory. The lower 640k of this 1mb address-space was allocated to system RAM, and IBM reserved a whopping 128k region (A0000-BFFFF) for video usage. The 128k region was divided into areas for two basic modes of operation: alphanumeric (character) text and bitmap (all-points-addressable) graphics. The reserved graphics-area, which is the part relevant to gaming applications, totaled 64kbytes (A0000-AFFFF.) As you might imagine, video-adapter configurations quickly outgrew this window. Today, the typical SVGA adapter is equipped with 1mb or 2mb display RAM, 16-32X the size of the original 64k graphics area set aside for PC graphics.
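The window sizes quoted above follow directly from the address ranges. Here is a small Python sketch of that arithmetic. The helper name region_size is only for illustration, and nothing here touches real hardware.

```python
# Sizes of the PC video address windows, computed from the inclusive
# hexadecimal ranges quoted in the text.

KB = 1024


def region_size(first, last):
    """Number of bytes in the inclusive address range first..last."""
    return last - first + 1


video_window = region_size(0xA0000, 0xBFFFF)     # whole reserved video region
graphics_window = region_size(0xA0000, 0xAFFFF)  # graphics portion only

print(video_window // KB, "k reserved for video")
print(graphics_window // KB, "k reserved for graphics")

# How many 64k graphics windows fit in typical SVGA display RAM
ratios = {}
for megabytes in (1, 2):
    ratios[megabytes] = megabytes * 1024 * KB // graphics_window
    print(megabytes, "mb display RAM is", ratios[megabytes], "x the graphics window")
```

The two ratios are where the 16-32X figure comes from.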

In most game applications, graphics operations are carried out with the framebuffer paradigm. The software (game) performs all graphics calculations, manipulations, and renderings, then copies the complete frame of animation to the graphics adapter's display RAM (A0000-AFFFF.) In MCGA mode (320x200x256), one full screen of graphics represents 64000 bytes of data, a size which barely fits within the 64k A0000-AFFFF VGA-graphics memory address space. But in the case of SVGA modes, for example 640x480x256, 300kb of data obviously does not fit within the 64k VGA-graphics space. The problem is solved with bank-switching registers resident in the SVGA chipset. Bank switching allows the software application to access the entire 1mb/2mb display framebuffer, one 64k "page" of display memory at a time. The price of this arrangement is high overhead. For example, copying a full 640x480x256 screen would require no fewer than 5 bank switch operations.
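The bank count for any mode can be worked out the same way as the 640x480x256 example. The sketch below assumes a 64k bank size and tightly packed scanlines (no padding at the end of each line), which real chipsets do not always guarantee. The function names are mine. Note that "x256" in the text means 256 colors, i.e. 8 bits per pixel.

```python
import math

BANK_SIZE = 64 * 1024  # the 64k window at A0000-AFFFF


def frame_bytes(width, height, bits_per_pixel):
    """Bytes in one full screen, assuming packed scanlines."""
    return width * height * bits_per_pixel // 8


def banks_needed(width, height, bits_per_pixel):
    """Number of 64k banks a full-screen copy has to step through."""
    return math.ceil(frame_bytes(width, height, bits_per_pixel) / BANK_SIZE)


for width, height, bpp in [(320, 200, 8), (640, 480, 8),
                           (800, 600, 16), (1024, 768, 16)]:
    print(f"{width}x{height}x{bpp}: {frame_bytes(width, height, bpp)} bytes, "
          f"{banks_needed(width, height, bpp)} bank(s)")
```

For 320x200x8 this gives 64000 bytes in a single bank, and for 640x480x8 it gives 307200 bytes spread over 5 banks, matching the figures in the text.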

Modern PCI/VLB video chipsets support an alternate (superior) addressing mode, commonly known as "linear addressing mode" or "linear framebuffer mode." This mode allows the host (CPU) contiguous access to the entire SVGA framebuffer. In practice, the SVGA chipset maps its entire framebuffer (1, 2, 4, or 8mb) into an unused area somewhere in the 32-bit memory-address space. Interestingly enough, some SVGA chipsets have much faster bus-thruput in linear addressing-mode. So in addition to the elimination of the bank switching overhead, byte-for-byte data transfer is quicker. The end-result is turbocharged DOS/SVGA performance. Well, turbocharged when compared to conventional bank-switched SVGA memory access. For example, both the ATI Mach64 and S3 chipsets post much higher DOS/SVGA thruput when tested with linear-addressing enabled (VESA 2.0.) Recent games like Quake, Duke Nukem 3D, Jane's Longbow AH64, and F-22 Lightning II all support the linear framebuffer.
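To make the difference concrete, here is a toy Python simulation of the two copy styles. SimulatedCard is a made-up stand-in for a video card, not a real driver or VESA interface. It only counts how often the bank register is touched and checks that both methods leave the same bytes in display memory. It says nothing about real bus timing.

```python
BANK_SIZE = 64 * 1024  # size of the banked window


class SimulatedCard:
    """Toy model of an SVGA card's display RAM (hypothetical, for illustration)."""

    def __init__(self, vram_bytes):
        self.vram = bytearray(vram_bytes)
        self.bank = 0
        self.bank_switches = 0

    def set_bank(self, bank):
        # Stands in for programming the chipset's bank-switching register.
        self.bank = bank
        self.bank_switches += 1

    def write_window(self, offset, data):
        # Write through the 64k window; data may not run past its end.
        assert offset + len(data) <= BANK_SIZE
        start = self.bank * BANK_SIZE + offset
        self.vram[start:start + len(data)] = data

    def write_linear(self, offset, data):
        # Linear mode: the whole framebuffer is addressable in one piece.
        self.vram[offset:offset + len(data)] = data


def copy_banked(card, frame):
    """Copy a frame 64k at a time, switching banks before each chunk."""
    for bank, start in enumerate(range(0, len(frame), BANK_SIZE)):
        card.set_bank(bank)
        card.write_window(0, frame[start:start + BANK_SIZE])


def copy_linear(card, frame):
    """Copy a frame in one contiguous transfer."""
    card.write_linear(0, frame)


frame = bytes(i % 251 for i in range(640 * 480))  # one 640x480x256 frame

banked = SimulatedCard(1024 * 1024)
copy_banked(banked, frame)

linear = SimulatedCard(1024 * 1024)
copy_linear(linear, frame)

print("bank switches, banked copy:", banked.bank_switches)
print("bank switches, linear copy:", linear.bank_switches)
print("same display contents:", banked.vram == linear.vram)
```

The banked copy needs 5 bank switches for this frame and the linear copy needs none, with identical results in display memory.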

(Unfortunately, most legacy DOS games use the older VESA 1.x standard, which did not support linear-addressing. For such games, linear-framebuffer performance is irrelevant.)

Finally, performance of a given video board depends on the interface architecture (VLB/PCI/ISA), the motherboard chipset (because implementation of the bus architectures varies from chipset to chipset), the SVGA-chipset's host-interface, and the (SVGA) board design itself (no two competing video boards are identical.)

I couldn't get Internet Assistant for MS Word95 to co-operate, so I had to remove the boldfaced items. But at least the tables are now in a reasonable font size!

VIDSPEED 4.0 benchmark for Trident9680 (PCI)

The following benchmarks show the Trident 9680's DOS/SVGA performance at several different DRAM-clock (MCLK) frequencies. These tests were performed on a Microstar SI-1 486PCI motherboard (SiS496/7 chipset, 40MHz PCI clock) with an AMD 5x86-133 @ 160MHz.

MCLK

Video chipset    display mode       VIDSPEED 4 score    MCLK
                 (bits per pixel)   (32bitWRITE/READ)  Settings
----------------------------------------------------------------------
 Tr9680PCI-1mb  800x600x16 @ 75Hz   17327  4788       /0 16 7 1 63.64MHz
 Tr9680PCI-1mb  800x600x16 @ 75Hz   25222  5870       /0 17 6 1 75.17MHz
*Tr9680PCI-1mb  800x600x16 @ 75Hz   25844  6478       /0 44 7 0 76.36MHz
 Tr9680PCI-1mb  640x480x24 @ 60Hz   24528  5266       /0 16 7 1 63.64MHz
 Tr9680PCI-1mb  640x480x24 @ 60Hz   31647  6324       /0 17 6 1 75.17MHz
*Tr9680PCI-1mb  640x480x24 @ 60Hz   31767  6329       /0 44 7 0 76.36MHz
 Tr9680PCI-2mb  800x600x32 @ 75Hz   18074  5095       /0 16 7 1 63.64MHz
 Tr9680PCI-2mb  800x600x32 @ 75Hz   25646  7641       /0 17 6 1 75.17MHz
*Tr9680PCI-2mb  800x600x32 @ 75Hz   26497  7941       /0 44 7 0 76.36MHz
 Tr9680PCI-2mb  1024x768x16@ 75Hz   17996  7267       /0 16 7 1 63.64MHz
 Tr9680PCI-2mb  1024x768x16@ 75Hz   29582  9046       /0 17 6 1 75.17MHz
*Tr9680PCI-2mb  1024x768x16@ 75Hz   31234  9140       /0 44 7 0 76.36MHz
 
----------------------------------------------------------------------
*maximum stable MCLK frequency for the test board
 The Trident 9680's memory-controller executes 2-cycle EDO timing (X-2-2-2).

! Performance limit of test-motherboard.  "32000 byte/sec" seems to be 
    the PCI performance limit of the Microstar SI-1 486PCI test-motherboard.
    "50000byte/sec" seems to be the PCI limit of the generic TritonVX test-board.

The "Tr9680PCI" test board is a generic video card based on the Trident 9680PCI chipset. First installed megabyte (soldered) is NEC 424210A-60 (EDO SOJ 256x16.) At 800x600, the board was outputting its maximum refresh rates (for the respective configurations, 1mb & 2mb.) The board's power-on default is 75.17MHz (EDO.) UNIVBE does not appreciably affect the Trident 9680's VGA/SVGA performance.
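To put numbers on the MCLK effect, the sketch below takes the slowest and fastest rows for each mode from the Trident table above and prints the percentage gains. Only the table's own figures are used. The helper name percent_gain is mine.

```python
# (mode, MCLK in MHz, 32-bit write score) copied from the Trident 9680 table:
# the 63.64MHz row and the 76.36MHz row for each display mode.
rows = [
    ("800x600x16 1mb", 63.64, 17327),
    ("800x600x16 1mb", 76.36, 25844),
    ("640x480x24 1mb", 63.64, 24528),
    ("640x480x24 1mb", 76.36, 31767),
    ("800x600x32 2mb", 63.64, 18074),
    ("800x600x32 2mb", 76.36, 26497),
    ("1024x768x16 2mb", 63.64, 17996),
    ("1024x768x16 2mb", 76.36, 31234),
]


def percent_gain(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


gains = {}
for (mode, slow_clk, slow_score), (_, fast_clk, fast_score) in zip(rows[0::2], rows[1::2]):
    clock_gain = percent_gain(slow_clk, fast_clk)
    write_gain = percent_gain(slow_score, fast_score)
    gains[mode] = (clock_gain, write_gain)
    print(f"{mode}: MCLK +{clock_gain:.1f}%, write score +{write_gain:.1f}%")
```

In all four modes the write score rises by more than the roughly 20% increase in MCLK. Keep in mind that the 640x480x24 result sits close to the 32000 ceiling of this motherboard, so that row may understate the gain.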

MCLK benchmarks performed with VIDSPEED 4.0 on a 486PCI Microstar SI-1 motherboard: AMD 5x86-133 (overclocked to 160MHz), SiS496/497 B5 PCI chipset, 40MHz PCI clock, 256k WB async cache, 16mb FPM 60ns main memory.

"M13speed" benchmarks performed with VIDSPEED 4.0 on two motherboards: the 486PCI (SiS496/497) motherboard described above, and a generic "i430VX mainboard" (Triton2-VX) with an AMD K5-PR90 @ 75MHz (37.5MHz PCI clock.)

VIDSPEED 4.0 for S3 Trio64V+ (PCI)

The following benchmarks show MCLK's effect on a generic S3-765 (Trio64V+) video board. The SVGA card was equipped with 2mb (first mb = Siemens HYB514265J-50, 2nd mb = SEC KM416C256BJ-6) EDO DRAM. Tested on a Microstar SI-1 (486PCI SiS496/497 B5) motherboard AMD 5x86 @ 160MHz (40MHz PCI clock.) It looks like the 486PCI board has a relatively low performance-ceiling on the PCI-bus (32000 bytes/sec.)

Note: "1-EDO" refers to 1-cycle EDO timing (X-1-1-1)

"2-EDO" refers to 2-cycle EDO timing (X-2-2-2)

Board     display mode        VIDSPEED 4 score        MCLK
profile# (bits per pixel)    (32bitWRITE/READ)        Setting
----------------------------------------------------------------------
2A1     800x600x16 @ 75Hz     8763/1218 #20914/3867   60MHz 2-EDO 1mb
2B1     800x600x16 @ 75Hz  ! 15056/2153 #32009/7408   75MHz 2-EDO
1A1     800x600x16 @ 75Hz  ! 12145/1909 #32009/6478   50MHz 1-EDO
1B1     800x600x16 @ 75Hz  ! 18172/2567 #32009/7152   64MHz 1-EDO
2A1     800x600x16 @ 85Hz     6603/0840 #13301/2446   60MHz 2-EDO
2B1     800x600x16 @ 85Hz  ! 29170/2153 #32009/6790   75MHz 2-EDO
2C1     800x600x16 @ 85Hz  ! 15803/2021 #32009/7003   80MHz 2-EDO
1A1     800x600x16 @ 85Hz  ! 10800/1766 #32009/6228   50MHz 1-EDO
1B1     800x600x16 @ 85Hz  ! 16908/2417 #32009/6793   64MHz 1-EDO
     (above- 1mb configurations, below - 2mb configurations)
2A       320x200x8 @ 70Hz  ! 32000/4786 #32000/7360   60MHz 2-EDO 2mb
1B       320x200x8 @ 70Hz  ! 32000/4874 #32000/7606   60MHz 1-EDO 
2A      1024x768x16@ 75Hz    11100/1830 #31351/6392   60MHz 2-EDO
1A      1024x768x16@ 75Hz  ! 12406/2096 #32009/6539   50MHz 1-EDO
1B      1024x768x16@ 75Hz  ! 16673/2535 #32009/6539   60MHz 1-EDO
2A      800x600x32 @ 85Hz     7065/949  #14989/2709   60MHz 2-EDO
1A      800x600x32 @ 85Hz     9984/1842 #29221/6392   50MHz 1-EDO
2C      800x600x32 @ 85Hz  ! 14459/2147 #32009/7211   80MHz 2-EDO
1B      800x600x32 @ 85Hz  ! 13908/2321 #32000/6974   60MHz 1-EDO

  # S3SPDUP loaded, S3SPDUP seems to make a huge difference.
    UNIVBE offers a similar performance enhancement to S3 SVGA modes.

  ! Performance limit of test-motherboard.  "32000 byte/sec" seems to be 
    the PCI performance limit of the 486PCI test-motherboard.

Board profiles (MCLK commands)  (Description)
----------------------------------------------------------------------
2A1 S3-765B 1mb   /0 65 2 2 /1 2  "59.96MHz 2-cycle EDO" (S3 rated max)
2B1 S3-765B 1mb   /0 124 4 2 /1 2 "75.17MHz 2-cycle EDO" !overclocked
2C1 S3-765B 1mb   /0 87 2 2 /1 2  "79.64MHz 2-cycle EDO" !overclocked
1A1 S3-765B 1mb   /0 54 2 2 /1 0  "50.11MHz 1-cycle EDO" (S3 rated max)
1B1 S3-765B 1mb   /0 70 2 2 /1 0  "64.43MHz 1-cycle EDO" !overclocked
     (above- 1mb configurations, below - 2mb configurations)
2A2 S3-765B 2mb   /0 65 2 2 /1 2  "59.96MHz 2-cycle EDO" (S3 rated max)
2B2 S3-765B 2mb   /0 124 4 2 /1 2 "75.17MHz 2-cycle EDO" !overclocked
2C2 S3-765B 2mb   /0 87 2 2 /1 2  "79.64MHz 2-cycle EDO" !overclocked
1A2 S3-765B 2mb   /0 54 2 2 /1 0  "50.11MHz 1-cycle EDO" (S3 rated max)
1B2 S3-765B 2mb   /0 65 2 2 /1 0  "59.96MHz 1-cycle EDO" !overclocked
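The frequencies in the descriptions appear to follow from the three numbers after /0. The sketch below is my own inference from the table, not something the MCLK documentation is quoted as saying here. It assumes the three numbers are the clock synthesizer's M, N and R values, that the reference crystal is 14.31818MHz, and that the frequency is ref x (M+2) / ((N+2) x 2^R). Under those assumptions every distinct profile reproduces its quoted frequency. The /1 argument is left alone.

```python
# Assumption: reference crystal frequency in MHz (not stated in the table).
REF_MHZ = 14.31818


def s3_mclk_mhz(m, n, r):
    """Hypothetical helper: ref * (M+2) / ((N+2) * 2**R), all values assumed."""
    return REF_MHZ * (m + 2) / ((n + 2) * 2 ** r)


# (/0 arguments, frequency quoted in the profile table)
profiles = [
    ((65, 2, 2), 59.96),
    ((124, 4, 2), 75.17),
    ((87, 2, 2), 79.64),
    ((54, 2, 2), 50.11),
    ((70, 2, 2), 64.43),
]

for (m, n, r), quoted in profiles:
    computed = s3_mclk_mhz(m, n, r)
    print(f"/0 {m} {n} {r}: computed {computed:.2f}MHz, table says {quoted:.2f}MHz")
```

If the assumed formula is wrong for other parameter combinations, the agreement with these five rows would be a coincidence, so treat it as a cross-check of the table rather than a programming guide.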

*peak PCI performance of test motherboard (486PCI SiS496/497, 40MHz PCI)

VIDSPEED 4.0 benchmarks on TritonVX

The following benchmarks show the same Trio64V+ board, this time run on a different motherboard. Tested on a generic "i430VX Mainboard" Triton 430VX (Triton2-VX) chipset motherboard, AMD K5-PR90 @ 75MHz (37.5MHz PCI clock.) It looks like my generic Triton 430VX motherboard maxes out at 52000 bytes/sec. To compare my setup with "standard" Pentium-class PCI systems (i.e. 66MHz bus-speed), I ran a single VIDSPEED benchmark at PCI=33MHz: 320x200x8 (Trio64V+, no s3spdup) = 42000 bytes/sec (write.) My friend's iPent-100 (TritonFX motherboard) and Stealth 2xx1 Video (Trio64V+ 2mb EDO) perform nearly identically at the same resolution.

Trio64V+ (2mb EDO)

display mode         VIDSPEED 4 score        MCLK
(bits per pixel)    (32bitWRITE/READ)        Setting
----------------------------------------------------------------------
320x200x8  @ 70Hz  ! 47352/4457 #50103/6519   60.11MHz 2-EDO 2mb
1024x768x16@ 75Hz    11093/1802 #35611/6291   60.11MHz 2-EDO 2mb
1024x768x16@ 75Hz  ! 19571/2606 #50218/6818   79.64MHz 2-EDO
1024x768x16@ 75Hz    12406/2055 #44491/6048   50.11MHz 1-EDO
1024x768x16@ 75Hz  ! 15530/2339 #50281/6303   57.27MHz 1-EDO
800x600x32 @ 85Hz     7065/ 913 #19873/2474   60.11MHz 2-EDO 2mb
800x600x32 @ 85Hz     9978/1778 #34726/6094   50.11MHz 1-EDO
800x600x32 @ 85Hz    14661/2109 #39373/7000   79.64MHz 2-EDO
800x600x32 @ 85Hz    12784/2103 #41932/6404   57.27MHz 1-EDO

  # S3SPDUP loaded, S3SPDUP seems to make a huge difference.
    UNIVBE offers a similar performance enhancement to S3 SVGA modes.

  ! Performance limit of test-motherboard.  "50000 byte/sec" seems to be 
    the PCI performance limit of the TritonVX test-motherboard.

Trident 9680PCI (2mb EDO)

 display mode       VIDSPEED 4 score    MCLK
(bits per pixel)   (32bitWRITE/READ)  Settings
 ----------------------------------------------------------------------
*800x600x32 @ 75Hz   19779  3614       /0 16 7 0 63.64MHz EDO
 800x600x32 @ 75Hz   28048  5596       /0 17 6 1 75.17MHz EDO
*800x600x32 @ 75Hz   28721  5748       /0 44 7 0 76.36MHz EDO
 1024x768x16@ 75Hz   18192  5324       /0 16 7 1 63.64MHz EDO
 1024x768x16@ 75Hz   28095  6771       /0 17 6 1 75.17MHz EDO
*1024x768x16@ 75Hz   29478  6741       /0 44 7 0 76.36MHz EDO

Trident 9680PCI with "m13speed"

Mb/video         display mode       VIDSPEED 4 scores
                 (bits per pixel)   (32bitW/R)   
----------------------------------------------------------------------
SiS496/Tr9680    320x200 MCGA       14411/4178  
SiS496/Tr9680    320x200 "M13SPEED" 32000/9430
i430VX/Tr9680    320x200 MCGA       15031/5263
i430VX/Tr9680    320x200 "M13SPEED" 50103/8701 

! Performance limit of test-motherboard.  "50000 byte/sec" seems to be 
    the PCI performance limit of the TritonVX test-motherboard.
    "32000 byte/sec" seems to be PCI limit of SiS496 test-motherboard.

VIDSPEED 4.0 for Cirrus Logic GD-5428 (VLB)

The following benchmarks show the impact of the Cirrus's power-on default wait-states. Using MCLK, I trimmed the wait-states down to the minimum allowed. I should mention that the GD-5428 (like all GD-542x members) has a 16-bit host-interface. 32-bit host CPU read/writes are transparently handled as two consecutive 16-bit operations.

Video chipset    display mode        VIDSPEED 4 score    MCLK
                 (bits per pixel)    (32bitWRITE/READ) Settings
----------------------------------------------------------------------
 CL GD-5428      640x480x8 @ 72Hz     11050   5518     /0 28 /2 3 /3 3
 CL GD-5428      640x480x8 @ 72Hz     13050   6648     /0 28 /2 0 /3 0
*CL GD-5428      640x480x8 @ 72Hz     13050   6648     /0 32 /2 0 /3 0
 CL GD-5428      800x600x16@ 60Hz     ????    ????     /0 28 /2 0 /3 0
*CL GD-5428      800x600x16@ 60Hz     ????    ????     /0 32 /2 0 /3 0
 *maximum stable MCLK frequency for the test board (57MHz)

"????" The test-board GD-5428 is, uh, in a transitive state of location. As soon as my friend returns it, I'll rerun the Cirrus benchmarks.

The test board is a Cirrus GD-5428 VLB with 1mb 70ns FPM DRAM. Power-on defaults include a 50.11MHz DRAM-clock (mclk /0 28), a 1T IO RDY delay, and a 5T write-mem delay. The test system is an AMD 486SX2-66 on a SiS461 VLB motherboard. A SiS471 VLB board should perform slightly faster. Of the chipsets that MCLK supports, the Cirrus GD-5426/8/9 are the only ones with adjustable bus delay cycles. Since the bus-delay cycles affect write transactions, removing them directly improves DOS/VGA performance.
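The two Cirrus settings used on this page (/0 28 and /0 32) are consistent with a simple relation: MCLK = 14.31818MHz x value / 8. That relation is my assumption, inferred from the two data points above (50.11MHz and roughly 57MHz). The helper name is hypothetical.

```python
# Assumption: 14.31818MHz reference, MCLK = reference * value / 8.
REF_MHZ = 14.31818


def cirrus_mclk_mhz(value):
    """Hypothetical helper mapping the /0 argument to a DRAM clock in MHz."""
    return REF_MHZ * value / 8


for value in (28, 32):
    print(f"mclk /0 {value}: about {cirrus_mclk_mhz(value):.2f}MHz")
```

With these assumptions, /0 28 comes out at the 50.11MHz power-on default and /0 32 at about 57.27MHz, in line with the 57MHz maximum noted under the table.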

Reference material for various benchmark programs may be found at http://www.sysopt.com or Tom's Hardware Homepage.

last updated 02/27/97
