'Wavey' Dave Baumann
2 December 2002
Frankly, to many Matrox have been looking sickly for some time. Their last major release, G550, did nothing to excite the hardware enthusiast community, with its 'Headcasting' technology almost looked upon with derision. Matrox's much talked of G800, which was touted as being "twice as fast as G400", never materialised, seeing as many other chips had already far exceeded twice the performance of G400. During this period Matrox retreated into their known and trusted workstation/professional market, yet with the big players ever encroaching on all the available graphics markets out there Matrox needed something else.
And so it was that in May 2002 Matrox announced the Parhelia-512 GPU, an all new chip designed to interest gamer and professional alike. However, as soon as the benchmarks hit the web things did not seem all that fine again, as the new Matrox entrant had trouble shining in terms of performance against NVIDIA's GeForce4 Ti, let alone what was in store from ATI a few more months down the line. Was this performance down to immaturity of drivers, or was the hardware too slow? In this review we'll take a look at some early drivers and see how things have evolved with their latest driver updates. We'll also see what else Parhelia has to offer that other boards don't...
Note: The 128MB board has 8 BGA memory chips on the front side of the board to facilitate the 128MB 256bit bus. The back side of the board clearly shows the slots where extra BGA chips can be placed directly behind those on the front side of the board to increase the amount of RAM. Indeed Matrox now have a 256MB version of Parhelia available.
Here's a quick rundown of the key chip specifications:
220MHz is relatively slow in comparison to many other boards out there, though at the time of release it was generally thought by many that, as Parhelia was the biggest chip of the time, the 0.15µ process wouldn't allow for much more. This was proved to be incorrect some months later by ATI's behemoth R300 chip. However, with a 256-bit bus, as also used by ATI with Radeon 9700 and 3Dlabs with Wildcat VP, there were high hopes that this would be the answer to what was often thought of as the biggest issue with contemporary 3D graphics: memory bandwidth.
Let's have a little look at some of the features that Parhelia has to offer.
Displacement Mapping - Now included as part of the DirectX9 feature set, and developed by Matrox, Parhelia features hardware displacement mapping. This feature uses a texture (Displacement Map) to generate height values that can be applied to tessellated geometry to generate much more complicated scenes or characters. The geometry sent to the graphics board is very simple and along with the displacement map the high detail output is internally generated by the graphics processor, meaning that CPU/system usage is kept to a minimum whilst generating potentially much more complicated output.
Parhelia also features depth adaptive tessellation so that the further the displacement mapped object is into the viewport the lower the tessellation, hence detail, will be. This ensures that the highest level of detail is only applied to the important areas and time is not wasted unnecessarily rendering where it's not needed.
Beyond3D will take a closer look at Displacement Mapping at a later date.
Vertex Shaders - With Parhelia, Matrox have opted to concentrate quite heavily on the geometry end of the pipeline. Parhelia implements a total of four Vertex Shaders, to DX9 Vertex Shader 2.0 specification.
The Vertex Shader array has a 512 instruction cache and 256 constant registers.
Quad Texturing / 64 Sample Texture Filtering - Parhelia offers four texture sample units on each of the four texture pipelines, giving a total of 16 texture sampling units. Matrox also describe this as "64 Super Sample Texture Filtering", though it appears that the 64 figure is in relation to the entire pixel array, meaning each texture unit is capable of four sample filtering.
Pixel Shaders - Unlike the Vertex Shaders, Matrox opted to go for a DX8 compliant Pixel (fragment) Shader pipeline. Presumably, along with 3Dlabs, Matrox felt there wasn't enough room to manoeuvre with 0.15µ and that floating point pixel pipelines weren't feasible just yet, thus opting for a hybrid design with DX9 Vertex Shaders (which are presumably far less complex than the pixel shaders) and DX8.1 Pixel Shaders. The Pixel Shader capabilities fall in line with DirectX's PS1.3 version.
Matrox describe this as a 36 stage array with 4 pixel pipes, each with 4 texture stages (units), and a 5 stage Pixel Shader array. This makes 9 stages per pipe, or 36 in total. However, it's unclear exactly what these 5 Pixel Shader stages actually are. We know that ATI have opted for a three stage Pixel Shader pipeline with one texture look-up, one texture address operation, and one colour operation; however, what all these stages do in Parhelia is unclear. The diagram above also indicates that Parhelia can chain two Pixel Shader pipes together to achieve 10 stages on 2 pixels rather than just 5 stages on 4 pixels.
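The stage arithmetic above can be checked with a few lines of Python. This is only a sketch of the counting Matrox describe; the function names are made up for illustration and say nothing about what the five shader stages actually do.

```python
# Figures quoted by Matrox for the Parhelia pixel array.
PIXEL_PIPES = 4
TEXTURE_STAGES_PER_PIPE = 4
SHADER_STAGES_PER_PIPE = 5


def total_stages(pipes=PIXEL_PIPES,
                 texture=TEXTURE_STAGES_PER_PIPE,
                 shader=SHADER_STAGES_PER_PIPE):
    """Total stages in the array: every pipe has texture + shader stages."""
    return pipes * (texture + shader)


def chained_mode(pipes=PIXEL_PIPES, shader=SHADER_STAGES_PER_PIPE, chain=2):
    """Return (pixels per clock, shader stages per pixel) when `chain`
    shader pipes are linked together (hypothetical helper)."""
    return pipes // chain, shader * chain


print(total_stages())   # 36
print(chained_mode())   # (2, 10)
```

Chaining trades pixel throughput for shader depth: half the pixels per clock, twice the stages per pixel.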
The image above is taken from Matrox's impressive 'Reef' demonstration. The Reef demo utilises all four of Parhelia's texture layers in conjunction with Vertex and Pixel Shaders to show off Parhelia's abilities.
Unlike newer DirectX9 parts, which are going towards at least 64-bit floating point accuracy, both the Vertex and Pixel Shader pipelines of Parhelia have 40-bit accuracy.
16X Fragment Antialiasing (FAA) - Most Super-Sampling or Multi-Sampling AA schemes result in the pixel being sampled multiple times regardless of whether they actually are intersected by polygon edges. In the case of Multi-Sampling the multiple texture sampling can be alleviated for pixels that are not intersected by a polygon edge, but multiple pixel samples are still produced for every pixel resulting in high bandwidth utilisation. FAA attempts to alleviate this by only sampling the pixels that are intersected by polygon edges.
With Fragment Antialiasing, rather than applying multiple samples to every pixel, regardless of whether or not the pixel requires extra samples, each pixel is tested to establish if a polygon edge occurs through it. Pixels that do not require any extra samples, i.e. those that occur fully within a polygon, are sent straight to the frame buffer with no extra samples and hence no extra bandwidth utilisation. Pixels that do span an edge, and hence require anti-aliasing, have the extra samples applied and are then sent to a separate 'fragment buffer' which stores lists of the pixels and their appropriate subsample information. Once all the pixels, and fragments, for the scene are written the fragment pixels are combined and written to the frame buffer ready for final display.
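To make the flow described above a little more concrete, here is a toy software sketch in Python. It is not Matrox's implementation: the corner-based edge test, the 4x4 ordered subsample grid, the dictionary used as the 'fragment buffer' and the `covered` shape function are all assumptions chosen to keep the example short.

```python
def covered(x, y):
    """Hypothetical shape: everything on or below the line y = 0.5 * x + 1."""
    return y <= 0.5 * x + 1.0


def render_faa(width, height, shape, grid=4):
    """Toy fragment AA: only pixels crossed by an edge get grid*grid samples."""
    frame = [[0.0] * width for _ in range(height)]
    fragments = {}  # (px, py) -> list of per-subsample coverage flags

    for py in range(height):
        for px in range(width):
            # Edge test (assumed): compare the four pixel corners.
            corners = [shape(px + dx, py + dy) for dx in (0, 1) for dy in (0, 1)]
            if all(corners):
                frame[py][px] = 1.0          # fully inside: one sample only
            elif not any(corners):
                frame[py][px] = 0.0          # fully outside: one sample only
            else:
                # Edge pixel: take the extra samples and park them
                # in the fragment buffer.
                fragments[(px, py)] = [
                    shape(px + (i + 0.5) / grid, py + (j + 0.5) / grid)
                    for j in range(grid)
                    for i in range(grid)
                ]

    # Resolve: combine the stored subsamples and write them to the frame buffer.
    for (px, py), samples in fragments.items():
        frame[py][px] = sum(samples) / len(samples)
    return frame, fragments


frame, fragments = render_faa(8, 8, covered)
edge_pixels = len(fragments)
faa_samples = (8 * 8 - edge_pixels) + 16 * edge_pixels
print(edge_pixels, faa_samples, 8 * 8 * 16)
```

The last line compares the number of samples taken this way with the 16 samples per pixel a brute-force approach would need for the same edge quality.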
FAA has the advantage that it requires less fill-rate than normal Super- or Multi-sample solutions, and potentially less bandwidth (in theory, compressed Multisampling is likely to achieve similar bandwidth levels). Because of the reduced pixel and bandwidth requirements, more samples can be taken for the pixels that require extra sampling; in the case of Matrox's FAA they have opted for 16 samples, which compares to the current maximum of 6 samples on ATI's Radeon 9700 PRO's Multi-sample FSAA. However, a drawback of FAA is that there are some cases in which it does not apply extra samples where it probably should.
10-bit GigaColor - Parhelia uses 10-bit per channel throughout the pipeline, including the dual RAMDACs. With 10-bits per channel colour output this increases the range of colours available from the 16 million of 24-bit output to over a billion colours. Matrox have enabled GigaColor to operate both for normal Windows desktop use and in 3D rendering.
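The "16 million" and "over a billion" figures follow directly from the bit depths. A quick check in Python, assuming three colour channels (red, green, blue):

```python
def colours(bits_per_channel, channels=3):
    """Number of distinct colours for a given per-channel bit depth."""
    return 2 ** (bits_per_channel * channels)


print(colours(8))                  # 16777216   (24-bit output)
print(colours(10))                 # 1073741824 (10-bit GigaColor)
print(colours(10) // colours(8))   # 64
```

Each channel gains two bits, so the total grows by a factor of 4 x 4 x 4 = 64.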
The DVD playback also operates at 10-bit per channel precision.
Dual Head High Fidelity / Triple Head Desktop - Matrox's forte has always been their multi-monitor outputs and Parhelia is no exception. Parhelia includes dual 400MHz RAMDACs with dual DVI outputs capable of displaying dual 2048 x 1536 at 32bpp analogue outputs or dual 1600 x 1200 at 32bpp digital outputs with hardware overlays, hardware cursors and gamma correction for video.
Parhelia also supports a Triple head stretched desktop mode that can utilise up to 3 display outputs and stretch the Windows desktop across them all. In this mode a maximum resolution of 3840 x 1024 at 32bpp is supported.
Surround Gaming - Extending triple head from the desktop, Matrox also takes it into gaming, hence 'Surround Gaming'. The extra screens give more rendering space, giving three times the field of view. Many games already released can support Surround Gaming and newer titles are being released with support already built in for it -- a list of titles supporting the feature can be found on Matrox's website.
Package, Software and Setup
Given Matrox's background it would appear that the box art for Parhelia is trying to strike a balance between their professional market and still being interesting towards the gamer.
The contents of the package include the board, instructions, setup CD, a DVI to VGA converter, a DVI to dual VGA converter cable and a VGA to S-VHS/composite video adapter. The setup CD contains the drivers, a number of utilities and a demo version of the game Imperial Galactica III: Genesis.
Once installed the drivers take on a slightly different feel than other drivers. Rather than having a dizzying array of tabs under the advanced monitor settings of the Windows display properties, Matrox have opted to go with a 'PowerDesk' utility. The PowerDesk utility can be launched from the display properties, from the start bar, or, if you desire, it will remain in the system tray. From there you are presented with control panel-like options for the various display controls.
While there isn't exactly a huge number of options in the 3D settings tab there's probably enough to suit all but the most ardent of tweakers out there; setting FSAA/FAA and Anisotropic filtering on or off is probably enough for most people. One handy element is that in the later PowerDesk versions Matrox have provided the facility for saving game specific settings, so that if you find one game that runs well with FSAA and/or Anisotropic filtering on, but you prefer another game with them off, rather than having to change the global DirectX or OpenGL settings each time you play all you need to do is load up the game and the drivers will recognise the settings you want for it.
With Dual Head and Triple Head there is a hugely varied number of multi-monitor options available, and the PowerDesk setup makes it quite easy to choose and connect the configuration that suits you. However, if you wish to use TV out then a myriad of cables is required. The diagram below illustrates the cable connections to enable a VGA monitor with a TV-Out option.
With Triple Head display only the central display device can use a DVI connection; the outer display devices have to be driven from the DVI to 2 way VGA cable, limiting them to analogue VGA display only.
For the first game benchmark we'll use the DirectX title, Max Payne.
Looking at the two sets of drivers here we can see that Matrox do appear to have made some small gains in performance since the earlier release, with at best a 7% increase in low resolution, with that trailing down to 2% at high resolution where things are a little more fill-rate limited.
From the fill-rate graph we can see that Parhelia is becoming more fill-rate limited by 1024x768 and performance is dropping off at this point. At 1024x768 the performance is well above 60 FPS on average, with it still remaining above 30 FPS at 1600x1200.
Here we can see that enabling 2X Anisotropic filtering is only accounting for at worst a 12% performance decrease, though 2X Anisotropic filtering isn't a huge degree for a high end board these days. Performance still remains well above 60 FPS at 1024x768 and over 30 FPS at 1600x1200.
Enabling FAA accounts for a larger performance impact, with at worst a 34% reduction. At 1024x768 the performance has dropped to a little shy of 60 FPS, and to under 30 FPS at 1600x1200.
Using both FAA and Anisotropic filtering together accounts for a further performance degradation, as we'd expect, but in this case it does not appear to be a cumulative effect and the two are complementing each other a little.
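As a rough yardstick for what a "cumulative" hit would look like, the two individual penalties can be compounded. The sketch below assumes the hits multiply (each feature removes a fixed fraction of whatever frame rate is left) and uses the worst-case Max Payne figures quoted above purely as illustrative inputs; those worst cases need not occur at the same resolution.

```python
def combined_hit(*hits):
    """Compound several fractional performance hits multiplicatively."""
    remaining = 1.0
    for hit in hits:
        remaining *= 1.0 - hit
    return 1.0 - remaining


# 12% for 2X Anisotropic filtering, 34% for FAA (worst cases from the text).
expected = combined_hit(0.12, 0.34)
print(f"{expected:.1%}")   # 41.9%
```

A measured combined drop smaller than this figure is what "not cumulative" means here; simply adding the two percentages would give 46% instead.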
The next game we'll use is the DirectX title, Dungeon Siege.
Dungeon Siege has usually proven itself to be a CPU limited title on many occasions on modern cards, yet in this case the fill-rate graph clearly shows that it's getting reasonably fill-rate limited on the Parhelia, at least beyond 1024x768. At 1024x768 the performance of the title is well above 60 FPS, on average, but at 1600x1200 it has sunk below 60 FPS, although it is still well above 30 FPS.
As we saw with Max Payne, there have been some modest improvements in the drivers under this DirectX title, albeit fairly small.
Again we can see that adding 2X Anisotropic filtering is taking between 9% and 15% in performance. Quite bizarrely, though, the worst of the performance decreases are coming at low resolutions where it's not fill-rate limited. With the filtering enabled the performance is still above 60 FPS at 1024x768 and well over 30 FPS at 1600x1200.
In this title FAA is accounting for between 17% and 24% of a performance decrease, which is slightly less than in Max Payne, probably because the title isn't as fill-rate limited. Unlike the Anisotropic filtering performance, we can see that FAA is taking a larger performance drop at the higher resolutions, which is what we would normally expect. The performance here is just over 60 FPS on average at 1024x768 and still above 30 FPS at 1600x1200.
As with Max Payne, enabling both FAA and Anisotropic Filtering gives a slightly more balanced performance decrease with the total drop not being cumulative.
UT2003 - Botmatch
Here we'll look at the 'Botmatch' benchmark scores from the full UT2003 game benchmarks.
Under UT2003 we can see that Matrox have made some relatively significant driver gains from earlier drivers to the current ones, with a best performance increase of 33%.
Here the Parhelia is posting scores more or less in line with other boards that we've used the benchmark on, with a performance just under 60 FPS on average at 1024x768, but still over 30 FPS at 1600x1200.
Interestingly, in this case both 2X Anisotropic Filtering and FAA are posting roughly the same performance impact from normal rendering, with both having about 30% performance degradation at 1600x1200. Combining FAA and the 2X filtering results, as we'd expect, in a higher performance drop with the effects being above cumulative at the low resolutions, but below at higher resolutions.
UT2003 - Flyby
Here we'll look at the Flyby element of the UT2003 benchmark demos on the Parhelia.
As with the Botmatch element of the UT2003 benchmark we can see that the gains in performance from the later driver also carry over to the Flyby demo. In this case, though, the effects are mainly noticed at the lower resolutions as the Flyby demo gets fill-rate bound quite quickly.
Again we can see that the performance hits for Anisotropic filtering and FAA are quite similar to each other in the Flyby element of the UT2003 benchmark, as they are in the Botmatch. Utilising them both at the same time incurs a greater penalty, roughly halving the performance across most of the resolutions.
Return to Castle Wolfenstein
RtCW is an OpenGL title, based on the venerable Quake3 engine.
Unlike the DirectX titles, the performance here appears to have sunk a little since the earlier drivers, with at worst a 16% drop in performance. With the current drivers, at 1024x768 the Parhelia is well above 60 FPS, but has dropped below it by 1600x1200 at just under 50 FPS.
Adding 2X Anisotropic filtering accounts for a worst case performance drop from normal rendering, although the performance is still above 60 FPS at 1024x768 on average, and above 30 FPS at 1600x1200. FAA accounts for a maximum performance penalty of 30% in high resolution, but again it's still above 60 FPS at 1024x768 and 30 FPS at 1600x1200. Adding both shows a further drop and this time it has sunk below 60 FPS at 1024x768 and just a little under 30 FPS at 1600x1200.
Jedi Knight II: Jedi Outcast
JKII is another OpenGL title based on the Quake3 engine.
As with RtCW, the later drivers appear to have a lower performance than the earlier driver; however, on this occasion when the game becomes more fill-rate limited this is turned around into a performance increase.
In this instance the average FPS is well above 60 FPS at 1024x768 and just above 30 FPS at 1600x1200.
Interestingly, in this instance adding 2X Anisotropic filtering hardly reduces the performance at all. It may be that JKII is predominantly single texturing and 2X Trilinear Anisotropic filtering is fitting the 4x4 texel sampling units quite neatly.
FAA is accounting for a performance penalty, albeit with a worst-case drop of 25%. Adding 2X Anisotropic filtering on top of FAA doesn't amount to any further decrease, and with both of these enabled the title remains above 60 FPS at 1024x768 and over 30 FPS at 1600x1200.
Serious Sam: Second Encounter
Here we'll look at SS:SE under OpenGL.
Again, we can see that the later drivers are performing, on average, worse than the earlier set. Here, with the newer drivers, the performance at 1024x768 is just below 60 FPS whilst it's still above 30 FPS at 1600x1200.
SS:SE uses up to five texture layers, so with Trilinear filtering it's probably already using more samples than the four texture units per pipe can handle in one pass, so doubling the sampling requirements in this situation is bound to decrease the performance. Here we can see that at 1600x1200, the most fill-rate limited situation, a 19% performance drop occurs.
The penalty for enabling FAA in this case is actually less than for 2X Anisotropic filtering, with the highest performance penalty being only 12%. Adding both FAA and 2X filtering at the same time looks to give, more or less, a cumulative performance drop in this instance.
Serious Sam Histogram
Let's take a look at the histogram of the SS:SE Citadel demo running at 1024x768 on the Parhelia.
We can see that the peaks for the two driver sets occur pretty similarly, but outside of the peaks the older driver seems to be rendering mostly in the 40-60 FPS range whereas the newer driver appears to be down in the 40's for much of the time.
Here we'll take a look at the performance of Parhelia under the professional workstation OpenGL benchmark SPECviewperf 7.0.
In general terms it would appear that the performance of Parhelia is below that of what we'd expect for many retail boards at the moment, the 'proe-01' and 'ugs-01' tests in particular.
Looking at the comparisons between the two driver sets we can see that half of the tests are up and half are down. Of the two tests that have increased 100% in performance we can see it's because one of the tests has moved from 1 FPS to 2 FPS and the other test didn't run on the initial set of drivers, while it achieved 1 FPS on the later drivers.
Having talked with Matrox it appears that they feel they had made progress with their later OpenGL drivers under the various applications that SPECviewperf 7.0 is testing and these improvements were not reflected in our tests. Upon further investigation it turns out there was a difference in settings between our configuration and Matrox's internal SPECviewperf testing.
For this review we tested SPECviewperf under standard settings, which include a default AGP aperture size of 64MB; the results above are correct for those settings. However, if the AGP aperture size is increased to 256MB then a slightly different story is told under SPECviewperf...
With a 256MB AGP aperture size enabled we can see that Matrox have indeed made significant gains under this benchmark and hence, hopefully, throughout all the applications that SPECviewperf tests. It's a bit of a mystery why similar gains are not shown with smaller AGP apertures though.
The workstation market is an important market for Matrox and it does appear that some significant effort has gone into their later drivers to cater for this market. This is evident with the performance increases we see here.
Pixel Shader Performance
We'll use 3DMark2001SE's Pixel Shading tests to take a look at the Pixel Shading performance of Parhelia.
Performance-wise, the 'Pixel Shader' and 'Advanced Shader' tests seem to operate at respectable levels throughout all the resolutions. It's only really the 'Nature' test that sinks down quite low at high res. We can see that the newer drivers appear to have a slight hiccup in the higher resolutions with the 'Pixel Shader' test as they drop below the earlier set in terms of performance.
Texture Filtering Performance
Using SS:SE's Citadel demo again we'll take a look at the rendering performance of the various filter methods available on Parhelia.
As we would expect the performance hit for adding 2X Anisotropic filtering is larger when using Trilinear filtering than it is for Bilinear filtering. SS:SE uses lots of texture layers to begin with, so it's likely that the four bilinear sampling units per pipeline are occupied just applying the textures in the first place even with straight Bilinear filtering. Adding more complicated filtering than bilinear probably results in the texturing being applied over multiple cycles. In Trilinear two texture units will be occupied for a single texture map, so adding 2X filtering will result in all four of the texture units per pipe only sampling from a single texture layer per clock cycle.
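The reasoning above can be turned into a back-of-envelope model. The sketch below assumes each pipe has four bilinear units, that Trilinear costs two units per texture layer, and that 2X Anisotropic filtering doubles the unit cost of whichever base filter is in use; the Bilinear plus 2X figure in particular is an assumption, not something Matrox have stated. It ignores bandwidth and caching entirely.

```python
UNITS_PER_PIPE = 4

# Bilinear units occupied by one texture layer (assumed costs).
UNITS_PER_LAYER = {
    "bilinear": 1,
    "trilinear": 2,
    "bilinear + 2x aniso": 2,
    "trilinear + 2x aniso": 4,
}


def cycles_per_pixel(layers, mode):
    """Clock cycles one pipe needs to sample `layers` textures in `mode`."""
    units_needed = layers * UNITS_PER_LAYER[mode]
    return -(-units_needed // UNITS_PER_PIPE)   # integer ceiling division


for mode in UNITS_PER_LAYER:
    print(f"{mode:22s} {cycles_per_pixel(4, mode)} cycle(s) for 4 layers")
```

With four layers this gives 1, 2, 2 and 4 cycles respectively, so under these assumptions Trilinear with 2X Anisotropic filtering leaves each pipe sampling a single layer per clock, as described above.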
Texture Filtering Quality
Here are some sample filtering outputs from SS:SE...
Looking at the screenshots there appear to be few surprises in the output of Parhelia's available filtering options. We can see that Trilinear filtering is blending the mipmaps in a fashion that's representative of a normal Trilinear filtering implementation and we can also see Matrox have implemented curved boundaries. One thing that can be seen on the Bilinear shots is a slight 'dirtiness' between the mipmap boundaries.
Let's take another look at the filtering methods used with "samX"'s Anisotropic Filtering Test application.
From the Bilinear shot we can see the 'noise' of the mipmap boundary is quite pronounced, though oddly it appears to be worse in the lower half of the screen than it is in the upper half. The Trilinear shot shows that Trilinear filtering also has the same characteristics, yet it's a little less obvious because the mipmap boundaries are being merged into one another.
Both the 2X Anisotropic filtering shots show that Anisotropic filtering appears to be calculated uniformly, with no angles appearing to be a worst case. Oddly, though, on the Trilinear shot there do appear to be areas around the 45 degree angles where the Anisotropic filtering starts a little late, leaving some slight traces of mipmapping where we wouldn't expect it.
Parhelia's main Antialiasing method is FAA, but it does have a fallback Supersampling mode. We'll take a look at the performance of both, once again using SS:SE's Citadel demo, without Anisotropic filtering.
As we can see the 4X Super-sampling FSAA is accounting for roughly the type of performance hit we would expect, being fairly large. In the worst case the Super-sample AA is taking 69%. FAA, on the other hand, is offering far better performance in this title, although it has seen larger performance drops in other titles used in this review.
Here are some sampling images using FAA/FSAA.
The first image is a 640x480 screen shot doubled in size to highlight the edges more. The second image is a close up, four times its initial size.
The close-up image of the 4X SSAA shot shows a number of intermediate colour samples that is consistent with an ordered grid 4X implementation.
The screenshot with FAA enabled shows that it clearly offers much better edge AA on the edges it applies to, and the close up shot shows the many intermediate colour samples in the inner square. However in this instance we can clearly see that it's not applying extra samples around the outer square, whereas the SSAA shot is and so would a Multisampling implementation.
FAA is also known to have issues operating with Stencil Buffers in place. Below is a screenshot of the 'Fablemark' demo which makes use of stencil buffers.
As mentioned before, Parhelia's flexible multi-monitor support allows for a feature dubbed 'Surround Gaming'. Surround Gaming is basically allowing a game to operate over three display devices giving more view to your peripheral vision and giving a more immersive feel to the game.
We've run a few benchmarks in Surround Gaming to see what the performance is like and we'll compare them with a few normal resolutions as well. The graphs below are ordered by the number of displayable pixels in that resolution.
As we can see, these Surround Gaming titles appear to be averaging to reasonably playable frame rates, with both titles above 50 FPS at 2400x600.
One thing that you may notice under RtCW is that, although there is a relatively small difference in the number of pixels between 1280x1024 and 2400x600, the Surround Gaming mode incurs a much higher performance penalty than the difference in pixel fill-rate would suggest; the difference in pixels between 2400x600 and 1600x1200 is much larger and yet the performance difference is smaller. The most likely reason for this is extra geometry load. Under normal resolution rendering the geometry would be clipped at a 4:3 ratio, yet the geometry with Surround Gaming is clipped with a 4:1 (12:3) ratio, resulting in a greater quantity of geometry processing. Because RtCW predominantly uses the host CPU for geometry calculation, Surround Gaming is putting an extra burden on the CPU because more geometry needs to be calculated.
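The pixel counts and aspect ratios behind this argument are easy to work out. The short Python sketch below uses only the three resolutions mentioned above; the helper names are made up for illustration.

```python
from fractions import Fraction

MODES = [(1280, 1024), (2400, 600), (1600, 1200)]


def pixels(width, height):
    """Number of displayable pixels in a mode."""
    return width * height


def relative_increase(smaller, larger):
    """Fractional increase in pixel count going from one mode to another."""
    return pixels(*larger) / pixels(*smaller) - 1.0


for width, height in MODES:
    ratio = Fraction(width, height)
    print(f"{width}x{height}: {pixels(width, height)} pixels, "
          f"aspect {ratio.numerator}:{ratio.denominator}")

print(f"{relative_increase(MODES[0], MODES[1]):.1%}")   # 9.9%
print(f"{relative_increase(MODES[1], MODES[2]):.1%}")   # 33.3%
```

2400x600 has under 10% more pixels than 1280x1024, while 1600x1200 has a third more again, so fill-rate alone does not explain the larger drop at the Surround Gaming resolution. Note that 1280x1024 itself works out at 5:4 rather than 4:3; the point about the much wider 4:1 view still stands.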
Below are some sample outputs from Parhelia's Surround Gaming mode.
Hardware vs Software Geometry Processing
With four DX9 compliant Vertex Shaders we would expect Parhelia to have good hardware geometry throughput. We'll test the processing using hardware and CPU based geometry processing to see what kind of performance gains can be attained by using Parhelia's geometry processing as opposed to the geometry throughput of the P4 2.53GHz CPU.
The 3DMark2001SE tests are all showing pretty good gains in performance when running hardware geometry processing as opposed to running it via the CPU. Even at high resolutions two of the tests are still showing good percentage increases, the only exception to this being the 'Nature' test. But as explained before this is probably because the test is becoming quite fill-rate/bandwidth limited, thus minimising the improvements that hardware geometry processing would be able to show.
As is always the case, though, 3DMark2001SE's tests are only demos and hence not representative of an actual game, which has many more calculations running on the CPU than just geometry processing. Let's take a look at the performance of a couple of game titles using software and hardware geometry processing.
As we've said before, Dungeon Siege is quite a CPU bound title and here we can see that using hardware geometry processing accounts for nearly a 20% improvement over software processing in the lower resolutions. As the resolution goes up, this advantage is scaled back as Parhelia is becoming a little more fill-rate limited in this game.
Max Payne shows even larger performance gains than Dungeon Siege at lower resolutions, but this is almost negated at high resolutions where fill-rate is of paramount importance.
Multi-Texturing vs Multi-Pass
Parhelia offers 4 texture units per pipeline, meaning that up to 4 texture maps can be sampled simultaneously, and Serious Sam can handle up to four layers in a single pass, which should be a good fit for Parhelia. SS:SE also provides the ability to alter the number of texture layers applied per pass, so we can see how much benefit the 3D architecture is gaining from the number of texture layers applied in a single pass.
SS:SE's Citadel demo was used, with Trilinear filtering enabled.
Why there is such a large performance difference at 640x480 without any multi-texturing is a mystery. In theory we would expect multi-passing at low resolution to perform poorly because the geometry needs to be resent multiple times and hence would put more load on the Vertex Processors.
The gains in the higher resolutions here are quite large, but because of the texture unit arrangement on Parhelia, in a title like Serious Sam, reducing the number of texture layers applied per pass has a direct impact on its fill-rate and not just bandwidth. With future games a flexible number of texture layers per pass is very important for Parhelia.
We can see that the largest single gain is from going from no multi-texturing to dual texturing. Parhelia's texture units are capable of sampling bilinear samples in a single clock, so for Trilinear sampling two texture units will be used per map; hence at two texture maps per pass with Trilinear filtering all of Parhelia's texture units are being used simultaneously.
We'll use the VillageMark test to see what Parhelia's performance is like under the conditions presented by this benchmark.
Parhelia's performance in this test isn't too bad, despite it not being advertised with any bandwidth or overdraw reduction optimisations. VillageMark does use several simultaneous texture layers so the raw texturing performance of Parhelia will be of help, as will the raw bandwidth of the 256-bit memory bus.
Here we'll look at "Humus"'s GL_EXT_reme benchmark, which has tests using various different render orders.
From this we can see that the gains from Random ordering or Front-to-Back ordering are very minimal in comparison to Back-to-Front ordering, which is a good indication that there aren't any early pixel Z tests to retire an occluded pixel before texturing or Pixel Shading/Fragment processing is done on that pixel.
The next generation id title 'DoomIII' specifically operates by rendering the entire scene in a very uncomplicated fashion, which is to first purely lay down Z information for the frame prior to any of the per pixel operations taking place. The result of this is that boards that have early Z checking / rejection schemes won't need to do the per-pixel operations on occluded pixels, which have very intensive calculations, once the initial Z calculation pass is complete. Because the Z information for the entire frame will already be present, there will be 100% certainty of whether the current pixels being rendered will actually be displayed. However, a board without early Z testing will not know whether or not a pixel is going to be occluded until it's ready to be sent to the frame buffer, so all of the intensive per pixel operations will be carried out on every pixel regardless of whether or not it will actually be displayed in the final frame.
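The difference between the two kinds of board can be sketched with a toy model in Python. This is a simplified illustration of a depth-only pre-pass followed by a shading pass, not a description of any particular hardware; the fragment list and function name are hypothetical.

```python
def shaded_fragments(fragments, early_z):
    """Count fragments that receive the expensive per-pixel work.

    `fragments` is a list of (pixel, depth) pairs in submission order;
    a smaller depth is closer to the viewer.
    """
    # Pass 1: lay down Z only, no per-pixel shading.
    depth = {}
    for pixel, z in fragments:
        if z < depth.get(pixel, float("inf")):
            depth[pixel] = z

    # Pass 2: the per-pixel operations.
    shaded = 0
    for pixel, z in fragments:
        if early_z and z > depth[pixel]:
            continue          # occluded: rejected before any shading
        shaded += 1           # shading work is spent on this fragment
    return shaded


# Two pixels, each covered by three surfaces at different depths.
scene = [("a", 0.9), ("a", 0.5), ("a", 0.2),
         ("b", 0.3), ("b", 0.8), ("b", 0.6)]

print(shaded_fragments(scene, early_z=True))    # 2
print(shaded_fragments(scene, early_z=False))   # 6
```

With early Z rejection only the visible surface of each pixel is shaded, whatever the submission order; without it, every fragment pays the full per-pixel cost even though the Z pre-pass has already determined what is visible.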
Stencil Test (FableMark)
Here we'll see how Parhelia performs under the PowerVR FableMark test, which makes extensive use of Stencil Buffers.
Although the performance seems to be quite low at high resolution this is reasonably consistent with other DX8 class hardware. However, it does not appear that Parhelia features anything like single pass stencil operations.
As we can see, while the general gaming performance of Parhelia is respectable it's simply not up there with the likes of the high end Radeon 9700 PRO. Yet it does have a reasonable performance overall.
It would seem that the development of Parhelia's drivers is a slightly hit and miss affair. They appear to have made some gains with their DirectX drivers, but the later OpenGL drivers appear to have degraded performance in many cases.
Parhelia does bring a number of interesting concepts to the table. It's unquestionable that FAA's performance is exceptionally good, given that it has sixteen sample points for edges. Even the compressed Multi-sampling schemes on newer boards do not show as small a performance hit in many situations. However, FAA is not without its problems. While you may not be able to notice some edges that are not AA'ed when playing games, if the artifacts displayed in Fablemark are indicative of normal stencil operations with FAA enabled then it would seem to be an untenable combination -- you would notice the lack of AA. While there are not many titles at the moment that use Stencil buffers there is one very important engine coming up that makes extensive use of Stencils, being, of course, Doom III.
Surround Gaming is another interesting case. Unquestionably, it brings a greater level of involvement to the game, but do you have the desktop real-estate for 3 CRTs or are you able to afford a number of TFTs?
Naturally, where Matrox scores big is their usual haunt, which is their display output. The 2D quality is very good on Parhelia, among the best I've personally seen. Also, if multi-monitor support is something that's important for you then Matrox has both the most flexibility in the display output options and exceptionally good multi-monitor configurations and software tools.
Matrox say they are refocusing Parhelia on the workstation market, which will probably make good use of the multi-monitor options; however, does this rule it out for the consumer market? If you are a high end gamer then it's unquestionable that you'd be looking at either the Radeon 9700 PRO or the upcoming GeForce FX for pure performance and future features. But if you are looking for reasonably good performance in current games, with all the features necessary to run them, Parhelia is certainly worth considering if you can afford it. If 2D and/or multi-monitor support are also a high priority then Parhelia requires some attention.