Cache Transfer Technologies and Timing



One of the most important factors directly influencing the 
performance of the level 2 cache is the technology used to 
transfer information to and from the processor. There are three 
main types of cache technology currently in use in motherboards; 
the capabilities of the chipset (in particular, the cache 
controller) dictate which your system will use.

"Timing" refers to the number of clock cycles required to perform 
the data transfers to and from the cache or processor, and this 
is a function of the technology used (among other things). Timing 
is a complex matter involving various characteristics of the 
processor, cache, memory, chipset, etc. In general, however, the 
fewer clock cycles it takes to transfer data, the faster the 
system. System timing is described in detail in the memory 
chapter.


Cache Bursting

In a typical level 2 cache each cache line contains 32 bytes, 
and transfers to and from the cache occur 32 bytes (256 bits) at 
a time. The normal transfer paths (for a fifth- or 
sixth-generation machine) are only 64 bits wide, which means four 
transfers are done in sequence. Because the transfers are from 
consecutive memory locations there is no need to specify a 
different address after the first one; this makes the second, 
third and fourth accesses extremely fast.
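
The arithmetic behind the four-transfer burst can be sketched in a few 
lines of Python. The function name and parameters are illustrative only; 
the 32-byte line and 64-bit path are the figures given above.

```python
def transfers_per_line(line_bytes: int, bus_bits: int) -> int:
    """Number of bus transfers needed to move one full cache line."""
    line_bits = line_bytes * 8
    # Round up in case the line is not an exact multiple of the bus width.
    return -(-line_bits // bus_bits)


# A 32-byte (256-bit) line over a 64-bit path takes four transfers.
print(transfers_per_line(32, 64))
```

Only the first of those transfers needs an address; the remaining three 
follow from consecutive locations.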

This high-performance access is called "bursting" or using the 
cache in "burst mode". All modern level 2 caches use this type of 
access. The timing, in clock cycles, to perform this quadruple 
read is normally stated as "x-y-y-y". For example, with "3-1-1-1" 
timing the first read takes 3 clock cycles and the next three 
take 1 each, for a total of 6. Obviously, the lower these 
numbers, the better. 
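
As a small illustration, the total cost of a burst is just the sum of 
the numbers in its "x-y-y-y" string. The helper below is a hypothetical 
name used for this sketch, not part of any real tool.

```python
def burst_cycles(timing: str) -> int:
    """Total clock cycles for a burst written in 'x-y-y-y' form."""
    return sum(int(part) for part in timing.split("-"))


for timing in ("2-1-1-1", "3-1-1-1", "3-2-2-2"):
    print(timing, "->", burst_cycles(timing), "cycles")
```

This gives 5, 6 and 9 cycles respectively, so a 3-2-2-2 burst takes 
half again as long as a 3-1-1-1 burst.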

Note: This is almost identical to the way burst transfers are 
done to and from memory in modern systems, except that cache bursts 
are faster.


Asynchronous Cache

The oldest and slowest type of cache is asynchronous cache. 
Asynchronous means that transfers are not tied to the system 
clock. A request is sent to the cache, and the cache responds, and 
this happens independently of what the system clock (on the memory 
bus) is doing. This is similar to how most system memory works; 
your typical FPM or EDO memory is also asynchronous (and 
relatively slow, for this reason).

Because asynchronous cache is not tied to the system clock, it 
can have problems dealing with faster clock speeds. At slow speeds 
like 33 MHz it is capable of 2-1-1-1 timing (which is very good) 
but at speeds like 60 or 66 MHz as used in modern Pentium class 
PCs it drops down to 3-2-2-2 (which is pretty bad). For this 
reason, asynchronous cache is commonly found on 486 class 
motherboards but is not generally used on Pentium or later 
class machines.


Synchronous Burst Cache

Unlike asynchronous cache, which operates independently of the 
system clock, synchronous cache is tied to the memory bus clock. 
On each tick of the system clock, a transfer can be done to or from 
the cache (if it is ready). This means that it is capable of 
handling faster system speeds without slowing down the way 
asynchronous cache does. However, the faster the system runs, the 
faster the SRAM chips have to be, in order to keep up. Otherwise 
timing problems (crashes, lockups) occur.

Even this type of cache slows down at very high speeds. It is 
capable of 2-1-1-1 operation up to 66 MHz, but then it slows down 
to 3-2-2-2 at higher speeds (which are starting to become more 
popular and will become even more so in the future). Synchronous 
burst cache never quite caught on; pipelined burst cache was 
developed at around the same time and seemed to take the market 
away from sync burst before the latter could really get going.


Pipelined Burst (PLB) Cache

Pipelining is a technology commonly used in processors to 
increase performance; in the pipelined burst (PLB) cache it is 
used in a similar way. PLB cache adds special circuitry that 
allows the four data transfers that occur in a "burst" to be done 
partially at the same time. In essence, the second transfer begins 
before the first transfer is done, just the way you can start 
pouring a second gallon of fluid down a pipeline before the first 
gallon has finished exiting the other side.

Because of the complexity of the circuitry, a bit more time is 
required initially to set up the "pipeline". For this reason, 
pipelined burst cache is slightly slower than synchronous burst 
cache for the initial read, requiring 3 clock cycles instead of 2 
for sync burst. However, this parallelism allows PLB cache to 
burst at a single clock cycle for the remaining 3 transfers even 
up to very high clock speeds; this means 3-1-1-1 speed up to even 
100 MHz bus speeds. PLB cache is now the standard for almost all 
quality Pentium class motherboards.
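
The overlap can be illustrated with a toy model. This is only a sketch: 
it assumes an idealized pipeline in which a new transfer starts every 
clock cycle and each one takes a fixed number of cycles to come out the 
far end. The depth of 3 is an assumption chosen to match the 3-1-1-1 
figure above, not a description of the real cache circuitry, and both 
function names are hypothetical.

```python
from typing import List


def pipelined_completion_cycles(n_transfers: int, depth: int) -> List[int]:
    """Cycle on which each transfer completes in an idealized pipeline.

    Transfer i starts on cycle i and needs `depth` cycles to pass through,
    so it completes on cycle depth + i.
    """
    return [depth + i for i in range(n_transfers)]


def timing_string(completions: List[int]) -> str:
    """Express completion cycles as gaps between results, 'x-y-y-y' style."""
    gaps = [completions[0]]
    gaps += [later - earlier for earlier, later in zip(completions, completions[1:])]
    return "-".join(str(gap) for gap in gaps)


done = pipelined_completion_cycles(4, depth=3)
print(done)                 # cycles on which the four transfers complete
print(timing_string(done))  # the same burst in x-y-y-y form
```

In this model the first result appears after the full pipeline delay, 
and each later result follows one cycle behind the previous one, which 
is the pattern the 3-1-1-1 notation describes.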


Comparison of Transfer Technology Performance

The table below shows a summary of the theoretical maximum system 
performance of the various cache technologies at different system 
bus speeds. It is theoretical because it is only possible with a 
chipset that supports it, fast enough cache memory and other 
factors. Note how, interestingly, synchronous burst is the best 
at the 60 and 66 MHz bus speeds common on so many Pentium 
machines today. Despite this it is not nearly as common as 
pipelined burst cache. Fortunately, PLB cache is only slightly 
slower, and holds more potential for use at the higher system 
speeds that should take the market by storm in 1998:

Bus Speed (MHz)     33        50        60        66
Asynchronous        2-1-1-1   3-2-2-2   3-2-2-2   3-2-2-2
Synchronous Burst   2-1-1-1   2-1-1-1   2-1-1-1   2-1-1-1
Pipelined Burst     3-1-1-1   3-1-1-1   3-1-1-1   3-1-1-1

Bus Speed (MHz)     75        83        100
Asynchronous        3-2-2-2   3-2-2-2   3-2-2-2
Synchronous Burst   3-2-2-2   3-2-2-2   3-2-2-2
Pipelined Burst     3-1-1-1   3-1-1-1   3-1-1-1
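
To compare the rows on a common scale, the timings in the table can be 
converted into total cycles and into nanoseconds per 32-byte burst. The 
sketch below copies the table values into a dictionary. It uses the 
nominal bus speeds as written (66 rather than 66.67 MHz, for instance), 
so the nanosecond figures are approximate, and like the table they are 
theoretical best cases. All names are illustrative.

```python
# Burst timings from the table above, keyed by nominal bus speed in MHz.
TIMINGS = {
    "Asynchronous": {
        33: "2-1-1-1", 50: "3-2-2-2", 60: "3-2-2-2", 66: "3-2-2-2",
        75: "3-2-2-2", 83: "3-2-2-2", 100: "3-2-2-2",
    },
    "Synchronous Burst": {
        33: "2-1-1-1", 50: "2-1-1-1", 60: "2-1-1-1", 66: "2-1-1-1",
        75: "3-2-2-2", 83: "3-2-2-2", 100: "3-2-2-2",
    },
    "Pipelined Burst": {
        33: "3-1-1-1", 50: "3-1-1-1", 60: "3-1-1-1", 66: "3-1-1-1",
        75: "3-1-1-1", 83: "3-1-1-1", 100: "3-1-1-1",
    },
}


def burst_cycles(timing: str) -> int:
    """Total clock cycles for a burst written in 'x-y-y-y' form."""
    return sum(int(part) for part in timing.split("-"))


def burst_time_ns(timing: str, bus_mhz: float) -> float:
    """Approximate time for one burst: cycles times the clock period in ns."""
    return burst_cycles(timing) * 1000.0 / bus_mhz


for technology, by_speed in TIMINGS.items():
    for mhz, timing in by_speed.items():
        print(f"{technology:<18} {mhz:>3} MHz  {timing}  "
              f"{burst_cycles(timing)} cycles  {burst_time_ns(timing, mhz):6.1f} ns")
```

At 66 MHz synchronous burst needs 5 cycles against 6 for pipelined 
burst, while from 75 MHz upward the positions reverse: 9 cycles against 
6.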
