Cache Technology
One common question asked at this point is, "Why not make
all of the computer's memory run at the same speed as
the L1 cache, so no caching would be required?" That
would work, but it would be incredibly expensive. The
idea behind caching is to use a small amount of
expensive memory to speed up a large amount of slower,
less-expensive memory.
In designing a computer, the goal is to allow the
microprocessor to run at its full speed as inexpensively
as possible. A 500-MHz chip goes through 500 million
cycles in one second (one cycle every two nanoseconds).
Without L1 and L2 caches, every access to main memory
takes 60 nanoseconds, which is about 30 cycles wasted
waiting on memory.
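The arithmetic above can be checked with a short Python sketch. The function names `cycle_time_ns` and `wasted_cycles` are illustrative only; the 500-MHz clock and the 60-nanosecond memory access time are the figures from the text.

```python
def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock cycle, in nanoseconds, for a clock rate given in MHz."""
    # 1 MHz = 10**6 cycles per second and 1 second = 10**9 ns,
    # so one cycle lasts 10**9 / (clock_mhz * 10**6) = 1000 / clock_mhz ns.
    return 1000.0 / clock_mhz


def wasted_cycles(clock_mhz: float, memory_latency_ns: float) -> float:
    """Clock cycles that go by while the CPU waits on one memory access."""
    return memory_latency_ns / cycle_time_ns(clock_mhz)


print(cycle_time_ns(500))      # 2.0
print(wasted_cycles(500, 60))  # 30.0
```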
When you think about it, it is kind of incredible
that such relatively tiny amounts of memory can maximize
the use of much larger amounts of memory. Think about a
256-kilobyte L2 cache that caches 64 megabytes of RAM.
In this case, about 256,000 bytes efficiently cache
about 64,000,000 bytes. Why does that work?
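To see just how lopsided these sizes are, here is a quick Python check. It assumes binary units (1 kilobyte = 1,024 bytes), under which the cache is exactly 1/256 the size of RAM; with the rounded decimal figures in the text (256,000 and 64,000,000 bytes), the ratio comes out to 250 instead.

```python
KB = 1024        # assumption: binary kilobyte (2**10 bytes)
MB = 1024 * KB   # binary megabyte (2**20 bytes)

cache_bytes = 256 * KB  # L2 cache from the example
ram_bytes = 64 * MB     # main memory from the example

print(cache_bytes, ram_bytes)            # 262144 67108864
print(ram_bytes // cache_bytes)          # 256 bytes of RAM per byte of cache
print(f"{cache_bytes / ram_bytes:.2%}")  # 0.39%
```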
In computer science, we have a theoretical concept
called locality of reference. It means that in a
fairly large program, only small portions are ever used
at any one time. As strange as it may seem, locality of
reference works for the huge majority of programs. Even
if the executable is 10 megabytes in size, only a
handful of bytes from that program are in use at any one
time, and their rate of repetition is very high. Let's
take a look at the following pseudo-code to see why
locality of reference works:
Output to screen "Enter a number between 1 and 100"
Read input from user
Put value from user in variable X
Put value 100 in variable Y
Put value 1 in variable Z
Loop Y number of times
    Divide Z by X
    If the remainder of the division = 0
        then output "Z is a multiple of X"
    Add 1 to Z
    Return to loop
End
This small program asks the user to enter a number
between 1 and 100 and reads the value entered. Then it
divides every number from 1 to 100 by the user's number
and checks whether the remainder is 0 (modulo division).
Each time the remainder is 0, the program outputs "Z is a
multiple of X" (for example, 12 is a multiple of 6). Once
all 100 numbers have been checked, the program ends.
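For readers who want to run the pseudo-code, here is one possible Python translation. It is a sketch, not a canonical version: the function name `multiples_report` is hypothetical, it returns its messages instead of printing them so they are easy to check, and it fills in the actual values of Z and X in each message, as in the text's "12 is a multiple of 6" example.

```python
def multiples_report(x, y=100):
    """Translate the pseudo-code: report every Z from 1 to Y that is a multiple of X."""
    messages = []
    z = 1                            # Put value 1 in variable Z
    for _ in range(y):               # Loop Y number of times
        if z % x == 0:               # Divide Z by X; is the remainder 0?
            messages.append(f"{z} is a multiple of {x}")
        z += 1                       # Add 1 to Z, then return to loop
    return messages                  # End


for line in multiples_report(25):
    print(line)
```

With X = 25 this prints the four multiples 25, 50, 75 and 100. To make it interactive like the pseudo-code, replace `25` with `int(input("Enter a number between 1 and 100: "))`.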
Even if you don't know much about computer
programming, it is easy to understand that of the 12
lines of the pseudo-code, the loop (from the "Loop" line
down to "Return to loop") is executed 100 times, while
all of the other lines are executed only once. The loop
lines will run significantly faster because of caching:
after the first pass, they are already in the cache.
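To see how lopsided the execution counts are, the following Python sketch replays the program's logic and tallies how many times each pseudo-code line runs. The line labels are hypothetical shorthand, and each pass through the loop counts the "Loop" line once, which is a simplification. Note that the "then output" line runs only when Z is a multiple of X.

```python
from collections import Counter

SETUP_LINES = ["Output prompt", "Read input", "Put X", "Put Y", "Put Z"]
LOOP_LINES = ["Loop", "Divide Z by X", "If remainder = 0",
              "then output", "Add 1 to Z", "Return to loop"]


def count_line_executions(x, y=100):
    """Run the pseudo-code's logic and tally how many times each line executes."""
    runs = Counter()
    for name in SETUP_LINES:        # each set-up line runs exactly once
        runs[name] += 1
    z = 1
    for _ in range(y):              # one pass per value of Z
        for name in ("Loop", "Divide Z by X", "If remainder = 0"):
            runs[name] += 1
        if z % x == 0:              # the output line runs only for multiples of X
            runs["then output"] += 1
        runs["Add 1 to Z"] += 1
        runs["Return to loop"] += 1
        z += 1
    runs["End"] += 1
    return runs


runs = count_line_executions(6)
inside = sum(runs[name] for name in LOOP_LINES)
total = sum(runs.values())
print(runs["Put X"], runs["Divide Z by X"])  # 1 100
print(f"{inside / total:.1%} of line executions are inside the loop")  # 98.9%
```

For X = 6, the set-up lines run once each, while the loop lines account for nearly 99 percent of all line executions. Those are the instructions that stay in the cache.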
This program is very small and can easily fit
entirely in the smallest of L1 caches, but suppose the
program were huge. The result would be the same, because
in most programs a lot of the action takes place inside
loops. A word processor, for example, spends about 95
percent of its time waiting for your input and displaying
it on the screen, and that part of the word-processor
program stays in the cache.
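Locality of reference can be demonstrated with a toy cache simulator. This Python sketch assumes a deliberately simplified model: one cache entry per address, any address can go in any slot, and the least-recently-used entry is evicted when the cache is full. Real caches store fixed-size blocks and use hardware-specific replacement policies, and both address traces below are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(addresses, capacity):
    """Replay memory addresses through a tiny LRU cache and return the fraction that hit."""
    cache = OrderedDict()  # keys are cached addresses, least recently used first
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True             # miss: fetch from "main memory"
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used address
    return hits / len(addresses)


# Hypothetical traces of 10,000 accesses each.
looping = list(range(100, 110)) * 1_000  # a 10-address loop run 1,000 times
scattered = list(range(10_000))          # never touches the same address twice

print(lru_hit_rate(looping, capacity=16))    # 0.999
print(lru_hit_rate(scattered, capacity=16))  # 0.0
```

With room for only 16 addresses, the looping trace hits 99.9 percent of the time, while the trace that never repeats an address gets no hits at all. The cache size is the same in both runs; only the access pattern changed.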
This roughly 95-to-5 ratio is locality of reference
at work, and it's why a cache works so efficiently. It is
also why such a small cache can efficiently cache such a
large memory system, and why it's not worth constructing
a computer with the fastest memory everywhere: we can
deliver 95 percent of that effectiveness for a fraction
of the cost.
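The payoff can be estimated with a simplified average-access-time calculation: hit rate × cache time + miss rate × memory time. This Python sketch uses the 60-nanosecond memory time from the text and treats 95 percent as the cache hit rate. It also assumes a hit costs one 2-nanosecond cycle at 500 MHz and that a miss costs only the memory time; these are illustrative assumptions, not measurements.

```python
def average_access_time_ns(hit_rate, cache_ns, memory_ns):
    """Simplified average time per access: hits cost cache_ns, misses cost memory_ns."""
    return hit_rate * cache_ns + (1 - hit_rate) * memory_ns


MEMORY_NS = 60.0  # main-memory access time from the text
CACHE_NS = 2.0    # assumption: a cache hit takes one 2 ns cycle at 500 MHz
HIT_RATE = 0.95   # assumption: borrowed from the text's 95-to-5 ratio

cached = average_access_time_ns(HIT_RATE, CACHE_NS, MEMORY_NS)
print(f"all-fast memory: {CACHE_NS:.1f} ns")   # 2.0 ns
print(f"with a cache:    {cached:.1f} ns")     # 4.9 ns
print(f"no cache:        {MEMORY_NS:.1f} ns")  # 60.0 ns
```

Under these assumptions, the cached system averages about 4.9 nanoseconds per access, far closer to an all-fast-memory machine (2 ns) than to one with no cache at all (60 ns).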