Troubleshooting, Maintaining & Repairing PCs Stephen Bigelow $54.95 0-07-913732-6 |
Contact Bet@books © 1998 The McGraw-Hill Companies, Inc. All rights reserved. Any use of this Beta Book is subject to the rules stated in the Terms of Use. |
CHAPTER 29
Memory troubleshooting
Memory is a cornerstone of the modern PC. It is memory that holds the program code and data that is processed by the CPU - and it is this intimate relationship between memory and the CPU that forms the basis of computer performance. With larger and faster CPUs constantly being introduced, more complex software is developed to take advantage of the processing power. In turn, the more complex software demands larger amounts of faster memory. With the explosive growth of Windows (and more recently Windows 95) the demands made on memory performance are more acute than ever. These demands have resulted in a proliferation of memory types that go far beyond the simple, traditional DRAM. Cache (SRAM), fast page-mode (FPM) memory, extended data output (EDO) memory, video memory (VRAM), synchronous DRAM (SDRAM), flash BIOS, and other exotic memory types (such as RAMBUS) now compete for the attention of PC technicians. These new forms of memory also present some new problems. This chapter will provide you with an understanding of memory types, configurations, installation concerns, and troubleshooting options.
Essential memory concepts
The first step in any discussion of memory is to understand basically how memory works. If you already have a good grasp of memory basics, feel free to skip this part of the chapter.
Memory organization
All memory is basically an array organized as rows and columns as shown in Fig. 29-1. Each row is known as an address - there may be 1 million or more addresses on a single memory IC. The columns represent data bits - a typical high-density memory IC has 1 bit, but may have 2 or 4 bits depending on the overall amount of memory required.
As you probably see in Fig. 29-1, the intersection of each column and row is an individual memory bit (known as a cell). This is important because the number of components in a cell - and the way those components are fabricated onto the memory IC - will have a profound impact on memory performance. For example, a classic DRAM cell is a single MOS transistor, while static RAM (or SRAM) cells often pack several transistors and other components onto the IC die. Although you do not have to be an expert on IC design, you should realize that the internal fabrication of a memory IC has more to do with its performance than just the way it is soldered into your computer.
Memory signals
A memory IC communicates with the "outside world" through three sets of signals; address lines, data lines, and control lines. Figure 29-2 illustrates these signal types. Address lines define which row of the memory array will be active. In actual practice, the address is specified as a binary number, and conversion circuitry inside the memory IC translates the binary number into a specific row signal. Data lines pass binary values back and forth to the defined address. Control lines are used to operate the memory IC. A Read/-Write (R/-W) line defines whether data is being read from the specified address, or written to it. A -Chip Select (-CS) signal makes a memory IC active or inactive (this ability to "disconnect" from a circuit is what allows a myriad of memory ICs to all share common address and data signals in the computer). Some memory types require additional signals such as row address-select (RAS) and column address-select (CAS) for refresh operations. More exotic memory types may require additional control signals.
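The interplay of address, data, and control lines can be sketched in a few lines of Python. This is only a toy model, not a description of any real device: the class name ToyMemoryIC and its methods are hypothetical, and it assumes that the leading minus in -CS marks an active-low signal (logic 0 selects the chip) and that R/-W is logic 1 for a read and logic 0 for a write.

```python
class ToyMemoryIC:
    """Toy model of a memory IC: each row is an address, each column a data bit."""

    def __init__(self, address_lines: int, data_bits: int) -> None:
        self.rows = 1 << address_lines          # number of addresses
        self.data_mask = (1 << data_bits) - 1   # keeps only the IC's data bits
        self.cells = [0] * self.rows

    def decode(self, address: int) -> list[int]:
        """Translate a binary address into a one-hot row-select pattern."""
        if not 0 <= address < self.rows:
            raise ValueError("address does not fit on the address lines")
        return [1 if row == address else 0 for row in range(self.rows)]

    def cycle(self, address: int, cs_n: int, rw: int, data: int = 0):
        """Run one access. cs_n: 0 = selected (assumed active low).
        rw: 1 = read, 0 = write (assumed polarity)."""
        if cs_n == 1:
            return None                         # chip is "disconnected" from the bus
        row = self.decode(address).index(1)     # the single active row
        if rw == 1:
            return self.cells[row]              # read: drive data lines
        self.cells[row] = data & self.data_mask # write: latch data lines
        return None


if __name__ == "__main__":
    ic = ToyMemoryIC(address_lines=4, data_bits=4)
    ic.cycle(3, cs_n=0, rw=0, data=0b1010)      # write to address 3
    print(ic.cycle(3, cs_n=0, rw=1))            # read it back
    print(ic.cycle(3, cs_n=1, rw=1))            # deselected chip ignores the bus
```

Because a deselected chip returns nothing, several such objects could share the same address and data values, which is the point made above about -CS.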
Memory package styles and structures
Ultimately, the memory die is mounted in a package just like any other IC. The completed memory packages can then be soldered to your motherboard, or attached to plug-in structures such as SIMMs, DIMMs, and memory cards. There are really only four package styles normally used for memory devices:
Add-on memory devices
Memory has always pushed the envelope of IC design. This trend has given us tremendous amounts of memory in very small packages, but it also has kept memory relatively expensive. Manufacturers responded by providing a minimum amount of memory with the system, then selling more memory as an add-on option - this keeps the cost of a basic machine down, and increases profit through add-on sales. As a technician, you should understand the three basic types of add-on memory.
Proprietary add-on modules - Once the Intel i286 opened the door for more than 1MB of memory, PC makers scrambled to fill the void. However, the rush to more memory resulted in a proliferation of non-standard (and incompatible) memory modules. Each new motherboard came with a new add-on memory scheme - this invariably led to a great deal of confusion among PC users and makers alike. You will likely find proprietary memory modules in 286 and early 386 systems.
SIMMs and DIMMs - By the time 386 systems took hold in the PC industry, proprietary memory modules had been largely abandoned in favor of the "Memory Module" (Fig. 29-3). A SIMM (Single In-line Memory Module) is light, small, and contains a relatively large block of memory, but perhaps the greatest advantage of a SIMM is standardization - Using a standard pin layout, a SIMM from one PC can be installed in any other PC. The 30-pin SIMM (Table 29-1) provides 8 data bits, and generally holds up to 4MB of RAM. The 30-pin SIMM proved its worth in 386 and early 486 systems, but fell short when providing more memory to later-model PCs. The 72-pin SIMM (Table 29-2) supplanted the 30-pin version by providing 32 data bits, and may hold up to 32MB (or more). Table 29-3 outlines a variation of the standard 72-pin SIMM highlighting the use of Error Correction Code (ECC) instead of parity.
You may also find such structures referred to as DIMMs (or Dual In-Line Memory Modules). DIMMs appear virtually identical to SIMMs, but they are larger. And where each electrical contact on the SIMM is tied together between the front and back, the DIMM keeps front and back contacts separate - effectively doubling the number of contacts available on the device. For example, if you look at a 72-pin SIMM, you will see 72 electrical contacts on both sides of the device (144 contacts total) - but these are tied together, so there are only 72 signals (even with 144 contacts). On the other hand, a DIMM keeps the front and back contacts electrically separate (and usually adds some additional pins to keep SIMMs and DIMMs from accidentally being mixed). Table 29-4 outlines a 144-pin DIMM. Today, virtually all DIMM versions provide 168 pins (84 pins on each side). DIMMs are appearing in high-end 64-bit data bus PCs (such as Pentiums and PowerPC RISC workstations). As PCs move from 64 to 128 bits over the next few years, DIMMs will likely replace SIMMs as the preferred memory expansion device. Table 29-5 lists the pinout for an unbuffered DRAM DIMM, while Table 29-6 presents the pinout for an unbuffered SDRAM DIMM.
Finally, you may see SIMMs and DIMMs referred to as "composite" or "non-composite" modules. These terms are used infrequently to describe the technology level of the memory module. For example, a composite module uses older, lower-density memory, so more ICs are required to achieve the required storage capacity. Conversely, a non-composite module uses newer memory technology, so fewer ICs are needed to reach the same storage capacity. In other words, if you encounter a high-density SIMM with only a few ICs on it, chances are that the SIMM is non-composite.
Megabytes and memory layout
Now is a good time to explain the idea of "bytes" and "megabytes". Very simply, a byte is 8 bits (binary 1s and 0s), and a megabyte is one million of those bytes (1,048,576 bytes to be exact - but manufacturers often round down to the nearest million or so). The idea of megabytes (MB) is important when measuring memory in your PC. For example, if a SIMM is laid out as 1M by 8 bits, it has 1MB. If the SIMM is laid out as 4M by 8 bits, it has 4MB. Unfortunately, memory has not been laid out as 8 bits since the IBM XT.
More practical memory layouts involve 32-bit memory (for 486 and OverDrive processors), or 64-bit memory (for Pentium processors). When memory is "wider" than one byte, it is still measured in MB. For example, a 1M x 32-bit (4 bytes) SIMM would be 4MB (that is, the capacity of the device is 4MB), while a 4M x 32-bit SIMM would be 16MB. So when you go shopping for an 8MB 72-pin SIMM, chances are you’re getting a 2M x 32-bit memory module. Table 29-7 provides you with an index to help identify common 72-pin SIMMs. You can see the relationship between memory layout and overall capacity.
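The layout arithmetic above is easy to check with a short Python sketch. The function name module_capacity_mb is hypothetical, and the sketch assumes the quoted width counts data bits only (any parity or ECC bits on a module would not add to its usable capacity).

```python
def module_capacity_mb(depth_m: int, width_bits: int) -> int:
    """Capacity in MB of a module laid out as depth_m (in M addresses) x width_bits.

    Assumes width_bits counts data bits only and is a multiple of 8.
    """
    if width_bits % 8 != 0:
        raise ValueError("data width is expected to be a whole number of bytes")
    return depth_m * (width_bits // 8)


if __name__ == "__main__":
    for depth, width in [(1, 8), (4, 8), (1, 32), (2, 32), (4, 32)]:
        print(f"{depth}M x {width} bits = {module_capacity_mb(depth, width)}MB")
```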
Memory organization
The memory in your computer represents the result of evolution over several computer generations. Memory operation and handling is taken care of by your system's microprocessor, so as CPUs improved, memory handling capabilities have improved as well. Today's microprocessors such as the Intel Pentium or Pentium Pro are capable of addressing more than 4GB of system memory - well beyond the levels of contemporary software applications. Unfortunately, the early PCs were not nearly so powerful. Older PCs could only address 1MB of memory due to limitations of the 8088 microprocessor.
Since backward compatibility is so important to computer users, the drawbacks and limitations of older systems had to be carried forward into newer computers instead of being eliminated. Newer systems overcome their inherent limitations by adding different "types" of memory, along with the hardware and software to access the memory. This part of the chapter describes the typical classifications of computer memory; conventional, extended, and expanded memory. This chapter also describes high memory concepts. Note that these memory types have nothing to do with the actual ICs in your system, but the way in which software uses the memory.
Conventional memory
Conventional memory is the traditional 640KB assigned to the DOS Memory Area (00000h to 9FFFFh as shown in Fig. 29-4). The original PCs used microprocessors that could only address 1MB of memory (called real-mode memory or base memory). Out of that 1MB, portions of the memory must be set aside for basic system functions. BIOS code, video memory, interrupt vectors, and BIOS data are only some of the areas that require reserved memory. The remaining 640KB became available to load and run your application, which can be any combination of executable code and data. The original PC only provided 512KB for the DOS program area, but computer designers quickly learned that another 128KB could be added to the DOS area while still retaining enough memory for overhead functions, so 512KB became 640KB.
Every IBM-compatible PC still provides a 640KB "base memory" range, and most DOS application programs continue to fit within that limit to ensure backward compatibility to older systems. However, the drawbacks to the 8088 CPU were soon apparent. More memory had to be added to the computer for its evolution to continue. Yet, memory had to be added in a way that did not interfere with the conventional memory area. Table 29-8 illustrates a comprehensive memory map for a typical PC.
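The 640KB figure follows directly from the address range. As a minimal check in Python (the constant names are illustrative, and the sketch assumes the DOS area starts at address 00000h and ends at 9FFFFh):

```python
KB = 1024

REAL_MODE_BYTES = 1 << 20        # 1MB of real-mode address space
CONVENTIONAL_START = 0x00000     # assumed start of the DOS area
CONVENTIONAL_END = 0x9FFFF       # last byte of the DOS area

conventional_bytes = CONVENTIONAL_END - CONVENTIONAL_START + 1
reserved_bytes = REAL_MODE_BYTES - conventional_bytes

if __name__ == "__main__":
    print(f"conventional: {conventional_bytes // KB}KB")
    print(f"reserved above it: {reserved_bytes // KB}KB")
```

The remainder of the 1MB space is what is left for BIOS code, video memory, and the other reserved areas.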
Extended memory
The 80286 introduced in IBM's PC/AT was envisioned to overcome the 640KB barrier by incorporating a protected-mode of addressing. The 80286 can address up to 16MB of memory in protected-mode, while its successors (the 80386 and later) can handle 4GB of protected-mode memory. Today, virtually all computer systems provide several MB of extended memory. Besides an advanced microprocessor, another key element for extended memory is software. Memory management software must be loaded in advance for the computer to access its extended memory. Microsoft's DOS 5.0 provides an extended memory manager utility (HIMEM.SYS), but there are other off-the-shelf utilities as well.
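The 1MB, 16MB, and 4GB limits come from the number of address lines each processor provides. The line counts below (20, 24, and 32) are not stated in this chapter and are supplied here as background; the helper names are hypothetical.

```python
def addressable_bytes(address_lines: int) -> int:
    """Each extra address line doubles the memory a CPU can reach."""
    return 1 << address_lines


def describe(size: int) -> str:
    """Format a byte count using the largest unit that divides it evenly."""
    for unit, scale in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if size >= scale and size % scale == 0:
            return f"{size // scale}{unit}"
    return f"{size} bytes"


# Address-line counts are background facts, not taken from this chapter.
CPU_ADDRESS_LINES = {"8088": 20, "80286": 24, "80386": 32}

if __name__ == "__main__":
    for cpu, lines in CPU_ADDRESS_LINES.items():
        print(f"{cpu}: {lines} lines -> {describe(addressable_bytes(lines))}")
```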
Unfortunately, DOS itself cannot make use of extended memory. You may fill the extended memory with data, but the executable code comprising the program remains limited to the original 640KB of base memory. Some programs written with DOS extenders can overcome the 640KB limit, but the additional code needed for the extenders can make such programs a bit clunky. A DOS extender is basically a software module containing its own memory management code which is compiled into the final application program.
The DOS extender loads a program in real-mode memory. After the program is loaded, it switches program control to the protected-mode memory. When the program in protected-mode needs to execute a DOS (real-mode) function, the DOS extender converts protected-mode addresses into real-mode addresses, copies any necessary program data from protected to real-mode locations, switches the CPU to real-mode addressing, and carries out the function. The DOS extender then copies any results (if necessary) back to protected-mode addresses, switches the system to protected-mode once again, and the program continues to run. This back-and-forth conversion overhead results in less than optimum performance compared to strictly real-mode programs, or true "protected-mode" programs.
With multiple megabytes of extended memory typically available, it is possible (but unlikely) that any one program will utilize all of the extended memory. Multiple programs that use extended memory must NOT attempt to utilize the same memory locations. If conflicts occur, a catastrophic system crash is almost inevitable. To prevent conflicts in extended memory, memory manager software can make use of three major industry standards; the Extended Memory Specification (XMS), the Virtual Control Program Interface (VCPI), or the DOS Protected-Mode Interface (DPMI). This chapter will not detail these standards, but you should know where they are used.
Expanded memory
Expanded memory is another popular technique used to overcome the traditional 640KB limit of real-mode addressing. Expanded memory uses the same "physical" RAM chips, but differs from extended memory in the way that physical memory is used. Instead of trying to address physical memory locations outside of the conventional memory range as extended memory does, expanded memory blocks are switched into the base memory range where the CPU can access it in real-mode. The original expanded memory specification (called the Lotus-Intel-Microsoft; LIM, or EMS specification) used 16KB banks of memory which were mapped into a 64KB range of real-mode memory existing just above the video memory range. Thus, four "blocks" of expanded memory could be dealt with simultaneously in the real-mode.
Early implementations of expanded memory utilized special expansion boards that switched blocks of memory, but later CPUs that support memory mapping allowed expanded memory managers (EMMs or LIMs) to supply software-only solutions for i386, i486, and Pentium-based machines. EMS/LIM 4.0 is the latest version of the expanded memory standard which handles up to 32MB of memory. An expanded memory manager (such as the DOS utility EMM386.EXE) allows the extended memory sitting in your computer to emulate expanded memory. For most practical purposes, expanded memory is more useful than extended memory because its ability to map directly to the real-mode allows support for program multi-tasking. To use expanded memory, programs must be written specifically to take advantage of the function calls and subroutines needed to switch memory blocks. Functions are completely specified in the LIM/EMS 4.0 standard.
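The bank-switching idea behind expanded memory can be illustrated with a small simulation. This is a sketch of the concept only: the class and method names are hypothetical and do not correspond to the actual LIM/EMS function calls. It assumes 16KB pages and a 64KB page frame holding four slots, as described above.

```python
PAGE_SIZE = 16 * 1024   # one expanded-memory page (16KB)
FRAME_SLOTS = 4         # a 64KB page frame holds four pages


class ToyExpandedMemory:
    """Toy model: many 16KB pages, only four visible through the frame."""

    def __init__(self, total_pages: int) -> None:
        self.pages = [bytearray(PAGE_SIZE) for _ in range(total_pages)]
        self.frame = [None] * FRAME_SLOTS   # which page each slot shows

    def map_page(self, slot: int, page: int) -> None:
        """Switch an expanded-memory page into one slot of the page frame."""
        if not 0 <= slot < FRAME_SLOTS:
            raise ValueError("slot is outside the page frame")
        if not 0 <= page < len(self.pages):
            raise ValueError("no such expanded-memory page")
        self.frame[slot] = page

    def _locate(self, frame_offset: int):
        """Resolve an offset within the 64KB frame to (page buffer, offset)."""
        if not 0 <= frame_offset < PAGE_SIZE * FRAME_SLOTS:
            raise ValueError("offset is outside the 64KB page frame")
        slot, offset = divmod(frame_offset, PAGE_SIZE)
        page = self.frame[slot]
        if page is None:
            raise LookupError("no page is mapped into this slot")
        return self.pages[page], offset

    def read(self, frame_offset: int) -> int:
        buffer, offset = self._locate(frame_offset)
        return buffer[offset]

    def write(self, frame_offset: int, value: int) -> None:
        buffer, offset = self._locate(frame_offset)
        buffer[offset] = value


if __name__ == "__main__":
    ems = ToyExpandedMemory(total_pages=8)
    ems.map_page(0, 5)
    ems.write(10, 0xAB)              # lands in page 5
    ems.map_page(0, 2)               # swap a different page into slot 0
    print(ems.read(10))              # page 2 is still empty
    ems.map_page(1, 5)               # page 5 reappears in slot 1
    print(hex(ems.read(PAGE_SIZE + 10)))
```

The data written to page 5 survives while the page is switched out, and is visible again through whichever slot it is later mapped into.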
Upper memory area (UMA)
The upper 384KB of real-mode memory is not available to DOS because it is dedicated to handling memory requirements of the physical computer system. This is called the High DOS Memory Range or Upper Memory Area (UMA). However, even the most advanced PCs do not use the entire 384KB, so there is often a substantial amount of unused memory existing in your system's real-mode range. Late model CPUs like the i386 and i486 can remap extended memory into the range unused by your system. Since this "found" memory space is not contiguous with your 640KB DOS space, DOS application programs cannot use the space, but small independent drivers and TSRs can be loaded and run from this UMA. The advantage to using high DOS memory is that more of the 640KB DOS range remains available for your application program. Memory management programs (such as the utilities found with DOS 5.0 and higher) are needed to locate and remap these memory "blocks".
High memory
There is a peculiar anomaly that occurs with CPUs supporting extended memory - they can access one segment (about 64KB) of extended memory beyond the real-mode area. This capability arises because of the address line layout on late model CPUs. As a result, the real-mode operation can access roughly 64KB above the 1MB limit. Like high DOS memory, this "found" 64KB is not contiguous with the normal 640KB DOS memory range, so DOS cannot use this high memory to load a DOS application, but device drivers and TSRs can be placed in high memory. DOS 5.0 is intentionally designed so that its 40-50KB of code can be easily moved into this high memory area. With DOS loaded into high memory, an extra 40-50KB or so will be available within the 640KB DOS range.
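The "roughly 64KB" can be worked out from the way real-mode addresses are formed. The sketch below relies on background not covered in this chapter: a real-mode address is a 16-bit segment shifted left by four bits plus a 16-bit offset, and on CPUs with more than 20 address lines the 21st line (commonly called A20) decides whether the sum wraps around at 1MB. The function name is hypothetical.

```python
ONE_MB = 1 << 20


def real_mode_address(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Physical address for a real-mode segment:offset pair.

    Assumes the standard (segment * 16) + offset calculation. With A20
    disabled the result wraps at 1MB, as it does on an 8088.
    """
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        linear &= ONE_MB - 1
    return linear


if __name__ == "__main__":
    top = real_mode_address(0xFFFF, 0xFFFF)
    print(hex(top))                       # highest address reachable in real mode
    print(top - ONE_MB + 1)               # bytes reachable above the 1MB line
    print(hex(real_mode_address(0xFFFF, 0xFFFF, a20_enabled=False)))
```

Under these assumptions the reachable region above 1MB comes to 65,520 bytes, which is 16 bytes short of a full 64KB segment.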
Memory considerations
Memory has become far more important than just a place to store bits for the microprocessor. It has proliferated and specialized to the point where it is difficult to keep track of all the memory options and architectures that are available. This part of the chapter reviews established memory types, and explains some of the current memory architectures.
Memory speed and wait states
The PC industry is constantly struggling with the balance between price and performance. Higher prices usually bring higher performance, but low cost makes the PC appealing to more people. In terms of memory, cost-cutting typically involves using cheaper (slower) memory devices. Unfortunately, slow memory cannot deliver data to the CPU quickly enough, so the CPU must be made to wait until memory can catch up. All memory is rated in terms of speed - specifically access time. Access time is the delay between the time data in memory is successfully addressed, to the point at which the data has been successfully delivered to the data bus. For PC memory, access time is measured in nanoseconds (ns), and current memory offers access times of 50-60ns. 70ns memory is extremely common.
The question often arises; "can I use faster memory than the manufacturer recommends?" The answer to this question is almost always "Yes", but there is rarely ever a performance benefit. As you will see in the following sections, memory and architectures are typically tailored for specific performance. Using memory that is faster should not hurt the memory, or impair system performance, but it costs more, and will not produce a noticeable performance improvement. The only time such a tactic would be advised is when your current system is almost obsolete, and you would want the new memory to be useable on a new, faster motherboard if you choose to upgrade the motherboard later on.
A wait state orders the CPU to pause for one clock cycle in order to give memory additional time to operate. Typical PCs use one wait state, though very old systems may require two or three. The latest PC designs with high-end memory or aggressive caching may be able to operate with no (zero) wait states. As you might imagine, a wait state is basically a waste of time, so more wait states result in lower system performance. Zero wait states allow optimum system performance. Table 29-9 illustrates the general relationship between CPUs, wait states, and memory speed. It is interesting to note that some of the fastest systems allow the most wait states. This flexibility lets the system support old, slow memory, but the resulting system performance would be so poor that there would be little point in using the system in the first place.
There are three classical means of selecting wait states. First, the number of wait states may be fixed (common in old XT systems). Wait states may be selected with one or more jumpers on the motherboard (typical of i286 and early i386 systems). Current systems such as i486 and Pentium computers place the wait state control in the CMOS setup routine. You may have to look in an "advanced settings" area to find the entry. When optimizing a computer, you should be sure to set the minimum number of wait states.
NOTE: Setting too few wait states can cause the PC to behave erratically.
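To get a feel for why faster clocks need more wait states for the same memory, consider a deliberately simplified model. It assumes that a memory access with zero wait states takes two clock cycles (the base_clocks parameter, an assumption rather than a figure from this chapter) and that the only requirement is for the total cycle time to cover the rated access time. Real chipsets have additional timing overheads, so treat the results as rough estimates rather than a substitute for Table 29-9.

```python
import math


def wait_states_needed(clock_mhz: float, access_ns: float, base_clocks: int = 2) -> int:
    """Estimate the wait states needed for memory of a given access time.

    Simplified model: a zero-wait access lasts base_clocks clock cycles
    (assumed), and each wait state adds one more cycle.
    """
    clock_period_ns = 1000.0 / clock_mhz
    clocks_required = math.ceil(access_ns / clock_period_ns)
    return max(0, clocks_required - base_clocks)


if __name__ == "__main__":
    for mhz, ns in [(25, 70), (33, 70), (66, 60)]:
        print(f"{mhz}MHz bus, {ns}ns memory: "
              f"{wait_states_needed(mhz, ns)} wait state(s)")
```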
Determining memory speed
It is often necessary to check SIMMs or DIMMs for proper memory speed during troubleshooting, or when selecting replacement parts. Unfortunately, it can be very difficult to determine memory speed accurately based on part markings. Speeds are normally marked cryptically by adding a number to the end of the part number. For example, a part number ending in -6 often means 60ns, a -7 is usually 70ns, and a -8 can be 80ns. Still, the only means of being absolutely certain of the memory speed is to cross reference the memory part number with a manufacturer’s catalog, and read the speed from the catalog’s description (i.e. 4Mx32 50ns EDO).
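The suffix convention described above can be captured in a small helper. This is a first guess only: the function name is hypothetical, it handles just the single-digit suffixes mentioned here (-6, -7, -8 and the like, read as tens of nanoseconds), and the part numbers in the example are made up. The manufacturer's catalog remains the authority.

```python
import re


def guess_access_time_ns(part_number: str):
    """Guess access time from a trailing single-digit suffix such as -6 or -7.

    Returns the speed in ns, or None when the marking does not follow
    the simple pattern assumed here.
    """
    match = re.search(r"-(\d)$", part_number.strip())
    if match is None:
        return None
    return int(match.group(1)) * 10


if __name__ == "__main__":
    # Made-up part numbers, used only to exercise the helper.
    for marking in ["XYZ4400-6", "XYZ4400-7", "XYZ4400-8", "XYZ4400"]:
        print(marking, "->", guess_access_time_ns(marking))
```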
Presence detect (PD)
Another feature of modern memory devices is a series of signals known as the "Presence Detect" lines (you’ll see these as "PDx" signals in 72-pin pinouts such as Tables 29-2 and 29-3). By setting the appropriate conditions of the PD signals, it is possible for a computer to immediately recognize the characteristics of the installed memory devices, and configure itself accordingly. Presence detect lines typically specify two operating characteristics of memory; size (device layout) and speed. Table 29-10 highlights many of the most commonly used signal combinations.
Understanding memory "refresh"
The electrical signals placed in each DRAM storage cell must be replenished (or refreshed) periodically every few milliseconds. Without refresh, DRAM data will be lost. In principle, refresh requires that each storage cell be read and re-written to the memory array. This is typically accomplished by reading and re-writing an entire row of the array at one time. Each row of bits is sequentially read into a sense/refresh amplifier (part of the DRAM IC) which basically recharges the appropriate storage capacitors, then re-writes each row bit to the array. In actual operation, a row of bits is automatically refreshed whenever an array row is selected. Thus, the entire memory array can be refreshed by reading each row in the array every few milliseconds.
The key to refresh is in the way DRAM is addressed. Unlike other memory ICs that supply all address signals to the IC simultaneously, a DRAM is addressed in a two-step sequence. The overall address is separated into a row (low) address and a column (high) address. Row address bits are placed on the DRAM address bus first, and the -Row Address Select (-RAS) line is pulsed logic 0 to multiplex the bits into the IC's address decoding circuitry. The low portion of the address activates an entire array row and causes each bit in the row to be sensed and refreshed. Logic 0s remain logic 0s, and logic 1s are recharged to their full value.
Column address bits are then placed on the DRAM address bus, and the -Column Address Select (-CAS) is pulsed to logic 0. The column portion of the address selects the appropriate bits within the chosen row. If a read operation is taking place, the selected bits pass through the data buffer to the data bus. During a write operation, the read/write line must be logic 0, and valid data must be available to the IC before -CAS is strobed. New data bits are then placed in their corresponding locations in the memory array.
Even if the IC is not being accessed for reading or writing, the memory must still be refreshed to ensure data integrity. Fortunately, refresh can be accomplished by interrupting the microprocessor to run a refresh routine which simply steps through every row address in sequence (column addresses need not be selected for simple refresh). This row-only (or -RAS only) refresh technique speeds the refresh process. Although refreshing DRAM every few milliseconds may seem like a constant aggravation, the computer can execute quite a few instructions before being interrupted for refresh. Refresh operations are generally handled by the chipset on your motherboard. Often, memory problems (especially "parity errors") which cannot be resolved by replacing a SIMM can be traced to a refresh fault on the motherboard.
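The two-step addressing sequence and row-only refresh can be tied together in a toy simulation. Everything here is illustrative: the class name, the notion of a "tick", and the retention limit are hypothetical stand-ins for real timing. Following the description above, the low part of the address is treated as the row and the high part as the column, and selecting a row (the -RAS step) refreshes every bit in that row.

```python
class ToyDram:
    """Toy DRAM: rows lose their data unless selected often enough."""

    def __init__(self, row_bits: int, col_bits: int, retention_ticks: int) -> None:
        self.row_bits = row_bits
        self.rows = 1 << row_bits
        self.cols = 1 << col_bits
        self.retention_ticks = retention_ticks   # hypothetical time unit
        self.cells = [[0] * self.cols for _ in range(self.rows)]
        self.age = [0] * self.rows               # ticks since each row's last refresh

    def tick(self) -> None:
        """Advance time; rows left too long without refresh lose their charge."""
        for row in range(self.rows):
            self.age[row] += 1
            if self.age[row] > self.retention_ticks:
                self.cells[row] = [0] * self.cols

    def split(self, address: int):
        """Separate an address into (row, column): row = low bits, column = high bits."""
        if not 0 <= address < self.rows * self.cols:
            raise ValueError("address out of range")
        return address & (self.rows - 1), address >> self.row_bits

    def strobe_ras(self, row: int) -> None:
        """Selecting a row senses and recharges every bit in it."""
        self.age[row] = 0

    def write(self, address: int, bit: int) -> None:
        row, col = self.split(address)
        self.strobe_ras(row)            # step 1: row address, -RAS
        self.cells[row][col] = bit & 1  # step 2: column address, -CAS

    def read(self, address: int) -> int:
        row, col = self.split(address)
        self.strobe_ras(row)
        return self.cells[row][col]

    def refresh_all(self) -> None:
        """-RAS only refresh: step through every row, no column needed."""
        for row in range(self.rows):
            self.strobe_ras(row)


if __name__ == "__main__":
    kept = ToyDram(row_bits=2, col_bits=2, retention_ticks=3)
    kept.write(5, 1)
    for _ in range(2):                  # refresh arrives in time, twice over
        for _ in range(3):
            kept.tick()
        kept.refresh_all()
    print(kept.read(5))

    lost = ToyDram(row_bits=2, col_bits=2, retention_ticks=3)
    lost.write(5, 1)
    for _ in range(4):                  # no refresh: the bit decays
        lost.tick()
    print(lost.read(5))
```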
Memory types
In order for a computer to work, the CPU must take program instructions and exchange data directly with memory. As a consequence, memory must keep pace with the CPU (or make the CPU wait for it to catch up). Now that processors are so incredibly fast (and getting faster), traditional memory architectures are being replaced by specialized memory devices that have been tailored to serve specific functions in the PC. As you upgrade and repair various systems, you will undoubtedly encounter some of the memory designations explained below:
DRAM (Dynamic Random Access Memory) - This remains the most recognized and common form of computer memory. DRAM achieves a good mix of speed and density, while being relatively simple and inexpensive to produce - only a single transistor and capacitor is needed to hold a bit. Unfortunately, DRAM contents must be refreshed every few milliseconds, or the contents of each bit location will decay. DRAM performance is also limited because of relatively long access times. Today, many video boards are using DRAM SIMMs to supply video memory.
SRAM (Static Random Access Memory) - The SRAM is also a "classical" memory design - it is even older than DRAM. SRAM does not require regular refresh operations, and can be made to operate at access speeds that are much faster than DRAM. However, SRAM uses six transistors or more to hold a single bit. This reduces the density of SRAM, and increases power demands (which is why SRAM was never adopted for general PC use in the first place). Still, the high speed of SRAM has earned it a place as the PC’s L2 (or external) cache. You’ll probably encounter three types of SRAM cache schemes; Asynchronous, Synchronous Burst, and Pipeline Burst.
VRAM (Video Random Access Memory) - DRAM has been the traditional choice for video memory, but the ever-increasing demand for fast video information (i.e. high-resolution SVGA displays) requires a more efficient means of transferring data to and from video memory. Originally developed by Samsung Electronics, video RAM achieves speed improvements by using a "dual data bus" scheme. Ordinary RAM uses a single data bus - data enters or leaves the RAM through a single set of signals. Video RAM provides an "input" data bus and an "output" data bus. This allows data to be read from video RAM at the same time new information is being written to it. You should realize that the advantages of VRAM will only be realized on high-end video systems such as 1024x768x256 (or higher) where you can get up to 40% more performance than a DRAM video adapter. Below that, you will see no perceivable improvement with a VRAM video adapter.
FPM DRAM (Fast-Page Mode DRAM) - This is a popular twist on conventional DRAM. Typical DRAM access is accomplished in a fashion similar to reading from a book - a memory "page" is accessed first, and then the contents of that "page" can be located. The problem is that every access requires the DRAM to re-locate the "page". Fast-page mode operation overcomes this delay by allowing the CPU to access multiple pieces of data on the same "page" without having to "re-locate" the "page" every time - as long as the subsequent read or write cycle is on the previously located "page", the FPM DRAM can access the specific location on that "page" directly.
EDRAM (Enhanced DRAM) - This is another, lesser-known variation of the classic DRAM developed by Ramtron International and United Memories. First demonstrated in August 1994, the EDRAM eliminates an external cache by placing a small amount of static RAM (cache) into each EDRAM device itself. In essence, the cache is distributed within the system RAM, and as more memory is added to the PC, more cache is effectively added as well. The internal construction of an EDRAM allows it to act like page-mode memory - if a subsequent read requests data that is in the EDRAM’s cache (known as a hit), the data is made available in about 15ns - roughly equal to the speed of a fair external cache. If the subsequent read requests data that is not in the cache (called a miss), the data is accessed from the DRAM portion of memory in about 35ns which is still much faster than ordinary DRAM.
EDO RAM (Extended Data Out RAM) - EDO RAM is a relatively well-established variation to DRAM which extends the time during which output data is valid - thus the word’s presence on the data bus is "extended". This is accomplished by modifying the DRAM’s output buffer, which prolongs the time where read data is valid. The data will remain valid until a motherboard signal is received to release it. This eases timing constraints on the memory and allows a 15-30% improvement in memory performance with little real increase in cost. Because a new external signal is needed to operate EDO RAM, the motherboard must use a chipset designed to accommodate EDO. Intel’s Triton chipset was one of the first to support EDO, though most chipsets (and most current motherboards) now support EDO. You should realize that EDO RAM can be used in non-EDO motherboards, but there will be no performance improvement.
BEDO (Burst Extended Data Output RAM) - This powerful variation of EDO RAM reads data in a burst, which means that after a valid address has been provided, the next three data addresses can be read in only one clock cycle each. The CPU can read BEDO data in a 5-1-1-1 pattern (5 clock cycles for the first address, then one clock cycle for each of the next three addresses). While BEDO offers an advantage over EDO, it is only supported currently by the VIA chipsets; 580VP, 590VP, 680VP. Also, BEDO seems to have difficulty supporting motherboards over 66MHz.
SDRAM (Synchronous or Synchronized DRAM) - Typical memory can only transfer data during certain portions of a clock cycle. The SDRAM modifies memory operation so that outputs can be valid at any point in the clock cycle. By itself, this is not really significant, but SDRAM also provides a "pipeline burst" mode which allows a second access to begin before the current access is complete. This "continuous" memory access offers effective access speeds as fast as 10ns, and can transfer data at up to 100MB/s. SDRAM is becoming quite popular on current motherboard designs, and is supported by the Intel VX chipset, and VIA 580VP, 590VP, and 680VP chipsets. Like BEDO, SDRAM can transfer data in a 5-1-1-1 pattern, but it can support motherboard speeds up to 100MHz which is ideal for the 75MHz and 82MHz motherboards now becoming so vital for Pentium II systems.
CDRAM (Cached DRAM) - Like EDRAM, the CDRAM from Mitsubishi incorporates cache and DRAM on the same IC. This eliminates the need for an external (or L2) cache, and has the extra benefit of adding cache whenever RAM is added to the system. The difference is that CDRAM uses a "set-associative" cache approach which can be 15-20% more efficient than the EDRAM cache scheme. On the other hand, EDRAM appears to offer better overall performance.
RDRAM (Rambus DRAM) - Most of the memory alternatives so far have been variations of the same basic architecture. Rambus, Inc. (joint developers of EDRAM) has created a new memory architecture called the Rambus Channel. A CPU or specialized IC is used as the "master" device, and the RDRAMs are used as "slave" devices. Data is then sent back and forth across the Rambus channel in 256 byte blocks. With a dual 250MHz clock, the Rambus Channel can transfer data based on the timing of both clocks - this results in data transfer rates approaching 500MB/s (roughly equivalent to 2ns access time). The problem with RDRAM is that a Rambus Channel would require an extensive re-design to the current PC memory architecture - a move that most PC makers strenuously resist. As a result, you are most likely to see RDRAM in high-end, specialized computing systems. Still, as memory struggles to match the microprocessor, PC makers may yet embrace the Rambus approach for commercial systems.
WRAM (Windows RAM) - Samsung Electronics has recently introduced WRAM as a new video-specific memory device. WRAM uses multiple bit arrays connected with an extensive internal bus and high-speed registers that can transfer data almost continuously. Other specialized registers support attributes such as foreground color, background color, write-block control bits, and true-byte masking. Samsung claims data transfer rates of up to 640MB/s - about 50% faster than VRAM - yet WRAM devices are cheaper than their VRAM counterparts. It is likely that WRAM will receive some serious consideration in the next few years.
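The 5-1-1-1 burst pattern quoted above for BEDO and SDRAM can be tallied with a few lines of Python. This is only a rough sketch of the arithmetic: the function names are invented for illustration, and the comparison against a non-burst device that pays the full 5 clocks on every access is an assumption made here, not a figure from this chapter.

```python
def burst_cycles(pattern):
    """Total clock cycles needed to complete one burst."""
    return sum(pattern)


def average_cycles_per_access(pattern):
    """Average clock cycles per address read within the burst."""
    return sum(pattern) / len(pattern)


# The 5-1-1-1 pattern: 5 clocks for the first address, 1 for each of the next three.
BURST_5_1_1_1 = (5, 1, 1, 1)

# Assumed comparison case: a device that pays the full 5 clocks on every access.
NO_BURST = (5, 5, 5, 5)

burst_total = burst_cycles(BURST_5_1_1_1)
no_burst_total = burst_cycles(NO_BURST)
burst_average = average_cycles_per_access(BURST_5_1_1_1)
```

Under these assumptions, four addresses take 8 clocks in burst mode (an average of 2 clocks each) against 20 clocks without bursting.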
Memory techniques
Rather than incur the added expense of specialized memory devices, PC makers often use inexpensive, well-established memory types in unique architectures designed to make the most of low-end memory. There are three popular architectures that you will probably encounter: paged memory, interleaved memory, and memory caching.
Paged memory - this approach basically divides system RAM into small groups (or "pages") from 512 bytes to several KB long. Memory management circuitry on the motherboard allows subsequent memory accesses on the same "page" to be accomplished with zero wait states. If the subsequent access takes place outside of the current "page", one or more wait states may be added while the new "page" is found. This is identical in principle to fast-page mode DRAM explained above. You will find page mode architectures implemented on high-end i286, PS/2 (model 70 and 80), and many i386 systems.
Interleaved memory - this is a technique which provides better performance than paged memory. Simply put, interleaved memory combines two banks of memory into one. The first portion is "even", while the second portion is "odd" - so memory contents are alternated between these two areas. This allows a memory access in the second portion to begin before the memory access in the first portion has finished. In effect, interleaving can double memory performance. The problem with interleaving is that you must provide twice the amount of memory as matched pairs. Most PCs that employ interleaving will allow you to add memory one bank at a time, but interleaving will be disabled, and system performance will suffer.
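The even/odd alternation described above can be pictured with a short Python sketch. It assumes the simplest possible scheme, in which consecutive addresses alternate between the two portions; real memory controllers interleave on wider units, and the function names here are hypothetical.

```python
def interleaved_bank(address):
    """Return 0 for the 'even' portion and 1 for the 'odd' portion."""
    return address % 2


def bank_sequence(start, count):
    """List the portion that serves each of `count` consecutive addresses."""
    return [interleaved_bank(a) for a in range(start, start + count)]


def alternates(sequence):
    """True when no two consecutive accesses land in the same portion."""
    return all(a != b for a, b in zip(sequence, sequence[1:]))


sequential = bank_sequence(0, 6)
```

Because sequential accesses never hit the same portion twice in a row, one portion can start its access while the other is still finishing, which is the overlap that interleaving relies on.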
Memory caching - this is perhaps the most recognized form of memory enhancement architecture (Fig. 29-5). Cache is a small amount (anywhere from 8KB to 1MB) of very fast SRAM which forms an interface between the CPU and ordinary DRAM. The SRAM typically operates on the order of 5ns to 15ns which is fast enough to keep pace with a CPU using zero wait states. A cache controller IC on the motherboard keeps track of frequently-accessed memory locations (as well as predicted memory locations), and copies those contents into cache. When a CPU reads from memory, it checks the cache first. If the needed contents are present in cache (called a cache hit), the data is read at zero wait states. If the needed contents are not present in the cache (known as a cache miss), the data must be read directly from DRAM at a penalty of one or more wait states. A small quantity of very fast cache (called Tag RAM) acts as an index, recording the various locations of data stored in cache. A well-designed caching system can achieve a hit ratio of 95% or more - in other words, memory can run without wait states 95% of the time.
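To see what a given hit ratio means in practice, the average number of wait states per read can be estimated as shown below. This is a simplified sketch: it assumes every hit costs zero wait states and every miss costs the same fixed penalty, and the 2-wait-state penalty in the example is an assumed value, not one taken from this chapter.

```python
def average_wait_states(hit_ratio, miss_penalty_wait_states):
    """Expected wait states per read for a cache with the given hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0.0 and 1.0")
    # Hits are read at zero wait states, so only misses contribute.
    return (1.0 - hit_ratio) * miss_penalty_wait_states


# Assumed example: 95% hit ratio, 2 wait states on each miss.
with_cache = average_wait_states(0.95, 2)

# With no cache at all, every read pays the full penalty.
without_cache = average_wait_states(0.0, 2)
```

Under these assumptions the cached system averages roughly a tenth of a wait state per read, compared with 2 wait states when every read goes to DRAM.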
There are two levels of cache in the contemporary PC. CPUs from the i486 onward have a small internal cache - known as L1 cache - while external cache (SRAM installed as DIPs or COAST modules on the motherboard) is referred to as L2 cache. The i386 CPUs have no internal cache (though IBM’s 386SLC offers 8KB of L1 cache). Most i486 CPUs provide an 8KB internal cache. Early Pentium processors are fitted with two 8KB internal caches - one for data and one for instructions. Today’s Pentium II Slot 1 CPU incorporates 256KB-512KB of L2 cache into the processor cartridge itself.
Shadow memory - ROM devices (whether the BIOS ROM on your motherboard, or a ROM IC on an expansion board) are frustratingly slow with access times often exceeding several hundred nanoseconds. ROM access then requires a large number of wait states which slow down the system's performance. This problem is compounded because the routines stored in BIOS (especially the video BIOS ROM on the video board) are some of the most frequently accessed memory in your computer.
Beginning with the i386-class computers, some designs employed a memory technique called shadowing. ROM contents are loaded into an area of fast RAM during system initialization, then the computer maps the fast RAM into memory locations used by the ROM devices. Whenever ROM routines must be accessed during run-time, information is taken from the "shadowed ROM" instead of the actual ROM IC. The ROM performance can be improved by at least 300%.
Shadow memory is also useful for ROM devices that do not use the full available data bus width. For example, a 16-bit computer system may hold an expansion board containing an 8-bit ROM IC. The system would have to access the ROM not once but twice to extract a single 16-bit word. If the computer is a 32-bit machine, that 8-bit ROM would have to be addressed four times to make a complete 32-bit word. You may imagine the hideous system delays that can be encountered. Loading the ROM to shadow memory in advance virtually eliminates such delays. Shadowing can usually be turned on or off through the system’s CMOS Setup routines.
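The access counts in the example above follow from dividing the bus width by the ROM width. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
def rom_accesses_per_word(bus_width_bits, rom_width_bits):
    """Number of ROM reads needed to assemble one full-width word."""
    if bus_width_bits <= 0 or rom_width_bits <= 0:
        raise ValueError("widths must be positive")
    # Ceiling division: a partial final read still costs a full access.
    return -(-bus_width_bits // rom_width_bits)


reads_on_16_bit_system = rom_accesses_per_word(16, 8)
reads_on_32_bit_system = rom_accesses_per_word(32, 8)
```

Once the ROM contents are shadowed into full-width RAM, each word takes a single access regardless of how narrow the original ROM IC is.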
The issue of parity
As you might imagine, it is vital that data and program instructions remain error-free. Even one incorrect bit due to electrical noise or a component failure can crash the PC, corrupt drive information, cause video problems, or result in a myriad of other faults. PC designers approached the issue of memory integrity by employing a technique known as parity (the same technique used to check serial data integrity).
The parity principle
The basic idea behind parity is simple - each byte written to memory is checked, and a 9th bit is added to the byte as a checking (or "parity") bit. When a memory address is later read by the CPU, memory checking circuitry on the motherboard will calculate the expected parity bit, and compare it to the bit actually read from memory. In this fashion, the PC can continuously diagnose system memory by checking the integrity of its data. If the read parity bit matches the expected parity bit, the data (and indirectly the RAM) is assumed to be valid, and the CPU can go on its way. If the read and expected parity bits do not match, the system registers an error and halts. Every byte is given a parity bit, so for a 32-bit PC, there will be 4 parity bits for every address. For a 64-bit PC, there are 8 parity bits, and so on.
Even vs. odd
There are two types of parity - even and odd. With even parity, the parity bit is set to 0 if there are an even number of 1s already in the corresponding byte (keeping the number of 1s even). If there is not an even number of 1s in the byte, the even parity bit will be 1 (making the number of 1s even).
With odd parity, the parity bit is set to 0 if there is an odd number of 1s already in the corresponding byte (keeping the number of 1s odd). If there is not an odd number of 1s in the byte, the odd parity bit will be 1 (making the number of 1s odd).
Although even and odd parity work opposite of one another, both schemes serve exactly the same purpose, and have the same probability of catching a bad bit. The memory device itself does not care at all about what type of parity is being used - it just needs to have the parity bits available. The use of parity (and the choice of even or odd) is left up to the motherboard’s memory control circuit.
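The even and odd rules above reduce to counting the 1s in a byte. The following Python sketch illustrates both schemes, along with the one-parity-bit-per-byte rule from the previous section. The function names are invented for illustration; on a real motherboard this work is done in hardware by the memory control circuit.

```python
def parity_bit(byte, scheme="even"):
    """Return the 9th (parity) bit for one data byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0-255")
    if scheme not in ("even", "odd"):
        raise ValueError("scheme must be 'even' or 'odd'")
    ones = bin(byte).count("1")
    if scheme == "even":
        # 0 if the count of 1s is already even, otherwise 1 to make it even.
        return ones % 2
    # 0 if the count of 1s is already odd, otherwise 1 to make it odd.
    return 1 - (ones % 2)


def parity_bits_per_address(bus_width_bits):
    """One parity bit per byte of bus width."""
    return bus_width_bits // 8
```

For example, the byte 00000011 has two 1s, so its even parity bit is 0 and its odd parity bit is 1. The two schemes always produce opposite bits for the same byte, which is why neither catches more errors than the other.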
The problems with parity
While parity has proven to be a simple and cost-effective means of continuously checking memory, there are two significant limitations. First, though parity can detect an error, it cannot correct the error because there is no way to tell which bit has gone bad - this is why a system simply halts when a parity error is detected. Second, parity is unable to detect multi-bit errors. For example, if a 1 accidentally becomes a 0 and a 0 accidentally becomes a 1 within the same byte, parity conditions will still be satisfied. Fortunately, the probability of a multi-bit error in the same byte is extremely remote.
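Both limitations can be demonstrated with a small Python sketch. It assumes even parity and uses an arbitrary test byte; the names are hypothetical.

```python
def even_parity_bit(byte):
    """Even parity bit for one data byte."""
    return bin(byte & 0xFF).count("1") % 2


def parity_check_passes(byte_read, stored_parity_bit):
    """Mimic the motherboard check: recompute parity and compare."""
    return even_parity_bit(byte_read) == stored_parity_bit


original = 0b10110010                  # arbitrary test byte
stored = even_parity_bit(original)     # parity bit saved when the byte was written

# One bit flips: the count of 1s changes by one, so the check fails.
single_flip = original ^ 0b00000001
single_detected = not parity_check_passes(single_flip, stored)

# A 0 becomes 1 and a 1 becomes 0 in the same byte: the count of 1s is unchanged.
double_flip = original ^ 0b00000011
double_detected = not parity_check_passes(double_flip, stored)
```

Here `single_detected` is true and `double_detected` is false. Note also that the failed check only reports that something in the byte is wrong; nothing in the result identifies which bit flipped, so no correction is possible.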
Circumventing parity
Over the last few years, parity has come under fire from PC makers and memory manufacturers alike. Opponents claim that the rate of parity errors due to hardware (RAM) faults is very small, and that the expense of providing parity bits in a memory-hungry marketplace just isn’t justified anymore. There is some truth to this argument considering that the parity technique is over 15 years old, and has serious limitations.
As a consequence, a few motherboard makers have begun removing parity support from their low-end motherboards, and others are providing motherboards that will function with or without parity (usually set in CMOS or with a motherboard jumper). Similarly, some memory makers are now providing non-parity and "fake" parity memory as cheaper alternatives to conventional parity memory. Non-parity memory simply forgoes the 9th bit. For example, a non-parity SIMM would be designated x8 or x32 (e.g. 4Mx8 or 4Mx32). If the SIMM supports parity, it will be designated x9 or x36 (e.g. 4Mx9 or 4Mx36). Fake parity is a bit more devious - the 9th bit is replaced by a simple (and dirt cheap) parity generator chip which "looks" like a normal DRAM IC. When a read cycle occurs, the parity chip on the SIMM provides the proper parity bit to the motherboard all the time. In effect, your memory is "lying" to the motherboard.
While there is a cost savings, your memory is left with no means of error checking at all. It’s a little like driving a car without a speedometer - you could go for miles without a problem, but sooner or later you’ll cross a speed trap. In actual practice, you can go indefinitely without parity, but when an error does occur, having parity in place can save you immeasurable frustration. Unless the "lowest cost" is your absolute highest priority, it is recommended that you spend the extra few dollars for parity RAM.
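The difference between real and fake parity can be modeled in a few lines of Python. This is a toy simulation under stated assumptions: even parity, a single corrupted bit, and class and function names that are entirely hypothetical.

```python
def even_parity_bit(byte):
    """Even parity bit for one data byte."""
    return bin(byte & 0xFF).count("1") % 2


class RealParitySimm:
    """Stores the parity bit alongside the data when a byte is written."""

    def __init__(self):
        self.data = {}
        self.parity = {}

    def write(self, address, byte):
        self.data[address] = byte & 0xFF
        self.parity[address] = even_parity_bit(byte)

    def read(self, address):
        return self.data[address], self.parity[address]


class FakeParitySimm:
    """Stores data only; a generator recomputes parity on every read."""

    def __init__(self):
        self.data = {}

    def write(self, address, byte):
        self.data[address] = byte & 0xFF

    def read(self, address):
        byte = self.data[address]
        # The generated bit always matches whatever data is present.
        return byte, even_parity_bit(byte)


def motherboard_parity_ok(simm, address):
    """Recompute parity from the byte read and compare with the bit supplied."""
    byte, parity = simm.read(address)
    return even_parity_bit(byte) == parity


def corrupt_one_bit(simm, address):
    """Simulate a memory fault by flipping the lowest data bit."""
    simm.data[address] ^= 0x01
```

Writing the same byte to both modules and then calling `corrupt_one_bit()` on each shows the problem: `motherboard_parity_ok()` reports a failure for the real parity module, but still reports success for the fake one.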
NOTE: Most motherboards can be operated with non-parity RAM. It is also usually possible to mix parity and non-parity memory in the same system. But in either case, you will need to disable ALL parity checking features for the RAM.
Abuse and detection of fake memory
Another potential problem with "fake" parity memory is fraud. There have already been reported instances where memory was purchased as "parity" at full price - only to find that the parity ICs were actually parity generators. This was determined by dissecting the IC packages and finding that the IC die in the parity position did not match the IC dies in the other bit positions. The buyer doesn’t know because parity generators are packaged to look just like DRAM ICs, and there is no other obvious way to tell just by looking at the SIMM or other memory device. System diagnostic software also cannot detect the presence of parity memory vs. fake memory.
There are really only two ways to protect yourself from fake memory fraud. First, industry experts indicate that many fake parity ICs (the parity generators) are marked with designations such as "BP", "VT", "GSM", or "MPEC". If you find that 1 out of every 9 ICs on your SIMM carries such a designation (or any other designation not matching the first 8), you may have a fraud situation. Of course, the first step in all justice is a "benefit of the doubt", so contact the organization you purchased the memory from - they may simply have sent the wrong SIMMs.
Second, you can check the IC dies themselves. Unfortunately, this requires you to carefully dissect several IC packages on the SIMM and compare the IC dies under a microscope - resulting in the destruction of the memory device(s). If the 9th die looks radically different (usually much simpler) than the other 8, you’ve likely got fake parity. A non-destructive way to check the SIMM is to use a SIMM checker (if you have access to one) with a testing routine specially written to test parity memory. If the SIMM works but the parity IC test fails (i.e. the tester cannot write to the parity memory), chances are you’ve got fake parity.
If you determine that you have been sold fake parity memory in place of real parity memory, and you cannot get any satisfaction from the seller, you are encouraged to contact the Attorney General in the seller’s state, and convey your information to them. After all, if you’re being stiffed, chances are a lot of other people are too - and they probably don’t even know it.
Alternative error correction
Although this book supports the use of parity, it is also quick to recognize its old age. In the world of personal computing, parity is an ancient technique. Frankly, it could easily be replaced by more sophisticated techniques such as Error Correction Code (ECC) or ECC-on-SIMM (EOS). ECC (which is already being employed in high-end PCs and file servers) uses a mathematical process in conjunction with the motherboard’s memory controller, and appends a number of ECC bits to the data bits. When data is read back from memory, the ECC memory controller checks the ECC data read back as well.
ECC has two important advantages over parity. It can actually correct single-bit errors "on-the-fly" without the user ever knowing there’s been a problem. In addition, ECC can successfully detect 2-bit, 3-bit, and 4-bit errors, which makes it an incredibly powerful error detection tool. If a rare multi-bit error is detected, ECC is unable to correct it, but it will be reported and the system will halt.
It takes 7 or 8 bits at each address to successfully implement ECC. For a 32-bit system, you will need to use x39 or x40 SIMMs (i.e. 8Mx39 or 8Mx40). These are relatively new designations, so you should at least recognize them as ECC SIMMs if you encounter them. As an alternative, some 64-bit systems use two 36-bit SIMMs for a total of 72 bits - 64 bits for data and 8 bits (which would otherwise be for parity) for ECC information.
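The 7-bit and 8-bit figures can be reproduced with a short calculation, assuming the ECC scheme is a Hamming-style code extended with one overall parity bit (single-error correction, double-error detection). That assumption is made here for illustration only; this chapter does not say which code a particular memory controller uses, and the function name is hypothetical.

```python
def secded_check_bits(data_bits):
    """Check bits for a Hamming code plus one overall parity bit (assumed scheme)."""
    if data_bits <= 0:
        raise ValueError("data_bits must be positive")
    r = 0
    # Hamming condition: r check bits must cover data_bits + r positions plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One extra bit extends the code to detect double-bit errors.
    return r + 1


bits_for_32 = secded_check_bits(32)    # 32 data + 7 check = 39 bits
bits_for_64 = secded_check_bits(64)    # 64 data + 8 check = 72 bits
```

Under this assumption, a 32-bit word needs 7 check bits (matching the x39 designation) and a 64-bit word needs 8 (matching the 72-bit arrangement built from two 36-bit SIMMs).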
EOS is a relatively new (and rather expensive) technology which places ECC functions on the memory module itself, but provides ECC results as parity - so while the memory module runs ECC, the motherboard continues to see parity. This is an interesting experiment, but it is unlikely that EOS will gain significant market share. Systems that use parity can be fitted with parity memory much more cheaply than EOS memory.
Memory installation and options
Installing memory is not nearly as easy as it used to be. Certainly, today’s memory modules just plug right in, but deciding which memory to buy, how much (or how little) to buy, and how to use existing memory in new systems, presents technicians with a bewildering variety of choices. This part of the chapter illustrates the important ideas behind choosing and using memory.
Getting the Right Amount
"How much memory do I need?" This is an age-old question which has plagued the PC industry ever since the 80286 CPU broke the 1MB memory barrier. With more memory, additional programs and data can be run by the CPU at any given time - which indirectly helps to improve the productivity of the particular PC. The problem is cost. Typical DRAM is running around $15/MB (US) - compared with about $0.50/MB (US) for hard drive space. The goal of good system configuration is to install enough memory to support the PC’s routine tasks. Installing too much memory means that you’ve spent money for PC resources that just remain idle. Installing too little memory results in programs that will not run (typical under DOS), or poor system performance because of extensive swap file use (typical under Windows).
So how much memory is enough? The fact of the matter is that "enough" is an ever-changing figure. DOS systems of the early 1980s (8088/8086) worked just fine with 1MB. By the mid-1980s (80286), DOS systems with 2MB were adequate. Into the late 1980s (80386), Windows 3.0 and 3.1 needed 4MB. As the 1990s got underway (80486), Windows systems with 8MB were common (even DOS applications were using 4-6MB). Today, with Pentium systems and Windows 95, 16MB is considered to be a minimum requirement, and 32MB systems are readily available. For today, this is the benchmark that you should use for general-purpose home and office systems. But by the end of the decade, 48-64MB systems will probably be the norm. And this is not to say that 32MB systems are the pinnacle of performance. Today’s file servers and industrial-strength design packages are employing 64MB to 128MB of RAM - motherboard chipsets can often support up to 512MB of RAM or more.
Filling banks
Another point of confusion is the idea of a "memory bank". Most memory devices are installed in sets (or banks). The amount of memory in the bank can vary depending on how much you wish to add, but there must always be enough data bits in the bank to fill each bit position. Table 29-11 illustrates a relationship between data bits and banks for the range of typical CPUs. For example, the 8086 is a 16-bit microprocessor (2 bytes). This means that 2 extra bits are required for parity giving a total of 18 bits. Thus, one bank is 18 bits wide. You may fill the bank by adding eighteen 1-bit DIPs, or two 30-pin SIMMs. As another example, an 80486DX is a 32-bit CPU, so 36 bits are needed to fill a bank (32 bits plus 4 parity bits). If you use 30-pin SIMMs, you will need four to fill a bank. If you use 72-pin SIMMs, only one is needed. Note that the size of the memory in MB does not really matter - so long as the entire bank is filled.
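The bank-filling examples above can be worked out with a small Python sketch. It relies on the standard module widths (a 30-pin SIMM supplies 8 data bits plus 1 parity bit, and a 72-pin SIMM supplies 32 data bits plus 4 parity bits); the names are invented for illustration, and the sketch is not a substitute for Table 29-11 or the motherboard manual.

```python
# Data bits supplied by one module (parity bits excluded).
SIMM_DATA_BITS = {"30-pin": 8, "72-pin": 32}


def bank_width_with_parity(data_bus_bits):
    """Total bank width: data bits plus one parity bit per byte."""
    return data_bus_bits + data_bus_bits // 8


def simms_per_bank(data_bus_bits, simm_type):
    """Number of identical SIMMs needed to fill one bank."""
    width = SIMM_DATA_BITS[simm_type]
    if data_bus_bits % width != 0:
        # Combinations where the module is wider than the bus are not modeled here.
        raise ValueError("bus width is not a whole multiple of the SIMM width")
    return data_bus_bits // width
```

For the 16-bit 8086 this gives an 18-bit bank filled by two 30-pin SIMMs. For the 32-bit 80486DX it gives a 36-bit bank filled by four 30-pin SIMMs or one 72-pin SIMM.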
Bank requirements
There is more to filling a memory bank than just installing the right number of bits. Memory amount, memory matching, and bank order are three additional considerations. First, you must use the proper memory amount that will bring you to the expected volume of total memory. Suppose a Pentium system has 8MB already installed in Bank 0, and you need to put another 8MB into the system in Bank 1. Table 29-11 shows that two 72-pin SIMMs are needed to fill a bank, but each SIMM need only be 1M. Remember from the discussion of megabytes that a 1Mx36-bit (w/ parity) device is 4MB. Since 2 such SIMMs are needed to fill a bank, the total would be 8MB. When added to the 8MB already in the system, the total would be 16MB.
How about another example? Suppose the same 8MB is already installed in your Pentium system, and you want to add 16MB to Bank 1 rather than 8MB (bringing the total system memory to 24MB). In that case, you could use two 2M 72-pin SIMMs where 2Mx36 is 8MB (w/ parity) per SIMM. Two 8MB SIMMs yield 16MB, bringing the system total to (16MB+8MB) or 24MB.
Now for a curve. Suppose you want to outfit that Pentium as a network server with 128MB of RAM. Remember that there’s already 8MB in Bank 0, which means there’s only Bank 1 available. Since the largest commercially available SIMMs are 8Mx36 (32MB w/ parity), you can only add up to 64MB to Bank 1 (for a system total of 72MB). To get around this, you should remove the existing 1Mx36 SIMMs in Bank 0, and fill both Bank 0 and Bank 1 with 8Mx36 SIMMs which would put 64MB in Bank 0, and 64MB in Bank 1 - yielding 128MB in total. You can review many of the recommended SIMM/DIMM combinations for a typical Pentium motherboard in Table 29-12.
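The capacity arithmetic in the three examples above can be checked with a short Python sketch. The function names and the lookup table are illustrative only; the table simply records that parity bits do not count toward usable capacity.

```python
# Usable data bits for each SIMM width designation (parity bits carry no data).
DATA_BITS_BY_WIDTH = {8: 8, 9: 8, 32: 32, 36: 32}


def simm_capacity_mb(depth_m, width_bits):
    """Capacity in MB of a SIMM designated <depth_m>Mx<width_bits>."""
    data_bits = DATA_BITS_BY_WIDTH[width_bits]
    return depth_m * data_bits // 8


def bank_capacity_mb(simm_count, depth_m, width_bits):
    """Capacity in MB of a bank built from identical SIMMs."""
    return simm_count * simm_capacity_mb(depth_m, width_bits)


existing = bank_capacity_mb(2, 1, 36)                  # Bank 0: two 1Mx36 SIMMs
first_example = existing + bank_capacity_mb(2, 1, 36)  # add two 1Mx36 to Bank 1
second_example = existing + bank_capacity_mb(2, 2, 36) # add two 2Mx36 to Bank 1
largest_add_on = existing + bank_capacity_mb(2, 8, 36) # add two 8Mx36 to Bank 1
server_total = 2 * bank_capacity_mb(2, 8, 36)          # both banks with 8Mx36
```

These work out to 16MB, 24MB, 72MB, and 128MB respectively, matching the totals given in the text.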
Another bank requirement demands memory matching - using SIMMs of the same size and speed within a bank. For example, when adding multiple SIMMs to a bank, each SIMM must be rated for the same access speed, and share the same memory configuration (i.e. 2Mx36).
Finally, you must follow the bank order. For example, fill Bank 0 first, then Bank 1, then Bank 2, and so on. Otherwise, memory will not be contiguous within the PC, and CMOS will not recognize the additional RAM.
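The matching and bank order rules lend themselves to a simple checklist. The Python sketch below is one hypothetical way to express them; the data layout (a list of banks, each holding configuration and speed pairs) is an assumption chosen for illustration, and the checks cover only the two rules described above.

```python
def check_banks(banks):
    """Return a list of rule violations for a proposed memory layout.

    `banks` is a list indexed by bank number. Each entry is a list of
    (configuration, speed_ns) tuples, one per SIMM; an empty list is an empty bank.
    """
    problems = []
    seen_empty = False
    for number, simms in enumerate(banks):
        if not simms:
            seen_empty = True
            continue
        if seen_empty:
            # Bank order rule: lower-numbered banks must be filled first.
            problems.append(f"Bank {number} is filled but an earlier bank is empty")
        if len(set(simms)) > 1:
            # Matching rule: same configuration and speed within a bank.
            problems.append(f"Bank {number} mixes SIMM configurations or speeds")
    return problems


good = [[("1Mx36", 60), ("1Mx36", 60)], [("2Mx36", 60), ("2Mx36", 60)]]
out_of_order = [[], [("1Mx36", 60), ("1Mx36", 60)]]
mismatched = [[("1Mx36", 60), ("1Mx36", 70)]]
```

The `good` layout produces no complaints, while `out_of_order` and `mismatched` each produce one.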
Recycling older memory devices
Given the relatively high cost of PC memory, it is only natural that users and technicians alike would choose to re-use memory as much as possible when systems are upgraded or replaced. It is a simple matter to re-use memory - just as you would re-use hard drives or video boards. But there are some special issues to consider before you make plans to transfer memory from one system to another.
Memory speed
The goal of memory is to keep pace with the microprocessor using a minimum of wait states. It is possible to place a 100ns SIMM in a Pentium system, but the wait states required to allow this awful mis-match would negate any benefits from the advanced microprocessor. As a consequence, it is most effective to use memory that is fast enough to handle the CPU in the system that will be receiving the memory. Table 29-10 shows typical memory speeds for various microprocessors. It is possible to use memory if the speed is faster than the minimum requirement, but all the memory in the bank should be the same speed. Ordinarily, there is no reason to buy memory that is faster than necessary - no additional benefit is realized by the system. The only time it might be advisable to invest in faster memory is if you know in advance that the memory will eventually be transferred to another system.
Memory type
You should also be sure to use the same type of memory (i.e. EDO, FPM, SDRAM, and so on). For example, if your motherboard is designed to use EDO RAM, and you have EDO RAM already installed, you should be sure to install more EDO RAM. Some motherboard designs allow you to mix memory types, but mixing memory types on other (especially older) motherboards may cause the system to malfunction.
SIMM stackers
Although your memory type may not fit directly into the new computer, there are ways to make it fit. One of the most popular memory adapters is the "SIMM Stacker". These devices are known by a variety of trade names, but all allow you to convert four 30-pin SIMMs into a 72-pin SIMM frame. However, there are two drawbacks with SIMM Stackers:
Mixing "composite" and "non-composite" SIMMs
Most ordinary 30-pin SIMMs use 9 ICs (8 for data and 1 for parity). From time to time, you may encounter SIMMs with just a few ICs (usually three). The composite SIMM (with 9 ICs) is older - using less-dense memory. The non-composite SIMM (with 3 or so ICs) generally uses newer memory devices. In theory, it should be possible to mix composite and non-composite SIMMs together in the same bank or in the same system. However, there have been system problems reported when this happens. As a rule, you can try mixing these two generations of memory, but if you encounter memory problems with the system later on, remove either memory type and see if the problem goes away.
Remounting and rebuilding memory
Memory "recycling" has taken another more unexpected turn - some small companies are actually taking older memory devices and re-mounting them on SIMMs and other memory structures. In this way, you can use DIPs that are re-mounted on a SIMM. For example, a company called Autotime (www.autotime.com) in Portland, OR will remove memory devices from one SIMM and install them on a SIMM that you need (i.e. remove the ICs from four 1MB 30-pin SIMMs and install and test them on one 4MB 72-pin SIMM).
Memory troubleshooting
Unfortunately, even the best memory devices fail from time to time. An accidental static discharge during installation, incorrect installation, a poor system configuration, operating system problems, and even outright failures due to old age or poor manufacture can cause memory problems. This part of the chapter looks at some of the troubles that plague memory devices, and offers advice on how to deal with them.
Companion CD: There is a selection of tools on the Companion CD which can aid you in testing and troubleshooting PC memory. Check out CACHECHK.ZIP for cache testing, MEMSCAN.ZIP and RAMMAP.ZIP for general testing, and SHADTEST.ZIP for shadow RAM performance testing.
Memory test equipment
If you are working in a repair-shop environment, or plan to be testing a substantial number of memory devices, you should consider acquiring some specialized test equipment. A memory tester, such as the SIMCHECK from Innoventions, Inc. (Fig. 29-6), is a modular microprocessor-based system that can perform a thorough, comprehensive test of various SIMMs and indicate the specific IC that has failed (if any). The system can be configured to work with specific SIMMs by installing an appropriate adaptor module like the one shown in Fig. 29-7. Intelligent testers work automatically, and show the progress and results of their examinations on a multi-line LCD - guesswork is totally eliminated from memory testing.
Single ICs such as DIPs and SIPs can be tested using a single chip plug-in module. The static RAM checker illustrated in Fig. 29-8 is another test bed for checking high-performance static RAM components in a DIP package. Both Innoventions test devices work together to provide a full-featured test system. Specialized tools can be an added expense - but no more so than an oscilloscope or other piece of useful test equipment. The return on your investment is less time wasted in the repair, and fewer parts to replace.
Repairing SIMM sockets
If there is one weak link in the architecture of a SIMM, it is the socket which connects it to the motherboard. Ideally, the SIMM should sit comfortably in the SIMM socket, then gently snap back - held in place by two clips on either side of the socket. In actual practice, you really have to push that SIMM to get it into place. Taking it out again is just as tricky. As a result, it is not uncommon for a SIMM socket to break and render your extra memory unusable.
The best ("textbook") solution is to remove the SIMM socket and install a new one. Clearly there are some problems with this tactic. First, removing the old socket will require you to remove the motherboard, desolder the broken socket, then solder in a new socket (which you can buy from a full-feature electronics store such as DigiKey). In the hands of a skilled technician with the right tools, this is not so hard. But the printed circuit runs of a computer motherboard are extremely delicate, and the slightest amount of excess heat can easily destroy the sensitive, multi-layer connections.
Fortunately, there are some tricks that might help you. If either of the SIMM clips has bent or broken, you can usually make use of a medium-weight rubber band that is about 1" shorter than the SIMM. Wrap the rubber band around the SIMM and socket, and the rubber band should do a fair job holding the SIMM in place. If any part of the socket should crack or break, it can be repaired (or at least reinforced) with a good-quality epoxy. If you choose to use epoxy, be sure to work in a ventilated area, and allow plenty of time for the epoxy to dry.
Contact corrosion
Here’s one to tuck away for future reference. Corrosion can occur on SIMM contacts if the SIMM contact metal is not the same as the socket - this will eventually cause contact (and memory) problems. As a rule, check that the metal on the socket contact is the same as the SIMM contacts (usually tin or gold). You may be able to get around the problem in the short term by cleaning corrosion off the contacts manually using a cotton swab and good electronics-grade contact cleaner. In the meantime, if you discover that your memory and connectors have dissimilar metals, you may be able to get the memory seller to exchange the SIMMs.
Parity errors
Parity errors constitute many of the memory faults that you will see as a technician. As you saw earlier in this chapter, parity is an important part of a computer’s self-checking capability. Errors in memory will cause the system to halt - rather than continue blindly along with a potentially catastrophic error. But it is not just faulty memory that causes parity errors. Parity can also be influenced by your system’s configuration. Here are the major causes of parity problems:
When you are faced with a parity error after an upgrade, you should suspect a problem with wait states, so check that first. If the wait states are correct, systematically remove each SIMM, clean the contacts, and re-seat each SIMM. If the errors continue, try removing one bank of SIMMs at a time (chances are that the memory is bad). You may have to relocate memory so that Bank 0 remains filled. When the error disappears, the memory you removed is likely defective.
When parity errors occur spontaneously (with no apparent cause), you should clean and re-install each SIMM first to eliminate the possibility of bad contacts. Next, check the power supply outputs - low or noisy outputs may allow random bit errors. You may have to upgrade the supply if it is overloaded. Try booting the system "clean" from a write-protected floppy disk to eliminate the possibility of buggy software or computer viruses. If the problem persists, suspect a memory defect.
Troubleshooting classic XT memory
It seems only fitting to start an examination of memory problems with a brief overview of the original IBM PC/XT computer. In the "good old days" of personal computing when there were only one or two commercial computers in the market, there were few memory arrangements. POST could be written very specifically, and errors could be correlated directly to memory IC location. The POST routine in an XT's BIOS ROM is designed to identify the exact bank and bit where a memory error is detected, and display that information on the computer's monitor.
IBM PC/XT computers classify a memory (RAM) failure as error code 201. In actual operation, a RAM error would appear as "XXYY 201", where "XX" is the bank, and "YY" is the bit where the fault is detected. As a result, it was often a simple matter to locate and replace a defective RAM IC. An XT is built with four RAM banks - each with 9 bits (parity plus 8 bits). Table 29-13 shows some bank and bit error codes for XT-class computers. As an example, suppose an XT system displayed 0002 201. This would indicate a memory failure in bank 0 at data bit D1. You need only replace the DIP memory IC residing at that location.
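The "XXYY 201" format can be split apart with a short Python sketch. It assumes that YY is a hexadecimal mask with one bit set per failing data line, which agrees with the example above (02 is D1); extending that pattern to the other data bits, and reading a YY of 00 as the parity bit, are assumptions made for illustration. The bank code is returned as-is, since its meaning should be looked up in Table 29-13. The function name is hypothetical.

```python
def decode_xt_201(message):
    """Split an XT 'XXYY 201' RAM error into (bank_code, failing_bits)."""
    fields = message.split()
    if len(fields) != 2 or fields[1] != "201" or len(fields[0]) != 4:
        raise ValueError("expected a message of the form 'XXYY 201'")
    bank_code = fields[0][:2].upper()      # look this up in Table 29-13
    bit_mask = int(fields[0][2:], 16)
    if bit_mask == 0:
        # Assumption: an all-zero bit field points at the parity bit.
        return bank_code, ["parity"]
    # Assumption: each set bit in the mask names one failing data line.
    failing_bits = [f"D{n}" for n in range(8) if bit_mask & (1 << n)]
    return bank_code, failing_bits
```

With the example from the text, `decode_xt_201("0002 201")` yields bank code "00" and data bit D1.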
Symptom 29-1. You see a 1055 201 or 2055 201 error message. Both of these error codes indicate a problem with the system's DIP switch settings. Remember that XTs do not use CMOS RAM to contain a system setup configuration, so DIP switches are used to tell the system how much memory should be present. If memory is added or removed, the appropriate switches in switch bank 2 (bits 1 to 8) and switch bank 1 (bits 3 and 4) must be set properly. Turn off the computer, check your switch settings and reboot the computer.
Symptom 29-2. You see a "PARITY CHECK 1" error message. This error typically suggests a power supply problem - RAM ICs are not receiving the proper voltage levels, so their contents are being lost or corrupted. When this happens, parity errors will be produced. Remove all power from the computer and repair or replace the power supply.
Symptom 29-3. You see a "XXYY 201" error message. This is a general RAM failure format for XT computers indicating the bank and bit where the fault is located. XX is the faulty bank, and YY is the faulty bit. See Table 29-13 to decipher the specific bank and bit in an XT. For example, an error code of 0004 201 indicates a memory fault in bank 0 (00) and bit D2 (04). Replace the defective IC, or bank of ICs.
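As an illustration only, the short Python sketch below splits an "XXYY 201" message into its two fields. It assumes that YY is a hexadecimal mask in which bit n corresponds to data bit Dn (this agrees with the 0002 and 0004 examples in the text), and that a YY of 00 refers to the parity bit. Both points are assumptions that should be checked against Table 29-13. The function name decode_xt_201 is hypothetical, and the bank field is returned undecoded because the bank mapping comes from the table.

```python
def decode_xt_201(code: str) -> dict:
    """Split an XT 'XXYY 201' message into its bank field and data bits."""
    fields = code.split()
    if len(fields) != 2 or fields[1] != "201" or len(fields[0]) != 4:
        raise ValueError(f"not an XXYY 201 code: {code!r}")

    bank_field = fields[0][:2]          # "XX": look this up in Table 29-13
    mask = int(fields[0][2:], 16)       # "YY": treated as a hex bit mask

    if mask == 0:
        bits = ["parity"]               # assumption: 00 means the parity bit
    else:
        # assumption: bit n of the mask flags data bit Dn
        bits = [f"D{n}" for n in range(8) if mask & (1 << n)]

    return {"bank_field": bank_field, "bits": bits}


if __name__ == "__main__":
    for message in ("0002 201", "0004 201"):
        print(message, "->", decode_xt_201(message))
```

Run against the two examples from the text, the sketch reports bank field 00 with bit D1 and bank field 00 with bit D2, respectively.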
Symptom 29-4. You see a "PARITY ERROR 1" error message. Multiple addresses or multiple data bits are detected as faulty in the XT. In some cases, one or more ICs may be loose or inserted incorrectly in their sockets. Remove power from the system and reseat all RAM ICs. If all RAM ICs are inserted correctly, rotate a new DRAM IC through each occupied IC location until the defective IC is located.
Troubleshooting classic AT memory
Just as the XT led its own generation, IBM's PC/AT was the leader of the 80286 generation. Since there was only one model (at the time), ATs use some specific error messages to pinpoint memory (RAM or ROM) problems on the motherboard, as well as in its standard memory expansion devices. The 200 series error codes represent system memory errors (Table 29-14). ATs present memory failures in the format: "AAXXXX YYYY 20x". The ten digit code can be broken down to indicate the specific system bank and IC number, although the particular bit failure is not indicated. The first two digits ("AA") represent the defective bank, while the last four digits ("YYYY") show the defective IC number. It is then a matter of finding and replacing the faulty DIP IC. Table 29-15 shows a set of error codes for early AT-class computers. For example, suppose an IBM PC/AT displayed the error message: "05xxxx 0001 201" (we don't care about the x's). That message would place the error in IC 0 of bank 1 in the AT's system memory.
Troubleshooting contemporary memory errors
Since the introduction of 286-class computers, the competition among motherboard manufacturers, as well as the rapid advances in memory technology, has resulted in a tremendous amount of diversity in the design and layout of memory systems. Although the basic concepts of memory operation remain unchanged, every one of the hundreds of computer models manufactured today uses a slightly different memory arrangement. Today's PCs also hold much more RAM than XT and early AT systems.
As a consequence of this trend, specific numerical (bank and bit) error codes have long since been rendered impractical in newer systems where megabytes can be stored in just a few ICs. The i386, i486, and today’s Pentium/Pentium II computers use a series of generic error codes. The address of a fault is always presented, but there is no attempt made to correlate the fault's address to a physical IC. Fortunately, today's memory systems are so small and modular that trial-and-error isolation can often be performed rapidly. Let's look at some typical errors.
Symptom 29-5. You see the number "164" displayed on the monitor. This is a generic memory size error - the amount of memory found during the POST does not match the amount of memory listed in the system’s CMOS setup. Run the CMOS Setup routine, and make sure that the listed memory amount matches the actual memory amount. If memory has been added or removed from the system, you will have to adjust the figure in the CMOS Setup to reflect that configuration change. If CMOS Setup parameters do not remain in the system after power is removed, try replacing the battery or CMOS/RTC IC.
NOTE: The latest CMOS Setup routines do not list the amount of RAM - it is detected automatically. However, you may simply have to enter the CMOS Setup, then immediately save changes and exit to "recalibrate" the amount of detected RAM.
Symptom 29-6. You see an "Incorrect Memory Size" error message. This message can be displayed if the CMOS system setup is incorrect, or if there is an actual memory failure that is not caught with a numerical 200-series code. Check your CMOS Setup as described in Symptom 29-5 and correct the Setup if necessary. If the error persists, there is probably a failure in some portion of RAM.
Without a numerical code, it can be difficult to find the exact problem location, so adopt a divide-and-conquer strategy. Remove all expansion memory from the system, alter the CMOS setup to reflect base memory (system board) only, and retest the system. If the problem disappears, the fault is in some portion of expansion memory. If the problem still persists, you know the trouble is likely in your base (system board) memory. Take a known-good SIMM or DIMM and systematically swap devices until you locate the defective device. If you have access to a memory tester, the process will be much faster.
If you successfully isolate the problem to a memory expansion board (often found in older proprietary PCs), you can adopt the same strategy for the board(s). Return one board at a time to the system (and update the CMOS setup to keep track of available memory). When the error message reappears, you will have found the defective board. Use a known-good RAM IC, SIMM, or DIMM, and begin a systematic swapping process until you have found the defective memory device.
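The swapping procedure described above can be sketched in a few lines of Python. This is only a model of the bench procedure, not a diagnostic tool: the passes_post callback stands in for "power up and see whether the error returns," and the module names and the function find_faulty_slot are hypothetical.

```python
from typing import Callable, List, Optional


def find_faulty_slot(installed: List[str], known_good: str,
                     passes_post: Callable[[List[str]], bool]) -> Optional[int]:
    """Rotate a known-good module through each occupied socket.

    Returns the index of the socket whose original module appears faulty,
    or None if no single swap clears the error.
    """
    if passes_post(installed):
        return None                      # no error to isolate
    for slot in range(len(installed)):
        trial = list(installed)
        trial[slot] = known_good         # displace one module at a time
        if passes_post(trial):
            return slot                  # the displaced module is suspect
    return None                          # suspect more than one module, or the motherboard


if __name__ == "__main__":
    bad_modules = {"simm_c"}             # simulated fault for the demonstration

    def simulated_post(modules: List[str]) -> bool:
        return not any(m in bad_modules for m in modules)

    sockets = ["simm_a", "simm_b", "simm_c", "simm_d"]
    print(find_faulty_slot(sockets, "known_good", simulated_post))
```

A result of None after a failed test is the cue to stop swapping modules and look elsewhere, since a single bad module would have been cleared by one of the swaps.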
Symptom 29-7. You see a "ROM Error" message displayed on the monitor. To guarantee the integrity of system ROM, a checksum test is performed as part of the POST. If this error occurs, one or more ROM locations may be faulty. Your only option here is to replace the system BIOS ROM(s) and retest the system.
Symptom 29-8. New memory is installed, but the system refuses to recognize it. New memory installation has always presented some unique problems since different generations of PC deal with new memory differently. The oldest PCs require you to set jumpers or DIP switches in order to recognize new blocks of memory. The vintage i286 and i386 systems (e.g. a PS/2) use a setup diskette to tell CMOS about the PC’s configuration (including new memory). More recent i386 and i486 systems incorporate an "installed memory" setting into a CMOS Setup utility in BIOS. Late-model i486 and Pentium systems actually "auto-detect" installed memory each time the system is booted (so it need not be entered in the CMOS setup).
Also check that the correct bank has been filled properly. The PC may not recognize any additional memory unless an entire bank has been filled, and the bank is next in order (i.e. Bank 0, then Bank 1, and so on). You may wish to check the PC’s user manual for any unique rules or limitations in the particular motherboard.
NOTE: Some late-model Pentium/Pentium II motherboards do NOT need banks filled in order, though that’s usually the safest policy to follow when upgrading or troubleshooting any PC.
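To make the "full bank, filled in order" rule concrete, here is a minimal Python sketch. It models each bank as a list of sockets holding either a module size in MB or None for an empty socket. The function name recognized_memory and the two-socket bank layout in the example are hypothetical, and the sketch deliberately ignores any motherboard-specific rules covered in the user manual.

```python
from typing import List, Optional, Tuple


def recognized_memory(banks: List[List[Optional[int]]]) -> Tuple[int, int]:
    """Apply the 'entire bank filled, banks in order' rule.

    Returns (number of banks recognized, total MB recognized).
    """
    bank_count = 0
    total_mb = 0
    for bank in banks:
        # A bank counts only if every one of its sockets holds a module.
        is_full = bool(bank) and all(size is not None for size in bank)
        if not is_full:
            break                        # later banks are out of order, so stop
        bank_count += 1
        total_mb += sum(bank)
    return bank_count, total_mb


if __name__ == "__main__":
    # Bank 0 is full, Bank 1 is half filled, Bank 2 is full but out of order.
    layout = [[4, 4], [4, None], [8, 8]]
    print(recognized_memory(layout))
```

In this example only Bank 0 is counted: the half-filled Bank 1 stops the scan, so the 16MB sitting in Bank 2 goes unrecognized under the rule.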
Symptom 29-9. New memory has been installed or replaced, and the system refuses to boot. When faced with complete boot failures, always start by checking AC power, the system power switch, and power connections to the motherboard. Also see that all expansion boards are inserted evenly and completely in their expansion slots (flexing the motherboard during memory installation may have pried one or more boards out of their slots). Your memory modules may not be inserted correctly. Take the modules out and seat them again - making sure the locking arm is holding the module securely in place. If the problem continues, you probably do not have the right memory module for that particular computer. Make sure that the memory module (SIMM or DIMM) is the correct part and is compatible with your PC. Finally, check for any particular "device order" that may be required by the motherboard. Certain systems require that memory be installed in pairs or in descending order by size.
Symptom 29-10. You see an "XXXX Optional ROM Bad, Checksum = YYYY" error message. Part of the POST sequence checks for the presence of any other ROMs in the system. When another ROM is located, a checksum test is performed to check its integrity. This error message indicates that the external ROM (such as a SCSI adapter or Video BIOS) has checked bad, or its address conflicts with another device in the system. In either case, system initialization cannot continue.
If you have just installed a new peripheral device when this error occurs (e.g. a SCSI controller board), try changing the device’s ROM address jumpers to resolve the conflict. If the problem remains, remove the peripheral board - the fault should disappear. Try the board on another PC. If the problem continues on another PC, the adapter (or its ROM) may be defective. If this error has occurred spontaneously, remove one peripheral board at a time and retest the system until you isolate the faulty board, then replace the faulty board (or just replace its ROM if possible).
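For readers curious about what the checksum test involves, the Python sketch below follows the conventional PC option ROM layout: the image starts with the signature bytes 55h and AAh, the third byte gives the ROM length in 512-byte blocks, and all bytes over that length must sum to zero modulo 256. The exact test performed by any particular BIOS is not described in this chapter, so treat this as an illustration of the convention rather than a specific vendor's routine. The function names and the demonstration image are hypothetical, and the XXXX and YYYY fields of the error message are not modeled.

```python
def option_rom_checksum_ok(image: bytes) -> bool:
    """Check an option ROM image against the conventional layout."""
    if len(image) < 3 or image[0] != 0x55 or image[1] != 0xAA:
        return False                     # signature bytes missing
    length = image[2] * 512              # third byte counts 512-byte blocks
    if length == 0 or length > len(image):
        return False                     # declared length does not fit the image
    return sum(image[:length]) % 256 == 0


def make_demo_rom() -> bytes:
    """Build a one-block demonstration image with a valid checksum."""
    body = bytearray(512)
    body[0], body[1], body[2] = 0x55, 0xAA, 1
    body[3:8] = b"DEMO!"                 # placeholder contents, not real code
    body[-1] = (-sum(body[:-1])) % 256   # final byte forces the sum to zero
    return bytes(body)


if __name__ == "__main__":
    good = make_demo_rom()
    corrupt = bytearray(good)
    corrupt[10] ^= 0x01                  # flip a single bit
    print(option_rom_checksum_ok(good), option_rom_checksum_ok(bytes(corrupt)))
```

The single flipped bit is enough to make the second check fail, which is why a marginal ROM or an address conflict that corrupts even one byte is reported as a bad checksum.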
Symptom 29-11. You see a general RAM error with fault addresses listed. In actual practice, the error message may appear as any of the examples below depending on the specific fault, where the fault was detected, and the BIOS version reporting the error:
Each of the errors shown above is a general RAM error message indicating a problem in base or extended/expanded RAM. The code "XXXX" is the failure segment address - an offset address may be included. The word "YYYY" is what was read back from the address, and "ZZZZ" is the word that was expected. The difference between these read and expected words is what precipitated the error. In general, these errors indicate that at least one base RAM IC (if you have RAM soldered to the motherboard) or at least one SIMM/DIMM has failed. A trial-and-error approach is usually the least expensive route in finding the problem. First, re-seat each SIMM or DIMM and retest the system to be sure that each SIMM/DIMM is inserted and secured properly. Rotate a known-good SIMM/DIMM through each occupied SIMM/DIMM socket in sequence. If the error disappears when the known-good SIMM or DIMM is in a slot, the old device that had been displaced is probably the faulty one. You can go on to use specialized SIMM troubleshooting equipment to identify the defective IC, but such equipment is rather expensive unless you intend to repair a large volume of SIMMs/DIMMs to the IC level.
If the problem remains unchanged even though every SIMM has been checked, the error is probably in the motherboard RAM or RAM support circuitry. Run a thorough system diagnostic if possible, and check for failures in other areas of the motherboard that affect memory (such as the interrupt controller, cache controller, DMA controller, or memory management chips). If the problem prohibits a software diagnostic, use a POST board and try identifying any hexadecimal error code. If a support IC is identified, you can replace the defective IC, or replace the motherboard outright. If RAM continues to be the problem, try replacing the motherboard RAM (or replace the entire motherboard), and retest the system.
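The "difference" between the word read (YYYY) and the word expected (ZZZZ) is easy to work out by hand or with a few lines of Python: an exclusive-OR of the two words leaves a 1 in every bit position that disagrees. The sketch below assumes 16-bit words written in hexadecimal, and the function name failing_bits is hypothetical. It reports only which data bits differ and makes no attempt to map them to a physical IC.

```python
from typing import List


def failing_bits(read_word: int, expected_word: int, width: int = 16) -> List[int]:
    """Return the bit positions where the read and expected words differ."""
    diff = (read_word ^ expected_word) & ((1 << width) - 1)
    return [n for n in range(width) if diff & (1 << n)]


if __name__ == "__main__":
    # Example values only: read FFEFh where FFFFh was expected.
    print(failing_bits(0xFFEF, 0xFFFF))
```

For the example values, only bit 4 disagrees. Noting which bits fail across several error reports can help you judge whether the same device keeps turning up as you swap modules.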
Symptom 29-12. You see a "Cache Memory Failure - Disabling Cache" error. The cache system has failed. The tag RAM, cache logic, or cache memory on your motherboard is defective. Your best course is to replace the cache RAM IC(s) or COAST (Cache-on-a-Stick) module. If the problem persists, try replacing the cache logic or tag RAM (or replace the entire motherboard). You will probably need a schematic diagram or a detailed block diagram of your system in order to locate the cache memory IC(s).
Symptom 29-13. You see a "Decreasing Available Memory" error message. This is basically a confirmation message that indicates a failure has been detected in extended or expanded memory, and that all memory after the failure has been disabled to allow the system to continue operating (although at a substantially reduced level). Your first step should be to re-seat each SIMM/DIMM and ensure that they are properly inserted and secured. Next, take a known-good SIMM or DIMM and step through each occupied SIMM/DIMM slot until the problem disappears - the device that had been removed is probably the faulty one. Keep in mind that you may have to alter the system's CMOS Setup parameters as you move memory around the machine (an incorrect setup can cause problems during system initialization).
Symptom 29-14. You are encountering a memory error with HIMEM.SYS under DOS. In many cases, this is a compatibility problem with system memory. For example, the Intel Advanced/AS motherboard is incompatible with two specific Texas Instruments EDO SIMMs (part numbers TM124FBK32S-60 and TM248GBK32S-60). Other EDO SIMMs from TI and other vendors will not cause this error. Try a SIMM from a different manufacturer. Also make sure that you’re using the latest version of HIMEM.SYS.
Symptom 29-15. Memory devices from various vendors refuse to work together. The system experiences a "Memory failure" during the memory count at startup. This is a very "machine-specific" problem. For example, Gateway Solo PCs can suffer this problem when customers use the same size memory modules (4MB, 8MB, or 16MB) made by different vendors. Try matching the memory modules from the same manufacturer (including part number and speed).
Symptom 29-16. Windows 95 "Protection" errors occur after adding SIMMs/DIMMs. Windows 95 stalls with "Windows Protection Errors" during boot, or randomly crashes with "Fatal Exception Errors" when opening applications. This is a known problem with the Intel Thor motherboard using the 1.00.01.CNOT BIOS after installing 32MB of RAM. This issue is usually due to certain third party SIMMs operating at speeds faster or slower than 60ns. The motherboard probably has tight memory specifications, and SIMMs that operate at the correct speed are required (not faster or slower - even though the SIMMs are "marked" properly). Some SIMM manufacturers mark the SIMMs at 60ns, but the SIMMs actually run at 45ns. Try some SIMMs from a different manufacturer. It is also possible that a BIOS upgrade may loosen timing enough to make the SIMMs usable.
Further study
That’s it for Chapter 29. Be sure to review the glossary and chapter questions on the accompanying CD. If you have access to the Internet, take a look at some of the memory resources listed below:
Autotime: http://www.autotime.com
Cameleon Technology: http://www.camusa.com/
CST, Inc: http://www.simmtester.com/
Innoventions: http://www.simcheck.com/
Jaguar Marketing Group: http://www2.inow.com/~degeorge/jaguar.htm
Kingston: http://www.kingston.com
PNY: http://www.pny.com
Simmsaver Technology, Inc.: http://www.simmsaver.com/