THE BIOS GUIDE

BIOS settings are a frequent problem asked about in several hardware related newsgroups. Did you ever experienced a system lock up or poor performance and erratic behavior due to improper BIOS settings? Have you ever been left in the dark by a cryptic 5 pages, badly written motherboard manual? The answer is probably yes.

BIOS

Basic Input Output System. All computer hardware has to work with software through an interface. The BIOS gives the computer a little built-in starter kit to run the rest of softwares from floppy disks (FDD) and hard disks (HDD). The BIOS is responsible for booting the computer by providing a basic set of instructions. It performs all the tasks that need to be done at start-up time: POST (Power-On Self Test, booting an operating system from FDD or HDD). Furthermore, it provides an interface to the underlying hardware for the operating system in the form of a library of interrupt handlers. For instance, each time a key is pressed, the CPU (Central Processing Unit) perform an interrupt to read that key. This is similar for other input/output devices (Serial and parallel ports, video cards, sound cards, hard disk controllers, etc...). Some older PC's cannot co-operate with all the modern hardware because their BIOS doesn't support that hardware. The operating system cannot call a BIOS routine to use it; this problem can be solved by replacing your BIOS with an newer one, that does support your new hardware, or by installing a device driver for the hardware.

CMOS

Complementary Metal Oxide Semiconductor. To perform its tasks, the BIOS need to know various parameters (hardware configuration). These are permanently saved in a little piece (64 bytes) of CMOS RAM (short: CMOS). The CMOS power is supplied by a little battery, so its contents will not be lost after the PC is turned off. Therefore, there is a battery and a small RAM memory on board, which never (should...) loses its information. The memory was in earlier times a part of the clock chip, now it's part of such a highly Integrated Circuit (IC). CMOS is the name of a technology which needs very low power so the computer's battery is not too much in use.

Actually, there is not a battery on new boards, but an accumulator (Ni_Cad in most cases). It is recharged every time the computer is turned on. If your CMOS is powered by external batteries, be sure that they are in good operating condition. Also, be sure that they do not leak. That may damage the motherboard. Otherwise, your CMOS may suddenly "forget" its configuration and you may be looking for a problem elsewhere. In the monolithic PC and PC/XT, this information is supplied by setting the DIP (Dual-In-line Package) switches at the motherboard or peripheral cards. Some new motherboards have a technology named the Dallas Nov-Ram. It eliminates having an on-board battery: There is a 10 year lithium cell epoxyed into the chip.

Chipset

A PC consists of different functional parts installed on its motherboard: ISA (Industry Standard Architecture), EISA (Enhanced Industry Standard Architecture) VESA (Video Enhanced Standards Association) and PCI (Peripheral Component Interface) slots, memory, cache memory, keyboard plug etc... Not all of these are present on every motherboard. The chipset enables a set of instructions so the CPU can work (communicate) with other parts of the motherboard. Nowadays most of the discrete chips; PIC (Programmable Interrupt Controller), DMA (Direct Memory Access), MMU (Memory Management Unit), cache, etc... are packed together on one, two or three chips; the chipset. Since chipsets of a different brand are not the same, for every chipset there is a BIOS version. Now we have fewer and fewer chipsets which do the job. Some chipsets have more features, some less. OPTi is such a commonly used chipset. In some well integrated motherboards, the only components present are the CPU, the two BIOS chips (BIOS and Keyboard BIOS), one chipset IC, cache memory (DRAMs, Dynamic Random Access Memory), memory (SIMMs, Single Inline Memory Module, most of the time) and a clock chip.

Setup

Setup is the set of procedures enabling the configure a computer according to its hardware caracteristics. It allows you to change the parameters with which the BIOS configures your chipset. The original IBM PC was configured by means of DIP switches buried on the motherboard. Setting PC and XT DIP switches properly was something of an arcane art. DIP switches/jumpers are still used for memory configuration and clock speed selection. When the PC-AT was introduced, it included a battery powered CMOS memory which contained configuration information. CMOS was originally set by a program on the Diagnostic Disk, however later clones incorporated routines in the BIOS which allowed the CMOS to be (re)configured if certain magic keystrokes were used.

Unfortunately as the chipsets controlling modern CPUs have become more complex, the variety of parameters specifiable in SETUP has grown. Moreover, there has been little standardization of terminology between the half dozen BIOS vendors, three dozen chipset makers and large number of motherboard vendors. Complaints about poor motherboard documentation of SETUP parameters are very common.

To exacerbate matters, some parameters are defined by BIOS vendors, others bychipset designers, others by motherboard designers, and others by various combinations of the above. Parameters intended for use in Design and Development, are intermixed with parameters intended to be adjusted by technicians -- who are frequently just as baffled by this stuff as everyone else is. No one person or organization seems to understand all the parameters available for any given SETUP.

Hardware Performance

Although computers may have basic similarities (they all look the same on a shelf), performance will differ markedly between them, just the same as it does with cars. The PC contains several processes running at the same time, often at different speeds, so a fair amount of coordination is required to ensure that they don't work against each other.

Most performance problems arise from bottlenecks between components that are not necessarily the best for a particular job, but a result of compromise between price and performance. Usually, price wins out and you have to work around the problems this creates.

The trick to getting the most out of any machine is to make sure that each component is giving of its best, then eliminate potential bottlenecks between them. You can get a bottleneck simply by having an old piece of equipment that is not designed to work at modern high speed - a computer is only as fast as its slowest component, but bottlenecks can also be due to badly written software.

System Timing

The clock is responsible for the speed at which numbers are crunched and instructions executed. It results in an electrical signal that switches constantly between high and low voltage several millions times a second.

The System Clock, or CLKIN, is the frequency used by the processor; on "*?s and 386s, this will be half the speed of the main crystal on the motherboard (the CPU devides it by two), which is often called CLK2IN. 486 processors run at the same speed, because they use both edges of the timing signal. A clock generator chip (82284 or similar) is used to synchronize timing signals around the computer, and the data bus would be run at slower speed synchronously with the CPU, e.g. CLKIN/4 for an ISA bus with a 33 MHz CPU.

ATCLK is a separate clock for the bus, when it's run asynchronously, or not derived from CLK2IN. There is also a 14.138 MHz crystal which was used for all system timing on XTs. Now it's only used for the colour frequency of the video controller (6845).

Memory Access

The cycle time is the time it takes to read from and write to a memory cell, and it consists of two stages; precharge and access. Precharge is where the capacitor in the memory cell is able to recover from a previous access and stabilize. Access is where a data bit is actually moved between memory and the bus or the CPU. Total access time includes the finding of data, data flow and recharge, and parts of the access time can be eliminated or overlapped to improve performance. The combination of precharge and access equals cycle time, which is what you should use to calculate wait states from.

There are ways of making refreshes happen so that the CPU doesn't notice (i.e. Concurrent and Hidden), which is helped by the 486 being able to use its on-board cache and not needing to use memory so often anyway. In addition, you can affect the Row Access Strobe (RAS), or have Column Access Strobe (CAS) before RAS (see Advanced Chipset Setup).

The fastest DRAM commonly available is rated at 60ns. As these chips need alternate refresh cycles, under normal circumstances data will actually be obtained every 120ns, giving you and effective speed of around 8 MHz for the whole computer, regardless of the CPU speed, assuming no action is taken to compensate. Memory chips therefore need to be operating at something like 20ns to keep up, assuming that the CPU needs only one clock cycle for each one from the memory bus; one internal cycle for each external one. Intel processors mostly use two for one, so the 33 MHz CPU is actually ready to use memory every 60ns, but you need to allow a little more for overheads, such as data assembly and the like. One way of matching the capacities of components with different speeds includes the use of wait states.

Wait States

A wait state indicates how many ticks of the system clock the CPU has to wait for memory to catch up-it will generally be 0 or 1, but can be up to 3 if you're using slower memory chips. Ways of avoiding wait states include:

· Page-mode memory. This will cut-down address cycles to retrieve information form one general area, based on the fact that the second access to a memory location on the same page takes around half the time as the first; addresses are normally in two halves, with high bits (for row) and low bits (for column) being multiplexed onto one set of address pins. The page address of data is noted, and if the next data is in the same area, a second address cycle is eliminated as a whole row of memory cells can be read in one go; that is, once a row access has been made, you can get to subsequent column addresses in that row in the time available (you should therefore increase row access time for best performance). Otherwise data is retrieved normally, which will take twice as long. Fast Page Mode is a quicker version of the same thing; the DRAMs concerned have a faster CAS access speed. Memory capable of running in page mode is different from normal bit-by-bit type, and the two don't mix. It's unlikely that low capacity SIMMs are so capable.

· Interleaved memory, which divides memory into two or four portions that process data alternately; that is, the CPU sends information to one section while another goes through a refresh cycle; a typical installation will have odd addresses on one side and even on the other (you can have word or block interleave). If memory accesses are sequential, the precharge of one will overlap the access time of the other. To put interleaved memory to best use, fill every socket you've got (that is, eight 1 Mb SIMMs are better than two 4 Mb ones). The SIMM types must be the same. As an example, a machine in non-interleaved mode (say a 386SX/20) may need 60ns or faster DRAM for 0ws access, where 80ns chip could do if interleaving were enabled.

· A processor RAM cache, which is a bridge between the CPU and slower main memory; it consists of anywhere between 32-512K of (fast) Static RAM chips and is designed to retain the most frequently accessed code and data from main memory. It can make 1 wait state RAM look like that with 0 wait states, without physical adjustments, assuming that the data the CPU wants is in the cache when required (known as a cache hit). To minimize the penalty of a cache miss, cache and memory access are often in parallel, with one being terminated when not required. On a 486, how much cache you need really depends on the amount of memory; Dell say that jumping from 128K to 256K only increases the hit rate by around 5% and Viglen say you only need more than 256K if you have more than 32 Mb RAM. A cache should be fast and capable of holding the contents of several different parts of main memory. Software plays a part as well, since cache operation is based on the assumption that programs access memory where they have done so already, or are likely to next, maybe through looping (where code is reused) or code is organized to be next to other relevant parts. A basic cache design will look up an address for the CPU and return the data inside one clock cycle, or 20ns at 50 MHz. Asynchronous SRAM will be used for this. As the round trip from the CPU to cache and back again takes up a certain amount of time, only the remainder is available to retrieve data, which gets smaller as the motherboard speed is increased. Synchronous SRAM uses a buffer to keep the whole routine inside one clock cycle, even though it may use two (or more) clock cycles the first time round. The address from the CPU is stored, and while the next is coming in to the buffer, the data for the first is retrieved, and the cycle continues. Pipeline SRAM uses more clock cycles, typically three, the first time round, and Burst SRAM will deliver 4 words (blocks of data) over for consecutive cycles if the request from the CPU is for the first; there will be no waiting for the CPU to request each one individually. Note the level 2 cache can be unreliable, so be prepared to disable it in the interests of reliability. For maximum efficiency, or minimum access time, a cache may be subdivided into smaller blocks that can be separately loaded, so the chances of a different part of memory being requested and the time needed to replace a wrong section are minimized. There are three mapping schemes that assist with this:

· Fully Associative, where the whole address is kept with each block of data in the cache (in tag RAM), needed because it is assumed there is no relationships between the blocks. This can be inefficient, as an address comparison needs to be made with every entry each time the CPU presents the address for its next instruction.

· Direct Mapped, where every block can only be in one place in the cache, so only one address comparison is needed to see if the data required is there. Although simple, the cache controller must go to main memory more frequently if program code needs to jump between locations with the sane index, which defeats the object somewhat, as alternate references to the same cache cell mean cache misses for other processes. The "index" comes form the lower order addresses presented by the CPU.

· Set Associative, a compromise between the above two. Here, an index can select several entries, so in a 2 Way Set Associative cache, 2 entries can have the same index, so two comparisons are needed to see if the data required is in the cache. Also, the tag field is correspondingly wider and needs larger SRAMs to store address information. As there are two locations for each index, the cache controller has to decide which one to update or overwrite, as the case may be. The most common methods used to make these decisions are Random Replacement, First In First Out (FIFO) and Least Recently Used (LRU). The latter is the most efficient. It the cache is large enough (e.g. 64K), performance from this over direct-mapping may not be much. A Write Thru Cache means that every write access is immediately passed on to memory; although it means that cache contents are always identical to main memory, it is slow, as the CPU then has to wait for DRAMs. Buffers can be used to provide a variation on this, where data is written into a temporary buffer so the CPU is released quickly before main memory is updated. A Write Back Cache, on the other hand, exists where changed data is temporarily stored in the cache and written to memory when the system is quiet, or when absolutely necessary. This will give better performance when main memory is slower than the cache, or when several writes are made in a very short space of time, but is more expensive. A "dirty bit" is used as a mental note that the cache and main memory contents are different, and that the cache contains the most up to date data. This bit will be checked if the cache needs to be written to, and main memory updated first if this bit is set. Some motherboards don't have the required SRAM for the dirty bit, but it's still faster than Write Thru.

Shadow RAM

ROMs are used by components that need their own instructions to work properly, such as video card of cacheing disk controller. ROMs are 8-bit devices, so only one byte is accessed at a time; also, they typically run between 150-400ns, so using them will be slow relative to 32-bit memory at 60-80ns, which is capable of making four accesses at once.

Shadow RAM is the process of copying the contents of a ROM directly into extended memory which is given the same address as the ROM, from where it will run much faster. The original ROM is then disabled, and the new location write protected. If your applications execute ROM routines often enough, enabling Shadow RAM will make a difference in performance of around 8%, assuming a program spends about 10% of its time using instructions from ROM, but theoretically as high as 300%. The drawback is that the RAM set aside for shadowing cannot be used for anything else, and you will lose a corresponding amount of extended memory, The remainder of Upper Memory, however, can usually be remapped to the end of extended memory and used there.

With some VGA cards, if video shadow is disabled, you might get DMA errors, because of timing when code is fetched from the VGA BIOS, when the CPU cannot accept DMA requests. Some programs don't make use of the video ROM, preferring to directly address the card's registers, so you may want to use extended memory for something else. If you machine hangs during the startup sequence for no apparent reason, check that you haven't shadowed an area of upper memory containing a ROM that doesn't like it-particularly one on a hard disk controller, or that you haven't got two in the same 128K segment.

Bus Types

The expansion bus (where expansion cards go) is an extension of the Central Processor, so when adding cards to it, you are extending the capabilities of the CPU itself. The relevance of this regard to the BIOS is that older cards are less able to cope with modern buses running at higher speeds than the original design of 8 or so MHz. Also, when the bus is accessed, the whole computer slows down to the bus speed, so it's often worth altering the speed of the bus or the wait states between it and the CPU to speed things up. The PC actually has four buses; the processor bus connects the CPU to its support chips, the memory bus connects it to its memory, the address bus is part of both of them, and the I/O (or expansion) bus is what concerns us here.

ISA

Industry Standard Architecture. The 8-bit version cane on the original PC and the AT, but the latter uses an extension to make it 16-bit. It has a maximum data transfer rate of about 8 megabits per second on an AT, which is actually well above the capability of disk drives, or most network and video cards. The average data throughput is around a quarter of that. Its design makes it difficult to mix 8- and 16-bit RAM or ROM within the same 128K block of upper memory; an 8-bit VGA card could force all other cards in the same (C000-DFFF) range to use 8 bits as well, which was a common source of inexplicable crashes where 16-bit network card were involved.

EISA

Extended Industry Standard Architecture. An evolution of ISA and (theoretically) backward compatible with it, including the speed (8 MHz), so the increased data throughput is mainly due to the bus doubling in size-but you must use EISA expansion cards. It has its own DMA arrangements, which can use the complete address space. On advantage of EISA (and Micro Channel) is the ease of setting up expansion cards-plus them in and run the configuration software which will automatically detect their settings.

MCA

Micro Channel Architecture. A proprietary standard established by IBM to take over from ISA, and therefore incompatible with anything else. It comes in two versions, 16- and 32-bit and, in practical terms, is capable of transferring around 20 mbps.

Local Bus

The local bus is one more directly suited to the CPU; it's next door (hence local), has the same bandwidth and runs at the same speed, so the bottleneck is less (ISA was local in the early days). Data is therefore moved along the bus at processor speeds. There are two varieties:

· VL-BUS, a 32-bit bus which allows bus mastering, and uses two cycles to transfers a 32-bit word, peaking at 66 Mb/sec. It also supports burst mode, where a single address cycle precedes four data cycles, meaning that 4 32-bit words can move in only 5 cycles, as opposed to 8, giving 105 Mb/sec at 33 MHz. The speed is mainly obtained by allowing VL-Bus adapter cards first choice at intercepting CPU cycles. It's not designed to cope with more than a certain number of cards at particular speeds; e.g. 3 at 33, 2 at 40 and only 1 at 50 MHz, and even that often needs a wait state inserted. VL-Bus 2 is 64-bit, yielding 320 Mb/sec at 50 MHz. There are two types of slot; Master and Slave. Master boards (e.g. SCSI controllers) have their own CPUs which can do their own things; slaves (i.e. video cards) don't. A salve board will work on a master slot, but not vice versa.

· PCI, which is a mezzanine bus, divorced from the CPU, giving it some independence and the ability to cope with more devices, so it's more suited to cross-platform work. It is time multiplexed, meaning that address and data lines share connections. It has its own burst mode that allows 1 address cycle to be followed by as many data cycles as system overheads allow. At nearly 1 word per cycle, the potential is 264 Mb/sec. It can operate up to 33 MHz, or 66 MHz with PCI 2.1 and can transfer data at 32 bits per clock cycle so you can get up to 132 Mbytes/sec (264 with 2.1). Each PCI card can perform up to 8 functions, and you can have more than one busmastering card on the bus. It should be noted, though, that many functions are not available with PCI, such as sound. Not yet, anyway. It is part of the plug and play standard, assuming your operating system and BIOS agree, so it is auto configuring (although some cards use jumpers instead of storing information in a chip); it will also share interrupts under the same circumstances. The PCI chipset handles transactions between cards and the rest of the system, and allows other buses to be bridged to it (typically and ISA bus to allow older cards to be used). Not all of them are equal, though; certain features, such as byte merging, may be absent. The connector may vary according to the voltage the card uses (3.3 or 5v; some cards can cope with both).

Basic Optimization Tricks

This section is intended for users who have a limited knowledge of BIOS setup. It provides four fundamental procedures that may help improve the performance of your system.

· Make sure that all standard settings correspond to the installed components of your system. For instance, you should verify the date, the time, available memory, hard disks and floppy disks. For more information, go to the

Standard CMOS Setup

section.

· Make sure that your cache memory (internal and external) is enabled. Of course you must have internal (L1) and external (L2) cache memory present which is always the case for recent systems (less than five years old). For more information, go to the Advanced CMOS Setup section. Recently, some motherboards were found having fake cache memory. Some unscrupulous manufacturers are using solid plastics chips containing no memory to lure vendors and customers and then gain extra profits in an highly competitive semiconductors market. Beware!

Make sure that your Wait States values are at the minimum possible. You must however be careful because if values are too low, your system may freeze (hang up).
Make sure to shadow your Video and System ROM. On older systems, this may improve performance significantly.
Make sure to use a coherent power management strategy. Choosing the right timing may increase the life expectancy of your hard disk. See the Power Management section.

Hard disk speed is the major bottleneck for a system performance, notably for those with 16 MB of memory and more. You may have the fastest CPU, lots of memory and a confortable cache, but if you have a crummy hard disk, you may not see improvement in performances.

POST and Entering Setup

When the system is powered on, the BIOS will perform diagnostics and initialize system components, including the video system. (This is self-evident when the screen first flicks before the Video Card header is displayed). This is commonly referred as POST (Power-On Self Test). Afterwards, the computer will proceed its final boot-up stage by calling the operating system. Just before that, the user may interrupt to have access to SETUP.

To allow the user to alter the CMOS settings, the BIOS provides a little program, SETUP. Usually, setup can be entered by pressing a special key combination (DEL, ESC, CTRL-ESC, or CRTL-ALT-ESC) at boot time (Some BIOSes allow you to enter setup at any time by pressing CTRL-ALT-ESC). The AMI BIOS is mostly entered by pressing the DEL key after resetting (CTRL-ALT-DEL) or powering up the computer. You can bypass the extended CMOS settings by holding the <INS> key down during boot-up. This is really helpful, especially if you bend the CMOS settings right out of shape and the computer won't boot properly anymore. This is also a handy tip for people who play with the older AMI BIOSes with the XCMOS setup. It allows changes directly to the chip registers with very little technical explanation.

A Typical BIOS POST Sequence

Most BIOS POST sequences occur along four stages:

1. Display some basic information about the video card like its brand, video BIOS version and video memory available.

2. Display the BIOS version and copyright notice in upper middle screen. You will see a large sequence of numbers at the bottom of the screen. This sequence is the BIOS identification line.

3. Display memory count. You will also hear tick sounds if you have enabled it (see Memory Test Tick Sound section).

4. Once the POST have succeeded and the BIOS is ready to call the operating system (DOS, OS/2, NT, WIN95, etc.) you will see a basic table of the system's configurations:

· Main Processor: The type of CPU identified by the BIOS. Usually Cx386DX, Cx486DX, etc..

· Numeric Processor: Present if you have a FPU or None on the contrary. If you have a FPU and the BIOS does not recognize it, see section Numeric Processor Test in Advanced CMOS Setup.

· Floppy Drive A: The drive A type. See section Floppy drive A in Standard CMOS Setup to alter this setting.

· Floppy Drive B: Idem.

· Display Type: See section Primary display in Standard CMOS Setup.

· AMI or Award BIOS Date: The revision date of your BIOS. Useful to mention when you have compatibility problems with adaptor cards (notably fancy ones).

· Base Memory Size: The number of KB of base memory. Usually 640.

· Ext. Memory Size: The number of KB of extended memory.

In the majority of cases, the summation of base memory and extended memory does not equal the total system memory. For instance in a 4096 KB (4MB) system, you will have 640KB of base memory and 3072KB of extended memory, a total of 3712KB. The missing 384KB is reserved by the BIOS, mainly as shadow memory (see Advanced CMOS Setup).

· Hard Disk C: Type: The master HDD number. See Hard disk C: type section in Standard CMOS Setup.

· Hard Disk D: Type: The slave HDD number. See Hard disk D: type section in Standard CMOS Setup.

· Serial Port(s): The hex numbers of your COM ports. 3F8 and 2F8 for COM1 and COM2.

· Parallel Port(s): The hex number of your LTP ports. 378 for LPT1.

· Other information: Right under the table, BIOS usually displays the size of cache memory. Common sizes are 64KB, 128KB or 256KB. See External Cache Memory section in Advanced CMOS Setup.

Hosted by www.Geocities.ws