Memory

Because of the eeprom and fast ram on the Cypress 3761 and 3681 development board along with the flash memory on the Cygnal 80C52 chips, we're starting a memory feed.  Friday February 8, 2002 09:13

COMMENT
Gerald Worchel

Embedded Flash Memory in Semicustom Designs

EMBEDDED FLASH MEMORY, while being used for a number of years in conjunction with application-specific standard products (ASSPs), such as those also containing DSPs for the cellular handset market, is also finding increased usage in the design of high-complexity, customer-specific ICs destined for high-end applications. Embedded flash memory, like embedded DRAM technology, relative to customer-specific, high-complexity designs, has been available for a fair amount of time — at least five years. However, to this point in time, neither has enjoyed the success many would have thought. In large part, the slower-than-anticipated growth for both of these technologies is related to process inconsistencies between embedded memory process requirements and that of logic functions.

As the need for increasing performance, particularly in infrastructure applications, such as networking and wireless basestations, continues, it will drive higher levels of integration, including mixed signal. Worldwide revenues of high- complexity, cell-based designs containing at least one or more blocks of embedded flash and destined for a unique customer application are forecast to increase from $160 million last year to $514 million by 2005. Compound annual growth rate for 2000 to 2005 is estimated at 26.4 percent.

The use of embedded flash, for both programming (or code generation/development) and data storage, will continue to grow in the future, from the perspective of percent of design starts containing the function. However, its growth, relative to inclusion in new design starts, will be slow due to the technology’s major disadvantage: programming voltage. The voltage required to program a flash device is 12 volts, a significant drawback in many leading-edge designs. Another differentiation between the two, in addition to purpose (or function), is high- lighted by the difference in required density.

On a worldwide basis, current estimates put customer-specific design starts, which contain embedded flash, at about 4 percent of total cell-based design starts. While this percent will grow throughout the forecast period, it is only projected to reach about 10 percent of all design starts by the end of the forecast period.

Applications for designs containing embedded flash memory will span the end-use market spectrum as well as find homes in both low-end and high-end products, in both the customer- specific and application-specific worlds. At the high end, the most prevalent use of embedded flash will be found in development platforms, where, in most cases, it will be used for code generation. On the standard product side, embedded flash will commonly be used with DSPs, where it will be used for data storage. The most common applications in the ASSP world for embedded flash memory will be in cellular handsets, PDAs, handheld global positioning systems and advanced digital cameras, to name just a few.

Communications, as is the case with most high-complexity, customer-specific designs, will totally dominate embedded flash product consumption throughout the forecast period. The second-largest end-use market will be the consumer segment, where the technology will be used in the design and prototyping of next-generation systems, such as set-top boxes, games and digital video. The consumer market will be followed by the industrial and EDP markets.

While flash technology in and of itself does a market make, its embedded brethren do not. In order to have a high- complexity design containing embedded flash memory, other ingredients (functions) are necessary. In this case, Cahners In-Stat/MDR refers to them as complementary functions, which can encompass logic, memory and/or analog. (Cahners In-Stat/MDR is owned by Cahners Business Information, the parent company of Electronic News.) Dependent upon designer and design requirements, any function can be embedded in a design so long as it exists somewhere in software and/or hardware and has been thoroughly characterized.

Gerald Worchel is a senior analyst with Cahners In-Stat/MDR based in Scottsdale, Ariz. He can be reach at [email protected]

Electronic News February 4, 2002  www.electronic news.com

The zeitgeist again. Monday February 4, 2002 13:34

BREAK POINTS

Jack G Ganssle

Can Hardware Be Trusted?

Safety-Critical systems that rely on firmware are inherently problematic. Outside of some purely academic work, software can’t be “proven” correct. We can do a careful design, work out failure modes, and then perform exhaustive testing, but the odds are that bugs will still lurk in any sizeable chunk of code. DO-178B [1 2 3] and other standards exist to help ensure reliability, but guarantees are elusive at best. No one really knows how to make a perfect program when sizes reach into the hundreds of thousands of lines and beyond.

It seems that in a capitalist economy the edge between increasing complexity and needed correctness is mediated— not well of course—by litigation. I’ve no doubt that some horrible accident is coming, one that will be attributed to a flawed embedded system. Inevitably it will be followed by millions of dollars in lawyers’ fees and more to the victims. High-level corporate executives will suddenly understand the risks of building computer-based products.

Perhaps this parallels the Green movement, which has shown that the cost of many products, once you factor in environmental degradation and clean- up bills, greatly exceeds the sell-price. I suspect, sadly, that it won’t be until the looming embedded disaster arrives that management will understand the true costs of building complex systems. Today, too many view software quality as a nice feature if there’s time, but figure the company can always reflash the system when the usual bugs surface. As embedded products control ever more of our world, in ever more interconnected and complex ways—maybe in ways that exceed our understanding—quality will have to become the foundation on which all other decisions are made.

Since the dawn of the embedded world, we’ve regarded software with implicit mistrust. Controlling a dangerous mechanical device? There’s always a hardware-based safety interlock that will take over in case of a crash. The hardware will save the day.

We generally assume the hardware is perfect. Yes, wise developers create designs that fail into a safe state, since components do die. But mostly we believe in catastrophic problems, where a chip just stops functioning correctly, a mechanical shock breaks something, or a poor solder joint, stressed by thermal cycles, lets a chip lead pop free.

Hardware is deterministic. If it’s working, well, it works. Until it breaks. But what about glitches? You know, those erratic and inexplicable things the system does but we can never reproduce. A wizened old technician said to me, “If it ain’t broke, I can’t fix it.” This is the model for accepting rare non- repeatable events.

Years ago, I worked on a project that taugh me never to allow wire-wrapped printed circuit boards in prototypes. (Wire-wrapped circuits often exhibit squirrelly problems due to long lead lengths and crosstalk between wires. But a human problem was worse.) The engineers I worked with invariably attributed every strange and irregular behavior to the lousy wire-wrapped prototype. “Once we go to PCB everything will be fine. This is just a glitch.”

The same problem repeatedly surfaced in the circuit boards, necessitating yet another design respin. “It’s just a glitch” frequently means “something is terribly wrong and we don’t have a clue what’s going on.” The culprit was a timing error, one that created observable symptoms only rarely.

When I was in the emulator business, I was shocked by the number of target systems that exhibited erratic memory problems. Our product’s built- in tests pounded hard on memory. They frequently reported unrepeatable, yet undeniably serious problems, especially with big RAM arrays. The tests were designed to create worst-case situations of heavy bus switching. Tiny amounts of power supply droop, exacerbated by the massive transitions, created occasional failures. Impedance issues were even more common; the test patterns tossed out at the highly capacitive RAMs created so much ringing that the logic couldn’t always differentiate between zeroes and ones!

It makes sense for managers to institute a “no glitch” policy. Until a prototype works reliably, or until the cause of a problem is well understood, we keep improving the system design in the pursuit of a high quality product.

Particle physics

Hardware can be less than perfect, but in the examples I’ve just shared, the problems all stem from design flaws. More careful design, better reviews, and additional testing should improve such systems substantially. It’s the software that’s going to be the problem, since the hardware is but a platform that runs those instructions faithfully.

The times may be changing. According to Intel, any assumption that processors just do what they are told is now wrong. The latest CPUs will run your program, and pretty fast at that. But once in a while, you can expect a bit in the instruction stream to flip at random. It’s the price of high technology.

Scared yet? I was after I read a September 3, 2001 EE Times article about Intel’s McKinley processor. One of its neat features is an L3 cache located onboard the CPU. This about doubles some performance figures.

The cache is big and the geometry of the part quite tiny. It’s fabricated with .18-micron line widths, which is astonishingly small but no longer bleeding edge. That’s 180 nanometers, or something like a third of the wavelength of red light. (When light wavelengths look big, surely we’ve fallen through the looking glass into the realm of the fantastic.)

In the article, Intel acknowledged that this small size coupled with the large surface area of the four megabyte L3 cache creates a serious target for incoming cosmic rays. There’s very little energy difference in such small parts between a zero and a one, so occasionally the cache may be struck by an incoming particle of just the right energy to flip a bit. What happens next is hard to predict. Perhaps the processor will execute a cache-miss, causing a flush and reload. But maybe not. A cache with wrong data, even when it’s just a single bit, is a disaster waiting to happen.

How bad is the problem? Apparently they’ve looked at this statistically, and feel that odds are the average user will see less than one bit-flip per thousand years. That’s a very long time. (I can’t help but wonder if Intel can justify a 1,000-year mean time between failures (MBTF) partly because no one expects a Windows-based PC to run for more than a day or two between crashes anyway.)

Can embedded systems live with a 1,000-year MTBF? Perhaps. Unless one considers that high-volume production means that, worldwide, cache bits will flip rather often. Build a million widgets with these reliability parameters, and statistics suggest 1,000 bit-flips per year, divided among all of the deployed products. That is, maybe 1,000 crashes per year would be due just to this one infrequently encountered hardware flaw.

Suppose the airlines retrofit all commercial planes with a super-gee-whiz new kind of avionics that includes just a single McKinley CPU. I read somewhere that, on average, there are 7,000 airliners flying at any time. Will this cause seven crashes per year? Or could it cause even more, since radiation sources pack more punch at higher altitudes.

This is not a rant against Intel or the McKinley processor. The company showed courage by admitting the issue. The problem stems from the evolution of our technology. All vendors pushing into the deep submicron will have to grapple with erratic hardware as transistor sizes approach quantum limits. The PC industry generally leads embedded by a few years as they pioneer high density parts, but we do not follow far behind. We embedded developers can expect to see these sorts of problems in our systems before too long.

If l80 nm line widths are subject to cosmic rays then what does the future hold? Most roadmaps suggest that leading-edge parts will use 50 nm geometries within 10 years. To put this in perspective, a hydrogen atom is about an angstrom across. In a decade, our line widths will span a mere 500 atoms. It’s hard to conceive of building anything so small, especially when a single chip will have a billion transistors made from these tiny building blocks.

Intel predicts these near-future parts will run at 20GHz. Even CISC parts are evolving towards the RISC ideal of one clock per instruction, so that processor will execute an instruction in 50 picoseconds! Not only are the transistors miniscule, they operate at speeds that give RF designers fits.

Heat, too, conspires against chip designers. The McKinley reportedly dissipates 130 watts. A few of these CPUs would make a fine toaster, and, no doubt, that much embedded intelligence will ensure the perfect piece of toast, every time.

I can’t find a McKinley datasheet, but the previous generation Itanium shows a max current requirement (processor plus cache) of about 90 amps. This is not a misprint: 90 amps. That’s not much less than a Honda’s engine cranking current; it will drain a Die Hard in half an hour. The part comes with an elaborate thermal management subsystem that automatically degrades performance as temperature rises, to avoid self-destruction.

Heat has always been the enemy of electronics. High temperatures destroy semiconductors. Thermal stresses can cause leads to pop free of circuit boards. Connectors unseat themselves, PCBs change shape, and all sorts of mechanical issues stemming from high temperatures reduce system reliability. I can’t help but wonder if simply powering systems up and down, with the resulting heating and cooling cycles, will create failures. Once we thought semiconductors lasted forever. Will this be true in the future?

It’s really quite marvelous that a $1,000 PC running at 1GHz or so works as well as it does. Think about the Saganesque billions of bits—maybe closer to hundreds of billions—running around your machine each second. Even when the machine is “idle” (whatever that means), it’s still slinging data at rates undreamed of a decade ago. Yet if just one of those bits flips, or is misinterpreted, or goes unstable for a nanosecond or two, the machine will crash.

Our traditional assumption of reliable hardware will not be valid in the high-density, very fast, and quite hot world that’s just around the corner. Intel’s disclosure of cosmic ray vulnerabilities may actually be a wake-up call to the computer community that it’s time to rethink our designs.

Perhaps the vendors will counter with redundant hardware. In some cases, this makes a lot of sense. It’s pretty easy— though expensive—to widen memory arrays and add error correction codes that detect and correct one and two bit errors. Much more difficult is creating reliable CPU redundancies for a reasonable price. Today the cache is at risk, but in the 50-nanometer, one-billion transistor processors of the near future, much of the part’s internal logic will surely be susceptible to random bit flips as well. The space shuttle manages hardware failures using five “voting” computers, so we know it’s possible to create such redundancy. But the costs are astronomical. The shuttle is an economic anomaly funded by taxpayer money Most embedded systems must adhere to a much less generous financial model.

Nothing new

But the hardware has always been unreliable. Older engineers will remember the transition to 16KB (16,384 bits, that is) DRAMs in the ‘70s. New process techniques produced memories subject to cosmic rays. Here we are, a quarter- century later with similar problems! Most of us at the time fought with flaky DRAM arrays, blaming our designs when the chips themselves were at fault. New kinds of plastic packages were invented, until the vendors discovered that some plastics generated alpha particles, which also flipped bits. After a year or two the problems were resolved.

With the dawn of the PC age, DRAM demands skyrocketed. The parts were much less perfect than those we use today, so all PCs used 9-bit wide memory, the ninth bit being devoted to parity. Today’s reliable parts have mostly eliminated this—when was the last time you saw a parity error on a PC?

In the embedded world, we design products that might be used for decades. Yet some components won’t last that long. Many vendors only guarantee that their EPROMs, EEROMs, and flash devices will retain programmed data for ten years (some, like AMD, offer parts that remember much longer). After a decade, the program might flit away like a puff of smoke. Many vendors don’t specify a data retention time, leaving designers to speculate when the firmware will become even less than the ghost in the machine.

We’ve had some issues with hardware reliability already, but have mostly dealt with the problem by ignoring it. I suspect we’re going to see new kinds of firmware development techniques, designed to manage erratic hardware behavior. One obvious approach is a memory management unit that wraps protected areas around each task. A crash brings just a single task down, though what action the software takes at that point becomes problematic. Perhaps future reliable systems will need a failure mode analysis that includes recovery from such crashes. Firmware complexity will soar. esp

Jack G. Ganssle is a lecturer and consultant on embedded development issues. He conducts seminars on embedded systems and helps companies with their embedded challenges. Contact him at [email protected]

Embedded Systems Programing February 2002


MEMORY TECHNOLOGY

As SRAM geometries shrink, soft errors emerge as problem

BY JEANNE GRAHAM

The frequency of error in SRAM devices in intensifying as memory-IC vendors employ smaller geometries and continue moving their products into the networking equipment market.

A problem familiar to DRAM makes in the 1970s - largely resolved with packaging techniques - soft error rates (SERs) are becoming more visible and troublesome for the $4 billion SRAM industry. A soft error is a missed operation, upset due to an alpha particle or cosmic ray. A memory misfire in a PC or cell phone, applications responsible for about half last year's SRAM sales, often goes unnoticed by a user.

However, such a glitch in networking equipment, which accounted for about 24% of the 2001 SHAM market, can send information packets, such as money transfers, to the wrong address. As a result, error-correction and system reliability mechanisms need to become an automatic consideration of product architects, according to analysts and memory vendors.

Soft errors are a natural phenomenon. A burst of energy, caused by the collision of two atoms, follows a certain path until it dissipates, said Bob Merritt, an analyst at Semico Research Corp. in Redwood City, Calif. “If that trail happens to appear at the moment and space in time where the semiconductor is trying to measure what is being stored in that cell, the energy can cause the circuitry to read or write the wrong information,” Merritt said.

More errors

The transition to 0.13-micron process technology is exacerbating SHAM SERs. “Soft errors are creeping in as the lithography gets smaller,” Merritt said. "As the space gets smaller, it’s easier to disrupt the process and to corrupt the data.”

Other SHAM trends contributing to increased SERs include lower voltages, which reduce capacitance and increase a memory cell’s susceptibility to alpha particles and cosmic rays; faster clock speeds, which give particles more opportunity to disrupt a read or write command; and higher densities, for which designers may fail to include adequate error correction or bit parity

Although SRAM soft errors are not an everyday event, they are another variable systems designers should consider in new products along with target applications, price, voltage, clock speed, and die area, according to industry observers.

Soft errors are random and don’t damage or destroy the chip, said Russ Lange, chief technology officer and a fellow at IBM Microelectronics, East Fishkill, N.Y. IBM provides customers with a tutorial and SER model to educate designers and assist them in deciding which tradeoffs (such as the inclusion of error-correction code) best fit their intended application.

“We have to work in detail with each customer to make sure they’re aware of the phenomenon because many are not,” Lange said “It’s surprising how many have not thought through the issue.”

DRAMs based on trench technology are largely immune to soft errors, but that means a tradeoff in access speed, he said. In some applications, using SHAM and building in error-correction code might be a better option. “It means a little bit more cost for a little bit more memory,” Lange said.

The soft error measurement is failure- in-time, or FIT A typical soft error rate is 1,000 FITs and means that one device will fail every 144 years, said Mark Eric-Jones, vice president and general manager of intellectual property at MoSys Inc., San Jose.

“Unfortunately, in 0.13-micron technology we’re seeing some memory technology with error rates of 10,000 or 100,000 FITs per megabit,” Eric-Jones said. “This brings the frequency of error in a single device down to weeks or months,” Eric-Jones said.

MoSys solution

MoSys has developed an error-correction technology that alleviates soft errors in its single-transistor SHAM without requiring more silicon area, he said. With the technology, dubbed transparent error correction, an error rate of 1,000 FITs per Mbit can be maintained in 0.13-micron linewidths, Eric-Jones said.

Another alternative, offered by Fujitsu Ltd. and Toshiba Corp., is fast- cycle RAM, which offers low latency and low failure rates, according to the two companies.

Cypress Semiconductor Corp. has attacked SRAM soft errors by incorporating improvements at the fabrication process, package, and die levels, according to Sabbas Daniel, senior director of quality in San Jose.

The company performs accelerated SER measurements prior to releasing products by placing radioactive sources in close proximity to the die, which accelerates the failure rates that would be observed under normal conditions, Daniel said. Cypress shares that data, on a device-by-device basis, with its customers. The company’s goal for each product is a FIT of 200, according to Daniel.

“Depending on the application and the criticality of the application, the customer may choose to employ an error-correction code, if they feel the rate is too high,” he said.

FITs can also change depending on the altitude, IBM’s Lange noted. For example, IBM has discovered that SRAM tested at 10,000 feet above sea level will record SERS that are 14 times the rate tested at sea level, due to higher exposure to cosmic rays.

Although alpha particles can be controlled with improved packaging, such as a plastic coating over the die, cosmic rays can never be completely blocked out, according to Lange. “It’s more or less a fact of life,” he said. ()

EBN January 18, 2002 www.ebnonline.com

Hosted by www.Geocities.ws

1