Facts and rumors about failing Nvidia chips have been spewing from all sides for months now. What's AMD's take on the issue, and why aren't we seeing similar failures from its products? We recently had a chat with Neil McLellan, AMD's director of packaging and interconnect technologies, who offered his insight and opinions about these matters.
To understand where AMD is coming from, one must go back a few years to the former ATI. Prompted by problems with packaging and interconnect materials in consoles as well as the European Union's Restriction of Hazardous Substances (RoHS) directive, ATI hired McLellan and went about rethinking its chip packaging strategy. In 2005, the RoHS directive required GPU packages to start connecting to their host boards with lead-free solder balls. ATI also took that opportunity to replace the high-lead solder bumps with so-called eutectic bumps. As you'll see in the diagram below, those solder bumps connect the silicon GPU die to the rest of the package:
Why the change? High-lead bumps are 90% lead and 10% tin, while eutectic bumps are 37% lead and 63% tin. High-lead bumps can handle more current, but AMD thinks they're more prone to fatigue and need "comprehensive reliability engineering to be used successfully." To illustrate the fatigue issue, McLellan invoked a soda can: the tab will probably stay on if you bend it up and down slightly a hundred times, but it'll likely pop off if you bend it all the way two or three times. Similarly, high-lead bumps can fail because of repetitive heating and cooling. That's because the silicon GPU die and package substrate (see the diagram above) have different thermal expansion coefficients (2 parts per million/°C for the silicon and 30 ppm/°C for the substrate, McLellan said), so the two expand by different amounts with every temperature swing, which puts significant stress on the bumps.
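To get a rough feel for what that expansion mismatch means, here is a back-of-the-envelope sketch in Python. Only the two coefficients come from McLellan. The 10 mm distance from the die center and the 60°C temperature swing are numbers we made up for illustration, and the function name is ours. It computes how far the die and substrate would move apart if each expanded freely. It is not a stress or fatigue model.

```python
# Thermal expansion coefficients quoted by McLellan, in ppm per degree C.
ALPHA_SILICON_PPM = 2.0
ALPHA_SUBSTRATE_PPM = 30.0


def expansion_mismatch_um(distance_mm, delta_t_c,
                          alpha_die_ppm=ALPHA_SILICON_PPM,
                          alpha_substrate_ppm=ALPHA_SUBSTRATE_PPM):
    """Difference in free thermal expansion between substrate and die.

    distance_mm: distance from the die center to the bump (assumed value).
    delta_t_c:   temperature swing in degrees C (assumed value).
    Returns the mismatch in micrometers, using delta_L = alpha * L * delta_T
    for each material and subtracting.
    """
    delta_alpha = (alpha_substrate_ppm - alpha_die_ppm) * 1e-6  # per degree C
    distance_um = distance_mm * 1000.0
    return delta_alpha * distance_um * delta_t_c


if __name__ == "__main__":
    # Hypothetical case: a bump 10 mm from the die center, 60 C swing.
    mismatch = expansion_mismatch_um(distance_mm=10.0, delta_t_c=60.0)
    print(f"Free-expansion mismatch: {mismatch:.1f} micrometers")
```

With those assumed inputs, the mismatch works out to roughly 17 micrometers per heating cycle. The bumps have to absorb some share of that movement each time the chip warms up and cools down, which is the bending in the soda-can analogy.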
Eutectic bumps are easier to work with in AMD's view, but they have their downsides, too. They have lower tolerance for high current densities than their high-lead counterparts, so pushing too much current through them can cause them to fail through electromigration. Because different parts of a chip can have different power requirements, McLellan said a given chip might deliver an average of 200mA per bump, with some bumps getting 50mA and others receiving 600mA. To avoid stressing outliers excessively, AMD's engineers apply a redistribution layer (essentially a thick metal layer) between the bumps and the die in order to even out power delivery.
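The arithmetic behind those figures is simple enough to sketch. The three current values below are the ones McLellan quoted. Everything else is our own toy model: the `blend_toward_mean` function and its `evenness` parameter are hypothetical and only illustrate the idea of evening out delivery. They do not describe how AMD's redistribution layer actually behaves.

```python
# Per-bump currents quoted by McLellan, in milliamps.
MEAN_MA = 200.0
LOW_MA = 50.0
HIGH_MA = 600.0


def ratio_to_mean(current_ma, mean_ma=MEAN_MA):
    """How many times the mean current a given bump carries."""
    return current_ma / mean_ma


def blend_toward_mean(current_ma, evenness, mean_ma=MEAN_MA):
    """Toy model of evening out delivery (an assumption, not AMD's design).

    evenness = 0.0 leaves the bump's current unchanged;
    evenness = 1.0 puts the bump exactly at the mean.
    """
    if not 0.0 <= evenness <= 1.0:
        raise ValueError("evenness must be between 0.0 and 1.0")
    return current_ma + evenness * (mean_ma - current_ma)


if __name__ == "__main__":
    print(f"Hottest bump carries {ratio_to_mean(HIGH_MA):.2f}x the mean")
    print(f"Lightest bump carries {ratio_to_mean(LOW_MA):.2f}x the mean")
    # Hypothetical: a layer that closes half the gap to the mean.
    for current in (LOW_MA, HIGH_MA):
        evened = blend_toward_mean(current, evenness=0.5)
        print(f"{current:.0f} mA -> {evened:.0f} mA")
```

In the quoted example, the busiest bump carries three times the mean current. Even a layer that only closes half the gap would bring that bump from 600mA down to 400mA in this simplified picture.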
Keeping control over chip packaging is easier said than done, though. McLellan noted that both AMD and Nvidia rely on a number of third-party firms (like SPIL and ASE) to do the dirty work of packaging chips, and different firms can use different processes and materials. He went on to suggest that those companies didn't mind following AMD's guidelines for material usage and package design, but they declined to take the fall if any problems occurred. In essence, AMD could be on its own if it runs into packaging problems—although McLellan said that hasn't happened with the new packaging design so far.
On the upside, AMD says using eutectic bumps makes chips cheaper to produce, and they also increase yields. AMD states plainly in a related presentation, "There is no financial reason not to make the move to a more reliable package."
What about Nvidia? McLellan was a little vague in his criticism of AMD's rival, talking down the company for not paying closer attention to packaging and (allegedly) not caring a whole lot. However, he believes Nvidia's mobile graphics parts are failing because they use high-lead bumps and are running into the soda-can problem. This problem has shown up in notebooks because those systems get turned on and off a lot, but McLellan said plainly that folks who power-cycle Nvidia-powered desktops regularly should start seeing the same issues eventually.
To complicate things further, AMD says the RoHS directive will start requiring chipmakers to remove lead from both solder balls and solder bumps in 2010—and some of AMD's customers are requesting the change sooner than that. McLellan said that switch will introduce "an entirely new problem, which turns out to be quite challenging," although he didn't get into specifics. He did, however, mention that AMD has been working on the issue for the past 18 months, has some "great ideas" and has "done a lot of work." In his view, Nvidia has likely been spending the same time trying to fix problems in current package designs.
Of course, this is all a little one-sided. We've been trying to get Nvidia to comment on AMD's little spiel for well over a week now. While the company seemed willing, we still haven't received a statement. Stay tuned.