Chip failures: Nvidia responds at last

After almost two weeks of wrangling with its legal department, Nvidia has finally answered some of our questions about its GPU and chipset failure problems. First, the company sent us a statement addressing the allegations of AMD’s Packaging and Interconnect Director, Neil McLellan. As we wrote earlier this week, McLellan claims AMD has a superior chip package design because it uses a specific set of materials, including eutectic solder bumps. In his view, Nvidia chips have failed because they use (supposedly) more fatigue-prone high-lead bumps and because, he asserted, Nvidia cares less about packaging technologies.

Here’s Nvidia’s written response:

NVIDIA has a world class chip operations team, and has delivered over 1 Billion devices (and over 1 trillion bumps) over 14 years, in the most advanced processes, to the most demanding customers. NVIDIA is the leader in the graphics industry in innovation and has delivered technology over the years that companies like ATI and Intel have benefited from.
In his recent commentary on chip packaging, Mr. McLellan makes a number of speculative assertions about NVIDIA’s people, products and philosophy. In his interview McLellan asserts that High lead bumps are more prone to fatigue. What he fails to note is that AMD currently uses High lead bumps on their CPU line — a device well known to undergo high thermal stress, and also go through lots of power cycling.

The choice between High Lead and Eutectic is complex. There are trade-offs in using one vs. the other, as even Mr. McLellan points out. The electromigration issues associated with printed eutectic bumps can affect long term reliability of a high current device. Electromigration is when a high current causes metal to separate over time, and creates an open circuit. This is one of the main reasons why so many devices are still manufactured with High lead bumps today.

In fact, 10s of billions of semiconductor devices have been shipped with High Lead bumps by world class companies including AMD, Intel, IBM, Motorola and TI.

While McLellan implies that AMD is unique in its use of a “power redistribution layer,” this isn’t true. In fact use of a power redistribution layer is industry practice, and has been used in every flip chip GPU NVIDIA has shipped.

NVIDIA is committed to delivering lead free devices by the 2010 requirement. Our engineering team has been working on this important initiative for the past 18 months, and is fully engaged in this effort with our manufacturing partners.

NVIDIA uses industry standard packaging material and we have passed all industry standard (JEDEC) component package qualifications. We stand behind our products and we will continue to work with our partners to ensure the best visual experience.

After looking over the statement, we got on the phone with GeForce General Manager Ujesh Desai and GeForce Senior VP Jeff Fisher to ask some follow-up questions. What exactly causes Nvidia chips to fail in the first place? Can the same failures occur in desktops, and is that what we’re seeing in the HP systems we talked about earlier this week? What of the GeForce 9400M “motherboard GPU” in Apple’s new notebooks?

Predictably, we couldn’t get a more concrete answer on the specifics of the failures. Desai was, however, willing to point out that the failures only affect a “small percentage” of notebooks, and the problems depend on a combination of factors, so “you can’t just . . . point back to our chip.”

Fisher was notably chattier on the topic of potential desktop failures:

There is no evidence that this issue exists in desktops as we know them. And in fact, Mr. McLellan has no evidence to even imply that. The fact is that lead bumps—he’s saying that lead bumps will fail, and therefore you should expect to see failures on everything, and that’s completely out of balance from an educated operations guy like he is. . . . I think most industry people would say lead bumps are not a cause of failure and are in fact very reliable. And his soda-can analogy and attempt to drag in desktops is irresponsible from our view and a huge reach.

About the failing HP systems, Desai specified, “It’s not 38 different systems, it’s actually a single design, and the model numbers that were reported . . . are actually model numbers that refer to different configurations of the same product.” He went on to say Nvidia is “working closely with HP to determine if or how the Nvidia chips are even involved in the failures.”

So, looking forward, what can we expect from the GeForce 9400M chipset in Apple’s new MacBooks? Fisher stated plainly, “You can rest assured that Apple has been aware of all of the science that we’ve developed around this issue and would not be launching the most important product in their history with a product they felt was at risk.” Desai later added that Nvidia is “taking the necessary steps to ensure that all the Nvidia chips currently in production don’t exhibit this problem.”

    • temper
    • 11 years ago

    I had the misfortune to purchase 2 HP Pavilion computers in January and April 2007. The first was a dv9200, with an AMD chip, costing me £990. The second was a dv9299ea, with an Intel chip, costing me £1649.99.
    The first has had 2 motherboard replacements while within the first year of warranty. It has failed again, and due to having the AMD chip, is covered by an extended warranty. I can send it back to HP to get it repaired again – but what for?! It will only fail again!
    The second PC has completely failed about 6 weeks ago. I can’t even turn it on. The fault is the same as with the first – GPU failure. This computer is not covered by the extended warranty as it has the Intel chip.
    HP will do nothing, so I have sought legal address, and been advised to claim for a full refund from the retailers as the computers do not meet the legal requirements of the Sale of Goods act (1979) in UK. The poor retailer will have to chase HP for their reimbursement – but at least I will legally be able to get my money back.
    I will not be buying HP ever again. They have dealt with this matter atrociously, refusing to acknowledge a fault despite literally hundreds within their own forum complaining of identical issues.
    And their fixes for the few ‘lucky’ ones that manage to get the extended warranty are doomed to fail again. I have researched and found that this is almost guaranteed to happen.
    So my recommendation would be – avoid buying from HP. They may have the capacity to develop and sell new technology, but they seem unable to support it. And they do not respect the legal rights of consumers, for which they will unfortunately have to pay – either in loss of customer loyalty, financial remuneration, or both.

    • wilmeland
    • 11 years ago

    I have a later model slimline (not covered in the HP list) with 6150SE graphics. 3 questions on increasing longevity. HP includes a worthless proprietary bay for their media drive – would mounting a small case fan in this space so air is blown out the vent holes reduce the likelihood of failure? Would adding a video card in the slimline PCI slot and turning off onboard video help? (This would likely increase internal temp in this little case.)
    I regularly turn this computer off since it’s connected to audio components and I rarely use more than 3 times a week for few hours at a time. Would leaving powered up in stand by or hibernate increase or decrease the longevity?

      • Rob01001
      • 10 years ago

      That’s an interesting question. I have repaired hundreds of these units, and there is no one magic secret to keeping them operational. I’ve experimented with adding an extra fan into the hard drive Bay I, using a custom made cable to route the hard drive to bay II so that it could boot. In that experiment, the system still failed.
      I have finally concluded that the only way to ensure that you are going to get a quality repair is to remove the chip, reball the chip with a higher quality solder (I don’t completely agree with lead free though – but I don’t like Nvidia’s choice either). I have invested thousands in specialized equipment so that I can provide reballing services. It’s the only way to go here, unfortunately.
      In some extreme cases, the fused die itself is the culprit – in which case the video chip must be replaced. Not common but it happens.
      But even after you find a skilled enough tech to reball your chip, other things should be done to manage the heat issue. Quality solder paste. Bios update. Use Ntune to underclock a little bit – it’s not gonna kill you.

    • sdack
    • 11 years ago

    Summary: Nvidia now is just a humble producer of GPUs. How other manufacturers ended up developing hardware around their GPUs is a complete mystery!!!

    Well, I am impressed. Sounds like they have out-sourced their excuse making to India.

    Tese chibz ah not fauldy!! Oh, no. Tey ah meand do fail when dey fail. And when dey work dey do tis for a greader cause and dere is no reason tobee worried! It is our way at Nweeda Corborashion of broduzing combuter chipz.

    • GrooWanderer
    • 11 years ago

    I never said it was underfill only, I said it was underfill, plus poor choices in bump and pad material, plus poor choices in packaging, plus poor choices in cooling, plus poor bump layouts, plus……. I did look into this for 2 months, talked to countless packaging people, and did a lot of reading.

    That all said, I am not a packaging engineer, but nothing I have claimed has been directly disputed by NV, and if there are any factual errors, they don’t hesitate to snipe from the gutter. This time, they didn’t. Again, not conclusive proof, but pretty strong anecdotal evidence.

    I think it was you that mentioned the C51m problems/report. There is an official list of bad parts, and that is on it. I am also sworn not to print it, so I won’t, but your reasoning for failures, PCIX, don’t really apply to the other parts listed, they are not MCPs.

    Also, Apple has come out and said they directly had failures due to defective NV parts, and the parts that Apple is claiming (either G84 or G86 – I am in Tokyo after a 20 hour flight (so far) – I don’t have my notes) wasn’t on the list. Nor is the G92 which NV steadfastly denies is defective, but Apple directly states that it is in Radar. This many failures across this many part lines and processes says that it is more than a single bad engineering snafu.

    The parts I claimed were bad early on have all (I think, the vast vast majority anyway) come up in warranty claims from HP, Dell and Apple. NV denies it, OEMs directly say they are bad, and are spending a lot of money to fix customer issues. Are they doing this out of their own good will, or do you think there is really a problem?

    Call me crazy, but after following it really closely, one side seems to be habitually unable to come clean. I will leave it as an exercise to the reader to figure out which one that is.

    -Charlie

      • GrooWanderer
      • 11 years ago

      This was meant to be a reply to 99. Sorry about that, way too tired now, and only 4 hours to the next flight. 🙁

      -Charlie

      • lolento
      • 11 years ago

      Hi Charlie,

      I have followed some of your posts but I don’t know where you get your information from. I assumed you know what you’re talking about. (But I assumed that the list you got is a “line stoppage” order on Nvidia’s subcons)

      I know Nvidia is accepting RMAs for GPUs and MCPs. I don’t know why this is the case. My speculation is that Nvidia is building data (evidence) to charge back its subcon on the failures (this is MY speculation).

      I have read the FA reports on this whole thing; I don’t work for Nvidia but some of my colleagues forwarded it to me before this thing blew up to such a big thing. This issue was closed back in March and C51 is the culprit. There is no more engineering effort for further failure analysis on GPUs or this whole issue. Only things that are ongoing are manufacturing sustaining work and reliability improvement type efforts such as consolidating BOM across product lines which you attributed to more failures on more GPUs in some of your articles.

      Let me say that (in my opinion) you have been fishing for feedback from Nvidia in most of your articles. They are very off-base.

        • Meadows
        • 11 years ago

        For all you know he could be professor Fud “nVidia” Demerjian.

          • lolento
          • 11 years ago

          Yea, actually, I only came in here to comment that Neil Mclellan is a big hack and people should not hold too much onto his comment about Nvidia. In fact, they are ridiculous.

          But I encourage people to read the data and draw their own conclusions instead of letting other people draw conclusions for them (which is what Charlie is promoting).

          (I have my own agenda too because I think Neil is a big faker but by all means people, do your own research and don’t let others tell you what to think.)

          • GrooWanderer
          • 11 years ago

          I am, didn’t I make that clear below?

          -Charlie

          • Silus
          • 11 years ago

          He must really be Charlie from the INQ. No one else would take responsibility for the crap he writes.

        • GrooWanderer
        • 11 years ago

        “I have read the FA reports on this whole thing; I don’t work for Nvidia but some of my colleagues forwarded it to me before this thing blew up to such a big thing. This issue was closed back in March and C51 is the culprit. There is no more engineering effort for further failure analysis on GPUs or this whole issue. Only things that are ongoing are manufacturing sustaining work and reliability improvement type efforts such as consolidating BOM across product lines which you attributed to more failures on more GPUs in some of your articles.”

        You are wrong here, NV is admitting to bump cracking on more lines than that, but I am not allowed to say which ones. You may have read _A_ failure report for the C51M, but that is far from the only bad part.

        As for consolidation, can you give me one good reason you would change material types on a part that is 2 months old? Do you know how much this costs? How much it costs the OEMs? Consolidating material sets is all good, and generally a wise thing to do, but not in the middle of a run of parts, and REALLY not during the early life and near EOL. It doesn’t make financial sense.

        -Charlie

          • lolento
          • 11 years ago

          Hi Charlie,

          The failure analysis report that I read covers all of the abnormal rates of failures that HP reported (INCLUDING THE DV6k and DV9k FAILURES THAT SEEMED TO BE GPU RELATED). Subsequently, I know Dell and others (including Apple) also reported failures. I don’t know what those are about but I suspect the failure rates were not abnormal.

          Contrary to your post, which suggested that qualifying a new packaging material costs a lot of investment, here is what I can tell you from my experience. The initial qualification of a new IC design is costly due to new test fixtures, ATE software and maybe new test equipment validation. Subsequent quals (specifically for material change) are covered in JEDEC JESD47, which calls for 3 lots of 77 samples for statistical relevance in reliability study; this is ALL Nvidia has to cover (for high volume chip makers, shipments for a mature product can be 10k per day). The subcons will foot the bill for test chamber times which is NOTHING compared to the business Nvidia brings. This is what I KNOW not what I speculate unlike you who posts what you speculate like it is what you know.

          I don’t mean this to be disrespectful or as a personal attack on you but as a reporter or blogger, I find the way you post things to be somewhat irresponsible and unprofessional.

    • Fighterpilot
    • 11 years ago

    Hey Charlie D from the Inquirer?
    Cool…welcome dude 🙂

      • PRIME1
      • 11 years ago

      Yes much in the same way as one would welcome intestinal cramps or a groin injury.

        • Jigar
        • 11 years ago

        He can outnumber thousands like you … 😀

    • Silus
    • 11 years ago

    Damage or whomever from TR it may concern:

    Why is #75 still in this board again ?

    It’s obvious that this “guy” doesn’t know how to discuss something in a civilized manner. I can take ironic comments, but I just won’t go through the personal attacks thing.

    • murtle
    • 11 years ago

    End users to the AMD, IBM, INTEL and VIA :

    We just buried nvidia.

    whooop whooop whooooop ….

    rofl

    • tesla120
    • 11 years ago

    where did my comment go?

    • lolento
    • 11 years ago

    Sorry, I’m away from the office so I don’t have access to the IEEE database. But I encourage those who do have access to please do the research yourself. Don’t take my word or anybody’s…

    Here is something I found via google:
    http://people.ccmr.cornell.edu/~cober/mse542/page2/files/Cornell%202006%20Interconnects.pdf
    Go to page 98. I'm pretty surprised that I'm the only person who frequents this site with some understanding of materials. I'm speaking from personal experience. I do not work for either Nvidia or AMD, but I worked for Neil Mclellan before. He is a fake. It does demoralize you to see people in un-deserving positions. W is one, Neil is another.

      • lolento
      • 11 years ago

      I just like to add that I know fanboism is in full effect here and I do have my own sentiments against Neil Mclellan.

      However, everyone please go with the facts and don’t look at anything one-sidedly. Don’t take somebody’s word just because he’s a director at someplace…

      I have had access to the FA report from Nvidia on this issue way back in March of this year. The failures are design related and NOT material related.

      Nvidia does not do their own manufacturing. They use TSMC, UMC, ASE and SPIL to fab their chips. So do AMD, ATi, Altera, Xilinx….all high volume flip chip IC vendors. If the failures are materials related, the failures won’t be limited to Nvidia only.

      • ludi
      • 11 years ago

      I’ve done a little bit of chip- and die-level RE in a previous life and can generally follow what is being presented, but here’s the problem: I’m not seeing where any of McLellan’s general statements about flip-chip assembly from the previous TR write-up are contradicted by the presentation you have linked here. Nor is it clear to me that he is wildly off-base in his claims about the advantages and disadvantages of high-lead solder balls versus eutectic solder structures, since he didn’t get specific about whether AMD was using a full-melt process or some sort of hybridized method that gradually moves toward RoHS compliance.

      In fact, the only thing that genuinely jumps out at me on that page 98 regarding hybrid solder design are the bullet points “more difficult card assembly” and “prone [to] handling damage”.

      You say you’ve seen Nvidia’s FA report and your story is credible so far, but if that’s the case, what /[

        • lolento
        • 11 years ago

        Neil Mclellan’s entire statement on why Nvidia’s gpu failed and AMD didn’t is based on his assessment that Eutectic Sn/Pb solder has better thermal fatigue life than High Pb solder. This is RIDICULOUS!!!

        This is entirely countering any basic knowledge of solder reliability. It is like going to a Physics class and saying F is not equal to MA…. High Pb solder is more visco-elastic or ductile than eutectic Sn/Pb solder, it has better thermal fatigue life. This should be basic knowledge for anyone who deals with solder!

        Please read page 98 on that presentation. When I get back to the office on Monday, I’ll dig up some comprehensive studies on SnPb solder alloys.

        • lolento
        • 11 years ago

        As for the root cause of the failure, I actually mentioned it when this whole thing happened many months ago. It is the C51 chip failing on the PCIX bus.

        Firstly, look at the symptoms of the failures that HP reported initially which are graphics related failures AND (this is important) wireless related failures. These components are independent of each other except for being controlled by the PCIX interconnect.

        Then, look at the temporary fix that HP released. It is a bios fix that kicks in the GPU fan early. Does this seem like it will solve the wireless problem? NO. And btw, this fix does not and did not fix the problem because even if the GPU fan ran 100% of the time, it still cannot cool the chipset.

        The root cause of the failure is that the C51 chip is not covered by the closed loop thermal management in the bios and it is not designed for such. The notebooks that are designed around the C51 chip do not have active cooling on this C51 and the bios does not care what temperature it operates at. This chip overheats and fails; the actual failure mode IS solder bump crack but the failures are PIN SPECIFIC to the PCIX bus, thus Nvidia cannot conclude that this is a material related failure. (Thus, the said symptoms I mentioned earlier.)

        C51 chips are also used on desktops. But there is enough air flow on desktops to overcome the heat issue on this chip. (Even if there is no active cooling on desktops, the heated air has room to move away from the C51)

        If Nvidia ever comes clean, this will be exactly what you’ll find out.

          • ludi
          • 11 years ago

          Thanks.

    • YeuEmMaiMai
    • 11 years ago

    I’d guess the man from ATI is right

    1. Statement about how many devices made
    2. Attack on the person who said in the first place that “this is our take on it”
    3. Failure to come clean about what really happened

    • WaltC
    • 11 years ago

    nVidia can’t be specific about who is to blame here when nVidia knows it is to blame…;) It’s really that simple. When you know you’ve goofed up the last thing you’ll do is to admit the problem was your fault to begin with since that just gives all of the percolating lawsuits a concrete target…;) It’s in nVidia’s advantage to keep things vague and confused at this point. The problem with not owning up to a specific problem, of course, is that it then becomes impossible to make assurances that the problem is solved–since you’ve already done handstands trying to convince the world that you “aren’t sure” what caused the problem in the first place.

    • 5150
    • 11 years ago

    Typical big business marketspeak. Call me when they fess up, until then, I won’t give them a dime.

    • Price0331
    • 11 years ago

    I could probably summarize the whole statement as made by the nVidia representative in two words.

    NO U

    • Krogoth
    • 11 years ago

    Geez, when will anybody have the balls to admit that something went wrong?

    Crazy, chair-throwing Steve Ballmer managed to admit that Vista did not meet shareholder goals and is hardly getting adopted and deployed in the business sector.

    Why can’t Nvidia openly admit that there were inherent problems with new materials and their chip design? It is not like it is the first time they have tried to bury some widely-known hardware-related problem. *cough* broken Purevideo on NV40 *cough* broken hardware firewall on NF3/NF4 *cough*

    • deinabog
    • 11 years ago

    I’m sure there’s a bit of truth on all sides regarding this issue but AMD needs to tread carefully here. If some of their parts fail then what would be their explanation since they use “superior” solder bumps?

    Never throw stones at glass houses if you live in one too.

      • BiffStroganoffsky
      • 11 years ago

      All my windows are double-paned tempered glass…neener, neener…

    • FubbHead
    • 11 years ago

    I’m willing to believe Nvidia on this. Because, companies like HP and Dell really do put the cheapest rubbish in their computers.

    • srg86
    • 11 years ago

    nVidia’s response sounds like more marketing rubbish to me.

      • Meadows
      • 11 years ago

      Both AMD and nVidia are nothing but marketing rubbish being thrown at each other, really.

    • kuwaiki
    • 11 years ago

    Personally, my experience is that the 3 nVidia cards that I owned all died within 2-3 years. First, sometimes you turn on the pc and the image is corrupted, reboot, and ok again. Then the problem gets worse and you have to reboot 3-4 times till you get the image ok again. And it keeps getting worse until in a few months it does not ever boot again. The same pattern repeated in these 3 cards: TNT2, 5900FX & 6800GT. Bad luck? Coincidence?

      • ludi
      • 11 years ago

      Were the fans still working when they died?

    • bogbox
    • 11 years ago

    Neil McLellan is AMD’s Packaging and Interconnect Director, so he is not a stupid man. He knows something about packaging. He is probably right about something. Every technological step has its pluses and minuses, everyone knows that. And he was MK for Ati. What’s new?
    Nvidia did something wrong, this is clear.
    When 3 companies like HP, Dell and Apple said Nvidia is guilty, I think Nvidia did it.
    About nvidia’s answer, it’s just PR …
    If someone lied and tries to cover it up, it is Nvidia.

      • lolento
      • 11 years ago

      No one here is disputing whether Nvidia is at fault for the gpu failure issue.

      For clarity’s sake, I just want to repeat that the claim Neil made, that eutectic Sn/Pb solder is more reliable than High-Pb, runs completely counter to all of the industry data that has been available (specifically in flip chip on laminate configuration) for the last 20 years.

      The gpu failures have nothing to do with the solder bump alloy at all. High-Pb solder is MUCH MUCH more reliable in thermal fatigue and electromigration than eutectic Sn/Pb solder; this has been established in the industry for years. The only reason to use eutectic Sn/Pb solder is cost reduction (or, to a minor extent, there has been some noise about the flux used in the high-Pb process being incompatible with some models of Pi passivation).

      I do not know why Neil made the comments as he did. I can only suspect that because both AMD and Nv use the same Taiwanese subcons, he is trying to fan the fire away from his yard. But his explanation of the gpu failures is completely ridiculous.

        • bogbox
        • 11 years ago

        No offense, but I would rather trust a director of AMD than you.
        I don’t know who you are, what you know, etc.; maybe you are working for Nvidia or something.

          • Silus
          • 11 years ago

          Then why don’t you just read the academic papers that discuss this ? It’s better to be informed, than to believe someone that’s just trying to make a competitor look bad, don’t you think ?

            • flip-mode
            • 11 years ago

            lolento doesn’t provide any sources for the statements he made, and since lolento is making the claim, the burden is on him. Other people are merely saying as much.

            • SPOOFE
            • 11 years ago

            Actually, it was Bogbox that committed an Appeal To Authority fallacy. Just because McLellan has a high-paying job, that doesn’t mean he’s right, or even telling the truth. Frankly, we should be treating him more like a politician than an engineer.

      • Silus
      • 11 years ago

      Uh? I hope you mean it’s PR from both sides. What the AMD guy said is pure BS, which is meant to just hurt NVIDIA while they’re down. Nothing more, nothing less. It’s actually sad to see this. I don’t remember Intel or NVIDIA throwing sand in AMD’s eyes at the time of the TLB bug, for example.

      But again, this is PR from both sides. Anyone that fails to see it, has chosen a color…

        • bogbox
        • 11 years ago

        The AMD guy really explained something, not just PR like Nvidia.
        He showed a picture and gave a reason, so I really understood something.
        Plus, if AMD hadn’t commented, neither would Nvidia.
        I’m not an ATI fan.

        • Corrado
        • 11 years ago

        I also don’t remember AMD kicking sand when Intel recalled the P3 1.13GHz, had bad i820 chipsets, or had the Pentium Floating Point bug. There’s a REAL PROBLEM. The TLB Errata didn’t make people’s chips burn up 8 months after they purchased them. They also acknowledged the issue and worked on a BIOS bandaid. nVidia is sticking their head in the sand. Dell and HP and Apple are the ones extending warranties, and issuing BIOS fixes, not nVidia.

          • Silus
          • 11 years ago

          For the last time, NVIDIA did admit to the problems. They were the first ones to do it. I already mentioned this in an earlier post. The problem with mobile chips is taken care of, in the sense that proper measures have been taken to assure that the people that really have their laptops affected by the mobile GPU failures, will see it fixed / repaired.

          As for AMD, don’t make me laugh. They downplayed the TLB bug as if it never occurred. But that’s normal. No company likes to admit that their product may fail or perform poorly under certain circumstances.

          The point remains. Both companies are just spewing PR speak at each other, but AMD made very boldish claims, which they cannot back up, especially since there’s absolutely no proof of desktop failures.

            • flip-mode
            • 11 years ago

            AMD offered a response to questions from Techreport. This is not a case of AMD hiring an ad agency to go spread FUD about Nvidia’s solder bumps, which is what you are making it sound like. AMD never said anything definitive about Nvidia’s stuff either, but more or less gave a best guess.

            • Silus
            • 11 years ago

            Of course they did, but after OEMs started recalling Phenoms with the bug.

            Also, does this sound like a “best guess”?

            “What about Nvidia? McLellan was a little vague in his criticism of AMD’s rival, talking down the company for not paying closer attention to packaging and (allegedly) not caring a whole lot. However, he believes Nvidia’s mobile graphics parts are failing because they use high-lead bumps and are running into the soda-can problem. This problem has shown up in notebooks because those systems get turned on and off a lot, but McLellan said plainly that folks who power-cycle Nvidia-powered desktops regularly should start seeing the same issues eventually.”

            That’s PR speak. No more, no less and none is better than the other.

            • flip-mode
            • 11 years ago

            Dude? That is just an opinion. And it doesn’t sound like PR speak. Have you heard PR speak? He rendered an opinion. Jen Hsun or whatever his name is gets way more flamboyant than that.

            • Silus
            • 11 years ago

            An “opinion” that specifically says “Nvidia-powered desktops regularly should start seeing the same issues eventually”. If anything, he’s making assumptions and I’m sure even you know how foolish that is.

            • GrooWanderer
            • 11 years ago

            “An “opinion” that specifically says “Nvidia-powered desktops regularly should start seeing the same issues eventually”. If anything, he’s making assumptions and I’m sure even you know how foolish that is.”

            Are you always this ignorant, or are you just clueless this time and feel the need to shout about it?

            The problem is caused by thermal cycling, and laptops have a lot more thermal cycles in daily use than desktops. If you do the math, and from your comments, multiplication was a tough 3 years of school for you, desktops will see the problems in a bit.

            In fact, Apple said it is already happening to desktop G92, just not publicly.

            -Charlie

            • YeuEmMaiMai
            • 11 years ago

            wow u are full of it lol…………..Nvidia didn’t admit squat until AFTER HP and others stated a problem

          • ludi
          • 11 years ago

          ROFLOL! Jerry Sanders III was still CEO during the P3 1.13GHz recall. Sand kicking was part of his weekly agenda, so maybe his opinions on Intel’s recall got lost in the broader noise. At least McLellan’s comments shed some technical light on the issue.

          Also, the PIII-1.13 recall was straightforward: Intel overvolted and overclocked a core beyond its reliable tolerances. Everybody knew it. In this case, my guess is that neither AMD nor anyone else outside Nvidia has a clear idea why these Nvidia chips are failing, and they all want to know, so they can avoid the exact manufacturing procedure and/or botchy subcontractor that started this mess.

          McLellan’s comments were meant to prod Nvidia into saying something, either directly or by accident, that might provide clues as to the root cause. Looks like all he got was a fluffy PR release, although since Nvidia specifically took time to comment on the electromigration issue for no apparent reason, I can’t help but wonder if the RoHS explanation is on the money and the chips that are failing really were, in fact, an early attempt to change the bump composition — and possibly one that Nvidia didn’t explicitly design and test, or perhaps even authorize.

            • L1veSkull
            • 11 years ago

            Listen to this guy, maybe you all will actually learn something from this fanboy troll fest.

    • GTVic
    • 11 years ago

    In the first article there are quite a few instances where the author does not quote Mr. McLellan so we have no idea what he actually said, only a brief summary that could very well be misleading.

    Then in the second article, it appears that Mr. Fisher is responding to quoted comments from Mr. McLellan that we have never seen. Show me where Mr. McLellan implied that AMD’s use of a power redistribution layer was unique?

    The only conclusions I can draw are that the Nvidia spokesman is spewing BS, or the author is being deceptive and playing the companies off on each other or these are just examples of badly written articles. My guess is all three.

    • HurgyMcGurgyGurg
    • 11 years ago

    What I still don’t know is what exactly are the symptoms your computer displays as the GPU starts to fail.

    Is it like the HP warranty said and your system won’t display or boot.

    Or is it a gradual thing where you start having instability such as “Your graphics device stop responding Windows must now restart” or games begin to crash kind of thing?

    I own two laptops with nVidia integrated graphics that could fail and am really starting to get worried about their lifetime. Any advice about prolonging it?

    So far it seems the general guideline is lower temps and don’t turn it on and off too many times?

      • randomly
      • 11 years ago

      Both my laptop and a friend of mine’s have had the Nvidia Go7200 graphics chip fail after a year and a half on HP laptops (DV2100). Both failures were sudden: failure to boot, black screen, boot warning beeps indicating video failure. I didn’t notice any warning signs. HP refused to cover it because it was outside the 1 yr warranty. I was not impressed with HP support. I’ll avoid HP in the future.

      Probably best to avoid anything with Nvidia in it for at least a year till the defective chips get flushed out of the supply chain. It seems likely that any laptop in the last couple years with Nvidia GPUs will have abbreviated life spans. Better back your stuff up.

      • BiffStroganoffsky
      • 11 years ago

      If you game on your laptop, the symptoms may show up earlier but my experience with the failure was the garbled BIOS post screen. It looked a bit like the Matrix screen saver with all the bleeding columns of flotsam. That always preceded the beginnings of an unstable desktop that may lock up or reboot without any user input.

      The user actually used it as a true ‘laptop’ which covered the system fans on the underside of the pooter so I don’t rule out heat as a contributor to the system failure but neither do I rule out poor design. Who puts the main cooling fans on the bottom side of a system that spends most of its time resting on a surface with zero to two or three millimeters of air space?

      • xtalentx
      • 11 years ago

      Depending on what state you live in, your friends may still be able to get support for their laptops. In Maine, laptops have a 3 year warranty that is enforced by the state. So if the vendor gives you a hassle and the laptop is less than 3 years old, just call the state and they will make the vendor fix or replace the unit.

      Other states have this law too so you might get lucky here.

        • Kurlon
        • 11 years ago

        Er, wha?! I’m from Maine, and this is the first I’ve ever heard of this. Can you direct me to the statute?

    • Damage
    • 11 years ago

    Posts nuked for potty mouth.

    • ssidbroadcast
    • 11 years ago

    Somebody tell me this’ll come up in this weekend’s podcast…

    • tesla120
    • 11 years ago

    “[Apple] would not be launching the most important product in their history with a product they felt was at risk.”

    like when they launched the Mac Book Air, which most Mac Fanboys agree is a hunk of shit?? Mac has screwed up a lot lately, steve Jobs and his anal views on computer appearance has kept Mac from taking more of the market at a faster rate. instead of including things people generally like (mouse buttons under the touchpad, smaller bezels real DVI ports) they find every possible way to get rid of screws and smooth corners.

    tl;dr Mac wouldn’t care if they had bad chips as long as they got their solid aluminum case

      • jdrake
      • 11 years ago

      Don’t speak for the mac fanboys…. as someone who enjoys apple products…. I don’t think the air is a hunk of shit…. it has its uses for certain people and applications…

      “Mac has screwed up a lot lately, steve Jobs and his anal views on computer appearance has kept Mac from taking more of the market at a faster rate”

      Are you serious? Apple has almost 17% of the US consumer market share…. they sell one of the most popular back to school laptops on the market (the macbook) and have sparked multi-million $ ads from Microsoft in response to their campaigns (if MS wasn’t worried… they wouldn’t try to fight back like they are). You can argue that they sell products that you don’t like or think are limited…. but to say that they have been lacking in a take over of the market share is really just oblivious.

    • TardOnPC
    • 11 years ago

    I have read things here and there about this. Never paid attention since I build my own desktops, and have no need for a laptop. NVidia has never failed me, either. Seems this issue is with a particular model of HP desktops/laptops? I wouldn’t doubt it is an HP issue though. Their QC is top notch on new products but not so much on anything 6 months and older. At least that is how it is in my HP blades department… :/

      • stmok
      • 11 years ago

      Let me get this straight: You haven’t paid much attention to this issue. But that’s OK, as you don’t buy notebooks. It’s an HP problem because you have experienced issues with HP servers?

      How does your post make any sense in relation to the topic at hand? Where’s the connection between your experience with HP servers and their notebook range? Are they developed by the same engineers?

      There is a problem with some Nvidia products. Dell, HP and Apple confirm it. Instead of coming clean and rectifying the problem immediately, they’re screwing around with quick fixes (BIOS updates that make the fan spin more often) and vague explanations (as mentioned in this article). How hard is it to honestly admit a mistake/screw up nowadays?

      It’s such actions that motivate me to avoid any future Nvidia products. (I’m a buyer of their TNT range, all the way to today’s GT2xx series…Not anymore).

      As for AMD’s rep…He should STFU. It’s not his problem. It’s an Nvidia one. He should focus on his own company.

        • Silus
        • 11 years ago

        They did admit it. They were the first ones to do it. It was then that it was revealed that they would be losing around 200 million for warranties and partner compensation…

        It seems to me you’re riding what I call “Charlie’s bandwagon”, that made this mess a whole lot bigger, since obviously fanATIcs and AMD loyalists would post his links all over the place.

          • DrDillyBar
          • 11 years ago

          Oi! I take offence to that.. at least partially.

        • TardOnPC
        • 11 years ago

        STMOK

        Well, all of the HP engineers are from here in Houston pretty much. Very likely the laptop guys too. It is just a coincidence that they produce the HP servers in Houston. The Houston site is not just server manufacturing…

        If you read my post carefully, with a clear head, you will find that I did not care about the issue as it was not related to me. Since I saw a mention of my company, decided to make a comment about the QC here at HP. I was not defending NVidia btw, I really can’t since like I said, do not know the issue at hand, I just like to complain about things here at HP. =P

    • shaq_mobile
    • 11 years ago

    I wish klingons were real.

    • Vasilyfav
    • 11 years ago

    Do we even know what percentage of Nvidia chips “failed” , concretely because of THIS problem and not something else?

    • eitje
    • 11 years ago

    q[

      • CampinCarl
      • 11 years ago

      He must be from Ohio, ending so many sentences with prepositions.

        • ludi
        • 11 years ago

        Ending a sentence in a preposition is only grammatically incorrect if the preposition has no object. It’s a good idea to avoid the practice in professional writing, but the notion that sentences should never end in prepositions in conversation or anywhere else is mainly promulgated by pretentious literary authorities and high school English teachers who feel their lifetime experience and payroll has not yet vindicated that master’s degree.

        Or, told in lighter form:

        An undergrad from a midwestern university has managed to land a graduate position at Harvard’s education department, and in his first day on campus has managed to become a bit lost. He finally intercepts another student and asks in a classic midwestern drawl, “Sir, I’m new to Herverd — kin you tell me where the Gutman Librerry is at?”

        The student looks at him sideways and tries to move on without answering. In persistent midwestern fashion, the new graduate moves in front of the student and asks again, “Please — kin you jus’ tell me where the Gutman Librerry is at?”

        The student lets out an exaggerated sniff and then responds in a voice that sounds like money, “Suh, aht Hah-vahd, we do /[

      • lolento
      • 11 years ago

      Yea, this is a mis-statement in terms of the technology on-hand that is referred to.

      Actually, ATI and AMD are both pioneers in the flip chip on organic substrate that we are talking about; ATI was the first to use flip chip laminate in the 9700 chip and AMD was the first to use this material set in their first gen athlon chips. Intel and Nvidia did not adopt the technology until the subsequent generation.

      That said, the true pioneers of flip chip on laminate are really Motorola and IBM, who developed the “underfill” material and the C4 process.

    • Corrado
    • 11 years ago

    Sounds like all he did was call McLellan a liar without actually addressing their issue… what a douche nozzle.

      • lolento
      • 11 years ago

      McLellan really doesn’t know what he is talking about. None of his claims can be backed up by academic or industry data…..they are made up. He should be embarrassed because now that his statements are posted everyone in the industry knows what a fool he is.

      Anyway, that still doesn’t justify Nvidia hiding the truth. I agree.

        • Corrado
        • 11 years ago

        I’m neither defending nor chastising McLellan. It’s irrelevant. The problem is that a lot of the industry has been pointing their fingers at nVidia, but nVidia won’t acknowledge a situation that factually exists. Neither Dell, HP nor Apple would pretty much say ‘this is nVidia’s problem’ without it being known. They took a charge for the notebook chips, but never openly admitted any problems, and this just further puts them into non-answer status. “Why is your stuff failing?” …. “That guy is a liar”. Doesn’t cut it for me. I want admission or denial with facts to back those statements up.

        • clone
        • 11 years ago

        McLellan is a fool for commenting on what has become obvious?

        HP, Dell, Apple all commenting openly and privately along with an Nvidia admission and McLellan is the fool?

          • lolento
          • 11 years ago

          Neil Mclellan made a fool of himself by saying that Eutectic Sn/Pb solder is more reliable in thermal fatigue life than High-Pb.

          Is this more clear?

          Problem with Nvidia’s gpu failures is not related to whether the solder bumps are high-Pb or eutectic Sn/Pb….In fact, eutectic Sn/Pb is LESS reliable considering standalone thermal fatigue or electromigration. Neil is a dumbass for saying such things publicly….gosh, read a few papers or go back to school.

          Nvidia not coming clean on their failures doesn’t help….but Neil’s comment on behalf of AMD really is moronic.

            • clone
            • 11 years ago

            r[

            • GrooWanderer
            • 11 years ago

            Because the substrate (Have mommy say ‘green thing under the die’ when her reading this confuses you) has eutectic pads, you get a better bond with eutectic/eutectic vs eutectic/high lead. You also have more similar melting points meaning better adhesion. This all means you get a better, cheaper and more reliable bond with eutectic bumps.

            Basically, you are a loud twit without a clue, Neil is right. Then again, he is out there making chips, you sit in your basement wishing you had a card that could run halo 2 at speed.

            God, I love trolls. Especially one without a clue.

            -Charlie

            • BiffStroganoffsky
            • 11 years ago

            Did I err…???

            Could you explain mulch to me?

            • ludi
            • 11 years ago

            I think he grabbed the wrong post before replying.

            • GrooWanderer
            • 11 years ago

            I might have. It was meant to reply to #27, and it said 27 in the reply box. Sorry.

            -Charlie

            • ChronoReverse
            • 11 years ago

            ++

            If this is true then lolento should get a clue before flaming.

            • Silus
            • 11 years ago

            eutectic SnPb has a much lower melting point than just Pb

            http://www.seas.ucla.edu/eThinFilm/report.html

            And the fact that you signed your post with "Charlie" makes it very funny!

            • GrooWanderer
            • 11 years ago

            “And the fact that you signed your post with “Charlie” makes it very funny!”

            Why? Because I always do on my posts? Because I don’t anonymously snipe? Not meaning to attack, just curious.

            -Charlie

            • green
            • 11 years ago

            “Why? Because I always do on my posts? Because I don’t anonymously snipe? Not meaning to attack, just curious.”

            have you been following this story long?...

            • RagingDragon
            • 11 years ago

            So if I understand correctly, when it comes to packaging:

            1. high Pb bump + high Pb pad = good
            2. eutectic bump + eutectic pad = good
            3. high Pb bump + eutectic pad = bad
            4. eutectic bump + high Pb pad = bad

            When the Nvidia people say high Pb bumps are good, and are used in Intel and AMD CPUs, they’re referring to #1 (and thus are technically correct). However they carefully avoid saying the defective GPUs use #3, thus implying they’re using the same technique as the CPUs (which is incorrect, but implied rather than stated directly).

            When Neil McLellan (the AMD guy) said AMD’s eutectic was good and Nvidia’s high Pb was bad, he meant #2 (AMD) vs. #3 (Nvidia), and thus was correct. However his statement did not mention the pad material at all, and thus was not clear: he didn’t specify whether AMD were using #2 or #4, nor whether Nvidia were using #1 or #3. Nvidia took advantage of this lack of clarity, and intentionally mis-interpreted McLellan’s statement to unfairly ridicule him and undermine his credibility.

            • lolento
            • 11 years ago

            1. high Pb bump + high Pb pad = good
            2. eutectic bump + eutectic pad = good
            3. high Pb bump + eutectic pad = bad
            4. eutectic bump + high Pb pad = bad

            That’s only half correct. But thanks for thinking it through!!!

            Yes, high Pb bump + high Pb pad is the best situation! But on laminate, this is impossible to do because high Pb (whether 90-10 or 97-3 (IBM patent)) reflows at 325C. Laminate substrates degrade at 280C.

            High Pb bump + eutectic pad is better than full eutectic solder joint. The bulk high Pb can carry more current AND is more ductile than eutectic solder. On top of that, the process reflows the eutectic pad at 183C (or 220C in practice), so the High Pb bump does not “collapse” (hence the C4 process: controlled collapse chip connection), leaving a bigger standoff between the die and the substrate. If you have a basic understanding of mechanics, a bigger standoff will give you less “strain” and thus less “stress” during thermal cycles; just imagine the chip and the substrate infinitely far away from each other (zero stress) versus the chip and substrate directly attached to each other (infinite stress due to CTE mismatch).
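The temperature argument above can be sketched as a small check. The reflow and laminate figures are the ones quoted in this comment; the function names, and the simplification that the process runs at exactly the pad solder's reflow temperature, are assumptions made for illustration only.

```python
# Temperatures as quoted in the comment above, in degrees C.
# 183C is the nominal eutectic reflow point (the comment notes ~220C in practice).
REFLOW_C = {"high_pb": 325, "eutectic": 183}
LAMINATE_LIMIT_C = 280

def process_temp_c(pad: str) -> int:
    """Assumed process temperature: just enough to reflow the pad solder."""
    return REFLOW_C[pad]

def feasible_on_laminate(pad: str) -> bool:
    """True if the pad can be reflowed without exceeding the laminate limit."""
    return process_temp_c(pad) < LAMINATE_LIMIT_C

def bump_collapses(bump: str, pad: str) -> bool:
    """True if the bump itself melts at the assumed process temperature."""
    return REFLOW_C[bump] <= process_temp_c(pad)

for bump in REFLOW_C:
    for pad in REFLOW_C:
        print(f"{bump:>8} bump + {pad:<8} pad: "
              f"laminate-ok={feasible_on_laminate(pad)}, "
              f"bump-collapses={bump_collapses(bump, pad)}")
```

Under these assumptions, both high Pb pad combinations exceed the laminate limit, the full eutectic joint collapses, and only the high Pb bump on a eutectic pad is both laminate-compatible and non-collapsing.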

            In practice, having the solder bump “not collapse” during process is the difference between a 60um standoff versus a 90um standoff. The stress-strain relation is linear so just preventing the solder joint from collapsing is a 50% improvement in thermal cyclic stress. (EDIT: before some PhD comes in to correct me, I assumed the solder material is linear-elastic and not visco-elastic which is a good assumption for this discussion)
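One way to read the 50% figure, as a rough sketch: assume the same linear-elastic material and a simple shear model in which strain is a fixed CTE-mismatch displacement divided by the standoff height, so stress scales as 1/standoff. The model and the function name are illustrative assumptions, not taken from any paper cited in this thread.

```python
# Sketch under an assumed model: stress is proportional to 1 / standoff height
# (fixed lateral displacement from CTE mismatch, linear-elastic solder).

def relative_stress(standoff_um: float, reference_um: float) -> float:
    """Stress in a joint of height `standoff_um`, relative to a reference joint."""
    if standoff_um <= 0 or reference_um <= 0:
        raise ValueError("standoff heights must be positive")
    return reference_um / standoff_um

COLLAPSED_UM = 60.0      # full eutectic joint, per the comment
NON_COLLAPSED_UM = 90.0  # high Pb bump on eutectic pad, per the comment

# Tall joint relative to the collapsed one: 60 / 90
tall_vs_short = relative_stress(NON_COLLAPSED_UM, COLLAPSED_UM)
# Collapsed joint relative to the tall one: 90 / 60
short_vs_tall = relative_stress(COLLAPSED_UM, NON_COLLAPSED_UM)

reduction_pct = (1 - tall_vs_short) * 100
increase_pct = (short_vs_tall - 1) * 100

print(f"90um joint sees {reduction_pct:.1f}% less stress than the 60um joint")
print(f"60um joint sees {increase_pct:.1f}% more stress than the 90um joint")
```

Under this model the collapsed 60um joint carries 50% more stress than the 90um joint, which is the same as saying the taller joint carries about a third less.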

            Thus high Pb bump + eutectic pad is better than a full eutectic joint!

            (That’s why I’m saying Neil Mclellan is a moron)

            • RagingDragon
            • 11 years ago

            So high Pb pad is only an option for ceramic substrates, which are more expensive.

            According to others the full eutectic gives a stronger bond between the pad and bump, so the choice is less stress (greater standoff distance) with weaker joint, or more stress (less standoff distance) with a stronger joint. So depending on how much stronger the pure eutectic bond is, the best choice could go either way. Additionally, pure eutectic has lower current density; therefore, more bumps required; therefore, physically stronger.

            So it’s not obvious that McLellan’s statements were incorrect (though they are certainly incomplete). Nor is it obvious Nvidia’s decision to use high Pb bumps was wrong. Just different tradeoffs.

            /[

            • lolento
            • 11 years ago

            I don’t know where Charlie gets his information from but the underfill material is not proprietary to Nvidia. There are only a few underfill vendors out there (you can count them with one hand). If it is an underfill problem, the failures will not be limited to Nvidia only.

            • green
            • 11 years ago

            oh good. but still not sure how you don’t see how silus saw humour in it.
            but since i’ve pushed my way in to this thread i guess i’ll try to explain.

            – lolento says the eu-SnPb joints aren’t more reliable at thermal fatigue
            – you point out eutectic joints have similar melting points and are more reliable
            – silus points out a paper indicating eu-SnPb has a lower melting point

            not knowing that the two materials didn’t have similar melting points made you look incorrect in his mind.
            you tagged your replies with “Charlie” (and you hadn’t introduced yourself prior to this point)
            a person that has been writing all these articles on this issue on the inquirer would have made you seem silly in his mind
            hence: humour

            rewinding though, lolento was pointing out that mclellan was incorrect that eu-Sn/Pb is more reliable in thermal fatigue compared to high-pb
            you point out eu/eu bonds better than eu/high-pb
            which while true, misses lolento’s point of an eu-Sn/Pb *[

            • bfellow
            • 11 years ago

            I would give you a “F” in reporting if your only references are self-references. My facts are based on my statements I made 5 days ago. You can even ask me if what I said was true.

            • Silus
            • 11 years ago

            Precisely. This happens in other forums, where people with the same bias as “Charlie”, back up his articles, with other articles written by…er…”Charlie”. There’s no real proof, nothing. Just conjecture / speculation, that spew after NVIDIA admitted to the mobile GPU failures, and made this a mess beyond what it really is. And that’s why I can’t help but feel that there’s money involved from one of NVIDIA’s competitors, to pay this guy. I mean, no one can be so biased for free ? Or can fanboyism reach such proportions ??

            • lolento
            • 11 years ago

            I think we can appreciate people who do their research before posting.

            KN Tu is a well respected Professor and I have attended his seminars before. The problem I have with this “paper” exercise (aside from not being published or peer reviewed) is the following:

            1.) The thermal cyclic stress between the chip and substrate should be applied at the cross-section on the die side instead of assuming the stress is applied on the substrate side where the eutectic solder pad is. A simple explanation of this is because the CTE of the substrate is greater than the CTE of the die and thus deformation is applied to the die. Typical failure modes for bump cracks are always on the die side.

            2.) The energy calculation in the exercise is based on dropping the height of the illustrated solder joint from 100um to 10um for the composite joint. This is quite a stretch. The stress should actually apply to the full standoff and I would argue that the High-Pb joint will take the bulk of the stress. And if the stress is applied over the height of 100um, thus simply comparing the composite joint vs the eutectic joint, the modulus of the composite joint will give it the advantage.

            My opinion here is that this exercise is written up to explain the Nvidia failures based on the data that Nvidia released; face it, no researcher will spend time on Pb based solder studies anymore… BUT the data that Nvidia released is NOT the entirety of what happened.

            (EDIT) Actually, if you follow the theory in KN Tu’s exercise and apply it to Pb-free solder, you will find that Pb-free solder will perform better in thermal cyclic life than eutectic SnPb solder. But this is not the case. The only Pb-free flip chip implementation right now is by Intel, and even they didn’t do a full Pb-free solder joint as KN Tu’s theory would suggest one should. Intel built up a 50% copper bump and then applied a thin layer of Pb-free solder to the substrate side MUCH LIKE the composite Hi-Pb solder joint configuration. In other words, based on KN Tu’s theory, a full Pb-free solder joint will be more reliable than the one Intel uses, which is half copper joint and then a thin Pb-free solder joint; I don’t work for Intel but assume they tested the full Pb-free solder joint config (because it is much simpler to manufacture) and found that it didn’t work….

            • lolento
            • 11 years ago

            I found an IEEE paper that covers this solder joint reliability issue. It contradicts what Neil Mclellan said in his earlier comments and also KN Tu’s theory.

            People who are still interested in this should take a read.

            http://rapidshare.com/files/155905932/Solder_Shape_and_Height_Effect.pdf.html

            • Damage
            • 11 years ago

            Hey, Charlie, we have a rule here against ad hominem attacks. We’re not super-strict about enforcing it all of the time, but we do pay attention. You’re pretty clearly stepping over the line. Please dial it back and just stick to arguing the facts, please.

            • GrooWanderer
            • 11 years ago

            No problem and will do.

            -Charlie

      • ssidbroadcast
      • 11 years ago

      douche nozzle… are u by chance an Adam Carolla fan?

      • asdsa
      • 11 years ago

      Well what would you expect. Nvidia guy saying that, yeah our chips fail, buy AMD instead. Of course he tries to sweep this under the carpet.

        • Corrado
        • 11 years ago

        You can admit you have a problem without encouraging someone to buy something else. Offer a recall or an exchange program. Tell people what the problem is and what the risks are. Don’t deny it if you know there is, in fact, a problem.

      • Murso24
      • 11 years ago

      right, all he did was claim that McLellan from AMD was a liar. similar to John McCain. all he does is ramble on about “the other guy” and say that “he” is going to cause failure in the Economy and etc due to lower taxes and etc.

      i dont think he had the l[

      • ClickClick5
      • 11 years ago

      I said the same thing to myself.
      Standard “politician” or businessman.

      We don’t address the issue, we address the other guy.

    • cass
    • 11 years ago

    So, I just read a bunch of “He said that and we ain’t saying anything”.

    It took two weeks of legalese to get that?

      • DrDillyBar
      • 11 years ago

      indeed.

    • Forge
    • 11 years ago

    OK, so the UNANSWERED ten million dollar question:

    So Mr. McCain, err, McLellan: You’ve dodged the issue with your non-answer. You’ve stated that Nvidia’s production means are beyond reproach.

    WHY THEN are parts in the field failing at unheard-of rates? WHY is Nvidia not taking ownership of a problem which EVERYONE else emphatically lays at Nvidia’s doorstep??

      • MadManOriginal
      • 11 years ago

      McLellan is the AMD guy but I like your style nonetheless. In fact the misstatement of facts makes it more like a real-world politician’s statement than if it were factually correct 🙂

      • Meadows
      • 11 years ago

      Cite these failure rates that are unheard of.

        • JustAnEngineer
        • 11 years ago

        Good point. They’ve certainly been heard before with the well-publicized GPU failures in the X-box.

          • Meadows
          • 11 years ago

          I’m willing to believe that other factors contributed to the abnormal failure rates of the x-box, specifically another component that may have been too hot, or inadequate cooling measures, or plain simply a bad arrangement or case construction. I’m not going to blame anyone unless there’s hard evidence against them.

        • TheEmrys
        • 11 years ago

        Catch-22. Only Nvidia would know failure rates.

    • BoBzeBuilder
    • 11 years ago

    Debate night in America. Mr. McLellan from the AMD party vs Mr. Fisher from Nvidia. Which chip will you buy?

    • JustAnEngineer
    • 11 years ago

    Politicians suck.
