Intel finds flaw in Sandy Bridge chipsets, halts shipments

As phenomenal as Intel’s new Sandy Bridge processors have turned out to be, nothing in this world is truly perfect. Intel announced earlier this morning that it has discovered a flaw in the 6-series chipsets that accompany the new processor family. While it reassures users that they can "continue to use their systems with confidence," the chipmaker has nonetheless halted chipset shipments until a new, bug-free version of the silicon starts to ship out late next month.

What’s the problem? Intel explains, "In some cases, the Serial-ATA (SATA) ports within the chipsets may degrade over time, potentially impacting the performance or functionality of SATA-linked devices such as hard disk drives and DVD-drives."

For folks who have already crossed the Sandy Bridge, Intel adds that it will "work with its OEM partners to accept the return of the affected chipsets," and it plans to "support modifications or replacements needed on motherboards or systems."

Yes, that likely means the replacement of all Sandy-Bridge-based motherboards, laptops, and pre-built PCs currently on store shelves or already in the hands of consumers.

That sounds like a fair amount of hassle for all involved, but it probably beats the alternative—degraded storage performance on a state-of-the-art quad-core PC.

Besides the obvious inconvenience and bad PR, this little slip-up will cost Intel quite a bit of money, too. The firm expects to see a $300-million dent in first-quarter revenue (since full volume production of 6-series chipsets won’t resume until April), not to mention $700 million in total repair and replacement costs.

Intel stockholders might not need to cut and run just yet, though. Intel claims it can make up for the lost revenue by year’s end, and in the same press release, the chipmaker goes on to say it now expects first-quarter revenue to be in the $11.3-12.1 billion range, an increase from the previous forecast of $11.1-11.9 billion. Gross margin will, however, be understandably lower than initially expected (59-63% instead of 62-66%).
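For readers who want to check the math, here is a quick sketch that tallies the figures quoted above. The variable names are our own, and the midpoints are simply the centers of the ranges Intel gave, not numbers the company published.

```python
# Figures quoted in the article, all in millions of US dollars.
LOST_Q1_REVENUE_M = 300
REPAIR_AND_REPLACEMENT_M = 700

def midpoint(low, high):
    """Center of a guidance range (our own simplification)."""
    return (low + high) / 2

# Combined hit from lost revenue plus repair and replacement costs.
total_cost_m = LOST_Q1_REVENUE_M + REPAIR_AND_REPLACEMENT_M

# First-quarter revenue guidance, previous and revised (millions).
old_revenue_mid_m = midpoint(11_100, 11_900)
new_revenue_mid_m = midpoint(11_300, 12_100)

# Gross margin guidance, previous and revised (percent).
old_margin_mid = midpoint(62, 66)
new_margin_mid = midpoint(59, 63)

print(f"Total chipset cost: ${total_cost_m} million")
print(f"Revenue guidance midpoint change: {new_revenue_mid_m - old_revenue_mid_m:+.0f} million")
print(f"Gross margin midpoint change: {new_margin_mid - old_margin_mid:+.0f} points")
```

The total comes to $1 billion, while the center of the revenue forecast moves up by $200 million and the center of the gross-margin forecast drops by three points.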

We are currently checking with Intel and motherboard makers to see how they plan to assist affected customers. Stay tuned for more info as we get it.

Update – 11:43 AM: Intel just held a conference call to talk about the Sandy Bridge chipset problems, and we now have a few more details to share with you.

The problem that’s caused Intel to initiate a billion-dollar chipset recall affects the SATA ports on all 6-series chipsets, including the H67 and P67 chipsets most prominently used in consumer products. All of these chipsets are collectively referred to as "Cougar Point" inside of Intel. Because there are no third-party chipsets compatible with Sandy Bridge processors, all Sandy Bridge-based systems are potentially affected, including desktops, laptops, and DIY motherboards.

The issue is a circuit design problem resulting in a gradual degradation over time of SATA connectivity on the affected ports, manifesting itself as high bit-error rates on those ports and eventually as total device disconnects.

That’s a serious issue, but it’s limited in scope. Intel says storage devices connected to those ports should not be damaged, and data on the devices should be intact and readable on another system.

The ports potentially affected, interestingly enough, are the four 3Gbps SATA ports on the chipset. The two 6Gbps SATA ports aren’t at risk.

Because this is a chip design-level problem, it will require the replacement of the Cougar Point chips embedded in the motherboards of affected systems. Intel expects to be producing an updated, fixed version of Cougar Point silicon in late February, with "full volume recovery" coming later, in April or possibly even late March. Implementing the fix will involve the replacement of a photomask for one of the layers of metal on the chip. The layer in question is apparently a "later" layer in the production process, so we expect there’s some potential for partially completed chips currently in production to have the revised layer applied to them. Note that the 6-series chipset is produced on Intel’s very mature 65-nm fabrication process, not the cutting-edge 32-nm process on which Sandy Bridge CPUs are produced, so this isn’t likely to be an especially thorny issue to untangle. Intel says the change should be "very straightforward" and it has "very high confidence" that the fix will be effective.

As you may know, Intel pours millions of dollars into validation testing for products like these, and its partners at major PC makers do the same. This problem apparently wasn’t detected early on because of its nature, involving a slow degradation of SATA connectivity over time. Intel estimates that something like 5% of systems could develop problems over a three-year life span, assuming typical laptop usage patterns. Beyond that time window, the failure rate might rise further. For systems with heavier usage patterns, the failure rate during that initial three-year window could be as high as roughly 15%. That’s obviously high enough to warrant the drastic action Intel is taking.

The first evidence of the problem cropped up during extended testing by PC makers, after the chipsets had passed the initial validation stages within Intel and within the OEMs. Intel says it learned of the problem last week; understanding and characterizing the problem then took a few days. That analysis concluded last night, and the company put shipments of its chipsets on hold this morning. From what we can gather, Intel partners were only very recently notified of the problem, too.

In addition to affecting systems already on the market, the chipset hiccup will delay the release of a host of laptops and other systems based on the dual-core variants of Sandy Bridge. Those systems were originally scheduled to begin hitting store shelves in the first couple of weeks of February, but Intel now estimates another "few weeks" will be added to those release schedules, depending on how long it takes PC manufacturers to incorporate the revised chipset silicon into their production pipelines. Intel’s estimate sounds a little too optimistic to us, though. Given that the 6-series chipsets won’t likely return to full production volumes until at least late March, we suspect the delays may add up to at least a couple of months in total.

Fortunately, Intel doesn’t expect the upcoming, enthusiast-oriented Z68 chipset to be delayed as a result of the SATA problem.

If you’ve already built a Sandy Bridge system, there are some obvious workarounds available. Most enthusiast-class motherboards these days ship with extra SATA ports driven by auxiliary SATA controller chips from third-party suppliers like Marvell, and those ports aren’t at risk for this problem. As we’ve noted, the two 6Gbps SATA ports on the 6-series chipset aren’t, either. For a great many users, sidestepping this problem should be as simple as moving their storage device connections to the other ports. For those without enough ports, there’s always the option of slapping in a cheap PCIe SATA card, too.

Given the relatively strong performance that we’ve seen out of Intel’s SATA 6Gbps controller, we’d recommend attaching any fast, primary storage devices like SSDs or 7,200-RPM drives to the 6Gbps SATA ports if possible. Other drives, like large and slow-rotating HDDs, should be fine on the third-party controllers. Just be careful to ensure that you have all the right drivers installed and the boot order in the BIOS set correctly before making the move, so you don’t cause yourself the headache of an unbootable system.
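The workaround described above can be sketched as a small planning helper. This is only an illustration under stated assumptions: the port numbering (0 and 1 for the 6Gbps ports, 2 through 5 for the affected 3Gbps ports) comes from reader reports rather than from Intel documentation, and `plan_moves` and its input format are hypothetical names of our own. Check your motherboard manual before moving cables.

```python
# Assumed port numbering (from reader reports, not confirmed by Intel):
# ports 0-1 are the 6Gbps ports, ports 2-5 are the affected 3Gbps ports.
SAFE_CHIPSET_PORTS = (0, 1)
AFFECTED_CHIPSET_PORTS = (2, 3, 4, 5)

def plan_moves(drives):
    """Suggest a new home for every drive on an affected chipset port.

    drives: list of (name, chipset_port, is_fast) tuples.
    Returns a dict mapping drive name to a suggested destination.
    Drives already on a 6Gbps port are left alone and not listed.
    """
    occupied = {port for _, port, _ in drives if port in SAFE_CHIPSET_PORTS}
    free_safe = [p for p in SAFE_CHIPSET_PORTS if p not in occupied]

    at_risk = [d for d in drives if d[1] in AFFECTED_CHIPSET_PORTS]
    # Fast drives (SSDs, 7,200-RPM disks) get first pick of the 6Gbps ports.
    at_risk.sort(key=lambda d: not d[2])

    plan = {}
    for name, _port, _is_fast in at_risk:
        if free_safe:
            plan[name] = f"chipset port {free_safe.pop(0)}"
        else:
            plan[name] = "third-party controller or PCIe SATA card"
    return plan

example = [("ssd", 2, True), ("storage_hdd", 3, False), ("dvd", 0, False)]
for drive, destination in plan_moves(example).items():
    print(f"{drive}: move to {destination}")
```

In this example the SSD takes the one free 6Gbps port, and the slower hard drive is sent to an auxiliary controller. The optical drive already sits on port 0, so it is left where it is.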

This is obviously still a developing story, and we are working to understand how motherboard makers will address the problem for consumers. They seem to have been caught off-guard by this morning’s developments, so sorting that out may take some time.

0 responses to “Intel finds flaw in Sandy Bridge chipsets, halts shipments”

  1. I agree with you about AMD….the vaunted gaming benchmark numbers between competing cpus really mean very little when playing at high resolutions–especially if you are a person who doesn’t like tearing and runs with vsync on. And also I harbor lingering suspicions that many of the non-gaming benchmarks commonly used in cpu reviews are overly optimized for Intel architectures/compilers. For the money, AMD is still the best game in town.

    The chipset scenario may not be as clear as it looks, though. I recall Jerry Sanders years ago in an interview saying that AMD did in fact have a P4 bus license from Intel (as a result of their cross-licensing agreement–not shared by VIA, whom Intel sued) but that AMD simply didn’t want to use it–preferring AMD’s own bus designs instead. That said–yes, Intel is notorious for wanting to lock up and dominate entire markets and sectors, and for using every tactic available to it for stifling and/or destroying emerging competition. But to a real extent Intel’s been castrated by the various rulings against it and the various settlements it has had to make, not limited to the legally binding stipulations Intel had to agree to as a part of the $1.25B settlement with AMD (I think AMD viewed these binding stipulations from Intel as worth far more going forward than the damage award it agreed to accept.)

    But, that doesn’t mean Intel will stop trying to corner the market…;) In that sense, Intel is still working hard as we see with its lock-down control of the Sandy Bridge support environment to date. But of course Intel’s current chipset woes are merely poetic justice to some extent. Intel is a chip company and I doubt very much if it will relinquish the opportunity to sell more of its chips whenever it presents itself…;)

  2. Maybe one of these? Other than that, I think you're stuck.

  3. Except for the fact that… modern nvidia chipsets are pretty damn good. They have good drivers, and are no less stable in any significantly perceivable manner. Also, if you ever read this site, you might also know that they tend to have good peripheral (USB, Firewire) performance relative to both AMD *and* Intel chipsets. Perhaps in mission-critical applications where 99.99999% uptime is the sought-after goal, but for the average gamer/consumer's desktop? You're blowing hot air. I don't personally care for nvidia chipsets all that much, but I've fixed MANY a computer that features them, and I just download the latest drivers from their site and they work fine. They've long since had integrated graphics that far supersede Intel's garbage, until this most recent iteration of them.

    I might add, I can't remember a time when Nvidia chipsets had an issue with SATA performance that degrades over time -- a bug which is known to affect ALL of Intel's latest chipsets. I'll give Intel props for being upfront and responsive about the issue... ...but that doesn't even get away from the fact that Intel *is* being anti-competitive by artificially limiting third-parties from making chipsets. You suggest that nVidia is the only possible alternative to Intel's chipsets, but I disagree -- VIA could certainly make chipsets for the bargain-hunter who desires i7 performance but at a more reasonable price than the ones commanded by Intel chipsets. AMD could even make chipsets for Intel, and that'd be fucking awesome. I'd totally buy an AMD-chipset Intel system, and it'd be a way for AMD to get profits off of... Intel's desirable processors.

    So, no -- you're wrong. Artificially stifling competition is bad. It's bad for consumers, and it's even bad for Intel. It was once, it will be again -- and I'm not gonna subsidize that. AMD's been playing a tough game of cards these past few years, and they've been panning out. And, they're not relying on political bullshit to do it, so yeah, I'll reward that with my patronage.

  4. Scuze me but you’re in a deep confusion. Intel DOESN’T let anyone else produce chipsets compatible with their latest processors. AMD on the other hand would be very happy if someone else like VIA or SIS or NVIDIA still made chipsets, because they do have THE RIGHT. Problem is that these companies are not able to produce something competitive, but they still have products on AMD sockets (true, mostly AM2/AM2+).

  5. I just purchased a custom built Pavilion Dv6 Notebook with Sandy Bridge from HP Home & Office and Wi-Fi would not work straight out of the box.

    Spent 5 hours on the phone with tech support, reinstalled the entire operating system using system recovery and still nothing. They now are unable to replace the custom built model with the Core i7-2630QM processor and I am being forced to return the unit.

    Is this issue related to this recalled 6-series chipset, and does anyone have an alternative suggestion so that I may temporarily keep my notebook and get the Wi-Fi working? I plan to use my manufacturer’s warranty to return the laptop at a later date for a replacement.

  6. “I had the money to buy a higher performance, more efficient Intel-based system when I built my most recent desktop. I went with an 890GX/SB850 system with a Phenom II X6 1055T precisely because of that affront to fair competition.”

    Hrmm... Trying to use logic with an obvious troll:

    1. Intel is "evil" because they don't let third parties make chipsets for their CPUs. Considering about the only company that would *want* to make a chipset for an Intel CPU other than Intel might be NVIDIA, and considering the fact that the Nforce chipsets had known bugs and other problems that dwarf Intel's.. I'm not sure this is so bad.

    2. AMD is good and pure because over 5 years ago during the Athlon 64 days when they didn't really make very good chipsets for their own CPUs they let other companies (Via / AMD / NVIDIA) make chipsets for their CPUs. Of course, at the magical time in the past you are referring to, Intel ALSO had third party suppliers making chipsets. Fast forward to today, however, and the only real options you have for a modern AMD cpu are... AMD CHIPSETS! Don't give me crap about some Nvidia board that might have a compatible socket but likely doesn't even run with the 1055T you're bragging about, the only real option for an AMD CPU is an AMD chipset.

    So in conclusion: Intel CPUs only work with Intel chips proving that Intel is worse than Hitler. AMD CPUs from 5 years ago worked with different chipsets the same way that Intel CPUs from 5 years ago did. Modern AMD CPUs only work with AMD chipsets, therefore AMD are beings of pure energy from a higher plane of existence who have come to bring us all peace and enlightenment. Fanboi logic at its finest.

  7. I purchased an Intel BOXDH67CL motherboard and built a new HTPC which is performing nicely so far. Today I got an email from that says they will be replacing the board when it becomes available …..probably in April. They said I could continue to use the board until then and should connect the drives to the 0/1 ports to avoid problems. I have connected drives to all the ports already so I have my fingers crossed!
    NewEgg said I could get an RMA anytime in the next 90 days or when the new boards become available, whichever is longer. They stated they would email me when the replacement board is available. If I don’t want to replace the board, they offered to refund the purchase…..of course that probably wouldn’t include the cpu, memory and other components.
    I am very pleased with the performance of Sandy Bridge in my HTPC build……the core i5-2300 cpu seems to be snappier than my main PC with an extreme motherboard and an 875k cpu in it! Kind of overkill for a HTPC but hopefully when I get a CETON cablecard digital 4-tuner it will be able to serve as a server without any problems!

  8. I picked up a P8P67 Pro and 2500K on Sunday and went to bed thinking how lucky I was. Then on Monday I heard the news and went to bed thinking how UN-lucky I was. Now on Tuesday after more info is available I’m actually back to thinking I was lucky again.

    I’ll have my system up and running while all of this is getting sorted out and since this board has four sata 6gb ports I shouldn’t see any problems (other than normal ones anyway…).

    The MB will still need to be replaced in the long run, but until SSD prices come down some more, the four 6gb ports are enough.

  9. I just built a P67 setup and shipped it to a friend across the country, tested it for a week, ran vantage, played some L4D2 on it, primed it for a few hours, etc, left the computer on for hours at a time. Over the past week she started getting hard locks randomly, now every hour or so in windows and gaming, and is getting worse, even locks when idle. I wonder if this is related. I am going to have her move the sata cord into another port.

  10. While my product may lack in quality compared to competition, I can certainly compete in volume and time-to-market.

  11. Competition, yes, as long as the competing product/service/person is, in fact, competitive OR there are enough reasons against the one holding the monopoly.

    Now look here Tiffany, in this case I can’t honestly come up with anything against myself, so let’s stick to the first part and focus on *your* competitiveness.

  12. It pays that well..? I should start doing it, too.. break your monopoly. You’re a proponent of competition, no?

  13. I guess this is my punishment for switching to Intel after being an AMD user for so long. Oh well … I don’t think it will affect me much anyway. My two main drives are using the 6Gbps controllers and my two optical drives are on the 3rd party controllers, which just leaves my backup drive on the affected port. I’ll probably never see a problem. But I’ll still take a shiny new motherboard to replace mine when they get around to exchanging them. Anand thinks Intel ought to give everyone a Z68 board to make up for the trouble, so I’m keeping my fingers crossed that they do.

  14. It is, but their damage control is not the problem. The problem is the “damaged” brains of the people that buy their products blindly. To even suggest that your customer is wrong is a bad practice in any business, but Apple takes it to new levels every time, yet people keep buying their products by the millions…They almost seem like those battered wives that always keep going back to the abusive husbands…

  15. A new revision ? When was that ?

    All I read is that they finally caved in and offered free bumper cases to customers, when at first they wanted to charge for them, because people were “holding it wrong”…

  16. What part of the filing I posted didn’t you understand ? NVIDIA admitted to the problem and they were the first to do so, as they should anyway.
    And if you actually bothered to read NVIDIA’s 10K from that period, you would see that they also said they would be working with OEMs to cover the costs of replacements and warranty services, to which they had a $150 to $200 million charge in this 10K, plus 2 more 10Ks with about $200 million each. In total, NVIDIA paid ~$600 million for the costs of replacements and warranty services of OEMs, over the mobile chip failures.
    The problem was pinpointed to affect some chips, but even those chips wouldn’t all fail. It wasn’t easy to detect and if those situations you mentioned occurred, it’s obviously unfortunate, but what else could they do ? These things are covered by warranties and as you mentioned, they were extended.

    If you think that Intel will send you or anyone else, a direct replacement to anyone with a motherboard that has been proved to suffer from this flaw, then you are not living in the “Real World”. They will use warranty services from OEMs, as any company would.

  17. “Because there are no third-party chipsets compatible with Sandy Bridge processors, all Sandy Bridge-based systems are potentially affected, including desktops, laptops, and DIY motherboards.”

    Oh? So their dickish strategy of artificially stifling competition in the chipset market "because their processors have integrated memory controllers" comes home to roost? Hmm. Color me not surprised, nor particularly sympathetic. I had the money to buy a higher performance, more efficient Intel-based system when I built my most recent desktop. I went with an 890GX/SB850 system with a Phenom II X6 1055T *precisely* because of that affront to fair competition. AMD made no such demand for the Athlon 64, which *also* included a high-speed interconnect and an integrated memory controller -- and we got better chipsets for it. Intel, on the other hand, seems content to stifle competition by locking competitors out of the chipset market for their processors with a damn *business decision*. How is that even legal, let alone not driving people up a wall?

  18. You’re confusing media defects with transmission errors. The error in the chipset will not cause data to be recorded to media incorrectly. It will only corrupt packets between the controller and the drive, which will be caught and discarded by the CRC algorithm.

  19. Intel stock was almost up today on the massive recall, stopping of production and sale, bad reputation, lower guidance… (If this was an AMD news, the stock would have dropped by 15% or more)


    Intel announced a couple of weeks ago that they have allocated $10 billion to buy back Intel shares…
    I’m sure they knew then of this bug and that’s why they increased their stock buyback by $10 billion.

    I think Intel will use that $10 billion fund to soak up the stock losses over the next 6 months.

    Anyways, Wall street was aware of this news weeks ago. Insider trading is still alive and well.

  20. Does anyone else think that Wall Street’s reaction hasn’t been strong enough? They’re expecting to lose $1 billion on this. That’s likely 15-20% of total profit for the year.

  21. “A lesser company would be tempted to just sweep it under the rug and leave it to OEMs to deal with the damage eventually through their warranty processes..”

    Wrong!! AMD did the same with the Barcelona TLB error. IMO Intel copied the strategy from AMD. Earlier, Intel was as tight-lipped as anybody about any of its silicon flaws. I may be wrong on this, but that is what I remember about pre-C2D era Intel.

  22. Intel is very conservative compared to the “other guy” who is attempting an integration of a massive GPU and a quad-core CPU while migrating to a new process node with brand-new hi-K metal gates…?

  23. I’ll give Intel credit for coming out and admitting there is a problem, but the rest baffles me.

    So the problem supposedly can be entirely worked around by simply using only the 6Gbps ports? How many laptops use more than 2 ports (1 HDD/SSD and 1 optical drive)? I can see vendors wanting to cut cost with 3Gbps, but since the chipset provides 6Gbps support already, I can’t see any cost savings to be had here, and it would simply give the competition a leg up (how would this sell? oh this model has 6Gbps support and that one only got 3Gbps).

    And as TR’s update details, desktop users don’t seem to have any trouble bypassing the issue…

    Then there is this:
    “Intel estimates that something like 5% of systems could develop problems over a three-year life span, assuming typical laptop usage”

    So how many systems will have seen this issue in say another month or two? Can't they just issue a warning now and how to work around it in the mean time, promise a fix will be forthcoming and customers will be made whole (i.e. pay for damages/free addon card/replacement/free upgrade to the Z68 or however sweet they think is needed for amount of damage done)? In short, the response seems overblown to me for such a supposedly contained problem that is unlikely to affect too many people and is easily worked around.

  24. If this was AMD, people would have cried for blood!! Because it is Intel, see how understanding and considerate all the reports are…bl***y hypocrites. Deep pockets can sure buy good sympathy.

  25. Well remember that if your bit error rate is like 1 in 1,000,000 that means that there are already about 8,000,000 errors on a terabyte drive at any given time. A lot of those errors go unseen because they are in unallocated space. Almost all of them get self-corrected during read or write cycles or isolated by marking the cluster as bad. It IS possible for errors to slow down the throughput without actually causing permanent errors on the

  26. Not for the initial product reviews. Those are generally coming out on or near the release date of the product, and they don’t have their hands on the items long enough for that.

    If they aren’t required to return products after the review, they could run some longer term reliability tests on selected products, but they couldn’t get everything — too many tests to run simultaneously.

  27. Yes, I was thinking this “design flaw” causing “slow degradation over time” was very similar to nVidia’s bumpgate. I still don’t trust nVidia’s circuit engineering, even though they seem to have regained the performance crown.

  28. “How issue was discovered: Last week customers started telling Intel that there was an issue. As Intel stressed the part, then Intel’s labs started seeing a failure to access ports 2 through 5. The Intel stress test simulated time passing and it showed that over time this issue could come up.”

  29. True enough, it’s not like it’s front page news on the New York Times. The avg person (95%) of North America won’t even know about this.

  30. Unless their whole chipset team missed it, I can’t see them firing all of them. Maybe management should fire all of themselves for not waiting 1 more month for bug searching.

    Either way, at least they are fixing it, and I didn’t buy one so I care even less. It’s one of the reasons I usually wait 1 yr for a new process to mature.

  31. What about accounting for inflation between 2009 and now with all the money printing like tomorrow since then? 😉

  32. New Coke wasn’t a conspiracy, either.

    Sometimes, big market-leading companies still make huge mistakes.

  33. …And I guessed wrong. The information was a bit misleading, but Anandtech has a pretty detailed explanation of what is going on..

    SATA2 PLL is messed up because of some legacy circuits that aren’t needed anymore. Silicon fix appears to be easy metal edits. But, if any of the SATA2 ports are used, the PLL is active and degrades over time… my glorious plan to switch the HDD to a different SATA2 port doesn’t help… once the PLL dies, all the ports die.

    At least it now sounds even less likely that CRC would let bad packets through..

  34. Ok, so I guessed wrong. If it’s the PLL, I can’t just switch around the hard drive to a ‘fresh’ port… if I use any of the SATA2 ports, the PLL is on and degrades over time.

  35. Anand has posted some insider info after further speaking with Intel:

    The source of the problem is a single transistor, apparently. I bet they never had problems like this in Star Trek.

  36. Man, I’d hate to be the one who let this error slip through.

    Kind of like being the one who wrote the bad script that created the Pentium floating point bug.

    In a chip as big and complicated as this, there’s never one person solely responsible for any given feature, so you can’t really point and say, “It was all their fault,” but I guarantee that there was some designer that originally made some small mistake that eventually turned into this billion dollar hit to Intel’s bottom line, and I bet that person (and their supervisor, and their coworkers) know who they are.

    Talk about the Monday to end all Mondays.

  37. Yeah, that about covers it.

  38. Monte Carlo is incorrect. There is a quick deterministic test to confirm/refute the result, so this is a different complexity class, namely Las Vegas methods.

  39. Not sure why you were downvoted? I would think so too. SATA uses a 32-bit CRC, which isn’t bulletproof.

    Assuming completely random corruption (which it probably isn’t…), the chance of a corrupted frame passing the check would be around 1 in 4 billion. Since each sata frame is 8KB, that’s one corrupted frame getting through for every 32 terabytes rejected.

  40. I think you mean i820, the RAMBUS native chipset that had some sort of translator for PC100, which worked very poorly. The i815 IIRC was just fine.

  41. The article mentioned a faulty layer in the chip’s construction; I imagine something is actually wearing away due to the heat or electrical activity, resulting in a transistor that starts giving false output. It would start intermittently, occasionally, but getting more regular as the wear gets worse and the false reading becoming more and more likely, approaching certainty.

    Disclaimer: Mostly conjecture from a layman, pieced together from the article and a few other comments already posted, but possibly buried in the flurry of discussion below.

  42. I bet Apple is glad they haven’t released the new PowerMac yet.

    On an unrelated note, I don’t think I’ve seen a dual-author post before. Nifty.

  43. Thanks, this makes sense. This is also a hot page right now and I didn’t think of that either until after your post.

  44. How can performance degrade over time? Can’t you just reboot your PC and start again? Are they saying the transistors inside the chipset are failing/degrading/going rotten?

  45. Well, there’s at least one way (other than any PR issues) this might help AMD.

    I’m in need of a new system. Ideally, I would wait for both Sandy Bridge and Bulldozer to be released, and see how they compare against each other before making a final decision. However, because of how well Sandy Bridge is performing, and given that I would prefer to get my new system *now* rather than waiting half a year or so, I had decided to go ahead and buy a Sandy Bridge system when some of the next gen SSDs had started showing up (expected to be in Feb, or so they say).

    Now, if I would be waiting until April for new Intel MBs, I’m suddenly much closer to Bulldozer’s release date, and I would be more inclined to wait for Bulldozer to be released, and see how well it performs before making a final decision (especially if there have been indications that it will perform well in tasks that interest me). I expect that I’m not the only one that’s decided to go with Sandy Bridge because of its earlier (than Bulldozer) release, and this artificial delay *might* affect that decision for *some* buyers.

    In addition to that, supposedly the next gen SandForce SSDs are supposed to show up around May, and if I’m already waiting for April, I might decide to wait until May in order to find out how those perform (partially depending on how the *other* next gen SSDs end up performing). And if I’m waiting that extra month, the Bulldozer release date would be even closer…

    On the other hand, I’m seriously considering ordering a current Sandy Bridge MB before they’re being pulled. The 6Gbps SATA ports would probably be enough, and there’s always the chance that I could get a replacement later…

  46. Well, outside of the behavior Vista SP1 and later exhibit forcing the software fix upon everyone regardless of whether you want it or not (lowering performance overall), no, I don’t think so.

  47. It’s a conspiracy. All your data will be corrupted by this Intel SATA controller unless you buy the $50 software unlock card from Best Buy, which will also turn your Pentium G6950 into a Core i3 530.

    (Spoiler: also sarcasm. I think.)

  48. Other people are rating the post down in the time between your loading the page and your “thumbs up”. Probably iq100 is one of those.

  49. I think it’s some sort of conspiracy.

    Now hear me out – I think he’s saying that you knew ahead of time but you withheld the information from your review, and he wants some sort of “freebie” (consideration, in his vernacular) for not exposing you. It’s…blackmail? Extortion? Heck, I dunno, I feel dumber just for typing this all out.

  50. OT: I just rated this up (or thought I rated this up) and it went from +6 to +5, not +7. I grant that my eyesight is going to hell in a handbasket but it just seems weird.

  51. When you are in post editing mode, look down at the lower right where it says “Formatting tips”.

  52. This is why I never buy bleeding-edge hardware. There are invariably kinks that need to be smoothed out.

  53. Yeah – I’m the one who got confused about what he’s saying…

    He does seem to mean that CRC may not protect the disk from corruption. I’m sure theoretically it’s possible, I just don’t know how possible.

  54. [quote<]When iq100 says there is "data corruption" despite what Intel says, I think he means data being read from the disk and passed through the SATA link can get corrupted because of this degradation issue.[/quote<] If that's what he meant then he doesn't understand the issue at all. But I still think he's trying to claim that there is a potentially-undisclosed data corruption risk to the drive.

  55. The masses that don’t have the knowledge and understanding to make those conclusions think AMD is something you find in a bottle of Centrum.

  56. You’re just upset that your stones are tiny compared to some others. 🙂

    Enjoy the learning, whenever you decide to start.

  57. You’re one of them really sensitive types, eh? Ugh… get thicker skin, man. His was the slightest of slights, so it’s pretty annoying that you’re trying to make it sound like such an outrage. Gross.

  58. Yes, but you’re one of the smart ones that can read between the lines and figure out what happened and why.

    I’m talking about the masses that don’t have the knowledge and understanding to make those conclusions. They’ll just think “intel stuff gets recalled, maybe I should buy AMD, I don’t remember them ever recalling anything”

  59. You know, you could’ve been less lazy and actually read the article before bothering everyone with questions that would’ve been clearly answered in the article..

  60. The scary part is that the 6Gbps port doesn’t have problems.

    Could it be that the 3Gbps port was copied directly from Ibex Peak…? If that’s the case, P55/H55 boards might have this problem on [i<]every[/i<] SATA port? This just hasn't been announced yet..? I sure hope not... I would hate having to replace the mobo on my Clarkdale rig

  61. Wuh? That’s the disclaimer, man! If I actually made a comment before reading without saying “before I read”, that’d be daft!

  62. I guess you did NOT read this:
    “Use this to learn about the details of design and testing methods.
    Not to hurl stones at each other.
    I am here for learning and fun.”

  63. I think it was mostly the first three words in your comment “Before I read”.

    Yeah… Don’t do that.

  64. My argument is that Intel is asserting one thing while iq100 is asserting another; and of those two parties, one knows more about the issue than the other. And maybe I just missed a significant part of the story above, but it wasn’t iq100 that told the world about Intel’s faulty chip. He’s got a pretty high bar of expectation to meet with evidence for his assertions before his assertions are treated with anything other than dismissal.

  65. Heh, for the companies hoping to piggy-back on enthusiasts building new SB systems this has to suck (ironically, including AMD for graphics cards).

  66. I asked because I thought you might have the technical and marketing experience to make a recommendation. Sometimes it is easier to get a full refund earlier in such an announcement cycle. Some of us have more than one system, in which case my advice would be to immediately arrange for an in-writing ability to return, including pre-payment of shipping charges.

  67. Me neither.

    [url<][/url<] As I write this, all the Gigabytes and Intels are already history. EDIT: Sandy Bridge processors de-activated, too. Example: [url<][/url<]

  68. A recall of every Sandy-Bridge compatible mobo. Wow. That’s all I have to say, wow.

  69. Yeah; I would personally keep the drives in the two SATA3 ports. If I have more than two, I’d keep the others in SATA ports driven by third-party chips.

    If I’m stuck with Intel SATA2 ports, I’d just keep the extra drives in those, and switch to a “fresh”, unused port at the first sign of data errors… that should buy me some 2-3 years of extra lifetime.

    EDIT: the above is completely wrong – Anandtech has new info. If one of the SATA2 ports is failing, the others are failing too.

  70. lol …Hey guys, this is NOT going to be decided on by voting.
    My only point is that until and unless Intel and/or independent testers/designers share actual data rate curves, based on analytical or monte carlo models, the jury is and should be out.
    Use this to learn about the details of design and testing methods.
    Not to hurl stones at each other.
    I am here for learning and fun.

  71. No, I wouldn’t return the mobo yet, since that would make the system totally useless. I’d wait and see how the replacement programs are going to work and move my HDDs to other ports, as I recommended above.

    Simple answer, right? Why do you ask? Seems like you think you’re playing “gotcha” somehow.

  72. NewEgg seems to be in the process of pulling their H67/P67 boards.

    Down to 9 as I write this, instead of the ~60 models available before.

  73. EDIT: Ok, I guess I misunderstood the argument – sorry about the confusion.

    I don’t have solid knowledge of that part, but I would guess/think that CRC error detection would prevent bad packets from being written to the disk… error checking logic would detect bad packets and request a resend.

    Probably there is some threshold for a number of bad packets that makes the link just give up and shut off, but I don’t know the details of SATA logic.

    Also, mathematically, I bet there is a worst-case, corrupt bit sequence combination that can pass through the CRC error checking undetected and be written to the disk, but I would guess the probability of that happening is so insanely low that it’s more likely the SSD/HDD itself has internal data corruption.

  74. Nah, I come from the Real World™. I know multiple people who were affected by that problem as it manifested with the 8600M chip, either on pre-unibody MacBook Pros or on the Dell XPS M1330. For all of them, the situation was the same: defective chips were being replaced with defective chips. One particular friend had her laptop’s motherboard replaced three times in a period of a year or so. So no permanent fix, and the only concession that was offered was a warranty extension.

    NVIDIA did fix the problem when they finally admitted it by changing the materials used, but only for their then-current 9x00M chips, and without offering any advance replacement (i.e., it had to break first to be replaced). And of course no fix was offered for those with 8x00M chips, they were simply screwed. Intel, on the other hand, is [i<]proactively[/i<] offering replacements to people long before they are actually affected by the problem. So if you think it's all the same, then I guess we simply cannot agree.

  75. I think there are some misunderstandings here…

    When Intel says “data corruption”, they likely mean that the data saved on your SSD/HDD doesn’t get ‘destroyed’ by this issue.

    When iq100 says there is “data corruption” despite what Intel says, I think he means data being read from the disk and passed through the SATA link can get corrupted because of this degradation issue.

    The link not working right means data doesn’t get passed through to the chipset/CPU correctly, but it doesn’t mean the data on the SSD/HDD is dead. Switching to an unused SATA port would likely fix the issue (as this port hasn’t been degraded over time due to active use).

  76. Yes, flip-mode. I was doing exactly what you surmised: “just asking if you’d (TR) recommend returning sandy bridge mobo if you’ve got one.”

    There may be other reasons why Damage resorted to ad-hominem statements.

  77. Thanks, that’s good info, however it doesn’t speak to the assertion that this degradation could produce data integrity errors on the drive. According to Intel and my limited familiarity with the topic, the controller begins producing an increasing number of garbage data packets, tying up the channel’s processing time and bandwidth as these packets are flagged, discarded, and re-sent. Eventually the attached device says “screw it, nobody in this conversation is answering the original question, I’m out”, and disconnects.

    Do you have any info about this problem which would logically lead to corrupted data packets being accepted and written by the target disk, contra Intel’s public statements so far?

  78. Damage wrote>”… I think you misunderstood it … Not that the dude is a font of clear and consistent thinking …”

    Please do NOT resort to ad hominem comments.
    If you disagree with a specific detail, then post the detail you disagree with and why.

    flip-mode already pointed out your error, when he wrote>
    “… I think you misunderstood his question. -1 to stick it to the man! I think he’s just asking if you’d recommend returning sandy bridge mobo if you’ve got one.”

    That is all I was doing: asking if you could honestly share with the TR community ‘if you’d recommend returning sandy bridge mobo if you’ve got one.’

    I do NOT know who Damage is. If it is Scott Wasson, then in the interest of full disclosure, it should be stated that I communicated via email to Scott regarding what I thought was his flawed article:
    [url<][/url<] I do NOT mean to get this article OT, but if Scott is going to make ad hominem statements here, I think TR readers benefit from full disclosure.

  79. What? This is your argument?

    Overall, I don’t see anything from you anywhere in this thread that shows some actual understanding of the issues. Maybe you do know what the issues are, but don’t let us know.

  80. The problem here is the degradation of the physical I/O circuits over time, presumably caused by high gate oxide stress or electromigration issues due to a miss in circuit reliability validation.

    The analytical/Monte Carlo/other signal integrity analysis approaches don’t capture this kind of stuff – they focus on channel losses, inter-symbol interference, noise/jitter modeling etc. Circuit degradation over time isn’t usually taken into account, because the basic assumption is that the circuit is reliable and won’t degrade too much.

    If a design mistake causes the circuit to start failing more and more over time, no amount of error checking can save you. If every other bit is wrong, your link is done.

    The point here is that if these I/O circuits are active, because of bad design they degrade much faster than “good” I/O circuits would, and bit error issues would start happening in 2-3 years instead of 10-15 years.

  81. First, as SPOOFE noted, you may be confused on what the tick-tock strategy is and does within Intel.

    Second, there is no design process in existence that can eliminate all possibility of failure or bugs. For you to point at A, and say it must automatically indict B, is very poor reasoning unless you’ve got some inside knowledge to share about what went wrong, or else have a much deeper theoretical analysis of how A and B are intertwined.

  82. Yeah, his posts have a strangeness to them. I was giving him the benefit of the doubt and taking the chance to tease you a little. No harm intended.

  83. I’m still reading you as “Unless you can prove that you’ve stopped beating your wife…”, or basically, FUD.

    Per the post update, Intel is claiming that there is no data corruption risk, only degrading performance culminating in device disconnects. The legal ramifications for lying at this stage, particularly when they’ve already announced a recall that will cost them roughly a billion dollars, are large.

  84. Read his other posts to give his question context. I think you misunderstood it. Not that the dude is a font of clear and consistent thinking.

    Also, note we haven’t yet recommended a single *system* based on those CPUs. We just recommended CPUs and motherboards. I don’t think his question makes sense, taken literally.

  85. Blah blah blah. As soon as you typed “IMHO” you should have stopped and bashed your head into the wall until you lost consciousness; that would have contributed more to the discussion than your ignorant opinion presented as fact.

  86. Right about what? I know what analytical and monte carlo testing are. His use of the words in that context was non-responsive. If he both (a) understands what he’s talking about and (b) seriously believes there is a bit-error combination that can get past a 32-bit CRC check, then I presume he can characterize that for us with some basic math.

  87. Not buzzword spam. It is serious food for thought. Until the details of actually tested data error rate curves, and/or their analytical models, are provided by Intel, independent parties will NOT be able to determine whether Intel’s (initial) claim that data will NOT be corrupted is or is NOT valid. After all, Intel is stopping production of the chipset! IMHO they would NOT do this only because of slightly increasing correctable ‘data errors’. In theory, alpha radiation (and similar natural phenomena) causes random data errors. If the error rates stay low, these are caught by ECC, CRC, and other error detecting/correcting mechanisms. Listen to the Intel audio broadcast:
    [url<][/url<]{3db17df9-9491-4a3a-966c-4b2527a8a999}&RGS=3&IndexId= There is a lot of 'hemming & hawing'. Until an analytical or monte carlo derived bit error curve can be provided, the jury as to the possibility of loss of actual data remains out.

  88. I think you misunderstood his question. -1 to stick it to the man! I think he’s just asking if you’d recommend returning sandy bridge mobo if you’ve got one.

    [spoiler<]I am so-o-o going to get the vote downs for this one! LOL[/spoiler<]

  89. Huh? This makes me feel that if there were a problem with any other Intel products, they’d identify the problem before anyone else and immediately get to work fixing it, damn the cost. How on Earth do you arrive at any other conclusion?

  90. Yes – any way you look at it, this is FAIL, and will affect consumer perception of Intel’s reliability, regardless of how quickly they fix this, or how this only affects a couple of SATA ports on a single chipset line. People will start wondering about all mobos, CPUs, SSDs… anything with an Intel logo on it.

    Somebody’s gonna get f-i-r-e-d.

  91. [quote<]They seem determined to stick to their tick tock strategy which is supposed to help avoid cock ups like this[/quote<] I don't think you know what you're talking about. Their tick-tock strategy is designed so that they're never dealing with both a new arch AND a new process at the same time. And it's a strategy for their CPUs. This problem affects a manufacturing process that is several generations old, and oh yeah, it's on the motherboard. It's a completely different realm than their CPU development.

  92. I liked the “Dual Core Optimizer” errata solution for Athlon X2. That was a fun evade. 😉

    Phenom also had that major issue with CnQ and Windows XP, where the cores would clock down independently and Windows would shift active threads to the 800 MHz cores. That never got fixed in XP even with Phenom II and you need to use 3rd party utils to lock the core clocks together. I’ve never figured out why they didn’t address that in their CPU driver.

  93. Been too busy writing the story above and getting information out to folks ASAP to sit around, stare at my belly button, and contemplate the ethical dilemma of having recommended a Sandy Bridge CPU whose associated chipset had a then-unknown flaw in a SATA port. Had I done so, I don’t think I would be feeling terribly guilty, though, for not being omniscient.

    Voted -1 for trolling and/or lack of clarity of thought.

  94. This is buzzword spam. If you’re simply appealing to the Infinite Monkey Theorem, then we don’t have much to work with here.

  95. Definitely, that’s because you’re in a position of knowledge and have the time to swap out the board when you get it replaced, and then deal with Windows 7 de-authorising itself because the hardware has changed, and so on, and so forth. Hassle.

    If you’ve got a system already, and you have the option to not use the SATA2 ports, then keep it, especially if you don’t have a backup system because you sold it or handed it on because you got the new one. Wait until there’s plenty of replacements available (later this year) and then RMA it – especially if you can get a replacement board before returning the existing one.

  96. Agreed, an example would be lovely. As SATA uses a 32 bit CRC, I’d love to know the odds of a few random bits being able to create a pattern that is able to also match that big of a CRC.
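For a rough sense of those odds, here is a minimal Python sketch. It rests on a simplifying assumption that does not come from Intel or the SATA specification: a corrupted frame is treated as random with respect to its 32-bit CRC, so each corrupted frame has a 1 in 2^32 chance of slipping through. The function names are illustrative only.

```python
import math
from fractions import Fraction

CRC_BITS = 32  # CRC width mentioned in the comment above


def undetected_probability(crc_bits: int = CRC_BITS) -> Fraction:
    """Chance that one randomly corrupted frame still matches its CRC."""
    return Fraction(1, 2 ** crc_bits)


def expected_corrupted_frames_until_miss(crc_bits: int = CRC_BITS) -> int:
    """Mean number of corrupted frames before one passes (geometric distribution)."""
    return 2 ** crc_bits


def probability_of_any_miss(corrupted_frames: int, crc_bits: int = CRC_BITS) -> float:
    """Chance that at least one of `corrupted_frames` corrupted frames passes the CRC."""
    p = 1.0 / 2 ** crc_bits
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(corrupted_frames * math.log1p(-p))


if __name__ == "__main__":
    print("per corrupted frame:", undetected_probability())
    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        print(f"{n:>13,} corrupted frames -> {probability_of_any_miss(n):.3e}")
```

Under this model, a million corrupted frames still leave only about a 1 in 4,300 chance that any of them gets past the check. A real CRC does better than the random model for short error bursts, since a 32-bit CRC catches every burst of 32 bits or fewer.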

  97. There are two alternative ways this is usually done. One is called ‘monte carlo’. The other ‘analytical’. I will need to work on this. Is this going to be a paid effort? (lol).

  98. Yeah, I just picked up an Asus P8P67 Pro (and 2500K) as well from MicroCenter.
    I am going to think hard about returning it.
    The reason is I don’t know what form the solution is gonna take.

    *Is Asus gonna send me a “repair” kit?
    *Will I have to send my board in first, THEN get one back, leaving me without a system for a while, or will they do an advance RMA?
    *If advance, do I have to have a credit card?

    What the hell?

  99. I guess I have to agree that the PR sucks. Although, given that Intel’s chipsets usually provide better performance in nearly every aspect (when there isn’t a problem with the SATA ports), I can’t imagine this affecting them for too long.

  100. If you had purchased one of the 2nd gen systems you recommended, would you return it, and demand full compensation including shipping charges?

  101. I’ve got a P8P67 Pro and an i7-2600k on my desk at home and I am still going to dig in and install them in my desktop tonight. I’m going to attach my drives to the affected ports and I’m going to watch that beast chew through some 4.5GHz goodness on my big Nikon RAW files. This is a no-brainer. When Asus finally gets some new silicon in April or whenever, I’ll RMA the board and get a shiny new one. This seems like another hiccup to me.

    I agree with vvas, kudos to Intel.

  102. Well I guess what I’m getting at is that customers are going to be more wary of Sandy Bridge in terms of reliability.

    They are more likely to give Bulldozer a chance. Intel’s execution of SB was nothing short of incredible. This chipset issue has destroyed their excellent launch of SB. I am sure they will recover fine but it does open the door for AMD and Bulldozer.

  103. [quote<]If the rate gets high enough, it is conceivable that a wrong bit pattern will get thru CRC checks.[/quote<] Considering the steps that are required to get a SATA data packet successfully transmitted, this claim is going to need a defense that includes an example of how such a thing might occur. If you've got one, I'm genuinely curious.
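One way to picture such a case, as a hedged sketch only: a CRC misses corruption exactly when the noise turns one valid frame into a different valid frame. The Python below uses `zlib.crc32` as a stand-in for the link's CRC; SATA's actual framing, seed value, and line encoding are not modelled, and the payloads and helper names are made up for illustration.

```python
import struct
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a little-endian CRC-32 of the payload as a 4-byte trailer."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def frame_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload) == struct.unpack("<I", trailer)[0]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Return a copy of the frame with a single bit inverted."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


sent = make_frame(b"sector 42: AAAA")
other = make_frame(b"sector 42: AAAB")  # a different, equally valid frame

# Ordinary noise: count single-bit flips that the CRC fails to catch.
single_bit_misses = sum(
    frame_is_valid(flip_bit(sent, i)) for i in range(len(sent) * 8)
)

# Worst-case noise: exactly the bit pattern that maps `sent` onto `other`.
error_pattern = xor_bytes(sent, other)
received = xor_bytes(sent, error_pattern)
bits_flipped = sum(bin(byte).count("1") for byte in error_pattern)

if __name__ == "__main__":
    print("single-bit flips that slipped through:", single_bit_misses)
    print("crafted pattern accepted:", frame_is_valid(received))
    print("bits the noise had to flip, all in the right places:", bits_flipped)
```

The crafted pattern is accepted, but only because the noise had to corrupt the payload and its CRC trailer in a precisely matching way. Random line noise has to land on such a pattern by chance, which is where the tiny probability comes from.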

  104. Yeah, I don’t know how fast they’ll get pulled. I guess that will determine how many sell. Like I said though, I for one wouldn’t be afraid to get one and have it replaced later on. It’s not a perfect situation but some people can live with it.

  105. A CPU isn’t much use sitting there without a motherboard.

    Presumably Bulldozer will ship with motherboards – hopefully without problems that require a recall – at around the same time that Intel is getting fixed chipsets out the door in quantity.

    The late February shipment will be very limited – it’s a metal layer fix, so they will have some hot-fix wafers ready to go with the fixed metal layer masks very soon. Once these are done, there will be a waiting period until the new stepping grinds up a gear – hence April before new chips appear in quantity.

  106. Hey, the keyword there is “eventually”.

    FWIW I still think it was a fiasco, because it took them lots of bad press before they finally caved in and did what they should have done in the first place, which is to make a new revision of the phone and offer refunds to people.

    So yeah, they made things right… all the while kicking and screaming.

  107. We’ve updated this story and then added to the update multiple times throughout the morning. If you’ve not read the latest revision above, I’d recommend doing so. We now have info about the fate of dual-core Sandy Bridge systems, and we’ve added a possible workaround for those already using SB systems themselves.

  108. But will there be any SB motherboards to buy soon? I assume that the OEMs are now going to be getting them shipped back from retailers (as it is a full recall by Intel, and affected products in the channel seem ripe for not actually getting sold to customers), then either junking them, or removing and fitting the fixed chipset *when* it becomes available (ooh, they love having inventory just hanging around waiting to get fixed).

    Maybe some people at home will keep them going, knowing that if they have issues they can get it fixed later on. I’m sure Intel will not be getting them actively recalled until then, apart from implementing an extended warranty on this issue (you would hope!).

    And because two SATA ports work, many laptops won’t be affected.

  109. I’m not sure why people keep saying this helps AMD and BD. I guess it will slow sales for a little bit, but that’s about it. This doesn’t affect the CPU, either SB or BD, at all. Care to explain your reasoning there Tejas?

  110. One concern is people that put it off and already have systems built might “forget” about ports 2+ and then later on populate these ports and experience issues.

    Or if you resell the MB and an unsuspecting buyer populates these ports unknowingly.

    Or just the stigma of this chipset leaves people not buying used parts later on. It might be better for resale just to avoid this chipset altogether.

  111. That’s just insane. This has nothing to do with CPU performance, and even less to do with Bulldozer considering it won’t even ship until this little crisis is resolved.

  112. [quote<]Intel stockholders might not need to cut and run just yet, though. Intel claims it can make up for the lost revenue by year's end, and in the same press release, the chipmaker goes on to say it now expects first-quarter revenue to be in the $11.3-12.1 billion range, an increase from the previous forecast of $11.1-11.9 billion. Gross margin will, however, be understandably lower than initially expected (59-63% instead of 62-66%).[/quote<] Gee, I wonder why revenue is higher. Maybe because they've had time to dump a lot of SB platforms in wake of positive reviews?

  113. Intel alludes to a rising error rate. If the rate gets high enough, it is conceivable that a wrong bit pattern will get thru CRC checks.

  114. It’s not going to affect anything on the drive though. I’m not saying it’s a good thing man. It’s just not going to hurt what you have on the disk.

  115. Sure DancinJack … it is like ‘dancing around the issue’ … like telling your girlfriend’s parents that she is partially pregnant. To me, once there are data errors, all bets are off.

  116. The bit errors will be caught by the CRC error checking and the packets resent. It should just cause a slowdown until eventually no good packets (or not enough) get through and the device simply disconnects.
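The slowdown described above can be sketched with a simple retransmission model. This is an illustration under stated assumptions, not Intel's analysis: bit errors are independent, every bad frame is caught and resent, and the frame size and nominal rate in the demo are hypothetical placeholders.

```python
import math


def frame_error_probability(bit_error_rate: float, frame_bits: int) -> float:
    """Chance that at least one bit in a frame is hit, assuming independent bit errors."""
    if bit_error_rate >= 1.0:
        return 1.0
    # 1 - (1 - ber)^bits, computed stably for very small error rates.
    return -math.expm1(frame_bits * math.log1p(-bit_error_rate))


def expected_transmissions(frame_error: float) -> float:
    """Mean number of sends per delivered frame when every bad frame is resent."""
    return math.inf if frame_error >= 1.0 else 1.0 / (1.0 - frame_error)


def effective_throughput(nominal_mb_s: float, bit_error_rate: float, frame_bits: int) -> float:
    """Useful throughput left over once retransmissions are paid for."""
    return nominal_mb_s * (1.0 - frame_error_probability(bit_error_rate, frame_bits))


if __name__ == "__main__":
    NOMINAL_MB_S = 300.0   # hypothetical usable rate of a 3Gbps port
    FRAME_BITS = 8192 * 8  # hypothetical frame size in bits
    for ber in (1e-12, 1e-8, 1e-6, 1e-5, 1e-4):
        p = frame_error_probability(ber, FRAME_BITS)
        print(
            f"BER {ber:.0e}: frame error {p:.4f}, "
            f"sends per frame {expected_transmissions(p):.2f}, "
            f"throughput {effective_throughput(NOMINAL_MB_S, ber, FRAME_BITS):.1f} MB/s"
        )
```

In this model throughput stays near nominal while errors are rare, then collapses quickly once most frames need resending, which fits the "slowdown, then disconnect" picture in the comment.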

  117. Yeah, I wouldn’t say corruption. Errors for sure. Maybe someone else can elaborate/correct me.

  118. You can hear the explanation/details of the problem direct from Intel here:

  119. They made things right? When?
    When they wanted to charge for the iPhone case? Or said it was just a software problem, where the number of bars was being rendered incorrectly? Or when they finally admitted the problem, but proceeded to bash other phones for the same problem (by holding them in the most ludicrous ways possible), so that the iPhone’s problem seemed less important?

  120. Since Intel stated>”… high bit-error rates on those ports …”,
    it is hard to believe that this cannot cause data error/corruption to attached disk drives, including SSD.

  121. NVIDIA didn’t sweep anything under the rug. They were the first to talk about the problem…

    [url<][/url<] [url<][/url<]

    But by mentioning "bumpgate", it's pretty clear where you're coming from: Lala land of semiaccurate 🙂

    Plus, how do you think Intel is going to solve this? The same exact way. It's right there in the article's text: "For folks who have already crossed the Sandy Bridge, Intel adds that it will 'work with its OEM partners to accept the return of the affected chipsets,' and it plans to 'support modifications or replacements needed on motherboards or systems.'" Working with OEM partners essentially means using the warranty process, while making sure that the OEMs are aware of the problem for which they might be replacing a product.

  122. “Self-builds, desktops and high-end laptops can’t use it for a while.”

    Exactly. Not Bobcat competitors, not OEM breadwinners. Not significant, even all combined.

    Again, I just have to question how convenient it is. OEMs can keep their laptops coming and the pressure stays on AMD, and it only costs Intel some of their ginormous stash to keep it growing.

    The current iteration of Sandy Bridge was pitted against Bulldozer from the start. Intel doesn’t care. The “enthusiast” desktop market is not their concern.

  123. I wouldn’t be afraid to buy a SB motherboard today. I know it’s a hassle a lot of people won’t want to deal with, but most people buying won’t even know this existed. SB will sell. SB motherboards will continue to sell. Some will be replaced and some won’t.

  124. Me too, I just recently bought two P55 Motherboards and I’m glad that I did, P67 will wait a while.

  125. Yeah, but now there’s a situation where customers want Sandy Bridge, and it won’t be available until April or May, except possibly in certain laptop configurations that will only use two of the SATA ports.

    Self-builds, desktops and high-end laptops can’t use it for a while. And in that while it is likely that AMD will release its consumer version of Bulldozer, and thus make things interesting for the consumer to choose.

  126. Before I read, does that flaw have anything to do with the fact that the P67 cannot access the integrated graphics? [spoiler<]The preceding was sarcasm.[/spoiler<]

  127. Continuing with your ‘conspiracy theory’: maybe this is just a step in the Internet kill switch political efforts. (lol).

  128. Going through a recall is never a good thing for any company and I wish Intel all the best for fixing the problem and getting the excellent product that is SB to the consumers hands.

  129. Oh, believe me, it’s tough to be more cynical than me. :p

    That’s why I’m saying not to give them any credit lol. I expect that they still could have done more, within reason, and this was a choice made as the result of some sort of well calculated balancing act.

    This is Intel we’re talking about here. Don’t you find it a bit fishy that they already have these costs worked out? They are made of money. If that’s all a decision will cost them, then that is the route they will go.

    But what keeps that money coming? Market share. The alternative was not being first to market with CPU+GPU chips, and not just ceding momentum to AMD, but then releasing more expensive chips that serve largely similar purposes AFTER AMD. That doesn’t look good. OEMs already seem to have taken a liking to Bobcat. They’d have been bleeding market share right out the gate.

    They aren’t in full swing with Sandy Bridge yet, so this isn’t a catastrophe. It’s as minimized of a problem as possible, and they still managed to completely overshadow AMD’s efforts. It works out entirely in their favor. So there’s my conspiracy theory harharhar.

  130. This seems to be the only thing in the short term that would allow the majority of folks to press on. I have an SSD, another HDD, and a BR/DVD Drive. I could get a PCI-Sata card as you suggested to tide me over if the performance of the BR/DVD drive is that abysmal. It has been noted though that it’s only a possible degradation and “long term” that may never show up.

    I ordered last night so I’m already waiting and sweating it out.

  131. This is a gift to AMD and Bulldozer.

    They have no excuse to not put up a decent fight against Sandy Bridge now.

  132. Yep… were it Apple, they’d just say “don’t plug it that way” 🙂

    Yes, I know Apple eventually made things right regarding the antenna thing. I’m just joking.

  133. [i<]"A lesser company would be tempted to just sweep it under the rug and leave it to OEMs to deal with the damage eventually through their warranty processes."[/i<] You mean like how Intel handled the floating point bug initially?

  134. Was there ever a single case of the TLB bug affecting a system in the wild?

    Failure rates were far below 0.1% of all systems. This affects 5-15%. Big difference. Also Intel had a similar TLB bug that affected corner cases but they didn’t publicise it because it would effectively never come up just like all the other CPU errata on the list.

    Have you seen Intel and AMD’s errata lists? There are significant bugs in the designs that won’t affect anyone really. AMD just make the mistake of announcing it, if they’d kept quiet no one would have experienced it.

  135. Renaming the fixed version “Bricked Bridge” might be a step in the wrong marketing direction (hehe).

  136. It pretty much means that the only Sandy Bridge products you’ll be able to buy in the next couple of months will be laptops using SATA ports 0 and 1 only.

    I guess there will be a new MacBook Pro then!

  137. My point was that their strategy is flawed. They seem determined to stick to their tick tock strategy which is supposed to help avoid cock ups like this because it is very incremental with upgrades and improvements.

  138. I don’t think the launch date would have made much difference. I mean, they only moved it a few [i<]days[/i<] earlier; while issues such as this one take [i<]weeks[/i<] to address. As for the general sentiment, I guess I'm more cynical than you: I almost [i<]expect[/i<] companies to screw their customers, and thus I'm positively surprised when they don't. ;^)

  139. Yes, but what about eSATA ports? Just because 0 and 1 are being used internally doesn’t mean the others are walled off from the user. Particularly on high-end products where Sandy Bridge is being introduced.

  140. Actually, that $300 million is lost revenue from stopping production. Another $700 million is estimated for the repair/replacement costs of existing shipments. That’s a total of $1 billion. The $475 million figure you quoted for FDIV would be around $660 million in 2009. So this looks to be way more expensive.
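The arithmetic in that comparison can be laid out as a small Python sketch. All figures are the ones quoted in the comment above, in millions of US dollars; the inflation factor is simply what the commenter's two FDIV numbers imply, not an independently sourced statistic.

```python
def total_impact(lost_revenue: float, repair_cost: float) -> float:
    """Combined hit: revenue lost while production is halted plus repair/replacement."""
    return lost_revenue + repair_cost


def implied_inflation_factor(nominal: float, adjusted: float) -> float:
    """Multiplier implied by a nominal figure and its inflation-adjusted version."""
    return adjusted / nominal


# Figures quoted in the thread, in millions of US dollars.
LOST_REVENUE = 300
REPAIR_COST = 700
FDIV_NOMINAL = 475   # FDIV cost as originally quoted
FDIV_ADJUSTED = 660  # commenter's estimate in 2009 dollars

if __name__ == "__main__":
    chipset_total = total_impact(LOST_REVENUE, REPAIR_COST)
    factor = implied_inflation_factor(FDIV_NOMINAL, FDIV_ADJUSTED)
    print(f"chipset flaw total: ${chipset_total}M")
    print(f"implied FDIV inflation factor: {factor:.3f}")
    print(f"chipset flaw vs. adjusted FDIV: {chipset_total / FDIV_ADJUSTED:.2f}x")
```

On these numbers the chipset flaw comes out at roughly one and a half times the adjusted FDIV figure.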

  141. Below is (allegedly) Intel’s statement of the defect, in a more exact manner than TR’s summary. Notice that this can affect optical as well as hard and SSD drives.

    From Intel;

    “This is an issues with the 6 series chipset (Cougar Point) impacting SATA ports 2-5. If you are using ports 0 and 1 there are no issues. The issue was root caused and a new stepping (B2) is coming end of March.

    If you have purchased 6 series platforms, call your supplier to return them (if you are intending to use SATA ports 2-5) All the ODM’s and OEM’s are notified and are being notified and they can give you more detail (or you can use me if you have more questions)

    I will keep you posted with any new information I get on this chipset.”

    Keep in mind that the SATA port numbering starts at 0, so they are saying it does not affect ports 0 & 1, which are all that most laptops use. Even the Sager NP8170 uses ports 0 & 1 for the hard drive bays, and only uses ports 2-5 for the optical drive etc.

  142. It’s not low impact if it turns into a ton of lawsuits and OEMs drop them in favor of the very serious competitors that are starting to line up, with pitchforks in hand.

    They announced this pretty much, what, two weeks after these started showing up in retail? Someone probably already knew about it beforehand.

    The trouble is that they moved the Sandy Bridge launch up at the last minute, and then this happened immediately after.

    If someone goes running through a china store and knocks something over, but pays for the damage, should they be congratulated?

    Let them handle the problem they created, and just leave it at that.

  143. Some people might still prefer the Intel over the notorious Nvidia chipsets, if I’m remembering all those complaints correctly (but I never had an nForce board).

  144. How can you possibly know that? Are you an Intel employee or contractor with inside knowledge of how this erratum occurred?

    Bugs happen. They will happen even in a conservative development strategy, and once in a while, something will get through to the market. All Intel’s PR says is, “In some cases, the Serial-ATA (SATA) ports within the chipsets may degrade over time.” What that means remains to be determined, but it’s not an everyday type of problem where the issue shows up quickly and repeatably.

  145. I don’t see how that fits here. They’re the LEAST conservative in quite possibly the entire industry with die shrinks, which is a major part of that.

    They were likely not being conservative here, either. They seemed to be rushing Sandy Bridge, and in doing so, what likely happened is that they overlooked a very minor change to a chip that has not been changing much.

  146. Brick Bridge? Granite Bridge?

    Better than the erosion the Sandy Bridge brand will suffer because of this.

  147. Real kudos to Intel on how they’re handling this. Even though it seems to be a relatively low-impact problem, they’re being as proactive as they can, taking the PR hit and monetary hit in the process, so that their customers won’t be affected. A lesser company would be tempted to just sweep it under the rug and leave it to OEMs to deal with the damage eventually through their warranty processes. I’m thinking of NVIDIA’s “bumpgate” here (which some friends of mine are still affected by); the contrast is striking.

  148. Some details: "ALL six series chipsets both desktop AND mobile are part of this. ALL 6 series chipsets have been recalled and are being pulled." "the issue is with SATA port 2-5" Are these the SATA2 ports, not the SATA3 ports 0-1? If so, I bet Intel wish they'd gone straight to six SATA3 ports, eh?

  149. But Intel refuses to do too much at once hence the tick tock and no USB3 chipset support yet. Their strategy is ultra conservative, making changes in stages one by one. But it hasn’t paid off.

  150. Frankly I suspected failure for the Sandy-Bridge processor from the beginning.
    Who in their right mind would build a bridge out of sand (lol)?

    In order to separate the failed design, I think the name Sandy-Bridge should now be withdrawn, and the re-designed product be called Bridge, or New-Bridge, or at least New-Sandy-Bridge. I know I would NOT drive over any structure whose label remains Sandy-Bridge.

  151. Until Intel reveals/shares the details, all is speculation as to what it does/would take to reveal the defect.

    Until TR prominently withdraws its recommendations for the Sandy Bridge products it has already tested, Intel will have less incentive to share the details in a manner that might allow TR-like testers to detect such defects in the future.

  152. What I’m especially interested to hear is if this bug is in any danger of damaging the hard drive itself, or just the chipset SATA controller. Intel are in for a world of hurt if this ends up nuking data.

  153. That won’t work unless it breaks or degrades within hours of testing. Intel themselves would spot that.

  154. At least this time around, they did not wait until millions of chips were in customers' systems before finding the problem and putting a fix plan in place. So that's why it is only going to be $300 million. The FDIV bug cost $475 million in 1995 money (source: Wikipedia). Plus, the PR fallout is going to be much milder than last time since they are now being proactive. I happen to have a copy of "The Pentium Chronicles", which contains a few pages on the FDIV affair and how Intel realized that the game had changed in terms of silicon errata. Looks like the lessons were not completely forgotten. And that's not a bad thing IMO.

  155. The money is the biggest thing here. The average Joe buying from Best Buy won’t even know this happened. ~$1B in Q1, maybe half that in Q2? Not that they can’t deal with it. As long as people don’t get screwed on returns/exchanges, I think Intel handled this just fine.

  156. Damn. Sandy Bridge was the first platform I was ever an “early adopter” on, any time before and I’d agree with you. I figured that Intel’s been on a winning streak on everything since the P4, so What Could Possibly Go Wrong? 🙂

    That said, zero problems here so far, and I push the disks quite hard (C300 for the OS and video editing on the HDDs).

  157. Hmm, ‘defect’ might not be the most accurate word to use WRT the design tools. Think of this like a bug in a compiler. The C code might be correct, but the compiler makes an executable (the chip design) that has a problem. The executable is broken, at least in some situations, but the real flaw is in the compiler.

    I wouldn’t say we’re doomed. These kinds of glitches are very rare and can be very, very hard to find. Fortunately, they are very rare. 🙂 They require a flaw in both the logic synthesis code and the timing analysis code.

  158. No, just leave the test running overnight. Compare the performance from the first minute to that of the last minute.
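
    The comparison described above could be sketched roughly as follows. This is only an illustration: it assumes the benchmark has already logged throughput as (seconds since start, MB/s) pairs, the function name and the 60-second window are made up here, and nothing in this thread establishes that an overnight run is long enough to expose this particular flaw.

```python
from statistics import mean


def degradation(samples, window=60.0):
    """Fractional throughput drop between the first and last `window` seconds.

    `samples` is a time-sorted list of (seconds_since_start, mb_per_s) pairs.
    Returns 0.0 for no change, 0.25 for a 25% drop, and a negative
    value if the last window was faster than the first.
    """
    if not samples:
        raise ValueError("need at least one sample")
    t_start = samples[0][0]
    t_end = samples[-1][0]
    # Samples falling in the first and in the last `window` seconds of the run.
    first = [v for t, v in samples if t - t_start < window]
    last = [v for t, v in samples if t_end - t < window]
    return (mean(first) - mean(last)) / mean(first)


# Synthetic 8-hour log, one sample every 10 s: 200 MB/s for the first hour,
# then 150 MB/s. These numbers are invented purely to exercise the function.
log = [(t, 200.0 if t < 3600 else 150.0) for t in range(0, 8 * 3600, 10)]
print(f"Throughput drop: {degradation(log):.0%}")
```

    Averaging over a whole minute at each end, rather than comparing single readings, keeps ordinary benchmark noise from looking like degradation.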

  159. Will you guys pull your recommendation because of this?

    The SB and associated motherboard reviews still show Editor’s Choice and no mention of the bug. Those previous articles should probably be updated with the new info.

  160. Isn’t this the second time around that an Intel chipset has had release problems? First it was the Foxconn CPU socket problem and now this. Quality control is not doing its job, methinks…

  161. You speculate that the defect is in a design TOOL. If that turns out correct, we are DOOMED :-). It would mean that every product that used that design tool would have to be re-examined. Will Intel reveal enough about the nature of the defect, so its technical customers can share in and understand such details?

  162. Heh, they weren’t kidding when they said that Sandy Bridge would be like the Pentium in terms of impact. Big performance increase and some sort of shipment-stopping bug that’s really going to suck for those that find it.

  163. Nobody’s review found it because it’s (supposably) a gradual de-awesomeing of the SATA controller.

  164. It’s 65nm and extremely similar to the previous Core iX “PCH,” which also was so similar to the previous Core 2 southbridge that the driver seemed to be the same thing and it would actually show up in the Windows hardware manager as ICH10. These things don’t change much and I don’t think that had anything to do with it.

  165. If you’ll only ever attach two SATA drives, you could use the motherboard with the Marvell controlled ports, then when you wanted an upgrade you could claim “teh drives don’t werk” at Intel and get a nice replacement. Or just buy a PCI-e SATA card to get more ports.

  166. Oh hell, I just picked up an Asus P8P67 Pro (and 2500K) over the weekend. I may try to return it since I haven’t broken the seal on the static bag yet…

    The Asus boards use a Marvell 9120 chip for two of the four SATA 6 connections. I wonder if this bug would affect them? I did a couple of searches, but there just doesn’t seem to be enough info out yet.

    Just my luck, this seemed like a solid upgrade from my AMD 720 BE. Looks like I jumped too soon.

  167. Yeah, I don’t think this is comparable at all. The TLB issue, particularly once you got into Windows Vista SP1 and Windows 7 where the “fix” is enabled all the time, is a constant problem. This one will creep up over time for some people, but it’s not going to instantly degrade performance for everyone.

  168. At least all of the potential Sandy Bridge purchasers – who now cannot buy motherboards for said chip (if Intel will recall all those on the channel as they should) – will now get a chance to see the Bulldozer reviews if fixed motherboards are only available from April.

    Whilst the problem isn’t with Sandy Bridge, it’s going to put a taint on the entire platform. If only Intel had allowed a different chipset manufacturer to co-exist, then they could still be selling CPUs.

  169. Nutmeg wrote: "And that's why I'll never be a paying beta tester ;)" fixxed

  170. I am curious as to the details of the defect, and how it was measured/detected/proven.
    TR has tested, and in fact recommended, certain Sandy Bridge systems.
    Some here may have based their purchasing decisions on TR’s testing results and recommendations. So what is the nature of the defect, why did TR’s tests fail to reveal it, and is there any way to modify TR’s testing procedures to detect material defects in products that TR recommends?

  171. Copied from today’s Shortbread, as this is a more appropriate place:

    Short of fab-related issues, I’m trying to think of ways this could happen. Intel calling this a ‘design issue’ makes it pretty certain that this isn’t a fab issue. The first thing that comes to mind is that there’s a timing glitch that turns on both the high-side and low-side drivers on the SATA transceivers at the same time. That’ll put a huge current spike through both of them. Repeat that enough times and you can damage the transistors. If the glitch is in a sufficiently rare circumstance, this could be a very hard thing to find.

    These kinds of timing bugs can be very hard to find in synthesized logic; you have to rely on the tools to prevent them from happening. It’s possible that they just got a new version of their timing analysis tool and it showed a possible glitch that the old one missed. An annoying property of these types of problems is that they may only occur in some very odd process/use edge cases; for example, they might only happen at -40 degrees on chips produced at night on a Friday. 🙂 In other words, it’s possible, but it may never happen. A prudent company will fix the problem and alert their customers, like Intel just did.

  172. Shades of the TLB bug in the original Phenom chips… Unfortunately, this isn’t in the CPU itself; the whole motherboard will have to be replaced.