AMD ships revised Ryzen CPUs with a compile bug fix

A little while ago, the Linux-minded sleuths over at Phoronix came across an obscure bug in Ryzen CPUs that triggered segmentation faults ("segfaults," which cause application crashes) in compilation-heavy workloads. The bug was confirmed by AMD's engineers, who described it as a "performance marginality problem" and indicated that owners of affected CPUs could reach out to AMD Customer Care. The engineers also noted that Ryzen Threadripper and Epyc processors are unaffected. Phoronix's Michael Larabel took AMD up on its replacement offer and recently received a revised CPU that's apparently clear of the bug.

Larabel performed his due diligence and set about torturing the revised Ryzen 7 1800X under the same conditions that triggered the compilation segfaults. He's happy to report that he came across no issues whatsoever, compilation-related or otherwise. The test system was identical for the old and revised processors alike, which also seems to clear any suspicion that the bug was somehow related to the motherboard or another component.

In his article, Larabel goes on to note that, judging from his observations, Ryzen CPUs manufactured before week 25 (mid-June) of 2017 appear to be the ones affected by the problem. The bug is reasonably difficult to trigger, but users who want to rest easy can reach out to AMD Customer Care and get a replacement part.

Comments closed
    • ronch
    • 2 years ago

    By the time I buy Ryzen I hope they'll have ironed out all the kinks, so that it's so reliable it'll run buggy code like it doesn't have any bugs! See, it even irons out buggy code for you! ^_^

      • just brew it!
      • 2 years ago

      Well, apparently it is smart enough to skip code that doesn’t do anything (the CPU-Z benchmark weirdness that happened shortly after Ryzen launched). So who knows? 😉

    • timon37
    • 2 years ago

    Heh, funny, this feels like "the telephone game".
    Actually, AMD didn't officially state anything, to my knowledge.
    The news is mostly based on Michael saying "they sent me one working CPU, so they exist" and AMD sending working CPUs as RMA replacements to about 7 people.

    As far as I managed to follow thread [2]:
    1. The working CPUs were acquired via RMA from AMD, where they tested the CPU with the same motherboard and RAM as the customer had
    2. User sat: received a CPU from week 25 as his first RMA (without AMD doing the testing) and it didn't work; after the second RMA he received a tested CPU from week 30 that worked, which means it was a CPU problem
    3. User hifigraz: received a CPU from week 30 that doesn't work, but I'm not sure if it was tested by AMD; it could also be something else that's the issue (though I doubt it)

    [1] list of CPU weeks and status: https://docs.google.com/spreadsheets/d/1pp6SKqvERxBKJupIVTp2_FMNRYmtxgP14ZekMReQVM4/edit#gid=0
    [2] https://community.amd.com/thread/215773?start=1050&tstart=0

    • synthtel2
    • 2 years ago

    Welp, I guess I’ll be getting one of these replacements.

    That linked AMD support page didn’t have anything on it except “Online Service Request” and a language selector, for some reason. Firefox 55.0.3, and I tried disabling both my addons. Chromium works. Weird. On an off-topic note, when did a fresh install of Chromium get so much slower at everything than an install of FF I’ve been using for nearly a year? Ads weren’t even at fault.

      • TheMonkeyKing
      • 2 years ago

      Have you tried the nightly build of 57? It’s my new favorite browser. It does take a bit of getting used to, though.

        • synthtel2
        • 2 years ago

        Despite Arch, I tend to stick with stable versions of anything not strictly gaming-related. Was 57 where they added Quantum CSS?

    • mcarson09
    • 2 years ago

    Finally, the Rev B chip is out. It's rumored to have better PCI Express support too.

    https://www.techpowerup.com/234476/amd-readies-b2-stepping-of-the-ryzen-summit-ridge-silicon

      • just brew it!
      • 2 years ago

      Or not?

      https://hardforum.com/threads/ryzen-b2-stepping-a-false-rumor-apparently.1937644/

    • ShadowEyez
    • 2 years ago

    Remember the Pentium FDIV bug? When that floating-point error was discovered by a math professor and reported to Intel, Intel initially was not going to replace the defective chips. This made people's perception of the company worse, and it also didn't help that Intel likely knew about the bug internally.
    Like this Ryzen bug, it was relatively difficult to trigger, and most people wouldn't even have known it existed but for reading about it. AMD is doing the right thing here by giving a replacement part upon request. While this may affect some short-term profitability, the long-term PR nightmare of not doing it would likely be worse, especially as AMD tries to really compete with Intel again.

      • ronch
      • 2 years ago

      I’ve had to RMA a couple of Athlon 64 X2 chips back in 2009. No quibbles, got replacements in complete retail packaging 5 days later. That’s something that gives you peace of mind when buying from AMD.

    • ronch
    • 2 years ago

    This isn’t such a big deal but nonetheless this is the sort of thing that makes me skip the first iteration of a newly released product. This goes for computers, cars, medicines (don’t wanna be a guinea pig), new kinds of appliances like LCD TVs when they first came out, new game consoles, etc.

      • ZZZTOPZZZ
      • 2 years ago

      Would that apply to a brand new Kate Upton also? Just kidding. I'm sort of like you on new stuff, but curiosity also gets the best of me. I got lucky, I suppose, with the Asus GTX 1080 Turbo. I jumped right on it and haven't been disappointed.

        • just brew it!
        • 2 years ago

        Can you name any features that a Kate Upton v2.0 could potentially provide which are not already available from a competing product? (And before I get accused of being sexist, IMO “productization” is the name of the game in the entertainment industry, regardless of gender.)

          • rwburnham
          • 2 years ago

          Charlotte McKinney is Kate Upton 2.0.

        • ronch
        • 2 years ago

        Well, sometimes I jump the gun right away too. Like when Piledriver came out, I got my FX-8350 just 2 months later. But then again, boards for it had been out for more than a year by then, and it was the second iteration of Bulldozer, so it was a pretty safe bet.

          • just brew it!
          • 2 years ago

          Yeah, I wouldn't call that "jumping the gun". Piledriver was essentially the "we fixed all the stuff we screwed up in Bulldozer" release. If you bought Bulldozer on launch, *that* was jumping the gun. And for threaded workloads, Piledriver was not an unreasonable chip.

            • ronch
            • 2 years ago

            Yeah, I realized ‘jumping the gun’ wasn’t quite right after I posted it but I left it as it is anyway. :-p

            Piledriver is a good design: it fixes bugs that were discovered in Bulldozer's first year and implements the 'low-hanging fruit' improvements AMD's engineers could make without a major revision, which only came with Steamroller.

            • Anonymous Coward
            • 2 years ago

            Would have been interesting to see them add SMT to a Dozer. Might have been a good server chip, both IBM & Sun seem to think higher clocks with wide SMT are the way to go.

            • ronch
            • 2 years ago

            SMT on Bulldozer would’ve seemed kinda redundant because the CMT concept was pretty much AMD’s way of making the core handle 2 threads concurrently. It’s no doubt an interesting concept but then they limited each integer cluster to just 2 ALUs which is really narrow by today’s big core standards. Having just 2 ALUs means a simpler integer scheduler, a cache system that doesn’t need to be smart enough to keep a wider core fed, fewer decoders, etc. Looking back CMT seems to me like a cheap way of handling 2 threads. They could’ve gone with a wider core with SMT like they did with Ryzen™ 6 years later but they chose the less difficult path. Also, CMT means the 4 ALUs within a core (which they dubbed a ‘dual core module’) are effectively partitioned into 2 sets of 2 ALUs so a single thread can use a maximum of 2 ALUs, unlike in Ryzen™ and Intel Core™ where all 4 ALUs are available to a single thread if data dependency is not an issue.

            • Anonymous Coward
            • 2 years ago

            I'm thinking in particular about Sun's UltraSPARC T chips, which started out with incredibly thinly provisioned FPU power. Loading each "core" of a Dozer "module" with, say, 4 threads would have been quite mild in comparison to what Sun did. It's true that Dozer has pretty narrow execution for so many threads, but I'll hazard a guess it would nonetheless have been effective on IO-heavy workloads. Long pipes, tons of threads, and server workloads appear to go well enough together that IBM and Sun can sell the stuff.

            • ronch
            • 2 years ago

            Compared to the sort of SMT Sun did, yeah anything would be considered mild. 🙂 8 threads on the T5 is just bonkers!

            Speaking of SPARC, there’s a version of the T4 called T4-2. I imagine IT guys having endless tea puns with it. Maybe someone already set up a tea table with 2 chairs next to a T4-2 server rack somewhere on the planet?

            • Anonymous Coward
            • 2 years ago

            Yeah, T4-2, and who says Oracle knows no joy. Maybe I should check ebay.

            I think throughput-oriented computing is an interesting alternate perspective on performance, closer to industrial machinery. It's also inevitably the future of computing.

    • torquer
    • 2 years ago

    Whew, all of the 2 people impacted by this in the real world are breathing a big sigh of relief.

      • just brew it!
      • 2 years ago

      A lot more than 2 developers were impacted. Multi-threaded compiles are an important part of the workflow for C/C++ developers. A large percentage (probably majority) of C/C++ code in the Linux and embedded worlds is compiled with gcc these days.

      The uncertainty this issue caused also made people who were thinking of buying Ryzen to use as a development system hold off, or go Intel instead. When you have an issue which causes random segfaults (as this one did), it also calls into question the reliability of the output of the compiler even when it *doesn't* segfault. Random segfaults on a workload that runs perfectly fine on every other x86 platform is a pretty big red flag, because it calls into question the stability and ability of the CPU to consistently produce correct results. These kinds of doubts about the Zen architecture are the last thing AMD needed, just as they are re-entering the HEDT and enterprise markets. This is why the fix is so important, even if only a small percentage of current Ryzen users are affected.

        • torquer
        • 2 years ago

        Didn't say it wasn't a problem, just vastly overstated considering how difficult and artificial the catalyst was. The TLB bug was way, way, way worse/more impactful. It would have been fixed regardless, as is every other piece of errata in every other CPU. My point is that it's a tempest in a teacup due to fanboyism and the fact that it's a new architecture. 99.9999% of users would never notice a difference.

          • just brew it!
          • 2 years ago

          Large parallel gcc compilations are not “artificial”; they are part of my normal day job workflow! This issue literally would have disqualified Ryzen as an option for me and a non-trivial percentage of my co-workers. (I say “would have”, since my employer wasn’t considering it to begin with. But it’s a show-stopper.)

          And as I’ve noted elsewhere, there’s nothing “special” about gcc. It’s just a C/C++ compiler. If the CPU was screwing up during parallelized gcc runs, there are almost certainly other corner cases which could trigger the same bug; we just haven’t noticed them yet. And maybe some of those other corner cases would not yield an unambiguous failure like the gcc segfault; silently producing incorrect results is worse than an application crash, and we don’t know enough details to know if that was a possibility or not.

            • torquer
            • 2 years ago

            From your own source material:

            “AMD engineers found the problem to be very complex and characterize it as a performance marginality problem exclusive to certain workloads on Linux. The problem may also affect other Unix-like operating systems such as FreeBSD, but testing is ongoing for this complex problem and is not related to the recently talked about FreeBSD guard page issue attributed to Ryzen. AMD’s testing of this issue under Windows hasn’t uncovered problematic behavior.”

            “With the Ryzen segmentation faults on Linux they are found to occur with many, parallel compilation workloads in particular — certainly not the workloads most Linux users will be firing off on a frequent basis unless intentionally running scripts like ryzen-test/kill-ryzen. As I’ve previously written, my Ryzen Linux boxes have been working out great except in cases of intentional torture testing with these heavy parallel compilation tasks. Even when carrying out other heavy, non-compilation (GCC or Clang) parallel workloads in recent days, from server tasks to scientific processing, my Ryzen test boxes have been stable. I’m still using Ryzen 5 on my main desktop system without any faults in day-to-day use on Fedora 26 Linux. ”

            So I suppose if your entire business model or use case is doing a crapload of parallel compilations and you have to use Linux then it was a big deal for the 3 weeks it existed. Truly Earth shattering, folks.

            • Waco
            • 2 years ago

            Just because it didn’t affect you doesn’t mean it didn’t affect a lot of people. Edge case bugs like this are hard to characterize, but not hard to reveal with heavy workloads. As someone looking to purchase quite a large number of Epyc systems, I was concerned as were many others looking for datacenter parts.

            Don’t marginalize this. It’s in AMD’s favor to do so, but there were many concerned about it and it was trivially easy to trigger on basic compiles. Compiling code in many many many fields is not “torture testing”, it’s an everyday workflow. If compiling can trigger it, so can many other unknown workloads. Further, there’s always worry something like this would cause corruption rather than a flat-out crash.

            • just brew it!
            • 2 years ago

            Just because *you* didn't hear about it until 3 weeks ago doesn't mean it didn't exist before then! The issue has existed since Ryzen's launch, and there's been chatter about it on multiple web forums going back several months. It has gotten more attention lately because it has become apparent that it is a real defect (not just random unstable builds), and tech news sites have finally picked up on it. I run large parallel compiles as part of the workflow at my day job on a near-daily basis.

          • Waco
          • 2 years ago

          Anyone doing software development or deployment would find it quite easily. Now that there’s a fix, it’s a non-issue.

        • chuckula
        • 2 years ago

        I salute your monopolization of Top Comments!

      • TheRazorsEdge
      • 2 years ago

      My employer does a lot of code development. We're not a software company per se, but it's part of delivering a product. We had a hold on buying any AMD CPUs until this issue was resolved.

      One major concern in cases like this: You don’t know the scope of the problem. The issue only showed up in GCC compiles in the initial reports, but who knows what else could be affected?

      E.g., this would kill our productivity if engineers couldn’t run their workloads in SAP, Matlab, or SolidWorks. It’s not worth taking the chance.

        • flptrnkng
        • 2 years ago

        Curious if you have any Skylake processors with hyperthreading?

        Have you disabled hyperthreading, to work around:

        https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/

          • smilingcrow
          • 2 years ago

          Intel Releases Critical Skylake And Kaby Lake HyperThreading Bug Fix:
          https://hothardware.com/news/intel-hyperthreading-bug-fixed

            • flptrnkng
            • 2 years ago

            I’ve been checking MSI’s website for a BIOS fix, but they still haven’t posted one for my Z170A motherboard/Skylake 6700K.

            I don’t know if I’ve ever actually run across the failure, but it’s just a desktop gaming/browsing machine.

            If I had this in a mission critical application, I’d be running with Hyperthreading Off.

            At some point, I imagine MSI will provide a BIOS fix.

            • smilingcrow
            • 2 years ago

            My Dell Latitude laptop had a BIOS update very soon after the fix was announced.
            I wouldn’t touch MSI after previous experiences and your experience reinforces that was a good call.

            • JustAnEngineer
            • 2 years ago

            Gigabyte provided a new BIOS for my GA-Z170N-Gaming 5 at the end of June.

            • Wirko
            • 2 years ago

            It's amazing how many issues are fixable through a microcode update. What we call microcode is obviously much, much more than simple instructions for breaking individual x86/x64 instructions into RISC-like ones.

          • TheRazorsEdge
          • 2 years ago

          Those bugs were so rare that they weren’t reported until long after release. Any purchases would have likely been made well in advance of the announcement. I’m sure we have Skylake CPUs, but I’m not aware of us being affected by that issue.

          Every CPU has errata. The important questions are the scope of the issue and the ability to fix or work around it.

          If a very serious issue is discovered with an unknown cause, then it is very difficult to estimate the scope—and nevermind implementing fixes or workarounds.

        • torquer
        • 2 years ago

        Considering it caused a crash during compile, I think you’d know right away.

          • just brew it!
          • 2 years ago

          Unless/until more details are released by AMD, we don’t know what other potential effects this bug may have. The gcc crash was just the first (and most obvious) reported manifestation.

      • K-L-Waster
      • 2 years ago

      I wonder how many people would say the same if it was Intel that had this problem… most likely there would be 200+ posts by now calling for them to be boycotted 4evah!

        • torquer
        • 2 years ago

          People would say it regardless. All new CPU architectures (and mature ones) have errata. Truly impactful stuff gets fixed with a new stepping or microcode fixes. Others are just documented so it's known. This is just getting a lot of press because of fanboyism and because it's a new architecture. Just as everyone predicted, it was quickly fixed and almost no one would truly have been impacted.

          • Waco
          • 2 years ago

          I’m not sure why you’re on this mission to marginalize the problem. Yes, all new architectures have problems. The issue lies with the fact that AMD didn’t know about it and didn’t catch it until after shipping chips – and it was easy to trigger in normal workloads (regardless of how much you think it was a torture test, it was not).

      • Pancake
      • 2 years ago

      It’s not the people impacted that are a problem for AMD. They’ve already bought into the Ryzen ecosystem.

      It’s more the people looking from the outside and going “NAWP!”

    • hansmuff
    • 2 years ago

    So it’s a new stepping then? What’s the stepping, how to identify it, etc?

      • just brew it!
      • 2 years ago

      Nobody’s explicitly said anything about a new stepping. It could just be tighter testing of chips manufactured using the same stepping, to check for a corner case failure mode that certain chips exhibit.

      The Phoronix article indicates that chips manufactured after mid-June should be the “fixed” ones, and there’s a pic showing where to look for the date code.

      • 5UPERCH1CK3N
      • 2 years ago

      No new stepping, same stepping as the earlier ones. Perhaps a binning issue or something, my replacement just arrived today and it was manufactured in week 28.

        • just brew it!
        • 2 years ago

        Yup, my bet’s on a binning/testing issue of some sort. But it is also possible they tweaked the manufacturing process slightly without changing the masks.

        It could’ve even been something like an issue with how the chip was being assembled to the substrate causing power delivery glitches.

        Unless AMD issues a statement explaining the issue (or someone inside AMD leaks the info), we’ll never know for sure.

          • Kougar
          • 2 years ago

          If it is a power issue, has anyone attempted to use it as an overclocking stability test?

            • just brew it!
            • 2 years ago

            Probably not a bad idea as part of an OC test suite regardless. Multi-threaded compiles give the CPU a pretty good workout, especially if run from an SSD or RAMDisk (so that storage isn’t a chokepoint and all the threads are running flat-out).

      • Action.de.Parsnip
      • 2 years ago

      Anything made from week 25 onwards. Not a stepping change.

    • AMDisDEC
    • 2 years ago

    There is a ton of margin in Ryzen CPUs, so the fix barely registers on their bottom line.

      • Pancake
      • 2 years ago

      Which must be why they’re still losing money at a frightening rate.

      Yet another reason to let fanboys do the early testing – on any platform. Cautious engineers like me will sit back and wait and wait until everything seems ok and is proven. On my current build I even tested it for 2 months straight before using it for work.

        • ronch
        • 2 years ago

        Same here. Let them test it. I’ll buy it when it’s ready and when I actually need it. Right now I just don’t.

        • just brew it!
        • 2 years ago

        I’m pretty cautious with new builds too; I keep the previous one around until I’m very confident that the new one is rock-solid.

        That said, I think with this segfault issue being solved Ryzen really is “ready for prime time”. If I had a need for a new desktop right now, I’d seriously consider one.

      • ronch
      • 2 years ago

      Especially since Lisa Su is the bestest CEO EVERRRRR!!! RIGHT???

        • homerdog
        • 2 years ago

        Dr. Lisa Su, you plebeian. And he has already told us who his favorite CEO is (not Lisa) in a surprisingly convincing post.

    • just brew it!
    • 2 years ago

    It isn’t really a “compile bug” per se, it just happens that lots of parallel gcc threads seem to be particularly good at triggering the issue.

      • AMDisDEC
      • 2 years ago

      Could be power or clock related, or a hundred possibilities. Bottom line is, much ado about nothing much.

        • just brew it!
        • 2 years ago

        Unless you’re one of the people who bought one to do Linux software development, and have been living with this issue for the past few months.

          • BobbinThreadbare
          • 2 years ago

          My understanding was that normal compiles usually didn't trigger it; it almost had to be a benchmark compile designed to strain the chip.

            • just brew it!
            • 2 years ago

            IIRC the person who originally discovered it was just trying to compile something large (Linux kernel, maybe). Any project of comparable size/complexity is going to put some pretty serious stress on all of your cores if the build is launched in multi-threaded mode.

            Edit: For reference, compiling a Debian kernel takes ~16 minutes in multi-threaded mode (-j 16, so 2 threads per core) on my FX-8350. All cores are running flat out for most of that time (it appears to drop back to single-threaded mode for a short while at the end, when it is building the final .deb packages).
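            [Ed.: the 2-jobs-per-core ratio jbi describes (-j 16 on an 8-core FX-8350) is a common rule of thumb for parallel builds. A minimal sketch of deriving the job count, assuming a Linux system with coreutils' nproc available:]

```shell
# Derive a parallel-build job count from the core count
# (2 jobs per hardware core, matching -j 16 on an 8-core FX-8350).
cores=$(nproc)
jobs=$((cores * 2))
echo "would run: make -j $jobs"
```

            [In an actual kernel tree you would then run make -j with that count, followed by your distribution's packaging target.]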

            • 5UPERCH1CK3N
            • 2 years ago

            The kernel compile would get it, but also mesa, gcc, libreoffice, and sometimes random small packages where I didn’t have the problem repeat. And that was after I jacked the SOC voltage up to 1.85 per AMD’s recommendations. Knocking the make job size down to 8 or 9 would make it harder to hit, though it wasn’t a guaranteed solution. Also, not a satisfactory solution, I should note.

            Assuming this also cures the stability problems I had with the last processor just idling at a console for a couple days, I’ll be satisfied.

            • jihadjoe
            • 2 years ago

            The ‘benchmark compile’ usually cited is just a normal compile, packaged into a benchmark so people can use it to test their systems.

            The bug was originally discovered by people actually working in Linux and using Ryzen to compile their projects. It was heavily referenced in the Gentoo forums, and there's a Google Doc (https://docs.google.com/spreadsheets/d/1gzniXYcXm1uACXGoBLpbpq54KE6SlHxQ6M_wPnTkub8/edit#gid=950983791) that the community put together describing what hardware/software was in use and what people were doing that triggered a segfault. A lot of those were just plain compiles of stuff like Linux distros, Chromium, WebKit, Mesa, and the like.

          • AMDisDEC
          • 2 years ago

          The odds are as likely as McGregor knocking out Mayweather.

            • just brew it!
            • 2 years ago

            As I’ve noted elsewhere, the biggest problem with it was that it cast doubt on the reliability of the CPU. A segfault in a program that runs fine on other processors implies a hardware fault; and if you’ve got a hardware fault that isn’t completely understood, you don’t know what other effects it could potentially have. It was very important for AMD to resolve this, and it is good to see that it is finally being put to bed.

            • Waco
            • 2 years ago

            This. I’m happy to see that there’s a fix – now there’s no reason to doubt the efficacy of Epyc. 🙂

          • ptsant
          • 2 years ago

          I have been compiling a lot of software in Linux on my 1700X and did not notice any strange behavior, although I use 12 threads for compilation, instead of 16.

          I will probably wait until the B2 stepping comes out before sending an RMA, if I can confirm the bug (haven’t yet bothered to try…).

        • brucek2
        • 2 years ago

        How do you, or anyone else, know it’s “nothing much”? To my knowledge no one has identified anything truly unique or extraordinary about the most widely published reproducible case. Here it is described as a “demanding” “multi-threaded” “gcc compilation” of which no part sounds like an unusual workload for a chip like this.

        How is anyone to know if a case that is uncommon today might end up being triggered by something super common tomorrow? Who would knowingly want an unpredictable liability that could bite them anytime in the future, and what’s worse with a signature that would be hard to immediately diagnose? No thanks.

      • Rza79
      • 2 years ago

      It’s a bug in the IRETQ instruction which manifests under certain conditions related to hyperthreading.

      • mcarson09
      • 2 years ago

      Intel does a better job of testing their CPUs than AMD does. The Pentium math bug is infamous and Intel still can’t live it down.

      https://www.youtube.com/results?search_query=intel+pentium+math+bug

        • just brew it!
        • 2 years ago

        All complex modern CPUs have bugs. Here's one from Intel that is much more recent than the Pentium FDIV bug: https://lists.debian.org/debian-devel/2017/06/msg00308.html At least that one was fixable with a microcode patch and did not require an RMA of the physical processor. I could cite many more hardware bugs, from both Intel and AMD.

        It is unreasonable to expect a modern CPU to be 100% bug-free at launch. What *is* reasonable is for the vendor to acknowledge the bug, do a root-cause analysis, and take corrective action.

          • NoOne ButMe
          • 2 years ago

          Mcarson09 is correct: Intel has tended to do more validation, or at least longer validation, typically around 2-4 months longer on the consumer side.

          Not sure about professional side. Might be equal there.

            • ronch
            • 2 years ago

            Could be, but I can swear by the reliability of my AMD processors, and I’ve owned far more AMD chips than chips from any other chip house. I’ve used those AMD chips far longer than non-AMD chips too. Can’t really say they’re bad at all.

            The only issue I had with AMD chips was with the Athlon 64 X2 Brisbane chips, which overheated and shut down. I had 2 of those. The heatsinks weren't the problem (I of course first checked whether I had a cooling problem), and the chips actually became very hot, suggesting their heat output simply increased and became excessive after about 9 months or so. I'm guessing the silicon somehow developed higher electrical resistance later on.

            Nonetheless, my RMA experience with AMD was straightforward. No quibbles whatsoever; got spanking-new retail-box replacements in less than a week. Gotta love AMD for that.

            • just brew it!
            • 2 years ago

            Physical reliability/ruggedness and absence of bugs are not the same thing. You could have a really buggy piece of silicon that can nonetheless take a lot of abuse without failing; conversely, you could have a relatively bug-free device which tends to fail when subjected to even a moderate over-volt or over-temp condition.

            That said, I’ve owned multiple AMD CPUs that continued to soldier on even after being run at excessively high temperatures for extended periods of time (due to failed CPU fans).

            Re your Brisbane issue… I wonder if there could have been a motherboard VRM issue that was subjecting the chips to voltage spikes, gradually degrading them over time.

            • srg86
            • 2 years ago

            This is the crux of it for me, and where my experiences differ from yours. I've found Intel CPUs and (especially) chipsets to be more rugged and reliable than AMD-based setups, and it sounds like yours have had a harder life than mine.

            Anyway, it's that ruggedness and feeling of solid stability that keeps me on Intel these days. All CPUs (and chipsets, for that matter) have their bugs.

            • just brew it!
            • 2 years ago

            Yes, AMD has certainly had a lot of chipset issues in the past. Starting with the AM2 generation they seemed to get the stability part sorted, but still tended to lag a bit in features and performance. Lack of native USB 3.0 support on AM3/AM3+ really hurt because they were forced to rely on dodgy 3rd party USB 3.0 chipsets. (I’m looking at YOU, ASMedia!)

            I also tend to stick with vendors I trust for the motherboards — almost exclusively Asus lately, but in days gone by I also used DFI, Tyan, and Abit. And I generally use ECC RAM, don’t overclock, use low-end GPUs, and over-spec the PSU. So a number of factors which all likely contribute in little ways to overall system stability.

            • srg86
            • 2 years ago

            Apart from the ECC RAM and Tyan boards (I have had ASUS, DFI, ABit and AsRock), I’ve done very much the same and had differing results. Also aren’t the Ryzen chipsets in part designed by ASMedia? As soon as I saw that, I thought *avoid*.

            • just brew it!
            • 2 years ago

            Yeah, I saw that rumor too. Haven’t heard much one way or the other regarding stability of the USB 3.x support on Ryzen. No news is good news?

            • synthtel2
            • 2 years ago

            All seems fine with it here, FWIW.

            • ronch
            • 2 years ago

            It’s kinda like it is with cars. Some people swear by Toyota, some by Honda, some by Ford or GM. Toyota is kinda like Intel: you can’t really go wrong with them but at the same time buying a Ford or Nissan doesn’t mean you’re making a terrible decision.

            • just brew it!
            • 2 years ago

            Longer does not necessarily mean better.

        • UberGerbil
        • 2 years ago

        As JBI says, all CPUs have errata¹ and [url=https://www3.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf]Intel CPUs are no different[/url]. Their bugs have been lower-profile lately, but they also have not been changing much from generation to generation lately either. It's much easier to have a reliable and comprehensive test suite when you're doing little more than tweaking your fabs.

        ¹[quote="Intel"][b]Errata[/b] are design defects or errors. Errata may cause the processor's behavior to deviate from published specifications. Hardware and software designed to be used with any given stepping must assume that all errata documented for that stepping are present on all devices.[/quote]

          • ronch
          • 2 years ago

          Yeah, AMD’s last 2 product generations (real generations, not what Intel passes for a generation these days) are from-scratch designs. For a small company with relatively little R&D money, AMD sure punches above their weight.

      • Mr Bill
      • 2 years ago

      Was compiling with 12 threads instead of 16 threads (like ptsant) a good workaround?

        • just brew it!
        • 2 years ago

        I imagine it reduced the odds of hitting the bug, though I have no idea by how much. There seems to have been some variation in how susceptible individual CPUs were, so it is likely a “YMMV” type situation. Over in the AMD forums, some people reported hitting the issue after just a few minutes, while other people’s systems could run parallel builds for hours before segfaulting.
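        For reference, the workaround is just a matter of capping the job count passed to make; here's a minimal sketch (the 4-thread margin is my own illustrative guess, not AMD guidance):

```shell
# Cap build parallelism below the CPU's logical thread count instead of
# using all 16 threads on a Ryzen 7 1800X.
CORES=$(nproc)                               # logical CPUs, e.g. 16 on an 1800X
JOBS=$(( CORES > 4 ? CORES - 4 : CORES ))    # e.g. 16 -> 12
echo "building with -j${JOBS}"
# make -j"${JOBS}"                           # uncomment inside an actual source tree
```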

    • Takeshi7
    • 2 years ago

    It couldn’t be fixed in microcode? That’s gonna be expensive.

      • just brew it!
      • 2 years ago

      Yeah, must be a corner case in the silicon that their testing didn’t catch, or something like power distribution on the chip itself. Sucks to be them. It’s unclear what percentage of the chips were affected, though; if it was small, it may end up not costing them too much. A lot of people probably won’t bother with an RMA if they’re not hitting the issue.

      • Andrew Lauritzen
      • 2 years ago

      In theory expensive, but I bet AMD know that the number of people that will bother replacing their CPUs in this manner (or even those who know about the issue!) is very low.

      Glad to see them do the right thing though, and I imagine it’s only possible since they don’t have OEM/laptop parts in the wild with Ryzen yet.

        • ClickClick5
        • 2 years ago

        Eh…in laptop form? No. OEM desktop, yes.
        [url]http://www.dell.com/en-us/shop/cty/pdp/spd/inspiron-5675-gaming-desktop[/url]

        • mcarson09
        • 2 years ago

        It’s an example of why you want to avoid 1st-gen products.

        • ronch
        • 2 years ago

        I don’t think they have any other choice. The last thing they need these days is people perceiving them as a bunch of humbugs who just wanna sell chips and make money but aren’t willing to replace faulty chips. Any company that wants to earn everyone’s trust needs to show that they’re there with you when you need them and that they’ll fix whatever they got wrong with their products.

        Then again my experience with AMD RMA has been positive. I had to return two overheating Athlon 64 X2 chips back in 2009 and it was fast and no quibbles whatsoever. Got my new chips in full retail packaging about 5 days after I sent the faulty ones to Singapore.

      • ronch
      • 2 years ago

      Yeah. I thought you could even change a chip’s pin layout via microcode these days.
