Poll: Where do you use ECC RAM?

One of the more contentious divides between Intel's X299 platform and AMD's X399 platform is AMD's baked-in support for ECC RAM, a feature that Intel only offers workstation users as part of the much more expensive Xeon-W family. While we certainly understand the value of ECC memory and support its inclusion in any workstation platform worth its salt, we find ourselves wondering whether the number of users with ECC RAM in their systems holds up to the weight being placed on the feature in measuring up today's high-end systems. With that in mind, we turn to you, dear TR reader, to tell us where you use ECC memory using the poll options below.

    • Shouefref
    • 2 years ago

    Would you please stop those cheesy jokes?
    (yes, I know, that’s one too)

    • helix
    • 2 years ago

    There is a thing about this poll: So many options are true!
    I would like to have it everywhere, as it improves reliability. Especially in my main desktop and anything that stores data, as checksumming filesystems (esp. ZFS) are sort of designed with the assumption that RAM has ECC.
    Do I have it in my current (aging) desktop system? No. 🙁
    Do I have it in my servers at work? Yes, of course! (Except that one really old box, but I avoid thinking too much about that one. And that other ancient one… hmm.)

    • moose17145
    • 2 years ago

    I WAS about to say “In my servers”, and by that I of course mean “in servers at work…” because let’s face facts here… I, personally, am not doing anything at home that needs the reliability of ECC….

    BUT…

    Then I noticed this option about cheese….

    • wingless
    • 2 years ago

    Consumer Intel products can’t use ECC…..You have to run AMD for that privilege. Luckily Ryzen is a good platform for it.

    I still have a lowly 4.7GHz i7-2600K so it’s nothing but blue screens for me!

    • evilpaul
    • 2 years ago

    Back in the Athlon, Duron, Athlon XP, etc, days I always had ECC RAM. I don’t know that it ever made a difference, but it wasn’t much more money for a little more reliability potentially. I wish Intel supported it and ECC Enthusiast RAM was a thing.

    • Mr Bill
    • 2 years ago

    Supercomputers + massive memory, as cosmic ray detectors…
    [quote<][url<]https://spectrum.ieee.org/computing/hardware/how-to-kill-a-supercomputer-dirty-power-cosmic-rays-and-bad-solder[/url<][/quote<][quote<]Jaguar had 360 terabytes of main memory, all protected by ECC. I and others at the lab set it up to log every time a bit was flipped incorrectly in main memory. When I asked my computing colleagues elsewhere to guess how often Jaguar saw such a bit spontaneously change state, the typical estimate was about a hundred times a day. In fact, Jaguar was logging ECC errors at a rate of 350 per minute.[/quote<]

      • Waco
      • 2 years ago

      This is less true than the report would have you believe (I’m at 7500 feet with about 4 petabytes of DRAM sitting in the room next to me). They were plagued by many problems and cosmic rays were the least of their concerns.

        • chuckula
        • 2 years ago

        Sounds like you’re in LA.

        And of course I mean Los Alamos when I say LA.

          • Waco
          • 2 years ago

          My LA is the best LA. 🙂

      • just brew it!
      • 2 years ago

      That has gotta include defective DRAM chips. I can state with pretty high confidence that DRAM error rates (for non-defective RAM) were in the vicinity of 1 error per GB per month 25 years ago, and DRAM has demonstrably gotten a lot more reliable than that since. Even if we assume no improvement since the 1990s, 1 error per GB per month would equate to only around 10 errors/minute with 360TB, not 350 errors/minute.

      Edit: Looking at it another way, the raw error rate implied by that link would translate to 1 flipped bit per hour on a PC with 16GB of RAM. You would not be able to make it through an overnight run of Memtest86.
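The arithmetic in the comment above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1000 GB) and a 30-day month, and the function names are made up for illustration.

```python
# Rough check of the error-rate figures discussed above.
# Assumptions (not from the thread): 1 TB = 1000 GB, 1 month = 30 days.

MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes
JAGUAR_GB = 360 * 1000             # 360 TB of main memory, in GB


def errors_per_minute(errors_per_gb_per_month, capacity_gb):
    """Scale a per-GB monthly error rate up to a whole machine, per minute."""
    return errors_per_gb_per_month * capacity_gb / MINUTES_PER_MONTH


def errors_per_hour_on_pc(total_errors_per_minute, total_gb, pc_gb):
    """Scale a whole-machine error rate down to a single PC, per hour."""
    return total_errors_per_minute / total_gb * pc_gb * 60


# 1 error per GB per month across 360 TB: roughly 8 errors per minute,
# i.e. "around 10", and far below the reported 350 per minute.
baseline = errors_per_minute(1.0, JAGUAR_GB)

# The reported 350 errors per minute, scaled down to a 16GB PC:
# a bit under 1 flipped bit per hour.
pc_rate = errors_per_hour_on_pc(350, JAGUAR_GB, 16)

print(f"baseline: {baseline:.1f} errors/minute")
print(f"16GB PC:  {pc_rate:.2f} errors/hour")
```

Under these assumptions the reported Jaguar rate is roughly 40 times the 1990s-era baseline, which is the gap the comment attributes to defective DRAM.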

        • Waco
        • 2 years ago

        Exactly. There’s a lot more at play there than indicated.

        • Mr Bill
        • 2 years ago

        (1) First case, lack of error correction in bus
        (2) G5 case, lack of ECC
        (3) One error per terabyte per minute = 1440 errors per TB per day; 1TB is about 62 x 16GB machines, so 1440/62 = 23 flips per day per PC
        just checking
        (3) They did expose it to neutron flux and got a big uptick in errors
        (4) It was several years ago; the fab process may have improved since then

        These were single bit errors, do those even get reported if the ECC corrects them?
        [Edit]
        MEMTEST reports single bit errors but that would not crash the PC so MEMTEST would make it through the night. So I guess you are saying it’s not typical to see a bit flip per hour reported by MEMTEST for 16GB.
        [quote<] The study found that double-bit errors occurred about once every 24 hours in Jaguar’s 360 TB of memory.[/quote<][/Edit]

          • just brew it!
          • 2 years ago

          [quote<](1)First case, lack of error correction in bus (2) G5 case lack of ECC[/quote<]
          I was focusing on the Jaguar case. The first and second cases were defective by design. An error-prone system bus is just bad design. In the G5 case they could've mitigated the issue if they'd designed the system to be tolerant of individual node failures. But distributed fault tolerance is hard, and they obviously glossed over this issue and just crossed their fingers.

          [quote<](3)One error per terabyte per minute = (1440min/day) 62 x 16GB machines = 23 flips per day/PC just checking (3) They did expose it to neutron flux and got a big uptick in errors[/quote<]
          That was for the first one (based on AlphaServers), not Jaguar. Regardless, the reported background error rate for Jaguar is way too high unless there are defects in some of the DIMMs. I bet if you analyzed those errors you'd find that the vast majority of them are coming from a very small percentage of modules. And yes, I'd expect the rate to go up if Jaguar was exposed to radiation flux. But the baseline rate is way out of whack.

          [quote<](4) It was several years ago fab process may have improved since then[/quote<]
          2009 was not that long ago, and memory sizes of desktop PCs were already large enough by then that the reported error rate would've been problematic for non-ECC systems. And as I noted, the "soft" error rate was already much lower than it was 25 years ago.

          [quote<]These were single bit errors, do those even get reported if the ECC corrects them?[/quote<]
          They get logged so that a system administrator can identify DIMMs with flaky/stuck bits (and replace them). If they were getting that many double-bit (uncorrectable) errors a minute the machine would be completely useless because you would never be able to trust any of the results.

          [quote<][Edit] MEMTEST reports single bit errors but that would not crash the PC so MEMTEST would make it through the night. So I guess you are saying its not typical to see a bit flip per hour reported by MEMTEST for 16GB.[/quote<]
          Right. I meant that, at that error rate, you would be very unlikely to get a "clean" overnight run of Memtest86 on a 16GB non-ECC system. This is clearly at odds with reality. TBH I'm not entirely convinced that Memtest86 reads the ECC logs correctly. I generally turn ECC off in the BIOS if possible during the Memtest86 run, to make sure that I know about any single-bit errors.

          [quote<]The study found that double-bit errors occurred about once every 24 hours in Jaguar’s 360 TB of memory. [/Edit][/quote<]
          Given that their system clearly contains defective DRAM, I'm not even sure what to make of that number.

            • Mr Bill
            • 2 years ago

            [quote<]TBH I'm not entirely convinced that Memtest86 reads the ECC logs correctly. I generally turn ECC off in the BIOS if possible during the Memtest86 run, to make sure that I know about any single-bit errors.[/quote<]
            Not sure I understand this paper below but it seems to cover the same ground. This one section jumped out at me. See part 7.1 Error Logging Architecture; partially quoted below.

            [url=https://www.cs.virginia.edu/~gurumurthi/papers/asplos15.pdf<]Memory Errors in Modern Systems: The Good, The Bad, and The Ugly[/url<]

            [quote<]In most x86 CPUs, DRAM errors are logged in a register bank in the northbridge block [4] [31]. Each register bank can log one error at a time; x86 architecture dictates that the hardware discard subsequent corrected errors until the bank is read and cleared by the operating system [5]. The operating system typically reads the register bank via a polling routine executed once every few seconds [2]. Therefore, on a processor which issues multiple memory accesses per nanosecond, millions of errors may be discarded between consecutive reads of the register bank.[/quote<]

            • just brew it!
            • 2 years ago

            That’s a related (but separate) issue. I’m less worried about whether Memtest86 polls the register frequently enough, vs. does it poll it at all. Since it is a hardware register, and the implementation of hardware registers to monitor ECC is (AFAIK) not standardized, you’re relying on Memtest86 to have been updated to correctly identify and poll the memory controller in your specific CPU.
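The logging behaviour described in the quoted paper (one error held per register bank, later corrected errors dropped until software reads and clears it) can be illustrated with a toy model. This is a sketch only: the class, the tick-based timing and the polling interval are invented for illustration and do not model any real memory controller or Memtest86 itself.

```python
class SingleSlotErrorLog:
    """Toy model of a register bank that can hold one corrected-error record."""

    def __init__(self):
        self.slot = None      # the one error currently latched, if any
        self.discarded = 0    # errors dropped because the slot was occupied

    def record(self, address):
        """Hardware side: latch the error, or drop it if the slot is full."""
        if self.slot is None:
            self.slot = address
        else:
            self.discarded += 1

    def poll(self):
        """Software side: read the latched error (if any) and clear the slot."""
        seen, self.slot = self.slot, None
        return seen


def simulate(error_ticks, poll_interval, total_ticks):
    """Return (errors seen by software, errors silently discarded)."""
    log = SingleSlotErrorLog()
    errors = set(error_ticks)
    logged = 0
    for tick in range(total_ticks):
        if tick in errors:
            log.record(tick)
        # The "operating system" polls once every poll_interval ticks.
        if (tick + 1) % poll_interval == 0 and log.poll() is not None:
            logged += 1
    return logged, log.discarded


# Three errors land between two polls; only the first of them is ever seen.
print(simulate(error_ticks=[1, 2, 3, 15], poll_interval=10, total_ticks=20))
```

In this model a tool that never calls `poll()` would report zero errors no matter how many occurred, which is the separate concern raised in the reply above.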

      • tipoo
      • 2 years ago

      So divide by 360 terabytes, times how many gigabytes you have, which makes a home computer very low on risk despite that 😉

        • Waco
        • 2 years ago

        Actually, if those numbers were true, everyone would need ECC.

    • llisandro
    • 2 years ago

    I never use ECC, but always build 3-10 identical computers and synchronize calculations to control for solar flares, and report my numbers +/- S.E.M. for maximum statistical reliability.

    Along these lines, I also reply:all to all work emails in triplicate after calculating md5 checksums, so there’s never any confusion as to what I’m bringing to the potluck.

    • ronch
    • 2 years ago

    ECC? No way. Computers don’t make mistakes, man.

    • SuperSpy
    • 2 years ago

    I’m firmly stuck between the ‘Anywhere I can’ and ‘Servers’ options. I’d like to use it anywhere I can, but thanks to Intel’s segmentation stupidity, I have to give up some combination of performance or price in order to get it.

    So I generally only have ECC in places that store files long-term.

    • just brew it!
    • 2 years ago

    TBH I am surprised the “Anywhere I can” choice got as many votes as it did.

    • f0d
    • 2 years ago

    i never really had an issue that i think has anything to do with memory errors, sure i have had power supplies and motherboards die but no actual errors that were caused by what seems to be nothing (but were actually memory errors)

    i still have data from the 90’s that has been passed on from computer to computer with no errors

    all my important data is triple backed up so if something does happen no big deal

    so yeah don’t see any need for ecc for me

      • just brew it!
      • 2 years ago

      [quote<]all my important data is triple backed up so if something does happen no big deal[/quote<] As long as you've taken steps to ensure that the [i<]backups[/i<] aren't corrupted (and can't be corrupted in storage), that's very good. If at least one copy is off site, I'd even say it's enterprise grade. When I'm creating "archival" backups, I run MD5 hashes of all of the files on the source, and again on the destination (backup) copy. The hashes are diffed to make sure they all match, and a copy of the hash list is stored on the backup media (so they can be verified again if the backup is needed).
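The hash-and-compare routine described above could look something like the following. It is a minimal sketch, not the commenter's actual tooling: the function names and the hash-list layout are assumptions, and only Python's standard `hashlib` and `pathlib` are used.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its MD5 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_hashes(source, backup):
    """Return relative paths that are missing on one side or whose hashes differ."""
    return sorted(
        name
        for name in source.keys() | backup.keys()
        if source.get(name) != backup.get(name)
    )


def write_hash_list(hashes, out_path):
    """Store the hash list so the backup can be re-verified later."""
    with open(out_path, "w", encoding="utf-8") as fh:
        for name, digest in sorted(hashes.items()):
            fh.write(f"{digest}  {name}\n")
```

A typical run would call `hash_tree` on the source and on the backup, check that `diff_hashes` returns an empty list, and then call `write_hash_list`. If the hash list is written inside the backup directory itself, it should be excluded from later comparisons, since the source has no matching file.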

    • hasseb64
    • 2 years ago

    When I built my WHS2011 (2011) server I wanted ECC, but it was too complicated and expensive, now almost 7 years later and not one single problem.
    Good I didn’t spend that extra cash on something completely unnecessary

      • just brew it!
      • 2 years ago

      Would you buy a car without seatbelts if it was available for $200 less than an otherwise identical car with seatbelts?

      “In 7 years seatbelts have never kept me from getting injured… why spend money on something unnecessary?”

        • hasseb64
        • 2 years ago

        I do not use seatbelts, so
        But ECC still a waste for normal people, there are of course un-normal people out there
        Anywho, I used my capital on a UPS instead, better investment

          • Spunjji
          • 2 years ago

          If you seriously don’t use seatbelts then you’re a danger to others as well as yourself and, frankly, not smart.

            • hasseb64
            • 2 years ago

            To others no, I am less dangerous to others due to the fact that I am NOT using seatbelt.
            But truth is I only skip the seatbelt while driving slowly, and NOT using a seat belt –> neck injuries are less frequent in a crash.
            But I do not crash..

            • evilpaul
            • 2 years ago

            Please tell me you’re an organ donor.

            • just brew it!
            • 2 years ago

            TBH I’m not sure how seatbelts make a difference in how much of a danger you are to others at all (in either direction).

            As far as fewer neck injuries goes… yeah, maybe, or maybe not. But either way, this is at the cost of a higher rate of concussions from the driver’s skull impacting the windshield. Just because you’re driving slow doesn’t mean the person who hits you is. Also, if you ever drive a vehicle equipped with airbags, not wearing the seatbelt greatly increases your risk of injury [i<]from the airbag[/i<] in an airbag deployment situation. Just because you've never had a crash doesn't mean you never will, and it says absolutely nothing about how likely other drivers on the road are to crash into you.

            • hasseb64
            • 2 years ago

            As I wrote, in slow speed areas, you can assume others also drive in same pace. (I do not live in Russia)
            Neck is the most critical part in your body and easily hurt, if you TIE the body but not your head, then things happen, easy physics. I take a broken nose over any neck/nerve injury.
            We’re still speaking of slow speeds.
            Some people crash, others don’t

        • tipoo
        • 2 years ago

        For consumer use it’s more like a flipped bit may or may not shift the shade of a pixel in a photo, not blow up the whole system. In the upper 99.9th percentile of worst case scenario, it’s a system reboot. Seatbelts even if rarely used have obviously higher stakes if not there.

        Now if you’re analyzing 3D protein structures for cancer research or are doing geological surveys for natural gas, obviously use ECC, because 300 dollars in ECC > flawed science or profits.

        • K-L-Waster
        • 2 years ago

        I’ll just leave this here….

        [url<]https://blog.codinghorror.com/to-ecc-or-not-to-ecc/[/url<] "Our study has several main findings. First, we find that approximately 70% of DRAM faults are recurring (e.g., permanent) faults, while only 30% are transient faults. Second, we find that large multi-bit faults, such as faults that affects an entire row, column, or bank, constitute over 40% of all DRAM faults. Third, we find that almost 5% of DRAM failures affect board-level circuitry such as data (DQ) or strobe (DQS) wires. Finally, we find that chipkill functionality reduced the system failure rate from DRAM faults by 36x."

          • just brew it!
          • 2 years ago

          Yes, DRAM has gotten a lot more reliable over the years, even without ECC. Personal experience from “back in the day” indicates that, circa early 1990s, the baseline “soft” error rate was on the order of a flipped bit per GB per month. Clearly, with the amount of RAM in current systems, we’d all be running ECC by now if the error rate was still this high.

          One thing I find interesting about that link is that he also basically says that he’d be using ECC if not for Intel’s product segmentation:
          [quote<]One thing is conspicuously missing in our 2016 build: Xeons, and ECC Ram. In my defense, this isn't intentional - we wanted the fastest per-thread performance and no Intel Xeon, either currently available or announced, goes to 4.0 GHz with Skylake. Paying half the price for a CPU with better per-thread performance than any Xeon, well, I'm not going to kid you, that's kind of a nice perk too.[/quote<]

        • K-L-Waster
        • 2 years ago

        Not sure that seat belts are the right analogy. This is more like having an array of temperature sensors all around the car to immediately detect if something has caught fire.

        Seat belts are more along the lines of using a UPS and nightly data backups.

      • Waco
      • 2 years ago

      How do you know your data is safe? How do you know you didn’t need it?

        • hasseb64
        • 2 years ago

        almost 7 years of 24/7 running says all

          • Waco
          • 2 years ago

          I disagree.

          • just brew it!
          • 2 years ago

          All that says is that you didn’t get an error that caused a crash (i.e. one that corrupted code or critical internal data structures). Given typical file server workloads, most of the data in RAM is going to be cached application/user data, not code or internal data structures. Odds are that when a bit flip occurs, it will corrupt that data since that’s the bigger target by far, without causing a crash.

          On a home media server this is “mostly harmless” since the data will typically be read once, and dropped from cache before it is accessed again; so any flipped bits get discarded and nobody is the wiser. Even if a few flipped bits make it back to the end user application, most people will just assume that a brief stutter in the video or glitch in the audio is due to a network hiccup, or Windows Update pegging the CPU in the background, or whatever.

          But on systems with significant write traffic, or where data is accessed, modified, and rewritten, eventually one of those flipped bits will hit a “dirty” cache page, and the bad data will get committed to permanent storage. And that’s when you have a classic case of bitrot.

            • hasseb64
            • 2 years ago

            Summed up: No need in typical home installations.

            • just brew it!
            • 2 years ago

            Many of the people on this forum are not typical home users.

            • hasseb64
            • 2 years ago

            I know, many still install DVDs in full-tower cases.

    • nico1982
    • 2 years ago

    Cheese is actually better with errors (see Gorgonzola).

      • just brew it!
      • 2 years ago

      I’m allergic to Gorgonzola!

      (Which is really annoying because I like it… just can’t eat much of it or I regret it afterwards.)

      Edit: I guess my immune system is performing error detection (but not correction) when it comes to cheese errors…

    • Inverter
    • 2 years ago

    I voted “Anywhere I can put it”, which unfortunately is nowhere at all. My laptop, my phone, my tablet, my NAS, none of them support ECC. I’m hoping for Apple to run out of other ideas and finally push ECC mainstream across the board, like they did with other existing but not particularly widespread technologies like IPS and SSD.

      • blastdoor
      • 2 years ago

      [quote<]I'm hoping for Apple to run out of other ideas and finally push ECC mainstream across the board, like they did with other existing but not particularly widespread technologies like IPS and SSD.[/quote<] Hmm.... interesting idea there. I wonder how bit errors affect animated emojis? If somebody goes for a smile but ends up with a frown because of a bit error, you better believe there will be ECC in the next iPhone!

      • End User
      • 2 years ago

      Has anyone studied the impact of a lack of ECC memory in consumer mobile devices and the occurrence of corrupt data? Seems to me that would be a very interesting set of results.

      On the surface (pardon the pun) the lack of ECC in the consumer mobile space has had little to no impact.

        • just brew it!
        • 2 years ago

        We will probably never know how many annoying little random glitches are due to memory errors. I’m guessing that the vast majority are just software bugs, but it is entirely possible that some percentage of them are the result of memory corruption.

    • xigmatekdk
    • 2 years ago

    Hell, 99.9% of Threadripper owners don’t even run ECC memory. It’s only used as a pointless bragging right after the crazy fast i9 processors released today to make them feel better about their purchase.

      • just brew it!
      • 2 years ago

      I’d bet it is a lot less than 99.9%. Probably more than half though…

    • jackbomb
    • 2 years ago

    I don’t even use ECC in my file server, and that thing has 20TB of storage.

    The thing is, in my 25+ years of messing around with computers, memory has never given me much trouble. Sure, I’ve had to replace bad DIMMs before, but in each case, the memory was so far gone that Windows wouldn’t boot properly and memtest86 would immediately throw hundreds of errors. I’ve never had “inconsistently bad” RAM where ECC could save me from data corruption.

      • remosito
      • 2 years ago

      Maybe with ECC you’d have noticed way before the situation got that bad as you get ECC correction logs.

      If my Server can’t boot anymore due to fucked RAM, 30+ ppl can go home and won’t come back till I have replacements installed. For the cost of that I can pay the “ECC tax” for the next 2000 years probably.

      It’s like with HDs/SSDs. I get error logs, and switch the damn things before they actually fail. (Even though we have RAID10)

        • just brew it!
        • 2 years ago

        [quote<]Maybe with ECC you'd have noticed way before the situation got that bad as you get ECC correction logs.[/quote<] Only if he's actually watching the ECC correction logs! If not, things would've probably appeared completely normal until the errors got bad enough to overwhelm the ECC. But even in this case, there'd be better protection from silent data corruption.

          • remosito
          • 2 years ago

          If he ain’t, he should.

          My number one job is making sure my servers run when people use them. ECC correction logs can be a kickass early warning system for memory hw issues.

      • just brew it!
      • 2 years ago

      Entirely possible that you had data corruption that you just didn’t notice before the DIMMs failed completely. IOW you got lucky.

        • Waco
        • 2 years ago

        It’s entirely possible he has data corruption and just doesn’t notice. That’s why it’s silent. 🙂

    • ptsant
    • 2 years ago

    My AMD FX 8350 workstation with 16GB ECC was the most stable system I’ve ever owned. I would have gotten ECC for Ryzen but at launch motherboard support was not clear and, unfortunately, prices have increased a lot so I couldn’t afford to experiment.

    I plan to buy 4x16GB ECC at some point in the future, when RAM prices return to sane levels and if I can find reports of people successfully using ECC with my motherboard (Asus Crosshair VI), something which I believe is highly likely.

    Globally, I would say that the slight price difference is worth it. The only reason for not using ECC is (a) because Intel said so with artificial product segmentation and (b) because you really, really only do web browsing and occasional games. Given the amount of effort people invest in their “digital lives”, including photo libraries, music libraries and other content, I believe that the second scenario is becoming increasingly rare.

      • just brew it!
      • 2 years ago

      [quote<]My AMD FX 8350 workstation with 16GB ECC was the most stable system I've ever owned.[/quote<] Ditto. I was annoyed yesterday when I got home from work to discover it had rebooted. But after some investigation, I figured out that we'd had a power "event" of some sort during the day (clock in the bedroom was blinking). I guess I need some new batteries for my UPS...

    • Jigar
    • 2 years ago

    Cheese please…

    EDIT: Voted Cheese 😀

    • JosiahBradley
    • 2 years ago

    My laptop has ECC RAM and all my servers. So lots of ECC RAM out there. If you have data to protect go for it.

    • TEAMSWITCHER
    • 2 years ago

    Isn’t system RAM just one thing? There are buffers strung throughout a modern computer on I/O devices like storage and networking to improve performance. To truly protect yourself from a memory error, wouldn’t ECC RAM need to be present everywhere? Otherwise you are only as strong as the weakest link.

      • just brew it!
      • 2 years ago

      Most of the links in the chain do have some sort of protection. Network and disk interfaces detect bad blocks of data via a CRC check at the receiving end, and request a retransmit if there’s an error (since there’s still a good copy of the data at the sending end). CPU caches are typically ECC protected as well.

      I’m not sure if the caches on consumer HDDs are protected (I suspect not), but the caches are small and the data spends only a few seconds there at most, so the risk is small. I know at least some enterprise drives provide “end to end” CRC checks of the entire internal data path from the host interface all the way up to right before the data gets written to the platters (and vice-versa).

      Data stored on the physical media (HDD platters or flash chips) uses very sophisticated ECC algorithms (more robust than what’s used for ECC RAM), but this is out of necessity since the raw error rate of the underlying media is much higher.

      Edit: Something else to consider… without ECC, the system RAM is probably the most likely place for undetected randomly flipped bits to occur. So by using ECC RAM, you [i<]are[/i<] addressing the weakest link. (I'm not counting software bugs.)
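The "detect via CRC, then ask for a retransmit" pattern mentioned above can be sketched in a few lines. This is an illustration only: the framing (payload followed by a 4-byte CRC-32) and the helper names are assumptions, not how any particular network or disk interface is implemented. `zlib.crc32` is the standard-library CRC-32.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (hypothetical framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Receiver side: return the payload if the CRC matches, else None."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate a single flipped bit somewhere in transit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def receive_with_retry(payload: bytes, corrupt_first_n: int = 1, max_attempts: int = 3):
    """Resend until a frame passes the CRC check; the sender still has a good copy."""
    for attempt in range(1, max_attempts + 1):
        frame = make_frame(payload)
        if attempt <= corrupt_first_n:
            frame = flip_bit(frame, 5)   # damage the first few transmissions
        result = check_frame(frame)
        if result is not None:
            return result, attempt
    raise IOError("all attempts failed the CRC check")


data, attempts = receive_with_retry(b"block of data")
print(data, "received intact after", attempts, "attempt(s)")
```

The retry only works because the sender still holds an intact copy. A bit that flips in non-ECC system RAM has no second copy to fall back on, which is the point made in the edit above.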

    • DragonDaddyBear
    • 2 years ago

    If reliability needs > performance needs then ECC.

      • just brew it!
      • 2 years ago

      Performance hit of unregistered ECC is minimal.

      Yes, registered ECC introduces additional latency. But registered modules are typically required only when pushing to extremely high RAM capacities.

        • mcarson09
        • 2 years ago

        You can overclock registered memory too. Back in the Westmere days overclocking workstations was awesome. ASRock makes a few Haswell-E boards with OC options. OCing memory without an unlocked chip outside of the 16xx series that matches up with their desktop counterparts is not really fun.

      • bhtooefr
      • 2 years ago

      I’d actually argue that when you’re pushing RAM clocks, ECC makes [i<]more[/i<] sense (except for the whole problem where Intel won't let you overclock and use ECC at the same time, because only Xeons can use ECC, and they can't OC practically). Unregistered ECC, as just brew it! said, has basically no overhead... but it can get you a hell of a lot more reliability when you're pushing the performance limits.

    • thecoldanddarkone
    • 2 years ago

    Where is the fat oversized laptop option, or do we stuff them into workstations? The class is called mobile workstations so it does fit. Examples are ThinkPad P51/P71, HP ZBook 15/17, Dell 75xx, and some other workstation models.

    • Krogoth
    • 2 years ago

    ECC memory should be a standard feature on modern desktop systems. Memory capacities and density on modern systems are reaching the point that silent errors from random flips are no longer a “once in a blue moon” issue.

    The performance penalty is almost nothing these days on modern systems. What really is holding back ECC memory support is market segmentation nonsense and the extra cost of adding extra traces for it on motherboards.

      • mcarson09
      • 2 years ago

      Indeed, especially now that there is unbuffered ECC. Back when I started using ECC there was only the buffered kind. One thing that bothers me: with Xeons that can run in a desktop board, you still need a WS chipset to use the ECC function. Ryzen is listed as having support, but I’ve seen no BIOS support for it.

        • just brew it!
        • 2 years ago

        IIRC unbuffered ECC has existed all the way back to the original (non-DDR) SDRAM generation. Support for it (and the modules themselves) definitely seems to be more common these days though.

        • shank15217
        • 2 years ago

        There is support for ECC RAM with Ryzen motherboards, many of the higher end boards have it.

          • just brew it!
          • 2 years ago

          This was far from clear at product launch (and for at least a couple of months after). AMD and motherboard makers were all being evasive about whether it was really supported or not.

    • TwoEars
    • 2 years ago

    Completely useless in a consumer PC but should be used in computers and servers with critical up-time. That’s about it really. You’re using it in your gaming and internet desktop? Fine.. I hear some people like gold plated rims too.

    • blastdoor
    • 2 years ago

    I have owned systems with and without ECC and I cannot perceive a difference. With both types of systems, I will leave them running for days at a time on tasks that max out all CPU cores and use several gigabytes of RAM.

    I certainly cannot prove that the non-ECC systems weren’t affected negatively by the lack of ECC, but I have never had even a hint of evidence to suspect that ECC has made any kind of difference.

    Naturally I recognize that I might have been affected by errors that I’m unaware of, and/or that I might have just gotten lucky.

    I wish that there was some evidence regarding the real-world efficacy of ECC RAM in a workstation context, but I have yet to see it. I recall a paper from a few years ago about ECC in a server context, but I couldn’t even begin to translate the findings from that paper into something that was meaningful to me.

    Oh well….

      • just brew it!
      • 2 years ago

      Well, I’ve seen single-bit errors (“bit rot”) creep into files. I can’t prove ECC would’ve prevented it (maybe it occurred in some other component like a disk interface instead), but it makes me want as much data protection as I can (reasonably) get.

      I’ve seen servers log corrected ECC errors. Without ECC, these could have potentially caused silent data corruption.

      I’ve seen systems protected only by parity (not full ECC) take parity exceptions, causing a processor halt. If not for the parity exception, this could have resulted in silent data corruption as well.

      Has my use of ECC on my home systems measurably made my life better? I couldn’t say, as I don’t religiously check the ECC logs. But it has given me some peace of mind, and other than a small hit on cost, it hasn’t hurt.
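The difference between parity (detects a flipped bit, then halts) and full ECC (corrects it and carries on) can be shown with a toy example. This sketch uses a Hamming(7,4) code on 4-bit words purely for illustration. Real ECC DIMMs protect 64-bit words with wider codes, so this shows the general technique, not the actual DIMM implementation.

```python
def parity_bit(bits):
    """Even parity: 1 if the word has an odd number of 1 bits."""
    return sum(bits) % 2


def hamming74_encode(data):
    """Encode 4 data bits as 7 bits laid out in positions 1..7: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code):
    """Return (data bits, syndrome). A non-zero syndrome is the 1-based position that was fixed."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]

# Parity only: the mismatch is detected, but nothing says which bit flipped.
stored_parity = parity_bit(word)
damaged = [1, 0, 0, 1]
print("parity mismatch detected:", parity_bit(damaged) != stored_parity)

# Hamming code: the syndrome names the bad position, so it can be repaired.
codeword = hamming74_encode(word)
codeword[4] ^= 1   # flip the bit at position 5
recovered, syndrome = hamming74_decode(codeword)
print("recovered:", recovered, "corrected position:", syndrome)
```

The non-zero syndrome plays the role of the "corrected ECC error" entry a server would log, while the parity-only case corresponds to the parity exception and halt described above.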

        • mcarson09
        • 2 years ago

        Were you overclocking said system that had bit rot?

        Were you using RAID in your system?

        If yes: Were you using software raid? I haven’t seen it with hardware raid.

          • just brew it!
          • 2 years ago

          No and no. I do use software RAID now, but have not observed bit rot on the last 2 servers.

          • Waco
          • 2 years ago

          I’ve seen bitrot on both software and hardware RAID configs on enterprise-grade hardware. It can happen, especially at scale. Full ECC for the entire stack.

      • Jubijub
      • 2 years ago

      Never used any either in any of the systems I built…

      On a server running 24/7 there might be a need, but on a personal workstation I never quite saw the use: I never had corrupted memory resulting in loss of files (got the occasional hang here and there, but it’s hard to pinpoint if it was RAM related), and almost all data loss I endured was due to faulty drives, or my own stupidity.

    • just brew it!
    • 2 years ago

    If the motherboard and CPU support ECC, I generally use ECC modules.

    Nearly all of the motherboards I’ve purchased over the past decade have been ECC-capable; it was one of the factors that kept me with AMD even after they started to fall behind the performance curve. It has also made me a loyal Asus customer, since — unlike other vendors — their consumer 939/AM2/2+/3/3+ motherboards consistently supported ECC through the years.

    I also tend to recycle old desktops as file servers — hardware generally moves from the primary desktop (newest) to the secondary desktop (1 gen older than primary desktop) and finally to the file server (2 gens older than primary desktop). Having the ECC is nice from that standpoint as well.

    None of the laptops I’ve owned supported it, so I haven’t used it there.

    I would never build or spec a business server or high-end workstation without ECC.

      • mcarson09
      • 2 years ago

      Laptops with it cost through the nose. To be honest it’s not WS/server unless it has ECC. Systems used as servers/WS without ECC are just desktops.

      • AnotherReader
      • 2 years ago

      For the same reasons, I have stood by AMD and Asus even when AMD was well behind Intel in performance. I use ECC in all of my desktops/workstations.

    • bhtooefr
    • 2 years ago

    Anywhere I can put it, but with qualifications.

    My server has it (registered, too, IIRC), obviously.

    My main desktop does not – I wasn’t going to build with X99 just to get it. None of my laptops have it either.

    I do, however, have an Abit BP6 build that I shoved ECC in, because when you’re buying PC133 in 2017, it doesn’t really matter whether it’s ECC or not, and the 440BX will do ECC, so why not? Of course, that doesn’t help when you’ve got dual Celeron 366s over 500 MHz, and the CPUs are making garbage data because you haven’t pushed the overvolt far enough… (I really should’ve held out for a pair of 333s, or even 300As…)

    • albundy
    • 2 years ago

    True Zen lies within parity.

      • just brew it!
      • 2 years ago

      Nit pick: ECC RAM uses [url=https://en.wikipedia.org/wiki/Hamming_code]Hamming codes[/url], not simple parity, which can only detect (not correct) errors. 😉
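
      To make the distinction concrete, here is a minimal Python sketch of a toy Hamming(7,4) code next to a single even-parity bit. The function names are made up for illustration, and real ECC memory typically protects much wider words (e.g. 64 data bits plus 8 check bits) with a SECDED code rather than this 7-bit toy, so treat it as a demonstration of the idea only.

```python
def parity_ok(word):
    """Even parity check: True if the number of 1 bits is even.
    A single flipped bit makes this False, but says nothing about WHICH bit."""
    return sum(word) % 2 == 0


def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as
    positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome). A non-zero syndrome is the 1-based
    position of a single flipped bit, which is corrected before decoding."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    data = [1, 0, 1, 1]

    # Parity: the error is noticed, but the word cannot be repaired.
    word = data + [sum(data) % 2]
    word[2] ^= 1
    print("parity check passes after bit flip:", parity_ok(word))

    # Hamming(7,4): the same kind of error is located and corrected.
    codeword = hamming74_encode(data)
    codeword[4] ^= 1
    recovered, syndrome = hamming74_decode(codeword)
    print("syndrome:", syndrome, "recovered:", recovered)
```

      With parity alone, the best a system can do on a mismatch is raise an exception; with the Hamming code, the syndrome points at the bad bit so it can be flipped back silently.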

    • SuperPanda
    • 2 years ago

    Given how hard memory errors are to track down, and how they can so easily silently corrupt things that you might not notice until it’s too late, it’s insane that anyone would build anything but a toy box without ECC.

    Until, that is, you realize that Intel has been the only game in town for high performance CPUs for years. Intel’s insistence upon artificially restricting ECC RAM to Xeon chips has forced people to choose between higher performing and cheaper systems without ECC and lower performing and more expensive systems with ECC. That normalizes no-ECC and makes ECC RAM even rarer and more expensive, resulting in a feedback loop further diminishing the usage of ECC RAM.

    Asking “does anyone really need ECC?” is like asking “does a mainstream platform really need more than four cores?”. Just because Intel’s monopoly has normalized something doesn’t mean that something is a good thing. Hopefully AMD’s resurgent competitiveness continues to point out areas where Intel has been holding back progress for years.

      • just brew it!
      • 2 years ago

      While I’m definitely an advocate of using ECC, things are almost never that cut-and-dried. You don’t need it for surfing the web, gaming, or playing videos… which is what most home users do on their PCs. For those users, it’s not “insane” to have a non-ECC box.

      But if you work on or store anything “important” on a system, you really want as much protection from a data integrity standpoint as possible. I suppose some of my ECC fanboyism comes from my days as an independent contractor, when my PC was my sole source of income. I’m a real stickler for system stability; this is also why the Ryzen gcc segfault issue was a showstopper for me — it implied a fundamental stability issue in the hardware (which has thankfully now been fixed).

      Business/enterprise servers need ECC, full stop. I would argue that business desktops should have it as well, but I think it is fairly rare unless you go for “workstation” class gear since most business desktops use consumer Intel CPUs, which lack ECC capability.

        • Waco
        • 2 years ago

        What he said.

      • mcarson09
      • 2 years ago

      I need ECC in even my TOY box. Someone has to save the porn…

        • bhtooefr
        • 2 years ago

        I have ECC in one of my toy boxes, and [i]not[/i] in my main machine. (Because it's 440BX, and therefore supports ECC out of the box, whereas Z170 does not.)

    • Prototyped
    • 2 years ago

    I bought a Kaby Lake Xeon E3 processor (1275, about the same as a Core i7 7700 non-K) specifically to run it with ECC RAM. The cost delta between ECC and non-ECC DDR4 has all but disappeared.

    Turned out the main cost was actually in buying a suitable motherboard — a C236 board, the chipset for which is about the same as a Q170, cost about 1.5x what a usual Z170 or Z270 board would have cost. That and the uncertainty as to whether the board would even boot with a Kaby Lake Xeon, given that all these motherboards launched with Skylake Xeon E3 last year — I ended up going with a retailer that happened to have a C236 board and was willing to update the BIOS for me prior to shipment.

      • egon
      • 2 years ago

      Main thing that’s stood between me and ECC memory is the motherboard situation compared to Z170/Z270 – far fewer desktop-oriented options, not as many boxes ticked spec-wise. The small performance penalty going with a Xeon instead of an equivalent price Core isn’t as big a deal, but it factors in.

      Basically, for my personal needs, ECC would be ‘nice to have’, whereas Z170/Z270 motherboard choice is nicer to have. It’d be nicest to have both, but thanks to Intel’s market segmentation, nicest isn’t possible.

    • Air
    • 2 years ago

    I was going to have to abstain from answering, but luckily it had the perfect option for me.

    • derFunkenstein
    • 2 years ago

    I’ve never built anything with ECC. Machines at work have it, but those aren’t “my” servers and I didn’t build them (so I didn’t choose).

      • mcarson09
      • 2 years ago

      No cheese for YOU.

    • jts888
    • 2 years ago

    I normally used it where I could, but Ryzen’s uncommonly strong performance scaling with DDR clocks has me hesitating.

    It seems I can only choose between: >3 GHz at 8 GB/channel w/o ECC, ~2.9 GHz at 32 GB/s w/o ECC, or much slower with ECC.

    I’d kill for 64-128 GB (on Threadripper) of 2933-3200 MHz ECC UDIMMs, but apparently I’m not waving enough cash around in my fist to get the results I want.

    • homerdog
    • 2 years ago

    Servers and the occasional workstation if the customer asks for it. But most of my customers who need “fast” computers are using old, bloated, barely functional line of business software that will never go multithreaded. One guy requested the fastest machine I could build (within reason) so I built him an 8 core Xeon beast with a wad of ECC RAM. He kept complaining about poor performance so I swapped it with a 4790K and some fast non-ECC memory. He never complained again.

    • Bauxite
    • 2 years ago

    Anything actually running programs or scripts 24/7 and not idle with no one at a keyboard (“serverish”) or fire-and-forget systems, where bad data, downtime or crashes would either not be noticed right away or would cause extra pain or hassle. Offsite security camera dvr or NAS would be classic examples.

    • srg86
    • 2 years ago

    I’d use it in a server or high-end workstation. But for anything else I think it’s a nice-to-have.

    I’d probably use it if my machines supported it, but for “warm fuzzy” feelings only, I’ve never felt I’ve needed it.

    • chuckula
    • 2 years ago

    I use it in servers where it’s an insurance policy that you hope you never need to rely on.

    Everywhere else it’s a ‘nice to have’ in the abstract sense, but then again faster-clocked RAM is another ‘nice to have’ and anybody who has actually used ECC knows that the two do not go together.

      • mcarson09
      • 2 years ago

      I’d rather have higher capacity with memory over speed.

    • Takeshi7
    • 2 years ago

    Everywhere I can put it, which is only my Mac Pro and Power Mac G5. None of my PCs support it.

    • LostCat
    • 2 years ago

    I don’t use it, but I wish I did.

      • DancinJack
      • 2 years ago

      Why? (legit asking)

        • LostCat
        • 2 years ago

        Extra reliability is never a bad thing to have in complex systems.

      • Klogg
      • 2 years ago

      This! I always feel like these polls lack an aspirational choice to match the dismissive choice. I don’t use ECC because Intel won’t let me with affordable parts, and I wasn’t patient enough to wait for ECC-capable Ryzen motherboards before I pulled the trigger.

      But I wants it.
