ECC error logging

Posted on Wed Apr 04, 2012 12:01 pm

Hello everyone, this is my first question in the techreport forums.

I am using 4x4GB of Kingston unbuffered ECC DDR3-1333 with an AMD Phenom II 945 and an ASUS Crosshair V Formula (990FX). The system is supposed to support ECC. Despite a fair amount of searching, I couldn't find a log of RAM ECC errors. Maybe there are no logs for that? Or no ECC errors? Where would I normally find these errors on a Windows 7 system? (I am not afraid to use the console if I have to :-))

Also, in case anyone knows, the DIMMs are supposed to have a thermal sensor. Where can I find a temperature reading? It doesn't appear in any of the motherboard tools.

Thanks a lot!
ptsant

Re: ECC error logging

Posted on Wed Apr 04, 2012 2:28 pm

Windows 7 apparently understands how to leverage ECC. It should be in the System event log.

PFA Performed by WHEA

How WHEA Performs PFA on ECC Memory

Predictive Failure Analysis (PFA)

WHEA Policy Settings

Hypothetical example I snarfed:
Code:
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          10/12/1492 11:20:48 AM
Event ID:      19
Task Category: None
Level:         Warning
Keywords:       
User:          LOCAL SERVICE
Computer:      hal
Description:
A corrected hardware error occurred.   
Error Source: Corrected Machine Check
Error Type: Bus/Interconnect Error
Processor ID Valid: Yes
Processor ID: 0x0
Bank Number: 4
Transaction Type: N/A
Processor Participation: Local node responded to the request
Request Type: Generic Read
Memory/Io: Memory
Memory Hierarchy Level: Generic
Timeout: No
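
If you would rather pull these records from the console than click through Event Viewer, here is a minimal Python sketch. It assumes the events are logged under the Microsoft-Windows-WHEA-Logger source as in the example above, and that the text output keeps the "Name: value" layout. The function names are made up for illustration, and the wevtutil output format may differ slightly from the Event Viewer text.

```python
# XPath filter selecting events from the WHEA-Logger provider (assumed source name,
# taken from the example event above).
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def wevtutil_command(max_events=20):
    """Build a wevtutil call that lists the newest WHEA-Logger events as text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{WHEA_QUERY}",
        "/f:text",            # plain-text output instead of XML
        f"/c:{max_events}",   # cap the number of events returned
        "/rd:true",           # newest events first
    ]


def parse_fields(text):
    """Turn 'Name: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields
```

On the Windows box, something like subprocess.run(wevtutil_command(), capture_output=True, text=True) should give you text to feed into parse_fields. An empty result would just mean nothing has been logged yet.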
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor

Re: ECC error logging

Posted on Thu Apr 05, 2012 12:32 pm

Ryu Connor wrote: Windows 7 apparently understands how to leverage ECC. It should be in the System event log.


Thanks for the detailed answer. I managed to find the Microsoft-Windows-Kernel-WHEA log, but it contained no errors! Well, I guess that's good news in a way, although it makes me feel as if I didn't really need ECC.

I did find in the same log the following:
Code:
WHEA successfully initialized.
   4 error sources are active
   Error record format version is 10.


I assume that the 4 error sources are the 4 DIMMs, which would make sense.

However, according to the links you provided, it appears that Windows will only log corrected errors once they pass a certain threshold. Corrected ECC errors below that threshold are normally not logged, which I find fairly frustrating.
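
To make the threshold idea concrete, here is a rough Python sketch of "report only after N corrected errors within a time window". This is only my reading of the general idea, not how WHEA actually implements it. The class name is invented, and the threshold and window values are placeholders, not Windows defaults.

```python
from collections import deque


class CorrectedErrorThreshold:
    """Signal only once `threshold` corrected errors fall within `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold   # placeholder value, not a Windows default
        self.window_s = window_s     # placeholder value, not a Windows default
        self._times = deque()

    def record(self, timestamp_s):
        """Register one corrected error; return True if the threshold is reached."""
        self._times.append(timestamp_s)
        # Forget errors that have aged out of the window.
        while self._times and timestamp_s - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.threshold
```

With a scheme like this, isolated errors spread far apart never trip the threshold, which would explain an empty log even if a few corrections did happen.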

I'll keep you up to date if I find something more...
ptsant

Re: ECC error logging

Posted on Thu Apr 05, 2012 12:52 pm

Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?
Siglessness is boring.
Forge

Re: ECC error logging

Posted on Thu Apr 05, 2012 4:21 pm

Forge wrote: Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?


For the same reason that you want to know that your car is overheating before it burns down. I think there is a difference between getting, say, 1 error a week and getting 1 error per hour. Most importantly, I'm just curious, and I'm looking for a way to justify my purchasing decision. If I don't get any (or very, very few) corrected errors, I could have gotten away with non-ECC RAM, which is cheaper and faster...
ptsant

Re: ECC error logging

Posted on Thu Apr 05, 2012 4:26 pm

I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected in two or three years.
Siglessness is boring.
Forge

Re: ECC error logging

Posted on Fri Apr 06, 2012 3:28 am

Forge wrote: I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected in two or three years.


I haven't got any corrected errors yet under Windows 7, but I understand that they are only reported after a certain threshold per chip. So I imagine that some have occurred but were not logged. Linux does more detailed reporting, but I don't have a very long uptime right now. Anyway, I read this article published by Google, http://research.google.com/pubs/pub35162.html, where they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones) are at least 1000 times less frequent, so I don't expect many to occur.
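
For what it's worth, the back-of-the-envelope arithmetic behind that weekly figure looks like this in Python. It assumes the per-GB average applies evenly to every DIMM, which may well not hold for a single desktop, and it rounds a year to 52 weeks.

```python
WEEKS_PER_YEAR = 52  # rounded; close enough for an estimate


def errors_per_week(errors_per_gb_year, capacity_gb):
    """Scale a per-GB yearly error rate to a weekly count for the whole system."""
    return errors_per_gb_year * capacity_gb / WEEKS_PER_YEAR


low = errors_per_week(2000, 16)
high = errors_per_week(6000, 16)
print(f"{low:.0f} to {high:.0f} corrected errors per week")
# prints "615 to 1846 corrected errors per week"
```

So "something like 1000 per week" sits in the middle of that range.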
ptsant

Re: ECC error logging

Posted on Fri Apr 06, 2012 4:43 am

I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.

IMO, ECC support will start to become more important as memory capacities in desktop systems start to go into the tens of GiBs.
Ivy Bridge i5-3570K@4.0Ghz, Gigabyte Z77X-UD3H, 2x4GiB of PC-12800, EVGA 660Ti, Corsair CX-600 and Fractal Refined R4 (W). Kentsfield Q6600@3Ghz, HD 4850 2x2GiB PC2-6400, Gigabyte EP45-DS4P, OCZ Modstream 700W, and PC-7B.
Krogoth

Re: ECC error logging

Posted on Fri Apr 06, 2012 5:54 am

Krogoth wrote: I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.


It certainly isn't easy to find out what's happening, which is why I asked. Even under Linux, this functionality, although present, is not really advertised anywhere! According to the sources cited previously, ECC support IS present under Windows Vista and 7 and is on by default. Furthermore, predictive failure analysis (PFA) will report recurrent errors after a certain threshold. As you say, monitoring tools (other than the event viewer) are lacking... Well, that's how server manufacturers make a living I guess.
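
On the Linux side, the counters are exposed through the kernel's EDAC sysfs tree once an EDAC driver for the memory controller is loaded. Here is a small Python sketch that reads them. It assumes the usual /sys/devices/system/edac/mc layout with per-controller ce_count and ue_count files. The function name is my own, and on a box without a loaded EDAC driver it simply returns nothing.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return corrected/uncorrected error counts per memory controller."""
    counts = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(root_path.glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


if __name__ == "__main__":
    for name, c in read_edac_counts().items():
        print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

The counters reset on reboot, so a short uptime will naturally show small numbers.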
ptsant

Re: ECC error logging

Posted on Mon Nov 11, 2013 10:47 am

ptsant wrote: here they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones), are at least 1000 times less frequent, so I don't expect many to occur.


???

The abstract plainly says: "...and more than 8% of DIMMs affected by errors per year."

If only ~8% of DIMMs are affected by errors per year, how is what you are saying even remotely possible?

ptsant wrote: It certainly isn't easy to find out what's happening,


Perhaps, but more likely you haven't actually encountered any errors yet and so there is nothing to report.
Glorious

Re: ECC error logging

Posted on Mon Nov 11, 2013 10:57 am

Krogoth wrote: IMO, ECC support will start to become more important as memory capacities in desktop systems start to go into the tens of GiBs.

We're already getting there. My new builds have 16GB.

Glorious wrote:
ptsant wrote: here they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones), are at least 1000 times less frequent, so I don't expect many to occur.


???

The abstract plainly says: "...and more than 8% of DIMMs affected by errors per year."

If only ~8% of DIMMs are affected by errors per year, how is what you are saying even remotely possible?

It is certainly possible if those 8% marginal DIMMs are getting lots of errors!
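
A quick Python illustration of how both statements can hold at once. The numbers here are made up purely to show the effect of a skewed distribution. They are not taken from the paper.

```python
from statistics import mean, median

# Hypothetical fleet: 100 DIMMs, of which 8 (8%) see any errors in a year.
TOTAL_DIMMS = 100
AFFECTED_DIMMS = 8
ERRORS_PER_AFFECTED_DIMM = 50_000  # invented figure for illustration only

errors = ([ERRORS_PER_AFFECTED_DIMM] * AFFECTED_DIMMS
          + [0] * (TOTAL_DIMMS - AFFECTED_DIMMS))

fleet_mean = mean(errors)   # average over every DIMM, good and bad
typical = median(errors)    # what the typical DIMM actually sees

print(f"mean per DIMM: {fleet_mean}, median per DIMM: {typical}")
# prints "mean per DIMM: 4000, median per DIMM: 0"
```

A high fleet-wide average and a typical DIMM with zero errors are not in conflict when a few bad DIMMs account for nearly all of the errors.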
(this space intentionally left blank)
just brew it!

Re: ECC error logging

Posted on Mon Nov 11, 2013 1:15 pm

JBI wrote: It is certainly possible if those 8% marginal DIMMs are getting lots of errors!


Sure, but I don't think that's what he was talking about.

ptsant wrote: and I'm looking for a way to justify my purchasing decision. If I don't get any (or very, very few) corrected errors, I could have gotten away with non-ECC RAM, which is cheaper and faster...


Far be it from me to discourage anyone from ECC, and I personally think it's a darn dirty shame that it is somehow seen as a server-only "feature" these days, but odds are that ptsant won't see any corrected errors in a year.

EDIT: ACK! Spam Revival! :(
Glorious
