ECC error logging

Posted on Wed Apr 04, 2012 11:01 am

Hello everyone, this is my first question in the techreport forums.

I am using 4x4GB of Kingston unbuffered ECC DDR3-1333 with an AMD Phenom II 945 and an ASUS Crosshair V Formula (990FX). The system is supposed to support ECC. Despite some decent efforts, I couldn't find a log of RAM ECC errors. Maybe there are no logs for that? Or no ECC errors? Where would I normally find these errors on a Windows 7 system? (I am not afraid to use the console if I have to :-))

Also, in case anyone knows: the DIMMs are supposed to have a thermal sensor. Where can I find a temperature reading? It doesn't appear in any of the motherboard tools.

Thanks a lot!
ptsant
Gerbil
 
Posts: 31
Joined: Mon Oct 05, 2009 11:45 am

Re: ECC error logging

Posted on Wed Apr 04, 2012 1:28 pm

Windows 7 apparently understands how to leverage ECC. Any errors it records should be in the System event log.

PFA Performed by WHEA

How WHEA Performs PFA on ECC Memory

Predictive Failure Analysis (PFA)

WHEA Policy Settings
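If you'd rather stay in the console, here is a minimal sketch (assuming Python 3.9+) that builds a wevtutil query for WHEA-Logger entries in the System log. The provider name is taken from the sample event below; the helper names are hypothetical, and the query only actually runs on Windows.

```python
import platform
import subprocess

# Provider name as it appears in the sample event in this thread.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_wevtutil_query(log: str = "System", count: int = 20) -> list[str]:
    """Build a wevtutil command line listing the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", log,
        f"/q:{xpath}",   # XPath filter on the event provider
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


def query_whea_events(count: int = 20) -> str:
    """Run the query (Windows only) and return wevtutil's text output."""
    if platform.system() != "Windows":
        raise OSError("wevtutil is only available on Windows")
    result = subprocess.run(build_wevtutil_query(count=count),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__" and platform.system() == "Windows":
    print(query_whea_events() or "No WHEA-Logger events found.")
```

An empty result only means nothing crossed the logging threshold, not necessarily that no error was ever corrected.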

Hypothetical example I snarfed:
Log Name:      System
Source:        Microsoft-Windows-WHEA-Logger
Date:          10/12/1492 11:20:48 AM
Event ID:      19
Task Category: None
Level:         Warning
Keywords:       
User:          LOCAL SERVICE
Computer:      hal
Description:
A corrected hardware error occurred.   
Error Source: Corrected Machine Check
Error Type: Bus/Interconnect Error
Processor ID Valid: Yes
Processor ID: 0x0
Bank Number: 4
Transaction Type: N/A
Processor Participation: Local node responded to the request
Request Type: Generic Read
Memory/Io: Memory
Memory Hierarchy Level: Generic
Timeout: No
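A quick sketch of splitting a text-rendered event like the one above into fields, assuming the plain `Key: Value` layout shown. `parse_whea_event` and `is_corrected_memory_error` are hypothetical helpers, and treating an event ID 19 with `Memory/Io: Memory` as memory-related is an assumption for illustration, not Microsoft's own classification.

```python
def parse_whea_event(text: str) -> dict[str, str]:
    """Split 'Key: Value' lines of a text-rendered event into a dict.

    Lines without a colon (the free-text description) are skipped. Only the
    first colon separates key from value, so timestamps keep their colons.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_corrected_memory_error(fields: dict[str, str]) -> bool:
    """Heuristic (assumed): a WHEA-Logger event 19 whose target was memory."""
    return (fields.get("Source") == "Microsoft-Windows-WHEA-Logger"
            and fields.get("Event ID") == "19"
            and fields.get("Memory/Io") == "Memory")
```

Fed the sample above, `parse_whea_event` gives `"Bank Number": "4"` and `"Date": "10/12/1492 11:20:48 AM"`, and the heuristic returns `True`.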
"Welcome back my friends to the show that never ends. We're so glad you could attend. Come inside! Come inside!"
Ryu Connor
Global Moderator
 
Posts: 3103
Joined: Thu Dec 27, 2001 6:00 pm
Location: Marietta, GA

Re: ECC error logging

Posted on Thu Apr 05, 2012 11:32 am

Ryu Connor wrote: Windows 7 apparently understands how to leverage ECC. It should be in the System event log.


Thanks for the detailed answer. I managed to find the Microsoft-Windows-Kernel-WHEA log, but it contained no errors! Well, I guess that's good news in a way, although it makes me feel as if I didn't really need ECC.

I did find in the same log the following:
WHEA successfully initialized.
   4 error sources are active
   Error record format version is 10.


I assume that the 4 error sources are the 4 DIMMs, which makes sense.

However, according to the links you provided, it appears that Windows will only log errors after a certain threshold. Individual corrected ECC errors are normally not logged, which I find fairly frustrating.
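To illustrate the threshold idea from those links: a minimal sketch that counts corrected errors per physical page inside a sliding time window and only reports once a page crosses the threshold. Per-page counting, the threshold of 16 and the 24-hour window are all assumptions chosen for illustration, not values read from Windows; `PfaTracker` is a hypothetical name.

```python
from collections import defaultdict, deque


class PfaTracker:
    """Threshold-within-a-window counter in the spirit of WHEA's PFA.

    threshold and window_s are illustrative placeholders (assumed), not
    Windows defaults. One queue of error timestamps is kept per page.
    """

    def __init__(self, threshold: int = 16, window_s: float = 86_400.0):
        self.threshold = threshold
        self.window_s = window_s
        self._hits = defaultdict(deque)  # page -> deque of timestamps

    def record(self, page: int, t: float) -> bool:
        """Record one corrected error; True when the page crosses the threshold."""
        hits = self._hits[page]
        hits.append(t)
        # Forget errors that fell out of the sliding time window.
        while hits and t - hits[0] > self.window_s:
            hits.popleft()
        if len(hits) >= self.threshold:
            hits.clear()  # report once, then start counting again
            return True
        return False
```

Everything below the threshold is counted but never surfaced, which is why a quiet log does not prove that no corrections happened.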

I'll keep you up to date if I find something more...
ptsant
Gerbil
 
Posts: 31
Joined: Mon Oct 05, 2009 11:45 am

Re: ECC error logging

Posted on Thu Apr 05, 2012 11:52 am

Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?
pa' ngaSwI' nuq vay' Data'nISbogh roD tu'lu' tlhIH'a'?! yIn jaj tera' na'ran wIb laH HInob Qub 'oH rue munISbogh 'Iv ghaH DaSov'a'? loD chay'pen tuqwIj meQ 'Iv jIH! TERA' NA'RAN WIB! chay'pen burns tuq vI'ogh 'ogh, chombuStIble tera' na'ran wIb!
Forge
Darth Gerbil
 
Posts: 7631
Joined: Wed Dec 26, 2001 6:00 pm
Location: SouthEast PA

Re: ECC error logging

Posted on Thu Apr 05, 2012 3:21 pm

Forge wrote: Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?


For the same reason that you want to know your car is overheating before it burns down. I think there is a difference between getting, say, 1 error a week and getting 1 error per hour. Mostly, though, I'm just curious, and I'm looking for a way to justify my purchasing decision. If I don't get any (or very, very few) corrected errors, I could have gotten away with non-ECC RAM, which is cheaper and faster...
ptsant
Gerbil
 
Posts: 31
Joined: Mon Oct 05, 2009 11:45 am

Re: ECC error logging

Posted on Thu Apr 05, 2012 3:26 pm

I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected errors in two or three years.
Forge
Darth Gerbil
 
Posts: 7631
Joined: Wed Dec 26, 2001 6:00 pm
Location: SouthEast PA

Re: ECC error logging

Posted on Fri Apr 06, 2012 2:28 am

Forge wrote: I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected errors in two or three years.


I haven't got any corrected errors yet under Win7, but I understand that they are only reported after a certain threshold per chip. So I imagine that some have occurred but were not logged. Linux does more detailed reporting, but I don't have a very long uptime on it right now. Anyway, I read this article published by Google (http://research.google.com/pubs/pub35162.html), where they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable (detected) errors are at least 1000 times less frequent, so I don't expect many to occur.
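The back-of-the-envelope scaling, as a sketch: it takes the per-GB yearly rate quoted above and assumes errors are spread evenly across every gigabyte and every week, which real memory need not do. The function name is hypothetical.

```python
DAYS_PER_YEAR = 365.0


def corrected_errors_per_week(rate_per_gb_year: float, capacity_gb: float) -> float:
    """Scale an errors-per-GB-per-year figure to a whole system, per week.

    Assumes the rate is uniform over capacity and time (a simplification).
    """
    return rate_per_gb_year * capacity_gb * 7.0 / DAYS_PER_YEAR


low = corrected_errors_per_week(2000, 16)
high = corrected_errors_per_week(6000, 16)
print(f"{low:.0f} to {high:.0f} corrected errors per week")
```

This prints `614 to 1841 corrected errors per week`, so "something like 1000" sits inside the range implied by the quoted rate.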
ptsant
Gerbil
 
Posts: 31
Joined: Mon Oct 05, 2009 11:45 am

Re: ECC error logging

Posted on Fri Apr 06, 2012 3:43 am

I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.

IMO, ECC support will start to become more important as memory capacities in desktop systems start to go into tens of GiBs.
Ivy Bridge i5-3570K@stock, Gigabyte Z77X-UD3H, 2x4GiB of PC-12800, EVGA 660Ti, Corsair CX-600 and Fractal Refined R4 (W). Kentsfield Q6600@3GHz, HD 4850 2x2GiB PC2-6400 = 4GiB total, Gigabyte EP45-DS4P, PC P&C Silencer 610W, and PC-7B.
Krogoth
Maximum Gerbil
 
Posts: 4247
Joined: Tue Apr 15, 2003 2:20 pm
Location: somewhere on Core Prime

Re: ECC error logging

Posted on Fri Apr 06, 2012 4:54 am

Krogoth wrote: I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.


It certainly isn't easy to find out what's happening, which is why I asked. Even under Linux, this functionality, although present, is not really advertised anywhere! According to the sources cited previously, ECC support IS present under Win Vista and 7 and is on by default. Furthermore, predictive failure analysis (PFA) will report recurrent errors after a certain threshold. As you say, monitoring tools (other than the Event Viewer) are lacking... Well, that's how server manufacturers make a living, I guess.
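On the Linux side, the counters live in the EDAC sysfs tree. A minimal sketch that reads each memory controller's `ce_count` (corrected) and `ue_count` (uncorrected) files, assuming an EDAC driver for the memory controller is loaded (on this AMD platform that would be `amd64_edac`); `read_edac_counts` is a hypothetical helper name.

```python
from pathlib import Path

# Standard EDAC sysfs location; empty unless an EDAC driver is loaded.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_MC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} from ce_count/ue_count."""
    counts: dict[str, tuple[int, int]] = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            # int() tolerates the trailing newline sysfs files carry.
            counts[mc.name] = (int(ce_file.read_text()), int(ue_file.read_text()))
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Unlike the threshold-gated Windows log, these are running totals since the driver loaded, so sampling them over a long uptime would give the per-week rate discussed above.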
ptsant
Gerbil
 
Posts: 31
Joined: Mon Oct 05, 2009 11:45 am
