Personal computing discussed


 
ptsant
Gerbil XP
Topic Author
Posts: 397
Joined: Mon Oct 05, 2009 12:45 pm

ECC error logging

Wed Apr 04, 2012 12:01 pm

Hello everyone, this is my first question in the TechReport forums.

I am using 4x4GB of Kingston unbuffered ECC DDR3-1333 with an AMD Phenom II 945 and an ASUS Crosshair V Formula (990FX). The system is supposed to support ECC. Despite some decent efforts, I couldn't find a log of RAM ECC errors. Maybe there are no logs for that? Or no ECC errors? Where would I normally find these errors on a Windows 7 system? (I am not afraid to use the console if I have to :-))

Also, in case anyone knows, the DIMMs are supposed to have a thermal sensor. Where can I find a temperature reading? It doesn't appear in any of the motherboard tools.

Thanks a lot!
 
Ryu Connor
Global Moderator
Posts: 4369
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA
Contact:

Re: ECC error logging

Wed Apr 04, 2012 2:28 pm

Windows 7 apparently understands how to leverage ECC. It should be in the System event log.

PFA Performed by WHEA

How WHEA Performs PFA on ECC Memory

Predictive Failure Analysis (PFA)

WHEA Policy Settings

Hypothetical example I snarfed:
Log Name:      System 
Source:        Microsoft-Windows-WHEA-Logger
Date:          10/12/1492 11:20:48 AM
Event ID:      19
Task Category: None
Level:         Warning
Keywords:       
User:          LOCAL SERVICE
Computer:      hal
Description:
A corrected hardware error occurred.   
Error Source: Corrected Machine Check
Error Type: Bus/Interconnect Error
Processor ID Valid: Yes
Processor ID: 0x0
Bank Number: 4
Transaction Type: N/A
Processor Participation: Local node responded to the request
Request Type: Generic Read
Memory/Io: Memory
Memory Hierarchy Level: Generic
Timeout: No
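If you would rather pull these fields out programmatically than read them in Event Viewer, a minimal Python sketch like the one below splits one event rendered as text (the `Key: Value` layout shown above) into a dictionary. The function names are made up for illustration. On Windows, similar text can probably be dumped from the console with something like `wevtutil qe System /q:"*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]" /f:text /c:10`, but treat that exact query as an assumption and check `wevtutil qe /?` first. The sketch assumes one event per string; extra header lines such as `Event[0]:` are simply tolerated.

```python
def parse_event_text(text: str) -> dict:
    """Split one event rendered as text into a {field: value} dict.

    Assumes the 'Key: Value' layout of the example above. Lines with no
    colon (the free-form message under 'Description:') are joined into a
    'Message' entry. Only the first colon on a line is the separator,
    because values such as the time in 'Date:' contain colons too.
    """
    fields = {}
    message = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
        else:
            message.append(line)
    if message:
        fields["Message"] = " ".join(message)
    return fields


def is_whea_memory_event(fields: dict) -> bool:
    """Crude heuristic: a WHEA-Logger event whose target is memory.

    This alone does not prove the event was a DRAM ECC correction.
    """
    return (fields.get("Source") == "Microsoft-Windows-WHEA-Logger"
            and fields.get("Memory/Io") == "Memory")
```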
All of my written content here on TR does not represent or reflect the views of my employer or any reasonable human being. All content and actions are my own.
 
ptsant
Gerbil XP
Topic Author
Posts: 397
Joined: Mon Oct 05, 2009 12:45 pm

Re: ECC error logging

Thu Apr 05, 2012 12:32 pm

Ryu Connor wrote:
Windows 7 apparently understands how to leverage ECC. It should be in the System event log.


Thanks for the detailed answer. I managed to find the Microsoft-Windows-Kernel-WHEA log, but it contained no errors! Well, I guess that's good news in a way, although it makes me feel as if I didn't really need ECC.

I did find in the same log the following:
WHEA successfully initialized.
   4 error sources are active
   Error record format version is 10.


I assume that the 4 error sources are the 4 DIMMs, which makes sense.

However, according to the links you provided, it appears that Windows will only log errors after a certain threshold. Corrected ECC errors are normally not logged, which I think is fairly frustrating.
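The thresholding described in those links can be pictured with a toy model: count corrected errors per memory page within a time window, and only report a page once it crosses a threshold. This is a Python sketch under stated assumptions, not WHEA's implementation. The class name, the per-page granularity, the default threshold of 10 and the 24-hour window are all hypothetical placeholders; the real values are governed by the WHEA policy settings linked above.

```python
from collections import defaultdict, deque


class PfaTracker:
    """Toy model of threshold-based Predictive Failure Analysis.

    Threshold and window defaults are illustrative assumptions,
    not Windows' actual policy values.
    """

    def __init__(self, threshold: int = 10, window_seconds: float = 24 * 3600):
        self.threshold = threshold
        self.window = window_seconds
        self._events = defaultdict(deque)  # page -> timestamps of recent errors
        self.flagged = set()               # pages already reported

    def record(self, page: int, timestamp: float) -> bool:
        """Record one corrected error on `page`.

        Returns True only when the page first crosses the threshold,
        i.e. the point at which it would be reported. Errors below the
        threshold are counted silently and never logged.
        """
        if page in self.flagged:
            return False
        recent = self._events[page]
        recent.append(timestamp)
        # Forget errors that have fallen out of the time window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.threshold:
            self.flagged.add(page)
            del self._events[page]
            return True
        return False
```

With a threshold of 3, a page that throws three corrected errors inside the window is reported on the third one, while a page that throws one error every few days is never reported at all, which would be consistent with an empty log on a machine that is in fact correcting the occasional error.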

I'll keep you up to date if I find something more...
 
Forge
Lord High Gerbil
Posts: 8253
Joined: Wed Dec 26, 2001 7:00 pm
Location: Gone

Re: ECC error logging

Thu Apr 05, 2012 12:52 pm

Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?
Please don't edit my signature for me. Thanks.
 
ptsant
Gerbil XP
Topic Author
Posts: 397
Joined: Mon Oct 05, 2009 12:45 pm

Re: ECC error logging

Thu Apr 05, 2012 4:21 pm

Forge wrote:
Just thinking out loud more than anything, but why would you care about *corrected* ECC errors? You basically want to know that ECC is doing its job, or are you wanting to keep track for some strange reason?


For the same reason that you want to know that your car is overheating before it burns down. I think there is a difference between getting, say, 1 error a week and getting 1 error per hour. Most importantly, I'm just curious, and I'm looking for a way to justify my purchasing decision. If I don't get any (or very, very few) corrected errors, I could have gotten away with non-ECC RAM, which is cheaper and faster...
 
Forge
Lord High Gerbil
Posts: 8253
Joined: Wed Dec 26, 2001 7:00 pm
Location: Gone

Re: ECC error logging

Thu Apr 05, 2012 4:26 pm

I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected in two or three years.
 
ptsant
Gerbil XP
Topic Author
Posts: 397
Joined: Mon Oct 05, 2009 12:45 pm

Re: ECC error logging

Fri Apr 06, 2012 3:28 am

Forge wrote:
I could very possibly be wrong, but it's my understanding that if you see corrected errors per WEEK, much less per day or per hour, then you have an unusual situation and/or a developing problem. When I ran an ECC-enabled system, many moons ago, I think I saw two corrected and zero uncorrected in two or three years.


I haven't got any corrected errors yet under Windows 7, but I understand that they are only reported after a certain threshold per chip. So I imagine that some have occurred but were not logged. Linux does more detailed reporting, but I don't have a very long uptime there right now. Anyway, I read this http://research.google.com/pubs/pub35162.html article, published by Google, where they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones) are at least 1000 times less frequent, so I don't expect many to occur.
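The back-of-the-envelope scaling in that paragraph works out as follows. This Python sketch assumes the quoted 2000-6000 errors per GB per year apply uniformly to every GB installed, which is exactly the assumption questioned later in the thread; the function name is made up.

```python
WEEKS_PER_YEAR = 52.0


def corrected_errors_per_week(per_gb_per_year: float, capacity_gb: float) -> float:
    """Scale an average corrected-error rate (errors per GB per year)
    to a given memory capacity, expressed per week."""
    return per_gb_per_year * capacity_gb / WEEKS_PER_YEAR


low = corrected_errors_per_week(2000, 16)   # about 615 per week
high = corrected_errors_per_week(6000, 16)  # about 1846 per week
print(f"{low:.0f} to {high:.0f} corrected errors per week")
```

So "something like 1000 per week" for 16GB sits inside the 615-1846 range that the per-GB figures imply, provided the average really applies to every DIMM.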
 
Krogoth
Emperor Gerbilius I
Posts: 6049
Joined: Tue Apr 15, 2003 3:20 pm
Location: somewhere on Core Prime
Contact:

Re: ECC error logging

Fri Apr 06, 2012 4:43 am

I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.

IMO, ECC support will start to become more important as memory capacities in desktop systems start to go into tens of GiBs.
Gigabyte X670 AORUS-ELITE AX, Raphael 7950X, 2x16GiB of G.Skill TRIDENT DDR5-5600, Sapphire RX 6900XT, Seasonic GX-850 and Fractal Define 7 (W)
Ivy Bridge 3570K, 2x4GiB of G.Skill RIPSAW DDR3-1600, Gigabyte Z77X-UD3H, Corsair CX-750M V2, and PC-7B
 
ptsant
Gerbil XP
Topic Author
Posts: 397
Joined: Mon Oct 05, 2009 12:45 pm

Re: ECC error logging

Fri Apr 06, 2012 5:54 am

Krogoth wrote:
I wouldn't be surprised if the desktop versions of Windows lack any built-in ECC monitoring tools, since ECC memory is typically found in server-level equipment.


It certainly isn't easy to find out what's happening, which is why I asked. Even under Linux, this functionality, although present, is not really advertised anywhere! According to the sources cited previously, ECC support IS present under Windows Vista and 7 and is on by default. Furthermore, predictive failure analysis (PFA) will report recurrent errors after a certain threshold. As you say, monitoring tools (other than the Event Viewer) are lacking... Well, that's how server manufacturers make a living, I guess.
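On the Linux side, the EDAC subsystem exposes per-memory-controller counters in sysfs once a driver for the memory controller is loaded (for a Phenom II that would presumably be `amd64_edac`, though whether it binds on a given board is worth checking). A minimal Python sketch that reads the `ce_count` (corrected) and `ue_count` (uncorrected) files might look like this. The function name is hypothetical, and the sketch returns an empty result when the EDAC directory is missing.

```python
from pathlib import Path


def edac_counts(root: str = "/sys/devices/system/edac/mc") -> dict:
    """Return {controller: (corrected, uncorrected)} from Linux EDAC sysfs.

    Each memory controller directory (mc0, mc1, ...) is expected to hold
    ce_count and ue_count files. An empty dict means no EDAC driver is
    loaded, or this is not Linux.
    """
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        ce_file = mc / "ce_count"
        ue_file = mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            # int() tolerates the trailing newline sysfs files end with.
            counts[mc.name] = (int(ce_file.read_text()), int(ue_file.read_text()))
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in edac_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

Polling these counters periodically (say, from cron) would give the per-week or per-hour rate asked about earlier in the thread, independent of any reporting threshold.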
 
Glorious
Gerbilus Supremus
Posts: 12343
Joined: Tue Aug 27, 2002 6:35 pm

Re: ECC error logging

Mon Nov 11, 2013 10:47 am

ptsant wrote:
here they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones) are at least 1000 times less frequent, so I don't expect many to occur.


???

The abstract plainly says: "...and more than 8% of DIMMs affected by errors per year."

If only ~8% of DIMMs are affected by errors per year, how is what you are saying even remotely possible?

ptsant wrote:
It certainly isn't easy to find out what's happening,


Perhaps, but more likely you haven't actually encountered any errors yet and so there is nothing to report.
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: ECC error logging

Mon Nov 11, 2013 10:57 am

Krogoth wrote:
IMO, ECC support will start to become more important as memory capacities in desktop systems start to go into tens of GiBs.

We're already getting there. My new builds have 16GB.

Glorious wrote:
ptsant wrote:
here they estimate approximately 2000-6000 correctable errors per year per GB. With 16GB, I could be getting something like 1000 corrected errors per week. Uncorrectable errors (detected ones) are at least 1000 times less frequent, so I don't expect many to occur.


???

The abstract plainly says: "...and more than 8% of DIMMs affected by errors per year."

If only ~8% of DIMMs are affected by errors per year, how is what you are saying even remotely possible?

It is certainly possible if those 8% marginal DIMMs are getting lots of errors!
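That point can be made concrete with a little arithmetic: if only a fraction of DIMMs see any errors, those DIMMs must carry the whole average between them. The Python sketch below assumes, purely for illustration, that the 2000-6000 errors per GB per year quoted earlier are fleet-wide averages and that each DIMM is 4GB as in the original post; the function name is made up.

```python
def mean_errors_on_affected_dimms(fleet_mean: float, affected_fraction: float) -> float:
    """If errors land only on `affected_fraction` of DIMMs, the affected
    ones average fleet_mean / affected_fraction errors each."""
    if not 0.0 < affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be in (0, 1]")
    return fleet_mean / affected_fraction


DIMM_GB = 4  # 4x4GB, as in the original post
for per_gb_year in (2000, 6000):
    per_dimm = per_gb_year * DIMM_GB  # fleet-average errors per DIMM per year
    print(per_dimm, round(mean_errors_on_affected_dimms(per_dimm, 0.08)))
# prints "8000 100000", then "24000 300000"
```

Under those assumptions, the roughly 8% of DIMMs that do see errors would average on the order of 100,000-300,000 corrected errors a year each, while the rest see none, so a per-GB average says little about what a single healthy DIMM will log.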
Nostalgia isn't what it used to be.
 
Glorious
Gerbilus Supremus
Posts: 12343
Joined: Tue Aug 27, 2002 6:35 pm

Re: ECC error logging

Mon Nov 11, 2013 1:15 pm

JBI wrote:
It is certainly possible if those 8% marginal DIMMs are getting lots of errors!


Sure, but I don't think that's what he was talking about.

ptsant wrote:
and I'm looking for a way to justify my purchasing decision. If I don't get any (or very, very few) corrected errors, I could have gotten away with non-ECC RAM, which is cheaper and faster...


Far be it from me to discourage anyone from ECC, and I personally think it's a darn dirty shame that it is somehow seen as a server-only "feature" these days, but odds are that ptsant won't see any corrected errors in a year.
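The "odds are" can be roughed out from the abstract's figure. This Python sketch assumes (an assumption, not something the paper states for desktop parts) that each of the four DIMMs is independently affected with probability 8% per year; the function name is made up.

```python
def p_no_dimm_affected(per_dimm_prob: float, n_dimms: int) -> float:
    """Probability that none of n DIMMs is affected in a year, assuming
    each is affected independently with probability per_dimm_prob."""
    return (1.0 - per_dimm_prob) ** n_dimms


p = p_no_dimm_affected(0.08, 4)
print(f"{p:.3f}")  # 0.716
```

Because the abstract says *more than* 8%, roughly 72% is an upper bound under this model, but a clean year is still the more likely outcome for a 4-DIMM system.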

EDIT: ACK! Spam Revival! :(
