ECC weirdness

Posted on Sat Nov 07, 2009 9:00 pm

I noticed last night that my syslog file was full of messages similar to this:
Nov 7 12:57:59 tripel kernel: [ 3950.210383] EDAC amd64 MC0: ExtErr=(0x8) F10-ECC/K8-Chipkill error
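To get a feel for how many of these messages are piling up, a small script can tally them per memory controller. This is only a sketch: the regular expression is modeled on the single log line quoted above, the second sample line is made up for illustration, and other kernels or EDAC drivers may format their messages differently.

```python
import re
from collections import Counter

# Pattern modeled on the one log line quoted in the post (an assumption:
# other kernel versions or EDAC drivers may use a different layout).
EDAC_LINE = re.compile(
    r"kernel: \[\s*[\d.]+\] EDAC (?P<driver>\S+) (?P<mc>MC\d+): (?P<msg>.*)$"
)

def count_edac_messages(lines):
    """Count EDAC messages per (driver, memory controller) pair."""
    counts = Counter()
    for line in lines:
        match = EDAC_LINE.search(line)
        if match:
            counts[(match.group("driver"), match.group("mc"))] += 1
    return counts

sample = [
    # Line copied from the post.
    "Nov 7 12:57:59 tripel kernel: [ 3950.210383] EDAC amd64 MC0: "
    "ExtErr=(0x8) F10-ECC/K8-Chipkill error",
    # Hypothetical unrelated line, included only to show it is skipped.
    "Nov 7 12:58:01 tripel kernel: [ 3952.000000] some unrelated message",
]

print(count_edac_messages(sample))
```

To run it against a real log, pass an open file object (for example the syslog file) instead of `sample`.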

My first thought was "Uh-oh... looks like a DIMM is starting to fail."

After running Memtest and much Googling, I've determined that the memory is most likely OK after all. The reason these messages started showing up just recently is a cooperative screwup (Asus and Ubuntu):

1. Ubuntu 8.10 apparently did not log ECC errors correctly (I just upgraded to the new 9.10).

2. The old BIOS in my Asus M3A78-CM (I think it may have still been running the original BIOS) was apparently causing spurious ECC errors to be reported.

So it looks like Asus screwed up, but their screwup was being masked by Ubuntu's screwup until I upgraded to 9.10. I've probably been getting spurious ECC exceptions all along, but didn't know it because Ubuntu wasn't logging them properly. :roll:

So I updated the BIOS... and as best I can tell, things are functioning properly now.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37520
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: ECC weirdness

Posted on Sun Nov 08, 2009 12:09 pm

There's also Bug 422536 where if you have an nVidia chipset and non-ECC memory it can claim a kernel error on every boot to scare you. I'm disappointed that this one is taking so long to turn off the spurious warning.
notfred
Grand Gerbil Poohbah
 
Posts: 3716
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: ECC weirdness

Posted on Sun Nov 08, 2009 12:34 pm

notfred wrote: There's also Bug 422536 where if you have an nVidia chipset and non-ECC memory it can claim a kernel error on every boot to scare you. I'm disappointed that this one is taking so long to turn off the spurious warning.

Yeah, I ran across that while Googling to figure out my own issue. Good to know, since I will probably be setting up an Opteron 165 on an nVidia-based motherboard with non-ECC memory in the near future (system built out of junkbox parts to use as a testbed for some stuff I want to mess around with).
just brew it!

Re: ECC weirdness

Posted on Sun Nov 08, 2009 7:34 pm

Turns out that the stock BIOS was also the cause of another issue. I had noticed that the clock on this system tended to lose a couple of minutes a day; this is outside the range that the Network Time Protocol daemon (ntpd) is able to correct cleanly without periodically stepping the time. (To get a clean "lock" where ntpd keeps the time continuously in sync, your clock needs to be off by less than 500 ppm, which is approximately 43 seconds/day.)
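The 500 ppm figure and the 43 seconds/day figure are the same limit in different units, and the conversion is easy to check. The sketch below also converts the observed drift back to ppm, assuming "a couple of minutes a day" means about 120 seconds/day (an assumed round number, not a measurement from the post).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def ppm_to_seconds_per_day(ppm):
    """Clock error accumulated over one day at a given rate error in ppm."""
    return ppm * 1e-6 * SECONDS_PER_DAY

def seconds_per_day_to_ppm(seconds):
    """Rate error in ppm for a clock that gains or loses `seconds` per day."""
    return seconds / SECONDS_PER_DAY * 1e6

# ntpd's limit: about 43.2 seconds/day.
print(ppm_to_seconds_per_day(500))

# Assumed drift of 120 seconds/day: about 1389 ppm, well past the limit.
print(seconds_per_day_to_ppm(120))
```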

I had dealt with this by installing an optional package called adjtimex, which allows the system clock to be tweaked manually; I told adjtimex to speed the clock up to get it within a few seconds/day, and ntpd took care of the rest.
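For anyone wanting to do the same, here is a rough sketch of how a measured drift could be split into the two values adjtimex works with, a coarse tick and a fine frequency offset. It assumes a kernel with USER_HZ=100 (nominal tick of 10000 microseconds, so one tick unit is 100 ppm) and frequency units of 65536 per ppm; check the adjtimex man page for your system before trusting these numbers. The 1389 ppm input is an assumed stand-in for "a couple of minutes a day", and `adjtimex_settings` is a hypothetical helper, not part of the package.

```python
NOMINAL_TICK = 10000        # microseconds per tick, assuming USER_HZ=100
PPM_PER_TICK = 100          # 1 us out of 10000 us is 100 ppm
FREQ_UNITS_PER_PPM = 65536  # assumed scaling of the fine frequency offset

def adjtimex_settings(drift_ppm):
    """Split a correction into (tick, frequency).

    drift_ppm is how much the clock must be sped up, in ppm
    (positive for a clock that is losing time).
    """
    tick_delta = round(drift_ppm / PPM_PER_TICK)          # coarse part
    residual_ppm = drift_ppm - tick_delta * PPM_PER_TICK  # what is left over
    frequency = round(residual_ppm * FREQ_UNITS_PER_PPM)  # fine part
    return NOMINAL_TICK + tick_delta, frequency

tick, frequency = adjtimex_settings(1389)
# These would be the values to pass to adjtimex's --tick and --frequency options.
print(tick, frequency)
```

The coarse tick does most of the work and the residual stays within 50 ppm, which leaves ntpd plenty of room to handle the remainder.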

After installing the BIOS update, the system clock apparently started running at the correct rate on its own. So my adjtimex hack was making it run about 2 minutes/day too fast! :roll:
just brew it!

Re: ECC weirdness

Posted on Wed Nov 18, 2009 9:56 am

just brew it! wrote: Turns out that the stock BIOS was also the cause of another issue. I had noticed that the clock on this system tended to lose a couple of minutes a day; this is outside the range that the Network Time Protocol daemon (ntpd) is able to correct cleanly without periodically stepping the time. (To get a clean "lock" where ntpd keeps the time continuously in sync, your clock needs to be off by less than 500 ppm, which is approximately 43 seconds/day.)

I had dealt with this by installing an optional package called adjtimex, which allows the system clock to be tweaked manually; I told adjtimex to speed the clock up to get it within a few seconds/day, and ntpd took care of the rest.

After installing the BIOS update, the system clock apparently started running at the correct rate on its own. So my adjtimex hack was making it run about 2 minutes/day too fast! :roll:


On Gentoo, there's an ntp-client init script that runs before ntpd is started. ntp-client first sets the clock to the proper time no matter how far off the system clock is from the Internet clock. Are you sure that Ubuntu doesn't have a similar script?
titan
Grand Gerbil Poohbah
 
Posts: 3276
Joined: Mon Feb 18, 2002 7:00 pm
Location: Great Smoky Mountains

Re: ECC weirdness

Posted on Wed Nov 18, 2009 10:29 am

titan wrote: On Gentoo, there's an ntp-client init script that runs before ntpd is started. ntp-client first sets the clock to the proper time no matter how far off the system clock is from the Internet clock. Are you sure that Ubuntu doesn't have a similar script?

The problem wasn't that the clock was incorrect on startup. The problem was that it was drifting so fast after startup that ntpd was refusing to deal with it. Apparently there's a hard limit of 500 ppm in ntpd; if your clock is running fast or slow by more than that, it basically throws up its hands in disgust and says "your clock is hosed".
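The difference between a startup offset and a drift rate can be made concrete with two offset readings taken some time apart. This is only an illustration: the readings are invented, and the 500 ppm constant is the limit quoted in this thread rather than something read out of ntpd itself.

```python
NTPD_MAX_DRIFT_PPM = 500  # limit as quoted in this thread

def drift_ppm(offset_start_s, offset_end_s, elapsed_s):
    """Rate error in ppm from two clock offsets (seconds) taken elapsed_s apart."""
    return (offset_end_s - offset_start_s) / elapsed_s * 1e6

def within_ntpd_limit(ppm):
    """True if the rate error is small enough for ntpd to discipline the clock."""
    return abs(ppm) <= NTPD_MAX_DRIFT_PPM

# Hypothetical case 1: clock is 300 s off at boot and still 300 s off an hour
# later. Large offset, zero drift: a one-time step at startup fixes it.
print(drift_ppm(300.0, 300.0, 3600.0), within_ntpd_limit(drift_ppm(300.0, 300.0, 3600.0)))

# Hypothetical case 2: clock starts correct but loses 5 s in an hour
# (about 2 minutes/day). No startup script helps with this.
print(drift_ppm(0.0, -5.0, 3600.0), within_ntpd_limit(drift_ppm(0.0, -5.0, 3600.0)))
```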
just brew it!

