shizuka wrote:just brew it! wrote:@shizuka - How'd you determine that from the info Kougar posted?
The event log has everything needed to decode it:
status is the large, 64-bit register that describes the fault
mcibank is a value that determines which fault bucket it belongs to... it was 0, which implicates the instruction fetch / L1D cache
I ran the values through the Intel MCE decoder. Unfortunately, it's under NDA.
Recommendation is to RMA the CPU. You should not be getting _any_ WHEA errors at spec.
Thank you for crunching the info!! I sincerely do appreciate it
shizuka wrote:Please note that if you are getting it on different cores, then it could be a platform (i.e. voltage delivery) issue. If it always happens on core 4 (apicid 4) then you have a bum chip.
This is the crux of my problem. The WHEA parity errors were indeed on different core IDs, which only increases my suspicion about the motherboard. Gigabyte's US support page wasn't accessible for most of yesterday, but I was finally able to create a support ticket last night. Will see what they recommend.
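shizuka's rule of thumb can be written down as a quick tally. This is only a minimal sketch, assuming the ApicId values have already been copied out of the WHEA events by hand; the function name and the sample lists are made up for illustration and are not my actual log.

```python
from collections import Counter


def classify_whea_sources(apic_ids):
    """Apply the rule of thumb: one core -> suspect the chip, several -> suspect the platform."""
    counts = Counter(apic_ids)
    if not counts:
        return "no events"
    if len(counts) == 1:
        # Every error came from the same core.
        return "single core: suspect the CPU"
    # Errors are spread over more than one core.
    return "multiple cores: suspect the platform"


if __name__ == "__main__":
    # Hypothetical ApicId lists, for illustration only.
    print(classify_whea_sources([4, 4, 4]))
    print(classify_whea_sources([4, 3]))
```

It is a heuristic, not a diagnosis: a handful of events is a small sample either way.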
just brew it! wrote:It *is* an error-free product from the user's perspective. The error is getting caught and corrected before it affects operation of the system. You don't return ECC DIMMs just because you are getting occasional corrected DRAM errors; that's what ECC is supposed to do -- detect and correct errors (which are *expected* to happen at some low rate).
Except that DRAM naturally does occasionally have errors. It's natural and expected, and it's why ECC RAM exists. The same can't be said for a processor, so I don't think it's a valid comparison! Given that I plan to keep the same processor for 5+ years, I want to be sure it isn't going to deteriorate or start throwing random BSoDs.
Glorious wrote:In fact, the only thing I *CAN* find indicates that bank 0 corresponds to QPI: section 16.3.1 on the Intel document.
Dunno, most of the debugging is completely over my head. But I did find this in the Intel MCE PDF: "Most MCE registers are core-specific, that is, each core has its own set of control, status, and address registers. However, in newer processor families such as Nehalem, new banks of registers have been added to the architecture to address package-level error information. For example, in Nehalem processor families, bank 0, 1, 6, 7 are per-package and introduced to address QPI, integrated memory and graphics." Not entirely sure that's applicable to your comment but figured I'd toss it out there in case it was.
For anyone who wants to play with it, here is the other WHEA error. It matches the one in the first post except that the Processor ID (ApicId) is 3.
- System
- Provider
[ Name] Microsoft-Windows-WHEA-Logger
[ Guid] {C26C4F3C-3F66-4E99-8F8A-39405CFED220}
EventID 19
Version 0
Level 3
Task 0
Opcode 0
Keywords 0x8000000000000000
- TimeCreated
[ SystemTime] 2014-02-02T06:54:09.206630500Z
EventRecordID 314144
- Correlation
[ ActivityID] {4C75CD32-FD52-43A8-A065-BCAF2D72DC39}
- Execution
[ ProcessID] 1732
[ ThreadID] 3140
Channel System
Computer Kougar-PC
- Security
[ UserID] S-1-5-19
- EventData
ErrorSource 1
ApicId 3
MCABank 0
MciStat 0x90000040000f0005
MciAddr 0x0
MciMisc 0x0
ErrorType 12
TransactionType 256
Participation 256
RequestType 256
MemorIO 256
MemHierarchyLvl 256
Timeout 256
OperationType 256
Channel 256
Length 864
RawData 435045521002FFFFFFFF03000200000002000000600300000836060002020E140000000000000000000000000000000000000000000000000000000000000000BDC407CF89B7184EB3C41F732CB57131B18BCE2DD7BD0E45B9AD9CF4EBD4F89008C0F2ED7F18CF0100000000000000000000000000000000000000000000000058010000C00000000102000001000000ADCC7698B447DB4BB65E16F193C4F3DB0000000000000000000000000000000002000000000000000000000000000000000000000000000018020000400000000102000000000000B0A03EDC44A19747B95B53FA242B6E1D0000000000000000000000000000000002000000000000000000000000000000000000000000000058020000080100000102000000000000011D1E8AF94257459C33565E5CC3F7E80000000000000000000000000000000002000000000000000000000000000000000000000000000057010000000000000002080000000000C30603000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000300000000000000000000000000000000000000000000000000000000000000000000000000000003000000000000000300000000000000C306030000081003FFFBFA7FFFFBEBBF00000000000000000000000000000000000000000000000000000000000000000100000001000000C8621492E31FCF0103000000000000000000000000000000000000000000000005000F0040000090000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
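For anyone who wants to pick the MciStat value apart without the NDA'd decoder, here is a rough sketch. It assumes the architectural IA32_MCi_STATUS bit layout from the public Intel SDM (machine-check architecture chapter); the function names are my own, the error-code table is only a partial subset of the simple codes, and it does not interpret the model-specific bits or what bank 0 means on this particular CPU.

```python
# Partial table of "simple" MCA error codes (bits 15:0). Anything not
# listed here is reported as compound or unlisted.
SIMPLE_MCACOD = {
    0x0000: "No error",
    0x0001: "Unclassified",
    0x0002: "Microcode ROM parity error",
    0x0003: "External error",
    0x0004: "FRC error",
    0x0005: "Internal parity error",
}


def bit(value, n):
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    mcacod = status & 0xFFFF
    return {
        "valid": bit(status, 63),            # VAL
        "overflow": bit(status, 62),         # OVER
        "uncorrected": bit(status, 61),      # UC
        "enabled": bit(status, 60),          # EN
        "misc_valid": bit(status, 59),       # MISCV
        "addr_valid": bit(status, 58),       # ADDRV
        "context_corrupt": bit(status, 57),  # PCC
        # Bits 52:38; only meaningful on CPUs that report corrected counts.
        "corrected_count": (status >> 38) & 0x7FFF,
        "mscod": (status >> 16) & 0xFFFF,    # model-specific error code
        "mcacod": mcacod,                    # architectural error code
        "mcacod_text": SIMPLE_MCACOD.get(mcacod, "compound or unlisted code"),
    }


def status_from_raw(hex_bytes):
    """Convert 8 little-endian bytes, as they appear in RawData, to an integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


if __name__ == "__main__":
    status = 0x90000040000F0005  # MciStat from the event above
    for name, value in decode_mci_status(status).items():
        print(f"{name:16} {value}")
    # The same register shows up byte-swapped inside RawData.
    print(hex(status_from_raw("05000F0040000090")))
```

Under those assumptions the value decodes as valid and enabled, with UC clear (a corrected error), no valid address or misc data (which fits MciAddr and MciMisc both being 0x0), a corrected count of 1, and an MCA error code of 0x0005, the internal parity error. The byte string 05000F0040000090 inside RawData is the same register stored little-endian.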