Kougar wrote:The 4770K was already doing the WHEA error and occasional BSoD thing, which is why I wrongly concluded the chip had degraded from moderate overclocking. Now that I'm seeing similar issues with the 4771, I'm sure one (if not both) processors are actually fine. I'm running a triple 140mm radiator setup and even with a 480 in the loop, combined load temps are around 60°C.
Noticed something odd about GB's Easytune software: by default it was screwing around with MOSFET/phase settings, current protection thresholds, and even loadline calibration. Changed it all to standard. Call it a hunch, but I'm going to see if that stops the errors before I do anything else. Words can't describe my opinion of Gigabyte's software right now though... no reason at all it should be modifying UEFI settings by default.
The Egg wrote:Ugh.....I hate all that automatic crap. One of the first things I do on a new system is double and triple check that all of that stuff is turned off. I don't want anything being done automatically (and potentially causing errors and instability) without my knowledge.
Kougar wrote:After two weeks of no errors showing up I suddenly received 12 WHEA corrected errors within a single hour. Guess it's time to try a new OS.
Kougar wrote:Been two weeks and so far while Metro is still highly irritating, I've not seen a single BSoD or WHEA notification. Too early to say with 100% certainty but sure looks promising.
Kougar wrote:You made me think of something... is a guest OS still capable of receiving WHEA notifications from the CPU??
Glorious wrote:Kougar wrote:You made me think of something... is a guest OS still capable of receiving WHEA notifications from the CPU??
I don't see how it could. The errors you were seeing come from MSRs (model-specific registers). There is no reason why the virtual machine would duplicate their values, and depending on how the hypervisor presents the virtualized hardware to guest OSes, it might not implement those specific registers at all.
As you noted, some MSRs are clearly virtualized already: the ones that report virtualization support are set to "off".
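For anyone wondering what "coming from MSRs" looks like in practice, here is a rough Python sketch that decodes a raw machine-check status value (IA32_MCi_STATUS) into its main flags. The bit positions follow the architectural layout in Intel's SDM as I understand it; the names `MciStatus`, `decode_mci_status`, and `is_corrected_error` are made up for illustration, and the sample value is invented rather than taken from Kougar's logs.

```python
from dataclasses import dataclass


@dataclass
class MciStatus:
    valid: bool                # bit 63: register holds a logged error
    overflow: bool             # bit 62: an earlier error was overwritten
    uncorrected: bool          # bit 61: hardware could not correct the error
    enabled: bool              # bit 60: reporting was enabled for this error
    misc_valid: bool           # bit 59: the MISC register has extra info
    addr_valid: bool           # bit 58: the ADDR register has an address
    context_corrupt: bool      # bit 57: processor context may be corrupted
    mca_error_code: int        # bits 15:0, architectural error code
    model_specific_code: int   # bits 31:16, model-specific error code


def decode_mci_status(raw: int) -> MciStatus:
    """Split a 64-bit status value into its architectural fields."""
    def bit(n: int) -> bool:
        return bool((raw >> n) & 1)

    return MciStatus(
        valid=bit(63),
        overflow=bit(62),
        uncorrected=bit(61),
        enabled=bit(60),
        misc_valid=bit(59),
        addr_valid=bit(58),
        context_corrupt=bit(57),
        mca_error_code=raw & 0xFFFF,
        model_specific_code=(raw >> 16) & 0xFFFF,
    )


def is_corrected_error(status: MciStatus) -> bool:
    """A logged error that the hardware fixed on its own."""
    return status.valid and not status.uncorrected


# Invented sample: VAL and EN set, UC clear, arbitrary error code 0x0135.
sample = (1 << 63) | (1 << 60) | 0x0135
decoded = decode_mci_status(sample)
print(decoded, is_corrected_error(decoded))
```

Whether a guest ever sees values like this depends entirely on what the hypervisor chooses to expose, which is the point above.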
JBI wrote:...but we're talking about WHEA notifications in the *host* OS here, right? Or did I miss something, and is Win8 running as a guest?
just brew it! wrote:If you're still getting hardware errors there should be a way to see those through some sort of Hyper-V management interface, I would think.
Kougar wrote:just brew it! wrote:Sounds like an error in one of the internal caches to me. It can't be a DRAM error since according to the specs for that CPU, it does not support ECC RAM; so it would have no way of even knowing that a DRAM error occurred, let alone correcting it.
Quit being so logical!
But that still puzzles me, because I've seen this with two processors. I'd just figured the first proc had OC issues. Could the motherboard have defective power regulation on the uncore power plane? I find it extremely unlikely that two Haswell procs would both be "bad" with the SAME cache parity WHEA errors.
Kougar wrote:I was so afraid of that. So there's literally no way to check then either, because with the Hyper-V hypervisor in place I'd have to boot to a different disk & OS entirely right?
sluggo wrote:Sorry, but if Intel went to the trouble of implementing error detection and correction in the cache, why would you think it extremely unlikely that there would be errors in the cache? They didn't implement it for no reason. They know that the cache has to operate close to the edge of non-functionality in order to do its job.
Be thankful the EDAC is there. Consider this analogous to a "soft" (correctable) error off a hard drive, which happens often enough.
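To make the "correctable error" idea concrete, here is a toy Python sketch of a textbook Hamming(7,4) code: three parity bits protect four data bits, and a single flipped bit can be located and fixed. This is only an illustration of the general principle. It is an assumption-free classroom code, not whatever scheme Intel actually uses in its caches, and the function names `encode` and `decode` are just for this example.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword: List[int]) -> Tuple[List[int], int]:
    """Return (data bits, syndrome). Syndrome 0 means no error was seen;
    otherwise it is the 1-based position of the bit that was corrected."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1   # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = encode([1, 0, 1, 1])
word[4] ^= 1                     # simulate a single-bit upset at position 5
data, syndrome = decode(word)
print(data, syndrome)
```

The data comes back intact and the non-zero syndrome is the part that would get logged, which is roughly the role a corrected-error notification plays.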
Glorious wrote:As JBI says though, you'd think there'd be some way the hypervisor would catch/report them....