 
Noinoi
Gerbil Team Leader
Topic Author
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 12:26 pm

OCCT PSU testing was causing errors. "Hmm... strange." (Note that the system passes both OCCT CPU and OCCT CPU with Linpack cleanly.)

Neither IntelBurnTest Standard nor the Intel XTU stress test reveals anything wrong. "Hm... not the PSU or CPU?"

Ran Memtest86 on the desktop, and...

Image

I guess I need to get a warranty replacement for the RAM, or could it still be something else?

I really do have pot luck with my desktop's parts, don't I? What's next, the HDD? Or the PSU?
[email protected] | Patriot 2x16GB | Asus GTX 970 | Aorus Z390 Pro Wifi | Intel 660p 512GB + Kingston Fury 240GB + 2x4TB WD HDDs | Win 10
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 1:10 pm

As a guess, OCCT can't tell the difference between bits getting flipped by general system stability issues caused by flaky power, and bits being flipped by inherently bad RAM.

Is the RAM running at stock speed? If it is overclocked it might not be bad; it might just be an unstable overclock.
Nostalgia isn't what it used to be.
 
Noinoi
Gerbil Team Leader
Topic Author
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 10:54 pm

just brew it! wrote:
As a guess, OCCT can't tell the difference between bits getting flipped by general system stability issues caused by flaky power, and bits being flipped by inherently bad RAM.

Is the RAM running at stock speed? If it is overclocked it might not be bad; it might just be an unstable overclock.

Depends on what definition of stock speed we're using; the RAM was running at its rated XMP profile when the problems were found.
 
UberGerbil
Grand Admiral Gerbil
Posts: 10368
Joined: Thu Jun 19, 2003 3:11 pm

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 11:24 pm

Noinoi wrote:
just brew it! wrote:
As a guess, OCCT can't tell the difference between bits getting flipped by general system stability issues caused by flaky power, and bits being flipped by inherently bad RAM.

Is the RAM running at stock speed? If it is overclocked it might not be bad; it might just be an unstable overclock.

Depends on what definition of stock speed we're using; the RAM was running at its rated XMP profile when the problems were found.

The RAM is running at its rated profile, but the memory controller in the CPU is being overclocked (the i5-4590 is only spec'd for DDR3-1600), so the problem may not be your RAM but your CPU. That said, 800->933 is a ~17% overclock, which isn't crazy.
Of course, testing with the memory at 800MHz will resolve the question one way or the other.
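
For anyone following the arithmetic, here is a minimal Python sketch of the 800 -> 933 figure. It assumes the usual DDR convention that the I/O clock is half the rated transfer rate (so DDR3-1600 is 800 MHz and DDR3-1866 is roughly 933 MHz); the function names are made up for illustration.

```python
def ddr_clock_mhz(transfer_rate_mts: float) -> float:
    """DDR moves data twice per clock, so the I/O clock is half the MT/s rating."""
    return transfer_rate_mts / 2.0


def overclock_percent(stock_mhz: float, actual_mhz: float) -> float:
    """How far above the stock clock the actual clock is, in percent."""
    return (actual_mhz - stock_mhz) / stock_mhz * 100.0


if __name__ == "__main__":
    stock = ddr_clock_mhz(1600)  # what the CPU's memory controller is spec'd for
    xmp = ddr_clock_mhz(1866)    # what the XMP profile asks for
    print(f"{stock:.0f} MHz -> {xmp:.0f} MHz: "
          f"{overclock_percent(stock, xmp):.1f}% over spec")
```

The same helper can be pointed at any other XMP speed to see how far past the controller's rated speed it would run.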
 
Noinoi
Gerbil Team Leader
Topic Author
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 11:31 pm

UberGerbil wrote:
Noinoi wrote:
just brew it! wrote:
As a guess, OCCT can't tell the difference between bits getting flipped by general system stability issues caused by flaky power, and bits being flipped by inherently bad RAM.

Is the RAM running at stock speed? If it is overclocked it might not be bad; it might just be an unstable overclock.

Depends on what definition of stock speed we're using; the RAM was running at its rated XMP profile when the problems were found.

The RAM is running at its rated profile, but the memory controller in the CPU is being overclocked (the i5-4590 is only spec'd for DDR3-1600), so the problem may not be your RAM but your CPU. That said, 800->933 is a ~17% overclock, which isn't crazy.
Of course, testing with the memory at 800MHz will resolve the question one way or the other.

That might help, but the very same CPU + RAM configuration passed cleanly in all stress testing in the past. Hmm...
 
biffzinker
Gerbil Jedi
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Sun Aug 28, 2016 11:46 pm

Noinoi wrote:
UberGerbil wrote:
Noinoi wrote:
Depends on what definition of stock speed we're using; the RAM was running at its rated XMP profile when the problems were found.

The RAM is running at its rated profile, but the memory controller in the CPU is being overclocked (the i5-4590 is only spec'd for DDR3-1600), so the problem may not be your RAM but your CPU. That said, 800->933 is a ~17% overclock, which isn't crazy.
Of course, testing with the memory at 800MHz will resolve the question one way or the other.

That might help, but the very same CPU + RAM configuration passed cleanly in all stress testing in the past. Hmm...

Would a small nudge on voltage, say 1.5 + 0.005 = 1.505 V, stabilize the RAM and let it pass with no errors?
Sticks in my sig are Crucial Ballistix DDR3-1866.
It would take you 2,363 continuous hours or 98 days,11 hours, and 35 minutes of gameplay to complete your Steam library.
In this time you could travel to Venus one time.
 
Noinoi
Gerbil Team Leader
Topic Author
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Mon Aug 29, 2016 12:47 am

Either way, I'm doing the tests again, albeit with the Rowhammer test skipped as it's not relevant for my needs right now, first with XMP, and then at stock speeds, just to be sure it wasn't a fluke. If the system appears to be stable with the RAM at 1600, I think I'll rerun the OCCT PSU test and see if it still causes errors there. If it errors out there for some reason, would that mean I have something nasty going on?

Update: OK, weird, Memtest86 passed clean again when I redid it with the XMP profile in effect. Now redoing the PSU test, but with the video card set to its factory profile (it would be odd, though, if the GPU overclock were causing issues; the test didn't lock up the rendering or freeze, and it appears to just error out when the lower-stress phase of the CPU loading happens). Again, using auto fans, to see if it's a matter of case temperatures, the CPU, or the RAM...

Update 2: OCCT PSU test fails again about 1 hour and 12 minutes in. CPU temperatures are in check at 74C (remember, auto fans and no AC); VRM and PCH temps are at 67 and 66C. If these VRM and PCH temps are acceptable, and the CPU as well as the GPU are running at stock, perhaps I should try again with the memory running at DDR3-1600. Again, the video card never locked up.

I think I might also want to buy a Celeron or Pentium, as well as a 4GB/8GB stick of DDR3-1600 to narrow down the possible faulting part. Should I?

Update 3: OCCT PSU test passes cleanly in 2 hours at DDR3-1600 CL11, normal JEDEC specs (not XMP 1600); I think I might run the PSU test overnight just to be veeeeery sure that it's not the memory or something. Might give it at least 8 hours. (Should I also test XMP 1600 at CL9?)

Still can't tell whether the CPU's memory controller or the RAM itself is the faulting part, though.

Might also try out the RAM at XMP 1866 again, but maybe with slight to moderate overvolting; probably at both 1.51V and at 1.65V. The voltage steps aren't fine enough to let me set 1.505V. I'll run both for 2 hours first before doing the longer tests at 1600.
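
As a rough sanity check on how big those two voltage steps are, a small Python sketch follows. It assumes the 1.5 V baseline mentioned in this thread and uses the textbook approximation that dynamic power scales with voltage squared at a fixed frequency; that is a rule of thumb, not a measurement of these sticks, and the helper names are hypothetical.

```python
BASE_VOLTS = 1.5  # assumed baseline: the DRAM voltage discussed in this thread


def percent_increase(base: float, new: float) -> float:
    """Relative increase of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def relative_dynamic_power(base_v: float, new_v: float) -> float:
    """Rule-of-thumb ratio: dynamic power ~ V^2 when frequency is unchanged."""
    return (new_v / base_v) ** 2


if __name__ == "__main__":
    for volts in (1.51, 1.65):
        bump = percent_increase(BASE_VOLTS, volts)
        power = relative_dynamic_power(BASE_VOLTS, volts)
        print(f"{volts:.2f} V: +{bump:.1f}% voltage, "
              f"about {power:.2f}x dynamic power (estimate only)")
```

By this estimate the 1.51 V step is well under a 1% voltage change, while 1.65 V is a 10% change, so the two tests probe very different amounts of margin.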
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Mon Aug 29, 2016 6:15 am

Sounds like the RAM subsystem is right on the hairy edge of stability with the XMP profile. As you've already acknowledged, there's really no way to tell based on the info you have whether this is due to the RAM or the CPU (or even both... maybe you've got a "perfect storm" where both are just a little sub-par).

Your options at this point are to start replacing things to see what happens, or just back off and run the RAM at stock. I'd be inclined to opt for the latter, but it's your time and money.
Nostalgia isn't what it used to be.
 
Noinoi
Gerbil Team Leader
Topic Author
Posts: 280
Joined: Fri Jun 26, 2015 11:31 pm
Location: Sabah, Malaysia

Re: Errors during OCCT PSU testing seemingly caused by faulty RAM?

Mon Aug 29, 2016 6:52 am

just brew it! wrote:
Sounds like the RAM subsystem is right on the hairy edge of stability with the XMP profile. As you've already acknowledged, there's really no way to tell based on the info you have whether this is due to the RAM or the CPU (or even both... maybe you've got a "perfect storm" where both are just a little sub-par).

Your options at this point are to start replacing things to see what happens, or just back off and run the RAM at stock. I'd be inclined to opt for the latter, but it's your time and money.

Mmm hmm.
Right now I'm running the PSU test, but with the memory very slightly overvolted to 1.51V while at 1866. If this is stable... then I don't know what the heck is going on with my desktop. I don't have a second CPU or a second set of dual-channel DDR3-1866 or better RAM, so even if I do find errors, there's still the off-chance that the CPU's memory controller is the faulting part (or both).
Long-term stability testing is also needed. I think I'll be able to arrive at a conclusion tomorrow. 8 hours of OCCT or more while the RAM is set to stock speeds is probably more than enough - I think I'll try to aim for 12 hours.

Update: PSU test passes cleanly after 2 hours, with the RAM set to 1.51 V instead of the usual 1.5 V. Now the next step: should I do long-term stability testing at 1.51V with XMP 1866 speeds, or do it when the memory is at JEDEC stock speeds and latencies?

Update 2: Long-term stability testing with the RAM running in XMP 1866 mode, but with the voltage at 1.51V. Weird. So now I have a dual-channel kit that refuses to run properly at its rated voltage.

Now I have a few options.
1. Run the RAM at stock speeds.
2. Run the RAM at 1.51V and suffer from tripled idle power and doubled access power.
3. Attempt to overclock the RAM by overvolting it to 1.65V and then playing with frequencies and timings, throwing power consumption to the wind in the process.
4. RMA the pair and be stuck with no desktop to use unless I have some temporary DDR3 to use in the meantime.

Oh well.

Update 3: PSU test failed again at XMP 1866 at the exact same point in the test. Hmm... Anyway, this time, I've opened up the case, taken the RAM and graphics card out, disconnected the PCIe plugs, and cleaned the gold contacts on the RAM and the graphics card. They looked slightly dull, and someone elsewhere mentioned that I should clean them just to be very sure that it's not just a contact issue. Now that everything has been reinstalled, I'm going to run the PSU test again, but over a longer period of time. Hopefully any instability will be rooted out; I'm actually pretty confident that it's probably the RAM, in one way or another.

Also to note: when I removed the RAM sticks rather quickly, their heat spreaders were very warm to the touch. Almost uncomfortable, really. Is this normal?

(And I also learned that the video card will throw up a helpful message on POST if the video card isn't getting its PCIe plugs connected. Shame on me.)

Update 4: OK, not a RAM contact issue, and not an ambient temperature issue (turned on AC today) - PSU test failed in 35 minutes.

I'm 99% confident the RAM is the problem, the problem being that the RAM probably just got... bad enough that it can't maintain its XMP profile speeds at the specified 1.5V. I think I'll try to get the RAM replaced.

Update 5: Forcing all of the system's fans to 100% seems to fix the problem, too... but that's still preliminary. I need to do it again for at least 8 hours, with the AC turned off, to rule out or confirm insufficient case airflow as the cause. The system is currently configured to have two front intakes and the CPU radiator exhausting to the back.
