 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Fri Dec 11, 2015 1:04 am

Update:

After I removed the NZXT fan controller and temperature probe, I plugged all the fans into the motherboard, booted, went into the BIOS and turned on monitoring for those newly populated fan headers. Then I decided to test the fans before replacing the PSU.

I started Windows, then played around a bit in the Asus "Fan Xpert" tool, and then proceeded to reboot several times; some warm, some cold.

The system boots normally now. I can't seem to reproduce the cold boot beeping problem, and it still has the old PSU installed.

That fan controller device only had one power lead coming from one of the PSU cables for hard drives and such. The fan controller was not connected to the system in any other way such as a USB cable, and quite frankly I never suspected this device. I had been wanting to remove it for a while now because with a card taking up a rear slot, and with 5 fan wires and 5 temperature probe wires, it was cluttering up the case more than necessary. I suppose there is a possibility that the fan controller was not the root cause of my troubles, but it is the only significant change I made today.

I'm going to continue to experiment with the system for a few more days to rule out any flukes, then I'll probably swap in the new PSU.

By the way, the Thermaltake tester I bought (in the same order as the PSU) draws faults on the new PSU. Everything faults, the motherboard, GPU, CPU, and hard drive power connectors. Right now I suppose it's just as likely that the tester is faulty and that the new PSU (EVGA) is in reality a good unit. So in a few days I'll test the tester on the old PSU and then I'll swap PSUs anyway as mentioned in the paragraph above.

More to come...
 
Geonerd
Gerbil First Class
Posts: 163
Joined: Mon Dec 19, 2011 2:29 pm
Location: Sunny Aridzona

Re: Cold Boot Issues (Topic Renamed)

Fri Dec 11, 2015 1:49 am

How many DIMMs in the system?

I've just recently solved a somewhat similar 'mysterious' cold-boot issue.
Turns out the memory was cold, and that was causing sporadic memory integrity issues. A cracked solder ball/joint is almost certainly running amok.
Once warmed up, the computer ran flawlessly.

Bad memory may well cause the other symptoms you've experienced.

Pull all but one DIMM and test. Repeat with all DIMMS until you find the bad one.

You might want to DL the "Ultimate boot disk" ISO and burn a CD/DVD. It includes Memtest, assorted stability stress tests, etc. Better that than crashing or bit-rotting your Windoze installation.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Fri Dec 11, 2015 10:23 am

Nothing so far has indicated a memory problem in this case, and cold boots were done with warm components, about a week ago. Some would fail, but swapping power cables or plugging into a different outlet would allow the system to boot without enough time to cause components to cool down in any significant way. Plus, there was one night when I went to bed after shutting off the system. Cold boot in the morning failed anyway. So it has been successful AND unsuccessful both ways.

No, this particular problem does not (yet) seem to point to a memory issue or a temperature issue. If it does even in the slightest, I will run memtest first, before pulling any DIMMs (I have 8).
 
Geonerd
Gerbil First Class
Posts: 163
Joined: Mon Dec 19, 2011 2:29 pm
Location: Sunny Aridzona

Re: Cold Boot Issues (Topic Renamed)

Fri Dec 11, 2015 12:06 pm

BIF wrote:
Nothing so far has indicated a memory problem in this case, and cold boots were done with warm components, about a week ago. Some would fail, but swapping power cables or plugging into a different outlet would allow the system to boot without enough time to cause components to cool down in any significant way. Plus, there was one night when I went to bed after shutting off the system. Cold boot in the morning failed anyway. So it has been successful AND unsuccessful both ways.

No, this particular problem does not (yet) seem to point to a memory issue or a temperature issue. If it does even in the slightest, I will run memtest first, before pulling any DIMMs (I have 8).


You keep saying that, but...
Have you run Memtest at all? It's stupidly easy to do.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Sun Mar 13, 2016 11:00 pm

Geonerd wrote:
Have you run Memtest at all? It's stupidly easy to do.


Yes it is, and that is on the list of things to do, if I can get a good boot. More on that later in this post.

First an update:

Back in December I bought a new power supply. It seemed to work okay; got a few clean boots from both "restart" (warm) and "shutdown" (cold), and with both warm and cold (temperature wise) components. So I put the system back into service. Sometimes it doesn't get cold booted for months at a time, so I was happily plugging away until...

... a couple weeks ago I had to do a cold boot and got the beeps again. Couldn't get a clean boot to save my life until this weekend, but as luck would have it, I didn't have a MEMTEST CD or thumbdrive installed. And now once again, I haven't been able to run MEMTEST because I ... can't boot. Funny how that works.

Even though the problem is worse, I think the overall situation is (maybe) more promising. The consistency of the problem has allowed me to do quite a lot of isolation testing and rule out a large number of components...

Not the problem:
--> I have tried both of my GTX 980 graphic cards together and separately. The system fails to boot with either card and with both cards together. When trying the cards individually, I tried moving them to different PCIe slots. Same thing; no boot. I am both relieved and frustrated that one of my GPUs was not at fault. :lol:

--> I have removed my UAD card. That's a DSP hardware device with a couple of Shark processors on board. It's for making reverbs, delays, amp and tape simulators, and so forth. System still does not boot. I am relieved that this $800+ card is not at fault.

--> I have removed an internal USB port replicator (not sure what to call it). This little device is about 3 inches long, is powered from the PSU and it provides two internal USB headers and one USB (A?) connector. I didn't suspect this part, but at this point, anything in the system could be another possible point of failure. Removing this card did not improve the situation.

--> I have removed a SATA+PATA PCIe card. Removing this card did not improve the situation.

--> I removed all of the hard drive caddies from their bays, and put a bootable CD into the slot, to see if I could get Macrium or Disk Director to boot. Absence of the hard drives did not allow the system to boot.

Nothing I try fixes the situation, which probably means that the devices I've isolated are not malfunctioning. So it's good that this error is so consistent now.

I have observed a couple of strange new things.

Weird new behavior #1: When shutting off the PSU power switch, the PC shuts down, then a few seconds later, there is a brief power-on (lights and fans come on); very brief. This lasts less than a second, and the system then shuts off and stays off.
Weird new behavior #2: When turning on the PSU power switch, the PC boots itself even before I can press the power button.

Neither of the above should happen under any normal circumstances. This motherboard has onboard buttons for both power and reset, which makes it easy to test without front panel wiring connected, so I tried unplugging the front panel connectors (Back in December I had already cleaned them with contact cleaner). Both behaviors noted above remain consistent regardless of the presence or absence of the front panel connectors.

Each new thing rules out something or other. So much has been eliminated now that I think I'm pretty much left with the following possibilities:

* bad memory
* bad motherboard
* bad CPU or CPU cooler (it's USB connected)
* a problem with the case such as a shorting issue on the underside of the motherboard

Remember, the PSU was replaced with a whole different brand back in December, so I'm pretty sure it's not the PSU.

I am now at the point where it's time to rule out the memory. This operation required removal of the graphic cards and the CPU radiator (ugh, I was definitely procrastinating on this task!). I started by removing all 8 sticks (I have 64 GB RAM).

I'd like to start by just reseating all 8 to see if the system will reboot. Would it be helpful to clean the memory contacts, or should I avoid doing this?

Also, for my first test (reseating all 8 sticks), should I reseat them into the same slots, or try to move them to different slots?
 
Deanjo
Graphmaster Gerbil
Posts: 1212
Joined: Tue Mar 03, 2009 11:31 am

Re: Cold Boot Issues (Topic Renamed)

Sun Mar 13, 2016 11:10 pm

BIF wrote:
Weird new behavior #2: When turning on the PSU power switch, the PC boots itself even before I can press the power button.


That actually sounds like the behaviour if you have the APM BIOS settings to power on after power loss.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Sun Mar 13, 2016 11:41 pm

Okay, I'll check that in the BIOS the next time .. hah .. I get it to boot.

Which makes me wonder...would it be worthwhile to maybe consider clearing the BIOS and starting from factory?
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Cold Boot Issues (Topic Renamed)

Sun Mar 13, 2016 11:50 pm

Agree with Deanjo - check the BIOS settings for power management. What to do when the PSU is switched on is an option in most BIOSes. Typically, you can specify that the system should: A) remain off; B) power up; or C) power up if the last power down was due to the PSU power switch or a power failure.
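
To make those three options concrete, here is a toy model in Python. The enum names and the `powers_on_when_ac_returns` function are invented for illustration; actual BIOS menus use their own labels, and option C is modeled here simply as "the PC was running when AC was cut".

```python
from enum import Enum


class AcRestorePolicy(Enum):
    """Hypothetical names for the three options listed above."""
    STAY_OFF = "A"    # remain off until the power button is pressed
    POWER_ON = "B"    # always power up when AC returns
    LAST_STATE = "C"  # power up only if AC was cut while the PC was running


def powers_on_when_ac_returns(policy, was_running_when_ac_lost):
    """Return True if the PC would start by itself when the PSU switch is turned on."""
    if policy is AcRestorePolicy.STAY_OFF:
        return False
    if policy is AcRestorePolicy.POWER_ON:
        return True
    # LAST_STATE: restore whatever the machine was doing when AC disappeared.
    return was_running_when_ac_lost


if __name__ == "__main__":
    for policy in AcRestorePolicy:
        for was_running in (False, True):
            result = powers_on_when_ac_returns(policy, was_running)
            print(f"{policy.name:<10} running_before={was_running!s:<5} -> self-starts={result}")
```

Under option B, or under option C after the PC was shut off at the PSU switch while running, the machine starting before the power button is pressed would be the configured behaviour rather than a fault.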

Yes, try clearing your BIOS.

Reseating the RAM is a good idea. While you're at it, also try removing all but one stick of RAM and see if the system will boot reliably.

Bad motherboard seems somewhat likely at this point.

Bad CPU is possible, but unlikely.
Nostalgia isn't what it used to be.
 
biffzinker
Gerbil Jedi
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 12:57 am

I'd like to start by just reseating all 8 to see if the system will reboot. Would it be helpful to clean the memory contacts, or should I avoid doing this?

If you still want to clean the contacts on the sticks, use a pencil eraser so you're not causing a static shock; the eraser will clean any contaminants off the gold-plated pads.
It would take you 2,363 continuous hours or 98 days,11 hours, and 35 minutes of gameplay to complete your Steam library.
In this time you could travel to Venus one time.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 2:20 am

Status:

I put the first half of the RAM back in (the first 4 sticks into their same old places), re-affixed the radiator, and installed the ASUS GTX 980 back into its former place in PCIe slot #1. Was still getting boot failures with QCODE 5A, then 63 and 6d in quick succession.

I shut down and cleared the RTC following instructions in the user guide. Still would not boot.

I read in the user guide that you can use this thing called the 'Mem OK' button to let the motherboard set the memory timings. I've never tried that since I built this rig in 2012 because, well I never had any trouble before. Mem OK! is a tiny black button in the upper right of the motherboard (as oriented in a typical tower). I tried booting and pressing this button, and it would boot a couple times but still fail with the same 5A, 63, and 6d sequence.

I shut down, then did some more reading in the user guide. I was about to follow a straw-grasping hunch on the 5A code (CPU Error) and pull the water block off the CPU and reseat the CPU when I decided to "just try one more time". The front panel is still not hooked up to the motherboard so I punched the power button on the lower right area of the motherboard. It booted and the screen said something about "Mem OK Successful". Well, son of a gun!

I hit DEL a few times and the UEFI BIOS settings opened up.

I looked at the BIOS settings, and oddly enough the clock hadn't been reset by my clearing operation. So I don't know what RTC actually resets, if anything. Moving on, I looked at the DRAM settings and everything was on "auto", the default (which is what I was running before, because I never mess around with overclocking). Next, I booted (warm boot; still fearing a cold boot attempt) into Windows, and I got in. I plugged in a USB WIFI antenna (the PC is now in another room, too far away from the longest Ethernet cable available), then I quickly burned a CD with the latest Memtest86 ISO. I followed up with a warm boot again to see if I could boot from the CD and run Memtest. I still haven't tried again to cold-boot the system as is, for fear that it won't come up and I'll be in the same boat all over again. Best to follow the plan and run Memtest first.

I was too slow to go into the BIOS again, and Windows 8.1 came up to the boot selection screen. I tried "other options" and "boot from ATAPI" was present. Now that's pretty cool! I selected that and the next boot brought up Passmark Memtest 86.

It's running right now on those first four sticks (32 GB), and in the time it took me to write and proofread this post, it has gotten through 58% of the first pass. I'm going to let it run all night, then will test the second half of the memory tomorrow evening.

So maybe Mem OK only resulted in a temporary fix. If so, I kind of hope it's one or two sticks of the RAM. I'll gladly drop down from 64 GB to 48 GB for a few months if I can get that system reliably cold-booting once again.

More to come...
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 7:26 am

...so the "hammer test" has drawn 4 errors so far:

Test 13 has a note that "RAM may be vulnerable to high frequency row hammer bit flips", whatever that means. I don't know if this is a message about my ram, or any ram in general.

Here are the errors:

18F065D18 Expected 00000000 Actual 00000004 CPU 0
333024E0C Expected 00000000 Actual 00800000 CPU 0
7A742341C Expected 00000000 Actual 00800000 CPU 0
7CB824934 Expected 00000000 Actual 00100000 CPU 0

It's still running and it is 91% through pass 1 of 4. Is there any merit to letting it run further? Based on the addresses shown, it looks like 2 or 3 sticks are flagging here. How can I translate the addresses to tell me which sticks are pulling the errors?

Edit: I just realized that this test is still running. For some reason, it appears to be running right now against addresses LOWER than the 7A and 7C addresses shown above (it's in the 56's and 57's). Just curious, are these tests executed in a non-ascending way?
 
Ryu Connor
Global Moderator
Posts: 4369
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 8:07 am

BIF wrote:
...so the "hammer test" has drawn 4 errors so far:

Test 13 has a note that "RAM may be vulnerable to high frequency row hammer bit flips", whatever that means. I don't know if this is a message about my ram, or any ram in general.

Here are the errors:

18F065D18 Expected 00000000 Actual 00000004 CPU 0
333024E0C Expected 00000000 Actual 00800000 CPU 0
7A742341C Expected 00000000 Actual 00800000 CPU 0
7CB824934 Expected 00000000 Actual 00100000 CPU 0

It's still running and it is 91% through pass 1 of 4. Is there any merit to letting it run further? Based on the addresses shown, it looks like 2 or 3 sticks are flagging here. How can I translate the addresses to tell me which sticks are pulling the errors?

Edit: I just realized that this test is still running. For some reason, it appears to be running right now against addresses LOWER than the 7A and 7C addresses shown above (it's in the 56's and 57's). Just curious, are these tests executed in a non-ascending way?


Rowhammer errors don't impact stability.

Rowhammer attack exploits shrinking process size in DRAM

Turn off the Rowhammer test in Memtest for stability testing.
All of my written content here on TR does not represent or reflect the views of my employer or any reasonable human being. All content and actions are my own.
 
notfred
Maximum Gerbil
Posts: 4610
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 8:13 am

memtest will do all kinds of access patterns, incrementing, decrementing, alternating, random etc to try and get the memory to fail.

Here's the wikipedia page on Row hammer, it means that rapid accesses to some locations can affect data in others, but this is unlikely to be your issue. If the rest of the memtest is clean and you are only seeing row hammer failures then this isn't your problem with boot as boot doesn't do the repeated access that triggers this failure. It's more a security vulnerability rather than a crashing machine failure.

There's no way that I'm aware of to map physical addresses to memory slots, you just have to try moving the sticks around and substituting them and see if the failure follows the sticks.
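
While the addresses can't be tied to a slot, the error lines quoted above do show how many bits differed in each case. A minimal sketch, assuming the log lines have exactly the "address Expected value Actual value CPU n" layout that BIF pasted; the `flipped_bits` and `parse_errors` helpers are made up for illustration and are not part of MemTest86.

```python
import re

# Error lines exactly as pasted earlier in the thread.
ERROR_LOG = """\
18F065D18 Expected 00000000 Actual 00000004 CPU 0
333024E0C Expected 00000000 Actual 00800000 CPU 0
7A742341C Expected 00000000 Actual 00800000 CPU 0
7CB824934 Expected 00000000 Actual 00100000 CPU 0
"""

# Assumed layout: hex address, "Expected", 8 hex digits, "Actual", 8 hex digits, "CPU", number.
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]+) Expected ([0-9A-Fa-f]{8}) Actual ([0-9A-Fa-f]{8}) CPU (\d+)$"
)


def flipped_bits(expected_hex, actual_hex):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = int(expected_hex, 16) ^ int(actual_hex, 16)
    return [bit for bit in range(32) if (diff >> bit) & 1]


def parse_errors(log):
    """Return a list of (address, flipped_bit_positions) for each matching line."""
    results = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like an error line
        address, expected, actual, _cpu = match.groups()
        results.append((int(address, 16), flipped_bits(expected, actual)))
    return results


if __name__ == "__main__":
    for address, bits in parse_errors(ERROR_LOG):
        print(f"0x{address:X}: flipped bit(s) {bits}")
```

Each of the four lines differs from the expected value in exactly one bit, which fits the row hammer explanation. The address still says nothing about which stick is involved.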
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 8:48 am

I see, so that's why everybody says test with 1 stick at a time. That way if you get an error, you know it can't be coming from some other stick.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Mon Mar 14, 2016 4:52 pm

I reran the memory tests and they ran fine with the first 32 GB of RAM.

I did a few little things in Windows after that (via a warm boot), made sure I have current backups, then decided to try another cold boot.

Shut down, then pressed the "power" button from the motherboard. Beeps.

Okay, now I am becoming fatigued of this. Truly.

I just need to figure out what parts to replace, but I can't even seem to do that.

I'm halfway to saying "eff it" and opening a new thread in System Builders forum to upgrade memory, motherboard, and CPU.

Thoughts, opinions?

P.S. The power supply in my sig has been changed out to an EVGA 1200 watt model. I'm finding it hard to develop enough gumption to change my profile. :(
Intel i7 6850K, Gigabyte GA-X99 Designare EX, 64 GB DDR4 Kingston HyperX, two Geforce 980 cards, EVGA 1200W PSU, and various SSDs and HDDs
 
Starfalcon
Gerbilus Supremus
Posts: 12008
Joined: Mon Oct 14, 2002 10:43 am

Re: Cold Boot Issues (Topic Renamed)

Tue Mar 15, 2016 3:31 am

Have you checked for a new bios yet?
 
biffzinker
Gerbil Jedi
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: Cold Boot Issues (Topic Renamed)

Tue Mar 15, 2016 12:16 pm

Starfalcon wrote:
Have you checked for a new bios yet?


I see this in the latest BIOS Update for his board,
Version 4802

Description BIOS 4802 for P9X79 WS released to public.
1. Improve compatibility to some Gen 2/3 capture card and with some storage device.
2. Improve “Restore AC Power Loss” function in UEFI BIOS – APM section.

File Size 4.84 MBytes update 2015/07/17

ASUS P9X79 WS 4802
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 2:12 am

BIF wrote:
My motherboard has two features that may help me test.

1. I can boot from a USB stick, so should be able to remove all peripherals.
2. Can (I think) update the BIOS from a USB stick without processor (and maybe without memory) installed. I don't plan to update the BIOS because I have the latest one (1206 or something), but I can at least test the system with only motherboard connected to PSU.

Since I've decided to do at least "some" more testing, that "blog" will continue to happen in the other thread. It never hurts to plan for the worst, so ongoing discussion about potential new system parts will continue to go here.

I'll post an update soon.

I'd first test if you can consistently get the motherboard+CPU to power on without anything else installed, even RAM, before trying USB boot. (On cardboard, after clean-up.)

Then if that works, USB boot to memtest86 is definitely good, to test the RAM one DIMM at a time.

Flashing the firmware on an unstable system is brave, IMO. :-?

As you have no IGP you'll need a GPU for the USB boots, do you have a recent-ish low- and bus-powered known good one spare? (I keep one in the parts box for this scenario, currently a HD 5450.)
Desktop: 750W Snow Silent, X11SAT-F, E3-1270 v5, 32GB ECC, RX 5700 XT, 500GB P1 + 250GB BX100 + 250GB BX100 + 4TB 7E8, XL2730Z + L22e-20
HTPC: X-650, DH67GD, i5-2500K, 4GB, GT 1030, 250GB MX500 + 1.5TB ST1500DL003, KD-43XH9196 + KA220HQ
Laptop: MBP15,2
 
just brew it!
Administrator
Posts: 54500
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 5:42 am

Topinio wrote:
Flashing the firmware on an unstable system is brave, IMO. :-?

Risky, yes. But sometimes you have no choice. I even like to make sure the system is on a UPS when I do it (though I've been less diligent about this since moving to a city with more reliable electric service).
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 8:57 am

Fair enough precaution, but I wouldn't flash the firmware until I either could get it solid or had eliminated everything but flakey firmware as the cause of the instability.[1]

This build ran fine from 2012 until last November (though obviously there have been modifications at various points), so it ought to be possible to get it consistently POSTing unless the motherboard, CPU, or PSU has failed. If it consistently POSTs with nothing installed, even to no-RAM beep codes, and a single DIMM can then be fully validated as good, then it's worth bothering with the f/w upgrade.

Otherwise, there's a real risk of compounding the problem.

[1] I may have done so in the past, and can say with certainty that it is an error. Particularly on a system in the 5-figure bracket in 1999 :wink:
 
Ryu Connor
Global Moderator
Posts: 4369
Joined: Thu Dec 27, 2001 7:00 pm
Location: Marietta, GA

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 9:10 am

It's impossible for him to firmware brick that motherboard.

His board has the Asus BIOS Flashback functionality. It's bulletproof.

As it stands though, I suspect that motherboard is dying.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 10:19 am

Yes, that's most likely. Who knows at this point whether it's the root cause or was killed by something else, and whether other parts are okay.

BIF, have you at any point tried to use the BIOS Flashback function in case your f/w is messed up?
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Wed Mar 16, 2016 11:48 pm

Hi guys. Okay, I'll answer your questions, then give you a status.

biffzinker wrote:
Starfalcon wrote:
Have you checked for a new bios yet?


I see this in the latest BIOS Update for his board,
Version 4802

Description BIOS 4802 for P9X79 WS released to public.
1. Improve compatibility to some Gen 2/3 capture card and with some storage device.
2. Improve “Restore AC Power Loss” function in UEFI BIOS – APM section.

File Size 4.84 MBytes update 2015/07/17

ASUS P9X79 WS 4802


Oh boy, I didn't see that one. I may have the one prior from 2014 (but maybe not). Definitely don't have 4802, because it was released in July 2015 and this was long before I had any issues. But it does hit the APM power area. SOAB, that's embarrassing. I'll put that one on. :oops:

BIF wrote:
My motherboard has two features that may help me test.

1. I can boot from a USB stick, so should be able to remove all peripherals.
2. Can (I think) update the BIOS from a USB stick without processor (and maybe without memory) installed. I don't plan to update the BIOS because I have the latest one (1206 or something), but I can at least test the system with only motherboard connected to PSU.

Since I've decided to do at least "some" more testing, that "blog" will continue to happen in the other thread. It never hurts to plan for the worst, so ongoing discussion about potential new system parts will continue to go here.

I'll post an update soon.


Topinio wrote:
I'd first test if you can consistently get the motherboard+CPU to power on without anything else installed, even RAM, before trying USB boot. (On cardboard, after clean-up.)

First off, thank you for carrying that quote over from the other thread, that was nice of you.

I have the motherboard out of the case and sitting on a stack of paper. Well, it's like a very very thin stock of cardboard! Anyway, I have cleaned up the TIM and have reseated the CPU. I've visually inspected everything under a magnifying glass, looking for evidence of bulging or burst capacitors, burned traces or connectors, or ANY other evidence of a possible "releasing of the magic smoke". The motherboard, CPU, all connectors, traces, and even undersides all appear clean and (ahem) smoke-free.

I don't have a stock HSF. Can I hook up the H80i to the CPU and on-board power and USB and try to boot, even while the motherboard is "on cardboard"? If not, I'll have to source a HSF, and that could take days for delivery.

Then if that works, USB boot to memtest86 is definitely good, to test the RAM one DIMM at a time.

Yes, I can do this. I'll configure a new bootable USB with Memtest86.

Flashing the firmware on an unstable system is brave, IMO. :-?

My system isn't "unstable" in the way you're thinking. It doesn't crash. It just won't boot from a "power off" state. But when it boots, it's fine. That's one reason this has gone on so long, because I'll boot it and let it fold for MONTHS before rebooting. And all that time, I'll be doing music and graphic arts, WHILE it folds. Yep, sometimes I forget to shut down folding.

But I do agree that I should put on the new BIOS, and so I'll do that. Also, somebody else noted in a later thread that my motherboard can flashback to the old BIOS.

As you have no IGP you'll need a GPU for the USB boots, do you have a recent-ish low- and bus-powered known good one spare? (I keep one in the parts box for this scenario, currently a HD 5450.)

I'll check my parts bins, but I think I only have GPUs that need one or two power cables in addition to whatever power the PCIe bus is providing.

STATUS:

So now the motherboard is on the bench, the TIM has been cleaned off, and all of the surface and underside components have been visually inspected. One memory slot has a slightly broken guide on the end opposite of the ejection trigger. This appears to be cosmetic, and is likely the result of getting scraped when inserting/removing the GPUs from slot #1. The GPUs don't appear to be damaged, and their backsides are made of heavy gauge structural metal anyhow.

Aside from that cosmetic blemish, everything appears to be in order.

The CPU is installed right now, if only to cover up the slot pins and keep some dust out.

I'm ready for the first test, whatever that may be.
 
Topinio
Gerbil Jedi
Posts: 1839
Joined: Mon Jan 12, 2015 9:28 am
Location: London

Re: Cold Boot Issues (Topic Renamed)

Thu Mar 17, 2016 4:23 am

If you only have a GPU that needs 2 auxiliary power connectors and a CPU cooler that needs USB power, you might as well install them both into the on-cardboard set-up and try 10 or 20 POSTs. It's not ideal, but you have what you have.

If it doesn't POST every time, you can be confident that at least one of those 5 parts is not okay.

If it does POST properly (to the no-RAM beep code) every time, then all of those parts are probably possibly okay. (It would be probably if you had a bus-powered and known-good noddy GPU, but I wouldn't mark a GTX 980 as probably okay based off a series of successful system POSTs.)
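
As a rough guide to why 10 or 20 POSTs is a reasonable number: if an intermittent fault made each power-on fail independently with probability p, the chance of seeing n clean POSTs in a row despite the fault would be (1 - p)^n. The independence assumption and the example failure rates below are illustrative guesses, not measurements from this system.

```python
def chance_of_clean_run(p_fail, n_posts):
    """Probability of n_posts consecutive successful POSTs if each one
    fails independently with probability p_fail (an assumed model)."""
    return (1.0 - p_fail) ** n_posts


if __name__ == "__main__":
    # Hypothetical per-boot failure rates; pick whatever matches what you observe.
    for p_fail in (0.5, 0.2, 0.1):
        for n_posts in (10, 20):
            chance = chance_of_clean_run(p_fail, n_posts)
            print(f"p_fail={p_fail:.1f}, {n_posts} POSTs: "
                  f"{chance:.4f} chance of no failure showing up")
```

Under this model, a fault that bites half the time survives ten POSTs undetected only about once in a thousand runs, while a fault that bites one time in ten still slips through twenty POSTs roughly 12% of the time. Rarer faults need more repetitions before a clean run means much.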

At that point, you can start testing the DIMMs:

DO N_dimm=1,8
  1. Pick a DIMM, bonus points for putting a sticker on it "N_dimm", and put it in the first DIMM slot.
  2. Boot from MemTest86 V6.3.0 (latest) media, deselect row hammer test and run for many hours.
    1. If it passes 24 hours of MemTest86 it's probably good
      1. power off and put DIMM # N_dimm in a static bag for probably good DIMMs.
    2. If it fails in MemTest86 V6.3.0 then put it aside in a static bag for possibly bad DIMMs for now because either
      1. it's broken and should be put out for recycling, or
      2. it's not broken and something else is broken, e.g. slot, traces, memory controller
END DO

If the first 2 DIMMs both failed in slot 1, I'd presume (for now) that the DIMMs might be okay and I would test that DIMM (N_dimm=2) in the other slots (N_slot=2,3,...,8).

I find it handy to record results as I get them in a matrix (Excel FTW :wink: though paper is valid), DIMMS for columns and slots for rows, on systems with more than 2 DIMMs.
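
The same matrix could be kept in a few lines of Python instead of a spreadsheet. This is only a sketch: the `record` and `suspects` helpers, the "two or more failures" rule, and the pass/fail entries in the usage example are all hypothetical, made up to show the bookkeeping rather than taken from real test results.

```python
from collections import defaultdict


def record(results, slot, dimm, passed):
    """Store one MemTest86 outcome: results[slot][dimm] = True (pass) or False (fail)."""
    results[slot][dimm] = passed


def suspects(results):
    """Return (suspect_dimms, suspect_slots).

    A DIMM is flagged when it was tested in two or more slots and failed in all
    of them; a slot is flagged when two or more DIMMs were tested in it and all
    failed (which would point at the slot, traces, or memory controller).
    """
    by_dimm = defaultdict(list)
    for slot, row in results.items():
        for dimm, passed in row.items():
            by_dimm[dimm].append(passed)

    suspect_dimms = sorted(
        dimm for dimm, outcomes in by_dimm.items()
        if len(outcomes) >= 2 and not any(outcomes)
    )
    suspect_slots = sorted(
        slot for slot, row in results.items()
        if len(row) >= 2 and not any(row.values())
    )
    return suspect_dimms, suspect_slots


if __name__ == "__main__":
    results = defaultdict(dict)
    # Made-up entries: DIMM 2 fails in slot 1, then fails again in slot 2.
    record(results, slot=1, dimm=1, passed=True)
    record(results, slot=1, dimm=2, passed=False)
    record(results, slot=2, dimm=2, passed=False)
    record(results, slot=1, dimm=3, passed=True)
    print(suspects(results))
```

With those made-up entries, DIMM 2 is the only suspect, because it failed in two different slots while slot 1 worked fine with other sticks.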

I'm still expecting the motherboard to be flakey, but if it POSTs reliably with a minimal hardware config then my guess would be a DIMM. I'd be surprised if more than 1 DIMM has issues, given that it's stable if it POSTs.

Good luck!
 
Redocbew
Minister of Gerbil Affairs
Posts: 2495
Joined: Sat Mar 15, 2014 11:44 am

Re: Cold Boot Issues (Topic Renamed)

Thu Mar 17, 2016 10:19 pm

BIF wrote:
I don't have a stock HSF. Can I hook up the H80i to the CPU and on-board power and USB and try to boot, even while the motherboard is "on cardboard"? If not, I'll have to source a HSF, and that could take days for delivery.


Yeah there's no reason why you couldn't. I'd leave the USB cable on the H80i disconnected for now also. Just hook up the fans and the pump to spare fan headers and leave the USB out of it. It's not likely to make a difference, but anything that helps make testing more simple is usually a good thing here. I'm also thinking this is probably a sign of dying motherboard.
Do not meddle in the affairs of archers, for they are subtle and you won't hear them coming.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Sat Mar 19, 2016 12:48 am

Well folks, this may be fixed now.

I had a hell of a time getting it to boot into the BIOS setup screens, and saw that I was running BIOS update 4505. 4802 is current, so I downloaded it from the Asus website and put it on a thumbdrive.

Then after another half hour of attempts to boot, I tried the "no CPU, no DRAM, no memory, no monitor" method of updating the BIOS, which entails putting the thumbdrive with the boot CAP file (I guess CAPs replace ROMs now) into that white USB port, then pressing and holding the button next to it. That didn't work; the button/light went to steady blue, which the Asus support site says is bad.

So I went back to trying to boot into BIOS. I had only the CPU, one stick of RAM (8GB), a single GPU, the H80i, and an ATAPI optical drive plugged in with a MEMTEST86 CD in it (it seemed to want "something" valid to boot into even just going into ROM). Anyway, after about a half-dozen tries, it booted into BIOS and I was able to apply the 4802 update.

Since then, I have warm-booted and cold-booted about a half-dozen times each with no more beeping. It even boots to the Windows 8.1 SSD and I'm using the system right now to type this post. In fact, F@H even kicked off, and my CPU and the one GPU are making an annoying racket right here on my tabletop.

Right now, the system is all splayed out on cardboard, with the radiator propped on a box and it reminds me of the most recent Robocop movie where the doctor disassembles the poor guy in front of a mirror and all that's left of him is pretty much his head, esophagus, heart, and lungs. Gave me the willies, then and now!

I'm off now to reboot this thing another dozen times before I break everything down and get ready to put it all back into the case. We'll see how it goes, but so far it's very promising.

And I'm very embarrassed that that latest BIOS got past me. :oops: Thankfully, I had a little help from my friends here in the forums. :D
 
biffzinker
Gerbil Jedi
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: Cold Boot Issues (Topic Renamed)

Sat Mar 19, 2016 1:27 am

Good news indeed, hopefully it's problem free from now on. :wink:
It would take you 2,363 continuous hours or 98 days,11 hours, and 35 minutes of gameplay to complete your Steam library.
In this time you could travel to Venus one time.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Topic Renamed)

Sat Mar 19, 2016 1:36 am

It's a huge relief that I won't have to buy another system so soon.

I've booted now about 8 times warm and about 15 times cold. It's late so I may get some shuteye before I power everything down for reassembly. Looking forward to removing some now-unused SSDs from this case, and further decluttering it and getting cable management set up all nice and pretty.

Funny thing though, the water pump has started making little noises. I wonder if he just doesn't like lying flat on the bench and maybe prefers his sideways orientation, since he's been running that way for more than a year now.

No matter, he's gotta come off anyway and get turned 180 degrees...I had to put him on upside down so that I could place the radiator and its hoses away from the graphics card with its hot heat-pipes on top. Once it's all back together, I'll see if the noise persists. It's possible that this is normal and is ordinarily masked by the fans going hell's bells to keep F@H from melting everything. :o
 
biffzinker
Gerbil Jedi
Posts: 1998
Joined: Tue Mar 21, 2006 3:53 pm
Location: AK, USA

Re: Cold Boot Issues (Fixed w/New BIOS)

Sat Mar 19, 2016 1:48 am

Possibly the noise is being caused by air making its way down to the pump, or from moving it?
Edit: Now you're going to miss out on the new components smell.
 
BIF
Minister of Gerbil Affairs
Topic Author
Posts: 2458
Joined: Tue May 25, 2004 7:41 pm

Re: Cold Boot Issues (Fixed w/New BIOS)

Sat Mar 19, 2016 1:51 am

That's what I'm thinking, maybe there's a bubble in there.

Since I'm going to be busy with other things all day Saturday, I'm thinking of loading up all the memory and letting it run memtest all day on the cardboard. At this point, I don't think I need to add them one stick at a time since the problem seems to have been corrected by the BIOS update.

Edit: LOL! :D
