Server 2008 R2 - Random Shutdowns


Posted on Thu Jun 09, 2011 8:52 pm

Hey guys, I'm starting to get a bit peeved while trying to figure out an issue I'm having with a server I built for a client roughly 6+ months ago. It's been running faithfully up until the last 2 days, where the system has completely shut down around the same time on two different days.

**EDIT**: System Specs
Operating System
MS Windows Server 2008 R2 Standard 64-bit SP1
CPU
AMD Phenom II X6 1090T 33 °C
Thuban 45nm Technology
RAM
8.00 GB Single-Channel DDR3 @ 533MHz (7-7-7-20)
Motherboard
MSI 890FXA-GD70 (MS-7640) (CPU1) 37 °C
Graphics
831W (1024x768@1Hz)
LogMeIn Mirror Driver
Standard VGA Graphics Adapter
Hard Drives
977GB Seagate ST31000524NS ATA Device (SATA) 29 °C
488GB Seagate ST3500514NS ATA Device (SATA) 26 °C
977GB Seagate ST31000524NS ATA Device (SATA) 30 °C

As you can see the system is running fairly cool. It is on a Tripplite UPS; the first time it shut down it wasn't, because we thought the UPS wasn't working properly. I just tested the UPS by plugging the system into it and unplugging it from the wall (ran the server on the UPS for well over 30 minutes with no issue). The server is in a location where people would hear the UPS beeping if it were to run out of battery or be running on it. No one reported hearing anything and they are constantly next to it. I was over there hooking up the UPS about an hour or so before the system shut down again, so I know everything was working fine before leaving. The two 1TB drives are in a mirror RAID setup and are only used for raw data. Other main services that are running are Acronis Backup and Recovery 10 going to a 2TB USB 3.0 IOSafe, and a Foxpro program they use for their inventory software. Most users are going home right about 30 minutes before it shuts off, so it wasn't under any special load either. I am very stumped by this, anyone have a clue?

The first was around 4:32pm yesterday and the second was at 4:47pm today. Checking the "System" event log shows the following.

Information 6/9/2011 5:14:42 PM Kernel-General 12 None
Information 6/9/2011 4:26:17 PM Service Control Manager 7036 None
Information 6/9/2011 4:02:19 PM Service Control Manager 7036 None
Information 6/9/2011 4:00:43 PM Service Control Manager 7036 None
Information 6/9/2011 3:55:30 PM Service Control Manager 7036 None

The last 3 events refer to different services going into a stop state, shouldn't be anything out of the ordinary there.

Diagnostic System Host
Application Experience
WinHTTP Web Proxy Auto-Discovery
Windows Modules Installer
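
For anyone reading along, the silence in the listing above can be measured rather than eyeballed. A minimal Python sketch, assuming the System log has been copied out as plain text in the same column order as shown (level, date, time, AM/PM, source, event ID, task category) and that the task category is a single word; the function names are made up for illustration and are not part of any Windows tool:

```python
from datetime import datetime

# Lines copied from the System log listing in this post (newest first).
LOG_TEXT = """\
Information 6/9/2011 5:14:42 PM Kernel-General 12 None
Information 6/9/2011 4:26:17 PM Service Control Manager 7036 None
Information 6/9/2011 4:02:19 PM Service Control Manager 7036 None
Information 6/9/2011 4:00:43 PM Service Control Manager 7036 None
Information 6/9/2011 3:55:30 PM Service Control Manager 7036 None
"""

def parse_line(line):
    """Split one copied line into (timestamp, source, event_id)."""
    parts = line.split()
    # parts: level, date, time, AM/PM, source words..., event id, task category
    stamp = datetime.strptime(" ".join(parts[1:4]), "%m/%d/%Y %I:%M:%S %p")
    source = " ".join(parts[4:-2])
    event_id = int(parts[-2])
    return stamp, source, event_id

def largest_gap(text):
    """Return (gap, before, after) for the longest silence between events."""
    events = sorted(parse_line(l) for l in text.splitlines() if l.strip())
    pairs = zip(events, events[1:])
    before, after = max(pairs, key=lambda p: p[1][0] - p[0][0])
    return after[0] - before[0], before, after

gap, before, after = largest_gap(LOG_TEXT)
print(f"Longest silence: {gap} between {before[1]} {before[2]} and {after[1]} {after[2]}")
```

On the five lines posted here, the longest silence is the 48 minutes 25 seconds between the last Service Control Manager 7036 entry and the Kernel-General 12 entry, which as far as I know is the one written when the OS starts back up.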

The only difference between today's event log (above list) and yesterday's was this single event, which was the very last one logged before the system was rebooted.

Warning 6/8/2011 3:10:41 PM Kerberos-Key-Distribution-Center 29 None

***The Key Distribution Center (KDC) cannot find a suitable certificate to use for smart card logons, or the KDC certificate could not be verified. Smart card logon may not function correctly if this problem is not resolved. To correct this problem, either verify the existing KDC certificate using certutil.exe or enroll for a new KDC certificate.***
"I think there is a world market for maybe five computers."
Thomas Watson, chairman of IBM, 1943

i5-2500K|Asus P67 Sabertooth|16GB Corsair 1600|MSI 7850 2GB|250gb Evo 840|Corsair 400R|ET750w PSU|Logitech G5|Dell 2420L|Corsair Vengeance 1300
Welch
Minister of Gerbil Affairs
Gold subscriber
 
 
Posts: 2675
Joined: Thu Nov 04, 2004 5:45 pm
Location: Fairbanks, Alaska

Re: Server 2008 R2 - Random Shutdowns

Posted on Thu Jun 09, 2011 10:01 pm

Could you temporarily exchange the RAM and PSU? Test the RAM with MemTest in another computer, and test the PSU with whatever you computer professionals use?
wibeasley
Gerbil Elite
Gold subscriber
 
 
Posts: 952
Joined: Sat Mar 29, 2008 3:19 pm
Location: Norman OK

Re: Server 2008 R2 - Random Shutdowns

Posted on Thu Jun 09, 2011 10:37 pm

RAM would be my first guess, as wibeasley already said. Possibly the power supply, but that wouldn't be my first guess.

Temps look good so overheating shouldn't be the cause. Might check the SMART values on the HDDs?

I'm stumped if it isn't the memory.

Edit: I had a desktop doing this and it turned out to be the fan in the PSU going out. It would fail at random times.
DrkSide
Gerbil
 
Posts: 82
Joined: Thu Sep 16, 2010 9:08 pm

Re: Server 2008 R2 - Random Shutdowns

Posted on Thu Jun 09, 2011 11:11 pm

i bet it is the mobo...many people on numerous forums are reporting problems when running an x6 cpu. most people are complaining about their system being unstable or completely dying when overclocked at all. some people have zero problems and aren't affected, while others report a stock system just plain dying after 2-3 days of regular use. seems to be the PWMs on the mobo can't quite handle the voltage required for the x6's, and stressing them brings out this weakness rather quickly.

i have no idea if the amd's new 900 series mobos have better power delivery and will compensate for this problem. everything points to it being the same silicon as the 800 series chipsets but with an am3+ socket, and early reviews are mixed so far. out of desperation you can try a BIOS update, but i think you should be looking for a different motherboard.
ryko
Gerbil Team Leader
 
Posts: 235
Joined: Tue Feb 27, 2007 3:58 pm
Location: new york

Re: Server 2008 R2 - Random Shutdowns

Posted on Fri Jun 10, 2011 12:31 am

There have to be more errors and information in the error log if the system is shutting down and coming back up.

When you say shut down, is it crashing or does it go through the shutdown process?

The same time of day makes it seem like something is making it shut down; maybe Acronis is telling the system to turn off after a backup completes?
paco
Minister of Gerbil Affairs
 
Posts: 2082
Joined: Wed Jul 21, 2004 7:14 pm
Location: So Cal

Re: Server 2008 R2 - Random Shutdowns

Posted on Fri Jun 10, 2011 5:45 pm

Thanks for the heads up guys, most of what you guys were saying I had thought about (RAM/PSU/etc.). However, I wasn't aware of anything being wrong with the 125w TDP AMD chips on the 8xx series chipsets. Where is this information coming from, and is it primarily users who are attempting to overclock?

As for it shutting down, yes it would log it if it was a triggered system shutdown, however it's a complete power off, as though someone were unplugging the system. This is why there are no logs about what was going on. I was hoping to find something in the system logs regarding services having errors or something, anything to indicate that something was causing this instant power off. Nothing showed up so I'm SoL. If it had been a RAM error then I would have more than likely seen a memory dump somewhere. I've never had RAM make a system shut down like that and stay off. Right now (as of last night) I took the server off of the UPS and plugged it straight into the wall. I did pick up a new APC 1250UA UPS to put it on to rule out the first 2 possibilities (brown-out/faulty UPS).
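
The distinction above between a logged shutdown and a hard power-off can be put in a small lookup. A rough Python sketch; the event IDs are the usual ones from the Windows System log as commonly documented (1074 and 6006 around a clean shutdown, 6008 and Kernel-Power 41 after an unexpected one). They are assumptions brought in for illustration, not entries from this server's log, and the function name is hypothetical:

```python
# Event IDs commonly associated with shutdowns in the Windows System log.
# Assumed from general Windows behaviour, not taken from this thread.
# A real check should also match the event source, since IDs can collide.
CLEAN_IDS = {1074, 6006}       # shutdown requested / Event Log service stopped
UNEXPECTED_IDS = {41, 6008}    # Kernel-Power / "previous shutdown was unexpected"

def classify_shutdown(event_ids):
    """Classify one shutdown from the event IDs logged around it."""
    ids = set(event_ids)
    if ids & UNEXPECTED_IDS:
        return "unexpected power loss or crash"
    if ids & CLEAN_IDS:
        return "clean shutdown"
    return "unknown (no shutdown events logged)"

# The only IDs visible in the listing earlier in the thread: 7036 and 12.
print(classify_shutdown([7036, 7036, 7036, 7036, 12]))
```

With only the 7036 and 12 entries posted so far this comes back as unknown, which fits the "no logs" description; a 6008 or Kernel-Power 41 entry after the reboot would be the thing to look for.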

I'm leaning the same direction that some of you are, which is that it may be the MSI board's ability to deliver power to the CPU. This should be reproducible though if I were to run something such as CPU Burn-in for about an hour or so. If the CPU is at 100% utilization and it doesn't crap out for 1 hour then I'd find it hard to believe it's the power delivery on the motherboard. At that rate though, the PSU could be a culprit, although I've got a Corsair 750w PSU in this thing, which with the system's current configuration could easily handle 2 of these systems and not be at full load (assuming the usual 15-20% efficiency loss of a PSU). I think this is a Silver certified, 85% efficient PSU (on paper of course).

This is the PSU I've got in the beast.
http://www.newegg.com/Product/Product.a ... 6817139010
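
To put rough numbers on the PSU headroom claim, here is a back-of-the-envelope sketch in Python. Only the 125 W CPU TDP and the 750 W rating come from the thread; every other wattage is a guessed placeholder, and 85% is the on-paper efficiency mentioned above. Note that efficiency changes what is drawn from the wall, not how much DC power the PSU can deliver:

```python
PSU_RATING_W = 750          # from the thread
EFFICIENCY = 0.85           # on-paper figure quoted in the thread

# DC load estimate. Only the CPU figure is from the thread (125 W TDP);
# the rest are hypothetical placeholders for illustration.
component_watts = {
    "cpu": 125,
    "motherboard_and_ram": 60,   # assumed
    "three_hard_drives": 30,     # assumed, roughly 10 W each
    "fans_and_misc": 25,         # assumed
}

def psu_summary(components, rating_w, efficiency):
    """Return (dc_load_w, load_fraction, wall_draw_w)."""
    dc_load = sum(components.values())
    return dc_load, dc_load / rating_w, dc_load / efficiency

dc_load, fraction, wall = psu_summary(component_watts, PSU_RATING_W, EFFICIENCY)
print(f"DC load {dc_load} W = {fraction:.0%} of rating, about {wall:.0f} W at the wall")
```

Under these guessed figures the box sits around a third of the PSU's rating, so two such systems would still fit under 750 W. Swap in measured values before reading anything into it.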

I'm going to see what happens today around that same time, I'll let you guys know what I find out next. I'll check the voltages on the 12v rails and if something seems out of spec I've got a brand new Etasis 750w that I can throw in to try out. Again thanks guys :), the client isn't too happy right now, understandably so, but sometimes **** happens outside of the scope of anyone's foresight.
Welch
Minister of Gerbil Affairs
Gold subscriber
 
 
Posts: 2675
Joined: Thu Nov 04, 2004 5:45 pm
Location: Fairbanks, Alaska

Re: Server 2008 R2 - Random Shutdowns

Posted on Fri Jun 10, 2011 6:05 pm

Welch wrote: As you can see the system is running fairly cool. It is on a Tripplite UPS; the first time it shut down it wasn't, because we thought the UPS wasn't working properly. I just tested the UPS by plugging the system into it and unplugging it from the wall (ran the server on the UPS for well over 30 minutes with no issue). The server is in a location where people would hear the UPS beeping if it were to run out of battery or be running on it. No one reported hearing anything and they are constantly next to it. I was over there hooking up the UPS about an hour or so before the system shut down again, so I know everything was working fine before leaving. The two 1TB drives are in a mirror RAID setup and are only used for raw data. Other main services that are running are Acronis Backup and Recovery 10 going to a 2TB USB 3.0 IOSafe, and a Foxpro program they use for their inventory software. Most users are going home right about 30 minutes before it shuts off, so it wasn't under any special load either. I am very stumped by this, anyone have a clue?


When does the janitorial staff come in?

PS: Amateurs build, professionals buy. Now you know why.
#182 TT: 13/DNVT, Precedence: Flash Override. Switch: Node Center. MSE forever.
Contingency
Gerbil Jedi
 
Posts: 1533
Joined: Sat Jun 19, 2004 4:03 pm
Location: al.us

Re: Server 2008 R2 - Random Shutdowns

Posted on Tue Jun 21, 2011 4:21 pm

I hope that last message (the P.S.) was a signature, because it's utter and complete crap :). I have built and purchased servers, less of the building, but so far with the purchases of Dell 2900's I've had 3 separate issues on their different servers: one resulting in them sending a new motherboard, another was just RAM (easy when you've got 24 gigs in a system :P), and the other was a PERC controller that went south after a few months. So to claim that pros buy is one of the most laughable things I've seen posted in some time. You take your chances no matter what you do; one just costs a hell of a lot less if you don't need a dual CPU setup.

After those 2 days of putting the system through the stress tests, nothing odd showed up with it. I also confirmed on the 3rd day that the power didn't go out at their location, but it did go out again at the owner's house at the same exact time. I called the electrical company and they took a look at everything on Monday and we haven't heard from them since. Funny thing is, no brown-outs have occurred since their little visit. It's very common for our "Non-Profit" power company in town to try and fix an issue all while telling us there isn't an issue or wasn't one at all. My original diagnosis of brown-outs stands, as I can't find any evidence at all that it's hardware and have not had a single issue since.
Welch
Minister of Gerbil Affairs
Gold subscriber
 
 
Posts: 2675
Joined: Thu Nov 04, 2004 5:45 pm
Location: Fairbanks, Alaska

Re: Server 2008 R2 - Random Shutdowns

Posted on Tue Jun 21, 2011 10:51 pm

Welch wrote: I hope that last message (the P.S.) was a signature, because it's utter and complete crap :). I have built and purchased servers, less of the building, but so far with the purchases of Dell 2900's I've had 3 separate issues on their different servers: one resulting in them sending a new motherboard, another was just RAM (easy when you've got 24 gigs in a system :P), and the other was a PERC controller that went south after a few months. So to claim that pros buy is one of the most laughable things I've seen posted in some time. You take your chances no matter what you do; one just costs a hell of a lot less if you don't need a dual CPU setup.


That comment was directed at you. I harbor no illusions regarding the reliability of OEM hardware.
Venn Diagram contrasting the disadvantages of OEM vs custom built solutions:

------------------------------
higher price
=====================
hardware failure (capacitor plague can happen to anyone)
software incompatibility
------------------------------
hardware design flaws (680i memory controller, Sandy Bridge SATA issues--although OEMs are not immune, the possibility of recourse is greater)
warranty support (BFG, typical Taiwanese RMA turnaround time)
unreliable hardware lifecycle (680i vanishing from the market)
hardware incompatibility (finicky memory, RAID drive support)
lack of documentation (the number of people who run the same platform as you is likely an order of magnitude smaller at best)
=====================

The mark of an effective troubleshooter is not knowing everything that can possibly go wrong, but rather the ability to discard causes, which results in faster resolution. Going with OEM allows many of the above concerns to be mitigated or ruled out completely. Example: If a server I am supporting from 200 miles away has a problem with its RAID array, is my time better spent with 1) Dell Gold Tech support, who will dispatch a tech and replacement hardware to a remote site, overnight if necessary, or 2) two replacement drives and two weeks later, finding out that the shipping firmware of the Samsung F4EG drive includes a nasty data corruption bug? When your two socket Opteron board fails, do you drive down to Best Buy? Do they stock Fibre Channel adapters right below the Monster Cables? Keeping spare parts on hand adds up, and if you don't, then you haven't been doing this long enough to get burned yet.
Contingency
Gerbil Jedi
 
Posts: 1533
Joined: Sat Jun 19, 2004 4:03 pm
Location: al.us

Re: Server 2008 R2 - Random Shutdowns

Posted on Sat Jun 25, 2011 3:09 am

Some of your points are valid..... however I'd like to challenge a few of them that seem to be integral to your argument.



Software incompatibility? I've had this issue with many Dell systems, especially their beloved driver download section that likes to list 5-10+ network cards, some of which appear to be duplicates, with nothing marked to signify any difference. It's like playing Russian Roulette with drivers: sometimes they work, other times they don't.

Hardware design flaws? Happens to Dell too... SX280 (yes, caps... I've replaced so many I could puke), their XPS laptops with defective Nvidia video cards... yep. As I said earlier, defective SCSI planar boards known to have design issues from Dell Precision workstations; the list is equally long.

Warranty.... I'll have to say you may have me on this one, as I've NEVER had to call any of the manufacturers to return hardware. None of the motherboards or other hardware that I've used in customers' systems have had to be returned due to premature failure or defectiveness. I'm not claiming that it won't happen, I've just been fortunate. I can't however say I've been as lucky with some of the OEM guys' stuff, Dell and HP alike.

Unreliable hardware life-cycle... Not had an issue with that yet either; in most cases there is a similar product that can easily take its place if something were to be unavailable or scarce after a failure. If you look at OEMs though, they are just like car manufacturers: they are dedicated to releasing the newest models, modified on a strict schedule each year. Today's version "XX" might not be compatible with tomorrow's version "XX". What happens when they can't get a certain part, or it's on back order?

Hardware incompatibility???? Like RAM, per se? Funny thing is that those beloved Dells that I work with all too often are the worst at this. So I have to what? Order out and wait for Dell to send me the proper RAM at an inflated price? No thanks.

Documentation...... not even going there. Try making sense of Dell's documentation on their PowerEdge 2900 and its memory configurations. Funny thing is the documentation is actually incorrect, as I found out after about an hour of swapping DIMMs.

Provided, there is likely to be few if anyone out there running the same configuration as I am just like you said. Which admittedly may make it slightly more difficult to pin point an exact issue that I'm having that a few 1000 may have experienced using OEM hardware.


To wrap all of that up, it doesn't help that when I was in a time of need and Dell did have EXACTLY what I needed (hardware... the SCSI planar/back-plane for the hot swappable drives) they sent me a freaking laptop video card. That took 1 week on rush order..... We had to send that back first for them to send the correct part.... What, they didn't send the right part the second time either... that's right, a SATA RAID add-in card. Only on the 3rd shot did they get it right, and it was already 3 weeks into the project. We had to use Acronis images restored to dissimilar hardware in order to give them a temporary server. Which is my solution to most of your above hardware issues. See, even if for some reason Asus stopped making motherboards that worked with my customer's current config, I could simply have an entire new/temp system in place with everything as it was within an hour or two. Ever since I got on the Acronis bandwagon, dissimilar hardware doesn't scare me into buying OEM.
Welch
Minister of Gerbil Affairs
Gold subscriber
 
 
Posts: 2675
Joined: Thu Nov 04, 2004 5:45 pm
Location: Fairbanks, Alaska

Re: Server 2008 R2 - Random Shutdowns

Posted on Sat Jun 25, 2011 1:20 pm

6 months = 180 days. after that, 2008 server shuts off hourly, without proper activation.

time to pony up $$ to MS for a license for this error to go away.
multi_core
Gerbil In Training
 
Posts: 2
Joined: Wed Nov 25, 2009 4:48 pm

Re: Server 2008 R2 - Random Shutdowns

Posted on Sat Jun 25, 2011 2:16 pm

multi_core wrote:6 months = 180 days. after that, 2008 server shuts off hourly, without proper activation.

time to pony up $$ to MS for a license for this error to go away.


R2 gives you 30 days but won't shut down the OS. It gives a nag message when logging off or on, and you can only get critical updates when using Windows Update.
| May the forces of evil become confused on the way to your house |
dolemitecomputers
Minister of Gerbil Affairs
 
Posts: 2605
Joined: Wed Dec 26, 2001 7:00 pm
Location: Utah

