Core keeps dying (notfred)


Posted on Mon Jan 12, 2009 3:05 pm

I keep getting the following on one of my notfred PXE clients. I'm wondering if I have a faulty processor or something?

Code:
[18:09:19] Completed 32500 out of 250000 steps  (13%)
[18:21:43] Completed 35000 out of 250000 steps  (14%)
[18:34:08] Completed 37500 out of 250000 steps  (15%)
[18:46:33] Completed 40000 out of 250000 steps  (16%)
[18:58:57] Completed 42500 out of 250000 steps  (17%)
[19:11:21] Completed 45000 out of 250000 steps  (18%)
[19:23:46] Completed 47500 out of 250000 steps  (19%)
[19:36:10] Completed 50000 out of 250000 steps  (20%)
[19:46:28] CoreStatus = 1 (1)
[19:46:28] Client-core communications error: ERROR 0x1
[19:46:28] Deleting current work unit & continuing...
[19:46:41] - Warning: Could not delete all work unit files (2): Core file absent
[19:46:41] Trying to send all finished work units
[19:46:41] + No unsent completed units remaining.
[19:46:41] - Preparing to get new work unit...
[19:46:41] + Attempting to get work packet
[19:46:41] - Will indicate memory of 486 MB
[19:46:41] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 15, Stepping: 6
[19:46:41] - Connecting to assignment server
[19:46:41] Connecting to http://assign.stanford.edu:8080/
[19:46:43] Posted data.
[19:46:43] Initial: 40AB; - Successful: assigned to (171.64.65.56).
[19:46:43] + News From Folding@Home: Welcome to Folding@Home
[19:46:43] Loaded queue successfully.
[19:46:43] Connecting to http://171.64.65.56:8080/
[19:46:50] Posted data.
[19:46:50] Initial: 0000; - Receiving payload (expected size: 4841004)
[19:47:08] - Downloaded at ~262 kB/s
[19:47:08] - Averaged speed for that direction ~291 kB/s
[19:47:08] + Received work.
[19:47:08] + Closed connections
[19:47:13]
[19:47:13] + Processing work unit
[19:47:13] Core required: FahCore_a2.exe
[19:47:13] Core found.
[19:47:13] Working on Unit 03 [January 12 19:47:13]
[19:47:13] + Working ...
[19:47:13] - Calling './mpiexec -np 4 -host 127.0.0.1 ./FahCore_a2.exe -dir work/ -suffix 03 -checkpoint 15 -forceasm -verbose -lifeline 510 -version 602'

[19:47:13]
[19:47:13] *------------------------------*
[19:47:13] Folding@Home Gromacs SMP Core
[19:47:13] Version 2.01 (Wed Aug 13 13:11:25 PDT 2008)
[19:47:13]
[19:47:13] Preparing to commence simulation
[19:47:13] - Ensuring status. Please wait.
[19:47:22] - Assembly optimizations manually forced on.
[19:47:22] - Not checking prior termination.
[19:47:24] - Expanded 4840492 -> 23982741 (decompressed 495.4 percent)
[19:47:24] Called DecompressByteArray: compressed_data_size=4840492 data_size=23982741, decompressed_data_size=23982741 diff=0
[19:47:24] - Digital signature verified
[19:47:24]
[19:47:24] Project: 2669 (Run 6, Clone 123, Gen 58)
[19:47:24]
[19:47:24] Assembly optimizations on if available.
[19:47:24] Entering M.D.
[19:59:59] Completed 2500 out of 250000 steps  (1%)


More to the point, this part:

Code:
[19:36:10] Completed 50000 out of 250000 steps  (20%)
[19:46:28] CoreStatus = 1 (1)
[19:46:28] Client-core communications error: ERROR 0x1
[19:46:28] Deleting current work unit & continuing...
[19:46:41] - Warning: Could not delete all work unit files (2): Core file absent


It seems that it can't get through a WU for some reason. It just craps out at around 20% every time. Any ideas?
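For anyone wanting to spot the same pattern across several clients, here is a rough sketch that scans log lines for each CoreStatus entry and reports the last step count logged before it. The regular expressions only assume the line shapes quoted above; the function name `find_core_exits` is made up for illustration and is not part of the client or of notfred's image.

```python
import re

# Patterns are based only on the log lines quoted in this thread.
PROGRESS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] Completed (\d+) out of (\d+) steps")
CORE_STATUS = re.compile(r"^\[(\d\d:\d\d:\d\d)\] CoreStatus = (\w+)")


def find_core_exits(lines):
    """Return (time, status, steps_done, steps_total) for every CoreStatus line.

    steps_done and steps_total come from the most recent "Completed" line
    seen before the CoreStatus line, or are None if there was none.
    """
    exits = []
    last_progress = None
    for line in lines:
        m = PROGRESS.match(line)
        if m:
            last_progress = (int(m.group(2)), int(m.group(3)))
            continue
        m = CORE_STATUS.match(line)
        if m:
            done, total = last_progress if last_progress else (None, None)
            exits.append((m.group(1), m.group(2), done, total))
            last_progress = None  # progress restarts with the next work unit
    return exits


# Sample taken from the excerpt above.
sample = [
    "[19:36:10] Completed 50000 out of 250000 steps  (20%)",
    "[19:46:28] CoreStatus = 1 (1)",
    "[19:46:28] Client-core communications error: ERROR 0x1",
]

for when, status, done, total in find_core_exits(sample):
    print(f"{when} CoreStatus={status} after {done}/{total} steps")
```

Feeding it the lines of a real log file instead of `sample` should work the same way, as long as the log uses the format shown here.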
Shinare
Gerbil XP
 
Posts: 352
Joined: Wed Jul 06, 2005 12:48 pm

Re: Core keeps dying (notfred)

Posted on Mon Jan 12, 2009 9:34 pm

See http://fahwiki.net/index.php/Error_0x0_and_0x1

Have you run a memtest on it? How about increasing the amount of RAM (it's reporting 486MB)?
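If you want to compare what each box is reporting, a small sketch like this pulls the figure out of the "Will indicate memory of" line. It assumes the log format shown in the first post; `reported_memory_mb` is a hypothetical helper name, not something the client provides.

```python
import re

# Matches the memory line as it appears in the log quoted earlier in the thread.
MEMORY = re.compile(r"Will indicate memory of (\d+) MB")


def reported_memory_mb(lines):
    """Return the last memory figure (in MB) found in the log lines, or None."""
    value = None
    for line in lines:
        m = MEMORY.search(line)
        if m:
            value = int(m.group(1))
    return value


# Sample line copied from the log above.
sample = ["[19:46:41] - Will indicate memory of 486 MB"]
print(reported_memory_mb(sample))
```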
notfred
Grand Gerbil Poohbah
 
Posts: 3742
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Re: Core keeps dying (notfred)

Posted on Mon Jan 12, 2009 11:25 pm

Thanks for that, I'll run a memtest on it all day tomorrow. They are stock Dells, so if it's bad memory I'll just have them replace it. :)
Shinare
Gerbil XP
 
Posts: 352
Joined: Wed Jul 06, 2005 12:48 pm

Re: Core keeps dying (notfred)

Posted on Wed Jan 14, 2009 1:54 pm

I ran memtest86+ the entire day with no errors, but decided to replace the memory anyway on your suggestion. That seems to have solved the problem: no core errors since. I replaced the 512MB with 1GB, so I'm guessing 512MB isn't enough to fold on, since all the other computers that aren't having problems have 1GB in them.

Thanks again for the solution!
Shinare
Gerbil XP
 
Posts: 352
Joined: Wed Jul 06, 2005 12:48 pm

