New diskless folding suite released

Posted on Wed Sep 19, 2007 1:57 pm

Well, it's finally done: a major rewrite, all available at http://reilly.homeip.net/folding

Here's the list of changes:
New Features
Detects 32 bit or 64 bit, one image for both.
Remote Reboot capability.
Backup and restore to up to 2 USB drives (in addition to the backup and restore to TFTP that it has always done).
Install capability to up to 2 USB drives on boot.
More info at boot to make it more understandable what it is doing, no longer logs to screen as it runs.
Detects VirtualPC in addition to detecting VMWare before deciding to run NTP or not.

Bug Fixes
Fixed detecting extra processors if they had "processor" in the model name.
Work with any network interface (eth0 - eth9) and not just eth0.
Really finally remove hangcheck timer from 64bit kernel (I hope!).
Remove -advmethods from SMP client as it may run out of WUs and not fall back properly.
Added -verbosity 9 to both clients to help debug.
Upgrade to kernel 2.6.22.1 for more network device support.
Change from an initrd to an initramfs - no need to extract source as root and dynamic resizing of the rootfs.
Change to Makefile based build and install system.
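
The "processor" detection fix in the list above can be illustrated with a minimal sketch. It assumes the count is taken from /proc/cpuinfo (the post does not say how the suite actually counts CPUs), and `count_cpus` is a hypothetical helper name. Anchoring the match to the start of the line, with the field name followed by whitespace and a colon, keeps a model name that merely contains the word "processor" from being counted.

```shell
#!/bin/sh
# Hypothetical helper: count logical CPUs listed in a cpuinfo-style file.
# Only lines whose field name is exactly "processor" are counted, so a
# "model name" line that contains the word "processor" is ignored.
# Defaults to /proc/cpuinfo when no file is given (assumed data source).
count_cpus() {
    grep -c '^processor[[:space:]]*:' "${1:-/proc/cpuinfo}"
}
```

Calling `count_cpus` with no argument on a Linux box prints the number of logical CPUs the kernel reports.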

Future Plans
Combine benchmark CD into image so it gets all these benefits, plus ability to select which one to run at boot time.
Upgrade to latest version of FAH Client and add all the parameters as configurable.
Add Samba so WU is accessible from other PCs for monitoring etc.

I've got family staying over the next 2 weeks, so please let me know what doesn't work and I'll try to fix it when I get a chance.
notfred
Grand Gerbil Poohbah
 
Posts: 3751
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Posted on Wed Sep 19, 2007 2:24 pm

:o
Let me be the first to say thanks and we very much appreciate all the work you do.
Usacomp2k3
Gerbil God
 
Posts: 21316
Joined: Thu Apr 01, 2004 4:53 pm
Location: Orlando, FL

Posted on Wed Sep 19, 2007 3:00 pm

You are awesome NotFred! Have fun with your family. :D
Join UGN's Drive to the Top!
UnitedGerbilNation wants you!!
jeffry55
Grand Gerbil Poohbah
 
Posts: 3181
Joined: Sat Oct 30, 2004 4:38 pm
Location: Menlo Park - just down the street from the F@H Servers!

Posted on Wed Sep 26, 2007 1:55 pm

Is it me, or is notfred's site down at the moment?

I finally got an evening off and I can't get to the new 64 bit diskless client :(
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Wed Sep 26, 2007 2:30 pm

psychojoy wrote:Is it me, or is notfred's site down at the moment?

I finally got an evening off and I can't get to the new 64 bit diskless client :(


It's not just you. I can't get to it either. :cry:
jeffry55
Grand Gerbil Poohbah
 
Posts: 3181
Joined: Sat Oct 30, 2004 4:38 pm
Location: Menlo Park - just down the street from the F@H Servers!

Posted on Wed Sep 26, 2007 4:01 pm

It works for me.
Fold! And I don't mean your clothes!

Do you have a favorite gerbil recipe? Please share with the TR community!
flybywire
Gerbil Jedi
 
Posts: 1883
Joined: Wed Jun 16, 2004 2:28 pm
Location: Springfield, VA - USA

Posted on Wed Sep 26, 2007 4:02 pm

Now the site that is up doesn't have the 64bit SMP stuff on it :S
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Wed Sep 26, 2007 4:28 pm

I believe it auto-detects 32-bit or 64-bit at boot time.
(this space intentionally left blank)
just brew it!
Administrator
Gold subscriber
 
 
Posts: 37886
Joined: Tue Aug 20, 2002 10:51 pm
Location: Somewhere, having a beer

Thanks

Posted on Wed Sep 26, 2007 6:15 pm

The new rewrite is fantastic. The client seems to be able to create the backup files on Tftpd32 without needing empty files created manually as targets -- that's a major plus, IMO.

The USB backup is dandy too. I have one system that just can't write to the Tftpd server successfully, no matter what I try. It either sends garbage or 0 bytes. I'm guessing there is something quirky with the Linux driver for the OEM Ethernet port on this Lenovo laptop. No matter, though -- I just stick the USB drive in it and it works like a charm. Very elegant.

Love what you do. Please keep it up.
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Wed Sep 26, 2007 9:34 pm

Sorry about the server downtime - it's my all-in-one server, and it seems the cutover between two shows recording on MythTV crashed it for once. That was about 12:30 pm and I didn't notice until about 4pm (all EDT).

psychojoy - JBI is correct, it's just one image that autodetects 32bit or 64bit and how many processors. You need a 64bit CPU and 2 or more processors for it to run SMP; otherwise it will run the standard client.
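
The selection rule described above can be sketched as follows. This is an illustration rather than the suite's actual boot script: `is_64bit` and `choose_client` are hypothetical names, and checking for the `lm` (long mode) CPU flag in /proc/cpuinfo is an assumed way to detect 64-bit capability.

```shell
#!/bin/sh
# Hypothetical sketch: the SMP client needs a 64-bit CPU and at least
# 2 processors; anything else falls back to the standard client.

# Assumption: 64-bit capability shows up as the "lm" flag on a "flags" line.
is_64bit() {
    grep -Eq '^flags.*[[:space:]]lm([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# Usage: choose_client <yes|no: 64-bit capable> <processor count>
choose_client() {
    if [ "$1" = "yes" ] && [ "$2" -ge 2 ]; then
        echo "smp"
    else
        echo "standard"
    fi
}
```

For example, `choose_client yes 4` prints `smp`, while `choose_client yes 1` and `choose_client no 4` both print `standard`.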
notfred
Grand Gerbil Poohbah
 
Posts: 3751
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Problems when running on more than 1 machine

Posted on Thu Sep 27, 2007 3:42 am

notfred-

I started running the latest CD a few days ago on 1 box and....AWESOME!!!!

2 days later I fired it up on another box. When both are running, WUs hang at 100%...

Switched back to running on 1 box and works fine...

Any ideas?

...USB backup is NICE
theMASS
Gerbil First Class
 
Posts: 132
Joined: Thu Sep 27, 2007 3:24 am

Re: Problems when running on more than 1 machine

Posted on Thu Sep 27, 2007 5:42 am

theMASS wrote:notfred-

I started running the latest CD a few days ago on 1 box and....AWESOME!!!!

2 days later I fired it up on another box. When both are running, WUs hang at 100%...

Switched back to running on 1 box and works fine...

Any ideas?

...USB backup is NICE


The hang at 100% is a bug in the Linux F@H 64bit SMP client. Notfred can't fix that, but having that reboot function sure helps :D
Firestarter
Gerbil XP
 
Posts: 490
Joined: Sun Apr 25, 2004 11:12 am

Posted on Thu Sep 27, 2007 6:51 am

Something I noticed last night....

Whilst booting up the PXE folders, they pick up IP addresses and go through the motions, and then they stop at the part where it says the "network is unreachable".

I can reach the folder's IP over HTTP and reboot it using the remote reboot tool, but clicking on the unitinfo gives me a 404 error.

I don't think they are folding yet :(
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Re: Problems when running on more than 1 machine

Posted on Thu Sep 27, 2007 11:41 am

Firestarter wrote:The hang at 100% is a bug in the Linux F@H 64bit SMP client. Notfred can't fix that, but having that reboot function sure helps :D

Someone suggested that the 100% hang can result from having the same hostname on 2+ machines on the same LAN segment. Can anyone confirm or dismiss this as a cause for the hang? ...or is anyone running 2+ "notfred" SMP boxes booting from CD? EDIT: or USB Stick or Diskless?

Any way to change the default hostname?
theMASS
Gerbil First Class
 
Posts: 132
Joined: Thu Sep 27, 2007 3:24 am

Posted on Thu Sep 27, 2007 12:02 pm

I was just wondering if the expiration dates for the beta clients are going to be a factor with this new release?
Nitrodist
Grand Gerbil Poohbah
 
Posts: 3280
Joined: Wed Jul 19, 2006 1:51 am
Location: Minnesota

Posted on Thu Sep 27, 2007 4:37 pm

I thought I'd rebuild the server box to eliminate any problems.

I installed Xubuntu 6.06 with the dhcp3-server and tftpd-hpa and the clients are picking up IP addresses but not the pxelinux.0 file over PXE. This is a worse state than I was in before!

I don't understand why this stuff isn't working :(
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Re: Problems when running on more than 1 machine

Posted on Thu Sep 27, 2007 6:44 pm

theMASS wrote:
Firestarter wrote:The hang at 100% is a bug in the Linux F@H 64bit SMP client. Notfred can't fix that, but having that reboot function sure helps :D

Someone suggested that the 100% hang can result from having the same hostname on 2+ machines on the same LAN segment. Can anyone confirm or dismiss this as a cause for the hang? ...or is anyone running 2+ "notfred" SMP boxes booting from CD? EDIT: or USB Stick or Diskless?

Any way to change the default hostname?


I can't definitely confirm or dismiss this, but I can tell you this: when I went into TFTPD32 and filled in the domain name field, then rebooted the stuck client, the client connected and uploaded its results right away. As a filthy dirty workaround, this might be the ticket. When a client hangs at 100%, change the domain name sent to the client by the DHCP server, and it will change the hostname as a side effect.


Edit to add: I do have two notfred systems on my network -- one saving only to tftp, running the old style 32-bit client; and one new dual core machine running the new style 32/64 auto client. Could this be related to the problem I'm having getting the new system to write to the tftp server? Is some weirdness related to duplicate hostnames screwing things up? If so, is there a quick and dirty way to have the boot disk create a unique hostname; like mangling the MAC address into a unique ID, and prepending it onto the domain supplied by the DHCP server?
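
The MAC-mangling idea above could look something like this minimal sketch. Nothing here comes from the actual boot image: `unique_fqdn` is a hypothetical helper, the `fah-` prefix is made up for illustration, and the domain is assumed to be whatever the DHCP server supplied.

```shell
#!/bin/sh
# Hypothetical helper: build a unique hostname from a MAC address and
# prepend it to a DHCP-supplied domain.
# Usage: unique_fqdn <mac> <domain>
unique_fqdn() {
    # Strip spaces, colons and dashes from the MAC, then lowercase the hex digits.
    id=$(printf '%s' "$1" | tr -d ' :-' | tr 'A-F' 'a-f')
    printf 'fah-%s.%s\n' "$id" "$2"
}
```

On a running Linux system the MAC for an interface can usually be read from /sys/class/net/eth0/address (interface name assumed), and the result handed to `hostname`.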
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Thu Sep 27, 2007 7:15 pm

Optimum1 wrote:I can't definitely confirm or dismiss this, but I can tell you this: when I went into TFTPD32 and filled in the domain name field, then rebooted the stuck client, the client connected and uploaded its results right away. As a filthy dirty workaround, this might be the ticket. When a client hangs at 100%, change the domain name sent to the client by the DHCP server, and it will change the hostname as a side effect.

You're running a Diskless setup? You don't always get the 100% hang? I got it 100% of the time when running 2 machines. Never when running just 1. That would be a REALLY dirty filthy workaround since I want to run it on 6 boxes. (It would be more like a "RUNaround":))

Also I'm not saving to tftp... just USB.

No kids tonight... I'll play around with "my" toys and if I figure anything out I'll let you know.
theMASS
Gerbil First Class
 
Posts: 132
Joined: Thu Sep 27, 2007 3:24 am

Posted on Fri Sep 28, 2007 9:31 am

theMASS wrote:
Optimum1 wrote:I can't definitely confirm or dismiss this, but I can tell you this: when I went into TFTPD32 and filled in the domain name field, then rebooted the stuck client, the client connected and uploaded its results right away. As a filthy dirty workaround, this might be the ticket. When a client hangs at 100%, change the domain name sent to the client by the DHCP server, and it will change the hostname as a side effect.

You're running a Diskless setup? You don't always get the 100% hang? I got it 100% of the time when running 2 machines. Never when running just 1. That would be a REALLY dirty filthy workaround since I want to run it on 6 boxes. (It would be more like a "RUNaround":))

Also I'm not saving to tftp... just USB.

No kids tonight... I'll play around with "my" toys and if I figure anything out I'll let you know.


I am running one old diskless setup, and one hybrid -- backing up to the USB and trying to back up to the tftp as well. The new system, running the 64 bit client, does two weird things: it hangs at 100% (until the domain change workaround is performed), and it also has trouble writing good backups to the tftp server. I have found that increasing the retries and timeout settings has helped remedy this, although it still seems to bomb out with a "rcvd packet too short" message periodically. Interestingly, the 64 bit backups are close to 50 MB in size, whereas the backups for the old client rarely exceed 5 MB.

You may not have too much of a runaround with my suggested workaround -- the systems will all keep their distinct hostnames until their DHCP leases expire -- and that may allow you to get several days of work units turned in before you have to do the reboot-two-step-shuffle-boogie again.
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Fri Sep 28, 2007 12:10 pm

Hmm, interesting. I'll take a look at the hostname thing when I get a chance, but probably not for another week - my grandmother is staying with us in my study and my main PC just blew a power supply. Still, that means an upgrade from an XP 2500+ to a quad 6600 when I get in there to get it folding.

I suspect they are getting the same hostname but I can probably easily fix that by just appending the last octet of the IP address to the hostname.
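
A minimal sketch of the last-octet idea, assuming the base name and IP address are already known at boot. `hostname_with_octet` and the base name `fold` are hypothetical, not taken from the image.

```shell
#!/bin/sh
# Hypothetical helper: append the last octet of an IPv4 address to a base
# hostname so that each machine on the same subnet gets a distinct name.
# Usage: hostname_with_octet <base> <ipv4-address>
hostname_with_octet() {
    # ${2##*.} strips everything up to and including the last dot.
    printf '%s%s\n' "$1" "${2##*.}"
}

hostname_with_octet fold 172.20.0.15   # prints "fold15"
```

Note that the last octet is only guaranteed unique within a single /24 network, which is the usual case for a home LAN like the ones described in this thread.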
notfred
Grand Gerbil Poohbah
 
Posts: 3751
Joined: Tue Aug 10, 2004 10:10 am
Location: Ottawa, Canada

Posted on Fri Sep 28, 2007 12:24 pm

psychojoy wrote:I thought I'd rebuild the server box to eliminate any problems.

I installed Xubuntu 6.06 with the dhcp3-server and tftpd-hpa and the clients are picking up IP addresses but not the pxelinux.0 file over PXE. This is a worse state than I was in before!

I don't understand why this stuff isn't working :(


Have you checked the permissions on /var/lib/tftpboot/PXEClient and the pxelinux.0 file itself?
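
One way to script that permissions check is sketched below. It relies on the documented tftpd-hpa behaviour that, unless started with the permissive option, the server only hands out files that are world-readable. `world_readable` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given path exists and has the
# "other" read bit set (o+r), which tftpd-hpa requires by default.
world_readable() {
    [ -n "$(find "$1" -prune -perm -004 2>/dev/null)" ]
}
```

Used against the paths mentioned above, `world_readable /var/lib/tftpboot/PXEClient/pxelinux.0 || echo "not world-readable"` flags a file the server would refuse to send. The directories leading to the file also need to be traversable by the account the server runs as.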
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Fri Sep 28, 2007 4:44 pm

I seem to have spoken too soon... When I got home yesterday I found the hang @ 100% while running only 1 box. So it may have been coincidental that I was getting the hang only when running 2 machines. I had some other unrelated computer issues that I had to deal with so I haven't had a chance to play around with the folding boxes yet.

One thing that was different, and I don't know if it has anything to do with single vs. multiple boxes running: upon reboot the WU was sent with no problem... When I had the issue before, qfix had to be run on the backed-up WUs (actually a backed-up backup :)) to save them; otherwise the client would trash the WU and start over at 0%.

Probably won't have a chance to "screw around" much until after the weekend.
theMASS
Gerbil First Class
 
Posts: 132
Joined: Thu Sep 27, 2007 3:24 am

Posted on Fri Sep 28, 2007 5:11 pm

Optimum1 wrote:Have you checked the permissions on /var/lib/tftpboot/PXEClient and the pxelinux.0 file itself?


I chmod'ed them all to 777 and still no response. :(
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Fri Sep 28, 2007 5:24 pm

psychojoy wrote:
Optimum1 wrote:Have you checked the permissions on /var/lib/tftpboot/PXEClient and the pxelinux.0 file itself?


I chmod'ed them all to 777 and still no response. :(


I hope you don't mind me suggesting some really basic things, but I have no idea what your experience level is. That said, have you gone back and checked, checked and re-checked every config setting in the How-to, making sure there isn't a single misplaced space, line break, typo? Have you restarted the DHCP and TFTPD services, and tailed the logs for errors after each restart?

I hope you can manage to get going....these setups are low maintenance once they're up and running. Good luck.
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Fri Sep 28, 2007 5:54 pm

Optimum1 wrote:
psychojoy wrote:
Optimum1 wrote:Have you checked the permissions on /var/lib/tftpboot/PXEClient and the pxelinux.0 file itself?


I chmod'ed them all to 777 and still no response. :(


I hope you don't mind me suggesting some really basic things, but I have no idea what your experience level is. That said, have you gone back and checked, checked and re-checked every config setting in the How-to, making sure there isn't a single misplaced space, line break, typo? Have you restarted the DHCP and TFTPD services, and tailed the logs for errors after each restart?

I hope you can manage to get going....these setups are low maintenance once they're up and running. Good luck.


My Linux knowledge is fairly poor. But my DHCP setup is exactly right - this I am sure of.

tftpd-hpa, on the other hand, I know nothing about. I installed the package and I assume it runs. If I do a "sudo /etc/init.d/tftpd-hpa restart" I get no feedback and no change in the PXE booting stuff.

Where are the logs?
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Fri Sep 28, 2007 6:06 pm

I found the logs in /var/log/syslog

DHCP sends its discover/offer/request/ack all fine. There are no errors in it regarding the network boot.

I still can't find any mention of tftpd-hpa in the syslogs!
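
One possible explanation for the silent init script and the empty log, offered as an assumption rather than a diagnosis: on Debian/Ubuntu packages of this era the defaults file /etc/default/tftpd-hpa is believed to ship with RUN_DAEMON set to "no", in which case the init script does nothing and the daemon never starts on its own. The sketch below checks for that setting; `tftpd_daemon_enabled` is a hypothetical helper, and the file path and variable name should be verified against the installed package.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given defaults file sets
# RUN_DAEMON to "yes". The file is sourced in a subshell so that its
# variables do not leak into the caller's environment.
tftpd_daemon_enabled() {
    [ -r "$1" ] || return 1
    ( . "$1" >/dev/null 2>&1 && [ "$RUN_DAEMON" = "yes" ] )
}
```

Running `tftpd_daemon_enabled /etc/default/tftpd-hpa || echo "daemon mode is off"` shows whether this applies. When the server is running, its syslog entries are usually tagged with the binary name `in.tftpd` rather than the package name, so `grep in.tftpd /var/log/syslog` is the more useful search, and `netstat -uln` should list a listener on UDP port 69.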
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Fri Sep 28, 2007 8:05 pm

Do you have a display hooked up, so you can watch the PXE boot sequence? You haven't mentioned any of the messages that display as the PXE client tries to boot... are you seeing something like this?

DHCP MAC ADDR: 00 D4 B7 88 01 48
PXE-EA1: No PXE server found, using standard boot file.
IP ADDR: 10.150.0.51
PXE-E32: TFTP open timeout.
PXE-E32: TFTP open timeout.
PXE-M0F: Exiting LANDesk (R) Service Agent II


We should rule out all the BIOS setup possibilities by verifying the boot screen messages.

Some older pxe stacks have problems with current standards in pxelinux. There are some workarounds, but let's see what your boot messages are first.
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Sat Sep 29, 2007 3:21 am

The messages I get are very similar to what you have described.

CLIENT MAC ADDR: blah blah GUID: blah blah
CLIENT IP: 172.20.0.15 MASK: 255.255.255.0 DHCP IP:172.20.0.5
GATEWAY IP: 172.20.0.1
PXE-E32: TFTP open timeout
TFTP......

I guess from the TFTP part hanging that TFTPD-HPA is not running properly - but I don't really know....

The IP of this machine is 172.20.0.15
The IP of the dhcp/tftp server is 172.20.0.5
The IP of the router is 172.20.0.1

Thanks for your help so far.
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm

Posted on Sat Sep 29, 2007 2:38 pm

First, try disabling MTU discovery on the server -- it's been known to cause some pxe stacks to barf.

echo 1 > /proc/sys/net/ipv4/ip_no_pmtu_disc

if that doesn't do it, you can set it back with:

echo 0 > /proc/sys/net/ipv4/ip_no_pmtu_disc


Of course, restart your tftpd server after every change.

Another issue that can cause pxe stack problems is a block size request negotiation failure between the client and server. For that, try starting the tftpd server with the " -r blksize" option.

...worth a try...
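
For anyone unsure where the "-r blksize" option goes: on a Debian/Ubuntu-style install the daemon's command-line options are assumed to live in an OPTIONS variable in /etc/default/tftpd-hpa. The fragment below is a sketch under that assumption; the file location, variable name and the other options shown may differ on your system.

```shell
# /etc/default/tftpd-hpa (assumed location and variable name)
# -l            run standalone and listen for requests
# -s <dir>      chroot to <dir>, so clients request paths relative to it
# -r blksize    refuse the blksize option if a client asks to negotiate it
OPTIONS="-l -s /var/lib/tftpboot -r blksize"
```

Restart the tftpd service after editing the file so the new options take effect.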
Optimum1
Gerbil
 
Posts: 13
Joined: Wed Jun 13, 2007 11:57 am

Posted on Sat Sep 29, 2007 2:48 pm

When you do a restart of your tftpd-hpa service, do you get any feedback?

I do: sudo /etc/init.d/tftpd-hpa stop

followed by sudo /etc/init.d/tftpd-hpa start

and I get no messages at all! It just goes straight back to a prompt.

When I run the dhcp server restart it tells me that the stop was ok and the restart was ok.

I am still worrying about the install of tftpd-hpa. I have uninstalled it and reinstalled it a couple of times with no result.

PS neither of the above changes made any difference.
psychojoy
Gerbil
 
Posts: 31
Joined: Sat Sep 08, 2007 11:28 pm
