Re: New Notfred diskless version out

Posted: Mon Jul 06, 2009 8:35 pm
by notfred
New version out, there's a link on the main page to shut it down and restart with the "-oneunit" flag.

As for VirtualBox, I haven't tried it and I don't know what it needs. If it can work with VMWare .vmx and .vmdk files then it should be fine. If it is easy to generate whatever it needs and someone can point me to some docs, I can take a look at adding it.

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 4:59 am
by Flying Fox
As long as the VM can mount .iso files as if it is a CD-ROM already in the emulated optical drive, it should work. Before notfred's "appliance" I used to just do that as if I burned a disc and boot from that.

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 7:50 am
by notfred
Yup, the folding stuff is all the same across all the versions, it's just how they are packaged so that they boot correctly off the appropriate media. There's a minor difference in the backup/restore - if under a VM then I do backup/restore to the virtual hard drive, but that's about it. If you can get any of the methods to boot in the VM of your choice then you should be good to go.

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 12:32 pm
by ethomaz
I tested VirtualBox 3.0, but the current version doesn't support 2 or more cores for 64-bit guests. :cry:

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 5:05 pm
by slugbug
I notice the diskless appliance has an option for 4 processors. Which VMWare product would I need to use to be able to use 4 cores per virtual appliance?

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 8:49 pm
by notfred
Sorry, no idea on the VMWare side of things, but my diskless stuff will support as many processors as you have - I've tested at something quite ridiculous like 20 and it worked fine (kvm under Linux lets you specify as many processors as you like). That SMPCPUs parameter is used to work out how many copies of FAH to run:

Number of CPUs / SMPCPUs = number to run (rounded up).

This lets you choose between running 1 or 2 copies of FAH if you have a quad, or if you have something like a Core i7 you can set SMPCPUs to 8 and get it to run just 1 instance of the special 8 thread WUs.
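
As a quick illustration of the SMPCPUs arithmetic above, here is a minimal Python sketch. The function name `fah_instances` is made up for this example and it is not the actual diskless script; it only reproduces the "divide and round up" rule as described.

```python
def fah_instances(num_cpus: int, smp_cpus: int) -> int:
    """Copies of FAH to run: number of CPUs / SMPCPUs, rounded up."""
    if num_cpus < 1 or smp_cpus < 1:
        raise ValueError("num_cpus and smp_cpus must both be at least 1")
    # Integer ceiling division, so there is no floating point rounding.
    return (num_cpus + smp_cpus - 1) // smp_cpus


if __name__ == "__main__":
    print(fah_instances(4, 2))  # quad with SMPCPUs=2: two copies
    print(fah_instances(4, 4))  # quad with SMPCPUs=4: one copy
    print(fah_instances(8, 8))  # 8 threads with SMPCPUs=8: one copy
```

Because of the rounding up, a machine whose CPU count is not a multiple of SMPCPUs (say 3 CPUs with SMPCPUs set to 2) would still get 2 copies under this rule.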

Re: New Notfred diskless version out

Posted: Tue Jul 07, 2009 9:22 pm
by Flying Fox
slugbug wrote:
I notice the diskless appliance has an option for 4 processors. Which VMWare product would I need to use to be able to use 4 cores per virtual appliance?

You will need VMware's top line products, from ESX to the latest vSphere.

Re: New Notfred diskless version out

Posted: Wed Jul 08, 2009 4:15 pm
by Shinare
notfred, should I upgrade my diskless Core2Duo farm to this new version? Everything seems to be running fine. Is there any reason to run the new stuff?

Re: New Notfred diskless version out

Posted: Wed Jul 08, 2009 8:14 pm
by notfred
Nope, it just adds the "-oneunit" link to the main diskless folder webpage, and that triggers restarting the folding with -oneunit for people who want to shut down that folder. There are no other changes (apart from the memory in the .vmx file, but that's for VMs, not diskless).

Re: New Notfred diskless version out

Posted: Thu Jul 09, 2009 7:57 pm
by Mvgratz
Just installed and wanted to say "Thanks notfred".
One question, what does the error "Bus error (core dumped)" mean?

Re: New Notfred diskless version out

Posted: Thu Jul 09, 2009 8:50 pm
by notfred
That's not good. It means some software tried to access a bad address, so a SIGBUS was generated and the process dumped core (the core file could be loaded in a debugger to trace back to where it crashed).

It could be a bad WU crashing one of the folding processes, or it could be an unstable machine.

Re: New Notfred diskless version out

Posted: Thu Jul 09, 2009 10:26 pm
by Ragnar Dan
A few changes I'd appreciate:

1) Make multiple backups if room is available. I realize you don't want to eat too much CPU time on the script, but since we're generally only getting < 100 MB in backup.tar files, and most flash drives have dozens of times that space, an incremented naming scheme (backup1.tar - backup99.tar) would protect against the last backup being a dud and screwing restores. Maybe make it an option or something, anyway.

2) This is a small thing, but backing up client.cfg would retain the count of the number of WU's submitted. I realize you generate the client.cfg from the init file, but perhaps allowing the option of choosing to back it up on the CD generator page could be done?

3) Finally, this is probably of little actual value, but sometimes, on machines that aren't settled down properly yet (as with the rest of these requests), running the backup when you want it to run can be helpful. So on the generated web page that has links to the FAHlog.txt file and the temperatures, etc., a link to run the backup script manually might be helpful.
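
To make the numbered naming scheme from request 1 concrete, here is a rough Python sketch. Everything in it is hypothetical: the function name `next_backup_name`, the choice of "lowest unused number", and returning `None` when all 99 names are taken are assumptions for illustration, not how the diskless backup script works.

```python
import re

# Matches backup1.tar ... backup99.tar (and any other numbered backup file).
BACKUP_RE = re.compile(r"^backup(\d+)\.tar$")


def next_backup_name(existing_names, max_backups=99):
    """Return the lowest unused backupN.tar name, or None if all are taken."""
    used = set()
    for name in existing_names:
        match = BACKUP_RE.match(name)
        if match:
            used.add(int(match.group(1)))
    for number in range(1, max_backups + 1):
        if number not in used:
            return f"backup{number}.tar"
    # Every slot is in use; the caller has to decide which one to overwrite.
    return None


if __name__ == "__main__":
    print(next_backup_name(["backup1.tar", "backup2.tar", "client.cfg"]))
```

A real script would also have to check free space on the flash drive before writing another copy, which this sketch leaves out.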

Edit: Also, I notice when the SMP WU's get to the FINISHED_UNIT point, they always seem to wait 3⅓ minutes before continuing. If I click on "Kill folding cores" will that speed it up without harm?

Re: New Notfred diskless version out

Posted: Sun Jul 12, 2009 11:47 pm
by haysdb
I am running instances of the appliance under Windows 7 beta 64-bit. It has been working fine for a few months but over the last two days, one of the instances has been picking up a work unit that's taking HOURS per 1%. I'd like to kill the work unit and force it to reload a new one. How do I do that?

Re: New Notfred diskless version out

Posted: Mon Jul 13, 2009 8:53 am
by notfred
Ragnar Dan,
Let me see what I can do. I don't have as much time to work on this as I used to, but they all seem reasonable requests. I don't know about hitting the "Kill folding cores" link at the end. I have no idea what the client is doing at that point. It may speed it up or it may corrupt your WU.

haysdb,
Probably the easiest is to just create a new copy of the folding appliance. I do have an outstanding request to add a "nuke it" link to my stuff, but haven't got there yet.

Re: New Notfred diskless version out

Posted: Tue Jul 14, 2009 11:15 pm
by Ragnar Dan
Thanks for the reply and consideration. I suppose I should try the "Kill folding cores" thing for myself, but I thought I might've read something on one of your threads about it, and hoped you or someone else might have a definitive answer.

And I'm trying to keep my numbers up. :wink:

Re: New Notfred diskless version out

Posted: Sun Jul 19, 2009 1:40 am
by brentpresley
Notfred,
First and foremost, thank you for all the work you have put into this project. You have really made things 100X easier for those of us wanting to set up the latest clients quickly, without learning all the ins and outs of Linux to do it.

I have one request, if you could find time.

Stanford this past week released "extra large work units" that are designed to run on 8-core systems. To run these, you need the latest client (available here: http://www.stanford.edu/~kasson/folding/linux/fah6) and you need to run it with the "-bigadv" and "-smp 8" flags.

Could you possibly update the clients to make use of this?

Thanks,
Brent

Re: New Notfred diskless version out

Posted: Sat Jul 25, 2009 12:12 am
by Ragnar Dan
I noticed an odd result from using the -oneunit flag. It reported that it was running with that flag at the end of the WU, but then said it was going to download a new WU. Checking the work directory, it didn't look as though it did so, but I can't recall what is downloaded and what it looks like before all of the other files are generated from it, so I'm concerned it may have gotten a new WU from Stanford, which is now flushed, since I shut it down before any backup could occur, if it were possible.

Re: New Notfred diskless version out

Posted: Mon Aug 10, 2009 10:33 am
by Francois Blais
Hi.
I'm a newbie at Linux stuff, so I'm sorry if I ask something already covered.

I configured a 1 GB USB stick with the appropriate package.

I managed to boot on it, but I'm not sure it worked correctly.
My PC has a single CPU.
Should it work?

At the end of the boot sequence, I saw a few things about the folding setup, and then a prompt to log in. (It said that no password was required to log in as root, IIRC.)

So I entered "root".

After that I got a # prompt, nothing else.

Can someone help me, please?

Best regards,
François

Re: New Notfred diskless version out

Posted: Mon Aug 10, 2009 12:42 pm
by Flying Fox
Welcome to the forums Francois!

Francois Blais wrote:
I managed to boot on it, but I'm not sure it worked correctly.
My PC has a single CPU.
Should it work?
If it is a single CPU, then the single core Folding client will be launched. Mind you though these days the single core client does not produce a whole lot of points for the power the CPU consumes. The choice is yours.

Francois Blais wrote:
At the end of the boot sequence, I saw a few things about the folding setup, and then a prompt to log in. (It said that no password was required to log in as root, IIRC.)

So I entered "root".

After that I got a # prompt, nothing else.

Can someone help me, please?
What you see on screen is known as the "shell" prompt, and it is for more advanced users (you need to know how to navigate and peek at files with the command line). For you I would suggest using the handy mini web server of the diskless folder setup. Pay attention to the IP address that is assigned to the folding box, and then from another computer, fire up your favourite browser and type in that IP address. That should get you to a simple page with a bunch of links. IIRC you click on the one that says "Folding log file" (or something similar) and you will basically see the FAHlog.txt file, where you can determine if the thing is running.

Of course, feeling the CPU to see if it heats up under 100% load should work too. ;)

Re: New Notfred diskless version out

Posted: Mon Aug 10, 2009 3:31 pm
by Francois Blais
Thanks!
It seems to work ok.
I booted from the USB stick again, and this time I didn't log on, I just left it there.
After a few minutes, I got a message saying "set_rtc_mmss: can't update from 59 to 1".
It was printed on several lines of the monitor.
I then logged on with root.

I was under the impression the progression was shown on the screen.
Did I miss something?

Best regards,
François

Re: New Notfred diskless version out

Posted: Mon Aug 10, 2009 4:05 pm
by Flying Fox
Francois Blais wrote:
Thanks!
It seems to work ok.
I booted from the USB stick again, and this time I didn't log on, I just left it there.
After a few minutes, I got a message saying "set_rtc_mmss: can't update from 59 to 1".
It was printed on several lines of the monitor.
I then logged on with root.

I was under the impression the progression was shown on the screen.
Did I miss something?
As I said, that's just a shell, more like a "debug mode" if you have worked with enterprise class network switches or infrastructure-type equipment. To see real progress you should use the browser-based interface (unless you know how to navigate *NIX command-line style). When it is booting, pay attention to what IP the machine is assigned. Go to another machine, and type "http://[IP address]" and you should see the "home page" with links where you can do several things.

You really don't need to log on to the shell; just leave the box running. The mini web server is another background process too. These processes just work quietly in the background doing their thing.
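
If you would rather script the check than open a browser, the "http://[IP address]" step can be sketched in Python roughly like this. The helper names and the example address 192.168.1.50 are made up for illustration, and the exact names of the links on the home page are not assumed here, so the sketch only fetches the front page.

```python
import ipaddress
import urllib.request


def home_page_url(ip: str) -> str:
    """Build the URL of the folding box's home page from its IP address."""
    address = ipaddress.ip_address(ip)  # raises ValueError for a bad address
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"http://{host}/"


def fetch_home_page(ip: str, timeout: float = 5.0) -> str:
    """Download the home page HTML served by the box's mini web server."""
    with urllib.request.urlopen(home_page_url(ip), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical address; use the one shown while your box boots.
    print(home_page_url("192.168.1.50"))
```

Calling `fetch_home_page` with the address your box was assigned should return the same page of links you would see in the browser, provided the other machine is on the same network.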

Re: New Notfred diskless version out

Posted: Mon Aug 17, 2009 3:42 pm
by Eastrider
Hello Notfred.

I'm running two VMWare Player instances on an Intel Q6600, one on cores 0 and 1 and the other on cores 2 and 3, under Windows 7 RC 64-bit.

The thing is, I sometimes get "Attempt to access beyond the end of the device" errors, and I really don't know what's causing them. Sometimes they appear randomly, and today one appeared when restoring the checkpoint after a VMWare Player reset. My checkpoint interval is 5 minutes. Does that have something to do with it?

Also, what's the best option for PPD: 2 clients x 2 cores, or 1 client x 4 cores? It's not like I have the option anyway, given how expensive VMWare Workstation is, but it's good to know.

Re: New Notfred diskless version out

Posted: Mon Aug 17, 2009 4:06 pm
by Flying Fox
Eastrider wrote:
The thing is, I sometimes get "Attempt to access beyond the end of the device" errors, and I really don't know what's causing them. Sometimes they appear randomly, and today one appeared when restoring the checkpoint after a VMWare Player reset. My checkpoint interval is 5 minutes. Does that have something to do with it?
IIRC this has something to do with the Folding client failing to delete the previous work files in the queue. The queue has like 10 slots (0-9?) and when enough slots have been filled (if same slot is used it will overwrite the files, just the unused slots), all the work files run over whatever diskspace you gave it. My solution, given the fact that you are not running totally diskless (Windows 7) on the host, is to sacrifice a bit more disk space and use real VMs where you install real Linux on them (and give up the nice admin stuff with notfred's system :-?).

Eastrider wrote:
Also, what's the best option for PPD: 2 clients x 2 cores, or 1 client x 4 cores? It's not like I have the option anyway, given how expensive VMWare Workstation is, but it's good to know.
VMware Workstation can only do 2 CPUs as well. Your only choice is ESX Server or up for >2 cores anyway. So 2x 2-core clients is the best points you can wring out of SMP clients. To get more points you will need a decent GPU, and then you can go 1 GPU + 1 SMP + 1 single core. Of course 1 GPU + 1 WinSMP may be OK as well, but definitely fewer points.

Re: New Notfred diskless version out

Posted: Mon Aug 17, 2009 4:26 pm
by Eastrider
I'm folding on two 8800GT's at 770/1900/1070.

Anyway, the "Attempt to access bla bla bla" error does not stop my folding (CPU still at 100%), but does it corrupt my WU? No, right? :S

Second question: since I fold on my main gaming rig during idle time, I have to shut down (suspend) the machines. The GPU clients just shut down and pick up again when I reopen the client. With the CPU clients in VMWare, when I resume it says "clocksource unstable, delta *lotsofnumbers* ns". I think I have an idea why (resuming the VM confuses the folding process a bit), but is this normal? Also, as I asked about the other error, does it negatively affect my results? (WU results, not PPD results.)

Edit:

I went to play GRID online with a friend.

When I got back, I started the four FAH clients again (2 GPUs and 2 SMPs).

The GPUs and Core 0/1 were OK. Core 2/3 shows "Clocksource tsc Unstable (delta = 84374629325 ms)". The message appeared one second after restoring the VM from suspend mode. Before that it didn't have any errors.

The client is still running anyway, with the CPU at 98%. The only thing I'm afraid of is WU corruption. I don't want all my CPU work to be useless while I'm sleeping.

Re: New Notfred diskless version out

Posted: Mon Aug 17, 2009 9:07 pm
by notfred
I've never really got to the bottom of the "Attempt to access beyond the end of the device" issue. I think it affects the backups rather than the WU, so it shouldn't corrupt the WU.

Don't worry about the "Clocksource tsc Unstable" messages. That's the kernel saying it was using the processor timer for a clock but it has detected that it isn't reliable (VMs do lots of odd things with virtual CPU clock cycles) so it is falling back to another method.

I don't think either will corrupt your WU, although the first may mean that you can't restart the VM and have the restore work properly.
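
For anyone curious which clock the kernel fell back to, a standard Linux kernel exposes it through sysfs. The Python sketch below reads those files; whether the diskless image exposes sysfs or ships Python at all is an assumption here, and `read_clocksource` is just a name made up for the example.

```python
from pathlib import Path

# Standard sysfs location on Linux kernels that expose clocksource info.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(directory=CLOCKSOURCE_DIR):
    """Return the current and available clocksources, or None if not exposed."""
    directory = Path(directory)
    current = directory / "current_clocksource"
    available = directory / "available_clocksource"
    if not current.is_file():
        return None
    info = {"current": current.read_text().strip(), "available": []}
    if available.is_file():
        # The kernel lists the available sources separated by spaces.
        info["available"] = available.read_text().split()
    return info


if __name__ == "__main__":
    print(read_clocksource())
```

After a "Clocksource tsc unstable" message, the current entry would be expected to show something other than tsc, which is the fallback described above.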

Re: New Notfred diskless version out

Posted: Mon Aug 17, 2009 9:29 pm
by Ragnar Dan
I found in using VMWare player, at least, that my folding client doesn't seem to read the Linux system clock properly, or some strange thing, because the client's FAHlog.txt uses the current time in my local time zone for output, where all the others I use are using "UTC time". So, because Fahmon has a problem with that for some reason and deems the client *hung* even though it's progressing through the WU perfectly well, I just update the clock to something near the UTC time, and then everyone is happy except for me, since I have to do that every time I restart the VM.

Re: New Notfred diskless version out

Posted: Tue Aug 18, 2009 10:02 am
by notfred
That's "time" which is how late it is rather than "clock" which ticks :-) Windows is a pain as it keeps time in local timezone whilst every other OS uses UTC and then deals with the UTC to local time conversion at presentation time. I'll look and see if there is some way I can get VMWare to give the time in UTC and not localtime.

Have you tried checking the box "Ignore asynchronous clocks" in Fahmon > Preferences > Monitoring tab? That should let it work with clients with different times on them.
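
To show the size of the error involved, here is a small Python sketch of what happens when a clock that holds local time is read as if it were UTC. The function name `guest_view` and the UTC-5 offset are made up for the example; it only illustrates the arithmetic and is not anything VMWare or the client actually runs.

```python
from datetime import datetime, timedelta, timezone


def guest_view(true_utc: datetime, utc_offset_hours: float) -> datetime:
    """Time a guest reports if it reads a local-time hardware clock as UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_wall_clock = true_utc.astimezone(local_zone)
    # The guest sees the local wall-clock digits but labels them as UTC.
    return local_wall_clock.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    true_utc = datetime(2009, 8, 18, 14, 0, tzinfo=timezone.utc)
    seen = guest_view(true_utc, -5)  # hypothetical host at UTC-5
    print("guest thinks it is:", seen.isoformat())
    print("error:", seen - true_utc)
```

The error is exactly the host's UTC offset, which is why a monitoring tool comparing log timestamps against its own clock can decide a healthy client is hung.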

Re: New Notfred diskless version out

Posted: Tue Aug 18, 2009 2:33 pm
by Shinare
Notfred,
I have downloaded a fold.iso CD image and am using it inside a VirtualBox virtual machine since I can boot from the ISO file in it. It seems to be running fine and using both processors. However, I get this message on the console that reads: Clocksource tsc unstable (delta = 4686729312 ns)

It seems to be plugging through the work unit but I am worried this might cause the folding@home server to throw away my results?

Re: New Notfred diskless version out

Posted: Wed Aug 19, 2009 8:11 am
by notfred
Shinare wrote:
However, I get this message on the console that reads: Clocksource tsc unstable (delta = 4686729312 ns)
It seems to be plugging through the work unit but I am worried this might cause the folding@home server to throw away my results?

From 3 posts earlier:
notfred wrote:
Don't worry about the "Clocksource tsc Unstable" messages. That's the kernel saying it was using the processor timer for a clock but it has detected that it isn't reliable (VMs do lots of odd things with virtual CPU clock cycles) so it is falling back to another method.

Re: New Notfred diskless version out

Posted: Wed Aug 19, 2009 9:14 am
by Shinare
Well, I feel a little sheepish, heh. Guess I should open my eyes and read more. Sorry about that, notfred. Still lovin' your awesome product.