Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 9:14 am

As you all know, monitoring FAH is a bit of a career, particularly if you are running a large farm.

I use FahMon, which I like for its simplicity. I've tried other monitors that are more sophisticated, but have come back to FahMon for its simple user interface and setup. FahMon could use some improvements, so I'm going to take a stab at writing my own monitoring software. Some of my high-priority features are:

#1: Better visibility and reporting of stalled or hung clients. Reporting if you are processing the same WU twice in a row. Email notification of problems. Ability to mark clients as "offline" so that they aren't distracting you.

#2: Points Per Day tracking: I want to see time per step, PPD per step, a PPD history graph calculated on each step like the Task Manager's CPU History graph, etc.

#3: Ability to spawn VNC

#4: Try (!) to compensate for poor time reporting on the client with VMware Linux clients.

We'll see how much effort this thing will take to write, but I would be interested in hearing people's suggestions for their "perfect" monitoring features.

- JP
JPinTO
Gerbil Team Leader
 
Posts: 239
Joined: Sat Jun 30, 2007 6:02 am
Location: Toronto, Ontario

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 10:59 am

Back when I had more systems folding, I thought about doing something like this too. I didn't get very far (you can see what my scripts produce here), but maybe I can be of some assistance.

JPinTO wrote:#1: Better visibility and reporting of stalled or hung clients. Reporting if you are processing the same WU twice in a row. Email notification of problems. Ability to mark clients as "offline" so that they aren't distracting you.

Yeah, I've often thought about doing stalled node detection by checking the last time the FAHlog.txt file was touched (this time is the "last log update" line in my status report); just never got around to implementing it.
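
A minimal sketch of that last-touched check in Python, assuming the monitor can read each client's FAHlog.txt over a share or mount. The function names and the 30-minute default are made up for illustration and are not taken from any existing monitor:

```python
import os
import time

def seconds_since_update(log_path, now=None):
    """Return the seconds elapsed since the file at log_path was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(log_path)

def is_stale(log_path, max_idle_seconds=1800, now=None):
    """True if the log has not been touched for longer than max_idle_seconds."""
    return seconds_since_update(log_path, now) > max_idle_seconds
```

The threshold has to be generous: a slow WU can legitimately go a long time between log lines, so a sensible value probably depends on the client and the WU.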

JPinTO wrote:#2: Points Per Day tracking: I want to see time per step, PPD per step, a PPD history graph calculated on each step like the Task Manager's CPU History graph, etc.

The part of this that is a PITA is keeping your WU database updated with new WUs as they come out. I've been doing this manually, by maintaining a text file where I copy-paste new WUs from Stanford's project summary page. The utility that calculates the PPD reads the text file to get the WU point values.

It would be nice if the history graph could also label the points at which new WUs were downloaded (and their WU number).
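
For illustration, here is a rough Python sketch of the text-file-plus-PPD idea. The file layout (project number, then points, one pair per line) is an assumption, not the actual format of the file described above, and the 100 frames per WU default is likewise an assumption that will not hold for every WU type:

```python
def load_wu_points(lines):
    """Parse 'project points' pairs, one per line; '#' starts a comment."""
    points = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        project, value = line.split()[:2]
        points[int(project)] = float(value)
    return points

def points_per_day(wu_points, seconds_per_frame, frames_per_wu=100):
    """Estimate PPD from the credit for one WU and the time taken per frame."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    seconds_per_wu = seconds_per_frame * frames_per_wu
    return wu_points * 86400.0 / seconds_per_wu
```

Computing this once per completed frame gives the per-step PPD series that a history graph would plot.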

JPinTO wrote:#3: Ability to spawn VNC

I'm assuming this would be for local nodes only? Exposing VNC on the 'net isn't a good idea, since its security is poor.

JPinTO wrote:#4: Try (!) to compensate for poor time reporting on the client with VMware Linux clients.

IMO a better solution here is to try to deal with the poor clock synchronization. This is one aspect of VMware which really pisses me off; VirtualBox seems to get the clock right, but unfortunately the SMP Linux client won't run under VirtualBox.

My hackish solution for this issue is to run a script in the VM which performs an NTP time synchronization every few minutes...
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 12:09 pm

I wouldn't reinvent the wheel. FAHMon is good, so start with that and extend it to do what you want; it's open source (GPL license), so go grab the code from svn.

Be aware that it is non-trivial to decide if a client is hung: the log file can still get updates from autosend and the like even while the client is hung, so you have to filter the updates to that file. Also, some things can take minutes to go through (e.g. at the end of a core_a1 WU), and you don't want to pull the trigger too early.
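
One way to do that filtering, sketched in Python: ignore everything except progress lines when deciding when the client last did real work. The line shape used here ("[HH:MM:SS] Completed N out of M steps") is an assumption about the log format and may not match every core, and `last_progress` is a made-up name:

```python
import re

# Assumed line shape: "[14:29:23] Completed 250000 out of 500000 steps  (50%)"
PROGRESS_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\] Completed (\d+) out of (\d+)"
)

def last_progress(log_lines):
    """Return (seconds_of_day, done, total) for the last progress line, or None."""
    result = None
    for line in log_lines:
        match = PROGRESS_RE.match(line.strip())
        if match:
            hours, minutes, seconds, done, total = (int(g) for g in match.groups())
            result = (hours * 3600 + minutes * 60 + seconds, done, total)
    return result
```

Autosend chatter and similar lines never match the pattern, so they cannot make a hung client look alive.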
notfred
Grand Gerbil Poohbah
 
Posts: 3490
Joined: Tue Aug 10, 2004 9:10 am
Location: Ottawa, Canada

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 2:47 pm

just brew it! wrote:Yeah, I've often thought about doing stalled node detection by checking the last time the FAHlog.txt file was touched (this time is the "last log update" line in my status report); just never got around to implementing it.


Thanks for your input!

One issue I have with FahMon is that it's really flaky about reporting guests as *hung* when they are running just fine. Right now, for example, almost all of my GPU clients have recently started reporting as *hung* for whatever reason. The guests are perfectly fine. I'd really like improved host-down, client-down, and client-hung reporting.

just brew it! wrote:The part of this that is a PITA is keeping your WU database updated with new WUs as they come out. I've been doing this manually, by maintaining a text file where I copy-paste new WUs from Stanford's project summary page. The utility that calculates the PPD reads the text file to get the WU point values.


I wasn't going to maintain a WU database. I was just going to pull the values directly from the Stanford psummary page when needed. Do they not update the psummary page frequently, or what am I missing?

just brew it! wrote:I'm assuming this would be for local nodes only? Exposing VNC on the 'net isn't a good idea, since its security is poor.


For sure, local only, not over the net. I finally activated remote desktop on all my VMware Linux guests and started using the mRemote wrapper to quickly access them all, since a lot of my SMP units stall at 100% and have to be qfixed, restarted, yada yada to work. I'm thinking an integrated monitoring tool with VNC spawning would work for me; not sure how others work. This is a nice-to-have feature, not a core feature.

just brew it! wrote:IMO a better solution here is to try to deal with the poor clock synchronization. This is one aspect of VMware which really pisses me off; VirtualBox seems to get the clock right, but unfortunately the SMP Linux client won't run under VirtualBox.


I have not had success under Ubuntu with activating NTP under GNOME, for whatever reason. I'm not a hardened Linux person, so perhaps I'm doing something wrong. I've activated the VMware Tools "synchronization with host" function and that helped greatly. I still have a few guests that aren't able to sync, although these are usually guests that are sharing cores with other guests.
JPinTO
Gerbil Team Leader
 
Posts: 239
Joined: Sat Jun 30, 2007 6:02 am
Location: Toronto, Ontario

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 3:04 pm

notfred wrote:I wouldn't reinvent the wheel. FAHMon is good, so start with that and extend it to do what you want; it's open source (GPL license), so go grab the code from svn.

Be aware that it is non-trivial to decide if a client is hung: the log file can still get updates from autosend and the like even while the client is hung, so you have to filter the updates to that file. Also, some things can take minutes to go through (e.g. at the end of a core_a1 WU), and you don't want to pull the trigger too early.


Thanks for your comments. Unfortunately, I'm not an open source guy, so my first attempt will be using the tools that I know. Also, I'm getting flaky enough reporting, misreporting, and crashes from FahMon that I'm not confident in it. I like the simplicity of the GUI, but I want to make it more functional and reliable for me. For instance, at any given time FahMon shows half my guests with a green light while the other half are yellow or red for unknown reasons, even though the guests are fine. Perhaps this only becomes a problem (or PITA) once you are monitoring more than a handful of guests.

I haven't thought about how I'm going to implement the hung functionality. I agree with you that completion hangs are more problematic, but I was planning to give more visibility to the upload completion status, or to the core error if one occurs. Parsing through log files for numerous clients takes too much time.

Intra-processing hangs should be easier to detect, as you know how long the WU step takes and can then report if it is taking significantly longer than expected. I'll probably do this as a customizable percentage, e.g. after the step has taken 100% longer than expected, report it as hung.
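
As a rough Python sketch of that rule: the expected step time here is taken as the median of the steps already seen for the current WU, which is my own assumption about where the baseline comes from, and both function names are hypothetical:

```python
import statistics

def expected_step_seconds(previous_steps):
    """Median duration of the steps completed so far, or None with no history."""
    if not previous_steps:
        return None
    return statistics.median(previous_steps)

def is_hung(elapsed_seconds, expected_seconds, tolerance_percent=100.0):
    """True once the current step has overrun its expected time by more than
    tolerance_percent (100 means 'took more than twice as long as expected')."""
    if expected_seconds is None or expected_seconds <= 0:
        return False  # no baseline yet, so do not raise an alarm
    limit = expected_seconds * (1.0 + tolerance_percent / 100.0)
    return elapsed_seconds > limit
```

With the default tolerance, a step expected to take 10 minutes is flagged only after 20 minutes have passed without it completing.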

- JP
JPinTO
Gerbil Team Leader
 
Posts: 239
Joined: Sat Jun 30, 2007 6:02 am
Location: Toronto, Ontario

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 3:18 pm

JPinTO wrote:I wasn't going to maintain a WU database. I was just going to pull the values directly from the Stanford psummary page when needed. Do they not update the psummary page frequently, or what am I missing?

This will be fine if you are only interested in tracking current WUs. If you want to analyze your logs going back in time, you will have problems because old WUs drop off the psummary page.
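
A middle ground might be to remember every value the monitor has ever seen. The Python sketch below merges whatever is currently listed into a local JSON file, so that old WUs remain available for analysing old logs. How the current values are obtained is left out entirely, and the cache file and the `remember_points` name are hypothetical:

```python
import json
import os

def remember_points(cache_path, current_points):
    """Merge the currently listed WU point values into a JSON cache file
    and return the merged table. Existing entries are kept, so WUs that
    have dropped off the summary page are not forgotten."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as handle:
            cache = json.load(handle)
    # JSON object keys are strings, so project numbers are stored as strings.
    cache.update({str(project): value for project, value in current_points.items()})
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle, indent=2, sort_keys=True)
    return cache
```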

JPinTO wrote:I have not had success under Ubuntu with activating NTP under GNOME, for whatever reason. I'm not a hardened Linux person, so perhaps I'm doing something wrong. I've activated the VMware Tools "synchronization with host" function and that helped greatly. I still have a few guests that aren't able to sync, although these are usually guests that are sharing cores with other guests.

I don't think the stock NTP client configuration is aggressive enough to deal with the magnitude of clock skew VMware can introduce. I've been using the following shell script (run it in a terminal window in the guest, with root privilege):
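Code:
#!/bin/bash
# Force a one-shot clock correction every 150 seconds (run as root in the guest).
while true; do
    date          # print the time before the correction
    ntpd -q -g    # -q: set the clock once and exit; -g: allow a large initial offset
    sleep 150
done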
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 4:49 pm

FahMon is just fine IMO.

But one feature that I'd definitely like to add is the graph showing your performance, as suggested by JPinTO :)
ReAp3r-G
Gerbil Elite
 
Posts: 747
Joined: Mon Mar 13, 2006 5:18 pm
Location: University at Buffalo, Buffalo NY

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 8:12 pm

just brew it! wrote:I don't think the stock NTP client configuration is aggressive enough to deal with the magnitude of clock skew VMware can introduce. I've been using the following shell script (run it in a terminal window in the guest, with root privilege):
Code:
#!/bin/bash
while true; do
date
ntpd -q -g
sleep 150;
done

The problem with running in a VM is that ntpd will normally do kernel clock discipline, but the VM doesn't get the ticks regularly enough. What you are doing with the -q -g options is just forcing an update and quitting, which works nicely. Trying to get kernel clock discipline in a VM isn't going to work; NTP is more accurate than the timeslices the VM gets.
notfred
Grand Gerbil Poohbah
 
Posts: 3490
Joined: Tue Aug 10, 2004 9:10 am
Location: Ottawa, Canada

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sun Oct 26, 2008 8:27 pm

Something else I should add regarding my ntp script... if you're going to do something similar, you should configure your ntp service to access a local ntp server, or (failing that), one maintained by your ISP (if available). Most Linux distros are configured by default to use free public ntp servers, and it is not nice to be hitting a public ntp server every couple of minutes. On my home network, my file server also acts as a local ntp server (and it is in turn synced to my ISP's ntp server).
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer

Re: Creating the "perfect" FAH Monitoring Software

Posted on Mon Oct 27, 2008 6:49 am

Yup, I have a similar setup at home, with my server syncing out to the internet and then chiming time on my LAN. Fortunately for me, my ISP peers in to ottix, which the National Research Council also peers in to. They have a public stratum 2 NTP server that syncs to 3 atomic clocks, so I'm about 12 hops and <10ms from Canada's official time. They also provide a redundant site. Way better time sync than I could get from any of the standard distribution ones.
notfred
Grand Gerbil Poohbah
 
Posts: 3490
Joined: Tue Aug 10, 2004 9:10 am
Location: Ottawa, Canada

Re: Creating the "perfect" FAH Monitoring Software

Posted on Mon Oct 27, 2008 11:38 am

Hey Pinto Bean. My hat goes off to you for this worthy effort. I will be happy to be a beta tester if you need any. :D I don't know a darn thing about programming. I have trouble with my remotes!! :lol:
jeffry55
Grand Gerbil Poohbah
 
Posts: 3181
Joined: Sat Oct 30, 2004 3:38 pm
Location: Menlo Park - just down the street from the F@H Servers!

Re: Creating the "perfect" FAH Monitoring Software

Posted on Mon Nov 03, 2008 8:38 pm

Some interesting patterns in SMP failures start coming to light when you comb through the log files. Perhaps it's more obvious with just a few guests, but it's like trying to find a needle in a haystack with a large farm.
JPinTO
Gerbil Team Leader
 
Posts: 239
Joined: Sat Jun 30, 2007 6:02 am
Location: Toronto, Ontario

Re: Creating the "perfect" FAH Monitoring Software

Posted on Fri Nov 07, 2008 6:37 pm

I have one request - just do not be as stupid as the guy who programmed FahMon. I am being serious. The author of FahMon has programmed several classic bugs into his application, like division by zero, off-by-one offsets, and others.

At midnight, when the clock wraps around from 23:59 to 00:00, all my clients get reported as hung even when I enable the option to ignore asynchronous clocks. And when a client reaches 99%, its ETA points into the past. This is just ridiculous and one cannot trust any of the application's numbers.
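
For what it is worth, both cases can be handled with a few lines. The Python sketch below assumes the only timestamps available are times of day in HH:MM:SS form with no date, and that two consecutive log entries are never a full day apart. It says nothing about how FahMon actually does this, and the function names are made up:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def elapsed_seconds(earlier, later):
    """Seconds from earlier to later, wrapping correctly across midnight."""
    return (to_seconds(later) - to_seconds(earlier)) % SECONDS_PER_DAY

def eta_seconds(percent_done, seconds_per_percent):
    """Remaining time, clamped so it can never point into the past."""
    remaining_percent = max(0, 100 - percent_done)
    return remaining_percent * seconds_per_percent
```

The modulo turns a negative difference (23:59:30 to 00:00:30) into the intended 60 seconds, and the clamp keeps the ETA at zero when a client reports 100% or more.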
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Fri Nov 07, 2008 7:57 pm

sdack wrote:I have one request - just do not be as stupid as the guy who programmed FahMon. I am being serious. The author of FahMon has programmed several classic bugs into his application, like division by zero, off-by-one offsets, and others.

At midnight, when the clock wraps around from 23:59 to 00:00, all my clients get reported as hung even when I enable the option to ignore asynchronous clocks. And when a client reaches 99%, its ETA points into the past. This is just ridiculous and one cannot trust any of the application's numbers.


You're absolutely right. You should message him/them and complain about the quality of the free software. If it doesn't get fixed you should demand a refund.
Heiwashin
Grand Gerbil Poohbah
 
Posts: 3027
Joined: Wed Dec 13, 2006 12:21 pm
Location: Denham Springs, LA

Re: Creating the "perfect" FAH Monitoring Software

Posted on Fri Nov 07, 2008 9:06 pm

sdack wrote:I have one request - just do not be as stupid as the guy who programmed FahMon. I am being serious. The author of FahMon has programmed several classic bugs into his application, like division by zero, off-by-one offsets, and others.

At midnight, when the clock wraps around from 23:59 to 00:00, all my clients get reported as hung even when I enable the option to ignore asynchronous clocks. And when a client reaches 99%, its ETA points into the past. This is just ridiculous and one cannot trust any of the application's numbers.

If you think you can do better, go ahead. FYI many of those issues are actually from Stanford's side and not FAHMon and there are threads on the folding forum requesting Stanford fix their progress indications so that they are not off by one for some WUs and not for others. Stanford do know about it, but it is not a priority for them because it doesn't impact the science contained in the WUs.
notfred
Grand Gerbil Poohbah
 
Posts: 3490
Joined: Tue Aug 10, 2004 9:10 am
Location: Ottawa, Canada

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 4:39 am

notfred wrote:If you think you can do better, go ahead. FYI many of those issues are actually from Stanford's side and not FAHMon and there are threads on the folding forum requesting Stanford fix their progress indications so that they are not off by one for some WUs and not for others. Stanford do know about it, but it is not a priority for them because it doesn't impact the science contained in the WUs.

Are you saying you do not like the criticism? If so then you are having a really strange problem!!

Some of these mistakes should not exist even in a 0.1-alpha version. And it is not Stanford's fault when an application cannot handle midnight transitions or predicts an ETA in the past. It is just lousy work.

Btw, why do you even bother about it? Are you the author of FahMon?

@Heiwashin: The guy accepts donations for his application. I would not pay a penny, and with such ridiculous errors he is not worth a bug report. Hence my hope for a new and better application ;) - back on topic.
Last edited by sdack on Sat Nov 08, 2008 4:47 am, edited 2 times in total.
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 4:41 am

sdack wrote:Btw, why do you even bother about it? Are you the author of FahMon?

Are you a paying customer of FahMon?
Heiwashin
Grand Gerbil Poohbah
 
Posts: 3027
Joined: Wed Dec 13, 2006 12:21 pm
Location: Denham Springs, LA

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 4:45 am

Heiwashin wrote:
sdack wrote:Btw, why do you even bother about it? Are you the author of FahMon?

Are you a paying customer of FahMon?

What is your point? If it is free it can be crappy? Oh my god ... get a life.
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 8:28 am

FahMon/FahSpy/whatever is just a glorified log parser which knows how to read the FAHlog.txt file. What notfred was saying is that Stanford screwed up on the log file. If the source is wrong, how can you expect the log parsers to do better? Ever heard of GIGO?

If you are so sure these "mistakes" are that easy to fix (like you are a programmer), why don't you write one that "fixes" Stanford's own bugs? Open source software is about people who think they can do better submitting improvements.
Flying Fox
Gerbil God
 
Posts: 23534
Joined: Mon May 24, 2004 1:19 am

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 9:17 am

Flying Fox wrote:FahMon/FahSpy/whatever is just a glorified log parser which knows how to read the FAHlog.txt file.

No, it [FahMon] does not know how to read the logfile, or else it would know what to do when times switch from 23:59 to 00:00. It does not take a genius to get that right. Do not try to get smart with me! This thread is about creating a new monitor application. So until you have anything smart to say I suggest you STFU.
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 9:58 am

sdack wrote:
Flying Fox wrote:FahMon/FahSpy/whatever is just a glorified log parser which knows how to read the FAHlog.txt file.

No, it [FahMon] does not know how to read the logfile, or else it would know what to do when times switch from 23:59 to 00:00. It does not take a genius to get that right. Do not try to get smart with me! This thread is about creating a new monitor application. So until you have anything smart to say I suggest you STFU.

And you are so smart that you are just being an armchair critic? :roll: So if it does not take a genius, maybe you can get off your chair and write some code yourself? I don't see you saying anything smart, so maybe you are the one that needs to STFU.

Back on topic: I haven't checked out the source code in detail yet, but are they packaging the parsing logic as a lib? It would be nice if developers didn't need to reinvent the parser all the time.
Flying Fox
Gerbil God
 
Posts: 23534
Joined: Mon May 24, 2004 1:19 am

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 10:44 am

Flying Fox wrote:bla bla bla

No, you STFU and get out of your armchair. The OP wants to write a new application and I support him. What is it you think you are doing?

Btw, I am a senior software engineer. What is it that you are?
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 11:00 am

sdack wrote:
Flying Fox wrote:bla bla bla

No, you STFU and get out of your armchair. The OP wants to write a new application and I support him. What is it you think you are doing?

Btw, I am a senior software engineer. What is it that you are?


Yes, I can tell by the level of maturity that you've demonstrated thus far. :roll:
flybywire
Gerbil Jedi
 
Posts: 1883
Joined: Wed Jun 16, 2004 1:28 pm
Location: Springfield, VA - USA

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 11:08 am

sdack wrote:
Flying Fox wrote:bla bla bla

No, you STFU and get out of your armchair. The OP wants to write a new application and I support him. What is it you think you are doing?

Btw, I am a senior software engineer. What is it that you are?

Did I ever say I am not supporting the OP? :roll:

I write code for a living just as you do. So why are you not writing your "easy" fix to the parsing logic and contributing it back to the community? If you really support the OP, why don't you give the OP the code that you believe would be so easy to write? The authors of FahMon are at least doing something; you are just sitting here calling them stupid, when the bugs are actually from Stanford. Do you fix bugs by covering/hiding the underlying cause, or do you try to fix the real problem? Sure, you can try (as the OP said, "Try (!)") to work around the stuff, but the problems with the irregular reporting seem to be quite random, and they change from WU to WU, so it takes a lot of effort to keep up. If you can't keep up, then whatever different results you can generate are not going to be that useful.

Notfred does a lot of this log parsing with his diskless tools. I would tend to trust his judgement more than yours, since you seem to know only how to complain rather than research the problems.

flybywire wrote:Yes, I can tell by the level of maturity that you've demonstrated thus far. :roll:

That's why I never pay much attention to labels such as "senior" when I am interviewing people. I would take a person who knows what he/she is talking about and has real experience (note: this does not necessarily correlate with time on a job) over qualifiers that can be obtained via various different means without real skills/knowledge/experience backing them up.
Flying Fox
Gerbil God
 
Posts: 23534
Joined: Mon May 24, 2004 1:19 am

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 11:36 am

Flying Fox wrote:Did I ever say I am not supporting the OP? :roll:

No, you did not. Instead you started to creep up on me like a quick response monkey! :wink:

Flying Fox wrote:That's why I never pay much attention to labels such as "senior" when I am interviewing people. I would take a person who knows what he/she is talking about and has real experience (note: this does not necessarily correlate with time on a job) over qualifiers that can be obtained via various different means without real skills/knowledge/experience backing them up.

Trust me, I do not like to mention my job to others and would rather have them believe me without knowing it, but you left me little choice. Go back in the thread and read it again. All I did was to point out to the OP the ridiculous mistakes of FahMon. Guess why I did that.

If you wish, we can discuss these mistakes, but do not respond by telling me (and I do not necessarily mean only you) to do it better.
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 11:42 am

Keep it civil and get back on topic, or this thread is heading for lockville...

Edit: My take on the FahMon debate -- Given that it is Open Source software, it makes sense to use it as a starting point for any new effort. While it may have bugs, there is also a lot of working code there already; why reinvent the wheel? We should either figure out how to fix the bugs and submit them back to the original developer, or fork the code base.
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 12:12 pm

just brew it! wrote:Edit: My take on the FahMon debate -- Given that it is Open Source software, it makes sense to use it as a starting point for any new effort. While it may have bugs, there is also a lot of working code there already; why reinvent the wheel? We should either figure out how to fix the bugs and submit them back to the original developer, or fork the code base.

While you are right in general, I do think that, given FahMon's bugs, it might not be too big a mistake to have an alternative. A little bit of competition can improve software, too.

To the OP:
Here is something interesting, if not amusing, to watch out for:
Code:
:~$ ls -l
...
-rw-r--r-- 1 sven sven 172316321 2008-11-08 18:02 unitinfo.txt
...
:~$ more unitinfo.txt
Current Work Unit
-----------------
Name: Gromacs
Tag: P2669R17C82G21
Download time: November 8 14:29:23
Due time: November 11 14:29:23
Progress: 1723161591%  [||||||||||||...

The SMP client reports a progress of 1.7 billion percent, and for every 10% it wrote a "|" into the file; hence its size of 172 MB.
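
A monitor can protect itself against this kind of file with two cheap checks, sketched below in Python: read only the first few kilobytes, and reject any progress value outside 0 to 100. The "Progress: N%" line shape is taken from the listing above; the function name and the 4096-byte limit are arbitrary choices for illustration:

```python
import re

PROGRESS_LINE = re.compile(r"^Progress:\s*(\d+)%")

def read_progress(path, max_bytes=4096):
    """Return the progress percentage from a unitinfo.txt-style file,
    or None if the line is missing or the value is implausible."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        head = handle.read(max_bytes)  # never load a runaway multi-megabyte file
    for line in head.splitlines():
        match = PROGRESS_LINE.match(line)
        if match:
            percent = int(match.group(1))
            return percent if 0 <= percent <= 100 else None
    return None
```

Returning None lets the caller fall back to another source, such as the progress lines in the log, instead of displaying a nonsense figure.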
sdack
Gerbil
 
Posts: 66
Joined: Mon Apr 21, 2008 3:47 am
Location: In another thread, having a nice talk

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 1:26 pm

sdack wrote:While you are right in general, I do think that, given FahMon's bugs, it might not be too big a mistake to have an alternative. A little bit of competition can improve software, too.

...and in this case, given the Open Source nature of the application, it may make a lot of sense to base the competition on the original code base. Much the same thing happened with Memtest86 -- another developer took the existing code, and forked it (as Memtest86+). If they had been required to start from scratch, Memtest86+ would probably never have come into existence. The end result is that we now have two choices, and both versions are likely better than they would've been in the absence of competition.

I doubt anyone here has the time to do a full-blown FahMon style application completely from scratch.
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 1:45 pm

just brew it! wrote:I doubt anyone here has the time to do a full-blown FahMon style application completely from scratch.

That's why I floated the idea of separating the logic of the parser (and maybe the monitoring portion) into a library, leaving the GUI to others who are more capable. If the entire monitoring community focuses its effort on the workarounds, that may work. The problem is to get a comprehensive list of WU types and their potential errors to work around, and to keep that list updated whenever Stanford pushes out new cores that change the behaviour. That is arguably the bigger (because of external factors) problem to tackle.
Flying Fox
Gerbil God
 
Posts: 23534
Joined: Mon May 24, 2004 1:19 am

Re: Creating the "perfect" FAH Monitoring Software

Posted on Sat Nov 08, 2008 1:58 pm

Flying Fox wrote:The problem is to get a comprehensive list of WU types and their potential errors to work around, and to keep that list updated whenever Stanford pushes out new cores that change the behaviour. That is arguably the bigger (because of external factors) problem to tackle.

Yes, even with my simple monitoring scripts, that has been the biggest headache. The format of the log files varies between WU types, and things aren't consistent between the project summary page, what is reported in the log files, and the unitinfo.txt files. There are even cases where the unitinfo.txt file contains garbage. Handling the special cases is a royal PITA...
just brew it!
Administrator
 
Posts: 35193
Joined: Tue Aug 20, 2002 9:51 pm
Location: Somewhere, having a beer
