Code:
cd fah
./a2check
If you'd rather run it in the background (and log to a file), you can do the following instead:
Code:
cd fah
./a2check >>a2check.log 2>&1 &
What it does: Every 5 minutes it counts the number of instances of FahCore_a2.exe running on the system. Anything other than 4 (the normal number) or 0 (indicating that the a2 core is not being used at all) means there's a potential problem. If the script detects an anomalous number of a2 cores running, it waits another 5 minutes and checks again (this should prevent "false positives" from occurring if we happen to check just as the WU is starting up or shutting down normally). If the second check still indicates a problem, it kills the wayward a2 cores. This should allow the main folding client to start the next WU normally.
The script runs continuously until killed, and logs a message every time it initiates a recovery.
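Since the script only stops when it is killed, it can help to remember the background job's PID so you can stop it cleanly later. Here is a minimal sketch of one way to do that. The helper names start_bg and stop_bg and the file name a2check.pid are my own assumptions for illustration; the script itself does not create a PID file.

```shell
#!/bin/bash
# Hypothetical helpers (not part of a2check): start a command in the
# background with its output appended to a log, and record its PID.

start_bg() {    # usage: start_bg <pidfile> <logfile> <command...>
    local pidfile=$1 logfile=$2
    shift 2
    "$@" >>"$logfile" 2>&1 &
    echo $! >"$pidfile"          # $! is the PID of the job just started
}

stop_bg() {     # usage: stop_bg <pidfile>
    local pidfile=$1
    # Send the default SIGTERM to the recorded PID, then drop the PID file.
    kill "$(cat "$pidfile")" && rm -f "$pidfile"
}

# Example, run from the folding directory:
#   start_bg a2check.pid a2check.log ./a2check
#   tail a2check.log        # see any recovery messages logged so far
#   stop_bg a2check.pid
```

Killing by recorded PID avoids matching on the process name, which could hit an unrelated process that happens to have a2check in its command line.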
Here's the script itself (save it as a2check in your folding directory and make it executable with chmod +x a2check):
Code:
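#!/bin/bash
# Watchdog for stuck FahCore_a2.exe processes.
app="FahCore_a2.exe"

# Count the lines of ps output that contain the core's name (fixed string).
count_cores() {
    ps -A | grep -cF -- "$app"
}

while true; do
    count=$(count_cores)
    # 0 (a2 core not in use) and 4 (normal) are the only healthy counts.
    if (( count > 0 && count != 4 )); then
        # Wait and re-check, in case a WU was just starting up or shutting down.
        sleep 300
        count=$(count_cores)
        if (( count > 0 && count != 4 )); then
            echo "$(date): Nuking $count $app processes"
            killall -9 "$app"
        fi
    fi
    sleep 300
done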
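If you want to convince yourself that the check behaves as described without waiting for a core to hang, the decision can be pulled out into a small function and fed some sample counts. This is only a sketch: the function name is_anomalous and the variable expected are hypothetical names I'm introducing here, and the value 4 is simply the normal count mentioned above. It uses arithmetic evaluation so the counts are compared as numbers rather than relying on string ordering.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original script.
expected=4    # normal number of a2 cores, per the description above

# Succeeds (exit status 0) when the count is neither 0 nor the expected number.
is_anomalous() {
    local count=$1
    (( count > 0 && count != expected ))
}

for n in 0 1 3 4 5; do
    if is_anomalous "$n"; then
        echo "$n: anomalous"
    else
        echo "$n: ok"
    fi
done
# prints:
#   0: ok
#   1: anomalous
#   3: anomalous
#   4: ok
#   5: anomalous
```

If your machine normally runs a different number of a2 cores, changing expected is the only adjustment this sketch needs.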