Automating changes to task multiples

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915131972
RAC: 2124170

Gary, you're right, I haven't yet seen shorter mean task times with shorter checkpoints, but there is a lot of variability in the time of these GW tasks. It will be interesting to see what insights Bernd might have. Also of interest will be to see how your reaction time improves. :)

archae86, your post got me to take a closer look at my log files for the timings of increments, decrements and pauses. With the script running every minute, the only wasteful pauses (task suspensions) I'm seeing are for transient task multiple increments. That is, when the script increases the task multiple, a task is added that results in GPU memory tapping into GTT, then within 1 or 2 minutes the multiple is decreased, followed by a suspension (when GTT% remains high). Obviously, these are instances when the multiple should not have increased to begin with. On both my single and dual card systems, this happens for 9% of task multiple changes. Not all that bad, but not the best.

Currently, the script conditions are set to assume that an increased task multiple will add a task that has the same memory requirements as the currently running task(s). A smarter approach would be to evaluate the task-on-deck for its actual memory requirements and not assume anything. I worked up a short script that calculates the DF for all tasks in queue and their status, listed by their boinc-client priority of action. (uploaded as task-df-state.sh in the Dropbox link of the OP) I'm still working on how to ID the task-on-deck and use its DF as a conditional parameter in the main script.
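
(For the flavour of it, a stripped-down sketch of that kind of DF-and-status listing might look like the following - not the Dropbox script itself. It assumes boinccmd is on the PATH and that a GW task name carries a base frequency and a search frequency, e.g. h1_1052.70_..._1053.20Hz_..., whose difference is the DF; the field positions are guesses and non-GW tasks will report a meaningless DF.)

boinccmd --get_tasks | awk '
    function report() {                 # print one line per task seen so far
        if (name == "") return
        n = split(name, f, "_"); base = f[2] + 0; freq = 0
        for (i = 1; i <= n; i++)        # the field ending in Hz is the search frequency
            if (f[i] ~ /Hz$/) { hz = f[i]; sub(/Hz$/, "", hz); freq = hz + 0 }
        printf "%-60s DF=%.2f  state=%-12s sched=%s\n", name, freq - base, state, sched
    }
    /^ *name:/            { report(); name = $2; state = sched = "?" }
    /^ *state:/           { state = $2 }
    /^ *scheduler state:/ { sched = $3 }
    END                   { report() }
'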

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7229951521
RAC: 1155787

cecht wrote:
I'm still working on how to ID the task-on-deck and use its DF as a conditional parameter in the main script.

I would not have guessed you would be able to do that, but it seems a really good goal.  Good luck.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117782641950
RAC: 34755817

Just to advise that I sent a PM to Bernd about the clFFT warning.  He has responded to the effect that, when developing the code, they may not have put enough consideration into what happens when a task is suspended and that they should investigate this, hopefully on Monday.

So Craig, your adventure with scripting the task multiplicity may very well pay off with some code improvements :-).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117782641950
RAC: 34755817

cecht wrote:
... I'm still working on how to ID the task-on-deck and use its DF as a conditional parameter in the main script.

I've grabbed your script and used it on my test machine - currently 418 tasks on board, running at x3.  All tasks were nicely listed in the output, and all three states showed up: 6 uploaded and ready to report, 3 scheduled (running), and the balance listed as uninitialized.  Surely, the 'task-on-deck' is just the topmost uninitialized task, immediately following the 'scheduled' tasks?  I've looked down my list and that's exactly the order in which the uninitialized tasks will be started.  Perhaps I'm misunderstanding what you mean by that term :-).

I imagine the easiest approach might be to construct arrays in the main script for however many of task name, DF, task_state and sched_state that you wanted to keep track of.  You could manipulate the data in those arrays as current tasks get crunched and new ones are added through work fetch events.  That implies a "run forever" script that keeps everything up to date.
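
Something along these lines might do for the array-building part - just a rough sketch that assumes the boinccmd field labels shown below and leaves the DF column to be filled in however you end up deriving it from the task name:

declare -a t_name t_state t_sched t_df      # parallel arrays, one entry per task

load_tasks() {                              # (re)populate the arrays from the client
    local i=-1 line
    t_name=(); t_state=(); t_sched=(); t_df=()
    while IFS= read -r line; do
        line=${line#"${line%%[![:space:]]*}"}               # strip leading spaces
        case "$line" in
            "name: "*)            ((i++)); t_name[i]=${line#name: } ;;
            "scheduler state: "*) t_sched[i]=${line#scheduler state: } ;;
            "state: "*)           t_state[i]=${line#state: } ;;
        esac
        # t_df[i] would be filled in here from the task name
    done < <(boinccmd --get_tasks)
}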

When first launched, the script would scan the complete list of tasks and store (ignoring the completed and running ones, as mentioned above) just the 'uninitialized' ones and identify the topmost DF as belonging to the 'next-to-start' task.  To know when to move on to the next DF and completely remove the topmost entry whenever there is a task completion, your script could monitor the PIDs (Process IDs) of all currently running O2MDF app instances and detect when any one of them disappears.  At that point, the topmost task in the arrays has started running, replacing the completed task, so this top entry should be removed, immediately exposing the next available DF for further decision making purposes.

Detecting when a PID disappears is relatively straight forward.  Below is a little script that shows a way to do it and I've added comments so that anyone following along might better understand what's going on.  I've tested it on my test host which is continuing to run at x3 and it works as expected.  Of course, there's a lot more work to detect work fetch events and to add the new task data to the bottom of the arrays.  The script below does none of that but creating and updating arrays shouldn't be too complicated.



#!/bin/bash
# pid_test - find current O2MDF PIDs and detect when any one of them finishes.
#
all_pids=`ps -A | grep O2MDF | sed 's/^\ *//' | cut -d' ' -f1`
echo
echo -n "Here are the current PIDs -> " ; echo $all_pids
echo
echo "When a task is at 99% hit <enter> to start detecting its completion."
echo "If you start too early you will get an awful lot of dots ........."
echo -n "Ready? (y or n) [y] : "
read ans
ans=${ans:=y}       # If no answer is supplied, set the expected default
if [ "$ans" != y ]; then
    exit 0
fi
echo
p_done=0                            # This will hold the PID that actually exits.
while true ; do                     # Repeat all this for as long as it takes for
    for p in `echo $all_pids` ; do  # any of the concurrent tasks to finish.
        msg=`kill -0 $p 2>&1`       # The output in $msg is redirected error output.
        if [ "X$msg" != "X" ]; then # This test only succeeds when a PID being
            p_done=$p               # tested has gone and there is some output.
            break                   # At that point we need to break out of
        fi                          # this inner loop looking at all PIDs.
    done
    if [ $p_done -eq 0 ]; then      # If $p_done is still 0, the previous loop
        echo -n .                   # exited normally without a break, so
        sleep 1                     # sleep for a bit and then keep going
    else                            # otherwise, a task has finished
        echo " - the $p_done PID has now disappeared."
        break                       # so break out of the outer (run forever) loop.
    fi
done
echo "This test has concluded successfully with a finished task detected."

I hope some of this might be of use to you.

One complication would arise when a running task gets suspended.  I imagine its PID disappears and it gets a different one when it resumes.  That task would be liable to restart when a different running task finished so the 'next available DF' would need to be reconsidered.
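
If that turns out to be a problem, one (untested) way around it would be to re-scan the PID list on every pass instead of capturing it once, and accept that a vanished PID might mean either a finished task or a suspended one:

prev_pids=`ps -A | grep O2MDF | sed 's/^\ *//' | cut -d' ' -f1 | tr '\n' ' '`
while true ; do
    sleep 1
    cur_pids=`ps -A | grep O2MDF | sed 's/^\ *//' | cut -d' ' -f1 | tr '\n' ' '`
    for p in $prev_pids ; do                # check each PID from the previous pass
        case " $cur_pids " in
            *" $p "*) : ;;                  # still there - nothing to do
            *) echo "PID $p gone - a task finished, or was suspended" ;;
        esac
    done
    prev_pids=$cur_pids                     # resumed tasks (new PIDs) get picked up here
done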

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117782641950
RAC: 34755817

archae86 wrote:
I've gone on about this point as preamble to my belief that when I suspend and then resume a task, it appears to recapitulate most of the early phase before getting back to serious business.

If this applies to the scheme discussed here, the overhead penalty for pausing may be higher than one might otherwise suppose.

Thanks very much for all your comments.  I've been particularly concerned about that same point as well, which is why I looked into both the time to get restarted and the meaning of the warning message.  Hopefully Bernd will be able to throw some light on that.

archae86 wrote:
On my system (Windows 10, RX 5700) ... the penalty for exceeding GPU RAM capacity is a bit of an increasingly deep bog ....

I know exactly what you're talking about and your 'description' is very appropriate :-).  I've been testing various DF combinations to work out just which ones can safely be run on a 4GB RX 570 without disappearing under the quicksand.  I've found an early indicator of when to exit the looming bog before it's too late :-).  The indicator is even better now that I have my checkpoint interval set to 10 secs.

When a group of 3 is starting, there is quite a bit of 'setup' time (as you mention) before the GPU swings into full action.  The time taken for the first checkpoint to arrive is quite revealing.  In the past, with the 60 sec interval, a task couldn't record a checkpoint until a potential one occurred somewhere after the 60 sec point.  That meant that BOINC's simulated progress would race ahead (because of the very low task estimates) until a real checkpoint allowed a return to sanity.  Now, it's immediately obvious that a task is quite quick because the first checkpoint showing real progress will be in the 50-60 secs range and there will be no simulated progress from BOINC.  My estimate is that some 30 secs or more of that first minute is mainly 'setup' so that the balance after that is a better indicator of the checkpoint interval.

If a task is a little slower between checkpoints, BOINC's simulated progress will kick in momentarily at 60 secs and then the progress will drop back at the true checkpoint.  If that has occurred before 75 secs, the task will perform quite well as a x3 candidate.  Even at 90 secs, the performance will be somewhat slower but acceptable.  If it gets beyond 2 mins, jump out quick before the bog swallows you :-).
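
Those rules of thumb in one place, for anyone who wants them (how you time the first real checkpoint is up to you, and the 90 to 120 sec band isn't spelled out above, so call it 'proceed with caution'):

rate_first_checkpoint() {           # argument: seconds until the first real checkpoint
    local secs=$1
    if   [ "$secs" -le 75 ];  then echo "good x3 candidate"
    elif [ "$secs" -le 90 ];  then echo "slower, but acceptable at x3"
    elif [ "$secs" -le 120 ]; then echo "proceed with caution"
    else                           echo "jump out quick - the bog awaits"
    fi
}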

archae86 wrote:
But in the scheme for this thread, it may make sense to be somewhat slow to make multiplicity downshift changes, as for systems such as mine the cost of a brief time of running a bit into the bog may be lower than the cost of suspending and later restarting.

I agree, which is why I think that Craig's current idea of looking at what is coming next as a way of avoiding an unnecessary upshift (that needs to be followed fairly quickly by a downshift) is probably a useful thing to do.

Cheers,
Gary.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915131972
RAC: 2124170

Gary Roberts wrote:
Surely, the 'task-on-deck' is just the topmost uninitialized task, immediately following the 'scheduled' tasks?  I've looked down my list and that's exactly the order in which the uninitialized tasks will be started.  Perhaps I'm misunderstanding what you mean by that term :-).

Nope, you got it, no misunderstanding! It was a hasty colloquialism I tossed in there. Sry.

Gary Roberts wrote:
I imagine the easiest approach might be to construct arrays in the main script for however many of task name, DF, task_state and sched_state that you wanted to keep track of.

Using data in previous posts from you and archae86, and my own observations, this weekend I worked up a method to estimate a VRAM% requirement for the next-to-start task (from its DF) and add it to the current VRAM% usage; if the sum exceeds VRAM capacity, then the task multiple would not increase. (I haven't yet decided whether to force a preemptive task X decrease if it looks like VRAM will max out.) I'll get that set up on a timer tomorrow and let it run for a while to see how it works. In the meantime, I'll take a look at your approach.  I've already picked up some coding pointers from your script. Thanks for a fresh perspective. :)
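
In outline, the check amounts to something like this - a sketch only: the sysfs paths are for a single amdgpu card, est_vram_pct_for_df is a made-up stand-in with invented numbers for the DF-to-VRAM% lookup, and the 95% cap is arbitrary:

card=/sys/class/drm/card0/device                  # amdgpu memory counters (bytes)
vram_used=$(cat $card/mem_info_vram_used)
vram_total=$(cat $card/mem_info_vram_total)
vram_pct=$(( 100 * vram_used / vram_total ))      # current VRAM% usage

est_vram_pct_for_df() {                           # hypothetical DF -> VRAM% lookup
    awk -v df="$1" 'BEGIN { if (df <= 0.40) print 20; else print 30 }'   # numbers invented
}

next_df=0.55                                      # would come from the next task's name
next_pct=$(est_vram_pct_for_df "$next_df")

if (( vram_pct + next_pct < 95 )); then           # leave a little headroom
    echo "room to raise the task multiple"
else
    echo "hold (or drop) the current multiple"
fi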

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117782641950
RAC: 34755817

cecht wrote:
.... In the meantime, I'll take a look at your approach.

If by "approach" you are referring to that little script, please don't look too hard :-).  It was just a little game of "watch the dots" until it goes *pop*! :-).

I knew about the "kill -0 $pid" trick for knowing exactly when a process terminates but had never actually used it.  I thought it best to make sure it would work as intended.  Since you had said, "still working on how to ID", I guessed the next logical question might be, "still working on how to know when to ditch the current one and move to the next" :-).  So a little test was born to make sure I wasn't spouting rubbish :-).

One little point to realise about that command is that you can only use it for processes your own user ID has started.  It works for me, but if you have a boinc user, then for your personal user to use it the kill command would have to be prefixed with 'sudo' to give it the necessary privileges to examine processes started by the boinc user.  I like a simpler life so no boinc user for me :-).
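
For completeness, either form below should cover the boinc-user case - the first needs sudo rights, while the second just asks ps whether the PID still exists, which works regardless of who owns the process:

pid=12345                                         # some running O2MDF PID
if sudo kill -0 "$pid" 2>/dev/null ; then echo "still running" ; fi
if ps -p "$pid" > /dev/null 2>&1 ; then echo "still running" ; fi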

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250678523
RAC: 34969

I can confirm that there is no code (yet) that frees GPU resources if the app is terminated by a signal, e.g. when it is "suspended" by the client (and preferences are set to "remove the app from memory when suspended"). Some drivers will free these resources automatically, but some might not. We'll definitely fix that in the next version of the App.

BM

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915131972
RAC: 2124170

Well, just in time for O2MDF to end, I present to you an improved script for optimizing task multiples for O2MDF!  *sigh*    Hey, there are still tasks needing to be re-run, so this old mule may be able to help hoe those last few rows. Hopefully the next GW data series can make use of some parts of the current automation strategy or script bits.

The new files are in the Dropbox link of the OP (opening post).

What's new in bc-taskXDF.sh

A major restructuring of script flow.

Added more error checking, reporting, and condition evaluation.

Corrected a problem in the original version that triggered unneeded task suspensions because of a failure to update VRAM% and GTT% after a suspension.

Added the ability to estimate VRAM usage of the next task waiting and the next task ready to run. This prevents instances of jumping the gun by increasing the task multiple when doing so would exceed VRAM limits. Unfortunately, this one-task look-ahead only works for single-card systems. Multi-card systems now run better than with the previous script version, but functionality is still needed to look ahead n tasks for n GPUs.

Because it only looks ahead one task (either ready to run or waiting to run), the current version will produce a fair number of task suspensions on a system with more than one crunching GPU card. Multi-card systems will need to look ahead n tasks for n GPUs. Nevertheless, the script can probably still provide better performance on multi-card systems than just running at a constant 1X (or whatever minimum task multiple the system can handle), but better scripting is still needed to run multi-card systems optimally.

Anticipated VRAM usage is now based on the estimated additive VRAM requirements for the three classes of frequency ranges analyzed by the E@H app, Gravitational Wave Search O2 Multi-Directional O2 2.08 (GW-opencl-ati). The frequency range, delta frequency or DF, is obtained from the task name. Hence, the new script name, bc-taskXDF.sh. Estimated VRAM GB requirements and DF class divisions are set in the .cfg configuration file. The default values should work for 4, 6, and 8 GB cards. Assuming that the 16GB Radeon VII won't be run above a 5X task multiple, the default values should work for that card too. These values may change in the next GW data series.

Added checks to avoid transient low readings of VRAM% that can trigger unwanted task X increases.

Changed the time interval in the systemd .timer file (now bc-taskXDFd.timer) from OnUnitActiveSec=60 to OnUnitInactiveSec=60.  This accounts for the extra run time of the script when it invokes pauses and suspensions; the change provides a full minute from the time the script completes to when it runs again. (A sketch of such a timer stanza appears after this list.)

Added an enhanced stand-alone utility, findDF.sh, to report DFs of running, waiting, and next tasks. Maybe not useful, but fun to use.

Finally, if running as a systemd or equivalent service, edit the path to the .cfg file, found on line 47 in the bc-taskXDFd.sh script, to match that for ExecStart in the bc-taskXDFd.service file.
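
About the .timer change mentioned a few items up: I won't reproduce the whole bc-taskXDFd.timer here, but the relevant stanza boils down to something like this (OnUnitInactiveSec counts from when the previous run finished; OnUnitActiveSec counts from when it started):

[Unit]
Description=Run bc-taskXDFd.sh one minute after the previous run finishes

[Timer]
Unit=bc-taskXDFd.service
OnBootSec=60
OnUnitInactiveSec=60

[Install]
WantedBy=timers.target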

TIP: If you see flip-flops between incrementing and decrementing task multiples over sequential cycles (when running as timed systemd or equivalent) it may mean that one or both DF breakpoints in the .cfg file are wrong; e.g., upper break is set to DF .75 when it should be .70. Breakpoints may change in future task data series.
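
To illustrate what the breakpoints do - the variable names and the lower value here are invented; only the 0.70 echoes the example above:

DF_LOW_BREAK=0.40       # invented value - divides "small" from "medium" VRAM users
DF_HIGH_BREAK=0.70      # the upper break from the example above

df_class() {            # classify a task's DF against the two breakpoints
    awk -v df="$1" -v lo="$DF_LOW_BREAK" -v hi="$DF_HIGH_BREAK" \
        'BEGIN { if (df <= lo) print "low"; else if (df <= hi) print "mid"; else print "high" }'
}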

TIP: When running bc-taskXDFd.sh in the background as a timed service, 'cat' its log file to keep tabs on what's going on. If you also run the main script, bc-taskXDF.sh, from the terminal to get full reporting on task X status, then consider commenting out the 'sed' line in that script that edits the app_config file, so the timed script can do its thing. The log file for the timed service, as currently configured, can take a bit of effort to interpret.  If you come up with a better reporting scheme, let me know.  The current format was set up for import and analysis in a spreadsheet.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 1537
Credit: 2915131972
RAC: 2124170

An update: I've worked out how to look ahead n tasks for n GPUs and will update the scripts for download once they have been put through their paces.

Ideas are not fixed, nor should they be; we live in model-dependent reality.
