Automating changes to task multiples

cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

It's actually working well for 1 and 2 GPU systems! (I haven't tested it for more GPUs.)

Latest version has been uploaded (link in OP). 

  • Now looks ahead at the n tasks that are ready to run, given n GPUs.
  • Evaluates all waiting tasks and all n ready tasks to find the one with the highest DF, and bases increment/decrement decisions on that.
  • The frequency of task suspensions has been reduced by optimizing evaluation conditions and by pausing the script long enough for GPU memory usage to get up to speed after a task suspension, and after a task completes and a new one starts.
  • The .cfg settings for VRAM usage of the three O2MDF DF task classes have been revamped.
  • Log output is more readable.
  • The findDF.sh utility script has been updated to report n tasks ready-to-run calculated from n GPUs * current task multiple.
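
For anyone trying to picture the look-ahead, here is a rough Python sketch of the idea in the list above. It is not the code in the taskXDF bash scripts: the function names and the DF lists are made up for illustration, and it assumes the look-ahead window is n GPUs * current task multiple, as in the findDF.sh bullet.

```python
def ready_to_run_count(n_gpus, task_multiple):
    """Size of the look-ahead window: n GPUs * current task multiple."""
    return n_gpus * task_multiple


def highest_df(waiting_dfs, ready_dfs, n_ready):
    """Highest DF among all waiting tasks and the first n_ready ready tasks.

    Returns None when there are no tasks to evaluate.
    """
    candidates = list(waiting_dfs) + list(ready_dfs)[:n_ready]
    return max(candidates, default=None)


# Hypothetical example: 2 GPUs at a task multiple of 3 gives a window of 6.
n = ready_to_run_count(n_gpus=2, task_multiple=3)
ready = [0.20, 0.35, 0.15, 0.15, 0.15, 0.15, 0.40]  # illustrative DFs only
print(n, highest_df(waiting_dfs=[0.15], ready_dfs=ready, n_ready=n))
```

In this made-up queue the 0.40 task sits just outside the six-task window, so the decision would be based on the 0.35 task.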

Details are in the comments of the .cfg and .sh files.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

A new and improved version, taskXDF and related files, is up on the Dropbox page,

https://www.dropbox.com/sh/kiftk65mg59ezm6/AACeHeeKYvvrj65VawsRM107a?dl=0

In addition to improved look-ahead functions for matching VRAM usage to task multiplicities, a timer script can now be used for running the script on timed intervals; no more need for systemd.

Given the slow release of tasks with higher delta frequencies, DF, I decided not to wait and put the new scripts out now so folks can have a play with them, praise me, curse me, etc. :)

The only things missing from the taskXDF.cfg file are VRAM GB values for tasks above a DF of 0.40. Those values will need to be updated as higher-DF tasks roll out (if they ever do). The scripts now report the average VRAM GB used for running tasks, along with the individual DFs of running tasks, so when all running tasks have the same DF, you'll know what the VRAM use is for that task DF.

As before, it's still only for Linux systems running supported AMD cards. I'm working on a Python version, and have dreams of Windows implementation, but that might be a while....


cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

Update: The .cfg file has new VRAM GB values, and all taskXDF scripts have minor improvements in reporting and tweaks to logic.


cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

Update: Added feature to periodically report and log average task times and multiplicities. Improved formatting of reports for added clarity of data.


DF1DX
Joined: 14 Aug 10
Posts: 72
Credit: 2,046,581,559
RAC: 1,835,094

Here is another data point:

Host 12801270 (Linux, Radeon VII) shows a memory usage of 12.64 GB for 6x O2MDF tasks.

This is the highest value of this host so far. I only use the script vramDF-timer. Thanks for that.

cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

DF1DX wrote:

Here is another data point:

Host 12801270 (Linux, Radeon VII) shows a memory usage of 12.64 GB for 6x O2MDF tasks.

This is the highest value of this host so far. I only use the script vramDF-timer. Thanks for that.

Thanks DF. I assume that was for running a set of tasks with DF 0.40 or 0.45?  DF values up to 0.45 have been updated in the recently uploaded taskXDF.cfg file. Glad you are getting some use out of the vramDF script. Let me know of any desired improvements.

UPDATES:
-- In the Dropbox folder, taskXDF scripts now include an extra check to not increase task X when the fraction_done value of any running task is below 5%. This is because when a task begins running, its VRAM% may take a while to get "up to speed". Transiently low VRAM% readings during that transition period no longer cause unwanted task X increments.
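
The check described above could look something like this in Python. This is only a sketch, not the bash logic in the scripts; the function name is hypothetical, and it assumes fraction_done values are expressed as fractions between 0 and 1.

```python
MIN_FRACTION_DONE = 0.05  # the 5% threshold mentioned above


def may_increase_task_x(fractions_done, min_fraction=MIN_FRACTION_DONE):
    """Allow a task X increase only if every running task is past the threshold.

    fractions_done: fraction_done values of the running tasks (assumed 0..1).
    """
    return all(f >= min_fraction for f in fractions_done)


# One task has just started (3% done), so an increase would be held back.
print(may_increase_task_x([0.62, 0.03, 0.41]))
```

Note that with an empty list the function returns True, so a real script would want to handle the "no running tasks" case separately.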

In taskXDFt:
-- Commented out the new feature to report average task X and task completion times. It reports averages from 20 readings that the timed script happens to catch in the 'uploading' phase. It stores readings that it needs to average in "hidden" files, .temp_times.txt and .temp_Xs.txt, in the working directory. If that mechanism doesn't bother you and if you think it may be useful, uncomment the code block (just before the ==main body==). Results are reported intermittently in the terminal and log file along with the regular script reports.
-- Added a file size check on the taskXDF.log file to clear its contents once it reaches 100MB, just in case you love the program so much you leave it running on the timer all - the - time.  ;)
-- Fixed a coding error (a typo!) that caused erroneous task X increases.
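
For the Python-curious, the log size check in the list above might be sketched like this. It is an illustration only; the function name is hypothetical, and treating 100MB as 100 * 1024 * 1024 bytes is an assumption.

```python
import os

MAX_LOG_BYTES = 100 * 1024 * 1024  # 100 MB, assuming binary megabytes


def clear_log_if_too_big(path, max_bytes=MAX_LOG_BYTES):
    """Empty the log file once it reaches max_bytes; return True if cleared."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        with open(path, "w"):
            pass  # opening in write mode truncates the file to zero bytes
        return True
    return False
```

Calling clear_log_if_too_big("taskXDF.log") before each report would keep the log from growing without bound.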

Other:
-- Changed the file name for the timer file back to taskXDF-timer (aka, taskXDF-timer.bash in Dropbox).
-- Wrote my first Python module! So I'm about 1% along the way to Pythonification of the bash scripts and releasing a multi-OS utility.  :P


DF1DX
Joined: 14 Aug 10
Posts: 72
Credit: 2,046,581,559
RAC: 1,835,094

Hi Cecht,

12.64 GB is for six tasks with DF = 0.45.
For six tasks with DF = 0.40 it is 12.34 GB, in idle mode about 0.19 GB.

With seven or more tasks of the spotlight series the average runtime per WU on this host increases, so it is not worth it.

Best regards Jürgen

cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

Being so encouraged that someone was using one of my scripts for monitoring O2MDF GPU tasks, I worked up another monitoring-only bash script.  The new offering reports GPU memory and boinc-client metrics, but doesn't change task multiples. (Okay, so maybe technically off-topic, but hey, close enough?)

The new scripts taskXDF-mon and taskXDF-mon-timer are available in the taskXDF-mon folder on the Dropbox page:
https://www.dropbox.com/sh/kiftk65mg59ezm6/AACeHeeKYvvrj65VawsRM107a?dl=0

Unlike the previous vramDF and vramDF-timer scripts, this new monitor does not report separate metrics for each card, but can be used with multiple card systems. What it reports are pertinent GW GPU crunching metrics that may be useful for, well...., you tell me!

Use and output details are in the README file.

Here is an example terminal output from my single card RX 5600xt host (without the pretty colors):

date     time   │ queued │ VRAM% │ GTT% │ taskGB │ X │ DFs: [running] [waiting] [ready]
----------------│--------│-------│------│--------│---│-----------------------------------
Sep 28 09:11:12 │     27 │    47 │ 1.20 │   0.70 │ 4 │ [.15 .15 .15 .15 ] [na] [.35 ]
Sep 28 09:16:12 │     27 │    55 │ 1.20 │   0.83 │ 4 │ [.15 .15 .15 .35 ] [na] [.15 ]
Sep 28 09:21:12 │     27 │    55 │ 1.20 │   0.83 │ 4 │ [.15 .35 .15 .15 ] [na] [.20 ] 
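
The taskGB numbers in this sample look consistent with total VRAM in use divided by the task multiple X, if one assumes the 6 GB of VRAM on an RX 5600 XT. That reading is inferred from the numbers shown, not taken from the README, so treat this Python sketch (with a made-up function name) as a guess at the arithmetic:

```python
def vram_gb_per_task(vram_percent, total_vram_gb, task_x):
    """Average VRAM GB per running task from VRAM% and the task multiple."""
    return vram_percent / 100 * total_vram_gb / task_x


# VRAM% values from the sample output, assuming a 6 GB card running 4 tasks.
for pct in (47, 55):
    print(pct, round(vram_gb_per_task(pct, 6.0, 4), 3))
```

Under those assumptions the results come out at about 0.705 and 0.825 GB, close to the 0.70 and 0.83 in the taskGB column.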

 


cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

DF1DX wrote:

Hi Cecht,

12.64 GB is for six tasks with DF = 0.45.
For six tasks with DF = 0.40 it is 12.34 GB, in idle mode about 0.19 GB.

With seven or more tasks of the spotlight series the average runtime per WU on this host increases, so it is not worth it.

Best regards Jürgen

6X tasks is an impressive throughput!  Thanks for the update.


cecht
Joined: 7 Mar 18
Posts: 935
Credit: 1,126,741,536
RAC: 1,907,968

The set of Linux scripts to automatically change gravitational wave task multiples in response to AMD GPU memory requirements has been moved to GitHub, https://github.com/csecht/gravwave-taskX-df

The scripts that just monitor and report GW task memory usage and task DF values have also been moved to GitHub, https://github.com/csecht/TaskXDF-monitor

Let me know of any problems or suggestions.

Happy crunching!

 

