I just noticed that the processing load on my Nvidia GPUs is often under 50%.
Does this mean I would get more total production if I increased the number of tasks per GPU to 2?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Copyright © 2024 Einstein@Home. All rights reserved.
I assume you mean with the Gravity Wave tasks?
The GW tasks need a lot of CPU support, so unless you have a powerful CPU you won't get very high GPU utilization.
Running 2x tasks per GPU helps, at the expense of using more CPU resources. Do you really want to use 2+ threads for each GPU? I know you like to run CPU work too, so using that much of the CPU just to feed the GPU seems like a waste.
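For reference, running two tasks per GPU is normally done with an app_config.xml in the project's directory. A minimal sketch — the app name `einstein_O3AS` here is only a placeholder; check the actual application name shown for your tasks in BOINC Manager:

```xml
<app_config>
  <app>
    <name>einstein_O3AS</name>  <!-- placeholder: use your real app name -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- 0.5 GPU per task = 2 tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- CPU budgeted per task -->
    </gpu_versions>
  </app>
</app_config>
```

After saving it, use Options → Read config files in BOINC Manager to apply the change without restarting the client.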
_________________________________________________________________________
You're right. The CPU work has priority, so I need to run 1 GPU task on 1 thread.
---edit---
I am trying 1.25 CPUs per GPU to see if I can drive the GPU utilization trend up for Gravity Waves.
I have also dropped GW from my selected GPU apps list, and disabled the "run non-selected apps" option.
Unfortunately, E@H has buried me in "several" tasks, so it will be a while before I get the GWs processed.
--edit--
Thank you.
Tom M
If you don't want to run the GW GPU tasks, you can just abort them. As long as you've also got at least a few Fermi GPU tasks that will report success when finished, there won't be any major disruption to your supply of new tasks.
That is odd. While I have turned off Gravity Wave and then restricted Gravity Wave GPU tasks to 2 of my 5 available GPUs, as of this morning my system wasn't crunching Pulsar GPU tasks at all.
I checked, and there were still Pulsar tasks available.
I had reset the cache to 0.1 day plus 0.1 additional, so the backoff was very high. I did a manual update.
When I renamed app_config.xml to app_config_stop.xml and re-read the config files, 3 more GPUs started up running Gravity Wave.
So why was BOINC Manager not running the Pulsar app, for which I had all sorts of data files?
---edit----
Add cold boots and setting the cache to 1 to the list of things that don't start the Pulsar GPU tasks.
Examined the log; no errors listed.
Switched to NNT so that, if the last resort of resetting the project becomes necessary, I can run out as many tasks as I can before I do.
-edit---
Tom M
Probably would help to see your cc_config. Also, is this a dedicated Einstein Machine? I've had many talks with Keith over the years about the value of dedicated machines for each project. Having multi-project machines injects unknown variables into the mix.
Zalster wrote: Probably would help to see your cc_config.
The GPUs run E@H and optionally S@H if something shows up. It is a multi-project machine.
This cc_config.xml file has not been changed since before the Pulsar tasks suddenly stopped running.
<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <save_stats_days>90</save_stats_days>
    <max_file_xfers>4</max_file_xfers>
    <max_file_xfers_per_project>2</max_file_xfers_per_project>
    <no_alt_platform>1</no_alt_platform>
    <max_tasks_reported>50</max_tasks_reported>
  </options>
</cc_config>
Tom M
The good news is that with NNT, the supply of Gravity Wave tasks is showing real signs of running out within a few more days at worst, or maybe even today.
This could be a self-correcting issue.
Tom
@Tom
I don't see where you are excluding 3 of 5 GPUs from your machine in the cc_config file. From what I see, you would be using all GPUs for GW. Where did you put the <exclude>?
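Excluding GPUs from one application is done with <exclude_gpu> entries in the <options> section of cc_config.xml. A sketch, with one block per excluded device; the app name `einstein_O3AS` is a placeholder (use the app name from your own tasks, and the project URL exactly as BOINC shows it):

```xml
<options>
  <exclude_gpu>
    <url>https://einsteinathome.org/</url>
    <device_num>2</device_num>   <!-- one <exclude_gpu> block per device -->
    <app>einstein_O3AS</app>     <!-- omit <app> to exclude the device for all apps -->
  </exclude_gpu>
</options>
```

To keep only 2 of 5 GPUs on GW, you would repeat the block for each of the three excluded device numbers.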
Z
Tom M wrote: So why was BOINC Manager not running the Pulsar app for which I had all sorts of data files?
Firstly, data files (.dat extensions) aren't tasks. Do you mean data files or do you mean tasks? None of the physical files you can see are tasks. Tasks are essentially just parameter entries in the state file.
If you do mean tasks, they will be done in FIFO order. You can override the FIFO queue by suspending the entries that would run next, if your aim is to run a different job that is further down the queue.
If you do actually mean that you have data files but you don't have any jobs to run, the following may help. You mentioned you had settings of 0.1 and 0.1 for your work cache. The way that works is that your client will initially request GPU work (of whatever description) so that you have at least 0.2 days worth (at current estimates) on board. No further requests for GPU work (of any description) will occur until you have less than 0.1 days worth (at current estimates) remaining.
If you want to always have at least 0.2 days worth on hand, it's probably better to put all of that in the 1st setting and leave the 2nd setting at zero. Otherwise, the amount on hand may tend to cycle up and down a bit. Admittedly, there's not much 'cycling' that can happen between 0.1 and 0.2, so it's a minor point only. It would become much more visible if someone set 0.1 and 1.0 as the two values.
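The fill/refill behaviour described above can be sketched as simple hysteresis (illustrative only; the real client also weighs runtime estimates, deadlines, and per-project resource shares):

```python
def should_request_work(days_on_board, min_buffer=0.1, extra=0.1):
    """Ask for new work only once we drop below the low-water mark."""
    return days_on_board < min_buffer

def request_amount(days_on_board, min_buffer=0.1, extra=0.1):
    """When a request happens, top the cache up to min_buffer + extra days."""
    return max(0.0, (min_buffer + extra) - days_on_board)

# With 0.1/0.1 settings the cache cycles between ~0.1 and ~0.2 days:
print(should_request_work(0.15))  # False: still above the low-water mark
print(should_request_work(0.05))  # True: below 0.1 days, request work
print(request_amount(0.05))       # top back up toward 0.2 days
```

Setting 0.2/0.0 instead makes the client request as soon as it dips below 0.2 days, holding the cache roughly constant rather than cycling.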
So if you currently have more than 0.1 days worth of work on board, your client won't request any sort of GPU work, even if there are currently no FGRPB1G tasks on board.
The other thing you need to realise is that when there is a choice between the FGRP and GW versions of GPU tasks, there is a 'server-side mechanism' (set by staff) that will influence the choice made by the scheduler. At the moment, that mechanism seems to be set so as to prefer the sending of GW work, a lot of the time. It's quite easy to imagine periods where there is no FGRPB1G work on board. That is quite understandable, since the 'holy grail' for this project is the first ever detection of continuous GW. The scientists will want to get the GW work done as quickly as possible so sending it is likely to be prioritised.
If you want to get some FGRPB1G tasks, just deselect the GW type (perhaps temporarily) and once that change is in place, increase your work cache size enough to allow your client to make a new work request where the scheduler has only one choice in what it can send. If you don't do something like that, chances are that you will just get more GW work.
I don't know if I'm properly understanding what you are trying to do. Please correct me if I'm not.
Cheers,
Gary.
To decrease CPU usage I have tried this (it gave lower CPU usage and higher GPU utilization, but lower throughput :( ):
1) Modify this:
2) Suspend GPU work, then build and install the library:
# gcc -I/usr/local/cuda-10.2/targets/x86_64-linux/include/ -O2 -fPIC -shared -Wl,-soname,libsleep.so -o libsleep.so libsleep.c
# cp ~myhome/libsleep/libsleep.so /usr/lib/libsleep.so
# sync
3) Resume GPU work.
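The libsleep.c source itself isn't shown above. A minimal sketch of what such an LD_PRELOAD shim might look like, assuming it works by replacing the driver's busy-wait `sched_yield()` spinning with short sleeps (the actual file from the post may use a different mechanism; that trade of polling latency for idle time would explain the lower CPU usage and lower throughput):

```c
/* libsleep.c (hypothetical sketch): override sched_yield so that a
 * GPU app which spins on the scheduler sleeps briefly instead.
 * Build:  gcc -O2 -fPIC -shared -o libsleep.so libsleep.c
 * Use:    LD_PRELOAD=/usr/lib/libsleep.so <gpu_app>
 */
#define _GNU_SOURCE
#include <time.h>

int sched_yield(void)
{
    /* Sleep ~100 microseconds instead of immediately rescheduling. */
    struct timespec ts = { 0, 100 * 1000 };
    nanosleep(&ts, NULL);
    return 0;
}
```

Because the driver may call this thousands of times per second while waiting on the GPU, even a 100 µs sleep cuts CPU usage sharply, at the cost of reacting a little later when the GPU wants more work.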