Einstein@Home | Aborting task h1... : exceeded elapsed time limit...

Glenn Hawley, R...

Joined: 6 Mar 05

Posts: 48

Credit: 903950180

RAC: 265936


7 Dec 2019 7:13:41 UTC

Message 174746


I've discovered the BOINC client configuration page, https://boinc.berkeley.edu/wiki/Client_configuration, which should let me figure out how to control the CPU and GPU usage to optimize matters.

Tomorrow, though, since I just got home from curling and it's nearly midnight.

archae86

Joined: 6 Dec 05

Posts: 3161

Credit: 7300735023

RAC: 2229448


7 Dec 2019 14:06:46 UTC

Message 174750 in response to message 174744


Glenn Hawley, RASC Calgary wrote:

The GPU must now be competing for time against something, since the estimated time for the WU has gone from about 3.5 hours up to 3+ days.

I think a variant of max_concurrent would be necessary to rein in the CPU processes.
BOINC and Win10 seem bound and determined to run 16 CPUs worth of work concurrently, even though I've restricted the application (in Process Lasso) to even CPUs only.

It is the CPU support task for the GPU tasks that is not getting service fast enough to keep the GPU as busy as it could be. That is because too many tasks are being launched, which in turn is because you raised the parameter that regulates the number of tasks launched. BOINC knows nothing of the affinity restriction you placed using Process Lasso, so it launches tasks based on three things: their estimated CPU consumption (estimated by Einstein and bundled with the task); the number of CPUs reported to exist on your machine (which can be the truth, or can be fudged by you using the ncpus parameter in the options section of cc_config.xml); and the preference item "Use at most xx% of processors", which scales that CPU count.

If you want to limit the number of tasks BOINC decides to launch simultaneously, tweaking the use at most % of processors is your method. And you should.
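A rough sketch of how that percentage turns into a thread budget may help. It assumes the client simply truncates ncpus × percent / 100 and never drops below one CPU; that rounding rule is an assumption, not something confirmed in this thread, and `usable_cpus` is a made-up helper name, not a BOINC function.

```python
def usable_cpus(ncpus: int, percent: float) -> int:
    """Estimate how many CPU threads BOINC will schedule work on.

    Assumption (not confirmed here): the client truncates
    ncpus * percent / 100 and always allows at least one CPU.
    """
    return max(1, int(ncpus * percent / 100))


if __name__ == "__main__":
    # Try a few "Use at most xx% of processors" values on a 16-thread machine.
    for pct in (100, 87.5, 50, 43.75):
        print(f"{pct}% of 16 threads -> {usable_cpus(16, pct)} usable")
```

Under that assumption, 87.5% of a 16-thread machine leaves a budget of 14 threads, and 43.75% leaves 7.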

Setting durable affinities using Process Lasso affects where tasks run, not how many run, as it does not modify BOINC's view of the available resources at all.

Generally the CPU support task associated with a particular GPU task needs to get near-instant access to a CPU any time the GPU task reaches a point of need, as otherwise the GPU task just waits for support. Hence many of us running GPU work at Einstein hold the BOINC task launch count below maximum, using the "use at most xx% of processors" setting.

I had forgotten these points when I recently intentionally tried running CPU Gravity Wave tasks on my main machine, which normally just runs two GRP GPU tasks. As the GPU is a Radeon VII (quite productive on Einstein GRP work), I had faked the number of CPUs up from the physical 4 actually present to 16 in order to raise the maximum number of tasks Einstein was willing to send per day to somewhat higher than actual production. That works fine even with a "use at most 100% of CPUs" restriction when no CPU tasks are running and the number of GPU tasks is set at 2 by my multiplicity specification. But when I enabled CPU task processing, I promptly got something like 14 CPU tasks running in addition to my two GPU tasks. As the real processor is a four-core one without hyperthreading, this meant a great deal of swapping and would have crippled GPU productivity had I permitted it to continue.
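To put numbers on that episode, here is a small sketch. It assumes each running GPU task reserves one CPU from the budget and that the budget is the reported CPU count scaled by the percentage preference; both are simplifying assumptions, and `oversubscription` is a hypothetical helper, not part of BOINC.

```python
def oversubscription(reported_cpus, physical_cores, gpu_tasks, percent=100.0):
    """Return (cpu_tasks_launched, busy_tasks_per_physical_core).

    Assumptions: the thread budget is reported_cpus * percent / 100
    (truncated, minimum 1), and each GPU task reserves one CPU.
    """
    budget = max(1, int(reported_cpus * percent / 100))
    cpu_tasks = max(0, budget - gpu_tasks)
    busy = cpu_tasks + gpu_tasks
    return cpu_tasks, busy / physical_cores


if __name__ == "__main__":
    # 4 physical cores reported as 16, two GPU tasks, 100% of CPUs allowed.
    cpu_tasks, load = oversubscription(16, 4, 2)
    print(f"{cpu_tasks} CPU tasks, {load:.1f} busy tasks per physical core")
```

With the figures from the paragraph above, the sketch gives 14 CPU tasks and four busy tasks per physical core, which is the kind of contention that starves the GPU support tasks.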

Limiting the number of tasks launched using the "use at most xx% of processors" setting is important, and setting affinity with Process Lasso can be important, but they address different parts of the problem.

I am ignorant of the behavior of max_concurrent and don't modify it on my machines. Perhaps you may find it useful.

Glenn Hawley, R...

Joined: 6 Mar 05

Posts: 48

Credit: 903950180

RAC: 265936


7 Dec 2019 19:00:01 UTC

Message 174753


"If you want to limit the number of tasks BOINC decides to launch simultaneously, tweaking the use at most % of processors is your method. And you should."

So, if I understand this correctly, limiting to 87.5% of processors should free up one thread (of the 16) that the GPU can still access?

Glenn Hawley, R...

Joined: 6 Mar 05

Posts: 48

Credit: 903950180

RAC: 265936


7 Dec 2019 19:34:22 UTC

Message 174754


It didn't work.

Setting use 87.5% of CPUs dropped two threads off my CPU workflow.
And, I still ended up getting only one GPU workunit, but at 2.5 hours instead of 1.5 hours, so the idled CPU was not being made available to the GPU.

It would be worth the loss of two CPU workunits if I could have gotten two GPU ones running simultaneously, but it seems that I'm best off leaving it alone at one process per thread and one for the GPU.

If I ever do go out and get a higher-end graphics card, I might well re-examine these issues.

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0


7 Dec 2019 22:07:39 UTC

Message 174758 in response to message 174754


When you say that 16 CPU tasks are running, where do you look to get that info? Boinc Manager, Process Lasso, Process Explorer or some other tool?

What's your current value for the "GPU utilization factor of GW apps" setting? Is it still 0.33?

If it is and you look in Boinc Manager when you are set to use 100% of the CPUs then there should be 15 CPU tasks with the status "Running" and 3 GPU tasks with the status "Running 1 CPU & 0.33 GPU". Do you see this?
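For reference, the relation between the utilization factor and the number of GPU tasks run side by side can be sketched like this. It assumes tasks are started on one GPU for as long as their factors sum to no more than 1.0; the function name is made up for illustration.

```python
import math


def concurrent_gpu_tasks(utilization_factor: float) -> int:
    """Estimate how many tasks share one GPU for a given factor.

    Assumption: tasks are added while the summed factors stay <= 1.0,
    which works out to floor(1 / factor).
    """
    if not 0 < utilization_factor <= 1:
        raise ValueError("factor must be in the range (0, 1]")
    # The small epsilon guards against floating-point results such as 2.9999999.
    return math.floor(1 / utilization_factor + 1e-9)


if __name__ == "__main__":
    for factor in (1.0, 0.5, 0.33):
        print(f"factor {factor} -> {concurrent_gpu_tasks(factor)} GPU task(s)")
```

So a factor of 0.33 corresponds to three GPU tasks at once, and 1.0 to a single task.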

What are your cache settings?
Store at least: X days of work
Store up to an additional: X days

I'm wondering if Boinc thinks that it's in deadline pressure.

I took a look at your task list and noticed that you're aborting all "Gamma-ray pulsar binary search #1 on GPUs" tasks. If you don't want to run them, deselect them in your preferences.

What I would do to test the performance of your machine is:

Set a low cache, 0.1 + 0.01 days or thereabouts.
Set the "GPU utilization factor" to 1.0 for all three types of apps.
Reset any settings you've changed via Process Lasso.
Set Boinc to only use 7 of the 16 threads (43.75%). Or even lower than that.

Then either let the cache run down or, if you've been set to a multiple-day cache, abort most of it so Boinc will download at least one new GPU task and the new utilization factor gets applied.

After all this let the machine run for at least half a day without changing any settings.
This should get you some data on the performance of your machine.
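The percentage in the last step is just the wanted thread count divided by the total. A minimal sketch for other machines or targets, using a hypothetical helper name:

```python
def percent_for_threads(wanted_threads: int, total_threads: int) -> float:
    """Percentage to enter in "Use at most xx% of processors" so that
    wanted_threads out of total_threads are used."""
    if not 0 < wanted_threads <= total_threads:
        raise ValueError("wanted_threads must be between 1 and total_threads")
    return wanted_threads / total_threads * 100


if __name__ == "__main__":
    # 7 of 16 threads, as suggested in the steps above.
    print(percent_for_threads(7, 16))
```

If the preferences page rounds the value, it is probably safer to round up slightly so that truncation does not cost a thread; that is a guess about the client's rounding, not something verified here.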
