Gravity Wave search on GPUs: do we have a problem?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47031642642
RAC: 65094159


Richard Haselgrove wrote:

Ian&Steve C. wrote:
you're still getting only 1000 cred when it validates. 

If.

None of my O3 tasks has been validated yet.

When.

Credit reward is based on this value.

But they don't appear to have a validator set up/running for these tasks yet, so they probably won't even try to validate until that's implemented. Proof: https://einsteinathome.org/workunit/544521572

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47031642642
RAC: 65094159


Looks like they put up a validator for these tasks, ran it for a bit, then shut it off for more troubleshooting. Over a thousand tasks (including the one linked above) were marked inconclusive and resent to new hosts. Only 2 validated, according to the SSP.

_________________________________________________________________________

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959172831
RAC: 707714


Ian&Steve C. wrote:

Looks like they put up a validator for these tasks, ran it for a bit, then shut it off for more troubleshooting. Over a thousand tasks (including the one linked above) were marked inconclusive and resent to new hosts. Only 2 validated, according to the SSP.

I got a lot of those resends overnight, and I'm working through them. My current score is:

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250582370
RAC: 34589


Thanks for the reports, and sorry for the late reply.

Some remarks that may clear up things a bit, at least in hindsight:
- In contrast to previous transitions "within" the O2MDF application, we now needed to let the old workunit generator run dry completely before we could set up a new one. It was rather unfortunate that the O2MDF workunit generator ran out of work at the beginning of a weekend; this way we had no GPU GW work for a few days.

- Unfortunately the status shown on the server status page doesn't reflect the actual status in all cases - a daemon can still be running while it is "disabled". We disabled the O2MDF workunit generator to prevent it from being uselessly restarted every 5 minutes, just for it to terminate itself because there is no more work left to generate.

- Regarding the "cleanup at 99%": This is actually more than just cleanup. During the main computation the application cycles through millions of "templates" and matches them against the data. The result is a list of ~10000 "candidates" that match the data best. We got this algorithm running pretty efficiently on the GPU. However, in making this computation more efficient, a bit of statistical detail in the result is lost that is important later on for judging the quality of a candidate. Calculating this detail is clumsy; the memory access pattern is so random that running it on the GPU isn't any faster than on the CPU, and counting in the required memory transfers it would take even longer there. However, we don't need this information for each of the millions of templates (most of which are thrown away), we only need it for the candidates that make it into the top n. So at the end we do a little more computation, only on the candidates that come out of the GPU calculation, and do this on the CPU. The result list of O3ASE is a bit longer (30000 candidates) than that of O2MD (3x7500), so the computation on the CPU will also take longer.
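[To illustrate the two-stage pattern described above, here is a minimal C++ sketch with hypothetical names, not the actual Einstein@Home code: the fast (GPU) pass scores every template, only the top n survive, and the expensive detail statistic is then recomputed on the CPU for just those few candidates.]

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Candidate {
        uint64_t template_id;
        float score;   // cheap statistic from the fast (GPU) pass
        float detail;  // expensive statistic, filled in later on the CPU
    };

    // Keep only the best n candidates by score, then compute the costly
    // detail statistic for those few on the CPU.
    std::vector<Candidate> postprocess(std::vector<Candidate> all, size_t n) {
        n = std::min(n, all.size());
        // Only the top n need to be ordered; the rest are thrown away.
        std::partial_sort(all.begin(), all.begin() + n, all.end(),
                          [](const Candidate& a, const Candidate& b) {
                              return a.score > b.score;
                          });
        all.resize(n);
        for (Candidate& c : all) {
            // Stand-in for the clumsy, random-access recalculation that
            // would be no faster on the GPU than on the CPU.
            c.detail = 0.0f;  // e.g. compute_detail_statistic(c.template_id)
        }
        return all;
    }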

- Regarding CPU usage: Some years ago the behavior of the BOINC client regarding the CPU utilization ("estimate") for a GPU app was that when it was more than 0.9, it would lower the priority of the process to "nice" instead of "normal". That slows down the communication between GPU and CPU (especially on NVidia, with "busy waiting"/"polling" on the CPU side) and also the CPU computation at the end. Has this changed since back then?

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250582370
RAC: 34589


Indeed, I'm still working on the validator; the validation status may still change back and forth, even for results that have already been reported (and validated).

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959172831
RAC: 707714


Bernd Machenschalk wrote:

- Regarding CPU usage: Some years ago the behavior of the BOINC client regarding the CPU utilization ("estimate") for a GPU app was that when it was more than 0.9, it would lower the priority of the process to "nice" instead of "normal". That slows down the communication between GPU and CPU (especially on NVidia, with "busy waiting"/"polling" on the CPU side) and also the CPU computation at the end. Has this changed since back then?

Ah. I didn't know that. Yes, it seems it is still true:

https://github.com/BOINC/boinc/blob/master/client/app_start.cpp#L560

    // run it at above idle priority if it
    // - uses coprocs
    // - uses less than one CPU
    // - is a wrapper
    //
    bool high_priority = false;
    if (app_version->rsc_type()) high_priority = true;
    if (app_version->avg_ncpus < 1) high_priority = true;
    if (app_version->is_wrapper) high_priority = true;

But I don't know how case (1) and case (2) interact. I'll try to tease that out.

Edit - checked on a Windows machine, set up the same way with 1.0 CPU set via app_config.xml (I'm more familiar with the Windows tools).

Einstein GPU app - still O2MDF in this case - is shown as 'base priority: below normal', whereas pure CPU apps are 'base priority: low'. So the logic seems to be "if any one (or more) of these cases applies, increase the priority".
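[For illustration only, not the actual BOINC sources: a sketch of how such a flag typically ends up as an OS priority on POSIX systems. Windows would instead use SetPriorityClass() with BELOW_NORMAL_PRIORITY_CLASS or IDLE_PRIORITY_CLASS, matching the 'below normal' vs. 'low' base priorities observed above.]

    #include <sys/resource.h>
    #include <unistd.h>

    // Map the high_priority flag onto a niceness value: "above idle"
    // (nice 10, roughly "below normal") rather than fully idle (nice 19).
    void set_task_priority(bool high_priority) {
        setpriority(PRIO_PROCESS, getpid(), high_priority ? 10 : 19);
    }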

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250582370
RAC: 34589


Richard Haselgrove wrote:

As usual with OpenCL on NVidia, it actually clocks up 100% CPU while running, but is sent out with an estimate of 0.9 - which allows BOINC to start another CPU task.

Will BOINC actually do this, i.e. start another CPU task for this 0.1 "free" CPU? Or will it do this only if you are running 10 GPU tasks in parallel, summing up to a free CPU core?

Would it help to set the CPU utilization (estimate) to 0.99 instead of 0.9?

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2959172831
RAC: 707714


Bernd Machenschalk wrote:

Richard Haselgrove wrote:

As usual with OpenCL on NVidia, it actually clocks up 100% CPU while running, but is sent out with an estimate of 0.9 - which allows BOINC to start another CPU task.

Will BOINC actually do this, i.e. start another CPU task for this 0.1 "free" CPU? Or will it do this only if you are running 10 GPU tasks in parallel, summing up to a free CPU core?

Would it help to set the CPU utilization (estimate) to 0.99 instead of 0.9?

Yes and no.

Only the integer part is considered. 0.99 will allow a CPU task to be started. 2 x 0.99 (1.98) will allow one task to be started, but not two.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47031642642
RAC: 65094159


BOINC logic is that anything less than 1 really means 0 for accounting; only the integer part of the summed estimates is reserved (see the sketch after the examples):

0.9 = 0 CPUs reserved

0.9+0.9 = 1 CPU reserved

0.9+0.9+0.9 = 2 CPUs reserved
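
[In code form, my reading of that accounting rule as described above, an illustration rather than the client sources: sum the per-task estimates and take the integer part.]

    #include <cmath>
    #include <cstdio>

    // Only the integer part of the summed CPU estimates is reserved.
    int reserved_cpus(double per_task_estimate, int n_tasks) {
        return static_cast<int>(std::floor(per_task_estimate * n_tasks));
    }

    int main() {
        std::printf("%d\n", reserved_cpus(0.9, 1));  // 0 CPUs reserved
        std::printf("%d\n", reserved_cpus(0.9, 2));  // 1 CPU reserved
        std::printf("%d\n", reserved_cpus(0.9, 3));  // 2 CPUs reserved
    }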

 

Some of this priority terminology is counterintuitive, though. Are you saying that the tasks will run slower if the CPU use estimate is greater than 1, or less than 1?

I force all my tasks to 1.0 CPU via an app_config, as that is most representative of actual CPU use and of BOINC's accounting for available resources. Is there a reason the project doesn't want to do this for NVidia tasks?
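
[For reference, a minimal app_config.xml along those lines; the app name below is illustrative, and the real name can be read from client_state.xml or the event log.]

    <app_config>
        <app>
            <name>einstein_O3AS</name>
            <gpu_versions>
                <gpu_usage>1.0</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>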

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47031642642
RAC: 65094159


Richard Haselgrove wrote:

Einstein GPU app - still O2MDF in this case - is shown as 'base priority: below normal', whereas pure CPU apps are 'base priority: low'. So the logic seems to be "if any one (or more than one) of these cases apply, increase the priority".

Given that these are GPU tasks, it seems that case 1 (uses coprocs) will always be satisfied regardless of the others, no?

_________________________________________________________________________
