No work available for the applications you have selected.

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 999396456
RAC: 542169
Topic 224608


Last night, from 21:37:13 until 23:42:47, my computer (12591228) systematically flushed 905 "Gamma-ray pulsar binary search #1 (GPU)" work units. I don't know why; there is no indication in the log of any error. From then on the computer processed nothing all night, and this morning all action was deferred by BOINC for about 17 hours. I rebooted the computer and then repeatedly clicked "Update" until the 905 errored-out work units were returned to the server. Thereupon the server said, "No work available for the applications you have selected." The server has repeatedly made that same reply for the last two hours and 46 minutes, yet the server status page says that there are work units available to send in every category. I have downloaded 21 and processed 14 of the "Gravitational Wave search O2 Multi-Directional GPU" work units, which I detest, but I still can't access any "Gamma-ray pulsar binary search #1 (GPU)" work units, of which the server says it has many available.

Why can't I get any new work of the desired type?

It would be quite informative if a key were given on the server status page to tell what the column headings (FGRPB1G    FGRP5    O2MDF    O2MD1    BRP4) mean. It would be even better if the applications associated with those projects were named at the bottom of the page.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46808772642
RAC: 64224659

why do you detest Gravity wave tasks?

 

as for an explanation, see this thread: https://einsteinathome.org/content/gamma-ray-gpu-tasks-hanging

 

TLDR;

the project has some new gamma-ray tasks (LATeah3001L00) which do not run properly on Nvidia Volta/Turing/Ampere GPUs. The root cause is not yet known, just that these tasks will not process on the newer Nvidia GPUs, and your RTX 2070 falls into this category (Turing). That's why you had so many errors.

Because of this, the project admins have disabled sending FGRP tasks to all affected systems, so that's why you get the message that no tasks are available. They will presumably re-enable these tasks when they fix the problem, but there is no ETA for that.

_________________________________________________________________________

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 406
Credit: 10173683455
RAC: 25969241

CElliott wrote:


...

Why can't I get any new work of the desired type?

...

 

Try reading some of the other posts which describe the problem ...

Have a great day !

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117576846558
RAC: 35256959

CElliott wrote:
Last night, from 21:37:13 until 23:42:47, my computer (12591228) systematically flushed 905 "Gamma-ray pulsar binary search #1 (GPU)" work units. I don't know why; there is no indication in the log of any error.

There has been a known problem for some days now for modern nvidia cards.  If you weren't keeping such a large cache size of work, you would have seen it much earlier (the switch to the new type of tasks would have occurred much sooner) and far fewer tasks would have failed overall.

When tasks fail very quickly like these did, they would have shown as such in your event log.  Your client then fairly quickly decides to go into a multi-hour (up to 24 hrs) backoff, and you are left with pages and pages of computation errors on your machine, with more being added if you are observing before all the remaining tasks have failed.  Yes, updating would be the only way to get rid of all the failed tasks (without waiting for the backoff to end) and return them to the project.
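
If you'd rather not sit there clicking "Update", the same request can be sent from a command prompt with boinccmd, the client's own command-line tool.  The two lines below are just a sketch; use the project URL exactly as it appears in your BOINC Manager:

# contact the project right away: report finished/errored tasks and request work if allowed
boinccmd --project https://einsteinathome.org/ update

# optional: list the tasks the client is currently holding
boinccmd --get_tasks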

I took a look at your tasks list on the server and the failed tasks (for data file LATeah3001L00.dat) started being received by your host early on 22nd Jan and were returned nearly 4.5 days later on 26th Jan.  You must have something like a 4 day cache size which is why it took so long for you to experience the problem.

There has been active discussion of this problem for quite a while now.  If you had noticed any of that, you could have known of the impending failure and stopped receiving those new tasks before the problem even showed up.

CElliott wrote:
I have downloaded 21 and processed 14 of the "Gravitational Wave search O2 Multi-Directional GPU" work units, which I detest, ....

That's the best thing to do at the moment until there is further word from the Devs about the issue.  They have a 7-day deadline, so please keep a low work cache size.  Maybe there'll be a change to a different GRP data file which doesn't trigger the problem.  It might take a while for the 'new data' problem to be resolved.
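
For the cache size itself: in recent BOINC Manager versions it lives under Options -> Computing preferences ("Store at least X days of work" plus "up to an additional X days").  As a sketch, the same settings can go in a global_prefs_override.xml file in the BOINC data directory; the 0.5 and 0.1 values below are only examples, not a recommendation:

<global_preferences>
   <work_buf_min_days>0.5</work_buf_min_days>
   <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>

Then have the client pick it up with "boinccmd --read_global_prefs_override" (or simply restart the client).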

Maybe you might get to "not detest" GW tasks, particularly if you think about their scientific importance.  From the spectacular BH-BH and NS-NS merger events, we know GW is a real thing.  Within LIGO and elsewhere, the race is on to detect the much weaker and more elusive continuous GW emissions from spinning massive objects like black holes and neutron stars.  Just think how spectacular it would be if Citizen Science actually won that race!

Cheers,
Gary.

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 999396456
RAC: 542169

@Ian&Steve C., Gary Roberts

Thank you very much for your reply.  I had forgotten about the error you mentioned; as I recall, though, it has been in the system since two days after I installed the new RTX 2070 two years ago.  If they can stop me from receiving these "Gamma-ray pulsar binary search #1 (GPU)" work units now, why can't they prevent the Turing cards from receiving the defective WUs altogether?

 

Here is the error information from the STDERR section of the Task report:

22:12:09 (8820): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000] 
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
22:12:10 (8820): [CRITICAL]: ERROR: MAIN() returned with error '5'

From the above it sure looks like a fault in my computer with the OpenCL driver becoming unavailable for some reason.  If BOINC sees a sudden inexplicable and deleterious change like that, why can't it shut itself down with a warning in the event log rather than flush 905 work units?  I was here; I worked until 9:00 PM that night and could have taken some action if I had just known what was going on.  I had not rebooted in several weeks, which could easily have been the problem.

Again, thanks for all your help; you greatly relieved my mind.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46808772642
RAC: 64224659

Another user (also on an old Windows OS) showed driver issues and crashes like what you likely experienced, also from these tasks. Most people on Linux have the situation where it just hangs and doesn't do anything or ever finish, but it seems the older Windows OSes like to have a driver crash as a result of whatever these tasks are trying to do on these Nvidia cards. It's likely just a difference in how the OS/driver handles the situation.

Because your cache was so big, when you started having errors, the client just kept trying every task you had available until your cache was exhausted. I agree that it might be more graceful if BOINC would pause or something after X number of failures instead of failing everything while left unattended, but that would probably be a large rewrite of BOINC code, and those things take time and resources. If you really want a feature like this, go over to the BOINC GitHub and make a feature request, and the devs will decide if it's something they want to do.
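
In the meantime, something along those lines can be approximated outside of BOINC with a small script run from cron. Treat this as a rough sketch only: the log path, the failure pattern, and the threshold are all assumptions you'd have to adapt to your own install; only the boinccmd calls themselves are standard.

#!/bin/sh
# DIY watchdog sketch, not a BOINC feature. It assumes failed tasks leave a
# recognisable line in the client event log -- check what your log actually
# prints when a task errors and adjust the pattern below.

LOG="/var/lib/boinc-client/stdoutdae.txt"   # event log location (varies by install)
URL="https://einsteinathome.org/"           # project URL as shown in BOINC Manager
LIMIT=5                                     # recent failures before we stop the project

# count suspicious lines among the last 200 log entries (pattern is a guess)
FAILS=$(tail -n 200 "$LOG" | grep -c -i -E "computation error|exit status")

if [ "$FAILS" -ge "$LIMIT" ]; then
    # stop running and fetching work for this project until a human has a look;
    # undo it later with: boinccmd --project "$URL" resume
    boinccmd --project "$URL" suspend
fi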

But you can also help improve that situation yourself right now by reducing your cache size, so there aren't so many tasks to error out in the event of a system issue.

_________________________________________________________________________

jcp
Joined: 7 May 18
Posts: 1
Credit: 607625
RAC: 251

I seem to have a related problem running under Linux on a Raspberry Pi 3.  I see no obvious errors, but no work units are downloading.  Einstein@Home is only using 361 KB of the 1.75 GB BOINC is using, and there is still 4 GB available.

 

I tried running Asteroids@Home on 2 of the cores, but it too has not downloaded any work units.
