No work available for the applications you have selected.

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 999259790
RAC: 534543
Topic 224608


Last night, from 21:37:13 until 23:42:47, my computer (12591228) systematically flushed 905 "Gamma-ray pulsar binary search #1 (GPU)" work units. I don't know why; there is no indication of any error in the log. From then on the computer processed nothing all night, and this morning all action was deferred by BOINC for about 17 hours. I rebooted the computer and then repeatedly clicked "Update" until the 905 errored-out work units were returned to the server. Thereupon the server said, "No work available for the applications you have selected." The server has been returning that same message for the last two hours and 46 minutes, yet the server status page says there are work units available to send in every category. I have downloaded 21 and processed 14 of the "Gravitational Wave search O2 Multi-Directional GPU" work units, which I detest, but I still can't get any "Gamma-ray pulsar binary search #1 (GPU)" work units, of which the server says it has many available.

Why can't I get any new work of the desired type?

It would be quite informative if the server status page included a key explaining what the column headings (FGRPB1G, FGRP5, O2MDF, O2MD1, BRP4) mean. It would be even better if the applications associated with those searches were named at the bottom of the page.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46801762642
RAC: 64212257

Why do you detest Gravitational Wave tasks?

as for an explanation, see this thread: https://einsteinathome.org/content/gamma-ray-gpu-tasks-hanging

TL;DR:

The project has some new gamma-ray tasks (LATeah3001L00) which do not run properly on Nvidia Volta/Turing/Ampere GPUs. The root cause is not yet known, only that these tasks will not process on the newer Nvidia GPUs, and your RTX 2070 (Turing) falls into this category. That's why you had so many errors.

Because of this, the project admins have disabled sending FGRP tasks to all affected systems, so that's why you get the message that no tasks are available. They will presumably re-enable these tasks when they fix the problem, but there is no ETA for that.

_________________________________________________________________________

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 406
Credit: 10170773455
RAC: 25957018

CElliott wrote:
...
Why can't I get any new work of the desired type?
...

Try reading some of the other posts which describe the problem ...

Have a great day !

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117572826587
RAC: 35234577

CElliott wrote:
Last night, from 21:37:13 until 23:42:47, my computer (12591228) systematically flushed 905 "Gamma-ray pulsar binary search #1 (GPU)" work units. I don't know why; there is no indication in the log of any error.

There has been a known problem with modern Nvidia cards for some days now.  If you weren't keeping such a large cache of work, you would have seen it much earlier (the switch to the new type of tasks would have occurred much sooner) and far fewer tasks would have failed overall.

When tasks fail very quickly like these did, they would have shown as such in your event log.  Your client then fairly quickly decides to go into a multi-hour (up to 24 hrs) backoff, and you are left with pages and pages of computation errors on your machine, with more being added if you are observing before all the remaining tasks have failed.  Yes, updating would be the only way to get rid of all the failed tasks (without waiting for the backoff to end) and return them to the project.
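For a rough sense of how that sort of backoff grows, here is a sketch of a generic exponential-backoff rule. It is only an illustration of the general pattern described above, not the actual BOINC client code; the one-hour starting interval and the 24-hour cap are assumptions.

#include <stdio.h>

/* Illustrative only: roughly double the deferral after each consecutive
   failure, capped at about a day.  Not the real BOINC client logic. */
static double backoff_hours(int consecutive_failures) {
    double hours = 1.0;                      /* assumed starting interval */
    for (int i = 1; i < consecutive_failures; i++) {
        hours *= 2.0;
        if (hours >= 24.0) return 24.0;      /* assumed ~24 h cap */
    }
    return hours;
}

int main(void) {
    for (int f = 1; f <= 6; f++)
        printf("after %d failures: defer up to %.0f h\n", f, backoff_hours(f));
    return 0;
}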

I took a look at your tasks list on the server and the failed tasks (for data file LATeah3001L00.dat) started being received by your host early on 22nd Jan and were returned nearly 4.5 days later on 26th Jan.  You must have something like a 4 day cache size which is why it took so long for you to experience the problem.

There has been active discussion of this problem for quite a while now.  If you had noticed any of that, you could have known of the impending failure and stopped receiving those new tasks before the problem even showed up.

CElliott wrote:
I have downloaded 21 and processed 14 of the "Gravitational Wave search O2 Multi-Directional GPU" work units, which I detest, ....

That's the best thing to do at the moment until there is further word from the Devs about the issue.  Those tasks have a 7-day deadline, so please keep a low work cache size.  Maybe there'll be a change to a different GRP data file which doesn't trigger the problem.  It might take a while for the 'new data' problem to be resolved.
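If you want to set the cache size directly, the BOINC client also reads a local global_prefs_override.xml file in its data directory; a minimal sketch looks like the following, where the numbers are only illustrative values (roughly half a day of work plus a small extra buffer):

<global_preferences>
   <work_buf_min_days>0.5</work_buf_min_days>
   <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>

The same two settings appear in BOINC Manager's computing preferences as "Store at least ... days of work" and "Store up to an additional ... days of work"; after editing the file, have the client re-read preferences (for example with boinccmd --read_global_prefs_override).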

Maybe you might get to "not detest" GW tasks, particularly if you think about their scientific importance.  From the spectacular BH-BH and NS-NS merger events, we know GW is a real thing.  Within LIGO and elsewhere, the race is on to detect the much weaker and more elusive continuous GW emissions from spinning massive objects like black holes and neutron stars.  Just think how spectacular it would be if Citizen Science actually won that race!

Cheers,
Gary.

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 999259790
RAC: 534543

@Ian&Steve C., Gary Roberts

Thank you very much for your reply.  I had forgotten about the error you mentioned; as I recall, though, it has been in the system since two days after I installed the new RTX 2070 two years ago.  If they can stop me from receiving these "Gamma-ray pulsar binary search #1 (GPU)" work units now, why can't they prevent the Turing cards from receiving the defective WUs altogether?

Here is the error information from the STDERR section of the Task report:

22:12:09 (8820): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000] 
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
22:12:10 (8820): [CRITICAL]: ERROR: MAIN() returned with error '5'

From the above it sure looks like a fault in my computer with the OpenCL driver becoming unavailable for some reason.  If BOINC sees a sudden inexplicable and deleterious change like that, why can't it shut itself down with a warning in the event log rather than flush 905 work units?  I was here; I worked until 9:00 PM that night and could have taken some action if I had just known what was going on.  I had not rebooted in several weeks, which could easily have been the problem.
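For what it's worth, the chain of checks that fails in that STDERR output boils down to something like the sketch below. This is only an illustration using plain OpenCL calls, not the actual Einstein@Home application code (which gets its platform/device IDs from BOINC via boinc_get_opencl_ids); the exit code 5 is copied from the log rather than from any real source.

#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platform = NULL;
    cl_device_id device = NULL;
    cl_uint count = 0;

    /* If the driver has just crashed, even this first query comes back
       empty, matching the NULL platform/device IDs seen in the log. */
    if (clGetPlatformIDs(1, &platform, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "Failed to get OpenCL platform info\n");
        return 5;   /* mirrors "MAIN() returned with error '5'" */
    }
    if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "Got no suitable OpenCL device\n");
        return 5;
    }

    /* With no valid device there is no context or command queue, so the
       FFT setup fails and the task errors out almost immediately. */
    cl_int err = CL_SUCCESS;
    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    if (ctx == NULL || err != CL_SUCCESS) {
        fprintf(stderr, "OCL context null\n");
        return 5;
    }
    clReleaseContext(ctx);
    printf("OpenCL GPU device initialised OK\n");
    return 0;
}

A crashed or wedged driver would make every one of those steps fail at once, which is consistent with all 905 tasks erroring out in about two hours.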

Again, thanks for all your help; you greatly relieved my mind.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46801762642
RAC: 64212257

Another user (also on an old Windows OS) showed driver issues and crashes like what you likely experienced, also from these tasks. Most people on Linux see the tasks just hang and never do anything or finish, but it seems the older Windows OSes like to have a driver crash as a result of whatever these tasks are trying to do on these Nvidia cards. Likely it's just a difference in how the OS/driver handles the situation.

Because your cache was so big, when you started having errors, the client just kept trying every task you had available until your cache was exhausted. I agree that it might be more graceful if BOINC would pause after X number of failures instead of failing everything if left unattended, but that would probably be a large rewrite of BOINC code, and those things take time and resources. If you really want a feature like this, then go over to the BOINC GitHub and make a feature request, and the devs will decide if it's something they want to do.

But you can help improve the situation yourself by simply reducing your cache size, so there aren't so many tasks to error out in the event of a system issue.

_________________________________________________________________________

jcp
Joined: 7 May 18
Posts: 1
Credit: 607625
RAC: 251

I seem to have a related problem running under Linux on a Raspberry Pi 3.  I see no obvious errors, but no work units are downloading.  Einstein@Home is only using 361 KB of the 1.75 GB BOINC is using; there is still 4 GB available.

I tried running Asteroids@Home on 2 of the cores, but it too has not downloaded any work units.
