WU hit 100% but never finishes (20+ hours & counting)?

Kyle Tinsley
Joined: 11 Nov 04
Posts: 9
Credit: 18331034
RAC: 0
Topic 224601

Hello, I re-added Einstein@Home very recently, and for the first few days it seemed to be crunching fine. However, yesterday I noticed that it had spent over 17 hours on a single WU (about 17 times longer than usual). It was at 100% but never actually marked the task as complete.

I thought it might just be stuck on the final step, so I restarted the whole BOINC client. When it came back up, it seemed to restart work on that WU from 0%. It's now 20 hours later, and it's once again stuck at 100% and still ticking.

I read a little about how the completion progress for these WUs is supposed to go, and I understand that the progress from 99 to 100% can be much slower than the rest. And I DID notice before I went to bed last night that it was creeping forward very slowly, in tiny increments (around 0.001%), at roughly 99.8xx%. The ETA was about 25 seconds, and 20 minutes later it showed 22 seconds while being only a tiny fraction of a percent closer to done.
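To see why the task looked stuck, here is a rough extrapolation of the progress rate described above. It is a minimal Python sketch; the inputs (99.8% done, about 0.001 percentage points gained over 20 minutes) are assumed values loosely read from the post, not measurements.

```python
def hours_to_finish(progress_pct: float, step_pct: float, step_minutes: float) -> float:
    """Extrapolate the remaining wall-clock hours from an observed progress rate.

    progress_pct: current progress shown in BOINC Manager (e.g. 99.8)
    step_pct: percentage points gained during the observation window
    step_minutes: length of the observation window, in minutes
    """
    if step_pct <= 0:
        raise ValueError("no measurable progress; the task may be stalled")
    rate_pct_per_min = step_pct / step_minutes
    remaining_pct = 100.0 - progress_pct
    return remaining_pct / rate_pct_per_min / 60.0


# Assumed inputs: ~99.8% done, ~0.001 points gained per 20 minutes.
print(round(hours_to_finish(99.8, 0.001, 20.0), 1))  # 66.7
```

Even if progress continued at that pace, the last 0.2% alone would take roughly 67 hours, so the 25-second ETA BOINC displayed had little to do with the task's actual progress.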

I left it alone because it still seemed to be making progress, but this many hours later I have to believe it's just permanently stuck like this. What do I do to resolve this situation? Let it sit there pointlessly crunching for the next week and a half until the deadline hits? Is there a way to force report or cancel a specific work unit?

 

Thanks!

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023324931
RAC: 1821328


You have not enabled the option at Account|Preferences|Privacy|"Should Einstein@Home show your computers on its website?", which would let us see your hardware. However, a large fraction of posts in the last few days come from people in similar situations: they own relatively modern Nvidia cards that fail to run the Gamma-Ray Pulsar application in recently distributed tasks. Most likely you share that problem.

As to getting rid of work units unlikely to succeed:

1. Open BOINC Manager.
2. Make sure you are in Advanced View (View|Advanced View).
3. Select the Tasks tab.
4. Select the tasks in the list that you think are unlikely to work.
5. Click Abort.

The project has recently adopted a measure which should mean your machine won't get more GRP GPU tasks until they figure out a fix for the problem.  So aborting the bad tasks you already have should be enough.
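If you prefer the command line, BOINC also ships `boinccmd`, whose `--task URL task_name abort` operation does the same thing as the Abort button. The Python sketch below only builds the commands for tasks whose names start with the LATeah3001L00 prefix from this thread, so you can review them before running anything. The project URL is a placeholder assumption (use the exact URL your BOINC client lists for the project), the second task name is made up, and depending on your setup `boinccmd` may also need `--passwd` with your GUI RPC password.

```python
import shlex

# Assumption: replace with the exact project URL your BOINC client shows for Einstein@Home.
PROJECT_URL = "https://einsteinathome.org/"


def abort_commands(task_names, project_url=PROJECT_URL, prefix="LATeah3001L00"):
    """Build (but do not run) one boinccmd abort command per matching task.

    Each command has the form: boinccmd --task URL task_name abort
    """
    return [
        ["boinccmd", "--task", project_url, name, "abort"]
        for name in task_names
        if name.startswith(prefix)
    ]


tasks = [
    "LATeah3001L00_180.0_0_0.0_131880_0",  # the stuck task from this thread
    "hypothetical_other_task_0",           # made-up name that should not match
]
for cmd in abort_commands(tasks):
    # Review the printed command, then run it yourself (e.g. subprocess.run(cmd)).
    print(shlex.join(cmd))
```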

Of course, it is possible you have a different problem altogether, but that is not very likely.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33823223910
RAC: 37778601


Let me guess: a Volta, Turing, or Ampere Nvidia GPU, and Gamma-ray tasks?

If so, this is a known problem with the new GR tasks. The project devs are aware of the issue, but have no ETA for a fix.
 

The only thing you can do is abort all LATeah3001L00 tasks, enable Gravitational Wave work, and crunch that until the GR work is fixed for the newer Nvidia cards. One of the project admins stated that future FGRB tasks would be blocked from distribution to GPUs with compute capability (CC) 7.0 or higher, which corresponds to the Volta/Turing/Ampere cards.
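The distribution rule described above boils down to a comparison on the GPU's CUDA compute capability. This Python sketch assumes the cutoff is 7.0 (Volta's compute capability). The example cards are just representative members of each architecture, and the capability is kept as a (major, minor) tuple rather than a float.

```python
# Assumed cutoff: gamma-ray GPU tasks withheld from compute capability 7.0 and up.
BLOCK_FROM = (7, 0)

# Representative NVIDIA compute capabilities as (major, minor) tuples.
EXAMPLES = {
    "GTX 1080 (Pascal)": (6, 1),
    "Titan V (Volta)": (7, 0),
    "RTX 2080 SUPER (Turing)": (7, 5),
    "RTX 3080 (Ampere)": (8, 6),
}


def receives_gamma_ray_gpu_work(compute_capability):
    """Return True if a GPU below the assumed cutoff would still be sent these tasks."""
    return tuple(compute_capability) < BLOCK_FROM


for card, cc in EXAMPLES.items():
    status = "still sent" if receives_gamma_ray_gpu_work(cc) else "blocked"
    print(f"{card}: CC {cc[0]}.{cc[1]} -> {status}")
```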

_________________________________________________________________________

Kyle Tinsley
Joined: 11 Nov 04
Posts: 9
Credit: 18331034
RAC: 0


Ian&Steve C. wrote:


Let me guess: a Volta, Turing, or Ampere Nvidia GPU, and Gamma-ray tasks?

If so, this is a known problem with the new GR tasks. The project devs are aware of the issue, but have no ETA for a fix.

The only thing you can do is abort all LATeah3001L00 tasks, enable Gravitational Wave work, and crunch that until the GR work is fixed for the newer Nvidia cards. One of the project admins stated that future FGRB tasks would be blocked from distribution to GPUs with compute capability (CC) 7.0 or higher, which corresponds to the Volta/Turing/Ampere cards.

I don't know what Volta/Turing/Ampere refers to, but the card is "NVIDIA GeForce RTX 2080 SUPER". 

Application: Gamma-ray pulsar binary search #1 on GPUs 1.22 (FGRPopencl-nvidia)
Name: LATeah3001L00_180.0_0_0.0_131880_0

Since it sounds like this is probably the situation you're describing, are you saying I should uncheck all of the boxes on the apps list below except for the bottom 3 (or 4)?

 Binary Radio Pulsar Search (Arecibo)
 Binary Radio Pulsar Search (Arecibo, GPU)
 Gamma-ray pulsar binary search #1
 Gamma-ray pulsar search #5
 Gamma-ray pulsar binary search #1 (GPU)
 Continuous Gravitational Wave search O2 All-Sky
 Gravitational Wave Injection run on LIGO O1 Open Data
 Gravitational Wave search O2 Multi-Directional
 Gravitational Wave search O2 Multi-Directional GPU

Thanks!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33823223910
RAC: 37778601


Then the answer is yes. That’s the problem I’m describing. 
 

The RTX 2080 Super (like all RTX 20-series and GTX 16-series cards) uses Nvidia's Turing architecture. Volta is the architecture before Turing (in the consumer market it is represented only by the Titan V), and Ampere is the architecture after it, currently available to consumers only as the RTX 30-series cards.
 

The only Nvidia cards that seem to work with this new set of GR tasks are Pascal or older cards. Think GTX 10-series and older.
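Since the card is identified by its marketing name, here is a small Python lookup that maps the consumer series named in this reply to their architecture. It is a simplified sketch covering only those series, not a complete Nvidia product list. Anything it does not recognize (including older cards) returns None.

```python
import re

# Simplified rules covering only the consumer series discussed in this thread.
SERIES_RULES = [
    (r"RTX\s*30\d0", "Ampere"),
    (r"RTX\s*20\d0", "Turing"),
    (r"GTX\s*16\d0", "Turing"),
    (r"TITAN\s*V\b", "Volta"),
    (r"GTX\s*10\d0", "Pascal"),
]


def architecture(card_name):
    """Return the architecture for a recognized card name, else None."""
    for pattern, arch in SERIES_RULES:
        if re.search(pattern, card_name, re.IGNORECASE):
            return arch
    return None


print(architecture("NVIDIA GeForce RTX 2080 SUPER"))  # Turing
```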
 

Switch to Gravitational Wave search O2 Multi-Directional GPU. Those tasks will work on your GPU. Anything that doesn't say “GPU” is CPU work. You don't need to uncheck anything: after you abort all the bad gamma-ray GPU tasks, you shouldn't download any more, according to the admins. They probably won't re-enable gamma-ray work for these cards until the issue is resolved.
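The naming rule in this reply (“GPU” in the app name means GPU work) can be applied mechanically to the app list from the earlier post. This Python sketch is just a string check and assumes the project keeps labeling its GPU apps that way.

```python
# App names copied from the preferences list quoted earlier in the thread.
APPS = [
    "Binary Radio Pulsar Search (Arecibo)",
    "Binary Radio Pulsar Search (Arecibo, GPU)",
    "Gamma-ray pulsar binary search #1",
    "Gamma-ray pulsar search #5",
    "Gamma-ray pulsar binary search #1 (GPU)",
    "Continuous Gravitational Wave search O2 All-Sky",
    "Gravitational Wave Injection run on LIGO O1 Open Data",
    "Gravitational Wave search O2 Multi-Directional",
    "Gravitational Wave search O2 Multi-Directional GPU",
]


def split_by_device(app_names):
    """Split app names into (gpu_apps, cpu_apps) by whether "GPU" appears in the name."""
    gpu = [name for name in app_names if "GPU" in name]
    cpu = [name for name in app_names if "GPU" not in name]
    return gpu, cpu


gpu_apps, cpu_apps = split_by_device(APPS)
print("GPU apps:", *gpu_apps, sep="\n  ")
print("CPU apps:", *cpu_apps, sep="\n  ")
```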

_________________________________________________________________________

Kyle Tinsley
Joined: 11 Nov 04
Posts: 9
Credit: 18331034
RAC: 0


Thank you for the explanations, Ian and/or Steve, and archae86!
