Machines not receiving new tasks after 12 per day

SJC_Steve

Joined: 20 Jul 11

Posts: 28

Credit: 567066246

RAC: 78826

14 Jan 2021 1:01:40 UTC

Topic 224502


My PC isn't doing much E@H GPU work. In the message log, it says no new tasks are being sent since it has completed the daily limit of 12 tasks.

Is this a new limit on the number of tasks that can be done per day, and if so, why?

Thanks,
Steve

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7373481687

RAC: 2165134


14 Jan 2021 1:36:33 UTC

Message 182401


SJC_Steve wrote:

Is this a new limit on the number of tasks that can be done per day, and if so, why?

The limit has not changed in a long while, and for your machine it was far more than 12 until it got into trouble.

Your actual problem is that the machine produced a flurry of nearly instant failures on GW GPU tasks, each reporting about 2 seconds of run time and a computation error result.

Each such error reduces your daily quota. Yours worked its way down to 12.

If you find the problem and fix it, then on the next new day you'll be able to get some work on the reduced quota, and each successful result returned will greatly raise the quota.

So fixing the problem causing instant failure of every task is the priority here.

Unfortunately, my brief review of your stderr did not show the symptoms of a problem that I recognize. Possibly someone else will drop by to help.

My personal first move would be to reboot the machine.
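
To make the quota behaviour described above concrete, here is a minimal sketch. The exact rule is an assumption and is not stated in this thread: it supposes that each error subtracts one from the daily quota (never going below 1) and that each successful result doubles it up to a ceiling. The function names and the ceiling of 64 are hypothetical and chosen only for illustration.

```python
def update_quota(quota, succeeded, ceiling, floor=1):
    """Return the next daily quota after one reported result.

    Assumed rule (not confirmed by the thread): an error subtracts one,
    a success doubles, and the value stays within [floor, ceiling].
    """
    if succeeded:
        return min(quota * 2, ceiling)
    return max(quota - 1, floor)


def replay(start, outcomes, ceiling):
    """Apply update_quota across a sequence of True/False outcomes."""
    quota = start
    for ok in outcomes:
        quota = update_quota(quota, ok, ceiling)
    return quota


# Hypothetical ceiling of 64 tasks per day; not a real project setting.
CEILING = 64
after_errors = replay(CEILING, [False] * 52, CEILING)  # 64 - 52 = 12
after_fix = replay(after_errors, [True] * 3, CEILING)  # 12 -> 24 -> 48 -> 64
print(after_errors, after_fix)
```

Under these assumed numbers, a long run of errors grinds the quota down slowly, while only a few good results are enough to restore it. That matches the advice above: fix the cause of the failures first, and the quota should recover quickly.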

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119402292372

RAC: 25916096


14 Jan 2021 8:24:30 UTC

Message 182403


SJC_Steve wrote:

My PC isn't doing much E@H GPU work.

You have 3 different machines on your account, so I had to do a bit of searching to identify one with problems. The tasks list currently shows 549 tasks in total, of which the most recent 35 have compute errors. Clicking on the TaskID link for the first error brings up an output log that contains the line:

[ERROR] Couldn't get OpenCL device from BOINC (-1)!

The task was returned at 13 Jan 2021 23:00:09 UTC, as were many more like it. Interestingly, 3 successfully completed and validated tasks were returned at exactly the same time as all the errors. Before that, at 12:17:36 UTC (almost 11 hours earlier), more validated tasks were returned. This suggests that the machine was working fine and was then shut down, or at least stopped from crunching, for a number of hours (did you do something like a driver update?). When crunching resumed, the 3 'good' completed tasks were already waiting, and a bunch of immediate errors followed and were returned along with them.

If you want to identify what caused the issue, I suggest you work out your local times corresponding to the above two UTC times recorded by the server and think about what changes you might have made during that particular interval. It seems likely that something you did caused the rather drastic set of events that followed. I note that your three machines have the same Ubuntu version but have 3 different GPU driver versions. That may be significant.

Cheers,
Gary.
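
As a small aid for working out the local times suggested above, the following sketch converts the two server timestamps from UTC and prints the gap between them. The fixed UTC-8 offset is purely a hypothetical example, since the thread does not say which time zone the machines are in; substitute your own offset. The helper name `to_local` is also just illustrative.

```python
from datetime import datetime, timedelta, timezone

# Server timestamps quoted in the post above, both in UTC.
last_good = datetime(2021, 1, 13, 12, 17, 36, tzinfo=timezone.utc)
errors_returned = datetime(2021, 1, 13, 23, 0, 9, tzinfo=timezone.utc)

# Hypothetical local zone: a fixed UTC-8 offset. Replace with your own.
LOCAL = timezone(timedelta(hours=-8))


def to_local(ts, tz=LOCAL):
    """Format a timezone-aware timestamp in the given local zone."""
    return ts.astimezone(tz).strftime("%d %b %Y %H:%M:%S")


print(to_local(last_good))          # 13 Jan 2021 04:17:36
print(to_local(errors_returned))    # 13 Jan 2021 15:00:09
print(errors_returned - last_good)  # 10:42:33
```

Whatever changed on the machine most likely happened inside that window of roughly 10 hours and 43 minutes.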

SJC_Steve

Joined: 20 Jul 11

Posts: 28

Credit: 567066246

RAC: 78826


14 Jan 2021 17:29:13 UTC

Message 182421


Gary and Archae86,

Thanks for the input, I'll dig into it.

Steve

SJC_Steve

Joined: 20 Jul 11

Posts: 28

Credit: 567066246

RAC: 78826


14 Jan 2021 17:43:43 UTC

Message 182422


In looking into the latest failure on WORKUNIT 517274929, I see three different computers erroring out and one computer that completed and validated the task. The workunit is still incomplete at this point, waiting for another valid result.

Is this an issue with our PCs or the task?

Thanks,
Steve

SJC_Steve

Joined: 20 Jul 11

Posts: 28

Credit: 567066246

RAC: 78826


14 Jan 2021 18:45:21 UTC

Message 182426


I updated and rebooted one of my PCs that wasn't getting work and it began working. I'll try the same with the remaining ones.

Thanks again,

Steve

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7373481687

RAC: 2165134


14 Jan 2021 18:49:52 UTC

Message 182427 in response to message 182422


SJC_Steve wrote:

In looking into the latest failure on WORKUNIT 517274929, I see three different computers erroring out and one computer that completed and validated the task. The workunit is still incomplete at this point, waiting for another valid result.

The task you mention is a GW task with a very high issue number and a DF of 0.60. It probably needs quite a lot of VRAM on the GPU card to run successfully.

Your two other quorum partners who failed did so after about 90 seconds of run time. The first reports that the card had 2048 MiB of memory, and includes the familiar error indication:

CL_MEM_OBJECT_ALLOCATION_FAILURE

The second for some reason has an empty stderr, but the computer does report 2048 MB of GPU memory, and it is in my opinion nearly certain that it also failed for lacking adequate VRAM to run this particular task.

The single computer which has so far successfully completed this task reports GPU VRAM of 4095 MB, which obviously was sufficient.

Yours, on the other hand, failed after less than 3 seconds, not 90. It does not have the standard memory inadequacy message but does have another standard message (Gary discussed this point), and the computer reports 4083 MB of VRAM on the GPU.

You had a configuration problem on the machine. The reason for your sequence of fast failures is not bad tasks.
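
The reasoning above can be sketched as a small triage helper. It only looks for the two error strings quoted in this thread; the function name, the returned labels, and the 10-second threshold used to separate "instant" from "mid-run" failures are all hypothetical choices for illustration, not anything the project defines.

```python
def classify_failure(stderr_text, run_seconds):
    """Guess the likely cause of a failed GPU task from stderr and run time."""
    if "Couldn't get OpenCL device from BOINC" in stderr_text:
        return "host configuration: OpenCL device unavailable"
    if "CL_MEM_OBJECT_ALLOCATION_FAILURE" in stderr_text:
        return "insufficient GPU memory for this task"
    if not stderr_text.strip():
        return "unknown: empty stderr"
    # Hypothetical threshold: treat anything under 10 s as a startup failure.
    if run_seconds < 10:
        return "unknown: failed at startup"
    return "unknown: failed mid-run"


print(classify_failure("[ERROR] Couldn't get OpenCL device from BOINC (-1)!", 2))
print(classify_failure("CL_MEM_OBJECT_ALLOCATION_FAILURE", 90))
print(classify_failure("", 90))
```

An empty stderr, as with the second quorum partner, cannot be classified from the log alone. There the reported GPU memory and the run time are the only remaining clues.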

SJC_Steve

Joined: 20 Jul 11

Posts: 28

Credit: 567066246

RAC: 78826


14 Jan 2021 19:06:39 UTC

Message 182429


Thanks for the updates. I updated and rebooted all three computers. One of them was still doing work while the other two were failing after a few seconds, as you stated. It's a mystery to me, as I don't remember doing anything to those two in the past couple of days, but all seems well now.

Thanks again,

Steve

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0


17 Jan 2021 6:53:18 UTC

Message 182494 in response to message 182429


You were probably the victim of a critical update for your Ubuntu systems. I was talking to Keith about the same issue with 2 of my machines. There was a critical update that also updated the drivers on the machines, literally pulling the OpenCL component out while they were still crunching. Since the GPUs were in use, it didn't install the updated OpenCL until you did a manual update and reboot. That's why you get the message about the missing OpenCL device.
