Limited number of tasks per day?

astro-marwil

Joined: 28 May 05

Posts: 532

Credit: 649916543

RAC: 1116693

9 May 2019 19:28:45 UTC

Topic 218831

Hello!

Is there a new limit on the number of tasks available for an application? At the moment I'm crunching no more than 12 tasks per day. What's going on there? The work generator is already running.

Kind regards and happy crunching

Martin

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7226778261

RAC: 1073089

9 May 2019 19:38:13 UTC

Message 171237

The limitation is not new. What is new is your huge error rate. In a short period of time on May 9 your host reported more than twenty tasks with "Error while computing" status.

Errors lower your daily task quota. Successful returns raise it, rapidly. Fix the error problem and the task limit will quickly climb back to where it was before, a level which obviously did not trouble you.
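A toy model of the adaptive quota described above, showing why a burst of errors bites hard while a run of good results recovers quickly. The update rules (minus one per error with a floor of 1, doubling per valid result) and the cap `DAILY_CAP` are assumptions chosen for illustration; the actual BOINC server logic and Einstein@Home's limits may differ.

```python
DAILY_CAP = 40  # hypothetical per-host maximum tasks per day (not Einstein@Home's real value)


def update_quota(quota: int, outcome: str, cap: int = DAILY_CAP) -> int:
    """Return the daily quota after one reported task.

    Assumed rules: an error lowers the quota by one (never below 1);
    a valid result doubles it (never above the cap).
    """
    if outcome == "error":
        return max(1, quota - 1)
    if outcome == "valid":
        return min(cap, quota * 2)
    return quota  # any other outcome leaves the quota unchanged


def replay(outcomes, start: int = DAILY_CAP) -> int:
    """Apply a sequence of task outcomes to a starting quota."""
    quota = start
    for outcome in outcomes:
        quota = update_quota(quota, outcome)
    return quota


if __name__ == "__main__":
    after_errors = replay(["error"] * 20)
    print(after_errors)                           # 20
    print(replay(["valid"] * 5, start=1))         # 32
    print(replay(["valid"], start=after_errors))  # 40
```

Under these assumed rules the penalty is linear and the recovery exponential, which is one way "Successful returns raise it, rapidly" can hold.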

astro-marwil

Joined: 28 May 05

Posts: 532

Credit: 649916543

RAC: 1116693

9 May 2019 20:08:21 UTC

Message 171238 in response to message 171237

Hello archae86!

Thanks for your quick answer.

That error rate wasn't really an error rate: I cancelled lots of tasks by hand when my AMD R7790 graphics card failed, and that was about a week ago, of course. Since I installed an oldish GTX 550 Ti graphics card, I had been getting enough new tasks for several days, until now.

So I assume others are affected as well.

Kind regards and happy crunching

Martin

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7226778261

RAC: 1073089

9 May 2019 21:00:23 UTC

Message 171242 in response to message 171238

astro-marwil wrote:

That error rate wasn't really an error rate: I cancelled lots of tasks by hand when my AMD R7790 graphics card failed, and that was about a week ago, of course.

Urm... No.

The ones you cancelled a week ago show in the error task list as status Aborted with Time reported as 2 May 2019 23:23:48 UTC.

Those are not the ones I am talking about. For example this one:

https://einsteinathome.org/task/852620484

The common features of these are that they show status in your web site task list as "Error while computing", run time of about 19 seconds, CPU time of about 5 seconds, and Time reported densely covering the range from 9 May 2019 17:44:03 UTC through 9 May 2019 18:37:52 UTC, with a few others scattered as far back as May 6.

Please reread my initial response. I meant it. It applies to your situation.

Hint: A good way to look at recent troubled tasks is to go to the tasks list for your computer at the web site. Then:

Click on the "Error" link at the top of the page, so that only errored tasks are shown.

Click twice on the "Sent" column header. This sorts the errored tasks from the most recently sent to your machine back to older ones, and spares you from looking at week-old problems instead of current ones.
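The two clicks amount to a filter followed by a newest-first sort. A sketch of the same view over a local list of task records, plus a filter for short-lived "Error while computing" tasks like the ones described above. The `Task` fields, the set of statuses counted as errors, the 60-second threshold, and the sample records are all assumptions for illustration, not an Einstein@Home API.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed set of statuses that the "Error" view of the task list includes.
ERROR_STATUSES = {"Error while computing", "Aborted"}


@dataclass
class Task:
    task_id: int
    status: str
    sent: datetime      # when the task was sent to the host
    run_time_s: float   # run time in seconds


def error_view(tasks):
    """Errored tasks only, most recently sent first (like "Error", then "Sent" twice)."""
    errored = [t for t in tasks if t.status in ERROR_STATUSES]
    return sorted(errored, key=lambda t: t.sent, reverse=True)


def rapid_fails(tasks, max_run_time_s: float = 60.0):
    """Computation errors that died within a short run time, newest first."""
    return [
        t for t in error_view(tasks)
        if t.status == "Error while computing" and t.run_time_s <= max_run_time_s
    ]


if __name__ == "__main__":
    sample = [  # made-up records, not real task data
        Task(1, "Aborted", datetime(2019, 5, 2, 22, 0), 0.0),
        Task(2, "Error while computing", datetime(2019, 5, 9, 17, 30), 19.0),
        Task(3, "Completed and validated", datetime(2019, 5, 8, 12, 0), 1500.0),
        Task(4, "Error while computing", datetime(2019, 5, 9, 18, 10), 19.0),
    ]
    print([t.task_id for t in error_view(sample)])   # [4, 2, 1]
    print([t.task_id for t in rapid_fails(sample)])  # [4, 2]
```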

astro-marwil

Joined: 28 May 05

Posts: 532

Credit: 649916543

RAC: 1116693

10 May 2019 8:07:19 UTC

Message 171246 in response to message 171242

Hello archae86!

Oh, indeed, I'd overlooked this, as the number of errored tasks was diminishing over the days. I had a five-day task buffer before.

I am writing this on the same computer that does the E@H crunching. As far as I can see, everything runs well.

In the stderr outputs I found three common peculiarities that I have difficulty explaining, and I would be pleased to get your comments:

1)

<message>
Network access is denied.
 (0x41) - exit code 65 (0x41)</message>
<stderr_txt>
06:18:24 (28460): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

2)

read_checkpoint(): Couldn't open file 'LATeah1049Y_156.0_0_0.0_24311686_1_0.out.cpt': No such file or directory (2)

3)

Error during OpenCL bloc_info host->device transfer - candidates (error: -4)
ERROR: opencl_prepare_power_toplist() returned with error 20115560
06:18:29 (28460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_WRITE_BUFFER on GeForce GTX 550 Ti (Device 0).

------

1) and 2) may be linked, but for everyday use the network works well; 3) seems to be something completely different.

Meanwhile I performed a project reset in BOINC Manager (Projects) and downloaded and installed the newest graphics card driver from NVIDIA. Now I have to wait a lengthy 17 hours before getting new chances.

The NVIDIA GPU application uses much more CPU (about 23%) while the GPU is at 100%, whereas the AMD GPU application uses only 2.5% CPU with the GPU at 98%, yet produces about three times the output. So a lot of the crunching in the NVIDIA GPU application still takes place on the CPU, and the question is whether the CPU's ALU performs well. Can you recommend a test program for the ALU?

After nearly 14 years of almost continuously crunching E@H, this is the first time I have had such problems.

Looking forward to your reply soon,

I remain, with kind regards,

Martin

MAGIC Quantum M...

Joined: 18 Jan 05

Posts: 1889

Credit: 1412814485

RAC: 1196963

10 May 2019 9:55:52 UTC

Message 171248

https://www.geforce.com/drivers/results/132845

This is the newest driver for your card; the one your host still reports is from October 2017.

You should be able to get new tasks within 24 hours, but with that old 550 Ti you should test it by running only a single task and seeing whether you get a complete, valid result.

I still have my original 550 Ti from back when GPU tasks here didn't need the CPU, but I retired it from running these tasks about a year ago and now just run a 660 Ti SC card, and a Ryzen on another PC.

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

10 May 2019 10:56:06 UTC

Message 171249 in response to message 171246

Quote:

ERROR: opencl_prepare_power_toplist() returned with error 20115560

06:18:29 (28460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_WRITE_BUFFER on GeForce GTX 550 Ti (Device 0).

Is it really a GTX 550 Ti? They were made in 2011; I'm surprised it's still running. I suspect your issue is that the card has only 1 GB of memory and ran out of it.

astro-marwil

Joined: 28 May 05

Posts: 532

Credit: 649916543

RAC: 1116693

10 May 2019 11:24:28 UTC

Message 171250 in response to message 171249

Hello MAGIC Quantum and MarkJ!

Thanks for your answers!

The peculiar thing is that it crunched dozens of tasks successfully with the 2017 driver, which Windows 10 had installed. It runs only a single task at a time. But then an increasing error rate appeared within two or three days.

I am using the old GTX 550 Ti only to bridge the gap until the new generation of mid-range AMD GPUs built on a 7 nm process comes onto the market. I expect this to be a good step forward in crunching efficiency, as with the new generation of 7 nm AMD CPUs announced for this year.

So it was running; why isn't it anymore?

Kind regards and happy crunching

Martin

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7226778261

RAC: 1073089

10 May 2019 14:20:50 UTC

Message 171256 in response to message 171246

astro-marwil wrote:

read_checkpoint(): Couldn't open file 'LATeah1049Y_156.0_0_0.0_24311686_1_0.out.cpt': No such file or directory (2)

Perfectly normal. When the application starts up, it looks for a checkpoint file in case you partially ran the same WU earlier, so efficiency can be gained by starting from the checkpoint, rather than starting from the beginning. In this case it is a first run, so no checkpoint. This notation is not a symptom of your problem.
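The start-up logic described above (look for a checkpoint, fall back to a fresh start if none exists) can be sketched like this. The checkpoint format (one integer, the last completed step), the file name, and the function names are hypothetical; this is not the Einstein@Home application's code, only an illustration of why a missing checkpoint on a first run is harmless.

```python
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the step to resume from; 0 means start from the beginning."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # First run of this work unit: no checkpoint yet, which is expected.
        return 0


def save_checkpoint(path: str, step: int) -> None:
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(step))
    os.replace(tmp, path)  # swap in the new file in one step (atomic on POSIX)


def run(path: str, total_steps: int, stop_after=None) -> int:
    """Do `total_steps` units of dummy work, checkpointing after each one."""
    step = load_checkpoint(path)
    while step < total_steps:
        step += 1  # stand-in for one unit of real computation
        save_checkpoint(path, step)
        if stop_after is not None and step >= stop_after:
            break  # simulate the task being interrupted
    return step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cpt = os.path.join(d, "example.out.cpt")
        print(run(cpt, 10, stop_after=4))  # 4: interrupted after a fresh start
        print(run(cpt, 10))                # 10: resumed from step 4
```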

Quote:

The NVIDIA GPU application uses much more CPU (about 23%) while the GPU is at 100%, whereas the AMD GPU application uses only 2.5% CPU with the GPU at 98%, yet produces about three times the output. So a lot of the crunching in the NVIDIA GPU application still takes place on the CPU ...

Nope. The current Einstein Nvidia application is built on an OpenCL platform, under which it uses a CPU polling loop (as opposed to interrupts) to handle the GPU's requests to the CPU. So if your system is not overloaded, you'll find the CPU support application consumes an entire CPU core, but it is doing a negligible amount of computation.

A previous Einstein Nvidia application built on a CUDA platform used a different communication method, and thus far less CPU.
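A CPU-only analogy in Python (no OpenCL involved) of the difference between a polling wait and a blocking wait. Neither waiting loop computes anything, yet the polling one accrues roughly the full wait time as CPU time, which is the pattern described above for the support task. The `fake_gpu_work` stand-in and the timings are illustrative assumptions, not measurements of the Einstein application.

```python
import threading
import time


def fake_gpu_work(done: threading.Event, seconds: float) -> None:
    time.sleep(seconds)  # stand-in for the GPU running a kernel
    done.set()           # signal completion to the host side


def wait_by_polling(done: threading.Event) -> None:
    # Busy-wait: re-check the flag as fast as possible, keeping a CPU core busy.
    while not done.is_set():
        pass


def wait_by_blocking(done: threading.Event) -> None:
    # Block until signalled; the OS parks the thread, so almost no CPU is used.
    done.wait()


def cpu_seconds_while_waiting(wait_fn, seconds: float = 0.3) -> float:
    """CPU time this process uses while `wait_fn` waits for the fake GPU work."""
    done = threading.Event()
    worker = threading.Thread(target=fake_gpu_work, args=(done, seconds))
    start = time.process_time()
    worker.start()
    wait_fn(done)
    worker.join()
    return time.process_time() - start


if __name__ == "__main__":
    print(f"polling:  {cpu_seconds_while_waiting(wait_by_polling):.3f} s CPU")
    print(f"blocking: {cpu_seconds_while_waiting(wait_by_blocking):.3f} s CPU")
```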

If your system were working, you'd find it more productive to reduce the number of allowed tasks enough that the Einstein support task got very nearly 100% of a CPU.
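One way to act on this is BOINC's computing preference "Use at most N% of the CPUs". A small helper for the arithmetic, assuming one GPU support task that should get a whole logical CPU, and assuming the client rounds the resulting CPU count down; check the result against what BOINC Manager actually reports for your machine.

```python
def cpu_percent_leaving_free(logical_cpus: int, cpus_to_free: int) -> int:
    """Smallest whole percentage for "Use at most N% of the CPUs" that still
    lets CPU tasks use all but `cpus_to_free` logical CPUs.

    Assumes the client rounds the resulting CPU count down and logical_cpus <= 100.
    """
    if not 0 <= cpus_to_free < logical_cpus:
        raise ValueError("cpus_to_free must be in [0, logical_cpus)")
    usable = logical_cpus - cpus_to_free
    # Integer ceiling of 100 * usable / logical_cpus, avoiding float rounding.
    return -(-100 * usable // logical_cpus)


if __name__ == "__main__":
    for cpus in (4, 6, 8):
        print(cpus, "logical CPUs, 1 kept free:", cpu_percent_leaving_free(cpus, 1), "%")
```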

One other thing: once the day times out and you get new tasks, I suggest you suspend nearly all of them. That way you can try some hoped-for fix on just a task or two, without burning up your entire stock on rapid fails.
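Suspending can be done per task in BOINC Manager's Tasks tab. From the command line, `boinccmd --task <project URL> <task name> suspend` (and later `resume`) does the same one task at a time. The sketch below only builds and prints those commands unless `dry_run=False`; the project URL and task names are placeholders you would take from `boinccmd --get_project_status` and `boinccmd --get_tasks`.

```python
import shlex
import subprocess

PROJECT_URL = "https://einsteinathome.org/"  # assumption: the URL your client is attached with


def suspend_commands(task_names, keep=1, project_url=PROJECT_URL):
    """boinccmd argument lists that suspend every task except the first `keep`."""
    return [["boinccmd", "--task", project_url, name, "suspend"] for name in task_names[keep:]]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    names = ["placeholder_task_a_0", "placeholder_task_b_0", "placeholder_task_c_0"]
    run_commands(suspend_commands(names, keep=1))  # dry run: prints two commands
```

Keeping one task runnable lets a candidate fix be tried on a single task before the rest are resumed.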

As to possible fixes, the list is long and not specific in any way to Einstein. Gary Roberts reports that replacing a power supply can often remedy such trouble. I usually suggest that people back down the core clock rate on their GPU card by 10% to see if that has an effect. It is possible your GPU card has chosen this moment to fail, or that something else in your system has failed.

Good luck.

Forums › Problems and Bug Reports