Limited number of tasks per day?

astro-marwil
Joined: 28 May 05
Posts: 532
Credit: 650066543
RAC: 1118134
Topic 218831

Hello!

Is there a new limit on the number of tasks available per application? At the moment I am crunching no more than 12 tasks per day. Or what is going on there? The work generator is already running.

09.05.2019 07:23:35 | Einstein@Home | Sending scheduler request: To fetch work.
09.05.2019 07:23:35 | Einstein@Home | Requesting new tasks for CPU and NVIDIA GPU
09.05.2019 07:23:37 | Einstein@Home | Scheduler request completed: got 0 new tasks
09.05.2019 07:23:37 | Einstein@Home | No work sent
09.05.2019 07:23:37 | Einstein@Home | No work is available for Gamma-ray pulsar binary search #1 on GPUs
09.05.2019 07:23:37 | Einstein@Home | (reached daily quota of 24 tasks)
09.05.2019 07:23:37 | Einstein@Home | Project has no jobs available
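
The quota message in a log like the one above can be picked out programmatically. This is a minimal sketch, assuming the event log has been saved as plain text in the format shown; the function name `find_daily_quota` is hypothetical and not part of BOINC.

```python
import re

# Matches the scheduler message shown in the log,
# e.g. "(reached daily quota of 24 tasks)".
QUOTA_RE = re.compile(r"\(reached daily quota of (\d+) tasks\)")


def find_daily_quota(log_text):
    """Return the last daily quota mentioned in the log text, or None."""
    quota = None
    for line in log_text.splitlines():
        match = QUOTA_RE.search(line)
        if match:
            quota = int(match.group(1))
    return quota


sample = """09.05.2019 07:23:37 | Einstein@Home | No work sent
09.05.2019 07:23:37 | Einstein@Home | (reached daily quota of 24 tasks)
09.05.2019 07:23:37 | Einstein@Home | Project has no jobs available"""

print(find_daily_quota(sample))
```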

 

Kind regards and happy crunching

Martin

 

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226888260
RAC: 1072760

The limitation is not new.  What is new is your huge error rate.  In a short period of time on May 9 your host reported more than twenty tasks with "Error while computing" status.

Errors lower your daily task quota.  Successful returns raise it, rapidly.  Fix the error problem and the task limitation will swiftly cease to be lower than before, which obviously did not trouble you.
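
As a rough illustration of that behaviour, here is a toy model. The specific rule (subtract one per error, double per success, clamp between 1 and a project maximum) and the numbers used are assumptions made for this sketch; the actual scheduler logic and the limits configured by the project may differ.

```python
def update_quota(quota, succeeded, max_quota=32):
    """Toy daily-quota update (assumed rule, not the real scheduler code)."""
    if succeeded:
        # Successful returns raise the quota quickly, up to the maximum.
        return min(quota * 2, max_quota)
    # Each error lowers the quota a little, but never below 1.
    return max(quota - 1, 1)


quota = 32  # hypothetical starting quota
for _ in range(20):  # a burst of twenty failed tasks
    quota = update_quota(quota, succeeded=False)
after_errors = quota

for _ in range(2):  # two successful returns
    quota = update_quota(quota, succeeded=True)
after_recovery = quota

print(after_errors, after_recovery)
```

Under these assumed rules a burst of errors drags the quota down one step at a time, while only a couple of valid results are enough to restore it.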

astro-marwil
Joined: 28 May 05
Posts: 532
Credit: 650066543
RAC: 1118134

Hello Archae86!

Thanks for your quick answer.

That error rate wasn't really an error rate: I did indeed cancel lots of tasks by hand when my AMD R7790 graphics card failed. And that, of course, was about a week ago. Since I installed an older GTX 550 Ti graphics card, I have received sufficient new tasks for several days, up to now.

So I assume there is some other cause as well.

Kind regards and happy crunching

Martin

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226888260
RAC: 1072760

astro-marwil wrote:
That error rate wasn't really an error rate: I did indeed cancel lots of tasks by hand when my AMD R7790 graphics card failed. And that, of course, was about a week ago.

Urm... No.

The ones you cancelled a week ago show in the error task list as status Aborted with Time reported as 2 May 2019 23:23:48 UTC.

Those are not the ones I am talking about.  For example this one:

https://einsteinathome.org/task/852620484

The common features of these are that they show status in your web site task list as "Error while computing", run time of about 19 seconds, CPU time of about 5 seconds, and Time reported covering the range of 9 May 2019 17:44:03 UTC through 9 May 2019 18:37:52 UTC densely, with a few others scattered as far back as May 6.

Please reread my initial response.  I meant it.  It applies to your situation.

Hint: A good way to look at recent troubled tasks is to go to the tasks list for your computer at the web site.  Then:

Click on the "Error" link at the top of the page, so that you see only errored tasks.

Click, twice, on the "Sent" column header.  This will sort the errored tasks from most recently sent to your machine on back to longer ago.  That will spare you looking at week old problems instead of current problems.

astro-marwil
Joined: 28 May 05
Posts: 532
Credit: 650066543
RAC: 1118134

Hello Archae86!

Oh, indeed, I had overlooked this, as the number of errored tasks had been diminishing over the days. I had a buffer of tasks for 5 days before.

This post is being written on the same computer on which the E@H crunching takes place. As far as I can see, everything runs well.

In the stderr outputs I found 3 common peculiarities that I have difficulty explaining, and I would be pleased to get your comments:

1)

<message>
Netzwerkzugriff verweigert.    --- network access denied ---
 (0x41) - exit code 65 (0x41)</message>
<stderr_txt>
06:18:24 (28460): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

2)

read_checkpoint(): Couldn't open file 'LATeah1049Y_156.0_0_0.0_24311686_1_0.out.cpt': No such file or directory (2)

3)

Error during OpenCL bloc_info host->device transfer - candidates (error: -4)
ERROR: opencl_prepare_power_toplist() returned with error 20115560
06:18:29 (28460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_WRITE_BUFFER on GeForce GTX 550 Ti (Device 0).

------

1) and 2) may be linked, but in ordinary use the network works well; 3) seems to be completely different.

Meanwhile I performed a project reset in BOINC Manager (Projects) and downloaded and installed the newest graphics card driver from Nvidia. Now I have to wait a lengthy 17 hours before getting another chance.

The Nvidia GPU application uses much more CPU (about 23%) at 100% GPU load, whereas the AMD GPU application uses only 2.5% CPU at 98% GPU load while having about 3 times higher output. So a lot of the crunching in the Nvidia GPU application still takes place on the CPU, and the question is whether the ALU of the CPU performs well. Is there a test program for the ALU that you can recommend?

After nearly 14 years of almost continuously crunching E@H, this is the first time I have had such problems.

Looking forward to your soon reply,

I remain with kind regards

Martin

 

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1889
Credit: 1412967816
RAC: 1199257

https://www.geforce.com/drivers/results/132845

This is the newest driver for your card; the one your host still reports is from Oct 2017.

You should be able to get new tasks in 24 hours, but with that old 550Ti you should test it by running only a single task and see whether you get a complete Valid.

I still have my original 550Ti from back when they didn't need the CPU to run GPUs here, but I retired it from running these tasks about a year ago and now just run a 660Ti SC card and a Ryzen on another PC.

 

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

Quote:

ERROR: opencl_prepare_power_toplist() returned with error 20115560

06:18:29 (28460): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags:  PRECISION
Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_WRITE_BUFFER on GeForce GTX 550 Ti (Device 0).

Is it really a GTX 550 Ti? They were made in 2011, I am surprised it’s still running. I suspect your issue is the card only has 1GB of memory and ran out of memory.
astro-marwil
Joined: 28 May 05
Posts: 532
Credit: 650066543
RAC: 1118134

Hello MAGIC Quantum Mechanic and MarkJ!

Thanks for your answers!

The peculiar thing is that it was successfully crunching dozens of tasks with the driver from 2017, which was installed by Win10. It is running only a single task at a time. But then an increasing error rate occurred within two or three days.

The old GTX 550 Ti is only meant to bridge the gap until the new generation of mid-range AMD GPUs built on a 7 nm process comes onto the market. I assume this will be a good step forward in crunching efficiency, as will the new generation of AMD CPUs on a 7 nm process announced for this year.

So it was running, but why not any more?

Kind regards and happy crunching

Martin

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7226888260
RAC: 1072760

astro-marwil wrote:

read_checkpoint(): Couldn't open file 'LATeah1049Y_156.0_0_0.0_24311686_1_0.out.cpt': No such file or directory (2)

Perfectly normal. When the application starts up, it looks for a checkpoint file in case you partially ran the same WU earlier, so efficiency can be gained by starting from the checkpoint, rather than starting from the beginning. In this case it is a first run, so no checkpoint. This notation is not a symptom of your problem.
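
The start-up behaviour described here is a common pattern. A minimal sketch, assuming a hypothetical checkpoint file that stores only the index of the next work item as text (the real application's checkpoint format is not shown in this thread):

```python
import os
import tempfile


def read_checkpoint(path):
    """Return the saved progress, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # First run of this task: not a failure, just start from the beginning.
        print(f"read_checkpoint(): couldn't open '{path}', starting from 0")
        return 0


def write_checkpoint(path, next_item):
    """Save progress via a temporary file so a crash cannot leave a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(next_item))
    os.replace(tmp, path)


workdir = tempfile.mkdtemp()
cpt = os.path.join(workdir, "example.out.cpt")  # hypothetical file name

first_start = read_checkpoint(cpt)   # no file yet, so this logs the notice
write_checkpoint(cpt, 42)
second_start = read_checkpoint(cpt)  # resumes from the saved position
print(first_start, second_start)
```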

Quote:
The Nvidia GPU application uses much more CPU (about 23%) at 100% GPU load, whereas the AMD GPU application uses only 2.5% CPU at 98% GPU load while having about 3 times higher output. So a lot of the crunching in the Nvidia GPU application still takes place on the CPU ...

Nope. The current Einstein Nvidia application is built on an OpenCL platform under which it uses a CPU polling loop to handle GPU requests of the CPU (as opposed to interrupts...). So if your system is not overloaded, you'll find the CPU support application consumes an entire CPU core, but it is doing a negligible amount of computation.

A previous Einstein Nvidia application built on a CUDA platform used a different communication method, and thus far less CPU.
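
The contrast between polling and waiting for a notification can be illustrated with a generic sketch. It uses Python threads as a stand-in and is not the Einstein@Home or OpenCL code; the function names are hypothetical. The polling waiter keeps a core busy checking a flag while doing no useful work, whereas the blocking waiter sleeps until it is woken.

```python
import threading
import time


def polling_wait(done):
    """Spin until the event is set; return how many times the flag was checked."""
    checks = 0
    while not done.is_set():
        checks += 1
    return checks


def blocking_wait(done):
    """Sleep until the event is set; the thread wakes up exactly once."""
    done.wait()
    return 1


def run(waiter, delay=0.05):
    """Run a waiter against an event that fires after `delay` seconds.

    Returns (wake-up or check count, CPU seconds used by this process).
    """
    done = threading.Event()
    timer = threading.Timer(delay, done.set)  # stands in for the GPU finishing
    start_cpu = time.process_time()
    timer.start()
    count = waiter(done)
    timer.join()
    return count, time.process_time() - start_cpu


poll_checks, poll_cpu = run(polling_wait)
block_wakeups, block_cpu = run(blocking_wait)
print(f"polling:  {poll_checks} checks, {poll_cpu:.3f} s CPU")
print(f"blocking: {block_wakeups} wake-up, {block_cpu:.3f} s CPU")
```

The exact figures depend on the machine, but the polling variant typically reports CPU time close to the full wait while having computed nothing.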

If your system were working, you'd find it more productive if you reduced the allowed tasks enough so that the Einstein support task got very nearly 100% of a CPU.

One other thing: once the day times out and you get new tasks, I suggest you suspend nearly all of them. That way you can try some hoped-for fix on just a task or two, without burning up your entire stock on rapid fails.

As to possible fixes, the list is long, and not specific in any way to Einstein. Gary Roberts reports that often replacing a power supply can remedy such trouble. I usually suggest that people back down the core clock rate on their GPU card by 10% to see if that has an effect. It is possible your GPU card has chosen this moment to fail. Something else in your system may have failed.

Good luck.
