Error while computing

George

Joined: 25 Mar 14

Posts: 3

Credit: 1692559

RAC: 0

18 Oct 2019 16:40:56 UTC

Topic 219798

I have had several tasks lately where the task showed it was 100% completed, yet the elapsed time went on and on and eventually the task was reported as "error while computing". What is causing this?

solling2

Joined: 20 Nov 14

Posts: 219

Credit: 1578367945

RAC: 17752

18 Oct 2019 17:38:56 UTC

Message 173930

George wrote:

I have had several tasks lately where the task showed it was 100% completed, yet the elapsed time went on and on and eventually the task was reported as "error while computing". What is causing this?

Hi George, while I can't comment on the details of why your tasks errored out, it seems to me all of them occurred on your laptop while running on its internal Intel GPU. That GPU is generally known to have a high error rate, possibly because of how the driver deals with OpenCL apps. I'd try to avoid those apps by unselecting 'Use Intel GPU' in the account, preferences, project, resource settings. (Or by selecting only 'Gamma-ray pulsar search #5' in the Applications section.)

Also, when running tasks on laptops, it generally seems to be a good idea to monitor temperatures closely. :-)

archae86

Joined: 6 Dec 05

Posts: 3162

Credit: 7319095021

RAC: 2307378

18 Oct 2019 17:45:54 UTC

Message 173932

The stderr listing on the task results web page for your host gives this reason:

"exceeded elapsed time limit 24636.24 (350000.00G/14.21G)"

It lists the resource being used for the computation as:

"Using OpenCL device "Intel(R) UHD Graphics 620" by: Intel(R) Corporation"

While I own more than one PC with an Intel CPU that includes graphics, I gave up trying to use that resource for Einstein computation years ago, so I can't help you with current issues. Your system listing shows your CPU as "i5-8250U". Perhaps someone else here can advise you whether, properly configured, this one can usefully perform any Einstein tasks.
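
For anyone reading along, the two figures in parentheses in that stderr message look like a work bound and a speed estimate, both in units of 1e9 floating-point operations (the "G"), and dividing the first by the second gives the time limit in seconds. A minimal Python sketch of that arithmetic, assuming this reading of the message is right; the function name is made up for illustration:

```python
def elapsed_time_limit(fpops_bound_g: float, flops_g: float) -> float:
    """Return the time limit in seconds implied by work bound / speed estimate.

    Both arguments are in units of 1e9 operations, as printed in the
    stderr message. This interpretation is an assumption, not taken
    from the project's documentation.
    """
    if flops_g <= 0:
        raise ValueError("speed estimate must be positive")
    return fpops_bound_g / flops_g


# Figures copied from the stderr line quoted above.
limit = elapsed_time_limit(350000.00, 14.21)
print(f"{limit:.0f} s, about {limit / 3600:.1f} hours")
```

The result lands a few seconds short of the reported 24636.24, presumably because the 14.21 shown in the message is itself rounded. Either way the limit is a little under 7 hours of run time.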

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1626096205

RAC: 803995

18 Oct 2019 18:20:54 UTC

Message 173933

Ah, the sorry saga with Intel iGPUs continues.

George

Joined: 25 Mar 14

Posts: 3

Credit: 1692559

RAC: 0

18 Oct 2019 20:00:17 UTC

Message 173936

Thank you all for your comments so far. Interestingly, Einstein@home also runs on my Android notepad, has been for years, no issues at all.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118755576612

RAC: 21113507

18 Oct 2019 23:14:00 UTC

Message 173938 in response to message 173936

George wrote:

... Einstein@home also runs on my Android notepad, has been for years, no issues at all.

A totally different application. The work content of those tasks is quite small by comparison - the relative credits awarded give you a huge clue, 62 as compared to 3,465.

In principle, I agree with previous comments about difficulties with Intel GPU use. Your host shows as having a 1.6GHz CPU with 8 processors (a quad core CPU with HT enabled giving 8 threads). The default speed is set that low strictly to minimise heat generation. Essentially, the manufacturer is taking severe measures to restrict heat, rather than designing a cooling solution that could cope with a larger heat load. Please realise that you may need to watch temperatures very carefully when you use such a device for crunching.

Your profile shows that you support quite a range of different projects. If you run CPU tasks from other projects on the available CPU threads, I'm not surprised that the Einstein GPU tasks can't complete within the allowed time limit. There is a completed and validated FGRPB1G task in your current list that took ~9700 seconds. To me, that indicates the Intel GPU can do the job under certain circumstances. I imagine those circumstances would have been a very much reduced load (perhaps even zero load) from other competing CPU tasks. Remember that your internal Intel GPU is part of the CPU and will be affected by CPU load.

The take home message is that you need to experiment with a mix of tasks in order to find what will work satisfactorily within the limitations of your hardware. I would start off by confirming the one good GPU result you have. Just suspend all CPU tasks to see if you get similar or perhaps better GPU task times. Then you could gradually introduce CPU tasks, one at a time, to see what effect that has on the GPU crunch time. Don't increase the CPU load until you are really confident that the GPU times aren't being adversely affected and that the heat load is still manageable.
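
The step-by-step test described above can be tracked with a few lines of Python. This is only a sketch: the helper name, the 10% tolerance, and the timings in the example list are placeholders invented for illustration, not measurements from any real host. Only the ~9700 second baseline comes from the validated task mentioned earlier.

```python
def max_safe_cpu_tasks(baseline_secs, runs, tolerance=0.10):
    """Return the largest CPU task count whose GPU time stayed near baseline.

    runs is a list of (cpu_task_count, gpu_elapsed_secs) pairs, one per
    test step. The search stops at the first step where the GPU time
    exceeds baseline_secs * (1 + tolerance).
    """
    limit = baseline_secs * (1 + tolerance)
    safe = 0
    for count, secs in sorted(runs):
        if secs > limit:
            break
        safe = count
    return safe


# Placeholder timings for illustration only.
placeholder_runs = [(1, 9900), (2, 10300), (3, 12800), (4, 19000)]
print(max_safe_cpu_tasks(9700, placeholder_runs))
```

With these made-up numbers the GPU time first drifts past the tolerance at three concurrent CPU tasks, so the helper reports two. Temperatures still need to be watched separately at every step.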

While doing all this you should keep your work cache size within strict limits until you are confident that your machine can handle it. At the moment, you seem to have too many tasks on hand, particularly if you are seeing tasks that take so long that they exceed their allowed time limits.

Good luck with your experiments. If you need more assistance, please mention the type and number of concurrent CPU tasks you would like to run. It will very likely be problematic to load up anything like all available threads.

EDIT: I just had a closer look at all the tasks in your tasks list. I had originally seen the 1 valid gamma-ray pulsar task (FGRPB1G GPU task) and had assumed the earlier failed tasks (Time Limit Exceeded) had also been FGRPB1G tasks. I now see they were actually Arecibo GPU tasks. I should have looked more closely. Sorry about that mistake.

I see that you have aborted a bunch of those as well. It looks like you should opt out of that particular search for this particular host (put each of your hosts in different locations and adjust the prefs accordingly) so that you don't continue to get them. It looks like you may be OK with the FGRPB1G tasks if you can repeat the crunch times and they continue to validate. You should still test things to make sure your machine can cope with the load.

Cheers,
Gary.

George

Joined: 25 Mar 14

Posts: 3

Credit: 1692559

RAC: 0

19 Oct 2019 15:52:39 UTC

Message 173946

Thank you Gary, that is a very detailed explanation. I now understand way more than I did before. I will experiment and see what happens.

Oliver

Joined: 22 Jul 05

Posts: 6

Credit: 918813

RAC: 0

23 Oct 2019 16:01:51 UTC

Message 173989

Linux Mint 19.2, AMD (x64), re-installed BOINC a few weeks back, and Milkyway, SETI and Asteroids all run fine. Yet, E@H stats are totally flat, even though it appears to be churning well. Also, I see that BOINCstats does not put E@H stats in gross total, even though I have 647k results. I see on this site that everything since early October was reported as "error while computing."

Ideas? Thanks

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

23 Oct 2019 17:03:15 UTC

Message 173990 in response to message 173989

Oliver wrote:

Linux Mint 19.2, AMD (x64), re-installed BOINC a few weeks back, and Milkyway, SETI and Asteroids all run fine. Yet, E@H stats are totally flat, even though it appears to be churning well. Also, I see that BOINCstats does not put E@H stats in gross total, even though I have 647k results. I see on this site that everything since early October was reported as "error while computing."

Ideas? Thanks

When you reinstalled BOINC, did you also "install" Einstein@home from the repository?
The reason I ask is that you're running an anonymous platform app, i.e. you supplied the app yourself instead of BOINC downloading the correct application from Einstein@home as is customary. That app is not completing tasks as it should, and you're trying to run tasks from a search aimed at Android devices and single board computers.

To fix this you need to either uninstall whatever you installed together with BOINC, or go to the BOINC data directory and then to /projects/einstein.phys.uwm.edu, where there will be a file called app_info.xml. Stop BOINC completely and then delete that file. When you restart BOINC, go to the Tasks tab and abort all tasks from Einstein@home, then go to the Projects tab, highlight Einstein@home and click the Update button. BOINC should then download new tasks from Einstein@home (if the cache isn't full) and the applications to run them.
Check your project preferences to select which searches to run tasks from.

As for stats check your privacy settings and make sure "Do you consent to exporting your data to BOINC statistics aggregation Web sites?" is set to Yes.
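
To check whether an app_info.xml is present before deleting anything, a short Python sketch along these lines may help. The project folder name is the one given above. The data directory path in the example is an assumption (package installs on Linux often use /var/lib/boinc-client, but yours may differ), and the function name is made up for illustration. The script only looks for the file and does not delete it.

```python
from pathlib import Path

# Project folder name as given in the post above.
PROJECT_DIR = "einstein.phys.uwm.edu"


def find_app_info(data_dir):
    """Return the path to app_info.xml under the project folder, or None."""
    candidate = Path(data_dir) / "projects" / PROJECT_DIR / "app_info.xml"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Assumed data directory; adjust to match your own install.
    data_dir = "/var/lib/boinc-client"
    found = find_app_info(data_dir)
    if found is None:
        print("No app_info.xml found under", data_dir)
    else:
        print("Anonymous platform file present:", found)
```

If the file is reported as present, stop BOINC completely first and then remove it by hand as described above.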

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118755576612

RAC: 21113507

23 Oct 2019 21:55:32 UTC

Message 173992 in response to message 173989

Oliver wrote:

... I see on this site that every thing since early October was reported as "error while computing."

If you choose one of your failed tasks and click on the task ID link for it, you can see some information about why the task failed. Here is a link to one such task. Scroll down to the bottom and check the error details given. Here is a small excerpt.

[01:12:33][31474][INFO ] Data processing finished successfully!
*** stack smashing detected ***: <unknown> terminated

[01:12:33][31474][ERROR] Application caught signal 6.

The term "stack smashing" refers to a buffer overflow condition in the program you are running. Google the term if you want more information about that.
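
To check other failed tasks for the same pattern, the stderr text can be searched for the markers that appear in the excerpt. A minimal Python sketch, with a made-up function name and the sample text copied from the excerpt above; it only matches these literal strings and knows nothing else about the log format:

```python
import re

# Matches the "caught signal" line and captures the signal number.
SIGNAL_RE = re.compile(r"Application caught signal (\d+)")


def summarize_stderr(text):
    """Report which of the markers from the excerpt appear in the text."""
    match = SIGNAL_RE.search(text)
    return {
        "finished": "Data processing finished successfully" in text,
        "stack_smashing": "stack smashing detected" in text,
        "signal": int(match.group(1)) if match else None,
    }


excerpt = """[01:12:33][31474][INFO ] Data processing finished successfully!
*** stack smashing detected ***: <unknown> terminated
[01:12:33][31474][ERROR] Application caught signal 6.
"""
print(summarize_stderr(excerpt))
```

On Linux, signal 6 is SIGABRT, which fits a program aborting itself after the overflow check fires.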

It seems that computations had finished successfully and perhaps the buffer overflow occurred while writing out the final results. Earlier on, it showed that you are deliberately running an app that doesn't create checkpoints. That seems a bit weird. You stand to lose all progress if you stop and restart crunching at any point, so why do that? Maybe you wouldn't have wasted so much crunch time if the stack smashing problem had shown up much earlier - for example when attempting to write the initial checkpoint. This is all just conjecture - you need to talk to whoever provided the anonymous platform app you are using.

Check the backtrace that follows the above error message. Notice the references to libpthread and libc. If you are using a fresh install of the OS with updated versions of build tools and libs, perhaps you just need a recompiled version of the app that is compatible with the new runtime environment. I'm not a programmer so this is just a guess on my part.

Cheers,
Gary.

Oliver

Joined: 22 Jul 05

Posts: 6

Credit: 918813

RAC: 0

26 Oct 2019 0:53:00 UTC

Message 174045

Greetings, oh knowledgeable ones.

So, I uninstalled E@H from the software manager, then used the terminal to delete the E@H-specific files and folders, reset the project, and it apparently downloaded the right software from this site. So, it's churning again, and the site shows valid results. We'll see about credit.

Thanks for the help.
