Results showing "Aborted by user"

Michael Robertson
Joined: 5 Nov 12
Posts: 18
Credit: 89,478,168
RAC: 0
Topic 198066

Apologies if this is covered elsewhere and I have missed it.

Since upgrading to the newest BOINC client, the results from my main crunching box are all showing as "Aborted by user." I have changed no other settings either to the software or on the machine, and needless to say I have not manually aborted any of them.

Is this a known error/behavior? Does anyone have any troubleshooting suggestions?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 961,316,522
RAC: 1,476,345

Results showing "Aborted by user"

This is actually a mis-reported error message, due to the old version of the website server code in use at this project.

The individual tasks each show "Exit status 200 (0xc8)", and looking at a list of BOINC exit codes rather newer than the one this site displays, that is

#define EXIT_UNSTARTED_LATE 200

- in other words, your computer hadn't even started running the tasks before the deadline was reached. (You can check that the supposed 'abort' happened a few seconds after the deadline for each task.)
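For reference, the exit codes that come up in this thread can be collected into a small lookup. This is a sketch using only the values quoted in this discussion; the authoritative list lives in BOINC's own source.

```python
# BOINC client exit codes mentioned in this thread.
# Values as quoted in the discussion; treat BOINC's source as authoritative.
BOINC_EXIT_CODES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",  # task exceeded its disk allowance
    197: "EXIT_TIME_LIMIT_EXCEEDED",  # task exceeded its maximum elapsed time
    200: "EXIT_UNSTARTED_LATE",       # task never started before its deadline
}

def describe_exit_status(code):
    """Translate a numeric exit status into the readable form shown on task pages."""
    name = BOINC_EXIT_CODES.get(code, "UNKNOWN")
    return "Exit status %d (0x%x): %s" % (code, code, name)

print(describe_exit_status(200))  # Exit status 200 (0xc8): EXIT_UNSTARTED_LATE
```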

I'm afraid you'll have to look at BOINC Manager locally, especially the Event Log, to see if you can find a reason why the tasks aren't even being started - there's no clue in the task reports I can see.

Michael Robertson
Joined: 5 Nov 12
Posts: 18
Credit: 89,478,168
RAC: 0

Thank you for the info...but

Thank you for the info...but that's rather odd, since the machine has been sitting there cranking out work units the entire time. There was at one point a rather large backlog of completed results to be submitted, but a simple restart remedied that.

I'll dig around and see what I can find. If nothing obvious presents itself, I'll simply reinstall and see if that takes care of it.

mikey
Joined: 22 Jan 05
Posts: 8,422
Credit: 642,277,383
RAC: 177,412

RE: Thank you for the

Quote:

Thank you for the info...but that's rather odd, since the machine has been sitting there cranking out work units the entire time. There was at one point a rather large backlog of completed results to be submitted, but a simple restart remedied that.

I'll dig around and see what I can find. If nothing obvious presents itself, I'll simply reinstall and see if that takes care of it.

Could your cache be too big for the latest units? You are taking about 5 days to return your units, and some of the latest units seem to have short deadlines, e.g.:
25 Apr 2015 23:31:45 UTC 2 May 2015 23:31:45 UTC

whereas other units have deadlines like:
25 Apr 2015 23:31:45 UTC 9 May 2015 23:31:45 UTC

It looks like the GPU units have shorter deadlines than the CPU units, which could be a problem since BOINC doesn't differentiate between them when it comes to the cache.
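The gap between those two deadline windows is easy to check. A throwaway sketch, with the dates copied from the lines above (UTC suffix dropped):

```python
from datetime import datetime

FMT = "%d %b %Y %H:%M:%S"

def window_days(sent, due):
    """Days between the issue time and the report deadline."""
    return (datetime.strptime(due, FMT) - datetime.strptime(sent, FMT)).days

short = window_days("25 Apr 2015 23:31:45", "2 May 2015 23:31:45")  # 7 days
long_ = window_days("25 Apr 2015 23:31:45", "9 May 2015 23:31:45")  # 14 days
```

With a host already taking about 5 days to return results, a multi-day cache leaves almost no slack inside a 7-day window.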

Michael Robertson
Joined: 5 Nov 12
Posts: 18
Credit: 89,478,168
RAC: 0

I doubt that is the issue, as

I doubt that is the issue, as this machine has been crunching in the vicinity of a 100,000 pt/day clip since I added it...right up until the newest update of BOINC.

I am still trying to determine the cause of the problem. I uninstalled and reinstalled the latest version, and though the client claimed work units were processing--as it has since the update--a further check of CPU activity showed that no actual work was being done. I then uninstalled again, deleted all data, reinstalled the previous version of BOINC, and things are back to normal.

When I have some time I will upgrade again and see if the problems repeat. Perhaps it's an issue with BOINC.

Moldr
Joined: 3 Apr 15
Posts: 11
Credit: 145,119
RAC: 0

Have you updated to Yosemite

Have you updated to Yosemite recently? There have been problems with GPU tasks.

Quote:

For now we limited the OSX OpenCL ATI app versions (i.e. plan classes) for BRP4G and BRP6 to below Yosemite. Beta app versions are available for both apps for testing, so if you want to try you may enable Beta test App versions in your E@H preferences - at your own risk.

BM

This was posted in this thread, http://einsteinathome.org/node/198054. I haven't had any luck with the beta versions either. So I'm just running CPU tasks for now.

What version of BOINC are you running?

Michael Robertson
Joined: 5 Nov 12
Posts: 18
Credit: 89,478,168
RAC: 0

That particular machine has

That particular machine has been running Yosemite since the day I added it, and the newest version of BOINC was not processing either CPU or GPU tasks. Prior to the BOINC update, the two GPUs were responsible for the majority of my WUs.

I don't have access to that box at the moment, but the BOINC versions I'm referencing are the last two releases--revisions 36 and 42, maybe? Something like that.

WB8ILI
Joined: 20 Feb 05
Posts: 45
Credit: 607,612,345
RAC: 361,660

Michael - Look again at

Michael -

Look again at your cache size (mentioned earlier). I have had your issue also.

Einstein can, and does, download more work than can be accomplished before the deadlines.

A simple example - you have your cache set at 10 days. Each task takes 24 hours - so you want, and have, 10 days of work. But, if 8 of the tasks have due dates only 6 days from now, all of the work can't possibly be accomplished before the due dates. BOINC will abort some of your tasks when it realizes the problem.
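The example above can be put into a tiny model. This is a hedged sketch, not BOINC's actual scheduler: one core, tasks run earliest-deadline-first, and we count the ones that would finish late.

```python
from datetime import datetime, timedelta

def late_tasks(task_hours, deadlines, now):
    """Simulate running tasks earliest-deadline-first on a single core
    and return how many would miss their deadline. A toy model of the
    situation described above, not BOINC's real scheduler."""
    clock = now
    missed = 0
    for hours, deadline in sorted(zip(task_hours, deadlines), key=lambda p: p[1]):
        clock += timedelta(hours=hours)
        if clock > deadline:
            missed += 1
    return missed

# The example above: a 10-day cache of 24-hour tasks,
# but 8 of the 10 tasks are due only 6 days from now.
now = datetime(2015, 4, 25)
hours = [24.0] * 10
due = [now + timedelta(days=6)] * 8 + [now + timedelta(days=14)] * 2
print(late_tasks(hours, due, now))  # 2 tasks cannot make their deadlines
```

Even with perfect ordering, the last two of the short-deadline tasks finish on days 7 and 8, past their day-6 due date, so BOINC would have to abort or miss them.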

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 961,316,522
RAC: 1,476,345

I didn't work my way all the

I didn't work my way all the way back to the start of the problem, but if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand. All tasks backed up behind it will get later and later and later, because the stalled one will block everything up for much longer than the estimated runtime. If the cache was anywhere near full at the time, the tail-end charlies won't have a chance of making it in time.

archae86
Joined: 6 Dec 05
Posts: 3,064
Credit: 5,771,833,717
RAC: 3,899,116

Richard Haselgrove wrote:if

Richard Haselgrove wrote:

if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand...

the tail-end charlies won't have a chance of making it in time.


There is an extra-special contribution from a single super-long duration task (however it happens). The estimated time for all the remaining tasks gets boosted as soon as the unusually long-completing task finishes, not by a small averaged-in adjustment, but to the full effect of the single slow observation.

Recovery begins as soon as the first task with normal completion time is done, but for "faster than currently predicted" completion the programming responds intentionally slowly, whereas if something is slower than expected by an appreciable margin (don't know the current definition of appreciable, but maybe something like 20%), then the new prediction is bumped all the way up.

Details aside, this sort of thing is part of the risk profile of large work queues, and one of the reasons why it is especially wise to work down one's queue (by suspending fetch) before making changes.
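The asymmetric response described above can be sketched as a simple update rule. This illustrates the idea only, not BOINC's actual duration-correction code; the 10% step is a placeholder, since the real threshold and gain may differ.

```python
def update_dcf(dcf, estimated_secs, actual_secs, down_gain=0.1):
    """Duration-correction-factor style update: if a task runs longer than
    the current prediction, jump the factor up to match it immediately;
    if it runs faster, ease the factor down only gradually."""
    ratio = actual_secs / estimated_secs
    if ratio > dcf:
        return ratio                        # slow task: full, immediate bump
    return dcf + down_gain * (ratio - dcf)  # fast task: slow decay

dcf = 1.0
dcf = update_dcf(dcf, 3600, 36000)  # one stalled task runs 10x the estimate
# dcf is now 10.0: every remaining runtime estimate is inflated tenfold
dcf = update_dcf(dcf, 3600, 3600)   # a normal task completes on estimate
# dcf eases down only slightly, toward ~9.1
```

This is why a single wheel-spinning task can push every queued task's predicted completion past its deadline, while recovery after normal tasks resume is deliberately gradual.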

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 961,316,522
RAC: 1,476,345

RE: Richard Haselgrove

Quote:
Richard Haselgrove wrote:

if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand...

the tail-end charlies won't have a chance of making it in time.


There is an extra-special contribution from a single super-long duration task (however it happens). The estimated time for all the remaining tasks gets boosted as soon as the unusually long-completing task finishes, not by a small averaged-in adjustment, but to the full effect of the single slow observation.

Recovery begins as soon as the first task with normal completion time is done, but for "faster than currently predicted" completion the programming responds intentionally slowly, whereas if something is slower than expected by an appreciable margin (don't know the current definition of appreciable, but maybe something like 20%), then the new prediction is bumped all the way up.

Details aside, this sort of thing is part of the risk profile of large work queues, and one of the reasons why it is especially wise to work down one's queue (by suspending fetch) before making changes.


That's certainly the design of DCF, as still in use at this project. But IIRC, DCF only updates on successful task completion, so that wheel-spinner wouldn't trigger that characteristic sawtooth uptick in runtime estimates for the following tasks, when - I suspect - it's killed by BOINC for 'maximum time exceeded'. But I suspect we all need a refresher course in how BOINC used to work (and how it works now, which is quite different) - preferably with a working example open on screen in front of us.

And I agree with keeping a modest cache size, not only when planning upgrades or other changes.

Edit - see, for example, task 495024967 from the OP (they're easier to find now that most have been purged)

Outcome	Client error
Client state	Compute error
Exit status	197 (0xc5)
Computer ID	11704272
Report deadline	4 May 2015 3:51:50 UTC
Run time	83,263.30
CPU time	25,913.31


Exit 197 is indeed EXIT_TIME_LIMIT_EXCEEDED, from the list I posted in the adjacent thread about EXIT_DISK_LIMIT_EXCEEDED (196).
