Results showing "Aborted by user"

Michael Robertson

Joined: 5 Nov 12

Posts: 18

Credit: 89478168

RAC: 0

26 Apr 2015 17:45:50 UTC

Topic 198066

Apologies if this is covered elsewhere and I have missed it.

Since upgrading to the newest BOINC client, the results from my main crunching box are all showing as "Aborted by user." I have changed no other settings either to the software or on the machine, and needless to say I have not manually aborted any of them.

Is this a known error/behavior? Does anyone have any troubleshooting suggestions?

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2992796135

RAC: 709988

Results showing "Aborted by user"

26 Apr 2015 20:01:01 UTC

Message 132313

This is actually a mis-reported error message, due to the old version of the website server code in use at this project.

The individual tasks each show "Exit status 200 (0xc8)", and looking at a list of BOINC exit codes rather newer than this site displays, that is

#define EXIT_UNSTARTED_LATE 200

- in other words, your computer hadn't even started running the tasks before the deadline was reached. (You can check that the supposed 'abort' happened a few seconds after the deadline for each task.)

I'm afraid you'll have to look at BOINC Manager locally, especially the Event Log, to see if you can find a reason why the tasks aren't even being started - there's no clue in the task reports I can see.
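For reference, the exit statuses that come up in this thread can be decoded with a small lookup. This is only a sketch: the table holds just the three codes named in this thread (the real BOINC list is longer), and the helper name `describe_exit_status` is made up for illustration.

```python
# Exit codes named in this thread only; the real BOINC list is longer.
BOINC_EXIT_CODES = {
    196: "EXIT_DISK_LIMIT_EXCEEDED",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
    200: "EXIT_UNSTARTED_LATE",
}


def describe_exit_status(status: int) -> str:
    """Format a status as the task page shows it (decimal and hex), plus its name."""
    name = BOINC_EXIT_CODES.get(status, "unknown (not in this short table)")
    return f"{status} (0x{status:x}) {name}"


print(describe_exit_status(200))
print(describe_exit_status(197))
```

The hex form is the same number the task page shows in brackets, so 0xc8 and 200 refer to the same code.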

Michael Robertson

Joined: 5 Nov 12

Posts: 18

Credit: 89478168

RAC: 0

Thank you for the info...but

26 Apr 2015 22:08:40 UTC

Message 132314 in response to message 132313

Thank you for the info...but that's rather odd, since the machine has been sitting there cranking out work units the entire time. There was at one point a rather large backlog of completed results to be submitted, but a simple restart remedied that.

I'll dig around and see what I can find. If nothing obvious presents itself, I'll simply reinstall and see if that takes care of it.

mikey

Joined: 22 Jan 05

Posts: 12833

Credit: 1884143453

RAC: 929814

RE: Thank you for the

27 Apr 2015 10:37:15 UTC

Message 132315 in response to message 132314

Quote:

Thank you for the info...but that's rather odd, since the machine has been sitting there cranking out work units the entire time. There was at one point a rather large backlog of completed results to be submitted, but a simple restart remedied that.

I'll dig around and see what I can find. If nothing obvious presents itself, I'll simply reinstall and see if that takes care of it.

Could your cache be too big for the latest units? You are taking about 5 days to return your units, and some of the latest units seem to have short deadlines, i.e.:
Sent 25 Apr 2015 23:31:45 UTC, deadline 2 May 2015 23:31:45 UTC

whereas other units have deadlines like:
Sent 25 Apr 2015 23:31:45 UTC, deadline 9 May 2015 23:31:45 UTC

It looks like the GPU units have shorter deadlines than the CPU units, which could be a problem since BOINC doesn't differentiate between them when it comes to the cache.
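To put numbers on that, here is a rough sketch of the slack left for each kind of unit, using the two sent/deadline pairs quoted above and the "about 5 days" turnaround. The helper name `slack` is made up, and the turnaround is treated as a fixed figure, which is a simplification.

```python
from datetime import datetime, timedelta

sent = datetime(2015, 4, 25, 23, 31, 45)
short_deadline = datetime(2015, 5, 2, 23, 31, 45)
long_deadline = datetime(2015, 5, 9, 23, 31, 45)

# "About 5 days" to return a unit, taken from the post above (approximate).
turnaround = timedelta(days=5)


def slack(sent_at: datetime, deadline: datetime, turnaround_time: timedelta) -> timedelta:
    """Time left over if a unit comes back turnaround_time after it was sent."""
    return (deadline - sent_at) - turnaround_time


print("short deadline window:", short_deadline - sent)
print("short deadline slack: ", slack(sent, short_deadline, turnaround))
print("long deadline slack:  ", slack(sent, long_deadline, turnaround))
```

With these figures the short-deadline units have a 7-day window and roughly 2 days of slack, against 9 days for the long-deadline ones, so any extra delay hits the short-deadline units first.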

Michael Robertson

Joined: 5 Nov 12

Posts: 18

Credit: 89478168

RAC: 0

I doubt that is the issue, as

27 Apr 2015 12:27:11 UTC

Message 132316 in response to message 132315

I doubt that is the issue, as this machine has been crunching in the vicinity of a 100,000 pt/day clip since I added it...right up until the newest update of BOINC.

I am still trying to determine the cause of the problem. I uninstalled and reinstalled the latest version, and though the client claimed work units were processing--as it has since the update--a further check of CPU activity showed that no actual work was being done. I then uninstalled again, deleted all data, reinstalled the previous version of BOINC, and things are back to normal.

When I have some time I will upgrade again and see if the problems repeat. Perhaps it's an issue with BOINC.

Moldr

Joined: 3 Apr 15

Posts: 11

Credit: 145119

RAC: 0

Have you updated to Yosemite

27 Apr 2015 12:45:53 UTC

Message 132317 in response to message 132316

Have you updated to Yosemite recently? There have been problems with GPU tasks.

Quote:

For now we limited the OSX OpenCL ATI app versions (i.e. plan classes) for BRP4G and BRP6 to below Yosemite. Beta app versions are available for both apps for testing, so if you want to try you may enable Beta test App versions in your E@H preferences - at your own risk.

BM

This was posted in this thread, http://einsteinathome.org/node/198054. I haven't had any luck with the beta versions either. So I'm just running CPU tasks for now.

What version of BOINC are you running?

Michael Robertson

Joined: 5 Nov 12

Posts: 18

Credit: 89478168

RAC: 0

That particular machine has

27 Apr 2015 17:14:56 UTC

Message 132318 in response to message 132317

That particular machine has been running Yosemite since the day I added it, and the newest version of BOINC was not processing either CPU or GPU tasks. Prior to the BOINC update, the two GPUs were responsible for the majority of my WUs.

I don't have access to that box at the moment, but the BOINC versions I'm referencing are the last two releases--revisions 36 and 42, maybe? Something like that.

WB8ILI

Joined: 20 Feb 05

Posts: 45

Credit: 1182065369

RAC: 2070349

Michael - Look again at

28 Apr 2015 12:19:43 UTC

Message 132319

Michael -

Look again at your cache size (mentioned earlier). I have had your issue also.

Einstein can, and does, download more work than can be accomplished before the deadlines.

A simple example - you have your cache set at 10 days. Each task takes 24 hours - so you want, and have, 10 days of work. But, if 8 of the tasks have due dates only 6 days from now, all of the work can't possibly be accomplished before the due dates. BOINC will abort some of your tasks when it realizes the problem.
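The example above can be checked with a few lines of arithmetic. This is a simplified sketch, not how the BOINC scheduler works internally: it assumes one task runs at a time, tasks run in earliest-deadline-first order, and late tasks still use up their full runtime. The 14-day deadline for the other two tasks and the name `count_late` are made up for illustration.

```python
def count_late(tasks):
    """Count tasks that finish after their deadline.

    tasks is a list of (runtime_days, deadline_days) pairs, with deadlines
    measured from now. Tasks run one at a time, earliest deadline first.
    """
    clock = 0.0
    late = 0
    for runtime, deadline in sorted(tasks, key=lambda task: task[1]):
        clock += runtime
        if clock > deadline:
            late += 1
    return late


# Ten one-day tasks: eight due in 6 days, two due in 14 days (assumed).
queue = [(1.0, 6.0)] * 8 + [(1.0, 14.0)] * 2
print(count_late(queue))
```

Under these assumptions only six of the eight short-deadline tasks can finish inside 6 days, so two of them miss the deadline however the queue is ordered.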

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2992796135

RAC: 709988

I didn't work my way all the

28 Apr 2015 12:48:29 UTC

Message 132320

I didn't work my way all the way back to the start of the problem, but if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand. All tasks backed up behind it will get later and later and later, because the stalled one will block everything up for much longer than the estimated runtime. If the cache was anywhere near full at the time, the tail-end charlies won't have a chance of making it in time.

archae86

Joined: 6 Dec 05

Posts: 3161

Credit: 7315105022

RAC: 2301819

Richard Haselgrove wrote:if

28 Apr 2015 15:18:20 UTC

Message 132321 in response to message 132320

Richard Haselgrove wrote:

if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand...

the tail-end charlies won't have a chance of making it in time.

There is an extra-special contribution from a single super-long duration task (however it happens). The estimated time for all the remaining tasks gets boosted as soon as the unusually long-completing task finishes, not by a small averaged-in adjustment, but to the full effect of the single slow observation.

Recovery begins as soon as the first task with normal completion time is done, but for "faster than currently predicted" completion the programming responds intentionally slowly, whereas if something is slower than expected by an appreciable margin (don't know the current definition of appreciable, but maybe something like 20%), then the new prediction is bumped all the way up.

Details aside, this sort of thing is part of the risk profile of large work queues, and one of the reasons why it is especially wise to work down one's queue (by suspending fetch) before making changes.
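The asymmetric behaviour described above can be sketched roughly as follows. This is not the BOINC source: the 20% jump margin is only the guess made above, the 10% smoothing step is an assumption, and the function name is made up. The factor multiplies the runtime estimate for the remaining tasks.

```python
def update_estimate_factor(current, observed_ratio, jump_margin=0.2, smoothing=0.1):
    """Return the new correction factor after one task completes.

    observed_ratio is actual runtime divided by the uncorrected estimate.
    jump_margin and smoothing are illustrative guesses, not BOINC constants.
    """
    if observed_ratio > current * (1.0 + jump_margin):
        # Much slower than predicted: jump straight to the observation.
        return observed_ratio
    # Otherwise drift slowly toward the observation.
    return current + smoothing * (observed_ratio - current)


factor = 1.0
# One task takes five times its estimate, then four tasks run normally.
for ratio in [5.0, 1.0, 1.0, 1.0, 1.0]:
    factor = update_estimate_factor(factor, ratio)
    print(round(factor, 3))
```

One slow task pushes the factor straight up to its full ratio, while each normal task afterwards only pulls it back down a little, which is why estimates for a large queue stay inflated for some time.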

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2992796135

RAC: 709988

RE: Richard Haselgrove

28 Apr 2015 16:07:54 UTC

Message 132322 in response to message 132321

Quote:

Richard Haselgrove wrote:
if tasks stopped running at all after one or other upgrade...

The first task will have run for hours and hours, spinning its wheels in the sand...

the tail-end charlies won't have a chance of making it in time.

There is an extra-special contribution from a single super-long duration task (however it happens). The estimated time for all the remaining tasks gets boosted as soon as the unusually long-completing task finishes, not by a small averaged-in adjustment, but to the full effect of the single slow observation.

Recovery begins as soon as the first task with normal completion time is done, but for "faster than currently predicted" completion the programming responds intentionally slowly, whereas if something is slower than expected by an appreciable margin (don't know the current definition of appreciable, but maybe something like 20%), then the new prediction is bumped all the way up.

Details aside, this sort of thing is part of the risk profile of large work queues, and one of the reasons why it is especially wise to work down one's queue (by suspending fetch) before making changes.

That's certainly the design of DCF, as still in use at this project. But IIRC, DCF only updates on successful task completion, so that wheel-spinner wouldn't trigger that characteristic sawtooth uptick in runtime estimates for the following tasks, when - I suspect - it's killed by BOINC for 'maximum time exceeded'. But I suspect we all need a refresher course in how BOINC used to work (and how it works now, which is quite different) - preferably with a working example open on screen in front of us.

And I agree with keeping a modest cache size, not only when planning upgrades or other changes.

Edit - see, for example, task 495024967 from the OP (they're easier to find now that most have been purged)

Outcome	Client error
Client state	Compute error
Exit status	197 (0xc5)
Computer ID	11704272
Report deadline	4 May 2015 3:51:50 UTC
Run time	83,263.30
CPU time	25,913.31

Exit 197 is indeed EXIT_TIME_LIMIT_EXCEEDED, from the list I posted in the adjacent thread about EXIT_DISK_LIMIT_EXCEEDED (196).
