Redundant Result Cancellation

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0
Topic 197046

It's been a long time since I had a scenario like this crop up.

In this case, Computer 3677775 missed the deadline by about a day and a half, and a new result was spawned and sent out before it reported.

However, the host which got the resend has a monster cache of EAH and is running an Average Turnaround Time of 14.04 days at this point. In fact I'm not sure it will even get to this task before it runs over deadline, although it has been plowing through them and reporting completed tasks on a regular basis. I'm not even gonna ask why a host so 'close to the edge' is getting new work assigned, but it sure makes a case for why a 10 day cache setting can be a bad idea. ;-)

So the question is, I thought EAH had RRC enabled to deal with this kind of situation, and if it is, why hasn't this task gotten cancelled?

Pooh Bear 27
Joined: 20 Mar 05
Posts: 1376
Credit: 20312671
RAC: 0

If the unit has had any crunch time at all on the machine it was sent to, the server will not cancel it. Only a task that has never started can be cancelled; otherwise people would complain that their work time had been wasted.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Yes, I know that.

However, if you look at the host and WU in question, there were about a hundred or so tasks ahead of it in the queue when I checked earlier this morning. In addition, the task has been sitting in the queue since the 29th of June. So the odds it has any runtime on it are pretty low. On top of that, the owner seems to realize the host is in a 'jam' and has been culling out tasks which are at or near deadline for the last several days since I started observing this WU. IOW, I'm almost 100% sure there have been opportunities where it could have been sent a '221'.

Also, even though it's rarely used, projects do have the ability to send an unconditional abort command to the hosts, regardless of whether there would be any complaining about it.
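
To make the distinction concrete, here is a minimal Python sketch of the two abort styles discussed above. The names (`Task`, `should_abort`, `unconditional`) are made up for illustration and are not BOINC's actual scheduler code; the only rule taken from this thread is that a conditional abort applies only to a redundant task that has never started.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str           # hypothetical task label
    cpu_seconds: float  # crunch time already spent on the host
    redundant: bool     # True once the workunit no longer needs this result


def should_abort(task: Task, unconditional: bool = False) -> bool:
    """Decide whether a host should drop a task (illustrative only)."""
    if not task.redundant:
        return False              # still needed: never cancel
    if unconditional:
        return True               # 'big stick': abort even if work is lost
    return task.cpu_seconds == 0  # conditional: only if never started


queued = Task("resend_queued", cpu_seconds=0.0, redundant=True)
running = Task("resend_running", cpu_seconds=3600.0, redundant=True)

print(should_abort(queued))                       # conditional abort applies
print(should_abort(running))                      # spared: it has runtime
print(should_abort(running, unconditional=True))  # aborted regardless
```

Under these assumptions, the resend sitting untouched in a long queue would qualify for the conditional abort, while a task with any runtime would only be dropped by the unconditional one.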

paul milton
Joined: 16 Sep 05
Posts: 329
Credit: 35825044
RAC: 0

Quote:

(..snip..)

Also, even though it's rarely used, projects do have the ability to send an unconditional abort command to the hosts, regardless of whether there would be any complaining about it.

Einstein doesn't. They tried it, and it caused too much DB load.
See here: Message 125267

seeing without seeing is something the blind learn to do, and seeing beyond vision can be a gift.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

LOL...

Yes, I remember reading that. I guess it just goes to show when you want to use a 'big stick' you need to make sure you don't hit yourself in the head with it.

Although, it wasn't clear whether Bernd was referring to an Unconditional Abort or the 221 Conditional Abort. I assumed it was the Unconditional Abort he was talking about there.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2921554623
RAC: 946128

Quote:

Yes, I know that.

However, if you look at the host and WU in question, there were about a hundred or so tasks ahead of it in the queue when I checked earlier this morning. In addition, the task has been sitting in the queue since the 29th of June. So the odds it has any runtime on it are pretty low. On top of that, the owner seems to realize the host is in a 'jam' and has been culling out tasks which are at or near deadline for the last several days since I started observing this WU. IOW, I'm almost 100% sure there have been opportunities where it could have been sent a '221'.

Also, even though it's rarely used, projects do have the ability to send an unconditional abort command to the hosts, regardless of whether there would be any complaining about it.


Actually, I don't think the user has been culling tasks.

If you're referring to the task details

Client state	Aborted by user
Exit status	200 (0xc8)


that's an unfortunate incompatibility between an updated set of exit codes and the elderly web rendering code used by this project.

Referring to http://boinc.berkeley.edu/trac/browser/boinc-v2/lib/error_numbers.h, the current meaning of exit code 200 is 'EXIT_UNSTARTED_LATE' - in other words, the BOINC client aborted the task automatically when the deadline passed. Active culling by the user would get exit status 203, 'EXIT_ABORTED_VIA_GUI'.
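
For anyone who wants to decode the "Exit status" line on a task page without opening the header, a small lookup along these lines might help. It is only a sketch: the table holds just the codes that come up in this thread, the 202 name is recalled from the same header and should be checked against it, and `describe_exit_status` is a made-up helper, not part of BOINC.

```python
# Partial table of BOINC client exit codes (see lib/error_numbers.h).
# Only the codes discussed in this thread are listed; it is not complete.
EXIT_CODES = {
    200: "EXIT_UNSTARTED_LATE",      # client dropped an unstarted task past deadline
    202: "EXIT_ABORTED_BY_PROJECT",  # name recalled from the header, verify there
    203: "EXIT_ABORTED_VIA_GUI",     # user aborted the task by hand
}


def describe_exit_status(text: str) -> str:
    """Turn a task-page value such as '200 (0xc8)' into a readable label."""
    code = int(text.split()[0])  # leading decimal number
    name = EXIT_CODES.get(code, "not in this table")
    return f"{code} (0x{code:x}): {name}"


print(describe_exit_status("200 (0xc8)"))
print(describe_exit_status("203 (0xcb)"))
```

So a task page showing "Aborted by user" next to exit status 200 is better read as the client's own late-task cleanup than as manual culling.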

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

LOL...

Oh well, I was hoping there was a pilot at the controls...

UGHHHHHH....

I guess I'm really going to have to plow through all that source code boilerplate (or schedule a root canal)! :-D

Why can't those codes be in the WIKI? Oh... silly me, that kind of info is much too confusing and dangerous for mere users to know anything about! ;-)

So I guess that means both project-initiated cases would show as a '202'.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2921554623
RAC: 946128

Quote:

LOL...

Oh well, I was hoping there was a pilot at the controls...

UGHHHHHH....

I guess I'm really going to have to plow through all that source code boilerplate (or schedule a root canal)! :-D

Why can't those codes be in the WIKI? Oh... silly me, that kind of info is much too confusing and dangerous for mere users to know anything about! ;-)

So I guess that means both project-initiated cases would show as a '202'.


Yes - we actually had a few examples of '202' outcomes in the thread which started me looking into this code - Status 'Cancelled by server' changed

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Quote:

Yes - we actually had a few examples of '202' outcomes in the thread which started me looking into this code - Status 'Cancelled by server' changed

LOL...

Good thing I've had more free time open up lately.

I was 'chasing the links' you posted and discovered why one of my other hosts all of a sudden stopped DL'ing EAH work because of Disk Space Exceeded.

At the time I just said, "OK, increase the limit and be done with it!", but while reading Gary's thesis from a couple of years ago about "End of Science Run v. Locality Scheduling v. WUG v. Bandwidth Burned v. A Whole Plethora of Other Interesting Phenomena", dawn broke on Marblehead as to what the story was with that!

Now I'm dusting off my list of BOINC pet peeves again with an eye to getting to the bottom of them once and for all! ;-)
