Black hole for tasks

Alicia, Dan, and Marv
Joined: 4 Jul 07
Posts: 7
Credit: 525867
RAC: 0
Topic 193248

I noticed that one of the tasks I completed about a month ago is still "pending" credit, so I checked it out. It is assigned to computer 964318. In pulling the thread I find that computer 964318 has a huge pile of assigned tasks and no recent completions; its last task was completed about a month ago. It appears there's a problem with this machine. The owner is "Anonymous". I don't know whether you or the owner are aware of the situation.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

Black hole for tasks

There's really nothing the project can do. Eventually the WUs will time out and be reassigned; meanwhile, all the failures will slowly reduce the number of WUs the busted machine gets per day to only 1.

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

That's a funny coincidence: I came across the exact same computer yesterday when checking an old pending R2 result. I concluded it was of little significance now, since its CPU quota has reached 1 per day. It's still a good example of why a quota of 72 WUs per CPU is too high, and has been too high ever since S5R1 ended. That computer has collected 412 WUs, most of them just sitting there waiting for timeout. With a somewhat smaller quota there's still some room for beta-app testing and experimenting without too much danger of running out of quota (except for those who keep a 5-10 day work cache).

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 690776380
RAC: 271517

Message 74730 in response to message 74729

Quote:
... With a somewhat smaller quota there's still some room for beta-app testing and experimenting without too much danger of running out of quota (except for those who keep a 5-10 day work cache).

A work cache of 5-10 days is nothing unusual, though. I think the work quota was raised some time ago as a result of public demand, although work units must have been somewhat smaller then.

CU
Bikeman

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

Yes, I was one of the public wanting a higher quota during that time, when the shortest WUs completed in around 20 minutes. But now, with the shortest WUs taking 5 hours, that's not an issue. Let's say a Penryn Extreme Edition can be clocked high enough to do it in 4 hours: that's still only 6 results per CPU per day, so a quota of 32 would be more generous now than 72 was back then.
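
The arithmetic behind that comparison can be checked with a quick sketch. It assumes a CPU crunches around the clock and uses only the runtimes and quotas mentioned in this post; the function name is made up for illustration.

```python
def days_of_work(quota_per_day: int, hours_per_wu: float) -> float:
    """Days of nonstop crunching that one day's quota represents for one CPU."""
    results_per_day = 24.0 / hours_per_wu  # most results one CPU can finish in a day
    return quota_per_day / results_per_day


# Back then: shortest WUs took about 20 minutes and the quota was 72.
then_72 = days_of_work(72, 20 / 60)
# Now: suppose a fast machine finishes the shortest WU in 4 hours, with a quota of 32.
now_32 = days_of_work(32, 4)

print(f"quota 72 at 20 min/WU: {then_72:.2f} days of work")
print(f"quota 32 at 4 h/WU:    {now_32:.2f} days of work")
```

On these numbers a quota of 72 covered about one day of nonstop work at 20 minutes per WU, while a quota of 32 would cover more than five days at 4 hours per WU.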

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

People are pushing engineering-sample Penryns to over 4.5 GHz on air. I don't have a Conroe, so I'm not sure what that would equate to in terms of runtime per WU.

Erik
Joined: 14 Feb 06
Posts: 2815
Credit: 2645600
RAC: 0

Message 74733 in response to message 74730

Quote:
Quote:
... With a somewhat smaller quota there's still some room for beta-app testing and experimenting without too much danger of running out of quota (except for those who keep a 5-10 day work cache).

A work cache of 5-10 days is nothing unusual, though. I think the work quota was raised some time ago as a result of public demand, although work units must have been somewhat smaller then.

CU
Bikeman

The work units were quite a bit smaller, and some people's computers were hitting the quota of 32, particularly those using Akos apps. Maybe the project should consider lowering the quota from 72?

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059454931
RAC: 1217464

Message 74734 in response to message 74732

Quote:
People are pushing engineering-sample Penryns to over 4.5 GHz on air. I don't have a Conroe, so I'm not sure what that would equate to in terms of runtime per WU.


Conroe at 3 GHz is 8 to 9 hours for my diet of recent S5R3. Penryn claims some clocks-per-instruction improvement, has larger caches, and may, eventually, gain from SSE4. Even so, getting down below 4 hours for the results I'm processing now looks iffy to me.

What I don't know is how much S5R3 varies in compute requirement at the various frequencies. (Is that the right name for the second field in the result name, with values like 368.55?) Across S5R2 it varied a lot, and my hosts have only seen a few frequencies.

By the way, when you return bad ones, you get limited much more quickly than when you are a black hole. My host that stumbled on the 4.11 bug that just got fixed had a small queue, so it held just one or at most two Einstein units at a time. After a total of under 45 minutes of downloading a result, starting it up, bombing out, and getting another one on a 75-second cycle, it stopped. It had started with the 72/day limit, but each bombed return reduced the limit, so when it had done 35, it was told it was overdrawn for its (reduced) limit for the day.

At this minute, it has a limit of 34, having bombed after 30,000 seconds running 4.07.
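
The self-limiting behaviour described above can be played through in a few lines. This is only a sketch under a stated assumption: that each failed result lowers the host's daily quota by one (never below 1), and that the server stops sending once the number sent that day reaches the current quota. The real scheduler's rule may differ, and the function and parameter names are invented for illustration.

```python
def results_until_cutoff(start_quota: int = 72, penalty: int = 1) -> int:
    """Count the results a host burns through when every one errors out at once.

    Assumed rule (not taken from the server code): each failure lowers the
    daily quota by `penalty`, with a floor of 1, and no more work is sent
    once the number sent today reaches the current quota.
    """
    quota = start_quota
    sent = 0
    while sent < quota:
        sent += 1                        # the server sends one result
        quota = max(1, quota - penalty)  # it fails, so the quota shrinks
    return sent


burned = results_until_cutoff()
minutes = 35 * 75 / 60  # the 35 results and 75-second cycle reported above
print(burned, minutes)
```

Under that assumed rule the host is cut off after about half of its starting quota, which is in the same range as the 35 reported here, and 35 results on a 75-second cycle come to a little under 44 minutes.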

I'm not arguing a particular side here, just giving a little data.

Now I will argue a point for a moment. One of the really painful things for me on the SETI forums is the violence with which some participants argue that other participants should be thrown out (pick your cause: overclaiming clients, underclaiming clients, optimized applications, outdated clients, excessive queues, inadequate queues...). I hope Einstein remains a place where orderly, reasoned disagreement and discussion (hoping this thread is an example) does not metastasize into that sort of thing.

Peace,
Peter

Jim Bailey
Joined: 31 Aug 05
Posts: 91
Credit: 1452829
RAC: 0

It can be somewhat irritating at times. (Now there's an understatement.) If you can't contact the person, there's not much else you can do except grin and bear it. It's not worth getting your BP all cranked up; they will get done when they get done, and not before. I have several pending right now that might be done this month, or next month. At least I hope they will!

I too hope that the boards here will remain as they have been: quiet, easygoing, and pleasant.

Alicia, Dan, and Marv
Joined: 4 Jul 07
Posts: 7
Credit: 525867
RAC: 0

I guess my question really is why the 412 tasks assigned to this machine aren't or can't be reassigned to a "currently reliable" machine before they "time out" with the "black hole"?

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

One reason not to arbitrarily issue another result before the deadline is that one of the two in-progress wingmen will be wasting their time (and power) running a trailer if both report on time.

The only way to avoid that is if you're running 5.10.something (I forget the exact build number), which has the 221 Auto Abort Redundant Results capability, the project has 221 functionality enabled, and the result has never been started at all.

Alinator
