Petition - Deadline Relief for Longest Results

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4273

Credit: 245220413

RAC: 12880

RE: RE: The other thing

18 Jul 2007 12:14:32 UTC

Message 70034 in response to message 70030

Quote:

Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.

But doesn't that take a backend update as well (which is needed here)?

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that, while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

ohiomike

Joined: 4 Nov 06

Posts: 80

Credit: 6453639

RAC: 0

RE: RE: RE: The other

18 Jul 2007 13:40:01 UTC

Message 70035 in response to message 70034

Quote:

Quote:
Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.

But doesn't that take a backend update as well (which is needed here)?

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that, while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM

There is no work/credit lost. WU's are not aborted if they have begun to run. Only WU's in the client's queue that are no longer needed are aborted. All in all it is a good thing because almost 100% of the WU's crunched are used. The sending of the "trailer" and the book-keeping overhead might be a pain for the project, however.
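To make the rule concrete, here is a rough Python sketch of the policy as described in this post: send 3 copies, wait for a quorum of 2, then cancel only copies that are still sitting in a host's queue. This is not BOINC's actual server code. The class names, state names and host names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    host: str  # hypothetical host name
    state: str = "queued"  # queued, running, returned, or cancelled


@dataclass
class Workunit:
    quorum: int = 2
    replicas: list = field(default_factory=list)

    def start(self, host):
        """The host begins crunching its copy; from here on it is never cancelled."""
        for r in self.replicas:
            if r.host == host and r.state == "queued":
                r.state = "running"

    def report(self, host):
        """The host returns its result; once quorum is met, cancel copies still queued."""
        for r in self.replicas:
            if r.host == host and r.state in ("queued", "running"):
                r.state = "returned"
        done = sum(r.state == "returned" for r in self.replicas)
        if done >= self.quorum:
            for r in self.replicas:
                if r.state == "queued":
                    r.state = "cancelled"

    def states(self):
        return {r.host: r.state for r in self.replicas}


def demo(third_started):
    """Send 3 copies, let two fast hosts report, and see what happens to the third."""
    wu = Workunit(
        quorum=2,
        replicas=[Replica("fast_a"), Replica("fast_b"), Replica("slow_c")],
    )
    wu.start("fast_a")
    wu.start("fast_b")
    if third_started:
        wu.start("slow_c")
    wu.report("fast_a")
    wu.report("fast_b")
    return wu.states()
```

With `demo(False)` the third copy is cancelled before it costs any CPU time. With `demo(True)` it keeps running and eventually comes back as the "trailer".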

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 689310808

RAC: 217272

RE: RE: RE: RE: The

18 Jul 2007 13:58:39 UTC

Message 70036 in response to message 70035

Quote:

Quote:
Quote:
Quote:
The other thing that SETI does that would help here is the initial replication of 3 with a quorum of 2. With the new BOINC software (5.8.x and up), the software will send 3, wait for the first 2 results and then cancel the 3rd WU if the host has not started it yet. We could eliminate the 45-60 day waits some people have gotten for credit that way.
Off topic: We could also send smaller, more reasonable WU's that don't scare people off.

But doesn't that take a backend update as well (which is needed here)?

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that, while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM

There is no work/credit lost. WU's are not aborted if they have begun to run. Only WU's in the client's queue that are no longer needed are aborted. All in all it is a good thing because almost 100% of the WU's crunched are used. The sending of the "trailer" and the book-keeping overhead might be a pain for the project, however.

But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.

BRM

Nothing But Idl...

Joined: 24 Aug 05

Posts: 158

Credit: 289204

RAC: 0

RE: But if 3 WUs get

18 Jul 2007 17:34:01 UTC

Message 70037 in response to message 70036

Quote:

But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.
BRM

Strongly agree.

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: But if 3 WUs get

18 Jul 2007 18:14:41 UTC

Message 70038 in response to message 70036

Quote:

But if 3 WUs get crunched, but 2 would be enough for validation, I would regard the effort for the 3rd result "wasted" (regardless of credits granted). I'm crunching for science, not for credits. I personally would not be happy with such a policy at all.

CU

BRM

I started tracking this about 10 months ago when enough Core 2 D's and then Q's started appearing over at SAH to start making an impact on the work stream my hosts were seeing.

As I said before, the criterion SAH used to set the tightness factor for the deadline was that a PI-100 running with 33.3333% machine ontime should be able to make the deadline (IIRC).
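As a back-of-the-envelope check, that criterion boils down to dividing the crunch time by the on-time fraction and comparing the result with the deadline. Here is a minimal Python sketch. The 100 CPU hours and the 14-day deadline are made-up numbers for illustration, not actual SAH or EAH figures, and the function names are hypothetical.

```python
def wall_clock_days(cpu_hours, on_fraction):
    """Calendar days needed for cpu_hours of crunching when the machine
    is only on (and crunching) for on_fraction of the time."""
    if not 0 < on_fraction <= 1:
        raise ValueError("on_fraction must be in (0, 1]")
    return cpu_hours / on_fraction / 24.0


def makes_deadline(cpu_hours, on_fraction, deadline_days):
    """True if the result can come back within the deadline."""
    return wall_clock_days(cpu_hours, on_fraction) <= deadline_days


# Hypothetical numbers, for illustration only.
reference_cpu_hours = 100.0  # assumed crunch time on the reference host
deadline_days = 14.0         # assumed deadline

part_time = makes_deadline(reference_cpu_hours, 1 / 3, deadline_days)
quarter_time = makes_deadline(reference_cpu_hours, 1 / 4, deadline_days)
```

With these assumed numbers the reference host squeaks in at one-third on-time but misses at one-quarter, which is the sense in which the on-time fraction sets how tight the deadline is.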

What's working out in practice currently is that with a 3/2 IR/MQ there is no scientifically useful reason to run basically anything less than a PIII or Athlon 'Classic' on SAH, since the odds are the result returned will be the trailer for the WU, even if you run it 24/7. Currently, my Katmai 550 running 24/7 with a 0.01 cache coupled CI is still effective, but if I ran it for 12 hours per day or with a 1 or 2 day CI, I estimated it would be returning 50% trailers or more.

So looking at it from the viewpoint of not wasting my money on electricity, I'd have to seriously consider dropping the project on this host.

The beauty of EAH has been that, ever since you went to 2/2 way back when, as long as the host can meet the deadline, you know for a fact your host has contributed to the science, and therefore it was worth running it here no matter how fast or slow it is. This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, and because for some reason the scheduler has seen fit to send the oldest hosts template frequencies which are beyond their capabilities with a 2 week deadline.

Alinator

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 689310808

RAC: 217272

RE: This has only broken

18 Jul 2007 18:28:23 UTC

Message 70039 in response to message 70038

Quote:

This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, ....

Akos made an interesting remark, stating that the new apps are, in fact, several orders of magnitude *faster* than the old ones, probably meaning that they can do the same "scientific work" many times faster. So if the pre-S5R2 apps were biplanes, the new ones seem to be jet fighters. Problem is they get assigned much longer missions (just to stretch your paradigm a bit more :-) ) in the hierarchical all-sky search of S5R2.

I just wanted to clarify "slow" a bit so people don't get the impression that the apps "deteriorated" over time in some way.

BRM

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: RE: This has only

18 Jul 2007 18:38:27 UTC

Message 70040 in response to message 70039

Quote:

Quote:
This has only broken down recently in S5R2, and then only because the beta apps have been a lot slower than what we had before, ....

Akos made an interesting remark, stating that the new apps are, in fact, several orders of magnitude *faster* than the old ones, probably meaning that they can do the same "scientific work" many times faster. So if the pre-S5R2 apps were biplanes, the new ones seem to be jet fighters. Problem is they get assigned much longer missions (just to stretch your paradigm a bit more :-) ) in the hierarchical all-sky search of S5R2.

I just wanted to clarify "slow" a bit so people don't get the impression that the apps "deteriorated" over time in some way.

CU

BRM

LOL...

Agreed. It's all relative (as it should be on EAH). ;-)

The new work is more difficult compared to the old work. So even though the new apps have a lot of the improvements which were in the old apps performance wise, in the current configuration it seems like they are slower, relativityly (pun intended) speaking! :-)

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: I think so, but even

18 Jul 2007 18:43:11 UTC

Message 70041 in response to message 70034

Quote:

I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that, while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM

Bernd,

Take a look at this host:

Bad 4.19 Host

You may have to consider cutting off clients older than the later releases of 4x in some circumstances. IIRC, there were serious client side scheduler and other issues with some of them. While this wasn't such a big deal back then, it seems to cause some problems with the state of the project today. ;-)

Alinator

Bikeman (Heinz-...

Moderator

Joined: 28 Aug 06

Posts: 3522

Credit: 689310808

RAC: 217272

RE: RE: I think so, but

18 Jul 2007 19:42:31 UTC

Message 70042 in response to message 70041

Quote:

Quote:
I think so, but even more important is that it also requires a newer "minimal" Client version. We're still issuing work for all Clients from version 4.19 on, and I don't intend to change this without need.

Another aspect is that this way the computation time spent on the canceled result is always wasted (does the participant get credit for it anyway?). My guess would be that, while this gives faster results for the project and faster credit for the fast participants, the waste of computing power is larger than what is lost by results arriving too late in the current scheme.

BM

Bernd,

Take a look at this host:

Bad 4.19 Host

You may have to consider cutting off clients older than the later releases of 4x in some circumstances. IIRC, there were serious client side scheduler and other issues with some of them. While this wasn't such a big deal back then, it seems to cause some problems with the state of the project today. ;-)

Alinator

Is the link correct?
CU

BRM

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

OOPS..... Bad 4.19

18 Jul 2007 19:51:15 UTC

Message 70043

OOPS.....

Bad 4.19 host

LOL...

The one time I didn't check to make sure the link was right before moving on to other problems! ;-)

The thought just occurred to me that hosts like this might have been a contributing factor to the trouble we saw at the end of S5R1. Since it appears it will grab a big load of work every time it connects and then blow the deadline for all but a few, it could leave the project side thinking that a given set of datapaks has an adequate number of hosts running it and possibly delay the time it takes to get around to issuing them to a host more likely to actually return them on time.

Alinator
