I noticed my pending credit has ballooned, and I was having a look to see why....
I notice that in all my workunits lately, the unit is sent out to one host first (for my best host, almost always me :-) and then sent out to three others only when the WU comes back from that host (what happens when there is an error, I don't know).
Is this a new policy to send out "frontrunners" to see if the WU is OK before sending it to more hosts?
--miw
Scheduler now sends "frontrunners"
I notice that in all my workunits lately, the unit is sent out to one host first ...
That happened to me for a while. At one point I had 11 pendings, which for my output of less than 2.5 WUs per day is a lot. When I checked the WUs, they had been sent to me way before they were sent to anyone else. Here's an example of 4 days ahead; there were many at 3 days ahead: http://einsteinathome.org/workunit/1045849
After a while the 11 cleared and I no longer get the WUs way ahead of anyone else. It was really odd, and I wondered what was happening to the scheduler. I guess it still has the problem.
Joe B
It's the way E@H sends out
It's the way E@H sends out WUs. Since it doesn't send the large input file out with each WU, it needs to find other hosts that already have the same input file on their machines to process the same WU as you (or wait until another host is free, then send out the same input file). If a host is slower, or doesn't connect for several days, it can sometimes take a while to find another host with the same input file.
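A rough sketch of that matching idea in Python — names and structure are my own illustration, not the actual BOINC/E@H scheduler code: prefer hosts that already hold the WU's large input file, and fall back to sending the file to a new host only when there aren't enough.

```python
# Hypothetical sketch of locality scheduling as described above: prefer
# hosts that already have a workunit's large input file on disk, so the
# file is downloaded as rarely as possible. Illustrative only.

def pick_hosts(wu_input_file, hosts, quorum=4):
    """Return up to `quorum` hosts for a workunit, preferring those
    that already hold its input file."""
    have_file = [h for h in hosts if wu_input_file in h["files"]]
    need_file = [h for h in hosts if wu_input_file not in h["files"]]
    chosen = (have_file + need_file)[:quorum]
    for h in chosen:
        h["files"].add(wu_input_file)  # hosts without it download it
    return chosen

hosts = [
    {"name": "A", "files": {"dataset1"}},
    {"name": "B", "files": set()},
    {"name": "C", "files": {"dataset1"}},
]
print([h["name"] for h in pick_hosts("dataset1", hosts)])  # A and C first
```

With few hosts holding the file, the "need_file" fallback kicks in — which is exactly the slow step the post describes.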
I notice that in all my
I notice that in all my workunits lately, the unit is sent out to one host first (for my best host, almost always me :-) and then sent out to three others only when the WU comes back from that host (what happens when there is an error, I don't know).
Is this a new policy to send out "frontrunners" to see if the WU is OK before sending it to more hosts?
My guess is that this is a non-deliberate side-effect of two other rules. Such unintended consequences are surprisingly common, and are most likely patented by Murphy.
Consider
rule 1 - assign work from the data the client already holds
rule 2 - don't assign consecutive wu to the same pairings of computers
Now suppose A (by luck) is the first computer to be assigned work from a new dataset.
Eventually, along comes B, who has no more wu to be assigned from their old data, and they are assigned wu from the same dataset as A. Because of rule 2, B will only be assigned one wu that is shared with A. B's next wu after that will be a different wu from the same dataset. Meanwhile A may well want a second wu.
Then along come C, D, E; each will only be assigned one of the wu that any other computer has had. We might have this picture just after G gets their first wu from this dataset:
wu 1 : A, B, C, D
wu 2 : B,
wu 3 : A, E, F, G
wu 4 : A,
wu 5 : B, E
wu 6 : A,
wu 7 : C, E
wu 8 : B, F
wu 9 : A
wu 10: D, F
wu 11: C,
wu 12: B
wu 13: A
wu 14: D
wu 15: C
wu 16: B
wu 17: A
wu 18: C
wu 19: D
Notice, even with 7 different computers, we have just 2 wu completed, and most wu issued still have only one computer assigned.
Of course this is just one possible picture - I have assumed all computers are running at the same speed and each new computer arriving one step later, so we have 7xA, 6xB, 5xC, 4xD, 3xE at the time shown. Real life is never that neat, and if A is significantly faster than average, the frontrunner effect will be magnified further.
Later on, when there are many computers all crunching data from this dataset, the frontrunner effect disappears.
I wonder if your observations fit this sort of pattern?
In other words, if you were first (A) the scheduler was not so much waiting for you to finish the wu, but for other computers to arrive?
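The two rules can be put into a toy simulation — my own sketch under a strict reading of rule 2 (any pair of computers shares at most one wu), not the real scheduler. Hosts A..G arrive one step apart, and each step every active host asks for one wu:

```python
# Toy simulation of the two rules above (illustrative, not scheduler
# code). Rule 2 is read strictly: a pair of computers shares at most
# one wu. Rule 1's fallback is to open a fresh wu from the dataset.

QUORUM = 4  # results wanted per workunit

def assign(host, wus, pairs):
    """Give `host` the lowest-numbered wu it hasn't done whose current
    holders it has never been paired with; else open a fresh wu."""
    for holders in wus:
        if host in holders or len(holders) >= QUORUM:
            continue
        if any(frozenset((host, h)) in pairs for h in holders):
            continue  # rule 2: this pairing already used
        pairs.update(frozenset((host, h)) for h in holders)
        holders.append(host)
        return
    wus.append([host])  # rule 1 fallback: new wu, same dataset

wus, pairs = [], set()
hosts = "ABCDEFG"
for step in range(len(hosts)):
    for host in hosts[:step + 1]:  # host k first appears at step k
        assign(host, wus, pairs)

for i, holders in enumerate(wus, 1):
    print(f"wu {i:2}: {', '.join(holders)}")
```

Even this crude model reproduces the shape of the table above: two completed wu (the ones the frontrunner joined early) and a long tail of wu held by only one computer.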
~~gravywavy
I've noticed my "pending" is
I've noticed my "pending" is increasing as well. I'm up to 3,700 or so pending credits now, while my RAC continues to decline from a peak of about 1100 toward 900.
My guess is that this is a
My guess is that this is a non-deliberate side-effect of two other rules. Such unintended consequences are surprisingly common, and are most likely patented by Murphy.
Consider
rule 1 - assign work from the data the client already holds
rule 2 - don't assign consecutive wu to the same pairings of computers
Now this is interesting. So if for some reason computer A manages to download all 150 WU's from the data set, we would need 450 other computers to download the same data set to complete it, if no WU had to be re-sent.
What is the minimum number of computers needed to complete a data set, if no WU has to be re-sent?
Is there a limit on how many WU's a computer can download from the same data set? Should there be? If a computer downloads more WU's from the same data set than is needed to complete it with the minimum number of computers, that computer would block the remaining computers from downloading WU's and force still more computers to download the same data set before it can be completed. It would also mean it takes longer before an unused pairing is possible for the remaining WU's, thereby prolonging the time the WU remains in the database.
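For what it's worth, a back-of-the-envelope check of those numbers — assuming a quorum of 4 results per WU and the strict reading of rule 2 (any pair of computers shares at most one WU); the 150-WU dataset size is the figure quoted above:

```python
import math

QUORUM = 4             # assumed: 1 first host + 3 more results per WU
WUS_PER_DATASET = 150  # dataset size quoted in the post

# Worst case: computer A holds every WU. Each WU needs QUORUM-1 more
# results, and each helper is then paired with A, so under the strict
# reading it can never take a second WU from A's set.
others_needed = WUS_PER_DATASET * (QUORUM - 1)
print(others_needed)  # 450, matching the post

# Minimum number of computers: each completed WU consumes C(4,2) = 6
# distinct pairings, so n computers must supply at least 150*6 = 900
# distinct pairs. This is only a counting lower bound, not a proof
# that a schedule achieving it exists.
n = 2
while math.comb(n, 2) < WUS_PER_DATASET * math.comb(QUORUM, 2):
    n += 1
print(n)  # 43
```

So the gap is striking: one greedy host pushes the requirement from a lower bound of about 43 computers up to 451.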
When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.
Well, actually it is a little
Well, actually it is a little bit nasty. When you look here, you can see that all the WU's on the first page and some on the second page (at this time including WU 1219597) are only assigned to my machine.
Yes, I have downloaded a lot of WU's to stress the scheduler; I thought that a higher number of WU's assigned to only one machine might force it to assign them to a second machine. Well, let's see what happens.
BTW, I learned one thing: at least with the 4.19 client, one should never ever set the connection interval to a value higher than 1. The dumb thing downloaded too many WU's. The logic behind it seems to have been written during some drunken phase of the programmer... Happily the machine is fast enough not to lose any calculation, but downloading 19 WU's should be prevented by the scheduler (actually loading another 8 at a time when there were already 11 hanging around).
So you're disappointed that
So you're disappointed that the client did what you told it to do? You asked for a larger cache of WU's, and the project gave it to you! There are limits of 8 WU's per day per host to keep some sanity going, but the rest is in your hands... :)
So you're disappointed that
So you're disappointed that the client did what you told it to do? You asked for a larger cache of WU's, and the project gave it to you! There are limits of 8 WU's per day per host to keep some sanity going, but the rest is in your hands... :)
Yes, I am disappointed!
I did change the settings to 1.75 days and the @%$! gave me close to 5 days.