Scheduler now sends "frontrunners"

miw
Joined: 18 Jan 05
Posts: 19
Credit: 46235552
RAC: 0
Topic 189266

I noticed my pending credit has ballooned, and I was having a look to see why....

I notice that in all my workunits lately, the unit is sent out to one host first (for my best host, almost always me :-) ) and then sent out to three others only when the WU comes back from that host. What happens when there is an error, I don't know.

Is this a new policy to send out "frontrunners" to see if the WU is ok before sending it to more hosts?
--miw


JoeB
Joined: 24 Feb 05
Posts: 124
Credit: 89416076
RAC: 28499


I notice that in all my workunits lately, the unit is sent out to one host first ...

That happened to me for a while. At one point I had 11 pending, which for my output of fewer than 2.5 WUs per day is a lot. When I checked the WUs, they had been sent to me well before they were sent to anyone else. Here's an example where I was 4 days ahead (there were many at 3 days ahead): http://einsteinathome.org/workunit/1045849

After a while the 11 cleared, and I no longer get WUs way ahead of anyone else. It was really odd, and I wondered what was happening to the scheduler. I guess it still has the problem.

Joe B

Heffed
Joined: 18 Jan 05
Posts: 257
Credit: 12368
RAC: 0

It's the way E@H sends out WUs. Since it doesn't send the large input file out with each WU, it needs to find other hosts that already have the same input file to process the same WU as you (or wait until another host is free, then send that host the same input file). If a host is slow, or doesn't connect for several days, it can sometimes take a while to find another host with the same input file.

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68962
RAC: 0


I notice that in all my workunits lately, the unit is sent out to one host first (for my best host, almost always me :-) ) and then sent out to three others only when the WU comes back from that host. What happens when there is an error, I don't know.

Is this a new policy to send out "frontrunners" to see if the WU is ok before sending it to more hosts?

My guess is that this is a non-deliberate side-effect of two other rules. Such unintended consequences are surprisingly common, and are most likely patented by Murphy.

Consider

rule 1 - assign work from the data the client already holds

rule 2 - don't assign consecutive wu to the same pairings of computers

Now suppose A (by luck) is the first computer to be assigned work from a new dataset.

Eventually along comes B, who has no more wu to be assigned from their old data, and they are assigned wu from the same dataset as A. Because of rule 2, B will only be assigned one wu that is shared with A. B's next wu after that will be a different wu from the same dataset. Meanwhile A may well want a second wu.
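A minimal sketch of a single scheduling decision under the two rules, assuming one dataset that every requesting host already holds (so rule 1 is satisfied by construction), a quorum of 4 hosts per wu, and reading rule 2 as "no two hosts may share more than one wu". The function name `pick_wu` and the lowest-numbered-first preference are hypothetical, not the actual E@H scheduler:

```python
def pick_wu(host, assignments, quorum=4):
    """Hand `host` one WU from the single dataset it already holds (rule 1).

    Hypothetical model, not the real E@H scheduler. `assignments` maps
    WU number -> set of hosts and is updated in place. Rule 2 is read as:
    a host may not join a WU if it already shares another WU with any of
    that WU's current holders.
    """
    # Every host that `host` is already paired with on some WU.
    partners = set()
    for holders in assignments.values():
        if host in holders:
            partners |= holders - {host}

    # Prefer the lowest-numbered WU that still needs hosts and passes rule 2.
    for wu in sorted(assignments):
        holders = assignments[wu]
        if host in holders or len(holders) >= quorum:
            continue
        if holders & partners:
            continue  # joining would repeat an existing pairing
        holders.add(host)
        return wu

    # Nothing suitable: start a fresh WU from the same dataset.
    new_wu = max(assignments, default=0) + 1
    assignments[new_wu] = {host}
    return new_wu


# A, B, B, A asking for work in turn, as in the scenario above.
assignments = {}
order = [pick_wu(h, assignments) for h in ["A", "B", "B", "A"]]
# order == [1, 1, 2, 3]: B joins A on WU 1, after which each gets a fresh WU.
```

Taking the lowest-numbered eligible wu is only one possible tie-break; the point is that once A and B share a wu, neither may join a wu the other holds, so each keeps opening new single-host wu until other computers arrive.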

Then along come C, D and E, and each will share at most one wu with any other computer. We might have this picture just after G gets their first wu from this dataset:

wu  1: A, B, C, D
wu  2: B
wu  3: A, E, F, G
wu  4: A
wu  5: B, E
wu  6: A
wu  7: C, E
wu  8: B, F
wu  9: A
wu 10: D, F
wu 11: C
wu 12: B
wu 13: A
wu 14: D
wu 15: C
wu 16: B
wu 17: A
wu 18: C
wu 19: D

Notice that even with 7 different computers, just 2 wu have their full set of 4 computers,
and most wu issued still have only one computer assigned.

Of course this is just one possible picture. I have assumed that all computers run at the same speed and that each new computer arrives one step later, so we have 7xA, 6xB, 5xC, 4xD, 3xE at the time shown. Real life is never that neat, and if A is significantly faster than average, the frontrunner effect will be magnified further.
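The counts in the example can be checked mechanically. A small sketch that encodes the table exactly as given, assuming 4 hosts per fully assigned wu (one host first, then "three others", as in the opening post); `summarize` is a hypothetical helper:

```python
from collections import Counter
from itertools import combinations

# The example snapshot above: WU number -> hosts assigned so far.
TABLE = {
    1: "ABCD", 2: "B", 3: "AEFG", 4: "A", 5: "BE", 6: "A", 7: "CE",
    8: "BF", 9: "A", 10: "DF", 11: "C", 12: "B", 13: "A", 14: "D",
    15: "C", 16: "B", 17: "A", 18: "C", 19: "D",
}

def summarize(table, quorum=4):
    """Per-host WU counts, fully assigned WUs, single-host WUs, rule-2 check."""
    per_host = Counter(h for hosts in table.values() for h in hosts)
    full = [wu for wu, hosts in table.items() if len(hosts) == quorum]
    single = [wu for wu, hosts in table.items() if len(hosts) == 1]
    # Rule 2 holds if no pair of hosts appears together on more than one WU.
    pair_use = Counter(
        pair for hosts in table.values() for pair in combinations(sorted(hosts), 2)
    )
    rule2_ok = all(n == 1 for n in pair_use.values())
    return per_host, full, single, rule2_ok

per_host, full, single, rule2_ok = summarize(TABLE)
# per_host: A=7, B=6, C=5, D=4, E=3, F=3, G=1
# full == [1, 3]; len(single) == 13; rule2_ok is True
```

So 13 of the 19 wu issued have a single computer, and no pair of computers shares more than one wu.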

Later on, when there are many computers all crunching data from this dataset, the frontrunner effect disappears.

I wonder if your observations fit this sort of pattern?

In other words, if you were first (A) the scheduler was not so much waiting for you to finish the wu, but for other computers to arrive?

~~gravywavy

Glenn Hawley, RASC Calgary
Joined: 6 Mar 05
Posts: 48
Credit: 890900213
RAC: 333292

I've noticed my "pending" is increasing as well. I'm up to 3,700 or so pending credits now, while my RAC continues to decline from a peak of about 1100 toward 900.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 605124
RAC: 836

Message 12009 in response to message 12007


My guess is that this is a non-deliberate side-effect of two other rules. Such unintended consequences are surprisingly common, and are most likely patented by Murphy.

Consider

rule 1 - assign work from the data the client already holds

rule 2 - don't assign consecutive wu to the same pairings of computers

Now this is interesting. So if for some reason computer A manages to download all 150 WUs from the data set, we would need 450 other computers to download the same data set to complete it, assuming no WU had to be resent.
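The arithmetic behind the 450, assuming a quorum of q = 4 hosts per WU (A plus the three others described in the opening post) and rule 2 read as "no two hosts share more than one WU": each of A's n WUs needs q - 1 further hosts, and none of those hosts can be reused on another of A's WUs, because it would then share two WUs with A.

```latex
N_{\text{others}} = n_{\text{WU}}\,(q - 1) = 150 \times (4 - 1) = 450
```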

What is the minimum number of computers needed to complete a data set, if no WU has to be resent?
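A partial answer, as a sketch: two counting arguments give a lower bound, under the same assumptions (quorum of 4, no two hosts sharing more than one WU, no resends, and no per-host limit on WUs). Whether a schedule that actually reaches the bound exists is a separate combinatorial-design question that this does not settle. `min_hosts_lower_bound` is a hypothetical name:

```python
from math import comb

def min_hosts_lower_bound(n_wu, quorum=4):
    """Smallest host count v that passes two counting checks (a lower bound).

    With no two hosts sharing more than one WU:
      * each WU uses comb(quorum, 2) distinct host pairs, so
        n_wu * comb(quorum, 2) <= comb(v, 2);
      * a host on r WUs meets r * (quorum - 1) distinct partners, so
        r <= (v - 1) // (quorum - 1), and v such hosts must cover
        n_wu * quorum host slots.
    """
    v = quorum
    while True:
        max_wus_per_host = (v - 1) // (quorum - 1)
        pairs_ok = comb(v, 2) >= n_wu * comb(quorum, 2)
        slots_ok = v * max_wus_per_host >= n_wu * quorum
        if pairs_ok and slots_ok:
            return v
        v += 1

print(min_hosts_lower_bound(150))  # 43
```

For 150 WUs both checks first pass at 43 computers (903 available pairs against 900 needed; 43 x 14 = 602 slots against 600 needed), far fewer than the 451 in the frontrunner case.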

Is there a limit on how many WUs a computer can download from the same data set? Should there be? If a computer downloads more WUs from the same data set than are needed to complete it with the minimum number of computers, it would block the remaining computers from downloading those WUs, and more computers would have to download the same data set before it can be completed. It would also mean that it takes longer before an unused pairing is possible for the remaining WUs, thereby prolonging the time those WUs remain in the database.

When you're really interested in a subject, there is no way to avoid it: you have to read the Manual.

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0

Well, actually it is a little bit nasty. When you look here, you can see that all the WUs on the first page and some on the second page (at this time including WU 1219597) are assigned only to my machine.

Yes, I have downloaded a lot of WUs to stress the scheduler. I thought that a higher number of WUs assigned to only one machine might force it to assign them to a second machine. Well, let's see what happens.

BTW: I learned one thing: at least with the 4.19 client, one should never ever set the connection interval to a value higher than 1. The dumb thing downloaded too many WUs. The logic behind it seems to have been written during some drunken phase of the programmer ... Happily the machine is fast enough not to lose any calculation, but downloading 19 WUs should be prevented by the scheduler (it actually loaded another 8 at a time when there were already 11 hanging around).

Divide Overflow
Joined: 9 Feb 05
Posts: 91
Credit: 183220
RAC: 0

So you're disappointed that the client did what you told it to do? You asked for a larger cache of WUs and the project gave it to you! There is a limit of 8 WUs per day per host to keep some sanity going, but the rest is in your hands... :)
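A minimal sketch of how a per-host daily cap like the 8 WUs per day mentioned above might behave. The class `DailyQuota`, its reset on each new calendar date, and the host name are assumptions for illustration, not the project's actual server code:

```python
from datetime import date

class DailyQuota:
    """Hypothetical per-host cap on WUs handed out per calendar day."""

    def __init__(self, limit=8):
        self.limit = limit
        self.sent = {}  # host -> (day, number of WUs sent that day)

    def grant(self, host, requested, today):
        """Return how many of `requested` WUs the host may receive `today`."""
        day, count = self.sent.get(host, (today, 0))
        if day != today:
            count = 0  # new day: the counter starts again
        granted = max(0, min(requested, self.limit - count))
        self.sent[host] = (today, count + granted)
        return granted

quota = DailyQuota()
print(quota.grant("host1", 11, date(2005, 4, 1)))  # 8
print(quota.grant("host1", 8, date(2005, 4, 1)))   # 0
print(quota.grant("host1", 8, date(2005, 4, 2)))   # 8
```

A cap of this kind limits what a host receives per day, not how many WUs it holds in total, so a large cache can still build up over several days.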

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0

Message 12012 in response to message 12011

So you're disappointed that the client did what you told it to do? You asked for a larger cache of WUs and the project gave it to you! There is a limit of 8 WUs per day per host to keep some sanity going, but the rest is in your hands... :)

Yes, I am disappointed!

I did change the settings to 1.75 days and the @%$! gave me close to 5 days.
