Why not pair up machines with similar average turnaround times?

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14212314
RAC: 0
Topic 193333

If I'm thinking about this correctly, doing so would (1) increase throughput for the project and (2) reduce the user annoyance that occurs when the wingman's turnaround time is about eight times one's own.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2769551570
RAC: 944952

Why not pair up machines with similar average turnaround times?

Quote:
If I'm thinking about this correctly, doing so would (1) increase throughput for the project and (2) reduce the user annoyance that occurs when the wingman's turnaround time is about eight times one's own.


I agree that this would be a useful feature - even more so at SETI@home, where deadlines can vary from 8 days to 3.5 months.

Having said that, it's already on the long term BOINC Development Projects planning list - look at the very bottom of the page.

Their wording is

Quote:
Scheduler: implement mechanisms so that server:
Attempts to send results from the same WU to hosts with similar speed, so that a fast host doesn't have to wait weeks to get credit.


Note that the BOINC developers use the words "speed" and "fast", whereas you - correctly and much more appropriately - use the word "turnaround". I hope that any developer taking on this challenge will appreciate the difference, and code accordingly.

Edit - another benefit for projects: if tasks that are issued at the same time are also returned at the same time, they can on average be purged from the working BOINC database sooner, keeping the server database 'lean and mean' and more likely to run smoothly.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245204163
RAC: 13351

The scheduler - database

The scheduler - database interaction is the bottleneck of our project anyway, and it was responsible for all the DB-related troubles we had in the past. Adding another criterion for picking workunits from the DB for a specific host would make the system slower and more fragile. In principle it's a good idea, but I don't want to put the current stability of the project at stake for it.
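
Roughly the kind of extra lookup this would mean, as a sketch only - the table and column names are illustrative, not our actual scheduler code or schema:

Code:
# Sketch: matching wingmen by turnaround turns a cheap "any unsent result"
# lookup into a join against the host table on every scheduler request.
def pick_result(cur, host_avg_turnaround, tolerance=2 * 86400):
    """cur is any DB-API cursor; turnaround and tolerance are in seconds."""
    # Simplified version of today's lookup: take the next unsent result.
    cur.execute("SELECT id FROM result WHERE server_state = 'UNSENT' LIMIT 1")
    fallback = cur.fetchone()

    # Turnaround-matched lookup: the other copy of the same workunit must
    # already sit on a host with a similar average turnaround.
    cur.execute(
        "SELECT r.id FROM result r"
        " JOIN result o ON o.workunitid = r.workunitid AND o.id <> r.id"
        " JOIN host h ON h.id = o.hostid"
        " WHERE r.server_state = 'UNSENT'"
        " AND ABS(h.avg_turnaround - ?) < ?"
        " LIMIT 1",
        (host_avg_turnaround, tolerance),
    )
    return cur.fetchone() or fallback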

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2769551570
RAC: 944952

I hear what you say, Bernd,

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: I hear what you say,

Message 75508 in response to message 75507

Quote:

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Just as an aside, I currently have 4 results here for which I'm waiting on a roughly equivalent host... I have the equivalent of an FX-57, while they have an X2 4000+. They reported 4 tasks at one time on the 14th, but nothing since, and they only have just under 2 days left on the deadline.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 379734
RAC: 1328

As I said the last time this

As I said the last time this discussion came up, Einstein has an advantage over other projects because of its multiple-result datasets. A project like SETI would have to classify the host every time the host asks for more work; Einstein, on the other hand, would only have to classify the host every time a new dataset is selected for download. If one used average turnaround time as the divider, this would make the database smaller, because the two hosts assigned to a result would be closer together in terms of the date and time the results were returned, compared to a totally random pairing of hosts.

Let's say we use an average turnaround time of 4 days as the divider. Then hosts with an average turnaround time of less than 4 days would be assigned datasets from group A, and hosts with an average turnaround time of 4 days or more would be assigned datasets from group B. This is more or less the setup we had when the project had both long and short results, except that the hosts' "speed" was used as the divider. If I remember correctly, we got big problems with the database when the project ran out of short results, because the scheduler would look through the whole database in search of short results every time a host requested them. The trick would be not to let one group run out of work before the other.
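
A minimal sketch of that divider, assuming the server already knows each host's average turnaround in seconds (the 4-day threshold and the group names are just the example values above):

Code:
FOUR_DAYS = 4 * 24 * 3600  # the example divider, in seconds

def dataset_group(avg_turnaround_secs):
    """Fast-turnaround hosts draw from group A, slower ones from group B."""
    return "A" if avg_turnaround_secs < FOUR_DAYS else "B"

# e.g. a host averaging 1.5 days lands in group A, one averaging 10 days in B
assert dataset_group(1.5 * 86400) == "A"
assert dataset_group(10 * 86400) == "B"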

If I understand things correctly, we usually send out datasets in order of rising frequency, starting at the lower frequencies at the beginning of the run and sending datasets at the highest frequency at the end. I guess the scheduler keeps track of how many hosts have been assigned to each dataset, so it knows when to start assigning hosts to the next one and which datasets are completed. So why not give group A datasets in descending order and group B datasets in ascending order?
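
A rough sketch of serving the two groups from opposite ends of the frequency-ordered dataset list (just an illustration of the idea; a real scheduler would keep many hosts on one dataset before advancing its cursor):

Code:
def make_dispatcher(datasets):
    """datasets: dataset names sorted by rising frequency."""
    cursors = {"A": len(datasets) - 1, "B": 0}  # A walks down, B walks up

    def next_dataset(group):
        # When the cursors cross, both groups have worked through the run.
        if cursors["B"] > cursors["A"]:
            return None
        i = cursors[group]
        cursors[group] += -1 if group == "A" else 1
        return datasets[i]

    return next_dataset

next_dataset = make_dispatcher(["f0100", "f0200", "f0300", "f0400"])
print(next_dataset("A"))  # f0400 - highest frequency first for fast hosts
print(next_dataset("B"))  # f0100 - lowest frequency first for slow hosts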

The last time this was discussed, Bruce said he would look at its practicality the next time he looked at the scheduler code.

http://einsteinathome.org/node/192019

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: RE: I hear what you

Message 75510 in response to message 75508

Quote:
Quote:

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Just as an aside, I currently have 4 results here for which I'm waiting on a roughly equivalent host... I have the equivalent of an FX-57, while they have an X2 4000+. They reported 4 tasks at one time on the 14th, but nothing since, and they only have just under 2 days left on the deadline.

Well, the host I'm waiting on has about 12 hours and 22 minutes to report. I'm speculating that they won't... Their machine was certainly capable of doing all 10 results assigned to them that are still out. Dunno why they haven't reported...

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

It keeps happening... maybe

It keeps happening... maybe the box had a reinstall, or the owner was on vacation and didn't use it... or maybe some newbish user accidentally blocked BOINC behind the firewall and it has all results completed but can't report them (don't laugh, I've seen it happen)...

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 418265840
RAC: 36201

In terms of database use,

In terms of database use, there is no need to look through the whole database for a computer with similar productivity. It is better, AFAIK, to look up with something like SEEK or LOCATE (in FoxPro terms) with indexes turned off, or - if you have an index on productivity - to pick the top available host near the requested parameters. In my experience this significantly raises performance at time-critical moments.
Another thought is to... remove the bottleneck of the scheduler-DB interaction in hardware (upgrade the computer or the network in this part), or redistribute the load over the cluster in another way.
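
For what it's worth, the difference is easy to show with any SQL engine: an index on the turnaround column lets the engine seek straight to nearby hosts instead of scanning every row. A self-contained toy with SQLite - nothing to do with the project's real schema:

Code:
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE host (id INTEGER PRIMARY KEY, avg_turnaround REAL)")
con.executemany("INSERT INTO host (avg_turnaround) VALUES (?)",
                [(i * 3600.0,) for i in range(1, 501)])

# Without this index the range query below scans the whole table;
# with it the engine seeks directly to the matching range.
con.execute("CREATE INDEX idx_host_turnaround ON host (avg_turnaround)")

target = 4 * 24 * 3600.0  # looking for hosts near a 4-day average turnaround
nearest = con.execute(
    "SELECT id, avg_turnaround FROM host"
    " WHERE avg_turnaround BETWEEN ? AND ?"
    " ORDER BY ABS(avg_turnaround - ?) LIMIT 1",
    (0.5 * target, 1.5 * target, target),
).fetchone()
print(nearest)  # the host whose average turnaround is closest to the target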

EclipseHA
Joined: 19 Feb 05
Posts: 41
Credit: 10540182
RAC: 0

Let the project team manage

Let the project team manage stuff like this.

They're best placed to make decisions that are not only fair to all users, but also keep the project servers happy!

My guess, after being a systems programmer for 30 years, is that it's not a good use of the server to be doing involved DB lookups on each new WU request. Let the WUs fall where they may, as anything involved in such a "guess" is completely under the user's (client's) control.

For example, let's say the "juggle" occurs and I get a WU that's "matched" to another cruncher. If that cruncher or I modify the resource share on another project, it totally negates the extra juggling by the Einstein servers! And it's not just your client and the project servers: the project servers are dealing with a whole boatload of requests from many clients every minute, and with databases that are huge in comparison to what most client users have ever seen in single-user mode.

I'd say, crunch, be happy, and be patient! You'll get the credit you earned, but maybe not as fast as you like.
