Why not pair up machines with similar average turnaround times?

Richard Schumacher
Joined: 8 Aug 06
Posts: 32
Credit: 14,212,314
RAC: 0
Topic 193333

If I'm thinking about this correctly, doing so would (1) increase throughput for the project and (2) reduce the user annoyance that occurs when the wingman's turnaround time is about eight times one's own.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,906,762,116
RAC: 637,109

Quote:
If I'm thinking about this correctly, doing so would (1) increase throughput for the project and (2) reduce the user annoyance that occurs when the wingman's turnaround time is about eight times one's own.


I agree that this would be a useful feature - even more so at SETI@home, where deadlines can vary from 8 days to 3.5 months.

Having said that, it's already on the long term BOINC Development Projects planning list - look at the very bottom of the page.

Their wording is

Quote:
Scheduler: implement mechanisms so that server:
Attempts to send results from the same WU to hosts with similar speed, so that a fast host doesn't have to wait weeks to get credit.


Note that the BOINC developers use the words "speed" and "fast", whereas you - correctly and much more appropriately - use the word "turnaround". I hope that any developer taking on this challenge will appreciate the difference, and code accordingly.
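
To illustrate why "speed" and "turnaround" are not the same thing, here is a minimal Python sketch. The `Host` fields, the `turnaround_days` helper and all the numbers are made up for illustration; nothing here is taken from the BOINC code, and the model ignores deadlines and scheduling details.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_hours_per_task: float  # raw crunch time for one task
    on_fraction: float         # share of wall-clock time spent on this project
    queued_tasks_ahead: int    # tasks already cached ahead of a new one


def turnaround_days(host: Host) -> float:
    """Wall-clock days from download to report for a newly assigned task."""
    wall_hours_per_task = host.cpu_hours_per_task / host.on_fraction
    return (host.queued_tasks_ahead + 1) * wall_hours_per_task / 24.0


# A fast CPU with a low resource share and a deep cache ...
fast_but_busy = Host("fast_but_busy", cpu_hours_per_task=6.0,
                     on_fraction=0.25, queued_tasks_ahead=9)
# ... versus a CPU four times slower that runs only this project.
slow_but_dedicated = Host("slow_but_dedicated", cpu_hours_per_task=24.0,
                          on_fraction=1.0, queued_tasks_ahead=0)

print(turnaround_days(fast_but_busy))       # 10.0
print(turnaround_days(slow_but_dedicated))  # 1.0
```

Under these assumed numbers the "fast" host is the slow wingman, which is why matching on measured turnaround rather than on benchmark speed seems the more useful criterion.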

Edit - another benefit for projects is that if tasks issued at the same time are also returned at about the same time, they can on average be purged from the working BOINC database sooner, which keeps the server database 'lean and mean' and more likely to run smoothly.
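
A rough way to see the purge argument, as a Python sketch. It assumes a quorum of two, a fixed turnaround per host, and that a workunit stays in the database until the slower of its two results is back. The function names and the figures (1 day and 8 days, echoing the "eight times" in the original post) are illustrative only.

```python
def retention_days(pairs):
    """Days each workunit stays open: it can only be validated and purged
    once the slower of its two results has been returned."""
    return [max(a, b) for a, b in pairs]


def mean(values):
    return sum(values) / len(values)


def pair_sorted(turnarounds):
    """Pair neighbours after sorting hosts by average turnaround."""
    ordered = sorted(turnarounds)
    return list(zip(ordered[0::2], ordered[1::2]))


turnarounds = [1, 8, 1, 8]          # made-up average turnaround times, in days
mismatched = [(1, 8), (1, 8)]       # each fast host gets a slow wingman
matched = pair_sorted(turnarounds)  # [(1, 1), (8, 8)]

print(mean(retention_days(mismatched)))  # 8.0
print(mean(retention_days(matched)))     # 4.5
```

The slowest hosts still take 8 days either way; what changes in this toy case is the average time a workunit sits open in the database.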

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,301
Credit: 248,260,132
RAC: 32,856

The scheduler-database interaction is the bottleneck of our project anyway, and it was responsible for all the DB-related troubles we had in the past. Adding another criterion for picking workunits from the DB for a specific host would make the system slower and weaker. In principle it's a good idea, but I don't want to put the current stability of the project at stake for this.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,906,762,116
RAC: 637,109

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 75508 in response to message 75507

Quote:

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Just as an aside, I currently have 4 results here for which I'm waiting on a roughly equivalent host... I have the equivalent of an FX-57, while they have an X2 4000+. They reported 4 tasks at one time on the 14th, but nothing since. They only have just under 2 days left on the deadline.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 549,717
RAC: 1,740

As I said the last time this discussion came up, Einstein has an advantage over other projects because of its multiple-result datasets. A project like SETI would have to classify the host every time the host asks for more work. Einstein, on the other hand, would only have to classify the host each time a new dataset is selected for download. If one used average turnaround time as the divider, this would make the database smaller, because the two hosts assigned to a result would be closer together in terms of the date and time the results were returned, compared to a totally random selection of paired hosts.

Let's say we use an average turnaround time of 4 days as the divider. Then hosts with an average turnaround time of less than 4 days would be assigned datasets from group A, and hosts with an average turnaround time of 4 days or more would be assigned datasets from group B. This is more or less the setup we had when the project had both long and short results, except that the host's "speed" was used as the divider. If I remember correctly, we got big problems with the database when the project ran out of short results, because the scheduler would look through the whole database in search of short results every time a host requested them. The trick would be not to let one group run out of work before the other.

If I understand things correctly, we usually send out datasets in order of rising frequency, starting at the lower frequencies at the beginning of the run and sending datasets at the highest frequency at the end. I guess the scheduler keeps track of how many hosts have been assigned to each dataset, to know when to start assigning hosts to the next one and which datasets are completed. So why not give group A datasets in descending order and group B datasets in ascending order?
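
A minimal Python sketch of that idea, just to make it concrete. Only the 4-day divider and the two groups come from the text above; the `DatasetAllocator` class, `hosts_per_dataset` and the frequencies are hypothetical, and this is not how the real scheduler is written.

```python
THRESHOLD_DAYS = 4.0  # the divider suggested above


def host_group(avg_turnaround_days: float) -> str:
    """Group A: below the divider. Group B: at or above it."""
    return "A" if avg_turnaround_days < THRESHOLD_DAYS else "B"


class DatasetAllocator:
    """Hands out host slots from opposite ends of a frequency-ordered list.

    Group A walks down from the highest frequency and group B walks up from
    the lowest, so neither group runs dry until every slot is taken.
    """

    def __init__(self, frequencies, hosts_per_dataset):
        # One entry per host slot, e.g. [50, 50, 75, 75, 100, 100].
        self._slots = [f for f in sorted(frequencies)
                       for _ in range(hosts_per_dataset)]
        self._low = 0                        # next slot for group B
        self._high = len(self._slots) - 1    # next slot for group A

    def next_dataset(self, group: str):
        if self._low > self._high:
            return None  # every slot has been handed out
        if group == "A":
            dataset = self._slots[self._high]
            self._high -= 1
        else:
            dataset = self._slots[self._low]
            self._low += 1
        return dataset


alloc = DatasetAllocator([100.0, 50.0, 75.0], hosts_per_dataset=2)
print(alloc.next_dataset(host_group(1.5)))  # 100.0 (group A, from the top)
print(alloc.next_dataset(host_group(6.0)))  # 50.0  (group B, from the bottom)
```

In this sketch the two groups only share a dataset where the pointers meet in the middle, and the scheduler never has to search for work of one kind after it has run out.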

The last time this was discussed, Bruce said he would look at its practicality, the next time he looked at the scheduler code.

http://einsteinathome.org/node/192019

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

Message 75510 in response to message 75508

Quote:
Quote:

I hear what you say, Bernd, and I defer to your superior knowledge of the server implications!

Perhaps a developer could work up the code, but only include it in the distributions as a configurable option - default 'off'.

Just as an aside, I currently have 4 results here for which I'm waiting on a roughly equivalent host... I have the equivalent of an FX-57, while they have an X2 4000+. They reported 4 tasks at one time on the 14th, but nothing since. They only have just under 2 days left on the deadline.

Well, the host I'm waiting on has about 12 hours and 22 minutes to report. I'm speculating that they won't... Their machine certainly was capable of doing all 10 results they have assigned to them that are still out. Dunno why they haven't reported...

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494,410
RAC: 0

It keeps happening... maybe the box had a reinstall, or the owner was on vacation and didn't use it... or maybe some newbish user accidentally blocked BOINC behind the firewall and it has all results completed but can't report them (don't laugh, I've seen it happen)...

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 425,006,325
RAC: 65,367

In terms of database use, there is no need to look through the whole database for a computer with similar productivity. It is better, AFAIK, to look it up with something like SEEK or LOCATE (in FoxPro terms) with indexes turned off, or - if you have an index on productivity - to pick the top available user with parameters near those requested. In my practice this significantly raises performance at time-critical moments.
Another thought is to... remove the bottleneck of the scheduler-DB interaction in hardware (upgrade the computer or network in this part) or redistribute the load over the cluster in another way.
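
As a rough illustration of the indexed lookup, here is a small Python sketch that uses a sorted in-memory list as a stand-in for a database index on average turnaround. The `TurnaroundIndex` class and the host names are hypothetical; a real server would rely on the database's own index rather than on code like this.

```python
import bisect


class TurnaroundIndex:
    """Keeps hosts ordered by average turnaround so that the closest match
    is found by binary search instead of a scan over every host."""

    def __init__(self):
        self._days = []   # sorted average turnaround times
        self._hosts = []  # host ids, kept in the same order as _days

    def add(self, host_id, avg_turnaround_days):
        pos = bisect.bisect_left(self._days, avg_turnaround_days)
        self._days.insert(pos, avg_turnaround_days)
        self._hosts.insert(pos, host_id)

    def nearest(self, target_days):
        """Return the host whose turnaround is closest to target_days."""
        if not self._days:
            return None
        pos = bisect.bisect_left(self._days, target_days)
        # Only the neighbours on either side of the insertion point can win.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(self._days)]
        best = min(candidates, key=lambda i: abs(self._days[i] - target_days))
        return self._hosts[best]


index = TurnaroundIndex()
index.add("host_a", 1.0)
index.add("host_b", 3.5)
index.add("host_c", 9.0)
print(index.nearest(4.0))  # host_b
```

The lookup itself compares against at most two neighbours after the binary search, which is the point being made about not scanning the whole table.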

EclipseHA
Joined: 19 Feb 05
Posts: 41
Credit: 10,540,182
RAC: 0

Let the project team manage stuff like this.

They're the best placed to make decisions that are not only fair to all users, but will also keep the project servers happy!

My guess, after being a systems programmer for 30 years, is that it's not a good use of the server to be doing involved DB lookups on each new WU request. Let the WUs fall where they may, as anything involved in a "guess" is completely under the user's (client's) control.

For example, let's say the "juggle" occurs, and I get a WU that's "matched" to another cruncher. If that cruncher or I modify the resource share on another project, it totally negates the extra juggle by the Einstein servers! It's not just your client and the project servers. The project servers are dealing with a whole boatload of requests from many clients every minute, and are dealing with databases that are huge in comparison to what most client users have ever seen in single-user mode.

I'd say, crunch, be happy, and be patient! You'll get the credit you earned, but maybe not as fast as you like.
