send out Work Units closer together

David@home
Joined: 11 Feb 05
Posts: 24
Credit: 11639
RAC: 0
Topic 189170

I have noticed that einstein@home sends out the four work units with very wide gaps between them. There is a gap of around 12 hours (or more) between when the first work unit is sent and when the last one is sent.

This means that the system allocated the fourth work unit will often be the last to report, since it starts 12 hours behind the first one. Its CPU time is then likely to be wasted: OK, you will get the credit, but the science contribution is lost because the quorum of three returned work units has already been reached.

I would like to think that the CPU time is being put to good scientific use. Can the work units be sent out closer together so that all four systems are on a more equal footing in terms of contributing to the scientific value?

Many thanks

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0


I have noticed that einstein@home sends out the four work units with very wide gaps between them. There is a gap of around 12 hours (or more) between when the first work unit is sent and when the last one is sent.

In my understanding of the way it works, there seem to be two files: one that has the actual data and one that holds the range we are crunching inside that data.

The first one is large; the ones I have seen are ~15 MB. The second one is small. To avoid sending too much data, a machine gets crunch sets for the large file it already has, and only when no more are available is another big data file sent.

So actually you will find groups of machines crunching ranges of that large file. This means that the scheduler, and you, have to wait until one machine in the group is ready for another of those small files.

Look at my box: every WU I get is also sent to host 93352, and actually I am waiting for a third one. If you have a reasonably fast machine, you will have the same experience.
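The behaviour described above can be sketched as a toy model. This is only an illustration of the idea as this post understands it, not the project's actual scheduler; the function name `pick_work` and both data structures are hypothetical.

```python
def pick_work(host_files, pending):
    """Toy scheduler step: return (data_file, range_id, needs_download) or None.

    host_files: set of large data-file names already on the host (hypothetical)
    pending:    dict mapping data-file name -> list of unsent range ids
    """
    # Prefer a range for a large file the host already holds,
    # so that no new big download is needed.
    for name in sorted(host_files):
        if pending.get(name):
            return name, pending[name].pop(0), False
    # Only when nothing is left for those files, hand out another big file.
    for name in sorted(pending):
        if pending[name]:
            host_files.add(name)
            return name, pending[name].pop(0), True
    return None


pending = {"A": [1, 2], "B": [1]}
host = {"A"}
print(pick_work(host, pending))  # a range of file A, no download
```

In this model a host keeps drawing ranges from the file it already has, which is why the same few hosts end up sharing work units.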

David@home
Joined: 11 Feb 05
Posts: 24
Credit: 11639
RAC: 0


Message 11608 in response to message 11607

Many thanks Wurgl

I had not noticed before, but you are correct: the WUs are allocated to the same systems for me as well. As you described, this must be linked to the initial large data file that is sent out.

This makes it worse in many respects. If you are the slow guy, you have a high chance of being the last to report and hence the last to be sent the next work unit. This leads to a spiral in which you will tend always to be the last to report, after the quorum has already been met. As such your contribution to the science will be nothing. OK, you will get the credit if you return by the deadline, but it is contributing to science that interests me.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109979497503
RAC: 29064886


Message 11609 in response to message 11608


....
As such your contribution to the science will be nothing, ....

This is simply not true and certainly not true for the "slow guy".

The developers made the decision that 4 work units would be sent and that 3 agreeing results would constitute a quorum. For those situations where all 4 results are returned, someone has to be last, no matter how fast or slow his box happens to be. So if 4 results were always returned, you could argue that sending the fourth one in the first place was a complete waste, irrespective of when it happened to arrive back.

However, the real story is that 4 returned results within the deadline is probably the exception rather than the rule. In many cases 3 out of 4 come back and everyone is happy. The last one didn't matter. In probably even more cases only 2 (or fewer) out of 4 come back and then we have to start resending the work. They'd be doing this resending stunt far more often if only 3 were sent out in the first place.

So, someone with the benefit of the stats on return rates has probably worked out the most efficient way to get 3 agreeing results, and the answer is to send 4 in the first place. At least we would all hope that something like that is the case. If so, we should be happy in the knowledge that the most efficient practices are being followed and the maximum science outcome is the result. Nobody should deem their work to be "wasted" just because the quorum eventually ended up with more than 3.

Cheers,
Gary.
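A rough way to see the trade-off in the post above is a simple binomial model. It assumes each result comes back on time independently with the same probability, and that returned results agree. The 0.8 return rate is an illustrative assumption, not a project statistic, and `p_quorum` is a hypothetical helper name.

```python
from math import comb


def p_quorum(n_sent, p_return, quorum=3):
    """Probability that at least `quorum` of `n_sent` results come back on time,
    assuming independent returns with probability `p_return` each."""
    return sum(
        comb(n_sent, k) * p_return**k * (1 - p_return) ** (n_sent - k)
        for k in range(quorum, n_sent + 1)
    )


# Assumed return rate of 0.8, chosen only for illustration.
print(round(p_quorum(3, 0.8), 4))  # 0.512
print(round(p_quorum(4, 0.8), 4))  # 0.8192
```

Under these assumptions, sending 4 instead of 3 raises the chance of reaching quorum without a resend from about 51% to about 82%. With a return rate near 100%, the two figures converge and the fourth copy adds little.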

David@home
Joined: 11 Feb 05
Posts: 24
Credit: 11639
RAC: 0


Message 11610 in response to message 11609

In the bigger picture maybe the average is closer to only three out of the four being returned.

I can only comment on my own situation, where four results are returned close to 100% of the time. If you are the last one returning, you have not contributed to the science, since the work unit has already been validated from the first three returned. All they do with the fourth result is give you the median credit of the first three.

If you have the slowest of the four systems that get sent the original big data file, then this scheduling design will tend to leave the slowest contributing nothing to the science, as you will be in that downward spiral of always being the last one to return the work.

Personally I would prefer to return to the original scheduling method, when only three WUs were sent out. OK, when quorum was not met by the deadline a new WU would have to be sent out, which would cause a delay. But I would sooner have this delay in getting credit than the current situation, in which the last person to return has not contributed to the science. With the very short deadline used by einstein@home, this delay would not be very long.

Just my opinion.

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0


Message 11611 in response to message 11610

In the bigger picture maybe the average is closer to only three out of the four being returned.

Agree, but ...

I can only comment on my own situation, where four results are returned close to 100% of the time. If you are the last one returning, you have not contributed to the science, since the work unit has already been validated from the first three returned. All they do with the fourth result is give you the median credit of the first three.

... which of us really knows what they do with the 4th result?

If you have the slowest of the four systems that get sent the original big data file, then this scheduling design will tend to leave the slowest contributing nothing to the science, as you will be in that downward spiral of always being the last one to return the work.

... guys with slower machines are lucky to have almost no pending credits :-)

Personally I would prefer to return to the original scheduling method, when only three WUs were sent out. OK, when quorum was not met by the deadline a new WU would have to be sent out, which would cause a delay. But I would sooner have this delay in getting credit than the current situation, in which the last person to return has not contributed to the science. With the very short deadline used by einstein@home, this delay would not be very long.

Actually it depends on the chance that a result is returned correctly. If all results are almost always returned correctly, I tend to agree. But in reality a lot of machines crash or send back bad results, people go on vacation or switch to other projects, and sometimes the results do not even reach the machine. So if only three are sent out and you have to wait for the third, the data would hang around longer in the database, making it larger and slower. I think this problem must be taken into account too.

Udo
Joined: 19 May 05
Posts: 203
Credit: 8945570
RAC: 0


Would it be a good idea to send the big data file to computers with a similar average turnaround time?
Then the results of slow computers, or of fast computers that are not often connected to the internet, would arrive at about the same time (on average).
Udo Fischer
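The suggestion above could look something like the following minimal sketch. It assumes the server keeps an average turnaround time per host; the function name, the host labels and the hours are all made up for illustration and do not come from the project.

```python
def group_by_turnaround(turnaround_hours, group_size=4):
    """Rank hosts by average turnaround and cut the ranking into groups,
    so each group shares one big data file with similarly paced hosts."""
    ranked = sorted(turnaround_hours, key=turnaround_hours.get)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Hypothetical hosts with average turnaround in hours.
hosts = {"a": 5, "b": 50, "c": 6, "d": 48, "e": 7, "f": 52, "g": 4, "h": 49}
print(group_by_turnaround(hosts))
# [['g', 'a', 'c', 'e'], ['d', 'h', 'b', 'f']]
```

Here the four quick hosts share one file and the four slow ones share another, so no host is systematically the last in its group by a wide margin.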

Udo
