Redundant 4th result

verty
Joined: 31 Jul 05
Posts: 69
Credit: 16658
RAC: 0
Topic 189647

At present in work unit 1659094, 3 of the 4 PCs have finished and have been granted credit. The fourth PC is still busy, and the deadline is 4 days away.

It so happens in this case that the fourth PC has a Transmeta Crusoe CPU, which isn't very fast, and its turnaround time is about 7.5 days.

There is no reason for this PC to waste the next 4 days trying to complete the work unit when 3 valid results have already been submitted for it.

It seems that this PC will go ahead and waste the next 4 days of CPU-time for nothing.

My wish is that when the BOINC client contacts the server, it abandons all work units for which the required number of valid results is in.

I realise that the fourth PC may also get credit, but if my computer were in such a situation, I would rather forfeit the credit and save a few hours of CPU time. Over thousands of work units this would add up.

It could be an option in the client: abandon superfluous work units, or complete them for the credit. I imagine many would prefer to be less wasteful.

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632255
RAC: 0

Quote:

At present in work unit 1659094, 3 of the 4 PCs have finished and have been granted credit. The fourth PC is still busy, and the deadline is 4 days away.

It so happens in this case that the fourth PC has a Transmeta Crusoe CPU, which isn't very fast, and its turnaround time is about 7.5 days.

There is no reason for this PC to waste the next 4 days trying to complete the work unit when 3 valid results have already been submitted for it.

It seems that this PC will go ahead and waste the next 4 days of CPU-time for nothing.

My wish is that when the BOINC client contacts the server, it abandons all work units for which the required number of valid results is in.

I realise that the fourth PC may also get credit, but if my computer were in such a situation, I would rather forfeit the credit and save a few hours of CPU time. Over thousands of work units this would add up.

It could be an option in the client: abandon superfluous work units, or complete them for the credit. I imagine many would prefer to be less wasteful.


The problem with this is that the computer does not normally contact the server in the middle of a WU. This would take intervention on the part of the user to initiate the contact.

j2satx
Joined: 22 Jan 05
Posts: 46
Credit: 1650297
RAC: 0

Message 14956 in response to message 14955

Quote:

The problem with this is that the computer does not normally contact the server in the middle of a WU. This would take intervention on the part of the user to initiate the contact.

Perhaps the partially completed WU is "preempted" while another project is running and the server has been / will be contacted.

verty

Message 14957 in response to message 14956

Quote:
Perhaps the partially completed WU is "preempted" while another project is running and the server has been / will be contacted.

I don't know how the system works, but there seems to be a fixed interface: a client can return results, request work, and not much else. It doesn't seem to preempt superfluous work units. If it does, I apologise for going on about this unnecessarily.

It would be nice if there was a third action, which returned whether a specified work unit had enough valid results. On each client-server communication the client could request the status of all its pending work units, so that it could abort any that are no longer required.

As I said before, it could be made a preference in the client. I suspect most people would prefer the saved CPU time to the extra credit. It would make the system more efficient.
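
A minimal client-side sketch of the proposed third action, assuming a hypothetical status query that takes the IDs of the client's pending work units and returns the subset that already has enough valid results. `PendingTask`, `Client`, `abort_superfluous` and `query_status` are illustrative names, not part of the real BOINC client; the preference flag models the opt-in option described above.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PendingTask:
    wu_id: int
    progress: float  # fraction complete, 0.0 to 1.0


@dataclass
class Client:
    tasks: list[PendingTask] = field(default_factory=list)
    abort_superfluous: bool = True  # hypothetical user preference (opt-in)

    def on_scheduler_contact(self, query_status: Callable[[list[int]], set[int]]) -> list[int]:
        """On each contact, ask which pending WUs already have enough valid
        results, then drop those tasks locally if the preference allows it."""
        if not self.abort_superfluous or not self.tasks:
            return []
        superfluous = query_status([t.wu_id for t in self.tasks])
        aborted = [t.wu_id for t in self.tasks if t.wu_id in superfluous]
        self.tasks = [t for t in self.tasks if t.wu_id not in superfluous]
        return aborted


# Usage: the lambda stands in for the hypothetical server-side status query.
client = Client([PendingTask(1659094, 0.45), PendingTask(1659100, 0.10)])
print(client.on_scheduler_contact(lambda ids: {1659094}))  # [1659094]
```

Users who prefer the credit would leave `abort_superfluous` off, and the client then behaves exactly as it does today.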

It wouldn't need much on the server: one lookup for all superfluous pending work units for the requesting host, and one update per aborted work unit.
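
The server-side cost described above (one lookup, then one update per abort) can be sketched with in-memory records standing in for the project database. The `Result` record, its state strings, the host IDs and the quorum of 3 are assumptions for illustration; a real project would express the lookup as a single database query.

```python
from dataclasses import dataclass


@dataclass
class Result:
    result_id: int
    wu_id: int
    host_id: int
    state: str  # hypothetical states: "in_progress", "valid", "aborted"


def superfluous_for_host(results: list[Result], host_id: int, quorum: int = 3) -> list[Result]:
    """The single lookup: this host's in-progress results whose work unit
    already has at least `quorum` valid results."""
    valid_counts: dict[int, int] = {}
    for r in results:
        if r.state == "valid":
            valid_counts[r.wu_id] = valid_counts.get(r.wu_id, 0) + 1
    return [
        r for r in results
        if r.host_id == host_id
        and r.state == "in_progress"
        and valid_counts.get(r.wu_id, 0) >= quorum
    ]


def abort(result: Result) -> None:
    """The one update per aborted work unit."""
    result.state = "aborted"


results = [
    Result(1, 1659094, 10, "valid"),
    Result(2, 1659094, 11, "valid"),
    Result(3, 1659094, 12, "valid"),
    Result(4, 1659094, 13, "in_progress"),  # the slow fourth host
    Result(5, 1659200, 13, "in_progress"),  # different WU, no quorum yet
]
for r in superfluous_for_host(results, host_id=13):
    abort(r)
print([r.state for r in results])
```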

I do realise though that it would require a new server release, new client release, testing, etc. Of course, it would not be worthwhile if the delays and hassle caused by doing this would be more costly than living with the problem. It is something worth noting for the future.

verty

Message 14958 in response to message 14957

Continuing:

In fact, it is easier than I thought. The client could query the status of the work units from a web service, then report a client error for the work units it wants to abort. On some work units I have seen that the fourth host had a client error, which I imagine must mean the user decided to cancel that work unit because it was superfluous.

But I suppose the client would need to know where the web service was, and this would need modification in the project, etc. So there is no way around it: it will always be involved.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 356403
RAC: 1626

David Anderson and Bruce Allen recently made modifications to the BOINC scheduler which are designed to resend WUs to hosts that have lost them. If the scheduler finds that a host has lost a WU, it then decides whether the WU should be resent to that host. If there are enough results and a consensus among them, the WU isn't resent. So whenever the host is examined for lost WUs, it could also be examined for remaining work that is no longer needed. Now the question is: is the gain from this worth the extra load on the servers?
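
A hypothetical sketch of how one pass over a host's work could make both decisions: skip resending a lost WU when consensus exists, and flag a still-running WU as no longer needed under the same condition. This is not the actual BOINC scheduler code; the `host_results` and `returned` structures are assumed, and exact string equality stands in for a project's real validator comparison.

```python
from collections import Counter


def has_consensus(outputs: list[str], quorum: int = 3) -> bool:
    """True if at least `quorum` returned outputs agree (stand-in for a validator)."""
    if not outputs:
        return False
    _, count = Counter(outputs).most_common(1)[0]
    return count >= quorum


def scheduler_check(host_results: dict[int, str],
                    returned: dict[int, list[str]],
                    quorum: int = 3) -> dict[int, str]:
    """host_results maps wu_id -> "missing" (host lost it) or "in_progress".
    returned maps wu_id -> outputs already returned by other hosts."""
    decisions = {}
    for wu_id, status in host_results.items():
        done = has_consensus(returned.get(wu_id, []), quorum)
        if status == "missing":
            decisions[wu_id] = "skip_resend" if done else "resend"
        else:  # the host is still crunching this WU
            decisions[wu_id] = "superfluous" if done else "keep"
    return decisions


print(scheduler_check(
    {1: "missing", 2: "missing", 3: "in_progress", 4: "in_progress"},
    {1: ["a", "a", "a"], 2: ["a", "b"], 3: ["x", "x", "x"], 4: ["x", "x", "y"]},
))
```

The extra server load Ziran asks about is the `has_consensus` check for each in-progress WU, on top of the one already done for lost WUs.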

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

verty

Message 14960 in response to message 14959

Quote:
David Anderson and Bruce Allen recently made modifications to the BOINC scheduler which are designed to resend WUs to hosts that have lost them. If the scheduler finds that a host has lost a WU, it then decides whether the WU should be resent to that host. If there are enough results and a consensus among them, the WU isn't resent. So whenever the host is examined for lost WUs, it could also be examined for remaining work that is no longer needed. Now the question is: is the gain from this worth the extra load on the servers?

Okay, I didn't know that.

When one branch fails and 3 valid results have not been submitted, it allocates a new host (or, I suppose, resends to the same host). What it doesn't do (though I admit it would be involved to implement) is tell the client when any of its pending work units are superfluous because 3 valid results have already been submitted. Otherwise, the system works very well.

Of course, there is another way to avoid the fourth host wasting CPU time, and that is not to require a fourth host. With four hosts per work unit the turnaround times will be more consistent: a work unit only overruns the deadline if two of the hosts don't submit results in time. With three hosts, any host not returning results in time means that the work unit is delayed.
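
The consistency argument can be made concrete with a binomial calculation. Assuming (as a simplification) that each host independently misses the deadline with the same probability, a work unit with a quorum of 3 is delayed whenever fewer than 3 of its hosts return in time. The 10% late rate in the usage lines is purely illustrative, not a measured figure.

```python
from math import comb


def p_delayed(n_hosts: int, quorum: int, p_late: float) -> float:
    """Probability that fewer than `quorum` of `n_hosts` return on time,
    assuming each host is late independently with probability p_late."""
    on_time = 1.0 - p_late
    return sum(comb(n_hosts, k) * on_time**k * p_late**(n_hosts - k)
               for k in range(quorum))


print(f"{p_delayed(3, 3, 0.1):.4f}")  # 0.2710: any one late host delays the WU
print(f"{p_delayed(4, 3, 0.1):.4f}")  # 0.0523: two late hosts are needed
```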

So I fully appreciate why four hosts are allocated: it gives a lower turnaround time, and gaining 25% more processing power is probably not worth having some results come in early and some late.

verty

Message 14961 in response to message 14960

Continuing:

Notifying clients when they connect of superfluous work units would lower that 25%. The amount saved will only be great where the gap in processing power between the 3rd and the 4th hosts is large. I suppose then that it would only be the slowpokes who lose out. Trying to save some time for the slowest host would probably not be that beneficial, because it is the most likely of the four hosts to be last on any work units in the future.
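
A rough way to see why the saving depends on the gap between the 3rd and 4th hosts: once the quorum-th result arrives, any slower host keeps running for the difference in turnaround. This sketch assumes all hosts start at the same time and are notified instantly, so it is an upper bound; the turnaround values in the usage line are hypothetical apart from the Crusoe's 7.5 days mentioned earlier in the thread.

```python
def savable_days(turnaround_days: list[float], quorum: int = 3) -> float:
    """Upper bound on time the slower hosts could save if told to abort the
    moment the quorum is reached (same start time, instant notification)."""
    if len(turnaround_days) <= quorum:
        return 0.0
    ordered = sorted(turnaround_days)
    quorum_reached = ordered[quorum - 1]
    # Every host slower than the quorum-th one runs past the point where its result is needed.
    return sum(t - quorum_reached for t in ordered[quorum:])


print(savable_days([1.0, 2.0, 3.5, 7.5]))  # 4.0: large gap, large saving
print(savable_days([1.0, 1.1, 1.2, 1.3]))  # about 0.1: similar hosts, little to save
```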

So really, the trend is for slow hosts to suffer superfluous work units. And actually, that is good because the CPU-time lost is slow CPU-time. Fast CPU-time is used more efficiently, by default.

This problem is really not as important as I had thought. Those that are suffering superfluous work units should up their speed, increase their duty cycle, or accept it (they will still get credit).

Sven Glueckspilz
Joined: 18 Mar 05
Posts: 23
Credit: 27474851
RAC: 0

Well, one week ago, a user got more than 22,000 workunits on 22,000 virtual hosts, because of a mistake in his firewall or something similar.

I got many of these workunits too. The result is that 75% of these workunits were completed by three users. The other 25% are still pending.

It would be interesting to know how many of the 22,000 workunits were completed in the last week. If it is a high percentage, it could be better to initially send each workunit to only three users. That could save a lot of time (25%) for the whole project.

John McLeod VII

Message 14963 in response to message 14961

Quote:

Continuing:

Notifying clients when they connect of superfluous work units would lower that 25%. The amount saved will only be great where the gap in processing power between the 3rd and the 4th hosts is large. I suppose then that it would only be the slowpokes who lose out. Trying to save some time for the slowest host would probably not be that beneficial, because it is the most likely of the four hosts to be last on any work units in the future.

So really, the trend is for slow hosts to suffer superfluous work units. And actually, that is good because the CPU-time lost is slow CPU-time. Fast CPU-time is used more efficiently, by default.

This problem is really not as important as I had thought. Those that are suffering superfluous work units should up their speed, increase their duty cycle, or accept it (they will still get credit).


The typical response, if there is no credit for the work sent, will just be to turn off the computer. The loss of computing power will be enormous, as only the fastest computers will be left. Not everyone has the cash to buy a new computer every 6 months, and the old ones are doing useful work.

The projects could send out only the exact number of required results, and then send out replacements only as needed. The expected savings in CPU time will be less than 25%, as some WUs that are sent never come back for one reason or another, and the savings could be as low as 0%. Neither of us has the information to answer this accurately, but we can put bounds on the savings. The project administrators do have this information.
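
The bounds can be illustrated with a simplified model. Assume each issued result is lost independently with probability `p_lost` (an assumed parameter, not project data). Issuing 3 copies and reissuing until 3 come back needs 3 / (1 - p_lost) issues on average, compared against a flat 4 for the current scheme. The model counts issued results rather than CPU time and ignores replacements the 4-copy scheme itself sometimes needs, so it is only a rough guide.

```python
def expected_savings(p_lost: float, quorum: int = 3, initial_replication: int = 4) -> float:
    """Fraction of issued results saved by issuing `quorum` copies plus
    replacements, versus always issuing `initial_replication` copies.
    Simplified model; a negative value means no saving at all."""
    if not 0.0 <= p_lost < 1.0:
        raise ValueError("p_lost must be in [0, 1)")
    expected_issued = quorum / (1.0 - p_lost)  # issue until `quorum` come back
    return 1.0 - expected_issued / initial_replication


print(round(expected_savings(0.0), 3),   # 0.25: nothing is ever lost
      round(expected_savings(0.1), 3),   # 0.167
      round(expected_savings(0.25), 3))  # 0.0: a quarter of results never return
```

This matches the range given above: 25% when every result comes back, falling to 0% when a quarter of the results are lost.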

verty

Message 14964 in response to message 14962

Quote:
It would be interesting to know how many of the 22,000 workunits were completed in the last week. If it is a high percentage, it could be better to initially send each workunit to only three users. That could save a lot of time (25%) for the whole project.

Anomalies can happen, and a fast computer may inadvertently receive too many work units. The task is to stop these anomalies.

Anyhow, this type of thing proves the usefulness of having 4 hosts process each work unit, because otherwise 22000 work units would automatically be delayed by a week, potentially more.

Remember that the CPU time wasted is the slow CPU time. It won't matter that this user was assigned 22,000 work units: he or she will submit a few results as usual, whether or not they are the slowest of the four, and the process will continue largely undisturbed.

Allocating 3 hosts per work unit would only be feasible if there was a high expectation of reliability. I don't think there is: you can see the anomalies.
