Why the short due date with long workunits?

tullio
Joined: 22 Jan 05
Posts: 2,081
Credit: 47,433,957
RAC: 10,800

RE: It is not just about

Message 66986 in response to message 66985

Quote:
It is not just about processing speeds; it is also about how many hours per day you allow your computers to run. For example, I use my home computer (Pentium IV 3 GHz) for 2 hours a day on average. Where I come from, neither electricity nor computers are cheap, so it is just not possible to keep computers running 24/7. There are a lot of people like me, and they will be forced out of this project, which is very unfortunate.


You have a lot of sunlight in India. Why don't you exploit it?
Tullio

Urban
Joined: 20 Feb 05
Posts: 7
Credit: 49,301,337
RAC: 15,643

RE: RE: It is not just

Message 66987 in response to message 66986

Quote:
Quote:
It is not just about processing speeds; it is also about how many hours per day you allow your computers to run. For example, I use my home computer (Pentium IV 3 GHz) for 2 hours a day on average. Where I come from, neither electricity nor computers are cheap, so it is just not possible to keep computers running 24/7. There are a lot of people like me, and they will be forced out of this project, which is very unfortunate.

You have a lot of sunlight in India. Why don't you exploit it?
Tullio

VERY helpful comment :-(

Neal Watkins
Joined: 19 Feb 05
Posts: 15
Credit: 1,055,607
RAC: 389

OK - I originated this thread

OK - I originated this thread expecting some sort of answer. But I didn't get one - only a lot of "Me too!"s, some spam, some OT comments.
Well here is an on-topic comment:
It just did it again: 2 more 64-hour WUs with due dates shorter than the original SETI WUs (see top of thread) that are still in the queue. So I have stopped getting any more Einstein workunits. When these are done, I'll maybe visit this message board in a month or so to see if anything has changed.
TTFN (Ta Ta For Now)

RottenMutt
Joined: 11 Jun 05
Posts: 4
Credit: 427,395
RAC: 0

ditto, i've lost workunits

ditto,
I've lost workunits that took 70 CPU hours to complete and couldn't be finished within the 2 week due date. I've gotten credit for a few others that were late, when the quorum had not yet been met.
Please lengthen the due date, or change it so the clock starts at the workunit start time.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

There's an issue here which

There's an issue here which hasn't been addressed: why is the project sending work to hosts which cannot meet the deadline in the first place?

For example looking at this result 570498:

You can get the project estimated FLOP count for the result from the command line sent to the app out of the client_state file. We also know the estimated run time for the result will be:

(Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff))

Which in this case is over 2.8 Msecs. The parameters haven't changed significantly for this host for months and are always available to the project-side scheduler, so why was this result sent to the host at all, when it obviously can't make a 2 week deadline and misses it by a wide margin?
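As a rough illustration of the estimate above, here is a minimal Python sketch. Only the formula comes from this post; the function name, argument names and sample numbers are made up for the example. It assumes Est_FLOPs is the project's FLOP estimate for the result, FP_BM is the host's floating-point benchmark in FLOPs per second, RDCF is the result duration correction factor, and On_Frac, Run_Frac and CPU_Eff are fractions between 0 and 1.

```python
def estimated_runtime_seconds(est_flops, fp_bm, rdcf, on_frac, run_frac, cpu_eff):
    """Estimated wall-clock seconds to finish one result.

    Implements (Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff)).
    All argument names are illustrative, not actual BOINC field names.
    """
    availability = on_frac * run_frac * cpu_eff
    if fp_bm <= 0 or availability <= 0:
        raise ValueError("benchmark and availability fractions must be positive")
    raw_cpu_seconds = est_flops / fp_bm          # time on an always-on, ideal host
    return raw_cpu_seconds * (rdcf / availability)


# Hypothetical host: 1 GFLOPS benchmark, switched on half the time.
example = estimated_runtime_seconds(
    est_flops=4.0e14, fp_bm=1.0e9, rdcf=2.0,
    on_frac=0.5, run_frac=1.0, cpu_eff=1.0,
)
print(f"{example / 1e6:.2f} Msecs")
```

With these made-up inputs the estimate is 1.6 Msecs, already more than the roughly 1.21 Msecs in a 2 week deadline.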

Wrote down formula wrong!

Alinator

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,042
Credit: 809,301,824
RAC: 1,237,944

RE: There's an issue here

Message 66991 in response to message 66990

Quote:

There's an issue here which hasn't been addressed: why is the project sending work to hosts which cannot meet the deadline in the first place?

For example looking at this result 570498:

You can get the project estimated FLOP count for the result from the command line sent to the app out of the client_state file. We also know the estimated run time for the result will be:

(Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff))

Which in this case is over 2.5 Msecs. The parameters haven't changed significantly for this host for months and are always available to the project-side scheduler, so why was this result sent to the host at all, when it obviously can't make a 2 week deadline and misses it by a wide margin?

Alinator


Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have. I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

Edited to quote corrected formula

roadrunner_gs
Joined: 7 Mar 06
Posts: 94
Credit: 3,369,656
RAC: 0

RE: (...) why is the

Message 66992 in response to message 66990

Quote:
(...)
why is the project sending work to hosts which cannot meet the deadline in the first place?
(...)

Because they don't care, as someone made me aware today:

Quote:

The application we'll use for this run is all new and has never been used before. The algorithm used in the old App is still a part of the new one, and other parts have also been used before, but they have never been used in the present combination, and in particular not in a distributed computing project of that scale. We expect some problems to arise from this.

One of the issues we are still working on is some "overhead", i.e. calculations that are performed for technical reasons, but don't actually contribute to the result, thus wasting computing power.

We therefore will set up S5R2 as a short, experimental run that limits the search to parts of the parameter space where the overhead is well under control. During this short run we will improve the Application in various aspects. The results will also help us tuning the parameters for the next, larger run (probably named S5R3).

So don't bother either; just turn off your client if it isn't running 24/7 for other reasons.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

RE: Has that computer

Message 66993 in response to message 66991

Quote:
Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have. I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

This was the second S5R2 this host has run. The first was a ~160 MHz one which finished just within the deadline, and there was no appreciable change in the RDCF after it reported and validated.

However, let's look at this logically for a second. We know the current app is plain vanilla at this point across all platforms. Therefore the RDCF should be going up for all platforms since the app is less efficient by definition.

Under that condition, you should be able to safely assume that if the estimated time to completion is outside the deadline with the old parameters there is no way a less efficient app can improve on that, given the WU will take longer to complete compared to the last runs at a similar template frequency.
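To make that argument concrete, here is a small Python sketch. The helper names and the new RDCF value are hypothetical; the 2.8 Msecs estimate and the 2 week deadline are the figures from earlier in this thread. It relies only on the fact that the estimate is directly proportional to the RDCF, so a larger RDCF can only push the estimate further past the deadline.

```python
TWO_WEEKS_S = 14 * 24 * 3600  # 2 week deadline in seconds (1,209,600)


def misses_deadline(estimate_s, deadline_s=TWO_WEEKS_S):
    """True if the estimated run time is longer than the deadline."""
    return estimate_s > deadline_s


def rescale_for_new_rdcf(estimate_s, old_rdcf, new_rdcf):
    """Rescale an estimate when only the RDCF changes.

    The estimate is linear in RDCF, so it scales by new_rdcf / old_rdcf.
    """
    return estimate_s * new_rdcf / old_rdcf


old_estimate = 2.8e6  # seconds, the estimate quoted earlier in the thread
# Hypothetical: a less efficient app raises the RDCF from 1.0 to 1.3.
new_estimate = rescale_for_new_rdcf(old_estimate, old_rdcf=1.0, new_rdcf=1.3)

print(misses_deadline(old_estimate), misses_deadline(new_estimate))
```

Both checks come out true: if the old estimate already misses the deadline, any estimate with a higher RDCF misses it too.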

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,042
Credit: 809,301,824
RAC: 1,237,944

RE: RE: Has that computer

Message 66994 in response to message 66993

Quote:
Quote:
Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have. I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

This was the second S5R2 this host has run. The first was a ~160 MHz one which finished just within the deadline, and there was no appreciable change in the RDCF after it reported and validated.

However, let's look at this logically for a second. We know the current app is plain vanilla at this point across all platforms. Therefore the RDCF should be going up for all platforms since the app is less efficient by definition.

Under that condition, you should be able to safely assume that if the estimated time to completion is outside the deadline with the old parameters there is no way a less efficient app can improve on that, given the WU will take longer to complete compared to the last runs at a similar template frequency.


If the previous WU had exited before the scheduler work-fetch contact that resulted in the current WU, then I would agree with you. (I'm sure the RDCF is calculated and stored locally, so the reporting/validation doesn't matter: it's only reported to the project so we can see it conveniently on the webserver).

I very much doubt that they would even have considered changing the server algorithm to say 'hey, the new app is de-optimised - let's include a fiddle-factor in the Est_FLOPs - to allow a bit of leeway'.

Looking at your formula, I see four possibilities:

  1. Bad Est_FLOPs - their end - perfectly possible with a new work-generator
  2. Bad FP_BM or RDCF - your end - should correct itself over time
  3. Bad crunch - checkpoint read error causing restart, for example. Should be visible in
  4. Bad scheduler allocation decision.

If you can absolutely rule out 1, 2 and 3, then 4 is a BOINC server bug and a candidate for reporting on trac.

Winterknight
Joined: 4 Jun 05
Posts: 482
Credit: 201,237,879
RAC: 120,759

I'm pretty sure that the

I'm pretty sure that projects can reset the RDCF from the server end. If so, shouldn't this have been done when S5R2 started?

Andy
