Why the short due date with long workunits?

tullio

Joined: 22 Jan 05

Posts: 2118

Credit: 61407735

RAC: 0

RE: It is not just about

21 May 2007 9:49:17 UTC

Message 66986 in response to message 66985

Quote:

It is not just about processing speeds, it is also about how many hours per day you allow your computers to run. For example, I use my home computer (Pentium IV 3 GHz) for 2 hrs. a day on average. In the part of the world I come from, neither electricity nor computers are cheap, so it is just not possible to keep computers running 24/7. There are a lot of people like me, and they will be forced out of this project, which is very unfortunate.

You have a lot of sunlight in India. Why don't you exploit it?
Tullio

Urban

Joined: 20 Feb 05

Posts: 7

Credit: 56288559

RAC: 602

RE: RE: It is not just

21 May 2007 9:54:16 UTC

Message 66987 in response to message 66986

Quote:

Quote:
It is not just about processing speeds, it is also about how many hours per day you allow your computers to run. For example, I use my home computer (Pentium IV 3 GHz) for 2 hrs. a day on average. In the part of the world I come from, neither electricity nor computers are cheap, so it is just not possible to keep computers running 24/7. There are a lot of people like me, and they will be forced out of this project, which is very unfortunate.

You have a lot of sunlight in India. Why don't you exploit it?
Tullio

VERY helpful comment :-(

http://www.boincstats.com/stats/banner.php?cpid=3837f9fafc28ff2e9df5b13ae2f8aaf7

Neal Watkins

Joined: 19 Feb 05

Posts: 18

Credit: 1272157

RAC: 69

OK - I originated this thread

22 May 2007 0:24:24 UTC

Message 66988

OK - I originated this thread expecting some sort of answer. But I didn't get one - only a lot of "Me too!"s, some spam, some OT comments.
Well here is an on-topic comment:
It just did it again - 2 more 64-hour WUs with due dates shorter than the original SETI WUs (see top of thread) that are still in the queue. So I stopped getting any more Einstein workunits. When these are done, I'll maybe visit this message board in a month or so to see if anything has changed.
TTFN (Ta Ta For Now)

RottenMutt

Joined: 11 Jun 05

Posts: 4

Credit: 427395

RAC: 0

ditto, i've lost workunits

22 May 2007 5:22:23 UTC

Message 66989

Ditto,
I've lost workunits that took 70 CPU hours to complete and couldn't be completed within the 2 week due date. I've gotten credit for a few others that were late, where quorum had not yet been met.
Please lengthen the due date, or change the milestone so that the deadline clock starts at the workunit start time.
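
To put rough numbers on this, here is a minimal sketch of the arithmetic, using only figures mentioned in this thread (70 CPU hours, 64-hour WUs, a 2 week deadline, 2 hours of use per day). The function names are made up for illustration, and it assumes every hour the computer is on goes entirely to the workunit, which is optimistic.

```python
import math

DEADLINE_DAYS = 14  # the 2 week due date discussed in this thread


def min_hours_per_day(cpu_hours, deadline_days=DEADLINE_DAYS):
    """Hours of crunching needed per day to finish before the deadline."""
    return cpu_hours / deadline_days


def days_to_finish(cpu_hours, hours_per_day):
    """Whole days needed to finish at a given daily crunching time."""
    return math.ceil(cpu_hours / hours_per_day)


if __name__ == "__main__":
    # A 70 CPU-hour workunit against a 14 day deadline.
    print("hours/day needed:", min_hours_per_day(70))
    # A 64-hour workunit on a machine that runs 2 hours a day.
    print("days needed at 2 h/day:", days_to_finish(64, 2))
```

So a 70 CPU-hour workunit needs about 5 hours of crunching every day, and a machine that only runs 2 hours a day cannot get close to a 2 week deadline on a 64-hour workunit.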

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

There's an issue here which

22 May 2007 13:03:07 UTC

Message 66990

There's an issue here which hasn't been addressed: why is the project sending work to hosts which cannot meet the deadline in the first place?

For example looking at this result 570498:

You can get the project estimated FLOP count for the result from the command line sent to the app out of the client_state file. We also know the estimated run time for the result will be:

(Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff))

Which in this case is over 2.8 Msecs. The parameters haven't changed significantly for this host for months and are always available to the project-side scheduler. So why was this result sent to the host at all, since it obviously can't make a 2 week deadline, and misses it by a wide margin?
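
As a rough illustration, the formula above can be written as a small Python sketch. The function and variable names are invented for this example, and the input values are placeholders rather than this host's real figures (those would come from client_state). This is only the estimate as written in this post, not the scheduler's actual code.

```python
TWO_WEEKS_S = 14 * 24 * 3600  # 2 week deadline in seconds (1,209,600)


def estimated_run_time(est_flops, fp_bm, rdcf, on_frac, run_frac, cpu_eff):
    """(Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff)), in seconds."""
    return (est_flops / fp_bm) * (rdcf / (on_frac * run_frac * cpu_eff))


def can_meet_deadline(est_seconds, deadline_s=TWO_WEEKS_S):
    """True if the estimated run time fits inside the deadline."""
    return est_seconds <= deadline_s


if __name__ == "__main__":
    # Placeholder inputs, NOT taken from the host discussed here.
    est = estimated_run_time(
        est_flops=4.0e14,  # project's estimated FLOP count (hypothetical)
        fp_bm=2.0e8,       # floating point benchmark, FLOPs/sec (hypothetical)
        rdcf=1.0,          # result duration correction factor (hypothetical)
        on_frac=0.9,       # fraction of time the host is on (hypothetical)
        run_frac=0.9,      # fraction of that time BOINC runs (hypothetical)
        cpu_eff=0.95,      # CPU efficiency (hypothetical)
    )
    print(f"estimate: {est:.0f} s, deadline: {TWO_WEEKS_S} s")
    print("fits deadline:", can_meet_deadline(est))
```

For comparison, 2 weeks is about 1.21 Msecs, so an estimate of 2.8 Msecs is more than double the deadline.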

Wrote down formula wrong!

Alinator

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2140

Credit: 2770379929

RAC: 920519

RE: There's an issue here

22 May 2007 13:13:27 UTC

Message 66991 in response to message 66990

Quote:

There's an issue here which hasn't been addressed: why is the project sending work to hosts which cannot meet the deadline in the first place?

For example looking at this result 570498:

You can get the project estimated FLOP count for the result from the command line sent to the app out of the client_state file. We also know the estimated run time for the result will be:

(Est_FLOPs / FP_BM) * (RDCF / (On_Frac * Run_Frac * CPU_Eff))

Which in this case is over 2.5 Msecs. The parameters haven't changed significantly for this host for months and are always available to the project-side scheduler. So why was this result sent to the host at all, since it obviously can't make a 2 week deadline, and misses it by a wide margin?

Alinator

Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have - I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

Edited to quote corrected formula

roadrunner_gs

Joined: 7 Mar 06

Posts: 94

Credit: 3369656

RAC: 0

RE: (...) why is the

22 May 2007 13:14:59 UTC

Message 66992 in response to message 66990

Quote:

(...)
why is the project sending work to hosts which cannot meet the deadline in the first place?
(...)

Because they don't care, as someone made me aware today:

Quote:

The application we'll use for this run is all new and has never been used before. The algorithm used in the old App is still a part of the new one, and other parts have also been used before, but they have never been used in the present combination, and in particular not in a distributed computing project of that scale. We expect some problems to arise from this.

One of the issues we are still working on is some "overhead", i.e. calculations that are performed for technical reasons, but don't actually contribute to the result, thus wasting computing power.

We therefore will set up S5R2 as a short, experimental run that limits the search to parts of the parameter space where the overhead is well under control. During this short run we will improve the Application in various aspects. The results will also help us tuning the parameters for the next, larger run (probably named S5R3).

So don't bother either and just turn off your client if it isn't running 24/7 due to other reasons.

Alinator

Joined: 8 May 05

Posts: 927

Credit: 9352143

RAC: 0

RE: Has that computer

22 May 2007 13:37:08 UTC

Message 66993 in response to message 66991

Quote:

Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have - I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

This was the second S5R2 this host has run. The first was a ~160 MHz one which finished just within the deadline, and there was no appreciable change in the RDCF after it reported and validated.

However, let's look at this logically for a second. We know the current app is plain vanilla at this point across all platforms. Therefore the RDCF should be going up for all platforms since the app is less efficient by definition.

Under that condition, you should be able to safely assume that if the estimated time to completion is outside the deadline with the old parameters there is no way a less efficient app can improve on that, given the WU will take longer to complete compared to the last runs at a similar template frequency.

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2140

Credit: 2770379929

RAC: 920519

RE: RE: Has that computer

22 May 2007 14:13:18 UTC

Message 66994 in response to message 66993

Quote:

Quote:
Has that computer finished crunching an S5R2 unit yet? If not, the RDCF will be left over from S5R1: your host metrics may not have changed for months, but the project's certainly have - I was going to say especially with the de-optimisation of AMD discussed elsewhere, but that doesn't apply to MMX! Let it crunch to the end to get a new RDCF, and then let us know what the difference was?

This was the second S5R2 this host has run. The first was a ~160 MHz one which finished just within the deadline, and there was no appreciable change in the RDCF after it reported and validated.

However, let's look at this logically for a second. We know the current app is plain vanilla at this point across all platforms. Therefore the RDCF should be going up for all platforms since the app is less efficient by definition.

Under that condition, you should be able to safely assume that if the estimated time to completion is outside the deadline with the old parameters there is no way a less efficient app can improve on that, given the WU will take longer to complete compared to the last runs at a similar template frequency.

If the previous WU had exited before the scheduler work-fetch contact that resulted in the current WU, then I would agree with you. (I'm sure the RDCF is calculated and stored locally, so the reporting/validation doesn't matter: it's only reported to the project so we can see it conveniently on the webserver).

I very much doubt that they would even have considered changing the server algorithm to say 'hey, the new app is de-optimised - let's include a fiddle-factor in the Est_FLOPs - to allow a bit of leeway'.

Looking at your formula, I see four possibilities:

1. Bad Est_FLOPs - their end - perfectly possible with a new work-generator
2. Bad FP_BM or RDCF - your end - should correct itself over time
3. Bad crunch - checkpoint read error causing restart, for example. Should be visible in
4. Bad scheduler allocation decision.

If you can absolutely rule out 1, 2 and 3, then 4 is a BOINC server bug and a candidate for reporting on trac.

Winterknight

Joined: 4 Jun 05

Posts: 1242

Credit: 321149153

RAC: 427597

I'm pretty sure that the

24 May 2007 8:22:03 UTC

Message 66995

I'm pretty sure that projects can reset the RDCF from the server end. If so, should this have been done when S5R2 started?

Andy
