About lack of time to finish workunits until deadline

metalius
Joined: 29 Dec 05
Posts: 44
Credit: 172607243
RAC: 24804
Topic 193368

This is a message addressed to our CHIEFS :)
If we compare Einstein@home with, for example, SETI@home, this is the situation today:
- the CPU time needed to finish a SETI@home workunit can be about 3 times (or more) shorter;
- the time SETI@home allows until the report deadline is at least 2 times longer...
Why? Such a rush in Einstein@home... :)
Quite a large share of my computers attached to Einstein@home are laptops (or mobile workstations). Of course, such computers are turned on only occasionally and are online even more rarely.
As a result, I have a problem with this type of PC: the work is done and the result is reported, but the finished work has no effect and, of course... earns no credits. :) IMHO, the most likely cause is that Einstein@home doesn't need the result anymore because it was reported after the deadline. For example, look here - http://einsteinathome.org/workunit/35339025 (the ID of my PC is 1054416).
What to do? Maybe retarget my laptops to other projects (like CPDN - no deadlines at all)? ;) :)
P.S. And of course... Sorry for my English.

metalius
Joined: 29 Dec 05
Posts: 44
Credit: 172607243
RAC: 24804

One more example...
A somewhat older CPU, like a Pentium III, needs about 100 hours to finish a workunit. It is possible to do this when such a PC is permanently turned on...

B.I.G
Joined: 26 Oct 07
Posts: 105
Credit: 949151636
RAC: 1264443

Message 75790 in response to message 75789

100 hours takes about 4 days to finish if the computer is running 24/7. The deadline is 2 weeks. So yes, it is possible.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

@ metalius

Well, in one regard you sort of answered your own question. When deciding whether a host can run the project, you have to take into account the amount of time the machine will actually be turned on, the nominal network connection frequency it will have, and the number of other projects it runs, in addition to the tightness factor of the project (defined as the ratio of the actual runtime of the work to the deadline interval). As you've seen, a part-time cruncher with an infrequent network connection can easily blow the deadline, even if it has the horsepower to crunch the result within the deadline given more uptime or more frequent connections to the project to report.
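
For illustration only, here is a minimal Python sketch of the tightness factor as defined above (runtime divided by the deadline interval). The `uptime_hours_per_day` and `project_share` parameters are assumptions added here to fold in uptime and multi-project sharing; they are not part of any BOINC API, and the function name is hypothetical.

```python
def tightness_factor(runtime_hours, deadline_days,
                     uptime_hours_per_day=24.0, project_share=1.0):
    """Runtime divided by the crunching time available before the deadline.

    With the defaults (24 h/day, whole machine for one project) this is the
    plain ratio of runtime to deadline interval. The two extra parameters are
    illustrative assumptions, not BOINC settings.
    """
    available_hours = deadline_days * uptime_hours_per_day * project_share
    if available_hours <= 0:
        raise ValueError("host has no crunching time before the deadline")
    return runtime_hours / available_hours


# A 100-hour task with a 14-day deadline on a host that is always on:
print(tightness_factor(100, 14))           # about 0.30, comfortable
# The same task on a host that is on 8 h/day and splits time 50/50:
print(tightness_factor(100, 14, 8, 0.5))   # above 1.0, the deadline is missed
```

Anything at or above 1.0 means the host cannot finish in time under these assumptions, before even counting the delay of waiting for a network connection to report.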

As you suspected, the reason for no credit on the example task you posted was that your host missed the deadline, the WU was reissued to a new wingman and it reported before yours did. Once a task is over the deadline, you must get your result back before the reissue is returned in order to get credit for it.

As far as SAH goes, the long deadlines were set so that a very wide spectrum of hosts would be able to participate and get the work back on time. EAH, obviously, is much more demanding in this regard, especially now on S5R3.

To answer your last question in the followup post, yes, PIII's are fully capable of running EAH and making the deadline, especially if they are run 24/7, even if they run more than one project. In fact older/slower hosts than PIII's can make the deadline, except for the very longest running of the WU's.

HTH,

Alinator

I took a second fuller look at the WU in question and it turns out you were the extra wingman on that one when the original one missed his deadline. However, since the original one managed to report before you and you missed the deadline by a few hours you got left out in the cold, so to speak, when it came to getting credit for it.

Alinator

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

@ everyone

This subject is coming up constantly now, with the inevitable comparison to SETI. The piece that is missing from most people's minds with regard to SETI's deadlines is only partially described in Alinator's post. Another component of it was the amount of complaining that happens when we end users see our systems entering what is colloquially known as "EDF" (Earliest Deadline First) mode so as to attempt to complete a workunit by its deadline. SETI was distributing a lot of short-running tasks (4-day deadline). When hosts that were slower or had large caches, smaller resource allocations, or less uptime received them, those hosts would immediately enter EDF, which people translate to: "YOU ARE DISRESPECTING MY SETTINGS! HOW DARE YOU!" (seriously, people get that bent out of shape about it...) With the longer deadlines now, the incidence of EDF at SETI is greatly reduced, but I think it has caused the increase in complaining here due to the "tightness factor" that Alinator mentioned.

Anyway, if you are working on your first Einstein result, there is something called an RDCF (Result Duration Correction Factor) that is set to 1. This is an assistant to the benchmark system that helps better estimate the actual runtime of a result. If your system's RDCF really should be 1.2, the RDCF won't get corrected until you at least complete a result, which is why some of you may be missing deadlines. Your specific BOINC installation doesn't know at first that you need longer than the estimate. Once it knows this, it will bring your system into Earliest Deadline First / "High Priority" mode sooner so as to get the Einstein task worked on more so it has a chance to complete "on time", or at least real close. The "real close" bit is key right now because of how Einstein results have a variation in their runtimes that was unexpected by the development team. This variation is probably also causing variations in the RDCF, which is why if you have a slower host with a smaller resource allocation, you may be OK for one result, then miss on a result that runs longer because of the variation. You can see your current RDCF by looking at the statistics for your computer. My RDCF is currently 0.525797. This means it takes approximately 52.6% of the estimated time based on the benchmark. Due to SETI being down over the next day, I'm going to run some Einstein results. I'll check the variance with respect to RDCF and post a follow-up...
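
To make the RDCF idea concrete, here is a rough Python sketch. It assumes the corrected estimate is simply the benchmark-based estimate multiplied by the RDCF, which matches the 52.6% reading above. The function names, the 100,000-second estimate, and the one-step ratio in `naive_rdcf` are all illustrative assumptions; the real BOINC client adjusts the factor by its own rule, which is not reproduced here.

```python
def corrected_estimate(benchmark_estimate_s, rdcf=1.0):
    """Benchmark-based runtime estimate scaled by the correction factor."""
    return benchmark_estimate_s * rdcf


def naive_rdcf(benchmark_estimate_s, actual_runtime_s):
    """Simplified stand-in: the factor that would have made the estimate exact.

    This is NOT the BOINC client's actual update rule, only an illustration
    of why the factor cannot be known before one result has completed.
    """
    return actual_runtime_s / benchmark_estimate_s


estimate_s = 100_000  # hypothetical benchmark-based estimate, in seconds

# A brand-new host starts at RDCF = 1, so it trusts the raw estimate:
print(corrected_estimate(estimate_s))

# A host with RDCF 0.525797 expects roughly 52.6% of the raw estimate:
print(corrected_estimate(estimate_s, 0.525797))

# A host whose first result really took 120,000 s would need a factor near 1.2:
print(naive_rdcf(estimate_s, 120_000))
```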

That said, I have begun a mini-campaign of asking the Einstein project to lengthen the deadlines back out to 3 weeks for a while. This will help cut down on what Alinator mentioned as the "tightness factor". The other thing that would help the tightness is if the science application had SSE optimization so it could work better on Pentium III and newer processors. That was one of the goals of the development team, but fixing bugs has taken a priority, which is why I'm asking them to consider increasing the deadline to at least 18 days, if not 21 days, until such time as they can get the optimization worked into at least all of the major platforms. Mac OS X Intel already has it. It would be great if the optimization was in all platforms, but, IMO, it would need to at least be into the Linux and Windows apps before reducing deadlines again.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 75793 in response to message 75791

Quote:

I took a second fuller look at the WU in question and it turns out you were the extra wingman on that one when the original one missed his deadline. However, since the original one managed to report before you and you missed the deadline by a few hours you got left out in the cold, so to speak, when it came to getting credit for it.

The project team could change the 221 policy to zap a result even if it had been started... ;)

@metalius - this is an inside joke, I'm not serious.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 75794 in response to message 75793

Quote:
Quote:

I took a second fuller look at the WU in question and it turns out you were the extra wingman on that one when the original one missed his deadline. However, since the original one managed to report before you and you missed the deadline by a few hours you got left out in the cold, so to speak, when it came to getting credit for it.

The project team could change the 221 policy to zap a result even if it had been started... ;)

@metalius - this is an inside joke, I'm not serious.

LOL...

Actually, it's a not very funny "inside" joke. EAH will issue an unconditional abort to a host actually running the task in a fully validated WU case. I've had it happen more than once on my K6/300's when they got issued tasks which they had zero chance of making the deadline on under any circumstances. I suspect his case was one where the host had already finished the task in plenty of time. IOW, it was completed and waiting for a network connection to upload and/or report.

Alinator

Just for reference, this will show as a 197 (user abort) on the Task and WU summaries on the project site pages, even though they were project and not user initiated.

Alinator

metalius
Joined: 29 Dec 05
Posts: 44
Credit: 172607243
RAC: 24804

Message 75795 in response to message 75791

Quote:
To answer your last question in the followup post, yes, PIII's are fully capable of running EAH and making the deadline, especially if they are run 24/7, even if they run more than one project.


Thank you for the answer, but... this was not a question, this was an example - PIII's are capable when turned on permanently, and not capable in most other situations.
Example
PIII at work, turned on - 8 hours/day.
To finish an Einstein@home workunit this PC needs 100/8=12.5 days (and that is if it runs Einstein@home only). But...
In 2 weeks we have only 10 workdays. Conclusion - it is impossible to finish a workunit before the deadline on such a PIII that stays at work!!!
I don't understand how you are calculating, especially what 24/7 is, but, IMHO, I am right... :)
I must repeat my REAL question - why only 2 weeks until the deadline???
Earlier, there were no problems with missing deadlines; this problem is new in Einstein@home, and this frustrates me, of course...
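
The arithmetic in this example can be checked with a short Python sketch. It assumes the host runs only Einstein@home and crunches for the whole time it is powered on; the function names are made up for illustration.

```python
def powered_on_days_needed(runtime_hours, uptime_hours_per_day):
    """Number of powered-on days required to accumulate the task's runtime."""
    return runtime_hours / uptime_hours_per_day


def meets_deadline(runtime_hours, uptime_hours_per_day, powered_on_days_available):
    """True if enough powered-on days fall before the deadline."""
    needed = powered_on_days_needed(runtime_hours, uptime_hours_per_day)
    return needed <= powered_on_days_available


# Office PIII: 100-hour workunit, 8 hours/day, 10 workdays in a 2-week deadline
print(powered_on_days_needed(100, 8))   # 12.5
print(meets_deadline(100, 8, 10))       # False

# The same PIII running 24/7 for all 14 days of the deadline
print(meets_deadline(100, 24, 14))      # True
```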

metalius
Joined: 29 Dec 05
Posts: 44
Credit: 172607243
RAC: 24804

Message 75796 in response to message 75793

2 Brian Silvers

Quote:
The project team could change the 221 policy to zap a result even if it had been started... ;)
@metalius - this is an inside joke, I'm not serious.


What do you think, did I understand your joke or not? The answer is no!!! :) For example, "the 221 policy to zap.." tells me... totally nothing. :)
And don't forget - not all the people in the world are "English speaking"; a lot of people (like me) never studied English at school.
This was a kind of joke too. :) Good luck!

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 75797 in response to message 75795

Quote:
Thank you for the answer, but... this was not a question, this was an example - PIII's are capable when turned on permanently, and not capable in most other situations.
Example
PIII at work, turned on - 8 hours/day.
To finish an Einstein@home workunit this PC needs 100/8=12.5 days (and that is if it runs Einstein@home only). But...
In 2 weeks we have only 10 workdays. Conclusion - it is impossible to finish a workunit before the deadline on such a PIII that stays at work!!!
I don't understand how you are calculating, especially what 24/7 is, but, IMHO, I am right... :)
I must repeat my REAL question - why only 2 weeks until the deadline???
Earlier, there were no problems with missing deadlines; this problem is new in Einstein@home, and this frustrates me, of course...

First, 24/7 = 24 hours per day, 7 days a week. Often seen with 365 (days in a year) trailing (i.e., 24/7/365).

As far as PIII capability goes, I have a 550 MHz Katmai (which is down with a bad hard drive currently). However, my personal database for it shows it had been running S5R3 tasks in about 90,000 seconds (25 hours) each on average so far. This means I would only have to run the machine roughly 6430 seconds per day to meet the deadline if EAH were the only project I ran on it. I'm not sure why your PIII would need 100 hours (roughly 4 times longer than mine) to run an EAH task at this point in S5R3, since any later Socket 370 PIII should be able to make the deadline under the uptime constraints you mentioned with ease right now.
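
As a quick check of the per-day figure, here is a minimal Python sketch. It assumes a 14-day deadline and EAH as the only project on the host; the function name is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def required_seconds_per_day(task_runtime_s, deadline_days=14):
    """Average crunching time per calendar day needed to finish by the deadline."""
    return task_runtime_s / deadline_days


runtime_s = 90_000  # average S5R3 task runtime reported for the 550 MHz Katmai

print(runtime_s / SECONDS_PER_HOUR)                # 25.0 hours per task
print(round(required_seconds_per_day(runtime_s)))  # 6429 seconds per day
```

That is a little under two hours of crunching per day, spread over the full two weeks.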

Concerning the 'why' about the choice of project deadlines: it has as much to do with the resources available to a given project, and the goals and timelines for its research, as it does with the concerns of client host eligibility, multi-project friendliness, and participant perceptions.

Alinator

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Message 75798 in response to message 75796

Quote:
2 Brian Silvers
Quote:
The project team could change the 221 policy to zap a result even if it had been started... ;)
@metalius - this is an inside joke, I'm not serious.

What do you think, did I understand your joke or not? The answer is no!!! :) For example, "the 221 policy to zap.." tells me... totally nothing. :)
And don't forget - not all the people in the world are "English speaking"; a lot of people (like me) never studied English at school.
This was a kind of joke too. :) Good luck!

LOL....

Point well taken. ;-)

To clarify:

A 221 abort refers to a relatively new BOINC feature where the project can request the host to not run a task it was sent if and only if that task has not been started at all by the host. Such tasks will show as aborted by the project as a redundant result on the Host Summary pages. The 221 refers to the error code shown in the more detailed information you get on the Task and Workunit pages.

Alinator
