What's with these 28 day tasks with 2 week deadlines?

JerryLerman
Joined: 11 Feb 15
Posts: 1
Credit: 11076813
RAC: 657
Topic 224839

I've suspended work for Einstein@home because the recent tasks are estimated to run ridiculously long, in one case, 28 days, with a 14-day deadline. It seems not worth the effort to try to complete them.

For example:


Application Gravitational Wave search O2 Multi-Directional 2.09 (GWnew)
Name h1_0930.45_O2C02Cl5In0__O2MD1S3_Spotlight_931.50Hz_1844
State Project suspended by user
Received 2/13/2021 11:38:17 AM
Report deadline 2/27/2021 11:38:18 AM
Estimated computation size 144,000 GFLOPs
CPU time 00:04:05
CPU time since checkpoint 00:04:05
Elapsed time 00:05:04
Estimated time remaining 28d 07:05:49    <---- HUH??
Fraction done 0.211%
Virtual memory size 2.10 GB
Working set size 2.10 GB
Directory slots/4
Process ID 24072
Progress rate 2.520% per hour
Executable einstein_O2MD1_2.09_windows_x86_64__GWnew.exe

Keith Myers
Joined: 11 Feb 11
Posts: 4753
Credit: 17681796802
RAC: 5723283

Since you are new here, you haven't completed enough work for BOINC to make any valid time-to-completion estimates.

Normally, you need to report 10 valid tasks before BOINC can calculate an APR (average processing rate).

But since Einstein uses the DCF (duration correction factor) formula instead of APR, it is slightly different. You still need to report work before a realistic DCF can be calculated.
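
To give a rough idea of how that estimate is put together, here is a simplified sketch of the arithmetic. The real client does more than this, and the app speed and DCF values below are made-up numbers for illustration, not values read from your host; only the 144,000 GFLOPs figure comes from the task you quoted.

# Simplified sketch of a DCF-style runtime estimate (Python).
# The app speed and DCF values here are purely illustrative assumptions.
def estimated_runtime_s(rsc_fpops_est, app_flops_per_s, dcf):
    # task size in FLOPs, divided by assumed app speed in FLOP/s, scaled by the DCF
    return rsc_fpops_est / app_flops_per_s * dcf

task_flops = 144_000e9      # 144,000 GFLOPs, from the task details quoted above
assumed_speed = 2.8e9       # hypothetical effective app speed of 2.8 GFLOP/s

print(estimated_runtime_s(task_flops, assumed_speed, dcf=1.0) / 3600)    # about 14.3 hours
print(estimated_runtime_s(task_flops, assumed_speed, dcf=48.0) / 86400)  # about 28.6 days

The point is that the same task size can produce a sensible estimate or a monstrous one, depending entirely on the DCF.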

The GW tasks do not take that long to complete.  Just let the tasks run.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5846
Credit: 109975723612
RAC: 29543326

JerryLerman wrote:

What's with these 28 day tasks with 2 week deadlines?

I've suspended work for Einstein@home because the recent tasks are estimated to run ridiculously long, in one case, 28 days, with a 14-day deadline. It seems not worth the effort to try to complete them.

First of all, there is no such thing as a 28-day task. An estimate is just an estimate, and it can get really screwed up (as possibly happened in this case) by unrelated things happening on the machine.

It seems possible to work out how this might have happened, so here's my take on the cause. It's just a guesstimate, so please correct me if you know otherwise.

You didn't supply the actual host ID so I looked at all 3 that are listed.  You seem to have two with identical hardware, one that last contacted the project a couple of days ago and the other much more recently.  Sure enough, the more recent one first contacted the project a couple of hours after the older one last did - in other words, it seems like a brand new host ID for the same hardware.

Perhaps you were having issues with the old one in some way.  I first wondered if you had reset the project but I don't think you get a brand new host ID by doing that.  If in some way you did take action to get a new host ID, please realise that the outstanding tasks associated with the old ID are now in limbo and waiting to 'time-out'.  When they do, they will be reissued to someone else.  It's good practice to remove this delay by aborting and returning any outstanding tasks before doing anything that will create a new host ID.

In the tasks list for the new host ID, you have tasks for three different searches: gamma-ray pulsar (GRP) CPU tasks, and gravitational wave (GW) CPU and GPU tasks. The first thing I noticed was that completed GRP tasks show an elapsed time close to double the actual CPU time. In other words, when they are running, they have to fight with something else to get access to a CPU core. Your machine seems to be heavily overloaded.
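
As a rough way to put a number on that: a CPU task that has a core to itself should show CPU time very close to elapsed time, i.e. a ratio near 1.0. Using the figures from the task quoted at the top of the thread, 4 min 5 sec of CPU time over 5 min 4 sec elapsed gives 245 / 304, or about 0.81, so even that task was only getting its core about 80% of the time. Completed GRP tasks with elapsed time close to double the CPU time are sitting near 0.5, which is why the machine looks heavily overloaded to me.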

This is borne out by looking at the GW GPU task run times. There is one showing at 1,047 secs, which is probably what you would expect for your high-end GPU. But look at the two other examples immediately above that one - 7,943 and 31,939 secs. And just below, there is a computation error after 48,003 secs. I looked at the exit status and saw:

Exit status: 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

In other words, the task didn't actually fail - it just couldn't complete in the allowed time so BOINC pulled the plug whilst it was really struggling.
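
For anyone wondering where that time limit comes from: roughly speaking, the client aborts a task once its elapsed time exceeds the task's FLOPs bound divided by the speed it thinks the app runs at. With purely illustrative numbers (not taken from this host), a bound of 1,440,000 GFLOPs and an assumed speed of 30 GFLOP/s would give 1,440,000 / 30 = 48,000 seconds, which is the sort of cutoff being hit here.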

In summary, if your machine (outside of BOINC) normally carries a heavy workload (video editing, gaming, large databases, etc.), then it's probably not a good idea to run multiple searches using all CPU cores and the GPU. You should use your preferences to limit the searches to ones that use only the resources the machine isn't already maxing out.

On the other hand, if the machine is supposed to be doing normal 'office' type stuff only, with no known really heavy loads, perhaps you should consider investigating for some sort of malware.  If you're sure it's not anything like that, do an experiment with all GW tasks (both CPU and GPU) suspended to see if the GRP tasks on their own can run without the huge discrepancy between CPU time and elapsed time.  Even if each CPU core was running a separate GRP task, the two times should be quite close to each other.

As a final comment, it's now easy to understand why BOINC has increased the estimated run times to what you have observed. As you would imagine, if a task that normally takes, say, 1,000 secs is suddenly seen to be unable to finish even in 48,000 secs, BOINC will panic and try to protect itself by increasing the duration correction factor (DCF) to ridiculous levels.
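
To put some illustrative numbers on that (they are only meant to show the scale of the effect, not values read from your host): 48,000 / 1,000 is a factor of roughly 48, and a task whose baseline estimate would otherwise be around 14 hours then gets shown at 14 x 48 = 672 hours, which is 28 days, right in line with the estimate in your first post.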

Once you have worked out what is causing this apparent overloading, the quickest way to get BOINC corrected will be to edit the state file (with BOINC not running) and put the value of <duration_correction_factor> in the Einstein project section back to its default value of 1.000000. It's just a simple edit with a plain text editor, and I can guide you through it when you get things sorted. Post a follow-up message when you're ready to fix it; you really need to sort out the apparent overloading issue first.
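
If you'd rather script that edit than do it by hand, here is a rough sketch of what it could look like. It is only a sketch and makes a few assumptions: BOINC must be stopped first (otherwise the client will overwrite your change), the path below is the default Windows data directory and may not match your installation, and you should keep a backup copy of the file before touching it.

# Sketch: reset Einstein@Home's <duration_correction_factor> to 1.000000
# in client_state.xml. Assumes BOINC is stopped and the default Windows
# data directory; adjust the path for your own setup.
import re

state_file = r"C:\ProgramData\BOINC\client_state.xml"  # assumed default location

with open(state_file, "r", encoding="utf-8") as f:
    text = f.read()

def reset_dcf(match):
    # Only touch the <project> block whose master URL is Einstein@Home.
    block = match.group(0)
    if "einstein.phys.uwm.edu" in block:
        block = re.sub(
            r"<duration_correction_factor>[^<]*</duration_correction_factor>",
            "<duration_correction_factor>1.000000</duration_correction_factor>",
            block,
        )
    return block

new_text = re.sub(r"<project>.*?</project>", reset_dcf, text, flags=re.DOTALL)

with open(state_file, "w", encoding="utf-8") as f:
    f.write(new_text)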

Cheers,
Gary.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

No such thing as a 28 day task... at E@H. The tasks at CPDN right now can last that long and RNA World tasks last 6mo-1yr.
