Here is the job info:
Job ID: p2030_53614_09366_0084_G73.19-00.29.N_3.dm_532_0
Binary: einsteinbinary_ABP1 version 104
The job has been running for 7 hours and claims to be at 2.730% progress. BOINC reports that it will complete at 14:48, with the completion time going up one second every second. Simply dividing the elapsed time by the progress gives an expected run time of about 200 hours, roughly 30 times longer than a normal job.
I have tried suspending and resuming this job, to no avail. Another job with the same binary is running smoothly (it is already at 10% after less than an hour).
Any ideas? Am I just supposed to abort jobs like this? Or are the 200 hours of computation useful?
Unlike that other post, it is actually taking up 100% CPU, so it's definitely doing something.
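The back-of-the-envelope estimate used in this thread (elapsed time divided by the fraction done) can be written as a small sketch. The function name `projected_total_hours` is made up for illustration and is not part of BOINC; it assumes progress grows linearly with run time, which may not hold for a misbehaving task.

```python
def projected_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Linear extrapolation of total run time from the progress so far.

    Hypothetical helper, not a BOINC API. Assumes the progress
    percentage grows at a constant rate.
    """
    if not 0.0 < percent_done <= 100.0:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


def remaining_hours(elapsed_hours: float, percent_done: float) -> float:
    """Projected hours still to go under the same linear assumption."""
    return projected_total_hours(elapsed_hours, percent_done) - elapsed_hours
```

With the rounded figures quoted above (7 hours, 2.730%), `projected_total_hours(7, 2.73)` comes out at roughly 256 hours, the same order of magnitude as the 200 hours mentioned; the exact value depends on the CPU time actually accumulated rather than the rounded wall-clock figure.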
Sat 06 Jun 2009 02:23:44 PM PDT||Starting BOINC client version 6.2.18 for x86_64-pc-linux-gnu
Sat 06 Jun 2009 02:23:44 PM PDT||log flags: task, file_xfer, sched_ops
Sat 06 Jun 2009 02:23:44 PM PDT||Libraries: libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.10
Sat 06 Jun 2009 02:23:44 PM PDT||Data directory: /var/lib/boinc-client
Sat 06 Jun 2009 02:23:44 PM PDT||Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
Sat 06 Jun 2009 02:23:44 PM PDT||Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdc
Sat 06 Jun 2009 02:23:44 PM PDT||OS: Linux: 2.6.28-11-generic
Sat 06 Jun 2009 02:23:44 PM PDT||Memory: 3.81 GB physical, 4.92 GB virtual
Sat 06 Jun 2009 02:23:44 PM PDT||Disk: 24.62 GB total, 12.99 GB free
Sat 06 Jun 2009 02:23:44 PM PDT||Local time is UTC -7 hours
Sat 06 Jun 2009 02:23:44 PM PDT||No coprocessors
Here is the job info:
129391691 53825008 6 Jun 2009 22:14:53 UTC 20 Jun 2009 22:14:53 UTC In progress --- New
One other person has also received this job and is still crunching since the same time. I guess I'll leave it on overnight and see what happens.
Arecibo Binary Search 1.04 taking 200 hours?
I have one just like it.
http://einsteinathome.org/workunit/53824866
5 hours 20 minutes in, 2.4% done.
Did I miss the stop at E@H and end up at CPDN? :-D
Kathryn :o)
Einstein@Home Moderator
I currently have two of them.
Typically those ABP jobs run about 5-8 hours on those computers,
for these two 'G73' it claims to have less than 7% after more than 12 hours
of computing time with always increasing time to completion - hopefully these
are no black holes in the computing space, becoming always larger tasks with
more computation time ;o)
Hi guys,
This is definitely not intended! Preliminary investigations show that this particular set of workunits (p2030_53614_09366_0084_G73.19-00.29.N_[2|3]*) is based on a corrupted data file that unfortunately went unnoticed.
We are looking into this more closely right now and will cancel the workunits as soon as our first findings are confirmed. Once we have cancelled the workunits on the server side, you should abort the affected tasks. Until then I suggest you simply pause them and wait for further notice. I expect this to happen in the next 1-2 days...
Sorry for the inconvenience this might have caused!
Oliver
Einstein@Home Project
Thanks, I will pause this task.
By the way, according to this page, someone else managed to finish my stalled work unit in only 10 hours of CPU time. How did that happen?
http://einsteinathome.org/workunit/53825008
Maybe the problem is specific to Intel processors?
The two samples I have are running on Intel too; the finished one we can see ran on AMD.
However, looking at the computation time and the percentage value,
this seems to result in a stable estimate of about 183 hours for
the jobs I have - this is within the deadline ;o)
Well, the corrupted data file usually contains (or leads to) "NaN"s and different machines handle/interpret those differently.
In the meantime we determined the root cause of this problem and added additional checks to the workunit generator to detect this kind of data error before we start processing the affected files!
Cheers,
Oliver
Einstein@Home Project
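A pre-processing check of the kind described above could look roughly like the following sketch. It is not the project's actual workunit generator code; the function names and the idea of scanning a plain sequence of floating-point samples are assumptions made for illustration.

```python
import math
from typing import Iterable, List


def non_finite_indices(samples: Iterable[float]) -> List[int]:
    """Return the positions of NaN or infinite values in the input.

    Hypothetical helper: a real data file would first have to be
    decoded into floating-point samples.
    """
    return [i for i, x in enumerate(samples) if not math.isfinite(x)]


def is_usable(samples: Iterable[float]) -> bool:
    """True only if every sample is a finite number."""
    return not non_finite_indices(samples)
```

A data set for which `is_usable(...)` is false would be rejected before any workunits are generated from it. Checking up front matters because a NaN compares unequal to everything, including itself, so code further down the line may not notice it in an ordinary comparison.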
Hmmmm...
Well, you're going to have to make a ruling one way or the other on this set of tasks pretty quick now.
First off, not everyone can afford to sit on an EAH task for extended periods without causing schedule jams. I know I'm starting to approach the "fish or cut bait" point.
Secondly, I've started to notice people are starting to just summarily abort the set when they see them.
I really hate to abort tasks I've been assigned, but the one I have is not looking as grim as the other ones reported so far. OTOH, I don't want to make a 161 hour run and have it die on user aborts or get canceled.
It's 12 hours in on a K6 III/450 and showing just over 10% complete. Estimated runtime is about 60% more than what I've seen for ABPS on this host before, but is still less than half of the deadline.
Alinator
Update: the affected workunits have been cancelled! This means no more tasks/results will be created for them. You may therefore now abort any local work based on the affected workunits (see my previous post). Recent clients should do this automatically after their next server (scheduler) contact, but I recommend doing this manually to save precious CPU cycles.
Cheers,
Oliver
Einstein@Home Project
Isn't it a good idea to post about it on the front page of the project as well, in the News section? That way it goes out as RSS to many more people who do not read the forums.
Just found this thread. I have had a work unit running for 77 hours. Am I supposed to abort and get no credit? That does not seem fair. I assume that the other person with this WU is unaware of the situation. If it is a problem on Einstein's side, they should cancel the work unit, return it and give the appropriate credit.
http://einsteinathome.org/workunit/53824911