Multi-Directed Continuous Gravitational Wave search Tuning run G v1.01 (SSE2)
i686-pc-linux-gnu
seems consistently (7 times out of 7) to terminate with an error after approximately 6 seconds of run time on my
GenuineIntel Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz [Family 6 Model 15 Stepping 13]
running Linux 4.4.0-42-generic (32 bit Ubuntu 14.04).
No problems to date on 64 bit Ubuntu 16.04 (GenuineIntel Pentium(R) Dual-Core CPU T4500 @ 2.30GHz [Family 6 Model 23 Stepping 10])
also running Linux 4.4.0-42-generic, although the two WUs have yet to complete (~2 hours run time so far).
Alan Barnes
Copyright © 2024 Einstein@Home. All rights reserved.
I'm having the same problem on an i5-3570k host, Kubuntu 14.04 64bit. Tasks run for 7 sec then error out - 26 out of 26 so far... Here's a link to the host: https://einsteinathome.org/host/5501745/tasks/error.
Same here on Linux Mint 17.3 64 bit, 3.19.0-32-generic, i7-4702MQ.
46 of 46 terminated after a few seconds.
https://einsteinathome.org/host/11467994/tasks/error
I noticed the problem too and I'm still monitoring it. So far I couldn't find a pattern why this is happening on some hosts. It seems to be happening on Linux only but is not related to specific CPU or OS version. It also does not look like an application problem as there are some successful results using the same app on the same CPU type.
I could not see any Linux hosts with AVX-enabled CPUs generating results, though I probably didn't search hard enough; my sample size is drawn from those reporting successes and failures in these threads.
Edit: after searching some more, I did find one returning good results, so apologies, just a herring coloured red.
See my post https://einsteinathome.org/goto/comment/150777
(There are other non AVX enabled hosts generating errors with the other application.)
I have one host generating a detailed stack trace -
Just to add to the strangeness.
I have a host https://einsteinathome.org/host/11905468 which is generating the errors.
Using VirtualBox I created a VM with the same OS, installed BOINC, and attached it to E@H; this is the (virtual) host https://einsteinathome.org/host/12268233
It does not error out tasks, and has been crunching away for several hours without error.
Not sure if that helps, but at least I have a workaround for the error.
Edit: Real host
Edit: Virtual host (Virtualbox 5.0.18)
Edit2: I should have mentioned the real host has produced some good results
Some folks may see some DCF-related craziness in running this work. My most capable host has been running just BRP4G-cuda55 work on a GTX 1070 plus a GTX 1060, and no CPU tasks. The DCF at this moment is reported as 0.40, which gives elapsed time estimates for the BRP4G work in queue of 0:36:53, near the midpoint of the actual 29 minutes for (3x) 1070 tasks and 42 minutes for 1060 tasks.
The rub is that the estimated elapsed time for the 1.01 Multi-Directed CV tasks is showing as 1:57:00, while the single task in progress has reached 94.6% completion at 13:53:00 elapsed time. So presumably on completion of that task the DCF will bump straight up to something quite near 3.0, raising the estimated amount of GPU work in queue by over a factor of 7.
I don't know whether my CV task is an unusually difficult one, or even whether there is something mis-configured on my machine that is greatly slowing it. But if this is typical I think the estimated work contained in the CV tasks may need a substantial revision upward to behave well in scheduling.
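The arithmetic behind that jump can be sketched as follows. This is a simplification of BOINC client behaviour, not its exact implementation, and the host speed and flops-estimate values below are illustrative assumptions chosen to reproduce the times quoted above:

```python
# Simplified model of how BOINC's duration correction factor (DCF)
# inflates estimates after one task runs far longer than predicted.
# host_flops and fpops_est are assumed values, not measured ones.

def estimated_elapsed(fpops_est: float, host_flops: float, dcf: float) -> float:
    """BOINC-style runtime estimate in seconds."""
    return fpops_est / host_flops * dcf

host_flops = 4e9        # assumed effective speed of the host
fpops_est = 7.02e13     # chosen so DCF 0.40 yields the ~1:57:00 estimate above

dcf_before = 0.40
t_before = estimated_elapsed(fpops_est, host_flops, dcf_before)

# After a task actually takes ~13:53:00, DCF rises toward
# actual / raw-estimate, dragging every queued estimate up with it.
actual = 13 * 3600 + 53 * 60
dcf_after = actual / (fpops_est / host_flops)
t_after = estimated_elapsed(fpops_est, host_flops, dcf_after)

print(f"DCF {dcf_before:.2f} -> estimate {t_before / 3600:.2f} h")
print(f"DCF {dcf_after:.2f} -> estimate {t_after / 3600:.2f} h")
print(f"queued-work inflation: {t_after / t_before:.1f}x")
```

With these assumed numbers the DCF lands near 2.85 and the queued estimates rise by roughly a factor of 7, consistent with the behaviour described above.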
archae86 wrote: I don't know ...
By coincidence the host I mentioned below is running the same CPU; task completion times are quite varied, ranging from 829 s to 25,884 s (along with many 8 s errors). My other i7-860 host has been quite stable at around 16,000 s. The virtual host has not completed any yet but is looking to be around 30,000 s.
AgentB wrote: By coincidence ...
Do the tasks arrive with varying work required estimates? Or does the server send them all marked the same (as indicated by the "Remaining (estimated)" column in the boincmgr tasks list, or the "Time Left" column in BoincTasks)?
archae86 wrote: Do the tasks ...
I hadn't noticed that at the time, but yes, looking over the job_log files for these tasks you can see the large difference in the "flops" figure (the largest is 25x the smallest), which explains the difference.
1476397563 ue 536.781145 ct 975.752000 fe 5760000000000 nm h1_0034.75_O1C02Cl1In1C__O1MD1TCV_CasA_34.85Hz_1_0 et 987.293506 es 0
1476397572 ue 536.781145 ct 985.924000 fe 5760000000000 nm h1_0034.70_O1C02Cl1In1C__O1MD1TCV_CasA_34.80Hz_1_0 et 997.928398 es 0
1476397575 ue 536.781145 ct 982.964000 fe 5760000000000 nm h1_0034.80_O1C02Cl1In1C__O1MD1TCV_CasA_34.90Hz_1_0 et 994.826568 es 0
1476398393 ue 536.781145 ct 822.328000 fe 5760000000000 nm h1_0034.85_O1C02Cl1In1C__O1MD1TCV_CasA_34.95Hz_1_0 et 829.568351 es 0
1476429311 ue 939.367003 ct 1965.548000 fe 10080000000000 nm h1_0149.05_O1C02Cl1In1C__O1MD1TCV_VelaJr_149.15Hz_1_1 et 1990.800820 es 0
1476439039 ue 5904.592593 ct 11605.230000 fe 63360000000000 nm h1_0149.05_O1C02Cl1In1C__O1MD1TCV_CasA_149.20Hz_2_1 et 11718.838962 es 0
1476453205 ue 13419.528620 ct 25663.980000 fe 144000000000000 nm h1_0149.05_O1C02Cl1In1C__O1MD1TCV_VelaJr_149.15Hz_0_1 et 25884.797168 es 0
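For anyone wanting to check their own hosts, the job_log entries above are easy to pull apart mechanically, since each record is a flat sequence of tag/value pairs. A minimal sketch (the parsing helper and sample lines are my own, using the field tags visible in the log above):

```python
# Hypothetical sketch: parse BOINC job_log records of the form
#   <time> ue <x> ct <x> fe <x> nm <name> et <x> es <x>
# and compare the server-side flops estimate ("fe") across tasks.

def parse_job_log_line(line: str) -> dict:
    """Return a dict keyed by field tag ('ue', 'ct', 'fe', 'nm', 'et', 'es')."""
    tokens = line.split()
    entry = {"time": int(tokens[0])}
    it = iter(tokens[1:])
    for tag, value in zip(it, it):  # pair up tag/value tokens
        entry[tag] = value
    return entry

lines = [
    "1476397563 ue 536.781145 ct 975.752000 fe 5760000000000 nm h1_0034.75_O1C02Cl1In1C__O1MD1TCV_CasA_34.85Hz_1_0 et 987.293506 es 0",
    "1476453205 ue 13419.528620 ct 25663.980000 fe 144000000000000 nm h1_0149.05_O1C02Cl1In1C__O1MD1TCV_VelaJr_149.15Hz_0_1 et 25884.797168 es 0",
]

entries = [parse_job_log_line(l) for l in lines]
fes = [float(e["fe"]) for e in entries]
print(f"largest fe / smallest fe = {max(fes) / min(fes):.0f}x")
```

Run against the full log above this reproduces the 25x spread between the smallest and largest "fe" values.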
While CV work distribution has been continuing at a rapid pace, G work dried up quite a few hours ago, and the O1MD1TG work generator line on the Einstein server status page has shown a red "not running" during much of that time. The legend asserts that "not running" status means "Program failed or ran out of work (or the project is down)".