WU having problems completing

Heflin
Joined: 5 Mar 05
Posts: 8
Credit: 267,146
RAC: 0
Topic 193584

I have an Einstein WU that doesn’t seem to be able to complete even though it has been running at ‘high priority’ for well over a week.

BOINC (5.10.30) is installed as a SERVICE and set to run all the time. The machine (Win2K SP4) is used mostly for e-mail, word processing, and web browsing. Total CPU time is ‘only’ 24 hours, and interestingly BOINC keeps suspending all work for MANY hours or days (apparently until I next open BOINC Manager).

I know the times are estimates, but I’ve noticed in BOINC the total CPU time increasing without any decrease in ‘to completion’ time. In fact, I’ve seen multiple instances where the ‘to completion’ time jumps to a larger number (with no decrease or change for at least minutes before or after).

I’ve rebooted the machine with no major difference.
I know the deadline for the WU has passed, but at this point I want to see if the WU can even be completed.
It is frustrating to have wasted more than a week of processing time.

Have others seen BOINC SERVICE suspending work till boinc-manager is reopened?
Or is this an Einstein WU issue?

I wonder if this is related to (McAfee) ePolicy Orchestrator Agent 3.6.0.574, which never seems to work right.

BOINC log: notice last 2 lines
3/17/2008 9:54:38 PM||Starting BOINC client version 5.10.30 for windows_intelx86
3/17/2008 9:54:38 PM||log flags: task, file_xfer, sched_ops
3/17/2008 9:54:38 PM||Libraries: libcurl/7.17.1 OpenSSL/0.9.8e zlib/1.2.3
3/17/2008 9:54:38 PM||Executing as a daemon
3/17/2008 9:54:38 PM||Data directory: C:\Program Files\BOINC
3/17/2008 9:54:38 PM||BOINC is running as a service and as a non-system user.
3/17/2008 9:54:38 PM||No application graphics will be available.
3/17/2008 9:54:39 PM|SETI@home|Found app_info.xml; using anonymous platform
3/17/2008 9:54:40 PM||Processor: 1 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.40GHz [x86 Family 15 Model 2 Stepping 4]
3/17/2008 9:54:40 PM||Processor features: fpu tsc sse mmx
3/17/2008 9:54:40 PM||OS: Microsoft Windows 2000: Professional Edition, Service Pack 4, (05.00.2195.00)
3/17/2008 9:54:40 PM||Memory: 1021.99 MB physical, 1.27 GB virtual
3/17/2008 9:54:40 PM||Disk: 37.21 GB total, 13.24 GB free
3/17/2008 9:54:40 PM||Local time is UTC -7 hours
3/17/2008 9:54:41 PM|Einstein@Home|URL: http://einstein.phys.uwm.edu/; Computer ID: 998993; location: home; project prefs: default
3/17/2008 9:54:41 PM|Predictor @ Home|URL: http://predictor.scripps.edu/; Computer ID: 351165; location: home; project prefs: default
3/17/2008 9:54:41 PM|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 3746978; location: home; project prefs: default
3/17/2008 9:54:41 PM||General prefs: from http://boinc.bakerlab.org/rosetta/ (last modified 05-Jan-2008 13:06:55)
3/17/2008 9:54:41 PM||Host location: home
3/17/2008 9:54:41 PM||General prefs: no separate prefs for home; using your defaults
3/17/2008 9:54:41 PM||Reading preferences override file
3/17/2008 9:54:41 PM||Preferences limit memory usage when active to 766.49MB
3/17/2008 9:54:41 PM||Preferences limit memory usage when idle to 1011.77MB
3/17/2008 9:54:41 PM||Preferences limit disk usage to 13.23GB
3/17/2008 9:54:41 PM|Einstein@Home|Task h1_0877.40_S5R3__360_S5R3b_2 is 3.33 days overdue.
3/17/2008 9:54:41 PM|Einstein@Home|You may not get credit for it. Consider aborting it.
3/17/2008 9:55:32 PM|Einstein@Home|Restarting task h1_0877.40_S5R3__360_S5R3b_2 using einstein_S5R3 version 426
3/17/2008 9:57:39 PM||Suspending computation
3/18/2008 8:17:20 AM||Resuming computation
3/18/2008 8:45:13 AM||Suspending computation
3/19/2008 8:48:47 AM||Resuming computation
3/19/2008 9:06:38 AM||Suspending computation
3/20/2008 9:24:16 AM||Resuming computation
3/20/2008 9:46:54 AM||Suspending computation
3/20/2008 9:59:53 AM||Resuming computation
3/20/2008 11:32:23 AM||Suspending computation
3/20/2008 11:33:03 AM||Resuming computation
3/20/2008 12:22:38 PM||Suspending computation
3/20/2008 12:26:20 PM||Resuming computation
3/20/2008 12:40:23 PM|SETI@home|Sending scheduler request: To fetch work. Requesting 2422 seconds of work, reporting 0 completed tasks
3/20/2008 12:40:26 PM||A new version of BOINC (5.10.45) is available for your computer
3/20/2008 12:40:26 PM||Visit http://boinc.berkeley.edu/download.php to get it.
3/20/2008 12:40:28 PM|SETI@home|Scheduler request succeeded: got 1 new tasks
3/20/2008 12:40:31 PM|SETI@home|Started download of 28mr07ah.6849.18886.6.7.11
3/20/2008 12:40:32 PM|SETI@home|Finished download of 28mr07ah.6849.18886.6.7.11
3/20/2008 12:41:14 PM||Suspending computation
3/20/2008 12:42:29 PM||Resuming computation
3/20/2008 12:50:05 PM||Suspending computation
3/20/2008 12:50:29 PM||Resuming computation
3/20/2008 12:54:31 PM||Suspending computation
3/20/2008 1:07:56 PM||Resuming computation
3/20/2008 2:02:50 PM||Suspending computation
3/21/2008 8:13:09 AM||Resuming computation
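
The suspend/resume pairs in the log above can be tallied to see how much wall-clock time BOINC spent suspended. The following is a minimal sketch, not a BOINC tool: the function name `suspended_seconds` and the `SAMPLE` constant are made up for illustration, and it assumes the `timestamp|project|message` layout and US-style date format seen in this log. Only the first two suspend/resume pairs are included as sample input.

```python
from datetime import datetime

# Timestamp format as it appears in this log, e.g. "3/17/2008 9:57:39 PM".
FMT = "%m/%d/%Y %I:%M:%S %p"

# First two suspend/resume pairs copied from the log above.
SAMPLE = [
    "3/17/2008 9:57:39 PM||Suspending computation",
    "3/18/2008 8:17:20 AM||Resuming computation",
    "3/18/2008 8:45:13 AM||Suspending computation",
    "3/19/2008 8:48:47 AM||Resuming computation",
]


def suspended_seconds(lines):
    """Sum the seconds between each 'Suspending' and the next 'Resuming'."""
    total = 0.0
    suspended_at = None
    for line in lines:
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue  # skip anything that is not a log line
        stamp, _project, message = (p.strip() for p in parts)
        when = datetime.strptime(stamp, FMT)
        if message == "Suspending computation":
            suspended_at = when
        elif message == "Resuming computation" and suspended_at is not None:
            total += (when - suspended_at).total_seconds()
            suspended_at = None
    return total


if __name__ == "__main__":
    hours = suspended_seconds(SAMPLE) / 3600
    print(f"Suspended for about {hours:.1f} hours in the sample lines")
```

Feeding in the whole log instead of `SAMPLE` gives the total for the full period; lines from other projects are ignored because only the two suspend/resume messages are matched.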

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 115,905,340,982
RAC: 35,432,924

WU having problems completing

Quote:
I have an Einstein WU that doesn’t seem to be able to complete even though it has been running at ‘high priority’ for well over a week.

It is now shown as successfully completed but with no credit because it was so late. CPU time is listed as 90K secs (probably quite normal for your machine) but wall clock time was almost a month.
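
To put those two figures side by side, here is a rough back-of-the-envelope calculation. It is only a sketch: the 30-day wall-clock value is an assumption standing in for "almost a month", the helper name `cpu_fraction` is made up for illustration, and the ratio is a plain CPU-time over elapsed-time fraction, not necessarily the same thing as any metric BOINC itself reports.

```python
SECONDS_PER_DAY = 86_400


def cpu_fraction(cpu_seconds, wall_seconds):
    """Fraction of elapsed wall-clock time actually spent computing."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


CPU_SECONDS = 90_000   # "90K secs" as listed for the task
WALL_DAYS = 30         # ASSUMPTION: stand-in for "almost a month"

fraction = cpu_fraction(CPU_SECONDS, WALL_DAYS * SECONDS_PER_DAY)

if __name__ == "__main__":
    print(f"The task was computing for roughly {fraction:.1%} of the elapsed time")
```

Under that assumption the task was crunching for only a few percent of the time it sat on the machine, which fits the long suspended stretches in the log.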

Quote:
BOINC (5.10.30) is installed as a SERVICE and set to run all the time. The machine (Win2K SP4) is used mostly for e-mail, word processing, and web browsing. Total CPU time is ‘only’ 24 hours, and interestingly BOINC keeps suspending all work for MANY hours or days (apparently until I next open BOINC Manager).

There are various preference settings that cause BOINC to suspend processing. Have you thoroughly checked all of your preferences? It's also possible that anti-malware software may be interfering. Have you tried suspending any anti-malware software for a period to see if the problem goes away?

Quote:
I know the times are estimates, but I’ve noticed in BOINC the total CPU time increasing without any decrease in ‘to completion’ time. In fact, I’ve seen multiple instances where the ‘to completion’ time jumps to a larger number (with no decrease or change for at least minutes before or after).

What you describe may be normal. Without a better description, particularly with some hard numbers on times and percentages, it's impossible to say.

Quote:
I’ve rebooted the machine with no major difference.
I know the deadline for the WU has passed, but at this point I want to see if the WU can even be completed.
It is frustrating to have wasted more than a week of processing time.

Well, you certainly have an answer to whether or not the task could complete. Have you used something like Task Manager to check whether the BOINC service and the Einstein science app are both running all the time? The science app should be using a very high percentage of CPU cycles and the idle loop should be getting nothing.

Quote:
Have others seen BOINC SERVICE suspending work till boinc-manager is reopened?
Or is this an Einstein WU issue?

No and No.

The BOINC service is a background process that runs all the time. It should be quite independent of whether or not the Manager is running. Stopping or starting the Manager should have no effect on the BOINC service. I have seen the odd example where the BOINC service (and consequently the science app) does stop running for some reason. In that case, starting the Manager will also start BOINC, which is the normal single-user (non-service) mode of operation. This is pretty obvious because the messages log shows BOINC being started by the Manager when it should have been running all along.

Quote:
I wonder if this is related to (McAfee) ePolicy Orchestrator Agent 3.6.0.574, which never seems to work right.

I have no idea. Shut it down for a few days and see if things go back to normal.

In the whole period shown by your log snippet (almost 3 days), there was only one download of a single Seti task. There were no uploads whatsoever which is quite strange for almost 3 days. Is Seti work being suspended and making no progress as well? If so, it would appear that it is an external issue of something interfering with BOINC rather than a project issue.

Cheers,
Gary.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9,352,143
RAC: 0

RE: RE: I wonder if

Message 80131 in response to message 80130

Quote:

Quote:
I wonder if this is related to (McAfee) ePolicy Orchestrator Agent 3.6.0.574, which never seems to work right.

I have no idea. Shut it down for a few days and see if things go back to normal.

In the whole period shown by your log snippet (almost 3 days), there was only one download of a single Seti task. There were no uploads whatsoever which is quite strange for almost 3 days. Is Seti work being suspended and making no progress as well? If so, it would appear that it is an external issue of something interfering with BOINC rather than a project issue.

Hmmmm...

I don't know if he would have the option of shutting it down. This ePO Agent sounds like the client portion of an Enterprise Policy Management package. The machine might run, but you will most likely not be able to get to resources which are the whole point of having the machine in the first place.

One thing you can do, though: what are the time performance metrics showing for the host (the four between the Last Contact Time and the Location on the Computer Summary page)? The project doesn't display those to the general public. My bet would be that one or more of them is 'taking it on the nose', most likely the CPU Efficiency.

Alinator
