A36 issue? 94 hours to crunch ONE WU !

UBT - Timbo
Joined: 18 Jan 05
Posts: 13
Credit: 30160326
RAC: 9764
Topic 190955

Hi all,

OK - so A36 is now "old hat" - but such is the pace of development change that some (like me) are still using it on remote PC's so cannot easily upgrade them quickly...!

Anyways - can someone take a look at this - I left it running as I thought it was "strange"

This is on a "clunker" of a PC - P3 @ 550MHz (it's a small file server, so it's on 24/7 anyways, so why not do something useful before it gets upgraded !! But it works OK for other projects with lesser requirements)!!

Result ID 21947155
Name r1_1495.5__1707_S4R2a_0
Workunit 5907773
Created 19 Mar 2006 10:46:11 UTC
Sent 19 Mar 2006 10:46:12 UTC
Received 24 Mar 2006 13:58:51 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 427166
Report deadline 2 Apr 2006 10:46:12 UTC
CPU time 334745.806
stderr out 5.2.13

2006-03-19 10:59:27.3899 [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'.
2006-03-19 10:59:27.5000 [normal]: Started search at lalDebugLevel = 0
2006-03-19 10:59:32.4899 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-03-19 10:59:32.4899 [normal]: No usable checkpoint found, starting from beginning.

2006-03-19 11:11:49.5399 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-19 11:11:49.5900 [normal]: Started search at lalDebugLevel = 0

2006-03-19 14:55:36.4099 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-19 14:55:36.4099 [normal]: Started search at lalDebugLevel = 0
2006-03-19 14:55:40.3099 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-19 14:55:40.3099 [normal]: Trying to read Fstat-file into toplist ...
2006-03-19 14:56:21.2299 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-19 14:56:21.2299 [normal]: Resuming computation at (434/215995118/4334945).

2006-03-20 00:04:16.6599 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-20 00:04:16.7199 [normal]: Started search at lalDebugLevel = 0
2006-03-20 00:04:21.3899 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-20 00:04:21.3899 [normal]: Trying to read Fstat-file into toplist ...
2006-03-20 00:05:07.0299 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-20 00:05:07.0799 [normal]: Resuming computation at (4466/215995118/4334945).

2006-03-20 12:11:14.3699 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-20 12:11:14.4200 [normal]: Started search at lalDebugLevel = 0
2006-03-20 12:11:18.3199 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-20 12:11:18.3199 [normal]: Trying to read Fstat-file into toplist ...
2006-03-20 12:12:00.1199 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-20 12:12:00.1199 [normal]: Resuming computation at (8510/215995118/4334945).

2006-03-21 03:38:51.9699 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-21 03:38:52.1899 [normal]: Started search at lalDebugLevel = 0
2006-03-21 03:38:59.2699 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-21 03:38:59.3299 [normal]: Trying to read Fstat-file into toplist ...
2006-03-21 03:39:45.3499 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-21 03:39:45.3499 [normal]: Resuming computation at (12488/215995118/4334945).

2006-03-21 13:56:21.4599 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-21 13:56:21.6800 [normal]: Started search at lalDebugLevel = 0
2006-03-21 13:56:30.1999 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-21 13:56:30.1999 [normal]: Trying to read Fstat-file into toplist ...
2006-03-21 13:57:19.1899 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-21 13:57:19.1899 [normal]: Resuming computation at (15947/215995118/4334945).

2006-03-22 10:45:53.0100 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-22 10:45:53.0599 [normal]: Started search at lalDebugLevel = 0
2006-03-22 10:45:57.2399 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-22 10:45:57.2399 [normal]: Trying to read Fstat-file into toplist ...
2006-03-22 10:46:50.1299 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-22 10:46:50.1299 [normal]: Resuming computation at (28751/215995118/4334945).

2006-03-23 12:26:08.5000 [normal]: Optimized by akosf (A-36) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-03-23 12:26:08.7199 [normal]: Started search at lalDebugLevel = 0
2006-03-23 12:26:24.9200 [normal]: Found checkpoint-file 'Fstat.out.ckp'
2006-03-23 12:26:24.9200 [normal]: Trying to read Fstat-file into toplist ...
2006-03-23 12:27:16.9399 [normal]: Checksum Ok. Successfully read_toplist_from_fp()
2006-03-23 12:27:16.9399 [normal]: Resuming computation at (43509/215995118/4334945).
2006-03-24 13:50:55.6499 [normal]: Search finished successfully.


Validate state Invalid
Claimed credit 384.277984848295
Granted credit 0
application version 4.37

The real issue here is: after 94 hours, I got ZERO credit...Thanks to the project - you really know how to hurt a guy !

As for the other two WU's left on this machine - they will be aborted because already they are shown to be another 92+ hours before completion - and no point carrying on with them !!.

So, this machine will go back to being an optimised-SETI only cruncher...!

regards,

Tim


regards,
Tim

UK BOINC Team Founder
Join the UK BOINC Team: http://www.ukboincteam.org.uk/newforum

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0


Hi Timbo!

Did you find the reason for this problem?
Or could you give me the link to this result?
Perhaps it would be interesting!

Oh, and I feel for you!
But every interesting thing has at least one fault.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 356403
RAC: 1626

The result can be found here

94H sounds high even with the standard app. My old PIII 450 did an old Einstein unit in 40H. The longest Albert results should take only 30-35H on your machine using the standard application. With A36 the result should be done in 10-20H.

Edit: You are running multiple projects and Windows 98 on this host, right? The 94H is because of the time bug with BOINC and Windows 98. What is your “switch project every” set to?

When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.

Stef
Joined: 8 Mar 05
Posts: 206
Credit: 110568193
RAC: 0


Message 26514 in response to message 26512

Quote:
Or could you give me the link to this result?


I think it's this one:
http://einsteinathome.org/task/21947155
But it looks like a BOINC-related problem to me, rather than an A-36 one.

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0


Thanks for the links!

I have never seen a fault like this under Einstein@Home with the same configuration.
The validator marked the result invalid, so it is possible that the result was imprecise.

The 94 hours comes from Win98; it isn't accurate when measuring task time.
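One rough way to sanity-check the reported time is to compare the CPU time on the result page with the wall-clock window covered by the stderr output. The sketch below uses only numbers copied from the first post; the variable and function names are made up for illustration, and it assumes the first and last log timestamps bracket the whole run.

```python
from datetime import datetime

# Values copied from the result posted at the top of the thread.
CPU_TIME_SECONDS = 334745.806
FIRST_LOG_LINE = "2006-03-19 10:59:27.3899"  # "Start of BOINC application"
LAST_LOG_LINE = "2006-03-24 13:50:55.6499"   # "Search finished successfully."

LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def hours_between(start: str, end: str) -> float:
    """Wall-clock hours between two stderr timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta.total_seconds() / 3600.0


cpu_hours = CPU_TIME_SECONDS / 3600.0
wall_hours = hours_between(FIRST_LOG_LINE, LAST_LOG_LINE)

print(f"reported CPU time: {cpu_hours:.1f} h")
print(f"wall-clock window: {wall_hours:.1f} h")
print(f"CPU / wall ratio:  {cpu_hours / wall_hours:.2f}")
```

This gives roughly 93 CPU hours inside a window of roughly 123 hours. The reported time fits inside the window, so this check alone cannot show that the time was miscounted; it only rules out the most obvious kind of impossible value.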

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752618030
RAC: 1501563


I have a couple of old Windows 98 machines, a P II 350MHz and a Celeron 400MHz, both running Seti and Einstein.

I've sometimes had these very long run times, especially on the P II (though never so bad that it failed to validate....)

They seem to run much smoother and quicker if you set the flag to keep the app in memory when suspended.

Could yours be switched out of memory? Would that account for the multiple restarts in the result file?
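The restarts can be read straight out of the stderr output. As a minimal sketch, the Python below pulls the "Resuming computation" lines from the first post and prints how fast the first number in the tuple grew between restarts. It assumes that the first number is a progress counter, which the log itself does not confirm, and the helper names are made up for illustration.

```python
import re
from datetime import datetime

# The "Resuming computation" lines from the stderr output in the first post.
STDERR_EXCERPT = """\
2006-03-19 14:56:21.2299 [normal]: Resuming computation at (434/215995118/4334945).
2006-03-20 00:05:07.0799 [normal]: Resuming computation at (4466/215995118/4334945).
2006-03-20 12:12:00.1199 [normal]: Resuming computation at (8510/215995118/4334945).
2006-03-21 03:39:45.3499 [normal]: Resuming computation at (12488/215995118/4334945).
2006-03-21 13:57:19.1899 [normal]: Resuming computation at (15947/215995118/4334945).
2006-03-22 10:46:50.1299 [normal]: Resuming computation at (28751/215995118/4334945).
2006-03-23 12:27:16.9399 [normal]: Resuming computation at (43509/215995118/4334945).
"""

RESUME_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[normal\]: "
    r"Resuming computation at \((\d+)/(\d+)/(\d+)\)\.$"
)


def parse_resumes(text):
    """Return (timestamp, first_counter) for each resume line."""
    resumes = []
    for line in text.splitlines():
        match = RESUME_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            resumes.append((when, int(match.group(2))))
    return resumes


def rates_per_hour(resumes):
    """Counter increase per wall-clock hour between consecutive resumes."""
    rates = []
    for (t0, c0), (t1, c1) in zip(resumes, resumes[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        rates.append((c1 - c0) / hours)
    return rates


resumes = parse_resumes(STDERR_EXCERPT)
print(f"{len(resumes)} resumes from a checkpoint")
for rate in rates_per_hour(resumes):
    print(f"{rate:8.1f} counter steps per wall-clock hour")
```

The rates are per wall-clock hour, so any time the task spent suspended or swapped out between two restarts lowers the figure for that interval. An uneven set of rates would fit the "switched out of memory" idea, but it would not prove it.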

UBT - Timbo
Joined: 18 Jan 05
Posts: 13
Credit: 30160326
RAC: 9764


Message 26517 in response to message 26512

Quote:
Hi Timbo! Did you find the reason for this problem? Or could you give me the link to this result? Perhaps it would be interesting!
Oh, and I feel for you! But every interesting thing has at least one fault.

Well, I wasn't sure about it at all.

So, to start with, as the progress moved so slowly I thought - maybe it's like some of the SETI Classic WU's that took a long time - the ones with low angle range..!.

But it went on and on and on, while other PC's (in my small collection) were returning valid work much more quickly. So I thought there must be something wrong with the PC - but I didn't want to stop, just in case it was like the current Rosetta problem.

So, the machine is on 24/7 only working on Einstein - I don't know why it switched - except that it had a cache of 3 Einstein WU's - but it didn't start the other two at all....they were always at zero progress and zero time.

After it finished, I thought something is strange, so had to abort the 2 cache WU's and instead I downloaded some SETI WU's - they seemed to progress OK.

So, I was still "stuck".

Everything was normal on the PC - all seemed OK

In the end, I rebooted the PC and started fresh with a single new Einstein to check.

It finished in 7 hours - like normal - but I did change the client over to S39L "just in case" - see this result here:
http://einsteinathome.org/task/22575228

So, the PC must have had some "process" going on - but I'd already checked the Task Manager and nothing was working except "normal services" and BOINC....!

VERY strange....!

A lesson to everyone with a farm - don't assume the PC is always working OK, especially if on 24/7.

Will try and figure out answers to other comments tomorrow.

Too late right now (12:30 am UK)

regards,

Tim
(edit) added new result link

regards,
Tim

UK BOINC Team Founder
Join the UK BOINC Team: http://www.ukboincteam.org.uk/newforum

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7022984931
RAC: 1834456


Of my four machines running both Einstein and SETI, two are Pentium IIIs running Win98SE.

Both have had instances of extremely slow or zero rate of progress, leading to eventual reporting of completed results with extremely long execution times (up to 10x normal, I think).

Someone suggested that I turn the general preference setting:

Leave applications in memory while preempted?

to No for these two machines. I've done so, and not seen that particular misbehavior since. But I can't affirm this was cause and effect, as the issue was irregular.

However, one of the two machines still occasionally gets into what I might call "double count time" mode.

In this state, boincmgr ticks along updating CPU time every 5 seconds, as usual, but the CPU time increments by 10 seconds!! If I don't notice until completion, I believe it returns the result with this falsified long time. If I do notice, and either reboot the PC, or, I think, just exit and restart BOINCmgr, on restart it displays half the previously displayed accumulated CPU for that result and increments by 5 seconds as it should.
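That "double count" state could in principle be spotted by logging the displayed CPU time against a wall clock. The Python below is only a sketch of the idea: the function name, the tolerance value, and the sample readings are all made up, and it assumes a single-CPU machine running one task, where CPU time should never grow faster than wall time.

```python
def find_double_counting(samples, tolerance=1.5):
    """Return indexes of intervals where CPU time grew faster than wall time.

    samples: list of (wall_seconds, cpu_seconds) pairs, oldest first.
    tolerance: CPU/wall ratio above which an interval is flagged
               (a hypothetical default, not a BOINC setting).
    """
    flagged = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        cpu_delta = samples[i][1] - samples[i - 1][1]
        if wall_delta > 0 and cpu_delta / wall_delta > tolerance:
            flagged.append(i)
    return flagged


# Made-up readings: normal for two ticks, then 10 s of CPU per 5 s tick.
readings = [(0, 0), (5, 5), (10, 10), (15, 20), (20, 30)]
print(find_double_counting(readings))
```

With these readings the last two intervals are flagged, matching the "5 second tick, 10 second increment" pattern described above.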

Among the several applications and the OS involved, I certainly can't apportion blame--but I'd not assume akosf's science application was specifically at fault without more evidence.

I need to get the Win98SE machines up to XP, the question is how to do the conversion, how much hardware to change out (lots), or whether to just get new machines and abandon the installations I can't transfer from the old successfully.

UBT - Timbo
Joined: 18 Jan 05
Posts: 13
Credit: 30160326
RAC: 9764


Message 26519 in response to message 26518

Quote:
I need to get the Win98SE machines up to XP, the question is how to do the conversion, how much hardware to change out (lots), or whether to just get new machines and abandon the installations I can't transfer from the old successfully.

I guess with PC's so cheap these days, it's probably a question of upgrading to a new machine - a new copy of Win XP Pro (in UK) costs nearly as much as a better-spec PC (with a faster CPU/bigger HDD/faster CDROM/more memory) including at least Win XP Home....

So, it might be time to say "bye, bye" to the old (but still faithful) workhorses we both have..!

regards,

Tim

regards,
Tim

UK BOINC Team Founder
Join the UK BOINC Team: http://www.ukboincteam.org.uk/newforum

RC
Joined: 20 Mar 05
Posts: 1
Credit: 35032782
RAC: 0


Message 26520 in response to message 26519

Quote:

I guess with PC's so cheap these days, it's probably a question of upgrading to a new machine - a new copy of Win XP Pro (in UK) costs nearly as much as a better-spec PC (with a faster CPU/bigger HDD/faster CDROM/more memory) including at least Win XP Home....

So, it might be time to say "bye, bye" to the old (but still faithful) workhorses we both have..!

Either that, or convert to Linux. I have an AMD K6/2 350 that is too slow even for Windows 2000 (never mind XP) but runs Linux quite well. It takes a few days to complete a WU, though...

StoneLord
Joined: 15 Jun 05
Posts: 13
Credit: 50798
RAC: 0

I also have a small problem with A-36. One of my results is invalid:

http://einsteinathome.org/task/21183490

Can you tell me why?
