Bernd-
I did what you said and ran BOINC against the WU that has been stuck in my queue for 3 days. It picked it right up at 98%, finished it, and sent it home to you. From what I can see it is now correcting the other WUs as well. I'll keep you posted, and if something hangs I'll send you another snapshot of the hang. Good work on your part.
Highest regards (you the man)
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
I have been watching for a while now and most things seem normal except for WU w1_0979.5__0979.5_0.1_T01S4hA_4. This WU was caught up in the stall during processing by 008, and somehow the time to complete got set to a very high number of hours. Currently it shows 01:27:27 into the process with 4.85% complete and 28:35:50 to go. When it restarted with 008 the first time, it showed over 2000 hours to complete. This is way abnormal. Usually the WUs start out showing 7:18:00 to complete, and they usually finish within a few seconds of that time. I think it is slowly catching up with reality each time the process cycles through, but I will let you know what happens to it. I should have some results during the day tomorrow. Next time it comes up in the queue I will try to snapshot it and see if anything looks different from the other WUs.
So far the app is switching and playing well with others.
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
This is the status dump from the WU with the screwed-up clock -
2005-08-03 22:51:59.380 sample[1128] Couldn't start c++filt for C++ name demangling
2005-08-03 22:52:00.703 sample[1128] thread_read_stack: stack appears to be in inconsistent state to trace. Truncating stack.
2005-08-03 22:52:02.197 sample[1128] thread_read_stack: stack appears to be in inconsistent state to trace. Truncating stack.
Analysis of sampling pid 1126 every 10.000000 milliseconds
Call graph:
193 Thread_0e0b
193 _dyld_start
193 _start
193 main
193 _Z24boinc_init_graphics_implPFvvEP16BOINC_MAIN_STATE
193 boinc_init_options_graphics_impl
193 _Z24xwin_graphics_event_loopv
193 sleep
193 nanosleep
193 clock_sleep_trap
193 clock_sleep_trap
193 Thread_0f03
191 _pthread_body
191 _Z6foobarPv
191 worker
116 TestLALDemod_for_cpu_type_1
115 TestLALDemod_for_cpu_type_1
1 __isfinited
1 __isfinited
30 cos
30 cos
26 sin
25 sin
1 sin
1 sin
11 PrintTopValues
11 qsort
11 qsort
10 qsort
8 qsort
7 qsort
6 qsort
4 qsort
2 qsort
1 qsort
1 qsort
1 qsort
1 compare
1 compare
1 compare
1 compare
1 compare
1 compare
1 qsort
1 qsort
3 EstimateFLines
2 ComputeOutliers
2 ComputeOutliers
2 ComputeOutliers
1 EstimateFLines
1 EstimateFLines
2 0x9012a184
2 0x9012a184
1 0x9012a188
1 0x9012a188
1 0xf1bd8
1 0xf1bd8
1 0xf1cd8
1 0xf1cd8
2 _sigtramp
2 _sigtramp
Total number in stack (recursive counted multiple, when >=5):
8 qsort
Sort by top of stack, same collapsed (when >= 5):
clock_sleep_trap 193
TestLALDemod_for_cpu_type_1 115
cos 30
sin 26
qsort 8
Sample analysis of process 1126 written to file /dev/stdout
Sampling process 1126 each 10 msecs 300 times
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
w1_0979.5__0979.5_0.1_T01S4hA_4 completed last night with a normal-length run time and was sent out this morning. After it terminated its run the app was left in memory, but it was using 0% CPU and did not seem to be bothering any other processes. I stopped and restarted BOINC and it now seems to be running and trading CPU slots just fine. There are still two WUs that were downloaded under 008 but are now actually being processed by 011. There is 1 WU that has started fresh with 011. So by sometime this afternoon all the old stuff should be cleared out. But even with the spoofed app things seem to be running OK. If I see anything I'll let you know.
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
The 011 seems to be running very stable and sharing the CPU on the PowerBook G4 (System 382316). In fact it is running well enough that I put the "MacNN" optimized SETI version 4.30 on the machine to see how well 011 would work with it. I will keep the other machine running standard stuff during the beta so as not to mess up the testing. I know a lot of folks are going to want to run some of the optimized S@H apps, so I can at least test one of them for you. Besides, it helps to speed up the processing so it gets back to E@H more often and increases the number of swap-outs.
So far 011 is running very well with the S@H optimized build. If I see anything I will let you know.
I won't know how the other system is doing until I get back to it later, but based on returns to other projects it must be running ok.
Looks like you guys may have killed another bug with 011.
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
Successful 0.11 result:
http://einsteinathome.org/task/7192020
Are you a musician? Join the Musicians team.
Meet the Musicians
Well, when I got to the machine (356364) it was processing 2 S@H WUs in tandem, but the 011 app was still loaded in memory. It was doing almost nothing. From time to time it would use about 0.5%-1% CPU, and it had about 3 MB of memory reserved. In order to get things going again I stopped BOINC (had to do it twice) and restarted. It is now running 2 E@H WUs in tandem and they seem normal. These are the last 2 WUs on this system from the 008 hangups. When they are done, the first fresh 011 WU will start.
Here is a status dump from the app at restart.
2005-08-04 19:16:30.721 sample[1599] Couldn't start c++filt for C++ name demangling
Analysis of sampling pid 1553 every 10.000000 milliseconds
Call graph:
89 Thread_0e0b
89 _dyld_start
89 _start
89 main
89 _Z24boinc_init_graphics_implPFvvEP16BOINC_MAIN_STATE
89 boinc_init_options_graphics_impl
88 _Z24xwin_graphics_event_loopv
88 sleep
88 nanosleep
88 clock_sleep_trap
88 clock_sleep_trap
1 sleep
1 sleep
89 Thread_0f03
89 _pthread_body
89 _Z6foobarPv
89 worker
89 boincmain
89 writeFLines
89 sprintf
89 __vfprintf
89 __dtoa
89 __rv_alloc_D2A
89 calloc
89 __spin_unlock
89 _sigtramp
89 _Z12worker_timeri
89 _Z19update_app_progressdddd
89 fprintf
89 vfprintf
89 __vfprintf
89 __dtoa
89 __d2b_D2A
89 __Balloc_D2A
89 __spin_lock_relinquish
89 __spin_lock_relinquish
Total number in stack (recursive counted multiple, when >=5):
Sort by top of stack, same collapsed (when >= 5):
__spin_lock_relinquish 89
clock_sleep_trap 88
Sample analysis of process 1553 written to file /dev/stdout
Sampling process 1553 each 10 msecs 300 times
I waited to let natural app switching restart E@H, and the system tried to reuse the loaded app, but there was no indication of activity in either the BOINC Manager or Activity Monitor. At that point BOINC was running 2 E@H WUs in tandem; one was working, the other was stalled. The working one was using memory and CPU normally. The stalled app was using about 85% CPU with small fluctuations and showed no activity anywhere else. The first dump below is from the stalled app and the second dump is from the working app.
2005-08-04 20:24:51.672 sample[1679] Couldn't start c++filt for C++ name demangling
Analysis of sampling pid 1553 every 10.000000 milliseconds
Call graph:
157 Thread_0e0b
157 _dyld_start
157 _start
157 main
157 _Z24boinc_init_graphics_implPFvvEP16BOINC_MAIN_STATE
157 boinc_init_options_graphics_impl
157 _Z24xwin_graphics_event_loopv
157 sleep
157 nanosleep
157 clock_sleep_trap
157 clock_sleep_trap
157 Thread_0f03
157 _pthread_body
157 _Z6foobarPv
157 worker
157 boincmain
157 writeFLines
157 sprintf
157 __vfprintf
157 __dtoa
157 __rv_alloc_D2A
157 calloc
157 __spin_unlock
157 _sigtramp
157 _Z12worker_timeri
157 _Z19update_app_progressdddd
157 fprintf
157 vfprintf
157 __vfprintf
157 __dtoa
157 __d2b_D2A
157 __Balloc_D2A
83 __spin_lock_relinquish
83 __spin_lock_relinquish
74 __spin_lock
74 __spin_lock
Total number in stack (recursive counted multiple, when >=5):
Sort by top of stack, same collapsed (when >= 5):
clock_sleep_trap 157
__spin_lock_relinquish 83
__spin_lock 74
Sample analysis of process 1553 written to file /dev/stdout
Sampling process 1553 each 10 msecs 300 times
Working app follows:
2005-08-04 20:26:17.604 sample[1761] Couldn't start c++filt for C++ name demangling
Analysis of sampling pid 1676 every 10.000000 milliseconds
Call graph:
277 Thread_0e0b
277 _dyld_start
277 _start
277 main
277 _Z24boinc_init_graphics_implPFvvEP16BOINC_MAIN_STATE
277 boinc_init_options_graphics_impl
277 _Z24xwin_graphics_event_loopv
277 sleep
277 nanosleep
277 clock_sleep_trap
277 clock_sleep_trap
277 Thread_0f03
277 _pthread_body
277 _Z6foobarPv
277 worker
179 TestLALDemod_for_cpu_type_1
173 TestLALDemod_for_cpu_type_1
5 __isfinited
5 __isfinited
1 sin
1 sin
42 sin
42 sin
33 cos
33 cos
8 0x9012a184
8 0x9012a184
7 PrintTopValues
7 qsort
7 qsort
5 qsort
4 qsort
2 qsort
1 compare
1 compare
1 qsort
1 compare
1 compare
1 qsort
1 qsort
2 compare
2 compare
3 EstimateFLines
2 ComputeOutliers
2 ComputeOutliers
2 ComputeOutliers
1 EstimateFLines
1 EstimateFLines
2 0xf1cd8
2 0xf1cd8
1 0x9012a190
1 0x9012a190
1 0x9012a194
1 0x9012a194
1 0xf1cd0
1 0xf1cd0
Total number in stack (recursive counted multiple, when >=5):
5 qsort
Sort by top of stack, same collapsed (when >= 5):
clock_sleep_trap 277
TestLALDemod_for_cpu_type_1 173
sin 43
cos 33
0x9012a184 8
__isfinited 5
Sample analysis of process 1676 written to file /dev/stdout
Sampling process 1676 each 10 msecs 300 times
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
Well, it did the same thing again. It ran OK for a while, then it switched from WU w1_0979.5_0979.5_0.1_T01A4hA_4 and left the app in memory. Same conditions as before. It went on to process 2 S@H WUs in tandem with the E@H app still in memory. Of course I cannot be certain, but it looks as though the problem may be with the WU, or at least the startup procedure for the WU. It seems to hang as it returns to continue processing rather than simply staying in memory at the end.
I am also beginning to suspect that this may be a dual-processor problem rather than just the same old hangup. I have a PowerBook running all this stuff as well and it is not hanging up any more. The only diff is that the PowerBook is not running Climate. But Climate does not seem to be the problem, as the system is not switching to that app because of the nearing deadline on the E@H WUs. It might be that because the dual system can (and does) run a different app on each processor, it somehow adversely affects the switch-over.
Of course it could just be the WU too. I will try to process the WU out of the system tonight and see if that fixes things tomorrow.
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.
RE: ......I am also beginning to suspect that this may be a dual-processor problem rather than just the same old hangup......
I have a single-processor G4 (450 MHz) but am getting the same 'hanging' problem with Einstein 0.11 as with 0.08. I didn't have the problem with the earlier beta versions 0.02 and 0.05, which let BOINC switch between Einstein and SETI seamlessly. As it takes so long for my Mac to run Einstein anyway, and the hanging means nothing gets processed at all, I'm sorry to say I've suspended Einstein till my current SETI units have finished.
"Mad Moggies" is correct. Sometime during the night my PowerBook hung, and the G4 dual also had a retained copy of the app stuck in memory as well. I am still not convinced that the WUs aren't involved in some way. On my systems it will hang on the same WU until it is gone. Some of them run just fine. On the dual system it just seems odd that the one WU hangs and the others do not. I just sent the problem one in and have started working the first WU that will run start to finish with 011. We'll see.
This is the result from one of the two old WUs on the Dual system. The one that has been the most trouble will not be ready until tonight.
http://einsteinathome.org/task/7160752
Regards
Phil
We Must look for intelligent life on other planets as,
it is becoming increasingly apparent we will not find any on our own.