> @Bernd or any def people
> here got a longrider.
T(om?)hanks!
> Still interested about the boinc folder?
Thanks for doing that, but I think we found the problem. The WUs causing this kind of trouble seem to be analyzing the frequency range around 60 Hz (you can currently tell from the name). We are working to get this problem out of the way.
If you don't mind, keep the archive for a while in case we need it. I think right now it is of no use to us, but it may come in handy in the future.
Thanks a lot for your help!
BM
Thanks for replying.
That's good news that you will fix it soon.
I will keep the archive.
And it is a pleasure to help in a modest way.
Good luck!
> Is this related to my CPU time resetting back about an hour whenever einstein
> was paused and resumed by BOINC? Is there maybe no checkpoint built into the
> 3rd stage of analysis since it was expected to be so short? I have my
> settings set to switch between projects every 60 minutes and to remove them
> from memory when doing so.
Your analysis of the problem is correct. There is no checkpoint in the third stage of processing because it is supposed to take only a few seconds.
Please see the front page news item about this.
Bruce
Director, Einstein@Home
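To illustrate the point about the missing checkpoint, here is a minimal, generic sketch of how checkpointing lets a stage resume where it left off. It is not the Einstein@Home code: the function `run_stage`, the JSON checkpoint file, and the `stop_after` parameter (which stands in for BOINC removing the task from memory) are all assumptions made for this example.

```python
import json
import os


def run_stage(total_steps, checkpoint_path, stop_after=None):
    """Run a stage of `total_steps` steps, resuming from a checkpoint if present.

    `stop_after` (hypothetical) simulates the task being removed from memory
    after that many steps in the current session. Returns the next step to run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # Resume from the saved position instead of starting over.
        with open(checkpoint_path) as f:
            start = json.load(f)["next_step"]

    done_this_session = 0
    for step in range(start, total_steps):
        if stop_after is not None and done_this_session >= stop_after:
            return step  # interrupted; progress so far is on disk
        # ... the real work for this step would go here ...
        done_this_session += 1
        with open(checkpoint_path, "w") as f:
            json.dump({"next_step": step + 1}, f)
    return total_steps
```

Without the checkpoint file, every resume would start again at step 0, which matches the CPU time "resetting" described in the quoted question.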
> And it is a pleasure to help in a modest way
This was a big help. The problem has been isolated and we're working on a fix. Please see the front page news item about this.
Bruce
Director, Einstein@Home
Another never-ending story in "H1_0059.9__0060.0_0.1_T17_Test02". If you can delete this workunit, the others will thank you.
Rebirther
Germany
> In case this helps anyone:
>
> The problem appears to be in the data, not in the code. The WU will eventually
> finish, but it may take quite some time (even more than we expect in the max
> CPU time value and exceeding the deadline), and maybe also more memory than we
> expected (possibly causing more problems). We'll try to avoid such WUs in the
> future.
I want to say this a bit differently.
The problem is in our code. For certain data sets, it is not as efficient as it could or should be. So we are in the process of fixing the code to make it efficient in all cases.
Unfortunately, we can't identify the 'troublesome' data sets or cases without actually analyzing the data! So we can't easily avoid such WUs. Instead, we need to make the code work efficiently with any of our input data sets.
Cheers,
Bruce
Director, Einstein@Home
> Another never-ending story in "H1_0059.9__0060.0_0.1_T17_Test02". If you can
> delete this workunit, the others will thank you.
For what it's worth, the problem occurs when analyzing the band of data containing 60 Hz (the power mains frequency in the USA).
I'll talk with others in our team about cancelling these WUs. I am not doing it right away because looking at the results coming back (they are slow but do complete) may help us to fix this.
Cheers,
Bruce
Director, Einstein@Home
0060.0_ So are you saying that if I see this part in the WU ID, there could be a potential problem with the WU taking too much time to finish?
> 0060.0_ So are you saying that if I see this part in the WU ID, there could
> be a potential problem with the WU taking too much time to finish?
Actually it's the __0060.0 (TWO underscores) that's the clue. I wouldn't be surprised if the __0059.9 (TWO underscores) workunits also show this behavior.
Bruce
Director, Einstein@Home
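For anyone who wants to check their own queue, the two-underscore clue above can be sketched as a simple substring test. This is only an illustration: the helper name `is_suspect_wu` is made up here, and the assumption that these two markers are enough to spot an affected workunit comes solely from the post above, not from any documented naming scheme.

```python
# Markers taken from the post above (note the TWO leading underscores).
# Treating them as a complete test is an assumption.
SUSPECT_MARKERS = ("__0060.0", "__0059.9")


def is_suspect_wu(name: str) -> bool:
    """Return True if the workunit name contains one of the two-underscore markers."""
    return any(marker in name for marker in SUSPECT_MARKERS)


# The workunit reported earlier in this thread:
print(is_suspect_wu("H1_0059.9__0060.0_0.1_T17_Test02"))
```

A name in which 0060.0 follows only a single underscore would not match, which is the distinction being made above.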
OK, I'll have to remember that, and if I continue to have the same problem on my one computer, I can check whether it has __0060.0 in the WU ID, so at least I'll know that's what's causing the problem ...