Work Unit not finishing

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251759054
RAC: 35906

Message 852 in response to message 848

> @Bernd or any def people
> here got a longrider.

T(om?)hanks!

> Still interested about the boinc folder?

Thanks for doing that, but I think we have found the problem. The WUs causing this kind of trouble seem to be analyzing the frequency range around 60 Hz (you can currently tell from the name). We are working to get this problem out of the way.

If you don't mind, keep the archive for a while in case we need it. I think right now it is of no use to us, but it may come in handy in the future.

Thanks a lot for your help!

BM

ric
Joined: 4 Jan 05
Posts: 51
Credit: 236006
RAC: 0

Message 853 in response to message 852

Thanks for replying,

that's good news, you will fix it soon.

Will keep the archive.

And it's a pleasure to help in a modest way.

Good luck!

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 854 in response to message 846

> Is this related to my CPU time resetting back about an hour whenever einstein
> was paused and resumed by BOINC? Is there maybe no checkpoint built into the
> 3rd stage of analysis since it was expected to be so short? I have my
> settings set to switch between projects every 60 minutes and to remove them
> from memory when doing so.

Your analysis of the problem is correct. There is no checkpoint in the third stage of processing because it is supposed to take only a few seconds.
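To illustrate why a missing checkpoint makes CPU time jump back after a pause, here is a minimal sketch of checkpointed processing. It is not the Einstein@Home code: the function names, the JSON state file, and the summing "work" are all hypothetical stand-ins chosen only to show the idea.

```python
import json
import os
import tempfile


def save_state(state, state_path):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(state_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    # The rename means an interruption never leaves a half-written checkpoint.
    os.replace(tmp_path, state_path)


def run_with_checkpoints(items, state_path, checkpoint_every=100):
    """Sum `items` (stand-in work), resuming from the last checkpoint if any."""
    state = {"next_index": 0, "total": 0}
    if os.path.exists(state_path):
        with open(state_path) as handle:
            state = json.load(handle)

    for index in range(state["next_index"], len(items)):
        state["total"] += items[index]      # hypothetical unit of work
        state["next_index"] = index + 1
        if state["next_index"] % checkpoint_every == 0:
            save_state(state, state_path)

    save_state(state, state_path)
    return state["total"]
```

A stage that never calls something like `save_state` has to start over each time the application is removed from memory, which matches the symptom described in the quoted question.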

Please see the front page news item about this.

Bruce

Director, Einstein@Home

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 855 in response to message 853

> And it's a pleasure to help in a modest way

This was a big help. The problem has been isolated and we're working on a fix. Please see the front page news item about this.

Bruce

Director, Einstein@Home

Rebirther
Joined: 4 Jan 05
Posts: 22
Credit: 31576
RAC: 0

Another never-ending story in "H1_0059.9__0060.0_0.1_T17_Test02". If you can delete this workunit, the others will thank you.

Rebirther
Germany

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 857 in response to message 845

> In case this helps anyone:
>
> The problem appears to be in the data, not in the code. The WU will eventually
> finish, but it may take quite some time (even more than we expect in the max
> CPU time value and exceeding the deadline), and maybe also more memory than we
> expected (possibly causing more problems). We'll try to avoid such WUs in the
> future.

I want to say this a bit differently.

The problem is in our code. For certain data sets, it is not as efficient as it could or should be. So we are in the process of fixing the code to make it efficient in all cases.

Unfortunately, we can't identify the 'troublesome' data sets or cases without actually analyzing the data! So we can't easily avoid such WUs. Instead, we need to make the code work efficiently with any of our input data sets.

Cheers,
Bruce

Director, Einstein@Home

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 858 in response to message 856

> Another never-ending story in "H1_0059.9__0060.0_0.1_T17_Test02". If you can
> delete this workunit, the others will thank you.

For what it's worth, the problem occurs when analyzing the band of data containing 60 Hz (power mains frequency in the USA).

I'll talk with others in our team about cancelling these WUs. I am not doing it right away because looking at the results coming back (they are slow but do complete) may help us to fix this.

Cheers,
Bruce

Director, Einstein@Home

STE\/E
Joined: 18 Jan 05
Posts: 135
Credit: 144880875
RAC: 21876

0060.0_ So are you saying if I look at this part of the WU ID that there could be a potential problem with the WU taking too much time to finish ... ???

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 860 in response to message 859

> 0060.0_ So are you saying if I look at this part of the WU ID that there could
> be a potential problem with the WU taking too much time to finish ... ???

Actually it's the __0060.0 (TWO underscores) that's the clue. I wouldn't be surprised if the __0059.9 (TWO underscores) workunits also show this behavior.
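For anyone who wants to check workunit names automatically, the rule above can be sketched in a few lines of Python. This is only a rough helper based on the one name quoted in this thread; the function names are made up here, and the assumption that the number right after the double underscore is always a frequency in Hz is an inference, not something the project has documented in this thread.

```python
import re

# Assumption: the field of interest is the number immediately after the
# double underscore, as in "H1_0059.9__0060.0_0.1_T17_Test02".
DOUBLE_UNDERSCORE_FIELD = re.compile(r"__(\d+\.\d+)")


def frequency_after_double_underscore(wu_name):
    """Return the number after '__' as a float, or None if the name lacks one."""
    match = DOUBLE_UNDERSCORE_FIELD.search(wu_name)
    return float(match.group(1)) if match else None


def is_suspect(wu_name, suspect_hz=(60.0,)):
    """True if the number after '__' is one of the frequencies to watch."""
    freq = frequency_after_double_underscore(wu_name)
    return freq is not None and freq in suspect_hz


print(is_suspect("H1_0059.9__0060.0_0.1_T17_Test02"))  # True
```

Passing `suspect_hz=(60.0, 59.9)` would also flag the `__0059.9` workunits, which are only suspected of showing the same behavior.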

Bruce

Director, Einstein@Home

STE\/E
Joined: 18 Jan 05
Posts: 135
Credit: 144880875
RAC: 21876

Ok, I'll have to remember that & if I continue to have the same problem on my 1 computer I can look and see if it has the __0060.0 in the WU ID so at least I'll know that's what's causing the problem ...
