"Stuck" unit

senator2
Joined: 11 Nov 04
Posts: 19
Credit: 41547
RAC: 0
Topic 187197

Under version 4.65, unit pt11_I12_f59.998_b0.104_0 has been sitting at 100% completed for several hours now and is not rolling over (still counting CPU time, nothing under "total time"). This is on machine 1525. When I restart BOINC, it starts at 50%, quickly jumps to 100%, then repeats the process.
Might want to see if this is a reproducible error. I'll keep a copy of the current file state, but I'm going to let it crunch on some other units to see if it's a corruption in my software.

bjacke
Joined: 10 Nov 04
Posts: 102
Credit: 11310
RAC: 0

"Stuck" unit

Under Win 2000 Prof SP2, no problems so far -> process is at 66%!

Greetings from Germany
Basti

Join Ad Astra

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245217726
RAC: 12915

Thanks for the report.

Actually, the new pt workunits crunch in three steps: two runs similar to the previous ft runs (0-50%, 50-100%), then a "comparison" run that we thought would be short enough not to need checkpointing or even a single percent of the progress counter. Apparently this last step takes somewhat longer on your machine. I'll take a look at the WU and see if the reason is in the data.
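
To illustrate why the display can sit at 100% while the unit is still crunching, here is a minimal sketch of the three-step layout described above. It is not the actual Einstein@Home code: the function name `overall_progress` and the phase numbering are assumptions made for illustration only.

```python
# Hypothetical sketch: the function name and phase numbering are assumptions
# based on the description above, not taken from the Einstein@Home application.

def overall_progress(phase: int, phase_fraction: float) -> float:
    """Map (phase, fraction done within that phase) to the overall fraction done.

    Phase 0 covers 0-50% and phase 1 covers 50-100%. Phase 2 (the
    "comparison" run) has no share of the counter, so it reports 100%
    for as long as it runs.
    """
    if phase not in (0, 1, 2):
        raise ValueError(f"unknown phase: {phase}")
    f = min(max(phase_fraction, 0.0), 1.0)  # clamp to [0, 1]
    if phase == 0:
        return 0.5 * f
    if phase == 1:
        return 0.5 + 0.5 * f
    return 1.0  # comparison run: counter is already full


if __name__ == "__main__":
    for phase, frac in [(0, 0.5), (1, 0.0), (1, 1.0), (2, 0.1)]:
        print(phase, frac, overall_progress(phase, frac))
```

With a layout like this, a comparison step that runs for hours would keep accumulating CPU time while the counter stays at 100%, which matches what was reported.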

BM

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

> Under version 4.65, Unit pt11_I12_f59.998_b0.104_0 has been sitting on 100%
> completed for several hours now and not rolling over (still counting CPU time,
> nothing under "total time") This is on machine 1525. When I restart BOINC
> it starts at 50%, quickly jumps to 100%, then repeats the process.
> Might want to see if this is a reproducible error. I'll keep a copy of the
> current file state, but I'm going to let it crunch on some other units to see
> if it's a corruption in my software.

I think we have now fixed this bug. The new science code was acting badly if it didn't find any source candidates in one of the data sets.
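
The exact failure mode is not spelled out here, so the following is only a sketch of the kind of guard that handles an empty candidate list in a comparison step. The function `match_candidates`, the use of frequencies as candidates, and the tolerance parameter are all hypothetical and not taken from the real science code.

```python
# Hypothetical sketch: match_candidates, the frequency representation and the
# tolerance are assumptions for illustration, not the real science code.
from typing import List, Tuple


def match_candidates(
    a: List[float], b: List[float], tol: float
) -> List[Tuple[float, float]]:
    """Pair candidate frequencies from two data sets that agree within tol."""
    # Guard: if either data set produced no candidates there is nothing to
    # compare, so return an empty result immediately.
    if not a or not b:
        return []
    b_sorted = sorted(b)
    pairs = []
    for fa in sorted(a):
        for fb in b_sorted:
            if fb > fa + tol:
                break  # b_sorted is ascending, so no later fb can match
            if abs(fa - fb) <= tol:
                pairs.append((fa, fb))
    return pairs


if __name__ == "__main__":
    print(match_candidates([], [59.998], 0.01))
    print(match_candidates([59.998], [59.999, 61.0], 0.01))
```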

[Added Jan 8th] We're now testing a revised app version that hopefully will fix this problem. If our testing goes well it should be available in a day or so and will revive this stuck WU.

Bruce

Director, Einstein@Home

Muggy
Joined: 11 Nov 04
Posts: 1
Credit: 872354
RAC: 0

Message 718 in response to message 717

> > Under version 4.65, Unit pt11_I12_f59.998_b0.104_0 has been sitting on 100%
> > completed for several hours now and not rolling over (still counting CPU time,
> > nothing under "total time") This is on machine 1525. When I restart BOINC
> > it starts at 50%, quickly jumps to 100%, then repeats the process.

Same problem here with unit pt12_I12_f59.998_b0.104_9, crunching with einstein 4.71 on machine 1538: it starts at just under 50%, gets to 50%, then goes up to 100% and sits there.

bjacke
Joined: 10 Nov 04
Posts: 102
Credit: 11310
RAC: 0

What do the new H1 WUs stand for?

Greetings from Germany
Basti

Join Ad Astra

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245217726
RAC: 12915

Message 720 in response to message 718

> Same problem here with unit pt12_I12_f59.998_b0.104_9, crunching with einstein
> 4.71 on machine 1538, starts at just under 50%, gets to 50%, then up to 100%,
> and sits there.

Ooops - I thought we had this fixed... thanks for the report.

BM

ric
Joined: 4 Jan 05
Posts: 51
Credit: 236006
RAC: 0

RE: "Stuck" unit

Message 721 in response to message 720
