Problem with H1_0260.9 WUs?

Ertugrul Gokcen
Joined: 22 Jan 05
Posts: 15
Credit: 28823
RAC: 0
Topic 188852

Hi,

These 2 WUs, 676685 and 673808, both crashed when they were 99.5% - 99.7% complete. I had taken a backup 3 hours before the first one crashed, so I reverted to it thinking that it was a one-off problem, but it crashed again when it was almost finished. Unfortunately I didn't have a backup handy when the second one crashed, so I didn't have a chance to rerun it.

The error message for the WUs is as follows:

02/04/2005 15:27:59 - Einstein@Home - Unrecoverable error for result H1_0260.9__0261.4_0.1_T25_Test02_3 ( - exit code -1073741819 (0xc0000005))

03/04/2005 15:23:26 - Einstein@Home - Unrecoverable error for result H1_0260.9__0261.3_0.1_T25_Test02_6 ( - exit code -1073741819 (0xc0000005))
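For readers unfamiliar with the two forms of the exit code in these messages: the negative decimal number and the hex value are the same 32-bit quantity, and 0xC0000005 is the Windows status code for an access violation. The sketch below only illustrates that conversion; the helper name `to_ntstatus` and the small lookup table are made up for this example and are not part of BOINC or Einstein@Home.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

# A tiny lookup of well-known Windows status codes (illustrative, not exhaustive).
KNOWN_STATUS = {
    "0xc0000005": "STATUS_ACCESS_VIOLATION",
}

code = to_ntstatus(-1073741819)
print(code, KNOWN_STATUS.get(code, "unknown"))  # 0xc0000005 STATUS_ACCESS_VIOLATION
```

This tells you only that the application touched memory it should not have; it does not say whether the cause is the WU data, the application, or the machine.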

If you look at these two WUs, you will see that there are others experiencing client errors, too (3 client errors in the first one, and 4 in the second.) Is there anyone having similar problems out there? Any ideas as to what might be happening?

A WU takes ~11 hours to crunch on my laptop, so ~22 + 3 hours (for the rerun of the first WU from backup) = ~25 hours of crunching have gone to waste! I have never experienced such crashes with Einstein before; it was quite consistent and stable until yesterday. I have another H1_0260.9 WU, if this one also crashes, I'm afraid I will have to say goodbye until a new kind of H1_XXXX WU is submitted.

Any kind of help, advice, clue, tip, etc. will be greatly appreciated, especially from project owners and admins.

Best regards,

Ertugrul.

Ertugrul Gokcen
Joined: 22 Jan 05
Posts: 15
Credit: 28823
RAC: 0

I have something rather strange going on here!

> I have another H1_0260.9 WU, if this one also crashes, I'm afraid
> I will have to say goodbye until a new kind of H1_XXXX WU is submitted.

Well, I took a backup just before this one was 99% through, and when I restarted, it completed fine without any errors. I still had the impression that those H1_0260.9 WUs were problematic, so I detached and reattached, and got a new set of WUs. And guess what, one of those also crashed before my eyes at 99%, after 11 hours of crunching!!! I HATE to waste WUs, so I went back to a backup I had taken when the WU was 30% through and reran it. This time I stopped and took a backup at 99%, restarted, and voila, it completed without any problem!

So I have a strange pattern like this:

1) Let the WU finish on its own, and get a crash,
2) Babysit the WU just before it's over, and you are OK.

I don't have any problems with babysitting, since I regularly back up my BOINC folder anyway, but this pattern seems totally illogical and annoying to me.
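For anyone who wants to take the same kind of backup, here is a minimal sketch of copying a folder to a timestamped snapshot. The function name `backup_folder` and the example paths are hypothetical, and the sketch assumes the BOINC client is stopped before the copy so the files are not changing underneath it.

```python
import shutil
import time
from pathlib import Path

def backup_folder(src: str, dest_root: str) -> Path:
    """Copy the whole folder `src` into `dest_root` under a timestamped name."""
    src_path = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"{src_path.name}-{stamp}"
    # copytree creates `dest` (and missing parents) and fails if it already exists,
    # so an earlier snapshot is never overwritten.
    shutil.copytree(src_path, dest)
    return dest

# Hypothetical usage; adjust both paths to your own installation:
# backup_folder(r"C:\Program Files\BOINC", r"D:\boinc-backups")
```

Restoring is the reverse: stop the client, replace the live folder with the chosen snapshot, and start the client again.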

I'm running einstein 4.79 with CC 4.25, BTW...

Any suggestions?

Ertugrul.
