Work Unit not finishing

Yeti
Joined: 17 Nov 04
Posts: 59
Credit: 1371204130
RAC: 71741

If I'm "lucky" and get one again, I will zip it up and let you know.

Supporting BOINC, a great concept!

Toby
Joined: 18 Jan 05
Posts: 9
Credit: 112028408
RAC: 79137

> Sorry folks, I know this is annoying. But we always stated that
> this is alpha test, no guarantees for nothing - and we are working on it.

No complaints from me!

I deleted the slot directory but left it in the recycle bin on purpose in case it might come in handy... I zipped it up and threw it up on the webserver running my stats site. Yes, I removed the authenticator in the XML file so the script kiddies can forget it :) clicky

I didn't save the einstein.phys.uwm.edu directory at the time but I don't think it has changed since last night (no new work down or uploaded) so unless BOINC deleted some files when the work unit errored out after I emptied the slot directory, it should be the same as at the time of the error. clicky2.

It was late last night when I did this and I don't remember what state the work unit was in. It may have just started over after being paused by BOINC and dropped back down to 7.5 hours of CPU time. Hope this helps.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251757668
RAC: 35880

THANKS TOBY!

A big HUG from all people in the e@h projects team!!

Edit: The Clicky2 gives a 404 - not found...
Edit #2: Not anymore. Was I a bit too fast?

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251757668
RAC: 35880

In case this helps anyone:

The problem appears to be in the data, not in the code. The WU will eventually finish, but it may take quite some time (even more than we allowed for in the max CPU time value, and beyond the deadline), and possibly also more memory than we expected (which may cause further problems). We'll try to avoid such WUs in the future.

BM

Toby
Joined: 18 Jan 05
Posts: 9
Credit: 112028408
RAC: 79137

Is this related to my CPU time resetting back about an hour whenever einstein was paused and resumed by BOINC? Is there maybe no checkpoint built into the 3rd stage of analysis since it was expected to be so short? I have my settings set to switch between projects every 60 minutes and to remove them from memory when doing so.
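Toby's question turns on how checkpointing works: an application that is removed from memory can only resume from the state it last wrote to disk, so everything computed since that checkpoint is redone, which would be consistent with the CPU time dropping back. Below is a minimal, generic Python sketch of periodic checkpoint/resume. It assumes the whole state fits in a small JSON file; the file name and the functions `load_checkpoint`, `save_checkpoint` and `run` are hypothetical and not taken from the Einstein@Home application or the BOINC API.

```python
import json
import os
import tempfile

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical file name


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file first, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, n_steps, checkpoint_every=100):
    """Sum 0..n_steps-1, resuming from the last checkpoint if one exists.

    Work done after the most recent checkpoint is lost if the process is
    killed, and is simply recomputed on the next start.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

The temp-file-plus-`os.replace` step is there so that a task stopped in the middle of writing does not leave a half-written checkpoint; on POSIX systems that rename is atomic. The trade-off is the checkpoint interval: checkpointing rarely is cheap but loses more work each time the task is evicted from memory.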

Blizzard
Joined: 22 Jan 05
Posts: 5
Credit: 1148187
RAC: 0

I am running B@H v4.18, and I have not been able to finish a single unit. It keeps getting computation errors. I am also running S@H, P@H, CPDN and LHC@H. The following is taken from my log; hopefully this helps.

2005-01-25 16:59:55 [Einstein@Home] Unrecoverable error for result H1_0057.4__0057.8_0.1_T17_Test02_2 ( - exit code -1073741819 (0xc0000005))
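The exit code in this log line is one 32-bit value printed two ways: -1073741819 as a signed integer and 0xc0000005 in hex. On Windows, 0xC0000005 is STATUS_ACCESS_VIOLATION, meaning the process touched memory it was not allowed to access; that tells you what kind of crash it was, not why it happened. A small Python sketch of the conversion, where the helper names are illustrative and not part of BOINC:

```python
STATUS_ACCESS_VIOLATION = 0xC0000005  # Windows NTSTATUS code for an access violation


def to_unsigned32(code):
    """Reinterpret a signed 32-bit exit code as its unsigned (hex) form."""
    return code & 0xFFFFFFFF


def to_signed32(code):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code


exit_code = -1073741819  # value reported in the log line
print(hex(to_unsigned32(exit_code)))  # 0xc0000005
print(to_unsigned32(exit_code) == STATUS_ACCESS_VIOLATION)  # True
```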

ric
Joined: 4 Jan 05
Posts: 51
Credit: 236006
RAC: 0

Message 848 in response to message 847

@Bernd or any dev people here: I've got a longrider.

Normally this host does 2 Einsteins in 10-11 hours (Intel 3.2 GHz @ 3.442 GHz, 512 MB, HT enabled, client 4.16, Windows 2000, hostID 7532, only attached to Einstein).

I have saved the full BOINC folder as it is (89 MB), zipped down to 33 MB.

Are you still interested in the BOINC folder?

The only problem for the transfer will be mailbox size limits.
I have prepared 7 emails, each about 5 MB in size, which will pass mailbox restrictions. After receiving them, just put all the files into one directory and binary-copy them back into one file (copy file included).
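Splitting an archive into mail-sized pieces and rejoining them is plain byte-level concatenation, which is what a binary copy does (on Windows, `copy /b part1+part2 whole`). A 33 MB archive cut into 5 MB parts needs ceil(33 / 5) = 7 parts, matching the 7 emails. A minimal Python sketch, assuming numbered parts with a zero-padded suffix; the naming scheme and the functions `split_file`, `join_files` and `roundtrip_demo` are hypothetical, not ric's actual copy file:

```python
import os
import tempfile


def split_file(path, part_size=5 * 1024 * 1024):
    """Split `path` into parts of at most `part_size` bytes; return the part names in order."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(part_size)
            if not chunk:
                break
            part_name = f"{path}.{index:03d}"  # zero-padded so names sort in order
            with open(part_name, "wb") as dst:
                dst.write(chunk)
            parts.append(part_name)
            index += 1
    return parts


def join_files(parts, out_path):
    """Concatenate the parts byte-for-byte, in order (what a binary copy does)."""
    with open(out_path, "wb") as dst:
        for part_name in parts:
            with open(part_name, "rb") as src:
                dst.write(src.read())


def roundtrip_demo(data, part_size):
    """Split `data` via a temporary file, rejoin it; return (number of parts, rejoined bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "boinc.zip")
        with open(original, "wb") as f:
            f.write(data)
        parts = split_file(original, part_size)
        rebuilt = os.path.join(tmp, "rebuilt.zip")
        join_files(parts, rebuilt)
        with open(rebuilt, "rb") as f:
            return len(parts), f.read()
```

Because the parts are joined in exactly the order they were cut, the rebuilt file is identical to the original; getting the order wrong is the usual way this goes bad, hence the zero-padded suffix.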

Just let me know to whom the 33 MB of data should be sent, if there is a need.

I've had some other longrider WUs; some finished after 18 hours or more.

ric

Honza
Joined: 10 Nov 04
Posts: 136
Credit: 3332354
RAC: 0

@ Bernd, ric

On CPDN, one of our members provided his Linux machine as an FTP server dedicated to CPDN project/participants. It does not have a super-fast connection, but it has plenty of space for project data uploads, screenshots and whatever else is appropriate.

IMHO, it might be useful to have such a server for Einstein as well. Sometimes the core team would like to look at particular data, and such a server might come in handy; it may also be easier for participants to provide their data...

STE\/E
Joined: 18 Jan 05
Posts: 135
Credit: 144880312
RAC: 21882

@ Ric, it's good to see I'm not the only one getting these longer-than-normal WU runs; at least I can pretty much rule out my computer as the cause. I didn't think it was the computer in the first place, but when this stuff happens it gets you wondering... Friendly :)

ric
Joined: 4 Jan 05
Posts: 51
Credit: 236006
RAC: 0

Message 851 in response to message 850

Honza,
that's a very good idea. Sure, there is an FTP server there, but it is not open to the public. It can be done by email, but email was not built for large data transfers.
I'm very glad the "infected" host is not a cpdn/einstein cruncher; otherwise the uncompressed file size would be about 1.2 GB (2 cpdn and a 1.5-day queue of Einstein...).

Yes, PoorBoy,
nobody is alone...
...wondering. Me too. Friendly as we are, we first look for the fault on our own side ;)

I've seen it on several hosts, not only on Intels. But it's rare, really rare.

Those long WUs are part of this project's ongoing development; it's useful to address them now and try to fix them.
