If I'm "lucky" and have one again, I will zip it and inform you
Supporting BOINC, a great concept !
> Sorry folks, I know this is annoying. But we always stated that
> this is alpha test, no guarantees for nothing - and we are working on it.
No complaints from me!
I deleted the slot directory but left it in the recycle bin on purpose in case it might come in handy... I zipped it up and threw it up on the webserver running my stats site. Yes, I removed the authenticator in the XML file so the script kiddies can forget it :) clicky
I didn't save the einstein.phys.uwm.edu directory at the time, but I don't think it has changed since last night (no new work downloaded or uploaded), so unless BOINC deleted some files when the work unit errored out after I emptied the slot directory, it should be the same as at the time of the error. clicky2.
It was late last night when I did this and I don't remember what state the work unit was in. It may have just started over after being paused by BOINC and dropped back down to 7.5 hours of CPU time. Hope this helps.
A member of The Knights Who Say Ni!
My BOINC stats page
THANKS TOBY!
A big HUG from all people in the e@h projects team!!
Edit: The Clicky2 gives a 404 - not found...
Edit#2: Not anymore - was a bit too fast?
BM
In case this helps anyone:
The problem appears to be in the data, not in the code. The WU will eventually finish, but it may take quite some time (even more than we allow for in the max CPU time value, and past the deadline), and it may also need more memory than we expected (possibly causing more problems). We'll try to avoid such WUs in the future.
BM
Is this related to my CPU time resetting back about an hour whenever Einstein was paused and resumed by BOINC? Is there maybe no checkpoint built into the 3rd stage of analysis, since it was expected to be so short? I have my settings set to switch between projects every 60 minutes and to remove them from memory when doing so.
A member of The Knights Who Say Ni!
My BOINC stats page
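To make the question above concrete, here is a toy model of how an application that is removed from memory on every project switch can keep falling back to its last checkpoint. It is only a sketch: `retained_cpu_time` and its parameters are hypothetical names, and nothing here reflects the actual BOINC client or the Einstein@Home application code.

```python
from typing import Optional


def retained_cpu_time(slices: int, slice_s: int,
                      checkpoint_every_s: Optional[int]) -> int:
    """CPU seconds that survive after `slices` run periods.

    Toy assumptions: after each run period the task is removed from
    memory, so it restarts from its last checkpoint. Checkpoints are
    written at multiples of `checkpoint_every_s` of CPU time, or never
    if `checkpoint_every_s` is None.
    """
    saved = 0  # CPU time recorded in the last checkpoint
    for _ in range(slices):
        cpu = saved + slice_s  # resume from the checkpoint, run one slice
        if checkpoint_every_s is not None:
            # Only work up to the most recent checkpoint is kept.
            saved = (cpu // checkpoint_every_s) * checkpoint_every_s
    return saved


# Three 60-minute slices with a checkpoint every 10 minutes: nothing is lost.
print(retained_cpu_time(3, 3600, 600))   # 10800
# Same schedule with no checkpoints in this stage: every slice starts over.
print(retained_cpu_time(3, 3600, None))  # 0
```

In this model, a stage without checkpoints loses a full switch interval on every suspend, which would look like CPU time dropping back by about an hour with a 60-minute switch setting.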
I am running B@H v4.18, and I have not been able to finish a single unit. I keep getting computation errors. I am also running S@H, P@H, CPDN, LHC@H. The following is taken from my log; hopefully this helps.
2005-01-25 16:59:55 [Einstein@Home] Unrecoverable error for result H1_0057.4__0057.8_0.1_T17_Test02_2 ( - exit code -1073741819 (0xc0000005))
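For anyone reading the log line above: the negative decimal number and the hex value are the same 32-bit pattern, and 0xC0000005 is the Windows status code for an access violation. A minimal sketch of the conversion follows; the helper names and the small lookup table are illustrative only and are not part of BOINC.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


# A few well-known Windows NTSTATUS values (illustrative, not exhaustive).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as hex plus a symbolic name, if known."""
    unsigned = to_unsigned32(code)
    name = KNOWN_STATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_code(-1073741819))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```

The code only says that the process touched memory it was not allowed to; it does not say why.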
@Bernd or any dev people here:
I got a longrider.
Normally this host does 2 Einstein WUs in 10-11 hours (Intel 3.2 @ 3.442 GHz, 512 MB, HT enabled, client 4.16, Windows 2000, host ID 7532, attached only to Einstein).
I have saved the full BOINC folder as it is (89 MB), zipped down to 33 MB.
Still interested in the BOINC folder?
The only problem for the transfer will be the mailbox size limit.
I have prepared 7 emails of about 5 MB each, which will pass the mailbox restrictions. After receiving them, just put all the files into one directory and binary-copy them back into one file (a copy file is included).
Just let me know to whom the 33 MB of data should be sent, if there is a need.
I have had some other longrider WUs; some finished after 18 hours or more.
ric
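The split-and-rejoin approach described above can be sketched as follows. This is not the copy file ric mentions; the function names, the part naming scheme, and the 5 MB default are assumptions made for illustration.

```python
from pathlib import Path

CHUNK_SIZE = 5 * 1024 * 1024  # assumed part size of about 5 MB


def split_file(archive: Path, out_dir: Path,
               chunk_size: int = CHUNK_SIZE) -> list[Path]:
    """Split `archive` into numbered parts of at most `chunk_size` bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with archive.open("rb") as src:
        index = 1
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            # Zero-padded suffix keeps the parts in order when sorted by name.
            part = out_dir / f"{archive.name}.{index:03d}"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts


def join_files(parts: list[Path], target: Path) -> None:
    """Concatenate the parts, in name order, back into one file."""
    with target.open("wb") as dst:
        for part in sorted(parts):
            dst.write(part.read_bytes())
```

A 33 MB archive split at 5 MB gives 7 parts, matching the 7 emails. On Windows, `copy /b part1+part2+... target` performs the same binary concatenation from the command line.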
@ Bernd, ric
On CPDN, one of our members provided his Linux machine as an FTP server dedicated to the CPDN project and its participants. It does not have a super-fast connection, but it has plenty of space for project data uploads, screenshots, and whatever else is appropriate.
IMHO, it might be useful to have such a server for Einstein as well. Sometimes the core team would like to look at particular data, and such a server might come in handy; it may be easier for participants to provide their data...
@ Ric, it's good to see I'm not the only one that's getting these longer-than-normal WU runs; at least I can pretty much rule out my computer as the cause. I didn't think it was the computer in the first place, but when this stuff happens it gets you wondering ... Friendly :)
Honza,
that's a very good idea. Surely there is an FTP server there, but it is not open to the public. It can be done by email, but email was not built for large data transfers.
I'm very glad the "infected" host is not a CPDN/Einstein cruncher; the uncompressed file size would be about 1.2 GB (2 CPDN and a 1.5-day queue of Einstein).
Yes PoorBoy, nobody is alone in wondering. Me too. Friendly as we are, we look for the fault on our own side first ;)
I got it on several hosts, not only on Intels. But it's rare, really rare.
Those long WUs are part of the ongoing development of this project; it's useful to address them now and try to fix them.