new units not downloading
It would be useful if you posted any relevant message(s) from the Messages tab of BOINC.
Greetings from Belgium
Thierry
RE: new h1 units not
06/27/05 19:02:34||Starting BOINC client version 4.43 for windows_intelx86
06/27/05 19:02:34||Data directory: D:\Program Files\BOINC
06/27/05 19:02:35|Einstein@Home|Computer ID: 307979; location: home; project prefs: default
06/27/05 19:02:35|orbit@home|Computer ID: 682; location: home; project prefs: default
06/27/05 19:02:35||General prefs: from Einstein@Home (last modified 2005-06-13 13:31:31)
06/27/05 19:02:35||General prefs: no separate prefs for home; using your defaults
06/27/05 19:02:35||Remote control not allowed; using loopback address
06/27/05 19:02:35|Einstein@Home|Resuming computation for result H1_0326.5__0326.9_0.1_T21_Fin1_2 using einstein version 4.79
06/27/05 19:02:35|orbit@home|Deferring communication with project for 14 hours, 48 minutes, and 26 seconds
06/27/05 19:02:35|Einstein@Home|Started download of h1_0326.5
06/27/05 19:02:35||schedule_cpus: must schedule
06/27/05 19:02:49|Einstein@Home|Temporarily failed download of h1_0326.5: 416
06/27/05 19:02:52|Einstein@Home|Started download of h1_0326.5
06/27/05 19:03:03|Einstein@Home|Temporarily failed download of h1_0326.5: 416
06/27/05 19:03:06|Einstein@Home|Started download of h1_0326.5
kenlo
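To spot this retry loop in a saved message log, one can count the 416 failures per file; a file whose count keeps climbing is stuck. A minimal sketch, assuming the `timestamp|project|message` layout shown in the log above; `count_416_failures` is a made-up helper, not part of BOINC.

```python
import re
from collections import Counter

# Message text as it appears in the logs above, e.g.
# "Temporarily failed download of h1_0326.5: 416"
FAILED_416 = re.compile(r"Temporarily failed download of (\S+): 416$")


def count_416_failures(log_lines):
    """Count HTTP 416 download failures per file name.

    Assumes each line has the 'timestamp|project|message' layout shown above."""
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a message line in the expected layout
        match = FAILED_416.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts


log = [
    "06/27/05 19:02:35|Einstein@Home|Started download of h1_0326.5",
    "06/27/05 19:02:49|Einstein@Home|Temporarily failed download of h1_0326.5: 416",
    "06/27/05 19:03:03|Einstein@Home|Temporarily failed download of h1_0326.5: 416",
]
print(count_416_failures(log))  # Counter({'h1_0326.5': 2})
```

The same function works on the `6/27/2005 8:16:21 PM|...` lines posted further down, since only the message field is matched.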
Here an excerpt from
Here is an excerpt from the Proxomitron log:
+++GET 30654+++
GET /download/38/h1_0205.0 HTTP/1.0
User-Agent: BOINC client
Host: einstein.astro.gla.ac.uk:80
Range: bytes=14736000-
Accept: */*
Connection: keep-alive
+++RESP 30654+++
HTTP/1.0 416 Requested Range Not Satisfiable
Date: Mon, 27 Jun 2005 23:41:50 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 PHP/4.3.10-15 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_perl/1.999.21 Perl/v5.8.4
Keep-Alive: timeout=15, max=89
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
+++CLOSE 30654+++
+++GET 30655+++
GET /download/38/h1_0205.0 HTTP/1.0
User-Agent: BOINC client
Host: einstein.astro.gla.ac.uk:80
Range: bytes=14736000-
Accept: */*
Connection: keep-alive
+++RESP 30655+++
HTTP/1.0 416 Requested Range Not Satisfiable
Date: Mon, 27 Jun 2005 23:41:54 GMT
Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 mod_python/3.1.3 Python/2.3.5 PHP/4.3.10-15 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_perl/1.999.21 Perl/v5.8.4
Keep-Alive: timeout=15, max=88
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
+++CLOSE 30655+++
Some file size is wrong!
Aloha, Uli
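The capture shows the mechanism: a resuming client asks for everything from its local file size onward (`Range: bytes=14736000-`), and an HTTP server answers 416 when that start offset is at or past the end of its own copy, i.e. the client already holds at least as many bytes as the server has. The sketch below models that exchange plus one possible recovery policy; `next_step` and its return values are hypothetical, not BOINC's actual logic, and the server size in the usage line is invented.

```python
def range_header(local_size: int) -> str:
    """Range header a resuming client sends: everything from its first missing byte."""
    return f"bytes={local_size}-"


def server_status(local_size: int, server_size: int) -> int:
    """Status a server returns for the open-ended range 'bytes=<local_size>-'."""
    if local_size >= server_size:
        return 416  # Requested Range Not Satisfiable: no byte exists at that offset
    return 206      # Partial Content: the remaining bytes are sent


def next_step(status: int, local_size: int, expected_size: int) -> str:
    """Hypothetical client policy; expected_size is the size the workunit promises."""
    if status != 416:
        return "append"               # write the 206 body after the partial file
    if local_size == expected_size:
        return "verify"               # already complete: check the checksum
    if local_size > expected_size:
        return "delete_and_restart"   # local copy too long: it is corrupt
    return "report_server_error"      # server copy is shorter than promised


# Hypothetical server file of exactly 14736000 bytes.
status = server_status(14736000, 14736000)
print(range_header(14736000), status)  # bytes=14736000- 416
```

Distinguishing the three 416 cases matters because only one of them (a local copy that is too long) is fixed by deleting and re-downloading; retrying blindly, as in the logs here, fixes none of them.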
I had the same problem just
I had the same problem just now and I had to reset the project on that PC.
The reason: it had two download tasks running on exactly the same file (h1_0400.0).
One finished successfully with the expected file size, while the other downloader "wondered where those bytes all came from", reported a file size error, and retried every few seconds.
BOINC 4.19, Dual CPU P3s
After the reset it downloaded everything successfully, but it still shows "download failed". Nothing is missing, but I guess I cannot allow BOINC to have two files with the same filename ;-)
Something must be wrong on the server/scheduler side or in the WU XML config.
After a reset I got a
After a reset I got an H1_501.0.
Same problem at first, but then, after a successful(!) transfer of H1_501.0, BOINC got a request to delete H1_501.0 while it was still downloading H1_501.0 on the other download thread.
Of course the client didn't like that much either: now there's a checksum error, two tasks are crunching, and a few are still in the "downloading" state.
Very weird!
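Both reports describe a race: two transfers for the same file, and a delete request arriving while one of them is still writing. A client can guard against both with one table of active transfers keyed by file name. This is a hypothetical sketch (the class and method names are invented here), not how the BOINC 4.x client is actually structured.

```python
import threading


class TransferRegistry:
    """Tracks which files currently have a download in progress (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()  # downloads may run on several threads
        self._active = set()

    def start(self, name: str) -> bool:
        """Register a download; refuse if one for the same file is already running."""
        with self._lock:
            if name in self._active:
                return False
            self._active.add(name)
            return True

    def finish(self, name: str) -> None:
        """Mark a download as ended, whether it succeeded or failed."""
        with self._lock:
            self._active.discard(name)

    def may_delete(self, name: str) -> bool:
        """Deleting is only safe when no download is writing to the file."""
        with self._lock:
            return name not in self._active


reg = TransferRegistry()
print(reg.start("h1_0400.0"))       # True: first download starts
print(reg.start("h1_0400.0"))       # False: duplicate download refused
print(reg.may_delete("h1_0400.0"))  # False: still downloading
reg.finish("h1_0400.0")
print(reg.may_delete("h1_0400.0"))  # True
```

A delete request that gets `False` would be postponed until `finish` is called rather than applied to a file that is still being written.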
The story continues : After
The story continues: after I manually contacted the scheduler to report the error, it tried to delete H1_501.0.
BOINC was very sad and told me it couldn't delete H1_501.0... but the work units are happy now and no longer try to download H1_501.0 (as it's still there, of course).
___________
I guess it's the WU configuration that is wrong; the scheduler request I saved after the first problem contained this:
H1_0400.0
h1_0400.0
i.e. the same file listed twice
I would rate this as a critical problem.
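The two entries in the saved request differ only in capitalization (H1_0400.0 vs. h1_0400.0). On a case-insensitive file system, such as the default one on Windows, both names refer to the same file on disk, which would fit the "two downloaders on one file" symptom. That this is the actual cause is an assumption; the sketch below only shows how such collisions could be detected in a list of file names (`case_collisions` is a made-up helper).

```python
from collections import defaultdict


def case_collisions(names):
    """Group file names that are identical when letter case is ignored.

    On a case-insensitive file system, every group with more than one
    entry refers to a single file on disk."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(case_collisions(["H1_0400.0", "h1_0400.0", "h1_0401.0"]))
# [['H1_0400.0', 'h1_0400.0']]
```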
RE: I would rate this as
Same here. I just caught one machine in an endless loop; here is an excerpt:
6/27/2005 8:16:21 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:23 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:24 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:25 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:27 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:27 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:28 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
6/27/2005 8:16:30 PM|Einstein@Home|Couldn't delete file projects/einstein.phys.uwm.edu/h1_0673.0
6/27/2005 8:16:30 PM|Einstein@Home|Started download of h1_0673.0
6/27/2005 8:16:31 PM|Einstein@Home|Temporarily failed download of h1_0673.0: 416
It went on until I aborted the transfer, which appears to have cleared the issue. This machine is running the Einstein beta and BOINC 4.45 for Windows.
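The log shows a new attempt every one to three seconds with no end in sight. A common way to keep a persistent error from spinning like this is exponential backoff with a cap and a retry limit. A minimal sketch; the function names and the values (60 s base, doubling, 4-hour cap) are arbitrary examples, not BOINC's settings.

```python
def backoff_delays(base=60.0, factor=2.0, cap=4 * 3600.0, max_retries=8):
    """Seconds to wait before each retry: grows geometrically, never exceeds `cap`,
    and stops after `max_retries` so a permanent error is reported, not looped."""
    delays = []
    delay = base
    for _ in range(max_retries):
        delays.append(min(delay, cap))
        delay *= factor
    return delays


def download_with_retries(attempt, sleep, delays):
    """Call attempt() until it returns True, sleeping per `delays` in between.

    Returns True on success, False once every retry has been used up."""
    if attempt():
        return True
    for delay in delays:
        sleep(delay)
        if attempt():
            return True
    return False


waits = []  # record the waits instead of really sleeping
ok = download_with_retries(lambda: False, waits.append, backoff_delays(max_retries=3))
print(ok, waits)  # False [60.0, 120.0, 240.0]
```

Passing `sleep` in as a parameter keeps the loop testable; a real client would pass `time.sleep` or, more likely, schedule the next attempt instead of blocking.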
RE: RE: I would rate
Shut down BOINC and restart it. Usually "exit" in boincmgr will do it, but the boinc process must actually end. If it doesn't, use the Task Manager to kill it.
There's a bug in BOINC where a temporarily failed download keeps the file open, which can cause the problems you see. When BOINC ends, Windows closes all of its open files.
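The general cure for this kind of bug is to close the partial file on every exit path, including a failed transfer. On Windows, a file that is still held open (without delete sharing) normally cannot be deleted, which is why ending the process helps. A minimal Python sketch using a `with` block; `write_download` and `failing_transfer` are hypothetical stand-ins for the client's transfer code.

```python
import os
import tempfile


def write_download(path, chunks):
    """Append downloaded chunks to the partial file at `path`.

    The `with` block closes the file on every exit path, including when
    `chunks` raises because the transfer failed, so no handle is leaked."""
    with open(path, "ab") as f:
        for chunk in chunks:
            f.write(chunk)


def failing_transfer():
    """Stand-in for a network transfer that breaks after the first chunk."""
    yield b"first bytes"
    raise ConnectionError("transfer failed")


partial = os.path.join(tempfile.gettempdir(), "h1_demo.partial")
try:
    write_download(partial, failing_transfer())
except ConnectionError:
    pass  # a real client would schedule a retry here
os.remove(partial)  # on Windows this would fail if the handle were still open
```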
RE: new h1 units not
All I did after the bad download was abort it, and it seems to be running OK now.
kenlo
RE: RE: new h1 units not
That's good. But run Process Explorer, look at the handles for the BOINC process, and see if there are any for h1_0326.5, or for any other h1_* file.
It's fine for the Einstein application to use these, but BOINC itself shouldn't hold on to the file. It'll cause problems later, when BOINC has to delete it, which shouldn't happen for a few weeks yet, when the scheduler decides it's time to work on a different set of data.
EDIT:
The "download looping" problem is in BOINC 4.43 and fixed in 4.45. I don't remember whether 4.45 also fixes the "open handle" one, though.
EDIT**2:
From Robert's post, I'd say the "open handle" bug isn't fixed in 4.45. That's what happens when downloads fail like that: if BOINC leaves the file open, it can't delete the file to download it again. That's a problem for Einstein@Home, where one file is downloaded for all the WUs to use. In that case, it's probably a good idea to restart BOINC.
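Because one data file such as h1_0326.5 serves many workunits, the client should only delete it once no workunit needs it any more and it holds no handle on it itself. A hypothetical sketch of that bookkeeping; `SharedFile`, its fields, and the workunit names are invented for illustration and do not reflect BOINC's real data structures.

```python
from dataclasses import dataclass, field


@dataclass
class SharedFile:
    """One data file used by several workunits (hypothetical model)."""
    name: str
    users: set = field(default_factory=set)  # workunits that still need the file
    open_handles: int = 0                    # handles held by the client itself

    def attach(self, wu: str) -> None:
        self.users.add(wu)

    def release(self, wu: str) -> None:
        self.users.discard(wu)

    def can_delete(self) -> bool:
        """Deletable only when no workunit needs it and no handle is open."""
        return not self.users and self.open_handles == 0


data = SharedFile("h1_0326.5")
data.attach("wu_a")          # hypothetical workunit names
data.attach("wu_b")
data.release("wu_a")
print(data.can_delete())     # False: wu_b still needs the file
data.release("wu_b")
data.open_handles = 1        # a handle leaked by a failed download
print(data.can_delete())     # False: the leaked handle blocks deletion
data.open_handles = 0
print(data.can_delete())     # True
```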