I've got one (*) finished O1 all-sky task that doesn't upload any of its three result files and also doesn't seem to show up in my E@h task list. Other O1 tasks from this client have uploaded since then.
Maybe it's just a coincidence, but at roughly the time I got this work unit, another one (545985186, h1_0051.70_O1C01Cl1In1__O1AS20-100T_51.8Hz_56) was downloaded without a deadline. It ran at high priority right away. I aborted that one.
(*)
318 Einstein@Home 03-03-2016 11:02 [error] Error reported by file upload server: can't open file
319 Einstein@Home 03-03-2016 11:02 Temporarily failed upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_0: transient upload error
320 Einstein@Home 03-03-2016 11:02 Backing off 03:30:44 on upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_0
Name h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1
Application Gravitational Wave search O1 all-sky tuning 1.04 (AVX)
Workunit name h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374
State Uploading
Received 29-02-2016 12:13
Report deadline 07-03-2016 12:13
Estimated app speed 5,91 GFLOPs/sec
Estimated task size 144.000 GFLOPs
CPU time at last checkpoint 00:00:00
CPU time 12:26:49
Elapsed time 12:26:43
Estimated time remaining 00:00:00
Fraction done 100%
Virtual memory size 0,00 MB
Working set size 0,00 MB
Thanks for the report. I'll see how I can fix that.
This is fixed now. Uploads should resume normally.
Hi Christian, the uploads are still stuck. Error message changed, though:
113 Einstein@Home 03-03-2016 23:27 Started upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_0
114 Einstein@Home 03-03-2016 23:27 Started upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_1
115 Einstein@Home 03-03-2016 23:28 [error] Error reported by file upload server: [h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_0] locked by file_upload_handler PID=4219652
116 Einstein@Home 03-03-2016 23:28 [error] Error reported by file upload server: [h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_1] locked by file_upload_handler PID=4219652
I also have a host with an upload problem.
I'll try uploading again tomorrow evening.
Please try again. This should be fixed now.
The _0 and _1 result files uploaded; _2 uploads to 100% but stays there with yet another error message. The file appears to be 655.99 KB in size.
295 Einstein@Home 04-03-2016 14:32 Started upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_2
296 Einstein@Home 04-03-2016 14:32 [error] Error reported by file upload server: length of file h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_2 0 bytes
297 Einstein@Home 04-03-2016 14:32 Temporarily failed upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_2: transient upload error
298 Einstein@Home 04-03-2016 14:32 Backing off 03:50:49 on upload of h1_0051.85_O1C01Cl1In1__O1AS20-100T_51.95Hz_374_1_2
That's one of the strange cases I wanted to investigate. It turns out the result this host is trying to upload was already reported and has been validated. That's why any subsequent upload is prohibited. I would really like to know what happened on the client to make the file size increase, so that the client thought it needed to upload the file again (or the part that had grown) although the result had already been reported.
Can you send me a PM about the host setup? I'm most interested in the client_state.xml but would also like to have the result files.
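To make the explanation above concrete, here is a hypothetical model of a resumable upload decision. It is not BOINC's actual client or file_upload_handler code; the function and parameter names are invented for illustration. The assumption is that the client compares its local file size with what the server already holds and sends only the remainder, while the server refuses any upload for a result that has already been validated.

```python
def plan_upload(local_size: int, server_size: int, result_validated: bool) -> str:
    """Hypothetical model of a resumable upload decision (not BOINC code).

    local_size:       bytes in the client's copy of the result file
    server_size:      bytes the server already holds for that file
    result_validated: whether the server has already validated the result
    """
    if result_validated:
        # Once the result is validated, the server rejects any further upload.
        return "rejected"
    if local_size <= server_size:
        # The server already has the whole file, so there is nothing to send.
        return "done"
    # Resume: send only the bytes beyond what the server already holds.
    return f"send bytes {server_size}..{local_size - 1}"


print(plan_upload(1000, 600, False))   # send bytes 600..999
print(plan_upload(1000, 1000, False))  # done
print(plan_upload(1200, 1000, True))   # rejected
```

In this model, a local file that grows after its result was validated produces exactly the situation described: the client sees extra bytes to send, but the server will not accept them.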
Hi Christian,
I'm about to head out for work, so I'll send you the files and logs later.
One thing: this host is a virtual machine (Linux guest on a Windows host). That might seem a convenient explanation, but I did not restore a snapshot and upload/report a task twice; the machine has only been started and shut down normally for weeks.
Maybe this is a bad batch? The task without a deadline that I mentioned earlier had an almost "identical" timestamp (exactly 2 hours apart, same minute; I'm not sure which time zone applies in each case).
Hi, I think I may have a host here with a similar issue.
I cannot see the task I am trying to upload in its task list, which is odd.
I suspect the task may have expired and passed its deadline, although the task properties say 6 March. As the host is not often powered on, its results are arriving later than usual.
client_state.xml has:
[pre]
h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0
920919.000000
6000000.000000
cf5b61bbb150e16179085ba82805c079
1
http://einstein4.aei.uni-hannover.de/EinsteinAtHome/cgi-bin/file_upload_handler
7
1456729593.816887
1457138536.049261
20.971027
0.000000
1
[/pre]
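The two ten-digit values in the excerpt above look like Unix timestamps (seconds since 1970-01-01 UTC). The XML tag names that would identify them were lost in the paste, so treat that reading as an assumption. A short sketch to decode them:

```python
from datetime import datetime, timezone


def to_utc(seconds: float) -> str:
    """Render a Unix timestamp (seconds since 1970-01-01 UTC) as a UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Assumption: these two values from the excerpt are Unix timestamps.
for raw in (1456729593.816887, 1457138536.049261):
    print(to_utc(raw))
```

This prints 2016-02-29 07:06:33 UTC and 2016-03-05 00:42:16 UTC. The first value lines up, within a second, with the first upload attempt at 29-Feb-2016 07:06:34 in the log excerpt posted further down in the same message.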
Event log with a few log flags enabled:
[pre]
Fri 04 Mar 2016 22:44:01 GMT | | [http_xfer] [ID#6] HTTP: wrote 183 bytes
Fri 04 Mar 2016 22:44:01 GMT | Einstein@Home | [http] [ID#6] Info: Connection #9 to host einstein4.aei.uni-hannover.de left intact
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: HTTP/1.1 200 OK
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Info: Server nginx/1.2.1 is not blacklisted
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: Server: nginx/1.2.1
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: Date: Fri, 04 Mar 2016 22:43:50 GMT
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: Content-Type: text/plain
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: Transfer-Encoding: chunked
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server: Connection: keep-alive
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Received header from server:
Fri 04 Mar 2016 22:44:02 GMT | | [http_xfer] [ID#7] HTTP: wrote 183 bytes
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [http] [ID#7] Info: Connection #10 to host einstein4.aei.uni-hannover.de left intact
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] http op done; retval 0 (Success)
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [error] Error reported by file upload server: length of file h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0 0 bytes
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] parsing upload response: 1 length of file h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0 0 bytes
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] parsing status: -127
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] http op done; retval 0 (Success)
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [error] Error reported by file upload server: length of file h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1 0 bytes
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] parsing upload response: 1 length of file h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1 0 bytes
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] parsing status: -127
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] file transfer status -127 (transient upload error)
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Temporarily failed upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0: transient upload error
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] project-wide xfer delay for 1078.645987 sec
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Backing off 03:28:53 on upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] file transfer status -127 (transient upload error)
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Temporarily failed upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1: transient upload error
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | [file_xfer] project-wide xfer delay for 2422.167126 sec
Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Backing off 02:12:37 on upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1[/pre]
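Each "Backing off HH:MM:SS" line is a delay counted from the failed attempt, so the next automatic retry can be worked out with simple date arithmetic. A minimal sketch; the function name is made up for illustration, and it only uses the timestamps and delays shown in the log above (GMT):

```python
from datetime import datetime, timedelta


def next_retry(logged_at: str, backoff: str) -> datetime:
    """Add a 'Backing off HH:MM:SS' delay to an event-log timestamp."""
    start = datetime.strptime(logged_at, "%d %b %Y %H:%M:%S")
    hours, minutes, seconds = (int(part) for part in backoff.split(":"))
    return start + timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Values copied from the two "Backing off" lines above.
print(next_retry("04 Mar 2016 22:44:02", "03:28:53"))  # _2_0
print(next_retry("04 Mar 2016 22:44:02", "02:12:37"))  # _2_1
```

This gives 2016-03-05 02:12:55 for _2_0 and 2016-03-05 00:56:39 for _2_1, so neither file would be retried automatically before the early hours of 5 March.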
PM if you need any files.
Edit: stdoutdae.txt
[pre]29-Feb-2016 07:06:32 [Einstein@Home] Computation for task h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2 finished
29-Feb-2016 07:06:33 [Einstein@Home] Starting task h1_0051.65_O1C01Cl1In1__O1AS20-100T_51.75Hz_136_1
29-Feb-2016 07:06:34 [Einstein@Home] Started upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0
29-Feb-2016 07:06:34 [Einstein@Home] Started upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1
29-Feb-2016 07:06:36 [Einstein@Home] Error reported by file upload server: can't open file
29-Feb-2016 07:06:36 [Einstein@Home] Error reported by file upload server: can't open file
29-Feb-2016 07:06:36 [Einstein@Home] Temporarily failed upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0: transient upload error
29-Feb-2016 07:06:36 [Einstein@Home] Backing off 00:02:58 on upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0
29-Feb-2016 07:06:36 [Einstein@Home] Temporarily failed upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1: transient upload error
29-Feb-2016 07:06:36 [Einstein@Home] Backing off 00:02:35 on upload of h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1[/pre]
This is getting weirder and weirder. If you are seeing these transient upload errors too, please tell me which file it is happening with and the hostid of the affected host.
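For anyone collecting the requested details, a small helper can pull the affected file names out of an event log. This is a hypothetical sketch: it assumes the log lines look like the ones posted in this thread ("Temporarily failed upload of <file>: transient upload error"), and the hostid still has to be added by hand.

```python
import re

# Matches event-log lines such as:
#   "... Temporarily failed upload of <file>: transient upload error"
FAILED_UPLOAD = re.compile(r"Temporarily failed upload of (\S+): transient upload error")


def failed_upload_files(log_text: str) -> list[str]:
    """Return distinct file names with transient upload errors, in first-seen order."""
    seen: dict[str, None] = {}
    for match in FAILED_UPLOAD.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = (
    "Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Temporarily failed upload of "
    "h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_0: transient upload error\n"
    "Fri 04 Mar 2016 22:44:02 GMT | Einstein@Home | Temporarily failed upload of "
    "h1_0051.40_O1C01Cl1In1__O1AS20-100T_51.5Hz_472_2_1: transient upload error\n"
)
print(failed_upload_files(sample))
```

Deduplicating keeps the report short when the same file fails on every retry.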