Uploads seem to be stuck ATM

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752648967
RAC: 1479594

Two questions. 1) Are your

Two questions.

1) Are your uploads going through?

2) Did you look to see if the 'timeout' message appeared 60 seconds later?

Jim1348
Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Richard Haselgrove wrote:Two

Richard Haselgrove wrote:

Two questions.

1) Are your uploads going through?

2) Did you look to see if the 'timeout' message appeared 60 seconds later?

I guess I left out the good part.  Yes, it does timeout.  But no, they are not going through. 

I have about a hundred stuck in upload.

241    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: HTTP/1.1 504 Gateway Time-out
    
242    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: Server: nginx
    
243    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: Date: Thu, 17 Jan 2019 23:37:08 GMT
    
244    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: Content-Type: text/html
    
245    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: Content-Length: 433
    
246    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: Connection: keep-alive
    
247    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server: ETag: "593ad040-1b1"
    
248    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Received header from server:
    
249    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: HTTP/1.1 504 Gateway Time-out
    
250    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: Server: nginx
    
251    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: Date: Thu, 17 Jan 2019 23:37:08 GMT
    
252    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: Content-Type: text/html
    
253    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: Content-Length: 433
    
254    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: Connection: keep-alive
    
255    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server: ETag: "593ad040-1b1"
    
256    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Received header from server:
    
257    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#49] Info:  Connection #38 to host einstein4.aei.uni-hannover.de left intact    
258    Einstein@Home    1/17/2019 6:36:35 PM    [http] [ID#48] Info:  Connection #37 to host einstein4.aei.uni-hannover.de left intact    
259    Einstein@Home    1/17/2019 6:36:36 PM    Temporarily failed upload of LATeah2009L_1052.0_0_0.0_2414836_0_0: transient HTTP error    
260    Einstein@Home    1/17/2019 6:36:36 PM    Backing off 00:03:07 on upload of LATeah2009L_1052.0_0_0.0_2414836_0_0    
261    Einstein@Home    1/17/2019 6:36:36 PM    Temporarily failed upload of LATeah2009L_1052.0_0_0.0_2414836_0_1: transient HTTP error    
262    Einstein@Home    1/17/2019 6:36:36 PM    Backing off 00:02:57 on upload of LATeah2009L_1052.0_0_0.0_2414836_0_1    
263    Einstein@Home    1/17/2019 6:46:43 PM    Computation for task LATeah2009L_1052.0_0_0.0_2357542_0 finished    

archae86
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023244931
RAC: 1830275

Richard Haselgrove

Richard Haselgrove wrote:
Please use the debug tools to tell the staff exactly what they need to kick Tongue Out

Richard, I turned on some of the event log debug options.  Then, with 15 tasks pending waiting out their penalty time in the transfer queue, I selected one and specified a retry.   Superficially, this initially seemed to work, including a prompt "Info:  We are completely uploaded and fine" message.  However the transfer tab did not show that upload as finishing, and 60 seconds after the initial request the message log show a gateway timeout.

I normally would not post so long a log fragment, but I truly don't know what might be useful here, so here is the complete log for two minutes, starting with my retry request, and ending with the seven-minute backoff.

Stoll9

433 Einstein@Home 1/17/2019 5:34:13 PM [fxd] starting upload, upload_offset 0
434 Einstein@Home 1/17/2019 5:34:13 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
435 Einstein@Home 1/17/2019 5:34:13 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
436 Einstein@Home 1/17/2019 5:34:13 PM Started upload of LATeah2008L_1044.0_0_0.0_1712454_0_0
437 Einstein@Home 1/17/2019 5:34:13 PM [file_xfer] URL: http://einstein4.aei.uni-hannover.de/EinsteinAtHome/cgi-bin/file_upload_handler_medium
438 Einstein@Home 1/17/2019 5:34:13 PM [fxd] starting upload, upload_offset 0
439 Einstein@Home 1/17/2019 5:34:13 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
440 Einstein@Home 1/17/2019 5:34:13 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
441 Einstein@Home 1/17/2019 5:34:13 PM Started upload of LATeah2008L_1036.0_0_0.0_7187214_1_0
442 Einstein@Home 1/17/2019 5:34:13 PM [file_xfer] URL: http://einstein4.aei.uni-hannover.de/EinsteinAtHome/cgi-bin/file_upload_handler_medium
443 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Connection 11 seems to be dead!
444 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Closing connection 11
445 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Connection 12 seems to be dead!
446 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Closing connection 12
447 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Info: Found bundle for host einstein4.aei.uni-hannover.de: 0x5209160 [serially]
448 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Trying 130.75.116.34...
449 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Info: Hostname was found in DNS cache
450 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Info: Trying 130.75.116.34...
451 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Info: Connected to einstein4.aei.uni-hannover.de (130.75.116.34) port 80 (#14)
452 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: POST /EinsteinAtHome/cgi-bin/file_upload_handler_medium HTTP/1.1

453 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Host: einstein4.aei.uni-hannover.de

454 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.0)

455 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Accept: */*

456 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Accept-Encoding: deflate, gzip

457 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Content-Type: application/x-www-form-urlencoded

458 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Accept-Language: en_US

459 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server: Content-Length: 830

460 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Sent header to server:

461 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#28] Info: We are completely uploaded and fine
462 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: Connected to einstein4.aei.uni-hannover.de (130.75.116.34) port 80 (#13)
463 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: POST /EinsteinAtHome/cgi-bin/file_upload_handler_medium HTTP/1.1

464 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Host: einstein4.aei.uni-hannover.de

465 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.0)

466 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Accept: */*

467 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Accept-Encoding: deflate, gzip

468 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Content-Type: application/x-www-form-urlencoded

469 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Accept-Language: en_US

470 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server: Content-Length: 919

471 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Sent header to server:

472 Einstein@Home 1/17/2019 5:34:13 PM [http] [ID#27] Info: We are completely uploaded and fine
473 1/17/2019 5:34:16 PM NOTICES::write: seqno 5, refresh false, 4 notices
474 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: HTTP/1.1 504 Gateway Time-out

475 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: Server: nginx

476 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: Date: Fri, 18 Jan 2019 00:35:15 GMT

477 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: Content-Type: text/html

478 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: Content-Length: 433

479 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: Connection: keep-alive

480 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: ETag: "593ad040-1b1"

481 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server:

482 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <!DOCTYPE html>
483 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <html>
484 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <head>
485 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <meta charset="UTF-8">
486 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <title>Technical Problems</title>
487 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <style>
488 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: body {
489 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: width: 35em;
490 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: margin: 0 auto;
491 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: font-family: Tahoma, Verdana, Arial, sans-serif;
492 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: }
493 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: </style>
494 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: </head>
495 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <body>
496 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <h1>This server is experiencing technical problems</h1>
497 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: <p>If you see this page, something is broken on the server. Administrators are informed and will fix the problem soon.</p>
498 Einstein@Home 1/17/2019 5:35:13 PM
499 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Received header from server: </html>
500 1/17/2019 5:35:13 PM [http_xfer] [ID#28] HTTP: wrote 433 bytes
501 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#28] Info: Connection #14 to host einstein4.aei.uni-hannover.de left intact
502 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: HTTP/1.1 504 Gateway Time-out

503 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: Server: nginx

504 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: Date: Fri, 18 Jan 2019 00:35:15 GMT

505 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: Content-Type: text/html

506 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: Content-Length: 433

507 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: Connection: keep-alive

508 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server: ETag: "593ad040-1b1"

509 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Received header from server:

510 1/17/2019 5:35:13 PM [http_xfer] [ID#27] HTTP: wrote 433 bytes
511 Einstein@Home 1/17/2019 5:35:13 PM [http] [ID#27] Info: Connection #13 to host einstein4.aei.uni-hannover.de left intact
512 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] http op done; retval -184 (transient HTTP error)
513 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] http op done; retval -184 (transient HTTP error)
514 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] file transfer status -184 (transient HTTP error)
515 Einstein@Home 1/17/2019 5:35:14 PM Temporarily failed upload of LATeah2008L_1044.0_0_0.0_1712454_0_0: transient HTTP error
516 Einstein@Home 1/17/2019 5:35:14 PM Backing off 00:21:05 on upload of LATeah2008L_1044.0_0_0.0_1712454_0_0
517 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] file transfer status -184 (transient HTTP error)
518 Einstein@Home 1/17/2019 5:35:14 PM Temporarily failed upload of LATeah2008L_1036.0_0_0.0_7187214_1_0: transient HTTP error
519 Einstein@Home 1/17/2019 5:35:14 PM Backing off 00:33:49 on upload of LATeah2008L_1036.0_0_0.0_7187214_1_0
520 Einstein@Home 1/17/2019 5:35:14 PM [fxd] starting upload, upload_offset 0
521 Einstein@Home 1/17/2019 5:35:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
522 Einstein@Home 1/17/2019 5:35:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
523 Einstein@Home 1/17/2019 5:35:14 PM Started upload of LATeah2008L_1044.0_0_0.0_501853_1_0
524 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] URL: http://einstein4.aei.uni-hannover.de/EinsteinAtHome/cgi-bin/file_upload_handler_medium
525 Einstein@Home 1/17/2019 5:35:14 PM [fxd] starting upload, upload_offset 0
526 Einstein@Home 1/17/2019 5:35:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
527 Einstein@Home 1/17/2019 5:35:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set
528 Einstein@Home 1/17/2019 5:35:14 PM Started upload of LATeah2008L_1044.0_0_0.0_501853_1_1
529 Einstein@Home 1/17/2019 5:35:14 PM [file_xfer] URL: http://einstein4.aei.uni-hannover.de/EinsteinAtHome/cgi-bin/file_upload_handler_medium
530 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Info: Found bundle for host einstein4.aei.uni-hannover.de: 0x5209160 [can pipeline]
531 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Info: Re-using existing connection! (#13) with host einstein4.aei.uni-hannover.de
532 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Info: Connected to einstein4.aei.uni-hannover.de (130.75.116.34) port 80 (#13)
533 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: POST /EinsteinAtHome/cgi-bin/file_upload_handler_medium HTTP/1.1

534 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Host: einstein4.aei.uni-hannover.de

535 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.0)

536 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Accept: */*

537 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Accept-Encoding: deflate, gzip

538 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Content-Type: application/x-www-form-urlencoded

539 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Accept-Language: en_US

540 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server: Content-Length: 906

541 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Sent header to server:

542 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Info: Found bundle for host einstein4.aei.uni-hannover.de: 0x5209160 [can pipeline]
543 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Info: Re-using existing connection! (#14) with host einstein4.aei.uni-hannover.de
544 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Info: Connected to einstein4.aei.uni-hannover.de (130.75.116.34) port 80 (#14)
545 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: POST /EinsteinAtHome/cgi-bin/file_upload_handler_medium HTTP/1.1

546 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Host: einstein4.aei.uni-hannover.de

547 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.14.0)

548 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Accept: */*

549 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Accept-Encoding: deflate, gzip

550 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Content-Type: application/x-www-form-urlencoded

551 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Accept-Language: en_US

552 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server: Content-Length: 892

553 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Sent header to server:

554 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#29] Info: We are completely uploaded and fine
555 Einstein@Home 1/17/2019 5:35:14 PM [http] [ID#30] Info: We are completely uploaded and fine
556 1/17/2019 5:35:16 PM NOTICES::write: seqno 5, refresh false, 4 notices
557 Einstein@Home 1/17/2019 5:35:18 PM Computation for task LATeah2008L_1044.0_0_0.0_2950641_1 finished
558 Einstein@Home 1/17/2019 5:35:18 PM Starting task LATeah2008L_1044.0_0_0.0_2949580_0
559 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: HTTP/1.1 504 Gateway Time-out

560 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: Server: nginx

561 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: Date: Fri, 18 Jan 2019 00:36:16 GMT

562 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: Content-Type: text/html

563 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: Content-Length: 433

564 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: Connection: keep-alive

565 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server: ETag: "593ad040-1b1"

566 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Received header from server:

567 1/17/2019 5:36:14 PM [http_xfer] [ID#30] HTTP: wrote 433 bytes
568 Einstein@Home 1/17/2019 5:36:14 PM [http] [ID#30] Info: Connection #14 to host einstein4.aei.uni-hannover.de left intact
569 Einstein@Home 1/17/2019 5:36:14 PM [file_xfer] http op done; retval -184 (transient HTTP error)
570 Einstein@Home 1/17/2019 5:36:14 PM [file_xfer] file transfer status -184 (transient HTTP error)
571 Einstein@Home 1/17/2019 5:36:14 PM Temporarily failed upload of LATeah2008L_1044.0_0_0.0_501853_1_1: transient HTTP error
572 Einstein@Home 1/17/2019 5:36:14 PM [file_xfer] project-wide xfer delay for 677.012848 sec
573 Einstein@Home 1/17/2019 5:36:14 PM Backing off 00:07:54 on upload of LATeah2008L_1044.0_0_0.0_501853_1_1
574 Einstein@Home 1/17/2019 5:36:14 PM [fxd] starting upload, upload_offset 0
575 Einstein@Home 1/17/2019 5:36:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle 'C:\Program Files\BOINC\ca-bundle.crt'
576 Einstein@Home 1/17/2019 5:36:14 PM [http] HTTP_OP::libcurl_exec(): ca-bundle set

 

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109383596146
RAC: 35952318

Richard Haselgrove

Richard Haselgrove wrote:

Another case of

....
"Upload handler medium" can't talk to the next server in the chain. We've had several of those recently, and Bernd put an automatic restart script into place. But I don't know how long we'll have to wait until it next runs.

Maybe we need to have a new 'restart the restart' script to give the restart script a kick in the pants for being overly obtuse about the matter :-).

Here is when one of my scripts first detected the problem (the 2nd item in the logs which happened on loop 8) :-


Loop Item Time     Hostname  Octet Uptime KDE   RPC Status TPI  Status
==== ==== ======== ========  ===== ====== ===   === ====== ===  ======================
  1.   1.  20:17:14 q6600-06 ( .50)  26.1d  v5 1181s     OK  37  Info: Low ticks but then OK 1st=1 2nd=37.
  8.   2.  03:15:04 styx     (  .9)  13.5d  v5 6256s   HIGH  10  Ticks OK

So at 3:15AM local time (5:15PM UTC yesterday) a host reported that it had been 6256s since it had last been able to make successful contact with the project.  This was immediately followed by every other host in the fleet reporting much the same claims - so much so that by the time I arrived on site at 6:00AM, there were many pages of such messages in the logs.  Hosts usually make contact about every hour or so.  I'm guessing the uploads probably failed about 30mins earlier, probably before 5PM UTC.

It is now around 9 hours since the problem started and uploads are still stuck.  My hosts have far too many stuck uploads to be able to request new work so I will run out shortly.  When the problem is eventually cleared, I'll be left with multi-hour backoffs so I'll need to run another script to override that.  Not a big deal for me but just a waste for many volunteers who use a set and forget strategy.   When the uploads clear and caches can be replenished, that will need to be carefully managed (particularly for anyone with a similar 'unusual' situation like mine) because of the excessive load it will place on a shared internet connection.  Once again, I have scripts to deal with this - and unfortunately, plenty of recent experience on how best to manage this - so I'm not really worried for myself.

The thing that is really starting to annoy me is why can't this be solved more efficiently?  If it's just a matter of killing and restarting some stuck process, surely it doesn't need to take this long?  And I don't think it's fair to expect a volunteer to have to know about and set the appropriate debug options in order to produce an extensive log of low level (and perhaps indecipherable to many) stuff to 'assist' in the diagnosis of this problem.  Particularly when such a log won't make any difference to a supposedly "automatic restart script" that is handling the problem anyway!!  It might be a different matter if such a report could be acted upon immediately and thereby truncate the overly long hiatus between problem commencement and ultimate rectification.

It always feels better to have a good whinge :-).

 

Cheers,
Gary.

mountkidd
mountkidd
Joined: 14 Jun 12
Posts: 175
Credit: 10891710878
RAC: 5531603

Quote: The thing that is

Quote:
The thing that is really starting to annoy me is why can't this be solved more efficiently?  If it's just a matter of killing and restarting some stuck process, surely it doesn't need to take this long? 

It's been about 11 hrs now and no one in control seems to be doing anything.  Bernd's script hasn't solved the problem!  There needs to be a more rigorous solution...

 

Saenger
Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Mine just got uploaded and

Mine just got uploaded and reported, so everything seems fine now,

THX for the fast reaction.

Grüße vom Sänger

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109383596146
RAC: 35952318

With uploading now able to

With uploading now able to complete successfully, you may wish to check hosts to see if any are still in an extended backoff.  Whilst this should clear as new work is completed, you may wish to force it by using the 'retry now' function on the transfers tab of BOINC Manager.

Cheers,
Gary.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752648967
RAC: 1479594

Good to see that we're back

Good to see that we're back up and running.

Looking back through my logs, the first sign of trouble was 17-Jan-2019 15:47:35 - at this time of year, my logs are in UTC, which makes things easier.

When things are running smoothly, I complete a task every 10 minutes or so - and new-ish versions of BOINC try every new upload once even when there are backoffs, so I'd have been back in business immediately if the server had come back online while I still had work cached.

But instead, three machine logs show

18/01/2019 01:20:54 | Einstein@Home | Temporarily failed upload of p2030.20170410.G39.44-01.25.N.b4s0g0.00000_291_0_0: transient HTTP error
18/01/2019 05:11:32 | Einstein@Home | Started upload of p2030.20170410.G39.44-01.25.N.b4s0g0.00000_291_0_0

17/01/2019 23:12:09 | Einstein@Home | Temporarily failed upload of p2030.20170410.G39.44-01.25.N.b4s0g0.00000_2683_0_0: transient HTTP error
18/01/2019 05:07:25 | Einstein@Home | Started upload of p2030.20170410.G39.44-01.25.N.b3s0g0.00000_2520_1_0

18/01/2019 02:05:31 | Einstein@Home | Temporarily failed upload of p2030.20170410.G39.44-01.25.N.b4s0g0.00000_1307_0_0: transient HTTP error
18/01/2019 05:40:58 | Einstein@Home | Started upload of p2030.20170410.G39.44-01.25.N.b4s0g0.00000_1298_1_0

- so I can't tell when it came back, except to say between 02:05 and 05:07. But that's a long time for the outage to last, when it's a known problem with an automatic scripted recovery in place. I think I'm going to suggest to Bernd that he sets the script to run more often.

Viking69
Viking69
Joined: 2 Mar 05
Posts: 6
Credit: 8306440
RAC: 77

My reporting seems to not be

My reporting seems to not be working :

 

2/2/2019 11:17:27 AM | Einstein@Home | Scheduler request completed
2/2/2019 11:18:09 AM | | Project communication failed: attempting access to reference site
2/2/2019 11:18:11 AM | | Internet access OK - project servers may be temporarily down.

 

 


Free your MIND, and the rest will follow!

Millenium
Millenium
Joined: 8 Oct 14
Posts: 21
Credit: 33102476
RAC: 0

Uploads stuck here too

Uploads stuck here too

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.