Hello everyone,
Since yesterday (approx. 6 PM GMT+1), Einstein@Home has not been uploading finished GPU tasks. A snippet of the event log follows:
01-Jun-2020 09:36:35 [Einstein@Home] Backing off 03:35:57 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_439_2_0
01-Jun-2020 09:36:35 [Einstein@Home] Backing off 05:11:44 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_439_2_1
01-Jun-2020 09:36:37 [---] Internet access OK - project servers may be temporarily down.
01-Jun-2020 09:40:22 [---] Project communication failed: attempting access to reference site
01-Jun-2020 09:40:22 [Einstein@Home] Backing off 04:39:54 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_439_2_2
01-Jun-2020 09:40:22 [Einstein@Home] Backing off 05:22:56 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_433_2_0
01-Jun-2020 09:40:23 [---] Internet access OK - project servers may be temporarily down.
01-Jun-2020 09:40:25 [---] Project communication failed: attempting access to reference site
01-Jun-2020 09:40:25 [Einstein@Home] Backing off 04:54:46 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_433_2_1
01-Jun-2020 09:40:25 [Einstein@Home] Backing off 03:27:52 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_433_2_2
01-Jun-2020 09:40:26 [---] Internet access OK - project servers may be temporarily down.
There is no problem downloading work units for Einstein@Home. Work units for other projects (Rosetta@home and World Community Grid) upload and download fine.
This backoff used to affect only the application "Gravitational Wave search O2 Multi-Directional GPU 2.07 (GW-opencl-nvidia)", but now it also happens for "Gamma-ray pulsar binary search #1 on GPUs 1.20 (FGRPopencl1K-nvidia)".
Can someone confirm whether this is indeed a server problem?
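For anyone facing a long list of these messages, here is a minimal sketch of how the backoff lines could be pulled out of a copied event log and summarised. It assumes the line layout shown in the snippet above; the function names `parse_backoffs` and `summarize` are made up for illustration and are not part of BOINC.

```python
import re
from datetime import timedelta

# Matches event log lines of the form seen in the snippet above, e.g.
# 01-Jun-2020 09:36:35 [Einstein@Home] Backing off 03:35:57 on upload of <file>
BACKOFF_RE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] "
    r"Backing off (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) "
    r"on upload of (?P<file>\S+)$"
)


def parse_backoffs(log_text):
    """Return a list of (project, file, delay) for each upload backoff line."""
    results = []
    for line in log_text.splitlines():
        match = BACKOFF_RE.match(line.strip())
        if match:
            delay = timedelta(
                hours=int(match["h"]),
                minutes=int(match["m"]),
                seconds=int(match["s"]),
            )
            results.append((match["project"], match["file"], delay))
    return results


def summarize(log_text):
    """Count backoff lines and report the file with the longest delay."""
    backoffs = parse_backoffs(log_text)
    if not backoffs:
        return None
    longest = max(backoffs, key=lambda item: item[2])
    return {
        "count": len(backoffs),
        "longest_file": longest[1],
        "longest_delay": longest[2],
    }


# Three lines copied from the event log snippet above.
SAMPLE = """\
01-Jun-2020 09:36:35 [Einstein@Home] Backing off 03:35:57 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_439_2_0
01-Jun-2020 09:36:37 [---] Internet access OK - project servers may be temporarily down.
01-Jun-2020 09:40:22 [Einstein@Home] Backing off 05:22:56 on upload of h1_1597.95_O2C02Cl4In0__O2MDFV2h_VelaJr1_1598.90Hz_433_2_0
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Lines that are not backoff messages (such as the "Internet access OK" ones) are simply skipped, so a whole event log can be pasted in unchanged.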
Copyright © 2024 Einstein@Home. All rights reserved.
Hi and welcome to the forums!
My uploads work fine so I don't think it's a server issue.
You could try setting http_debug in cc_config.xml to get some more info in the event log.
The easiest way is to open BOINC Manager and then, under Options, select Event Log options.
http_debug is in the right column, about halfway down.
Post the log of one upload attempt and maybe I or someone else can help you.
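For those who prefer editing the file over using the Manager dialog, the sketch below prints a cc_config.xml that turns on the http_debug log flag. The helper name `build_cc_config` is hypothetical, and the script only prints the XML: it deliberately makes no assumption about where your BOINC data directory is, and it would replace rather than merge any settings you already have in an existing cc_config.xml.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(["http_debug"]))
```

The client has to re-read its config files (or be restarted) before the flag takes effect.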
Hi holmis,
thank you for your reply. Here's what I got. I hope it is enough for you (and/or others) to debug.
Alessio Purple
Hi
Sorry for the late reply but I was kinda hoping someone with more knowledge about network traffic might stop by.
In the quote above I've only included the lines I find interesting in one way or another.
The first line talks about a bundle. I suspect this has to do with CA certificate bundles, and I notice that the number at the end differs from my log, but that could be completely normal. There have been some server changes announced in this thread, and there have also been discussions about CA bundles in that thread. But you're running a newer BOINC version, so it shouldn't affect you.
The second line shows the same IP as in my log, so we are connecting to the same server, and I'm uploading just fine, although I'm using Windows and an older BOINC version.
The third line seems a bit strange. I suspect this might be the cause of the problems, as it seems that your host sends garbled data in the header to the server.
The fourth line shows you're using Linux and running BOINC version 7.16.6. On the BOINC download page this is labeled as a development version (beta version) and might be unstable. You might want to try the recommended version to see if that works better.
The following lines show that the server seems to accept the connection regardless of the possibly garbled data in line 3, but then abruptly terminates the connection by sending a RST packet to you.
To conclude, the only advice I can give you is to try the stable version of BOINC and to make sure your OS is fully updated.
I've just been clicking 'update' manually, as they just don't upload right away for me either
brit wrote: ... as they just ...
Are you talking about "uploading" or are you talking about "reporting"? They are two quite separate operations.
Uploading just gets the results safely stored on an upload server. Reporting is the much more 'costly' operation of opening the online database, grabbing all the available results from the upload server and processing the various records.
The results for completed tasks get uploaded immediately. There should be no need to use 'update' to make that happen. By design, reporting is deliberately delayed for a period (perhaps an hour or two) so that if a couple of tasks complete in that time period, they can be reported in the one set of database operations. This is useful to reduce database load. If you 'update' just to force immediate reporting, which really isn't necessary as BOINC will take care of it in due course, you are just thwarting that potential load saving strategy.
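To illustrate the difference, here is a toy model of that behaviour: every finished task is uploaded straight away, while reports are held back and sent as one batch. This is only a sketch under simplifying assumptions, not BOINC's actual scheduling logic; the function `simulate` and its `report_delay` parameter are invented for the example, and the real client has other triggers for contacting the scheduler.

```python
def simulate(completion_times, report_delay):
    """Toy model: immediate uploads, delayed and batched reports.

    completion_times: minutes at which tasks finish, in ascending order.
    report_delay: minutes to wait after the first unreported completion
        before reporting everything collected so far (hypothetical value).
    Returns (uploads, reports), where reports is a list of
    (report_time, [completion times reported in that batch]).
    """
    uploads = list(completion_times)  # one upload per task, at completion time
    reports = []
    batch = []
    deadline = None
    for t in completion_times:
        # A pending batch is reported once its delay has expired.
        if deadline is not None and t > deadline:
            reports.append((deadline, batch))
            batch, deadline = [], None
        # The first task of a new batch starts the delay timer.
        if deadline is None:
            deadline = t + report_delay
        batch.append(t)
    if batch:
        reports.append((deadline, batch))
    return uploads, reports


if __name__ == "__main__":
    uploads, reports = simulate([0, 20, 50, 200], report_delay=60)
    print(len(uploads), "uploads,", len(reports), "reports")
    for when, tasks in reports:
        print("report at minute", when, "covers tasks finished at", tasks)
```

With these made-up numbers, four tasks produce four uploads but only two database sessions, which is the load saving described above.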
If you have the same sort of uploading problem that the OP reported, can you please follow the earlier advice about setting the http_debug flag and posting a section of your event log to show the details. If your problem is not the same and is not covered by my previous explanation, you should start a separate problem report about it.
Cheers,
Gary.
I wonder if the 7.16 certificate issue is causing this. Supposedly it didn't affect Einstein, but who knows? It might be worthwhile upgrading to 7.16.7, which doesn't have the expired certificate.
BOINC blog
MarkJ wrote: I wonder if the ...
Unfortunately, currently 7.16.7 is only available for Windows, whereas I use Linux CentOS 7.
I installed BOINC using the official guidelines here, and the current version is 7.16.6. The package manager shows only this version as available, so I cannot roll back to previous releases.
Hopefully 7.16.7 will become available on Linux as well and fix this issue.
Update. Apparently the problem magically disappeared.
I had so many queued uploads due to the backoff that E@H would not let me download any more work units (some of the queued ones were even past the deadline).
A couple of days ago, all finished work units disappeared from the queue and uploaded successfully, resulting in a sudden boost of +400000 (and something) points.
It's been a couple of days of non-stop crunching and no more backoffs: all finished tasks upload to the server seamlessly.
I'm on Kubuntu 20.04 with BOINC version 7.16.6, and starting Dec 8 at 16:35 CST all uploads have had the same issue. I now have 35 uploads, all pending with the same project backoff notice. I cannot manually upload: clicking on Retry now gives me the same error. I tried aborting an upload, not realizing that it deletes that result, so I cannot manually retry it. I have ten more tasks that will finish today or tomorrow. I don't know if they will upload either.
Wade Smart wrote: I'm on ...
And just like that I clicked 'retry now' and my over a dozen uploads all went up in a flash.