Tasks won't upload - Project keeps backing off

Peter Eichinger
Joined: 14 Jun 05
Posts: 5
Credit: 31446775
RAC: 0

I've stopped the project from downloading new work because I have a stack of tasks it can't "report". The results have uploaded and are now "ready to report", but they never actually get reported (about 14 completed WUs on this PC):

 

3/04/2017 1:50:42 PM | Einstein@Home | [work_fetch] REC 2875.533 prio -0.115 can't request work: "no new tasks" requested via Manager (8651.28 sec)

3/04/2017 1:50:42 PM | | [work_fetch] --- state for CPU ---
3/04/2017 1:50:42 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 64315.21 busy 0.00
3/04/2017 1:50:42 PM | Einstein@Home | [work_fetch] share 0.000 blocked by project preferences
3/04/2017 1:50:42 PM | | [work_fetch] --- state for NVIDIA GPU ---
3/04/2017 1:50:42 PM | | [work_fetch] shortfall 44381.71 nidle 0.00 saturated 16098.29 busy 0.00
3/04/2017 1:50:42 PM | Einstein@Home | [work_fetch] share 0.000
3/04/2017 1:50:42 PM | | [work_fetch] ------- end work fetch state -------
3/04/2017 1:50:42 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:43 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:44 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:45 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:46 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:47 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:48 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:49 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:49 PM | | [time] dt 10.135296 susp_reason 0 gpu_susp_reason 0
3/04/2017 1:50:49 PM | | [time] w2 0.999988 on 0.994654; active 0.995855; gpu_active 0.995612; conn 0.999824, cpu_and_net_avail 0.995680
3/04/2017 1:50:50 PM | | NOTICES::write: seqno 6, refresh false, 6 notices
3/04/2017 1:50:50 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:51 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:52 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:53 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:54 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:55 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:56 PM | | [suspend] net_susp: no; file_xfer_susp: no; reason: unknown reason
3/04/2017 1:50:57 PM | | Re-reading cc_config.xml
3/04/2017 1:50:57 PM | | log flags: file_xfer, sched_ops, task, http_debug, http_xfer_debug, notice_debug
3/04/2017 1:51:38 PM | | Project communication failed: attempting access to reference site
3/04/2017 1:51:38 PM | | [http] HTTP_OP::init_get(): http://www.google.com/
3/04/2017 1:51:38 PM | | [http] HTTP_OP::libcurl_exec(): ca-bundle set
3/04/2017 1:51:39 PM | | [http] [ID#0] Info: Found bundle for host www.google.com: 0x43e9880 [can pipeline]
3/04/2017 1:51:39 PM | | [http] [ID#0] Info: Re-using existing connection! (#924) with host www.google.com
3/04/2017 1:51:39 PM | | [http] [ID#0] Info: Connected to www.google.com (172.217.23.36) port 80 (#924)
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: GET / HTTP/1.1
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: Host: www.google.com
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.33)
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: Accept: */*
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server: Accept-Language: en_AU
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server:
3/04/2017 1:51:39 PM | | [http] [ID#0] Sent header to server:
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: HTTP/1.1 302 Found
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Cache-Control: private
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Content-Type: text/html; charset=UTF-8
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Location: http://www.google.com.au/?gfe_rd=cr&ei=zM3hWKm4N-Tv8Afm46hY
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Content-Length: 260
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Date: Mon, 03 Apr 2017 04:21:32 GMT
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server:
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: <TITLE>302 Moved</TITLE></HEAD><BODY>
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: <H1>302 Moved</H1>
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: The document has moved
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: <A HREF="http://www.google.com.au/?gfe_rd=cr&amp;ei=zM3hWKm4N-Tv8Afm46hY">here</A>.
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: </BODY></HTML>
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Ignoring the response-body
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Connection #924 to host www.google.com left intact
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Issue another request to this URL: 'http://www.google.com.au/?gfe_rd=cr&ei=zM3hWKm4N-Tv8Afm46hY'
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Found bundle for host www.google.com.au: 0x43e9a20 [can pipeline]
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Re-using existing connection! (#925) with host www.google.com.au
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Connected to www.google.com.au (216.58.204.35) port 80 (#925)
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: GET /?gfe_rd=cr&ei=zM3hWKm4N-Tv8Afm46hY HTTP/1.1
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Host: www.google.com.au
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.33)
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Accept: */*
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Referer: http://www.google.com/
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server: Accept-Language: en_AU
3/04/2017 1:51:40 PM | | [http] [ID#0] Sent header to server:
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Date: Mon, 03 Apr 2017 04:21:33 GMT
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Expires: -1
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Cache-Control: private, max-age=0
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Content-Type: text/html; charset=ISO-8859-1
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: P3P: CP="This is not a P3P policy! See https://www.google.com/support/accounts/answer/151657?hl=en for more info."
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Content-Encoding: gzip
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Server: gws
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Content-Length: 4489
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: X-XSS-Protection: 1; mode=block
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: X-Frame-Options: SAMEORIGIN
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server: Set-Cookie: NID=100=LqYZiZ5HgqpCLhboL70GTAq3C9cwexUCZnZWu9NpsBE4RSP2x2K6svJGEd7ceBqMotD5JoMBTxoAOhMasqjsoH9hmzulOUn4IK7jBWAlXby5Zlp7E5qJaAnc1UTTKFga; expires=Tue, 03-Oct-2017 04:21:33 GMT; path=/; domain=.google.com.au; HttpOnly
3/04/2017 1:51:40 PM | | [http] [ID#0] Received header from server:
3/04/2017 1:51:40 PM | |
3/04/2017 1:51:40 PM | | [http_xfer] [ID#0] HTTP: wrote 1381 bytes
3/04/2017 1:51:40 PM | | [http_xfer] [ID#0] HTTP: wrote 3390 bytes
3/04/2017 1:51:40 PM | | [http_xfer] [ID#0] HTTP: wrote 3829 bytes
3/04/2017 1:51:40 PM | | [http_xfer] [ID#0] HTTP: wrote 1883 bytes
3/04/2017 1:51:40 PM | | [http] [ID#0] Info: Connection #925 to host www.google.com.au left intact
3/04/2017 1:51:40 PM | | Internet access OK - project servers may be temporarily down.
3/04/2017 1:51:50 PM | | NOTICES::write: seqno 6, refresh false, 6 notices
3/04/2017 1:52:50 PM | | NOTICES::write: seqno 6, refresh false, 6 notices

 

Next step drop the project!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109394290058
RAC: 35822592

Peter Eichinger wrote:
I've stopped the project from downloading new work because I have a stack of tasks it can't "report". The results have uploaded and are now "ready to report", but they never actually get reported (about 14 completed WUs on this PC):

Hi Peter,
I'm also in Australia and none of my hosts seem to be having any problem reporting completed tasks and receiving new work.  There doesn't appear to be any problem with the servers.  I've just found a host with a couple of unreported tasks and issued an 'update' and the tasks reported immediately without issue.

That leaves the most likely cause of your problem to be something to do with the machine itself or one of the other machines on the path between the two end points.  That is something for you to look into.  According to what you posted, your machine can talk to google.com.au but I don't imagine that means the link to Einstein is necessarily fully OK.  Unfortunately, what to do next is a bit beyond my pay grade :-).

I had a look at the last contact logs for both of your machines.  The entries for each seem OK and show reporting of tasks and, in one case, the receipt of new work.  Both logs have time stamps before those of the two event log snips you posted.  This means that the Einstein servers are not receiving your most recent attempts to communicate, which seems to be further evidence for a fault somewhere along the path.

 

Peter Eichinger wrote:
Next step drop the project!

I don't understand why you would think this is the best course of action.  I checked the deadlines on all 'in progress' tasks on both machines.  The earliest deadline on either is around April 12 so there is no particular urgency if you have unreported work.  I would think that any problem between you and Einstein should get fixed fairly promptly.  When the admin staff start work on Monday, someone will probably respond if they have any idea about what might be causing your problem.  Best to wait a bit longer.

In your first message, the event log showed a project reset.  That's a pretty drastic action, usually a 'last resort' sort of thing and mainly for a problem where things are scrambled on your host.  It tells your BOINC client to trash everything at your end and get a fresh copy of everything from the Einstein servers.  It can't rectify a server issue or any problem with the link between your machine and Einstein.  I would imagine you would have lost any completed but unreported tasks you had at that stage.  I don't know for sure since having been bitten by it many years ago, I avoid it like the plague.  It might be a bit less drastic these days.  Did you lose all unreported tasks when you did it?

 

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Peter Eichinger wrote:
I've stopped the project from downloading new work because I have a stack of tasks it can't "report". The results have uploaded and are now "ready to report", but they never actually get reported (about 14 completed WUs on this PC):

It's OK if Einstein is set to No New Tasks; just issue an update command and post the part of the log that has [http] in it. It really needs to be a connection attempt with Einstein: the one you posted only shows the attempt to connect to a reference site after the connection with Einstein failed.
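If the event log is long, a small script can help pick out the relevant lines. What follows is only a rough sketch, assuming the log has been saved as plain text in the "timestamp | project | message" layout seen in the snippet posted above. The function names are made up for illustration and are not part of BOINC.

```python
def parse_event_log_line(line):
    """Split one event log line into (timestamp, project, message).

    Assumes the 'timestamp | project | message' layout; returns None
    for lines that do not have three '|'-separated fields.
    """
    parts = line.split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = (p.strip() for p in parts)
    return timestamp, project, message


def http_debug_messages(lines):
    """Return only the messages tagged [http] or [http_xfer]."""
    messages = []
    for line in lines:
        parsed = parse_event_log_line(line)
        if parsed and parsed[2].startswith(("[http]", "[http_xfer]")):
            messages.append(parsed[2])
    return messages


def mentions_host(messages, needle):
    """True if any message contains the given text (case-insensitive)."""
    needle = needle.lower()
    return any(needle in m.lower() for m in messages)


# A few lines copied from the log posted in this thread.
sample = [
    "3/04/2017 1:50:42 PM | Einstein@Home | [work_fetch] share 0.000",
    "3/04/2017 1:51:38 PM | | Project communication failed: attempting access to reference site",
    "3/04/2017 1:51:38 PM | | [http] HTTP_OP::init_get(): http://www.google.com/",
    "3/04/2017 1:51:40 PM | | [http_xfer] [ID#0] HTTP: wrote 1381 bytes",
]

msgs = http_debug_messages(sample)
print("http lines:", len(msgs))
print("talks to google:", mentions_host(msgs, "google"))
print("talks to einstein:", mentions_host(msgs, "einstein"))
```

If the last check comes back false for your own log, the snippet contains only the reference-site check and not the failed Einstein request itself.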

Peter Eichinger wrote:
Next step drop the project!

As Gary said, that seems a bit drastic. Please let us try to help you first.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 118587687
RAC: 110897

I tried to find the request in our logs that led to the Error 408, but there is nothing logged. It is most likely an HTTP error code, which in this case means the request from your computer to our servers was incomplete and took too long. I would also like to see http_debug output for an Einstein@Home connection instead of a Google connection. Did you install a new firewall or antivirus product? They sometimes also police outgoing traffic.
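For reference, and assuming the 408 really is an HTTP status code as suggested above, the standard name for it can be looked up with Python's built-in http module. The helper name describe_status is just for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return 'code phrase' for a standard HTTP status code."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # Not every number is a registered HTTP status.
        return f"{code}: not a standard HTTP status code"
    return f"{status.value} {status.phrase}"


print(describe_status(408))  # prints "408 Request Timeout"
```

A 408 is sent by the server when it gives up waiting for the client to finish its request, which would fit a request that is being delayed or cut off somewhere between the client and the project.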

Peter Eichinger
Joined: 14 Jun 05
Posts: 5
Credit: 31446775
RAC: 0

Hi, I had the event log option for http_debug "ticked", so what is listed is all that appeared. I haven't installed a new antivirus or a new firewall. I have the Comodo firewall on one PC and have specifically included all BOINC tasks as allowed. There haven't been any updates to the firewall, nor have there been any other updates that would account for it. When the BOINC Manager communicates with Einstein it takes "forever", whereas most other projects seem to respond quickly with either new workunits or a message to say "communication deferred for ... mins". I even made sure I had nothing else running in the background: I closed Comodo, Avira, Adobe updater, etc. None of it had any effect.

Peter Eichinger
Joined: 14 Jun 05
Posts: 5
Credit: 31446775
RAC: 0

Hi Gary,

Yes, well, you are a prolific generator of results. :-) I'm still only up to 800 million credits (total). On that machine I lost all the results, although they were uploaded and listed as ready to report; they just never reported. I actually completed the existing tasks, deleted BOINC Manager and then reinstalled it. Everything else seems to be OK.

Cheers  Pete

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 118587687
RAC: 110897

That is strange. There should be at least one HTTP attempt to https://scheduler.einsteinathome.org/EinsteinAtHome_cgi/cgi logged before "Project communication failed: attempting access to reference site". Can you reach the URL via a browser? It should be very fast and show you this text:

<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://einstein.phys.uwm.edu/</master_url>
<request_delay>60.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>Einstein@Home</project_name>
</scheduler_reply>
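
The same check can be scripted instead of done in a browser. This is a minimal sketch, assuming the scheduler answers a plain GET with well-formed XML like the reply above. The function names are hypothetical, and the network call is defined but not run automatically.

```python
import urllib.request
import xml.etree.ElementTree as ET

# URL as given in the post above.
SCHEDULER_URL = "https://scheduler.einsteinathome.org/EinsteinAtHome_cgi/cgi"

# The reply text quoted above, used here as an offline example.
EXPECTED_REPLY = """<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://einstein.phys.uwm.edu/</master_url>
<request_delay>60.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>Einstein@Home</project_name>
</scheduler_reply>"""


def summarize_scheduler_reply(xml_text):
    """Return key fields of a scheduler reply, or None if it is not one."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None
    if root.tag != "scheduler_reply":
        return None
    return {
        "project_name": root.findtext("project_name"),
        "message": (root.findtext("message") or "").strip(),
        "request_delay": float(root.findtext("request_delay", "0")),
    }


def fetch_scheduler_reply(url=SCHEDULER_URL, timeout=30):
    """Fetch the scheduler page; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(summarize_scheduler_reply(EXPECTED_REPLY))
```

To test the live server, pass the result of fetch_scheduler_reply() to summarize_scheduler_reply(). A timeout or an exception at that step points at the connection, not at BOINC.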
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752781530
RAC: 1443644

You seem to be having a problem with 'reporting' tasks, rather than 'uploading' the results - there's a two-stage process when a task finishes, with data being uploaded almost immediately, and only once that has completed does the status become 'ready to report'.

I had a problem similar to yours a couple of weeks ago, except my problem was with reaching the upload server, rather than the scheduler: apart from that, the symptoms were very similar.

I agree with Gary: you need to find a section of the message log where <http_debug> is active and an attempt to contact the Einstein servers takes place. Even if project communication is backed off, you can force an immediate attempt by using the 'update' button on the Projects tab in BOINC Manager.

I discussed my problem privately with Christian Beer, and we came to the conclusion that the University firewall might have been blocking my IP address. Certainly, restarting my router (so that my ISP issued me with a different IP address) solved the problem for a while, and allowed all completed work to be uploaded - although the problem recurred a few more times.

I eventually discovered that another machine on my network - not one currently crunching for Einstein, ironically - had picked up some malware which was generating a lot of random outbound traffic. My anti-virus had missed it, and although Malwarebytes blocked the traffic, that wouldn't remove it either. It's gone now, and I haven't been blocked by Einstein since. My IP address was perhaps put on a firewall blacklist because of the malware.

I'm not saying that your problem is exactly the same - you need BOINC's log, and possibly some network test results like tracert as well - but it's an example of the kind of thing which can go wrong.
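
As one example of a simple network test (not a replacement for tracert, which shows the individual hops), the time taken to open a TCP connection to a host can be measured with a few lines of Python. This is only a sketch: the function name is made up, and the host and port in the usage note are assumptions taken from the scheduler URL mentioned earlier in the thread.

```python
import socket
import time


def tcp_connect_time(host, port, timeout=10.0):
    """Return seconds taken to open a TCP connection, or None on failure.

    Failure covers refused connections, timeouts and DNS errors, all of
    which are reported by the socket module as OSError subclasses.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

For instance, tcp_connect_time("scheduler.einsteinathome.org", 443) should return a small fraction of a second on a healthy link. None, or a value close to the timeout, suggests something on the path is dropping or delaying the traffic.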

Peter Eichinger
Joined: 14 Jun 05
Posts: 5
Credit: 31446775
RAC: 0

Christian Beer wrote:

That is strange. There should be at least one HTTP attempt to https://scheduler.einsteinathome.org/EinsteinAtHome_cgi/cgi logged before "Project communication failed: attempting access to reference site". Can you reach the URL via a browser? It should be very fast and show you this text:

<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://einstein.phys.uwm.edu/</master_url>
<request_delay>60.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>Einstein@Home</project_name>
</scheduler_reply>

Without changing anything (!!??!!) communications with the server seem to be good again. All results have now been reported and communications seem to be fast again. I have no idea what happened, but I can happily go about crunching again...

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 97984355
RAC: 30679

Just thought I'd leave a note to say that downloading the suggested certificates worked for me when I noticed this problem today ("Scheduler request failed: Error 408").  So the thread is still useful.
