WU not being received by my computer

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 11416 in response to message 11415

Here is the relevant part of the scheduler log. The basic problem seems to be that on your end the core client timed out after 36 seconds, whereas on the server, the scheduler continued to run for 61 seconds. I am making some inquiries with the BOINC developers to try and understand if the core client has a time-out which might have caused this broken connection and hence the lost WU.

Excellent. Thanks for the log snippet.
I suspect that there isn't a timeout in the client, or if there is, it's not as short as 36secs. There is a timeout in my proxy, which is around the 30sec mark. I'll need to up this setting, but the question is "how high?". I'll try 180secs for starters.

AHA!! THIS IS THE PROBLEM. ANYONE USING A PROXY SHOULD SET A TIMEOUT OF AT LEAST 100*N SECONDS WHERE N IS THE NUMBER OF CPUS ON THEIR HOST.

(Sorry for shouting, but I am excited).

Examples:
For a one-cpu host, a proxy timeout of 100 sec.
For a two-cpu host, a proxy timeout of 200 sec.

These times are computed as follows. In making a WU for a given host, there is a delay of up to 10 secs per WU. Hence with up to 8 WU per CPU, and adding another 20 secs for insurance, we end up with the numbers above.
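The arithmetic behind this rule can be written out as a small sketch. The constants are the figures quoted above; the function and constant names are made up for illustration and are not part of BOINC or of any proxy's configuration.

```python
# Figures taken from the post above; the names are illustrative only.
SECONDS_PER_WU = 10   # worst-case delay to make one WU
MAX_WU_PER_CPU = 8    # most WUs handed out per CPU in one request
SAFETY_MARGIN = 20    # extra seconds "for insurance", counted per CPU


def min_proxy_timeout(n_cpus: int) -> int:
    """Return the suggested minimum proxy timeout in seconds (100 * N)."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    per_cpu = MAX_WU_PER_CPU * SECONDS_PER_WU + SAFETY_MARGIN  # 8*10 + 20 = 100
    return n_cpus * per_cpu


if __name__ == "__main__":
    for cpus in (1, 2):
        print(f"{cpus}-CPU host: proxy timeout of at least {min_proxy_timeout(cpus)} sec")
```

This reproduces the two examples given: 100 sec for a one-CPU host and 200 sec for a two-CPU host.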

As a workaround on the server side, I have just modified the transitioner to shorten its polling time from 5 secs to 1 sec. This allows me to reduce the time it takes to make a WU from 10 secs to 2 secs. In the long term, it would be better to move back to a longer time interval (which reduces the number of database queries per unit time), but this should help in the short term. It should reduce the typical time to make 16 WU (2 CPU box) to a bit more than 32 seconds.
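As a rough check on those numbers, here is a minimal sketch of the estimate, assuming the time to fill a request is simply the number of WUs times the per-WU delay. The function name is hypothetical, and the result is a lower bound, since the real scheduler adds some overhead ("a bit more than").

```python
def work_generation_seconds(n_cpus: int, seconds_per_wu: int,
                            max_wu_per_cpu: int = 8) -> int:
    """Estimate the seconds needed to make a full request's worth of WUs."""
    return n_cpus * max_wu_per_cpu * seconds_per_wu


if __name__ == "__main__":
    # 2-CPU box, 16 WUs in total
    print("before the change:", work_generation_seconds(2, 10), "sec")
    print("after the change: ", work_generation_seconds(2, 2), "sec")
```

With the old 10 sec per WU, a 2-CPU box needs about 160 sec, well past a 30 sec proxy timeout. With 2 sec per WU it needs about 32 sec.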

PS: you might want to use ntp or some other method to set the correct time on your machine. It's off by a few minutes.

You're 100% right, sorry about that... I'll have to look into it. :)

Well, it looks like my particular problem was easy enough to find, but the implications provide some food for thought.
It seems that a slow scheduler response, combined with a proxy time-out can cause quite a bit of grief for a project. How do we make BOINC more robust in this department? Perhaps the client needs to "acknowledge" a scheduler reply?

Thanks very much for the feedback. It's much appreciated.

Thank YOU for the very detailed bug report and for reporting that PROXY TIMEOUT is a problem. Could you update this thread with some info about how to modify the timeout for your proxy server?

Cheers,
Bruce

Director, Einstein@Home

Mark Reiss
Joined: 22 Jan 05
Posts: 16
Credit: 1996687
RAC: 0

Message 11417 in response to message 11416

Hi all:

But what do I do to solve the problem if there is no proxy or firewall involved? Thankx in advance.

Mark Reiss - Tue. 11:35 EST

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 11418 in response to message 11417

Hi all:
But what do I do to solve the problem if there is no proxy or firewall involved? Thankx in advance.

Mark Reiss - Tue. 11:35 EST

Mark, I'm confused. I thought from your previous posting on 9 May that your problem was fixed by upgrading the BOINC version. If you are still having problems, please (1) set the system time on your computer accurately, using NTP or some other clock accurate to a second or so, then (2) cut out a healthy stretch of log from when you should have gotten work and didn't, and post it here on the forum. Please identify your machine's HOSTID and the BOINC software version you are using.

Cheers,
Bruce

Director, Einstein@Home

ralic
Joined: 8 Nov 04
Posts: 128
Credit: 695810
RAC: 0

Message 11419 in response to message 11416

AHA!! THIS IS THE PROBLEM. ANYONE USING A PROXY SHOULD SET A TIMEOUT OF AT LEAST 100*N SECONDS WHERE N IS THE NUMBER OF CPUS ON THEIR HOST.

(Sorry for shouting, but I am excited).
me too. :) It highlights one area where a device between the client and the scheduler can cause a bit of havoc.
Thank YOU for the very detailed bug report and for reporting that PROXY TIMEOUT is a problem. Could you update this thread with some info about how to modify the timeout for your proxy server?
Gladly,
I'm using FreeProxy and the timeout value is set by configuring the port that is assigned to Protocol "HTTP Proxy".
There are two timeouts: "Read Timeout (seconds)" and "Connect Timeout (seconds)". I have set both to 200 seconds, as none of my hosts has more than 2 CPUs.
I have also reduced my "connect every" setting to 0.5 days. It was previously set to 3 days, which worked fine while there was existing work on the host. Now that the work has been exhausted, a 3-day cache setting would cause the scheduler to assign a full complement of 8 WUs per CPU, which slows the scheduler response and could cause a timeout. (This follows from your explanation of how the scheduler works.)

I have to admit that I didn't know that the scheduler generates the work on the fly, and that the more work requested, the longer the scheduler response would be.

[edit]
I still think that the client could do with some improvement though. Specifically, I want to have a look at how the scheduler reply is parsed. As I understand it, it's an HTTP response. If the response says "Timeout", the client should display a message accordingly, instead of just "No schedulers responded", thereby giving the user some idea of what is actually going on. Similarly, if the scheduler http server responds with a 404 or 500 error, the user should be informed appropriately. I may be shooting for the sky here, but if it looks feasible, I may post some suggestions to the dev list.
[/edit]
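To make the suggestion concrete, here is a minimal sketch of the kind of mapping meant, written in Python purely for illustration. It is not BOINC client code, and the function name and message wording are invented. It assumes a proxy reports its timeout as HTTP 504 (Gateway Timeout) or 408 (Request Timeout), which a given proxy may or may not do.

```python
import socket
import urllib.error


def describe_failure(exc: Exception) -> str:
    """Turn a failed scheduler request into a message a user can act on."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (408, 504):
            return f"Scheduler request timed out at a proxy or gateway (HTTP {exc.code})"
        if exc.code == 404:
            return "Scheduler URL not found (HTTP 404)"
        if 500 <= exc.code < 600:
            return f"Scheduler server error (HTTP {exc.code})"
        return f"Scheduler returned HTTP {exc.code}"
    # A socket-level timeout, either raised directly or wrapped in a URLError.
    timeout_types = (socket.timeout, TimeoutError)
    if isinstance(exc, timeout_types):
        return "Scheduler did not reply before the connection timed out"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, timeout_types):
            return "Scheduler did not reply before the connection timed out"
        return f"Could not reach scheduler: {exc.reason}"
    # Fall back to the generic message the thread complains about.
    return "No schedulers responded"


if __name__ == "__main__":
    err = urllib.error.HTTPError("http://example.invalid/", 504,
                                 "Gateway Timeout", None, None)
    print(describe_failure(err))
```

The point is only that the HTTP status and the socket error carry enough information to tell "the proxy gave up" apart from "the server is broken" or "the URL is wrong".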

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 11420 in response to message 11419

One more point to add to this thread for the benefit of other readers. The BOINC 4.19 core client has some known proxy-server related problems. If you use 4.19 and have a proxy server or other gateway/firewall and are experiencing problems, please consider trying a more recent BOINC core client.

Cheers,
Bruce

Director, Einstein@Home
