Looking at the tasks, it appears they may have been sent in distinct time periods - the first period is roughly 20:30 31-May-2013 to 02:00 01-Jun-2013, and the second is from 00:30 to 01:30 02-Jun-2013
Not all hosts are affected in each period, e.g. Host 6119246 only appears to have overfetched in the first period.
As listed above, 3 hosts are 7.0.62 and one is 7.0.65, and are using 2 different venues.
Of possible relevance is that a few days ago (date not recorded) I updated my "Computing preferences", increasing my cache from 0.25 to 1 day (I did this because another of my hosts #6119565 keeps polling for work every 60 seconds, for no good reason that I can see).
...also Host 6123309: 81 pending PAS (~55 days work), all sent 18:00 to 18:45 on 1-Jun-2013.
I checked this host because I noticed that all the other hosts that have over-fetched have 2 GPU's (so does this one, albeit only 2x9600GTs which may explain why the overfetch isn't so severe).
Whether this is a significant observation or not, it seems that out of 10 of my active GPU hosts, the 5 that have 2 GPU's have over-fetched and the 5 that have a single GPU haven't. Additionally, it appears the hosts with single GPUs are polling for new work once a minute, but not getting any:
I did wonder if it was a local-only check; still, the multi-GPU hosts aren't doing it.
Also I note that the same pattern of over-fetching fits tbret's hosts - the single GPU hosts have reasonable work, but most multi-GPU hosts have too many PAS tasks.
Poking about the [url=]http://einstein.phys.uwm.edu/top_hosts.php]top hosts[/url] shows other multi-GPU hosts that appear to have over-fetched - e.g. this host has 179 'in progress' (~40 days work with a deadline of 12th June - even with 2 GPU's it surely won't make it).
Only mentioning it in case it relates to the main issue of work over-fetch; it may well be a sidetrack but it appears to only affect single-GPU hosts.
I don't think this is related to the over-fetch issue, but still noteworthy as it strains our servers without an obvious reason. as the log indicates, the client really doesn't request new work at all.
I was worried it was just my hosts, but it's not - for example this host is doing the same according to its log. Same BOINC version, different OS (windows vs. linux).
Also I have found that one of my single-GPU hosts isn't doing it - the only difference I'm aware of is that it's on 7.0.44 (all the others are 7.0.64 or .65).
I was worried it was just my hosts, but it's not - for example this host is doing the same according to its log. Same BOINC version, different OS (windows vs. linux).
Also I have found that one of my single-GPU hosts isn't doing it - the only difference I'm aware of is that it's on 7.0.44 (all the others are 7.0.64 or .65).
I had it happen on 2 machines too. The solution here was to switch to BOINC 7.1.3. There are many work fetch fixes between 7.0.64 and 7.1.3:
I'm guessing this will be the
)
I'm guessing this will be the last one of these I post.
5/28/2013 1:35:29 AM | Einstein@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (0.00 sec, 0.00 inst)
5/28/2013 1:35:29 AM | Einstein@Home | Sending scheduler request: To report completed tasks.
5/28/2013 1:35:29 AM | Einstein@Home | Reporting 636 completed tasks
5/28/2013 1:35:29 AM | Einstein@Home | Not requesting tasks: don't need
5/28/2013 1:35:40 AM | Einstein@Home | Scheduler request completed
5/28/2013 1:35:40 AM | | [work_fetch] Request work fetch: RPC complete
5/28/2013 1:35:45 AM | | [work_fetch] work fetch start
5/28/2013 1:35:45 AM | | [work_fetch] choose_project() for NVIDIA: buffer_low: no; sim_excluded_instances 0
5/28/2013 1:35:45 AM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
5/28/2013 1:35:45 AM | | [work_fetch] no eligible project for CPU
5/28/2013 1:35:45 AM | | [work_fetch] ------- start work fetch state -------
5/28/2013 1:35:45 AM | | [work_fetch] target work buffer: 43200.00 + 864.00 sec
5/28/2013 1:35:45 AM | | [work_fetch] --- project states ---
5/28/2013 1:35:45 AM | Einstein@Home | [work_fetch] REC 84786.533 prio -0.359908 can't req work: scheduler RPC backoff (backoff: 54.89 sec)
5/28/2013 1:35:45 AM | | [work_fetch] --- state for CPU ---
5/28/2013 1:35:45 AM | | [work_fetch] shortfall 132192.00 nidle 3.00 saturated 0.00 busy 0.00
5/28/2013 1:35:45 AM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/28/2013 1:35:45 AM | | [work_fetch] --- state for NVIDIA ---
5/28/2013 1:35:45 AM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 74306.23 busy 0.00
5/28/2013 1:35:45 AM | Einstein@Home | [work_fetch] fetch share 0.000
5/28/2013 1:35:45 AM | | [work_fetch] ------- end work fetch state -------
5/28/2013 1:35:45 AM | | [work_fetch] No project chosen for work fetch
5/28/2013 1:35:45 AM | Einstein@Home | update requested by user
5/28/2013 1:35:45 AM | | [work_fetch] Request work fetch: project updated by user
5/28/2013 1:35:46 AM | | [work_fetch] Request work fetch: Backoff ended for Einstein@Home
5/28/2013 1:35:50 AM | | [work_fetch] work fetch start
5/28/2013 1:35:50 AM | | [work_fetch] choose_project() for NVIDIA: buffer_low: no; sim_excluded_instances 0
5/28/2013 1:35:50 AM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
5/28/2013 1:35:50 AM | | [work_fetch] no eligible project for CPU
5/28/2013 1:35:50 AM | | [work_fetch] ------- start work fetch state -------
5/28/2013 1:35:50 AM | | [work_fetch] target work buffer: 43200.00 + 864.00 sec
5/28/2013 1:35:50 AM | | [work_fetch] --- project states ---
5/28/2013 1:35:50 AM | Einstein@Home | [work_fetch] REC 84786.533 prio -1.359885 can req work
5/28/2013 1:35:50 AM | | [work_fetch] --- state for CPU ---
5/28/2013 1:35:50 AM | | [work_fetch] shortfall 132192.00 nidle 3.00 saturated 0.00 busy 0.00
5/28/2013 1:35:50 AM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/28/2013 1:35:50 AM | | [work_fetch] --- state for NVIDIA ---
5/28/2013 1:35:50 AM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 74301.24 busy 0.00
5/28/2013 1:35:50 AM | Einstein@Home | [work_fetch] fetch share 1.000
5/28/2013 1:35:50 AM | | [work_fetch] ------- end work fetch state -------
5/28/2013 1:35:50 AM | | [work_fetch] No project chosen for work fetch
5/28/2013 1:35:50 AM | Einstein@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (0.00 sec, 0.00 inst)
5/28/2013 1:35:50 AM | Einstein@Home | Sending scheduler request: Requested by user.
5/28/2013 1:35:50 AM | Einstein@Home | Not requesting tasks: don't need
5/28/2013 1:35:53 AM | Einstein@Home | Scheduler request completed
5/28/2013 1:35:53 AM | | [work_fetch] Request work fetch: RPC complete
5/28/2013 1:35:58 AM | | [work_fetch] work fetch start
5/28/2013 1:35:58 AM | | [work_fetch] choose_project() for NVIDIA: buffer_low: no; sim_excluded_instances 0
5/28/2013 1:35:58 AM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
5/28/2013 1:35:58 AM | | [work_fetch] no eligible project for CPU
5/28/2013 1:35:58 AM | | [work_fetch] ------- start work fetch state -------
5/28/2013 1:35:58 AM | | [work_fetch] target work buffer: 43200.00 + 864.00 sec
5/28/2013 1:35:58 AM | | [work_fetch] --- project states ---
5/28/2013 1:35:58 AM | Einstein@Home | [work_fetch] REC 84786.533 prio -0.359848 can't req work: scheduler RPC backoff (backoff: 54.90 sec)
5/28/2013 1:35:58 AM | | [work_fetch] --- state for CPU ---
5/28/2013 1:35:58 AM | | [work_fetch] shortfall 132192.00 nidle 3.00 saturated 0.00 busy 0.00
5/28/2013 1:35:58 AM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/28/2013 1:35:58 AM | | [work_fetch] --- state for NVIDIA ---
5/28/2013 1:35:58 AM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 74293.25 busy 0.00
5/28/2013 1:35:58 AM | Einstein@Home | [work_fetch] fetch share 0.000
5/28/2013 1:35:58 AM | | [work_fetch] ------- end work fetch state -------
5/28/2013 1:35:58 AM | | [work_fetch] No project chosen for work fetch
5/28/2013 1:36:53 AM | | [work_fetch] Request work fetch: Backoff ended for Einstein@Home
5/28/2013 1:36:53 AM | | [work_fetch] work fetch start
5/28/2013 1:36:53 AM | | [work_fetch] choose_project() for NVIDIA: buffer_low: no; sim_excluded_instances 0
5/28/2013 1:36:53 AM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
5/28/2013 1:36:53 AM | | [work_fetch] no eligible project for CPU
5/28/2013 1:36:53 AM | | [work_fetch] ------- start work fetch state -------
5/28/2013 1:36:53 AM | | [work_fetch] target work buffer: 43200.00 + 864.00 sec
5/28/2013 1:36:53 AM | | [work_fetch] --- project states ---
5/28/2013 1:36:53 AM | Einstein@Home | [work_fetch] REC 84792.289 prio -1.359594 can req work
5/28/2013 1:36:53 AM | | [work_fetch] --- state for CPU ---
5/28/2013 1:36:53 AM | | [work_fetch] shortfall 132192.00 nidle 3.00 saturated 0.00 busy 0.00
5/28/2013 1:36:53 AM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
5/28/2013 1:36:53 AM | | [work_fetch] --- state for NVIDIA ---
5/28/2013 1:36:53 AM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 74238.49 busy 0.00
5/28/2013 1:36:53 AM | Einstein@Home | [work_fetch] fetch share 1.000
5/28/2013 1:36:53 AM | | [work_fetch] ------- end work fetch state -------
5/28/2013 1:36:53 AM | | [work_fetch] No project chosen for work fetch
Thanks for that. I'll tell
)
Thanks for that. I'll tell David he has some more reading material here.
Similar situation here - 4
)
Similar situation here - 4 hosts have overfetched work.
Host 6618250, 479 PAS in progress, 7.0.62
Host 6290763, 361 PAS in progress, 7.0.62
Host 6119246, 104 PAS in progress, 7.0.65
Host 6564477, 412 PAS in progress, 7.0.62
Looking at the tasks, it appears they may have been sent in distinct time periods - the first period is roughly 20:30 31-May-2013 to 02:00 01-Jun-2013, and the second is from 00:30 to 01:30 02-Jun-2013
Not all hosts are affected in each period, e.g. Host 6119246 only appears to have overfetched in the first period.
As listed above, 3 hosts are 7.0.62 and one is 7.0.65, and are using 2 different venues.
Of possible relevance is that a few days ago (date not recorded) I updated my "Computing preferences", increasing my cache from 0.25 to 1 day (I did this because another of my hosts #6119565 keeps polling for work every 60 seconds, for no good reason that I can see).
...also Host 6123309: 81
)
...also Host 6123309: 81 pending PAS (~55 days work), all sent 18:00 to 18:45 on 1-Jun-2013.
I checked this host because I noticed that all the other hosts that have over-fetched have 2 GPU's (so does this one, albeit only 2x9600GTs which may explain why the overfetch isn't so severe).
Whether this is a significant observation or not, it seems that out of 10 of my active GPU hosts, the 5 that have 2 GPU's have over-fetched and the 5 that have a single GPU haven't. Additionally, it appears the hosts with single GPUs are polling for new work once a minute, but not getting any:
RE: Additionally, it
)
I'm not even sure they really request work, see the text in above quote that was highlighted by me. The text is at least misleading.
Cheers
HB
I did wonder if it was a
)
I did wonder if it was a local-only check; still, the multi-GPU hosts aren't doing it.
Also I note that the same pattern of over-fetching fits tbret's hosts - the single GPU hosts have reasonable work, but most multi-GPU hosts have too many PAS tasks.
Poking about the [url=]http://einstein.phys.uwm.edu/top_hosts.php]top hosts[/url] shows other multi-GPU hosts that appear to have over-fetched - e.g. this host has 179 'in progress' (~40 days work with a deadline of 12th June - even with 2 GPU's it surely won't make it).
RE: RE: Additionally, it
)
I checked with 'tcpdump' - the single-GPU hosts really are hitting the server once a minute; this shows up on the "Last contact" link as well.
Only mentioning it in case it relates to the main issue of work over-fetch; it may well be a sidetrack but it appears to only affect single-GPU hosts.
RE: RE: RE:
)
I don't think this is related to the over-fetch issue, but still noteworthy as it strains our servers without an obvious reason. as the log indicates, the client really doesn't request new work at all.
Cheers
HB
I was worried it was just my
)
I was worried it was just my hosts, but it's not - for example this host is doing the same according to its log. Same BOINC version, different OS (windows vs. linux).
Also I have found that one of my single-GPU hosts isn't doing it - the only difference I'm aware of is that it's on 7.0.44 (all the others are 7.0.64 or .65).
RE: I was worried it was
)
I had it happen on 2 machines too. The solution here was to switch to BOINC 7.1.3. There are many work fetch fixes between 7.0.64 and 7.1.3:
http://boinc.berkeley.edu/dev/forum_thread.php?id=8378#49151
Time will tell.