BIG TIME OVERFETCH ! ! ! Perceus.

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169,699,457
RAC: 0

Thanks for the advice; I've aborted a lot of the overfetched tasks as suggested.

Must be something really odd with the scheduler, because I didn't abort all the work (I left more than can be processed before the deadline) but still got sent *more tasks* to process!

...
08-Jun-2013 10:56:37 [Einstein@Home] task PA0057_015D1_228_0 aborted by user
08-Jun-2013 10:56:43 [Einstein@Home] update requested by user
08-Jun-2013 10:56:49 [Einstein@Home] Sending scheduler request: Requested by user.
08-Jun-2013 10:56:49 [Einstein@Home] Reporting 265 completed tasks <-- Aborted 265 BRP5 tasks, leaving ~100 remaining
08-Jun-2013 10:56:49 [Einstein@Home] Requesting new tasks for NVIDIA
08-Jun-2013 10:57:02 [Einstein@Home] Scheduler request completed: got 7 new tasks
...

This happened on at least 2 of the hosts where I aborted part of the BRP5 overfetched tasks. The "Last Contact" server logs for both hosts have been saved, in case they're any use in tracking this down.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,945,427,524
RAC: 692,947

Quote:

Thanks for the advice; I've aborted a lot of the overfetched tasks as suggested.

Must be something really odd with the scheduler, because I didn't abort all the work (I left more than can be processed before the deadline) but still got sent *more tasks* to process!

...
08-Jun-2013 10:56:37 [Einstein@Home] task PA0057_015D1_228_0 aborted by user
08-Jun-2013 10:56:43 [Einstein@Home] update requested by user
08-Jun-2013 10:56:49 [Einstein@Home] Sending scheduler request: Requested by user.
08-Jun-2013 10:56:49 [Einstein@Home] Reporting 265 completed tasks
...

To get to the bottom of this, we're going to have to ask people to set the logging flag (and possibly others too), and post that information about the state of their machine when these unexpected requests are made.

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,699,403
RAC: 0

Quote:
Note that you only get sent new work when your computer asks for it.


That's not quite always the case here: if someone resets the project, work will get resent even with NNT set (old BOINC server software). For example:

Quote:
08/06/2013 12:20:10 | Einstein@Home | update requested by user
08/06/2013 12:20:12 | Einstein@Home | [sched_op] Starting scheduler request
08/06/2013 12:20:12 | Einstein@Home | Sending scheduler request: Requested by user.
08/06/2013 12:20:12 | Einstein@Home | Not requesting tasks: "no new tasks" requested via Manager
08/06/2013 12:20:12 | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
08/06/2013 12:20:12 | Einstein@Home | [sched_op] NVIDIA work request: 0.00 seconds; 0.00 devices
08/06/2013 12:20:12 | Einstein@Home | [sched_op] ATI work request: 0.00 seconds; 0.00 devices
08/06/2013 12:20:17 | Einstein@Home | Scheduler request completed
08/06/2013 12:20:17 | Einstein@Home | [sched_op] Server version 611
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0505.55_S6GC1__S6BucketLVEa_505.663682292Hz_1001_1
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0493.20_S6GC1__S6BucketLVEa_493.313682292Hz_356_1
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0508.80_S6GC1__S6BucketLVEa_508.913682292Hz_1467_0
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0508.80_S6GC1__S6BucketLVEa_508.913682292Hz_1466_0
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0493.20_S6GC1__S6BucketLVEa_493.313682292Hz_355_1
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0493.20_S6GC1__S6BucketLVEa_493.313682292Hz_354_1
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0485.45_S6GC1__S6BucketLVEa_485.563682292Hz_399_2
08/06/2013 12:20:17 | Einstein@Home | Resent lost task h1_0505.55_S6GC1__S6BucketLVEa_505.663682292Hz_1000_0
08/06/2013 12:20:17 | Einstein@Home | Project requested delay of 60 seconds
08/06/2013 12:20:17 | Einstein@Home | [sched_op] estimated total CPU task duration: 111062 seconds
08/06/2013 12:20:17 | Einstein@Home | [sched_op] estimated total NVIDIA task duration: 0 seconds
08/06/2013 12:20:17 | Einstein@Home | [sched_op] estimated total ATI task duration: 0 seconds
08/06/2013 12:20:17 | Einstein@Home | [sched_op] Deferring communication for 1 min 0 sec
08/06/2013 12:20:17 | Einstein@Home | [sched_op] Reason: requested by project

Edit: I appreciate you did say New work.

Claggy

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,945,427,524
RAC: 692,947

Quote:
Quote:
Note that you only get sent new work when your computer asks for it.

That's not quite always the case here, if someone resets the project here, work will get resent even with NNT set.

Claggy


True, but if I may dare say so, not relevant to the particular problem we're trying to track down via this thread.

I see no sign of a reset in that log snippet, but maybe I should have said "you only got sent new work when your computer asked for it."

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169,699,457
RAC: 0

Quote:

To get to the bottom of this, we're going to have to ask people to set the logging flag (and possibly others too), and post that information about the state of their machine when these unexpected requests are made.

I'll set the work_fetch_debug flag later (never used it before); as I've still got too much work, I'll see if I can reproduce the issue.

The procedure I followed was simply to select a large number of BRP5 tasks and hit 'abort'; subsequently I hit 'update' to report them in (at which point I got the extra work). I hadn't set NNT, on the assumption I shouldn't get any more work with over 90 BRP5s still queued.

Note that I only got sent 7 BRP5 tasks (unlike the previous work overfetch where I received hundreds) so is this the same issue? That particular host has the cache set to 1.0 day and can do about 7 tasks a day (which is what it fetched, despite the 90+ queued).

Here's a snippet of the server log. Not knowing much about it, the line that stands out is:

2013-06-08 09:57:01.5356 [PID=24960] [send] CUDA: req 173160.00 sec, 2.00 instances; est delay 0.00

As it's a dual-GPU host, that's just over a day's work per GPU (2 x 86,400 s = 172,800 s).
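
To check figures like this without doing the sums by hand, the request line can be parsed with a short script. This is only a sketch: it assumes the `[send] <resource>: req ... sec, ... instances` layout seen in this server log, and the names `SEND_REQ` and `parse_send_request` are made up for illustration.

```python
import re

# Matches the per-resource request line of the server's [send] log, e.g.
# "[send] CUDA: req 173160.00 sec, 2.00 instances; est delay 0.00"
SEND_REQ = re.compile(
    r"\[send\]\s+(?P<resource>\w+): req (?P<secs>[\d.]+) sec, "
    r"(?P<instances>[\d.]+) instances"
)

SECONDS_PER_DAY = 86_400


def parse_send_request(line):
    """Return (resource, requested seconds, instances), or None if no match."""
    m = SEND_REQ.search(line)
    if m is None:
        return None
    return m["resource"], float(m["secs"]), float(m["instances"])


line = ("2013-06-08 09:57:01.5356 [PID=24960] [send] CUDA: "
        "req 173160.00 sec, 2.00 instances; est delay 0.00")
resource, secs, instances = parse_send_request(line)

# Spread the request over the instances and express it in days.
days_per_instance = secs / instances / SECONDS_PER_DAY
print(f"{resource}: {secs:.0f} s over {instances:.0f} instances "
      f"= {days_per_instance:.3f} days each")
```

The request works out to slightly more than one day per GPU, which fits a cache setting of 1.0 day.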

....couple of hundred aborted task reports...
2013-06-08 09:57:00.9726 [PID=24960]    [handle] cpu time 0.000000 credit/sec 0.004662, claimed credit 0.000000
2013-06-08 09:57:00.9726 [PID=24960]    [handle] [RESULT#383183043] [WU#166319358]: client_state 6 exit_status 203; setting outcome ERROR
2013-06-08 09:57:01.5355 [PID=24960]    [send] effective_ncpus 2 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2013-06-08 09:57:01.5355 [PID=24960]    [send] effective_ngpus 2 max_jobs_on_host_gpu 999999
2013-06-08 09:57:01.5355 [PID=24960]    [send] Not using matchmaker scheduling; Not using EDF sim
2013-06-08 09:57:01.5355 [PID=24960]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-08 09:57:01.5356 [PID=24960]    [send] CUDA: req 173160.00 sec, 2.00 instances; est delay 0.00
2013-06-08 09:57:01.5356 [PID=24960]    [send] work_req_seconds: 0.00 secs
2013-06-08 09:57:01.5356 [PID=24960]    [send] available disk 23.75 GB, work_buf_min 0
2013-06-08 09:57:01.5356 [PID=24960]    [send] active_frac 1.000000 on_frac 0.999950 DCF 1.445824
2013-06-08 09:57:01.5469 [PID=24960]    [send] [HOST#6564477] is reliable
2013-06-08 09:57:01.5470 [PID=24960]    [send] set_trust: random choice for error rate 0.000010: yes
2013-06-08 09:57:01.5470 [PID=24960]    [mixed] sending non-locality work first
2013-06-08 09:57:01.5611 [PID=24960]    [version] Don't need CPU jobs, skipping version 104 for hsgamma_FGRP2 ()
2013-06-08 09:57:01.5612 [PID=24960]    [version] no app version available: APP#21 (hsgamma_FGRP2) PLATFORM#1 (i686-pc-linux-gnu) min_version 0
2013-06-08 09:57:01.5612 [PID=24960]    [version] Checking plan class 'BRP4cuda32nv270'
2013-06-08 09:57:01.5617 [PID=24960]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2013-06-08 09:57:01.5617 [PID=24960]    [version] parsed project prefs setting 'gpu_util_brp': 0.500000
2013-06-08 09:57:01.5617 [PID=24960]    [version] plan class ok
2013-06-08 09:57:01.5617 [PID=24960]    [version] Checking plan class 'opencl-ati'
2013-06-08 09:57:01.5617 [PID=24960]    [version] parsed project prefs setting 'gpu_util_brp': 0.500000
2013-06-08 09:57:01.5617 [PID=24960]    [version] No ATI devices found
2013-06-08 09:57:01.5617 [PID=24960]    [version] Best version of app einsteinbinary_BRP5 is ID 422 (34.86 GFLOPS)
2013-06-08 09:57:01.5631 [PID=24960] [debug]   Sorted list of URLs follows [host timezone: UTC+3600]
2013-06-08 09:57:01.5632 [PID=24960] [debug]   zone=+03600 url=http://einstein2.aei.uni-hannover.de
2013-06-08 09:57:01.5632 [PID=24960] [debug]   zone=-21600 url=http://einstein-dl2.phys.uwm.edu
2013-06-08 09:57:01.5632 [PID=24960] [debug]   zone=-21600 url=http://einstein-dl4.phys.uwm.edu
2013-06-08 09:57:01.5632 [PID=24960] [debug]   zone=-28800 url=http://einstein.ligo.caltech.edu
2013-06-08 09:57:01.5635 [PID=24960]    [send] [HOST#6564477] Sending app_version einsteinbinary_BRP5 1 133 BRP4cuda32nv270; 34.86 GFLOPS
1.5475 [PID=25020]    [locality] work generator says no work remaining for trigger h1_0452.15_S6GC1
2013-06-08 09:57:01.5664 [PID=24960]    [send] est. duration for WU 166885627: unscaled 19506.22 scaled 28203.96
2013-06-08 09:57:01.5664 [PID=24960]    [HOST#6564477] Sending [RESULT#384480126 PA0069_00971_204_1] (est. dur. 28203.96 seconds)
2013-06-08 09:57:01.5674 [PID=24960]    [version] Checking plan class 'BRP4SSE'
2013-06-08 09:57:01.5674 [PID=24960]    [version] project prefs setting 'also_run_cpu' (1.000000) prevents using plan class.
2013-06-08 09:57:01.5674 [PID=24960]    [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#1 (i686-pc-linux-gnu) min_version 0
2013-06-08 09:57:01.5675 [PID=24960]    [send] est. duration for WU 166873249: unscaled 19506.22 scaled 28203.96
2013-06-08 09:57:01.5675 [PID=24960]    [send] [WU#166873249] meets deadline: 28203.96 + 28203.96 < 1209600
2013-06-08 09:57:01.5687 [PID=24960]    [send] [HOST#6564477] Sending app_version einsteinbinary_BRP5 1 133 BRP4cuda32nv270; 34.86 GFLOPS
2013-06-08 09:57:01.5705 [PID=24960]    [send] est. duration for WU 166873249: unscaled 19506.22 scaled 28203.96
....more downloads....
5913 [PID=24960]    [send] don't need more work
2013-06-08 09:57:01.5913 [PID=24960]    [mixed] sending locality work second
2013-06-08 09:57:01.5914 [PID=24960]    [locality] [HOST#6564477] removing file earth_09_11 from file_infos list
2013-06-08 09:57:01.5914 [PID=24960]    [locality] [HOST#6564477] removing file sun_09_11 from file_infos list
2013-06-08 09:57:01.5915 [PID=24960]    [locality] [HOST#6564477] removing file S6GC1_T60h_v1_Segments.seg from file_infos list
2013-06-08 09:57:01.5915 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.50_S6GC1 from file_infos list
2013-06-08 09:57:01.5915 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.55_S6GC1 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.60_S6GC1 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.65_S6GC1 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.70_S6GC1 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file l1_0499.75_S6GC1 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file JPLEPH.405 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] removing file rand_PAS.bank.v3 from file_infos list
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.50_S6GC1
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.55_S6GC1
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.60_S6GC1
2013-06-08 09:57:01.5916 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.65_S6GC1
2013-06-08 09:57:01.5917 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.70_S6GC1
2013-06-08 09:57:01.5917 [PID=24960]    [locality] [HOST#6564477] has file h1_0499.75_S6GC1
2013-06-08 09:57:01.5917 [PID=24960]    [send] don't need more work
2013-06-08 09:57:01.5917 [PID=24960]    [send] don't need more work
2013-06-08 09:57:01.5927 [PID=24960]    Sending reply to [HOST#6564477]: 7 results, delay req 60.00
2013-06-08 09:57:01.5932 [PID=24960]    Scheduler ran 1.900 seconds

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,945,427,524
RAC: 692,947

Well, the good news is that host 6564477 hasn't contacted the server again since you took that log, so it's conceivable that the problem is no longer active - BOINC may genuinely have felt that you had aborted a little too much work and needed a top-up (though that seems unlikely, on the figures you've given).

Looking at the tasks listed for the computer, and perhaps in particular at the block you aborted, I'm getting the feeling that the tasks were issued - not all in one go - but in a sequence of separate requests just over a minute apart. I'd need to scrape all 25 pages or whatever, and sort them into date order to be certain, but that's what I found when I did the analysis on tbret's machine.

If confirmed, the problem isn't with the server allocating too much work, or with the client asking for too much at once. Instead, it's with the client asking for a reasonable amount again, and again, and again - failing to take account of the fact that the amount of work it has cached is growing with every request.
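
The difference between the two behaviours can be illustrated with a toy model. This is a hypothetical sketch, not BOINC's work-fetch code: it assumes the server always sends exactly what is asked for, ignores work completed between requests, and uses an invented `ignores_queue` switch to stand in for whatever makes the client lose track of its cache.

```python
SECONDS_PER_DAY = 86_400


def simulate(requests, cache_days, ignores_queue):
    """Return the queued work (in seconds) after a series of scheduler requests.

    Toy model only: each request asks for enough work to reach the cache
    target, and the server always sends exactly what was asked for.
    """
    target = cache_days * SECONDS_PER_DAY
    queued = 0.0
    for _ in range(requests):
        # A healthy client subtracts what it already holds; the hypothetical
        # faulty client behaves as if its queue were empty every time.
        believed_queue = 0.0 if ignores_queue else queued
        shortfall = max(0.0, target - believed_queue)
        queued += shortfall
    return queued


healthy = simulate(requests=60, cache_days=1.0, ignores_queue=False)
faulty = simulate(requests=60, cache_days=1.0, ignores_queue=True)
print(f"healthy client: {healthy / SECONDS_PER_DAY:.1f} days queued")
print(f"faulty client:  {faulty / SECONDS_PER_DAY:.1f} days queued")
```

In this model every individual request looks reasonable (one day's work), yet an hour of once-a-minute requests leaves the faulty client with 60 days queued.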

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169,699,457
RAC: 0

That does seem quite possible; it's as if that host thought it had *no work*, so it downloaded a day's worth to fill its cache (even though it had over 12 days' work to do, with only 6 or 7 days until the deadline).

The fix for the "once a minute" polling was as per Bikeman's message - simply selecting "Reload config file" stopped it on all my hosts that were doing it.

As host 6564477 still has too much work to do before the tasks time out, I experimented further by aborting a single queued task and updating, then aborting about 20 tasks and updating. Neither test resulted in the download of new work; unlike the earlier case, the client didn't ask for any.

2013-06-08 16:39:23.8794 [PID=14036] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-08 16:39:23.8794 [PID=14036] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-08 16:39:23.8794 [PID=14036] [send] work_req_seconds: 0.00 secs

Jacob Klein
Joined: 22 Jun 11
Posts: 45
Credit: 114,028,547
RAC: 0

I've spent a little time trying to recreate the "client requests unlimited work" issue, and I believe I have made some (limited) progress.

First of all, I'd like to point out something that may have been overlooked before, in one of the lines of the scheduler request that Richard posted earlier. Specifically, in this line:

Quote:
2013-05-27 00:00:41.2146 [PID=16149] [send] CUDA: req 87264.00 sec, 1.00 instances; est delay 0.00

... The amount BOINC is requesting is (87264.00 secs = exactly 1.01 days)... most likely from a cache preference setting of 1.0d + 0.01d.
But that's not the thing I'd like to point out.

BOINC actually also believes that a GPU is currently idle.
I believe this is an important factor in determining the cause.
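
For reference, the 1.01-day reading of the quoted request can be checked in a couple of lines. This is a minimal sketch; `cache_target_seconds` is a made-up helper, and treating the request as simply (minimum + additional) days for one instance is an assumption, not BOINC's documented formula.

```python
import math

SECONDS_PER_DAY = 86_400


def cache_target_seconds(min_days, extra_days):
    """Seconds of work implied by a 'min + additional' cache setting (assumed)."""
    return (min_days + extra_days) * SECONDS_PER_DAY


requested = 87264.00  # from the quoted "[send] CUDA: req ..." line
print(f"{requested / SECONDS_PER_DAY:.2f} days requested")
print(math.isclose(requested, cache_target_seconds(1.0, 0.01)))
```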

When I set out to do my testing, I wanted to mimic tbret's setup as much as possible. I noticed he had 3 GTX 560's, which I don't have, but I was able to build a setup that's fairly close, which had:
- Cache settings 1.0d + 0.1d
- Max % processors: 25% (I have 8 CPUs, and this left 2 CPUs, matching his scheduler request)
- Only work while non-BOINC processor usage is flag
- app_config: 0.5 ... since he said he was running 2 tasks per GPU
- My 3 GPUs:
Device 0: GTX 660 Ti
Device 1: GTX 460
Device 2: GTS 240
- Einstein project preferences: Do not use CPU (so my work fetch would show "(blocked by prefs)" for CPU tasks, just like his work fetch log did)

.... and I tested.
- For the first test, I just let it go and watched. It correctly filled the buffer, and correctly asked for work when needed.
- I tested making subtle cache setting changes. They appeared to work correctly, making RPC work requests only when saturation was less than min_buf.
- I tested running a CPU-intensive app. I did notice that, even though BOINC says it is suspending work because "CPU is busy", these Einstein apps don't suspend right away, but I believe that is a separate BOINC API issue that Oliver is fixing separately. But, otherwise, this too was working as expected.
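
The condition from the second test can be written as a one-line rule and compared against the log. This is a simplification for illustration, not BOINC's actual work-fetch code: `buffer_low` and `MIN_BUF` are names invented here, and the figures are copied from the NVIDIA entries of the work_fetch_debug log further down in this post.

```python
def buffer_low(saturated_secs, min_buf_secs):
    """Simplified rule: the buffer is low once saturation drops below min_buf."""
    return saturated_secs < min_buf_secs


MIN_BUF = 86400.00  # first term of "target work buffer: 86400.00 + 8640.00 sec"

# (saturated seconds, buffer_low as logged by choose_project() for NVIDIA)
observations = [
    (99371.17, False),  # 9:56:00 PM, before the device was disabled
    (81780.44, True),   # 9:56:30 PM
    (75212.23, True),   # 9:56:36 PM
    (56701.06, True),   # 9:56:41 PM
]

for saturated, logged in observations:
    print(saturated, buffer_low(saturated, MIN_BUF), logged)
```

All four logged values agree with the rule, although four samples are not enough to say the client uses exactly this comparison.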

Then it dawned on me. What if something happened to make a device unavailable?

So, as a test, while BOINC was happily running 6 tasks (2 per GPU)... I went to Windows Device Manager, and disabled Device 2, the GTS 240. Sure, this isn't what any of you did to trigger the issue, but it might be similar to something like a Windows Update nVidia driver update that auto-installs. (By the way, guys that had this unlimited-work-fetch issue, can you PLEASE check whether any Windows Update installs happened around the time the issues started?)

Back to this test, though... Some interesting chaos ensued. :)
The 2 tasks that were running on that device resulted in Computation Error, and then BOINC tried to assign the remaining unstarted tasks to the now-disabled device, and they too resulted in computation errors.

Then, while reporting these results, there were some unexpected backoffs in the log (I'll post it below), and the project even reverted to a "Master fetch pending" state, too. I honestly don't understand why that happened, but it's a bit irrelevant.

The most important part was:
I was able to get BOINC to request work for an "idle instance", for a large amount of secs, for a device that wasn't even available anymore.

I had to abandon my testing because of a 64-job-quota-per-day limit, and so, I might resume testing in a day or two.
I believe my next test will be:
Install older nVidia drivers, get BOINC running happily on 6 tasks with a full buffer, and then update the drivers. I'm all about the chaos. :)

What do you guys think?
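
To follow the shortfall figures in the log below without reading every line, a small parser along these lines can help. It is a sketch that assumes the work_fetch_debug line layout shown in this post; the pattern and function names are illustrative only.

```python
import re

STATE = re.compile(r"--- state for (?P<resource>\w+) ---")
FIGURES = re.compile(
    r"shortfall (?P<shortfall>[\d.]+) nidle (?P<nidle>[\d.]+) "
    r"saturated (?P<saturated>[\d.]+) busy (?P<busy>[\d.]+)"
)


def work_fetch_states(lines):
    """Yield (resource, figures dict) for each per-resource state block."""
    resource = None
    for line in lines:
        header = STATE.search(line)
        if header:
            resource = header["resource"]
            continue
        figures = FIGURES.search(line)
        if figures and resource is not None:
            yield resource, {k: float(v) for k, v in figures.groupdict().items()}
            resource = None


# Lines copied from the 9:56:30 PM work fetch state in the log.
sample = """\
6/8/2013 9:56:30 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:30 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:30 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:30 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:30 PM | | [work_fetch] shortfall 12975.13 nidle 0.00 saturated 81780.44 busy 0.00
""".splitlines()

states = list(work_fetch_states(sample))
for resource, figures in states:
    print(resource, figures)
```

On this sample the CPU shortfall of 190080.00 equals the 2.00 idle instances times the 86400.00 + 8640.00 target buffer.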

08-Jun-2013 21:27:53 [---] Starting BOINC client version 7.0.64 for windows_x86_64
08-Jun-2013 21:27:53 [---] log flags: file_xfer, sched_ops, task, checkpoint_debug, scrsave_debug, unparsed_xml
08-Jun-2013 21:27:53 [---] log flags: work_fetch_debug
08-Jun-2013 21:27:53 [---] Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
08-Jun-2013 21:27:53 [---] Data directory: C:\ProgramData\BOINC
08-Jun-2013 21:27:53 [---] Running under account Jacob
08-Jun-2013 21:27:53 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 965 @ 3.20GHz [Family 6 Model 26 Stepping 4]
08-Jun-2013 21:27:53 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt syscall nx lm vmx tm2 pbe
08-Jun-2013 21:27:53 [---] OS: Microsoft Windows 8: Professional with Media Center x64 Edition, (06.02.9200.00)
08-Jun-2013 21:27:53 [---] Memory: 11.99 GB physical, 27.99 GB virtual
08-Jun-2013 21:27:53 [---] Disk: 277.28 GB total, 164.53 GB free
08-Jun-2013 21:27:53 [---] Local time is UTC -4 hours
08-Jun-2013 21:27:53 [---] CUDA: NVIDIA GPU 0: GeForce GTX 660 Ti (driver version 320.18, CUDA version 5.50, compute capability 3.0, 3072MB, 2784MB available, 3021 GFLOPS peak)
08-Jun-2013 21:27:53 [---] CUDA: NVIDIA GPU 1: GeForce GTX 460 (driver version 320.18, CUDA version 5.50, compute capability 2.1, 1024MB, 945MB available, 1025 GFLOPS peak)
08-Jun-2013 21:27:53 [---] CUDA: NVIDIA GPU 2: GeForce GTS 240 (driver version 320.18, CUDA version 5.50, compute capability 1.1, 1024MB, 960MB available, 544 GFLOPS peak)
08-Jun-2013 21:27:53 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 660 Ti (driver version 320.18, device version OpenCL 1.1 CUDA, 3072MB, 2784MB available, 3021 GFLOPS peak)
08-Jun-2013 21:27:53 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 460 (driver version 320.18, device version OpenCL 1.1 CUDA, 1024MB, 945MB available, 1025 GFLOPS peak)
08-Jun-2013 21:27:53 [---] OpenCL: NVIDIA GPU 2: GeForce GTS 240 (driver version 320.18, device version OpenCL 1.0 CUDA, 1024MB, 960MB available, 544 GFLOPS peak)
08-Jun-2013 21:27:53 [Einstein@Home] Found app_config.xml
08-Jun-2013 21:27:53 [---] Config: report completed tasks immediately
08-Jun-2013 21:27:53 [---] Config: use all coprocessors
08-Jun-2013 21:27:53 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 7428747; resource share 1
08-Jun-2013 21:27:53 [---] General prefs: from http://boincsimap.org/boincsimap/ (last modified 02-Jun-2013 23:35:39)
08-Jun-2013 21:27:53 [---] Host location: none
08-Jun-2013 21:27:53 [---] General prefs: using your defaults
08-Jun-2013 21:27:53 [---] Reading preferences override file
08-Jun-2013 21:27:53 [---] Preferences:
08-Jun-2013 21:27:53 [---] max memory usage when active: 6139.49MB
08-Jun-2013 21:27:53 [---] max memory usage when idle: 11051.09MB
08-Jun-2013 21:27:53 [---] max disk usage: 164.91GB
08-Jun-2013 21:27:53 [---] max CPUs used: 2
08-Jun-2013 21:27:53 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
08-Jun-2013 21:27:53 [---] [work_fetch] Request work fetch: Prefs update
08-Jun-2013 21:27:53 [---] [work_fetch] Request work fetch: Startup
08-Jun-2013 21:27:53 [---] Not using a proxy
...
...
6/8/2013 9:55:39 PM | Einstein@Home | [checkpoint] result PA0052_005A1_201_4 checkpointed
6/8/2013 9:55:39 PM | Einstein@Home | [checkpoint] result PA0069_01151_387_1 checkpointed
6/8/2013 9:55:39 PM | Einstein@Home | [checkpoint] result PA0069_01161_78_1 checkpointed
6/8/2013 9:55:39 PM | Einstein@Home | [checkpoint] result PA0069_01171_129_1 checkpointed
6/8/2013 9:55:52 PM | Einstein@Home | [checkpoint] result PA0069_00441_21_4 checkpointed
6/8/2013 9:55:55 PM | Einstein@Home | [checkpoint] result PA0069_011A1_360_0 checkpointed
6/8/2013 9:56:00 PM | | [work_fetch] work fetch start
6/8/2013 9:56:00 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: no; sim_excluded_instances 0
6/8/2013 9:56:00 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:00 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:00 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:00 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:00 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:00 PM | Einstein@Home | [work_fetch] REC 18467.741 prio -1.061454 can req work
6/8/2013 9:56:00 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:00 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:00 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:00 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:00 PM | | [work_fetch] shortfall 0.00 nidle 0.00 saturated 99371.17 busy 0.00
6/8/2013 9:56:00 PM | Einstein@Home | [work_fetch] fetch share 1.000
6/8/2013 9:56:00 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:00 PM | | [work_fetch] No project chosen for work fetch
(GPU Device 2 disabled from Windows Device Manager)
6/8/2013 9:56:27 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:27 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:27 PM | Einstein@Home | Computation for task PA0069_01161_78_1 finished
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_01161_78_1_0 for task PA0069_01161_78_1 absent
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_01161_78_1_1 for task PA0069_01161_78_1 absent
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_01161_78_1_2 for task PA0069_01161_78_1 absent
6/8/2013 9:56:27 PM | Einstein@Home | Computation for task PA0069_00441_21_4 finished
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_00441_21_4_0 for task PA0069_00441_21_4 absent
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_00441_21_4_1 for task PA0069_00441_21_4 absent
6/8/2013 9:56:27 PM | Einstein@Home | Output file PA0069_00441_21_4_2 for task PA0069_00441_21_4 absent
6/8/2013 9:56:27 PM | Einstein@Home | Starting task PA0069_01161_336_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:27 PM | Einstein@Home | Starting task PA0069_010A1_210_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:30 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:30 PM | Einstein@Home | Computation for task PA0069_01161_336_0 finished
6/8/2013 9:56:30 PM | Einstein@Home | Output file PA0069_01161_336_0_0 for task PA0069_01161_336_0 absent
6/8/2013 9:56:30 PM | Einstein@Home | Output file PA0069_01161_336_0_1 for task PA0069_01161_336_0 absent
6/8/2013 9:56:30 PM | Einstein@Home | Output file PA0069_01161_336_0_2 for task PA0069_01161_336_0 absent
6/8/2013 9:56:30 PM | Einstein@Home | Starting task PA0069_01021_375_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:30 PM | | [work_fetch] work fetch start
6/8/2013 9:56:30 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:30 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:30 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:30 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:30 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:30 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:30 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:30 PM | Einstein@Home | [work_fetch] REC 18532.820 prio -0.054935 can't req work: scheduler RPC backoff (backoff: 441.53 sec)
6/8/2013 9:56:30 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:30 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:30 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:30 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:30 PM | | [work_fetch] shortfall 12975.13 nidle 0.00 saturated 81780.44 busy 0.00
6/8/2013 9:56:30 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:30 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:30 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:32 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:32 PM | Einstein@Home | Computation for task PA0069_010A1_210_0 finished
6/8/2013 9:56:32 PM | Einstein@Home | Output file PA0069_010A1_210_0_0 for task PA0069_010A1_210_0 absent
6/8/2013 9:56:32 PM | Einstein@Home | Output file PA0069_010A1_210_0_1 for task PA0069_010A1_210_0 absent
6/8/2013 9:56:32 PM | Einstein@Home | Output file PA0069_010A1_210_0_2 for task PA0069_010A1_210_0 absent
6/8/2013 9:56:32 PM | Einstein@Home | Starting task PA0059_005B1_300_2 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:34 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:34 PM | Einstein@Home | Computation for task PA0069_01021_375_0 finished
6/8/2013 9:56:34 PM | Einstein@Home | Output file PA0069_01021_375_0_0 for task PA0069_01021_375_0 absent
6/8/2013 9:56:34 PM | Einstein@Home | Output file PA0069_01021_375_0_1 for task PA0069_01021_375_0 absent
6/8/2013 9:56:34 PM | Einstein@Home | Output file PA0069_01021_375_0_2 for task PA0069_01021_375_0 absent
6/8/2013 9:56:34 PM | Einstein@Home | Starting task PA0069_011A1_57_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:35 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:35 PM | Einstein@Home | Computation for task PA0059_005B1_300_2 finished
6/8/2013 9:56:35 PM | Einstein@Home | Output file PA0059_005B1_300_2_0 for task PA0059_005B1_300_2 absent
6/8/2013 9:56:35 PM | Einstein@Home | Output file PA0059_005B1_300_2_1 for task PA0059_005B1_300_2 absent
6/8/2013 9:56:35 PM | Einstein@Home | Output file PA0059_005B1_300_2_2 for task PA0059_005B1_300_2 absent
6/8/2013 9:56:35 PM | Einstein@Home | Starting task PA0069_00941_48_2 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:36 PM | | [work_fetch] work fetch start
6/8/2013 9:56:36 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:36 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:36 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:36 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:36 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:36 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:36 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:36 PM | Einstein@Home | [work_fetch] REC 18537.742 prio -0.050050 can't req work: scheduler RPC backoff (backoff: 1961.00 sec)
6/8/2013 9:56:36 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:36 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:36 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:36 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:36 PM | | [work_fetch] shortfall 32805.61 nidle 0.00 saturated 75212.23 busy 0.00
6/8/2013 9:56:36 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:36 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:36 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:38 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:38 PM | Einstein@Home | Computation for task PA0069_011A1_57_0 finished
6/8/2013 9:56:38 PM | Einstein@Home | Output file PA0069_011A1_57_0_0 for task PA0069_011A1_57_0 absent
6/8/2013 9:56:38 PM | Einstein@Home | Output file PA0069_011A1_57_0_1 for task PA0069_011A1_57_0 absent
6/8/2013 9:56:38 PM | Einstein@Home | Output file PA0069_011A1_57_0_2 for task PA0069_011A1_57_0 absent
6/8/2013 9:56:38 PM | Einstein@Home | Starting task PA0058_00221_129_3 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:39 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:39 PM | Einstein@Home | [checkpoint] result PA0052_005A1_201_4 checkpointed
6/8/2013 9:56:39 PM | Einstein@Home | [checkpoint] result PA0069_01151_387_1 checkpointed
6/8/2013 9:56:39 PM | Einstein@Home | Computation for task PA0069_00941_48_2 finished
6/8/2013 9:56:39 PM | Einstein@Home | Output file PA0069_00941_48_2_0 for task PA0069_00941_48_2 absent
6/8/2013 9:56:39 PM | Einstein@Home | Output file PA0069_00941_48_2_1 for task PA0069_00941_48_2 absent
6/8/2013 9:56:39 PM | Einstein@Home | Output file PA0069_00941_48_2_2 for task PA0069_00941_48_2 absent
6/8/2013 9:56:39 PM | Einstein@Home | Starting task PA0069_00221_378_3 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:40 PM | Einstein@Home | [checkpoint] result PA0069_01171_129_1 checkpointed
6/8/2013 9:56:41 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:41 PM | Einstein@Home | Computation for task PA0058_00221_129_3 finished
6/8/2013 9:56:41 PM | Einstein@Home | Output file PA0058_00221_129_3_0 for task PA0058_00221_129_3 absent
6/8/2013 9:56:41 PM | Einstein@Home | Output file PA0058_00221_129_3_1 for task PA0058_00221_129_3 absent
6/8/2013 9:56:41 PM | Einstein@Home | Output file PA0058_00221_129_3_2 for task PA0058_00221_129_3 absent
6/8/2013 9:56:41 PM | Einstein@Home | Starting task PA0069_011A1_354_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:41 PM | | [work_fetch] work fetch start
6/8/2013 9:56:41 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:41 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:41 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:41 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:41 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:41 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:41 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:41 PM | Einstein@Home | [work_fetch] REC 18543.657 prio -0.040284 can't req work: scheduler RPC backoff (backoff: 13623.80 sec)
6/8/2013 9:56:41 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:41 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:41 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:41 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:41 PM | | [work_fetch] shortfall 78260.42 nidle 0.00 saturated 56701.06 busy 0.00
6/8/2013 9:56:41 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:41 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:41 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:42 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:42 PM | Einstein@Home | Computation for task PA0069_00221_378_3 finished
6/8/2013 9:56:42 PM | Einstein@Home | Output file PA0069_00221_378_3_0 for task PA0069_00221_378_3 absent
6/8/2013 9:56:42 PM | Einstein@Home | Output file PA0069_00221_378_3_1 for task PA0069_00221_378_3 absent
6/8/2013 9:56:42 PM | Einstein@Home | Output file PA0069_00221_378_3_2 for task PA0069_00221_378_3 absent
6/8/2013 9:56:42 PM | Einstein@Home | Starting task PA0057_009B1_273_2 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:45 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:45 PM | Einstein@Home | Computation for task PA0069_011A1_354_0 finished
6/8/2013 9:56:45 PM | Einstein@Home | Output file PA0069_011A1_354_0_0 for task PA0069_011A1_354_0 absent
6/8/2013 9:56:45 PM | Einstein@Home | Output file PA0069_011A1_354_0_1 for task PA0069_011A1_354_0 absent
6/8/2013 9:56:45 PM | Einstein@Home | Output file PA0069_011A1_354_0_2 for task PA0069_011A1_354_0 absent
6/8/2013 9:56:45 PM | Einstein@Home | Starting task PA0052_02111_255_5 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:46 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:46 PM | Einstein@Home | Computation for task PA0057_009B1_273_2 finished
6/8/2013 9:56:46 PM | Einstein@Home | Output file PA0057_009B1_273_2_0 for task PA0057_009B1_273_2 absent
6/8/2013 9:56:46 PM | Einstein@Home | Output file PA0057_009B1_273_2_1 for task PA0057_009B1_273_2 absent
6/8/2013 9:56:46 PM | Einstein@Home | Output file PA0057_009B1_273_2_2 for task PA0057_009B1_273_2 absent
6/8/2013 9:56:46 PM | Einstein@Home | Starting task PA0069_010D1_228_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:46 PM | | [work_fetch] work fetch start
6/8/2013 9:56:46 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:46 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:46 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:46 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:46 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:46 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:46 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:46 PM | Einstein@Home | [work_fetch] REC 18548.570 prio -0.035400 can't req work: master URL fetch pending (backoff: 140.17 sec)
6/8/2013 9:56:46 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:46 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:46 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:46 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:46 PM | | [work_fetch] shortfall 103340.81 nidle 0.00 saturated 50141.06 busy 0.00
6/8/2013 9:56:46 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:46 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:46 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:48 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:48 PM | Einstein@Home | Computation for task PA0052_02111_255_5 finished
6/8/2013 9:56:48 PM | Einstein@Home | Output file PA0052_02111_255_5_0 for task PA0052_02111_255_5 absent
6/8/2013 9:56:48 PM | Einstein@Home | Output file PA0052_02111_255_5_1 for task PA0052_02111_255_5 absent
6/8/2013 9:56:48 PM | Einstein@Home | Output file PA0052_02111_255_5_2 for task PA0052_02111_255_5 absent
6/8/2013 9:56:48 PM | Einstein@Home | Starting task PA0059_01471_156_3 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:50 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:50 PM | Einstein@Home | Computation for task PA0069_010D1_228_0 finished
6/8/2013 9:56:50 PM | Einstein@Home | Output file PA0069_010D1_228_0_0 for task PA0069_010D1_228_0 absent
6/8/2013 9:56:50 PM | Einstein@Home | Output file PA0069_010D1_228_0_1 for task PA0069_010D1_228_0 absent
6/8/2013 9:56:50 PM | Einstein@Home | Output file PA0069_010D1_228_0_2 for task PA0069_010D1_228_0 absent
6/8/2013 9:56:50 PM | Einstein@Home | Starting task PA0069_01031_258_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:52 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:52 PM | Einstein@Home | Computation for task PA0059_01471_156_3 finished
6/8/2013 9:56:52 PM | Einstein@Home | Output file PA0059_01471_156_3_0 for task PA0059_01471_156_3 absent
6/8/2013 9:56:52 PM | Einstein@Home | Output file PA0059_01471_156_3_1 for task PA0059_01471_156_3 absent
6/8/2013 9:56:52 PM | Einstein@Home | Output file PA0059_01471_156_3_2 for task PA0059_01471_156_3 absent
6/8/2013 9:56:52 PM | Einstein@Home | Starting task PA0069_01191_300_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:52 PM | | [work_fetch] work fetch start
6/8/2013 9:56:52 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:52 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:52 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:52 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:52 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:52 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:52 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:52 PM | Einstein@Home | [work_fetch] REC 18554.473 prio -0.025633 can't req work: master URL fetch pending (backoff: 1249.61 sec)
6/8/2013 9:56:52 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:52 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:52 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:52 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:52 PM | | [work_fetch] shortfall 153493.06 nidle 0.00 saturated 31622.13 busy 0.00
6/8/2013 9:56:52 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:52 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:52 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:53 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:53 PM | Einstein@Home | Computation for task PA0069_01031_258_1 finished
6/8/2013 9:56:53 PM | Einstein@Home | Output file PA0069_01031_258_1_0 for task PA0069_01031_258_1 absent
6/8/2013 9:56:53 PM | Einstein@Home | Output file PA0069_01031_258_1_1 for task PA0069_01031_258_1 absent
6/8/2013 9:56:53 PM | Einstein@Home | Output file PA0069_01031_258_1_2 for task PA0069_01031_258_1 absent
6/8/2013 9:56:53 PM | Einstein@Home | Starting task PA0069_00941_246_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:56 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:56 PM | Einstein@Home | [checkpoint] result PA0069_011A1_360_0 checkpointed
6/8/2013 9:56:56 PM | Einstein@Home | Computation for task PA0069_01191_300_1 finished
6/8/2013 9:56:56 PM | Einstein@Home | Output file PA0069_01191_300_1_0 for task PA0069_01191_300_1 absent
6/8/2013 9:56:56 PM | Einstein@Home | Output file PA0069_01191_300_1_1 for task PA0069_01191_300_1 absent
6/8/2013 9:56:56 PM | Einstein@Home | Output file PA0069_01191_300_1_2 for task PA0069_01191_300_1 absent
6/8/2013 9:56:56 PM | Einstein@Home | Starting task PA0069_01261_285_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:56:57 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:57 PM | Einstein@Home | Computation for task PA0069_00941_246_1 finished
6/8/2013 9:56:57 PM | Einstein@Home | Output file PA0069_00941_246_1_0 for task PA0069_00941_246_1 absent
6/8/2013 9:56:57 PM | Einstein@Home | Output file PA0069_00941_246_1_1 for task PA0069_00941_246_1 absent
6/8/2013 9:56:57 PM | Einstein@Home | Output file PA0069_00941_246_1_2 for task PA0069_00941_246_1 absent
6/8/2013 9:56:57 PM | Einstein@Home | Starting task PA0069_01241_222_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:56:57 PM | | [work_fetch] work fetch start
6/8/2013 9:56:57 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:57 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:56:57 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:56:57 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:56:57 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:56:57 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:56:57 PM | | [work_fetch] --- project states ---
6/8/2013 9:56:57 PM | Einstein@Home | [work_fetch] REC 18559.362 prio -0.020749 can't req work: master URL fetch pending (backoff: 9796.72 sec)
6/8/2013 9:56:57 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:56:57 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:56:57 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:56:57 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:56:57 PM | | [work_fetch] shortfall 178573.59 nidle 0.00 saturated 25069.89 busy 0.00
6/8/2013 9:56:57 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:56:57 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:56:57 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:56:59 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:56:59 PM | Einstein@Home | Computation for task PA0069_01261_285_0 finished
6/8/2013 9:56:59 PM | Einstein@Home | Output file PA0069_01261_285_0_0 for task PA0069_01261_285_0 absent
6/8/2013 9:56:59 PM | Einstein@Home | Output file PA0069_01261_285_0_1 for task PA0069_01261_285_0 absent
6/8/2013 9:56:59 PM | Einstein@Home | Output file PA0069_01261_285_0_2 for task PA0069_01261_285_0 absent
6/8/2013 9:56:59 PM | Einstein@Home | Starting task PA0069_01351_339_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:57:00 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:57:00 PM | Einstein@Home | Computation for task PA0069_01241_222_1 finished
6/8/2013 9:57:00 PM | Einstein@Home | Output file PA0069_01241_222_1_0 for task PA0069_01241_222_1 absent
6/8/2013 9:57:00 PM | Einstein@Home | Output file PA0069_01241_222_1_1 for task PA0069_01241_222_1 absent
6/8/2013 9:57:00 PM | Einstein@Home | Output file PA0069_01241_222_1_2 for task PA0069_01241_222_1 absent
6/8/2013 9:57:00 PM | Einstein@Home | Starting task PA0069_01351_147_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:57:03 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:57:03 PM | Einstein@Home | Computation for task PA0069_01351_339_0 finished
6/8/2013 9:57:03 PM | Einstein@Home | Output file PA0069_01351_339_0_0 for task PA0069_01351_339_0 absent
6/8/2013 9:57:03 PM | Einstein@Home | Output file PA0069_01351_339_0_1 for task PA0069_01351_339_0 absent
6/8/2013 9:57:03 PM | Einstein@Home | Output file PA0069_01351_339_0_2 for task PA0069_01351_339_0 absent
6/8/2013 9:57:03 PM | Einstein@Home | Starting task PA0069_012A1_129_1 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 2
6/8/2013 9:57:03 PM | | [work_fetch] work fetch start
6/8/2013 9:57:03 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:57:03 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:57:03 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:57:03 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:57:03 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:57:03 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:57:03 PM | | [work_fetch] --- project states ---
6/8/2013 9:57:03 PM | Einstein@Home | [work_fetch] REC 18565.380 prio -0.010982 can't req work: master URL fetch pending (backoff: 80.31 sec)
6/8/2013 9:57:03 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:57:03 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:57:03 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:57:03 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:57:03 PM | | [work_fetch] shortfall 228726.05 nidle 0.00 saturated 6542.74 busy 0.00
6/8/2013 9:57:03 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:57:03 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:57:03 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:57:04 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:57:04 PM | Einstein@Home | Computation for task PA0069_01351_147_1 finished
6/8/2013 9:57:04 PM | Einstein@Home | Output file PA0069_01351_147_1_0 for task PA0069_01351_147_1 absent
6/8/2013 9:57:04 PM | Einstein@Home | Output file PA0069_01351_147_1_1 for task PA0069_01351_147_1 absent
6/8/2013 9:57:04 PM | Einstein@Home | Output file PA0069_01351_147_1_2 for task PA0069_01351_147_1 absent
6/8/2013 9:57:04 PM | Einstein@Home | Starting task PA0069_01351_75_0 using einsteinbinary_BRP5 version 133 (BRP4cuda32nv301) in slot 4
6/8/2013 9:57:06 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:57:06 PM | Einstein@Home | Computation for task PA0069_012A1_129_1 finished
6/8/2013 9:57:06 PM | Einstein@Home | Output file PA0069_012A1_129_1_0 for task PA0069_012A1_129_1 absent
6/8/2013 9:57:06 PM | Einstein@Home | Output file PA0069_012A1_129_1_1 for task PA0069_012A1_129_1 absent
6/8/2013 9:57:06 PM | Einstein@Home | Output file PA0069_012A1_129_1_2 for task PA0069_012A1_129_1 absent
6/8/2013 9:57:07 PM | | [work_fetch] Request work fetch: application exited
6/8/2013 9:57:07 PM | Einstein@Home | Computation for task PA0069_01351_75_0 finished
6/8/2013 9:57:07 PM | Einstein@Home | Output file PA0069_01351_75_0_0 for task PA0069_01351_75_0 absent
6/8/2013 9:57:07 PM | Einstein@Home | Output file PA0069_01351_75_0_1 for task PA0069_01351_75_0 absent
6/8/2013 9:57:07 PM | Einstein@Home | Output file PA0069_01351_75_0_2 for task PA0069_01351_75_0 absent
6/8/2013 9:57:09 PM | | [work_fetch] work fetch start
6/8/2013 9:57:09 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:57:09 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:57:09 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:57:09 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:57:09 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:57:09 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:57:09 PM | | [work_fetch] --- project states ---
6/8/2013 9:57:09 PM | Einstein@Home | [work_fetch] REC 18569.891 prio -0.006098 can't req work: master URL fetch pending (backoff: 942.18 sec)
6/8/2013 9:57:09 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:57:09 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:57:09 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:57:09 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:57:09 PM | | [work_fetch] shortfall 253807.60 nidle 1.00 saturated 0.00 busy 0.00
6/8/2013 9:57:09 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:57:09 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:57:09 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:57:39 PM | Einstein@Home | [checkpoint] result PA0052_005A1_201_4 checkpointed
6/8/2013 9:57:39 PM | Einstein@Home | [checkpoint] result PA0069_01151_387_1 checkpointed
6/8/2013 9:57:40 PM | Einstein@Home | [checkpoint] result PA0069_01171_129_1 checkpointed
6/8/2013 9:57:56 PM | Einstein@Home | [checkpoint] result PA0069_011A1_360_0 checkpointed
6/8/2013 9:58:09 PM | | [work_fetch] work fetch start
6/8/2013 9:58:09 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:58:09 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:58:09 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:58:09 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:58:09 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:58:09 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:58:09 PM | | [work_fetch] --- project states ---
6/8/2013 9:58:09 PM | Einstein@Home | [work_fetch] REC 18603.984 prio -0.006077 can't req work: master URL fetch pending (backoff: 882.17 sec)
6/8/2013 9:58:09 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:58:09 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:58:09 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:58:09 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:58:09 PM | | [work_fetch] shortfall 253914.68 nidle 1.00 saturated 0.00 busy 0.00
6/8/2013 9:58:09 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:58:09 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:58:09 PM | | [work_fetch] No project chosen for work fetch
6/8/2013 9:58:39 PM | Einstein@Home | [checkpoint] result PA0052_005A1_201_4 checkpointed
6/8/2013 9:58:40 PM | Einstein@Home | [checkpoint] result PA0069_01151_387_1 checkpointed
6/8/2013 9:58:40 PM | Einstein@Home | [checkpoint] result PA0069_01171_129_1 checkpointed
6/8/2013 9:58:56 PM | Einstein@Home | [checkpoint] result PA0069_011A1_360_0 checkpointed
6/8/2013 9:59:09 PM | | [work_fetch] work fetch start
6/8/2013 9:59:09 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:59:09 PM | | [work_fetch] no eligible project for NVIDIA
6/8/2013 9:59:09 PM | | [work_fetch] choose_project() for CPU: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:59:09 PM | | [work_fetch] no eligible project for CPU
6/8/2013 9:59:09 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:59:09 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:59:09 PM | | [work_fetch] --- project states ---
6/8/2013 9:59:09 PM | Einstein@Home | [work_fetch] REC 18638.001 prio -0.006056 can't req work: master URL fetch pending (backoff: 822.15 sec)
6/8/2013 9:59:09 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:59:09 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:59:09 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:59:09 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:59:09 PM | | [work_fetch] shortfall 254021.34 nidle 1.00 saturated 0.00 busy 0.00
6/8/2013 9:59:09 PM | Einstein@Home | [work_fetch] fetch share 0.000
6/8/2013 9:59:09 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:59:09 PM | | [work_fetch] No project chosen for work fetch
(I clicked 'Update')
6/8/2013 9:59:36 PM | Einstein@Home | update requested by user
6/8/2013 9:59:36 PM | | [work_fetch] Request work fetch: project updated by user
6/8/2013 9:59:37 PM | | [work_fetch] Request work fetch: Backoff ended for Einstein@Home
6/8/2013 9:59:39 PM | Einstein@Home | Fetching scheduler list
6/8/2013 9:59:40 PM | Einstein@Home | [checkpoint] result PA0052_005A1_201_4 checkpointed
6/8/2013 9:59:40 PM | Einstein@Home | [checkpoint] result PA0069_01151_387_1 checkpointed
6/8/2013 9:59:40 PM | Einstein@Home | [checkpoint] result PA0069_01171_129_1 checkpointed
6/8/2013 9:59:40 PM | Einstein@Home | Master file download succeeded
6/8/2013 9:59:40 PM | | [work_fetch] Request work fetch: Master fetch complete
6/8/2013 9:59:45 PM | | [work_fetch] work fetch start
6/8/2013 9:59:45 PM | | [work_fetch] choose_project() for NVIDIA: buffer_low: yes; sim_excluded_instances 0
6/8/2013 9:59:45 PM | Einstein@Home | [work_fetch] set_request() for NVIDIA: ninst 3 nused_total 2.000000 nidle_now 1.000000 fetch share 1.000000 req_inst 1.000000 req_secs 254085.783157
6/8/2013 9:59:45 PM | | [work_fetch] ------- start work fetch state -------
6/8/2013 9:59:45 PM | | [work_fetch] target work buffer: 86400.00 + 8640.00 sec
6/8/2013 9:59:45 PM | | [work_fetch] --- project states ---
6/8/2013 9:59:45 PM | Einstein@Home | [work_fetch] REC 18638.001 prio -1.006044 can req work
6/8/2013 9:59:45 PM | | [work_fetch] --- state for CPU ---
6/8/2013 9:59:45 PM | | [work_fetch] shortfall 190080.00 nidle 2.00 saturated 0.00 busy 0.00
6/8/2013 9:59:45 PM | Einstein@Home | [work_fetch] fetch share 0.000 (blocked by prefs)
6/8/2013 9:59:45 PM | | [work_fetch] --- state for NVIDIA ---
6/8/2013 9:59:45 PM | | [work_fetch] shortfall 254085.78 nidle 1.00 saturated 0.00 busy 0.00
6/8/2013 9:59:45 PM | Einstein@Home | [work_fetch] fetch share 1.000
6/8/2013 9:59:45 PM | | [work_fetch] ------- end work fetch state -------
6/8/2013 9:59:45 PM | Einstein@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA (254085.78 sec, 1.00 inst)
6/8/2013 9:59:45 PM | Einstein@Home | Sending scheduler request: Requested by user.
6/8/2013 9:59:45 PM | Einstein@Home | Reporting 24 completed tasks
6/8/2013 9:59:45 PM | Einstein@Home | Requesting new tasks for NVIDIA
6/8/2013 9:59:48 PM | Einstein@Home | Scheduler request completed: got 0 new tasks
6/8/2013 9:59:48 PM | Einstein@Home | No work sent
6/8/2013 9:59:48 PM | Einstein@Home | (reached daily quota of 64 tasks)
6/8/2013 9:59:48 PM | Einstein@Home | Project has no jobs available
6/8/2013 9:59:48 PM | Einstein@Home | [work_fetch] backing off NVIDIA 883 sec
6/8/2013 9:59:48 PM | | [work_fetch] Request work fetch: RPC complete
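
For anyone who wants to follow the numbers in a log like the one above without reading every line, here is a minimal sketch in Python. It assumes only the `timestamp | project | message` layout and the `[work_fetch]` message wording visible in this log; the function and variable names are made up for illustration and are not part of BOINC.

```python
import re

# Header line that introduces the per-resource block, e.g. "--- state for NVIDIA ---".
STATE_RE = re.compile(r"--- state for (\w+) ---")
# Summary line that follows the header in the log above.
SHORTFALL_RE = re.compile(
    r"shortfall (?P<shortfall>[\d.]+) nidle (?P<nidle>[\d.]+) "
    r"saturated (?P<saturated>[\d.]+) busy (?P<busy>[\d.]+)"
)


def work_fetch_states(lines):
    """Yield (timestamp, resource, values) for every shortfall line found."""
    resource = None
    for line in lines:
        # Each event-log line is "timestamp | project | message".
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        timestamp, _project, message = parts
        header = STATE_RE.search(message)
        if header:
            resource = header.group(1)
            continue
        summary = SHORTFALL_RE.search(message)
        if summary and resource is not None:
            values = {k: float(v) for k, v in summary.groupdict().items()}
            yield timestamp, resource, values


# Four lines copied from the log above.
sample = [
    "6/8/2013 9:56:41 PM | | [work_fetch] --- state for NVIDIA ---",
    "6/8/2013 9:56:41 PM | | [work_fetch] shortfall 78260.42 nidle 0.00 saturated 56701.06 busy 0.00",
    "6/8/2013 9:57:09 PM | | [work_fetch] --- state for NVIDIA ---",
    "6/8/2013 9:57:09 PM | | [work_fetch] shortfall 253807.60 nidle 1.00 saturated 0.00 busy 0.00",
]

for timestamp, resource, values in work_fetch_states(sample):
    print(timestamp, resource, values["shortfall"], values["saturated"])
```

Run over the whole log, this should make it easy to see the NVIDIA shortfall climbing and the saturated time dropping to zero as the tasks exit one after another.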

tbret
tbret
Joined: 12 Mar 05
Posts: 2,115
Credit: 4,861,241,466
RAC: 64,570

RE: can you PLEASE check

Quote:

can you PLEASE check to see if you have any Windows Update installs around the same time as the issues started?)

I cannot say with certainty that no Windows updates had been applied "around the same time" in every case where one of my machines started to over-fetch.

I can say, with complete certainty, that Windows updates were not involved in *all* cases: a couple of those computers have Windows updates turned completely off.

What I am really not sure of is the date on which I updated the NVIDIA drivers and installed the current version of BOINC.

Fortunately with five computers, different OS versions, etc, I can report that nothing went "apparently" wrong with all of them at once. You know what I mean - I didn't have to restart them all due to driver crashes or what-have-you.

Unfortunately, I just went into my two-hour "radio blackout", during which those computers are on the other side of the moon. (Ok, so really the power to the router is interrupted this time every night; the computers are remote and I have it set up so the router reboots this time every day... it's a long, long story.)

By the time the power comes back up, I will be asleep. I will try to remember to look tomorrow.

Richard Haselgrove
Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,945,427,524
RAC: 692,947

RE: ... the power to the

Quote:
... the power to the router is interrupted this time every night; the computers are remote and I have it set up so the router reboots this time every day...


That could be an interesting extra variable that Jacob could replicate in his test environment. He was asking me yesterday (off-board) if I knew of any 'trigger' action which might cause BOINC to 'flip' into this alternative mode. I didn't, but suddenly losing comms might be one.

Tbret, could you describe your networking setup in a little more detail, please? I'm particularly interested in DNS.

With a normal home router and DHCP, the normal default setting is for the attached computers to be set to use the router as their primary (often only) DNS server - DNS address == gateway address. So when the router goes off, DNS goes off, and any attempt to reach a url gets "can't resolve hostname".

In a business environment - which I believe your remote machines are in - do you have a separate server running a local DNS server (that would be required if you are running Active Directory on the domain, for example)? In that scenario, it is likely that any attempt by BOINC to contact, say, Einstein would successfully resolve the url into a (cached) IP address, but fail at the second stage with a 'transient HTTP error' when trying to contact the server on that IP address.
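
To make the difference between those two scenarios concrete, here is a rough Python sketch that tells a failure at the DNS stage apart from a failure at the connection stage. It is only an illustration using the standard `socket` module, not how BOINC itself is written; `classify_contact` and its `resolve`/`connect` parameters are hypothetical names, and the two parameters exist only so the function can be exercised without a real network.

```python
import socket


def classify_contact(host, port=80, timeout=5.0,
                     resolve=socket.getaddrinfo,
                     connect=socket.create_connection):
    """Return 'dns_failure', 'connect_failure' or 'ok' for one contact attempt."""
    # Stage 1: name resolution. With the router (and its DNS) down, this is
    # where a home setup would fail with "can't resolve hostname".
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"

    # Stage 2: contact the resolved address. With a local DNS server still
    # answering from cache, the lookup succeeds and the failure shows up here.
    address = infos[0][4][:2]  # (ip, port) part of the first sockaddr
    try:
        conn = connect(address, timeout=timeout)
    except OSError:
        return "connect_failure"
    conn.close()
    return "ok"


if __name__ == "__main__":
    # Real lookup and connection attempt against the host given on the command line.
    import sys
    print(classify_contact(sys.argv[1] if len(sys.argv) > 1 else "localhost"))
```

Note that `socket.getaddrinfo` is itself a blocking call, so a script like this would also simply sit and wait if the resolver stopped answering.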

The reason I mention this is that I believe that the DNS lookup client used by BOINC is 'blocking' - once BOINC has issued a DNS lookup request, and while it is waiting for a reply, no other communications can take place - even between the Manager and the core client. Normally, of course, the delay is milliseconds at most, and makes no difference at all - but if the DNS service were to die at a critical moment, then the core client's unresponsiveness could extend to a macroscopic interval, and that could be the trigger that Jacob was seeking. One possible connection between the comms theory, and the 'missing GPU' theory would be if you could find any evidence of a GPU task being terminated with a 'no heartbeat' error around the time of the router powerdown: killing the app like that can in turn lead to a driver restart, and a driver restart would interrupt tasks running on the other GPUs too...

Once all this research is over, a possible workaround might be to use BOINC's internal networking scheduler to tell it not to even attempt to contact projects during the router power-down, and a suitable guard period on either side (I imagine that the router is in a pretty indeterminate state during reboot - it might appear to be available, and accept DNS lookup requests - but not be able to serve them from internal cache, and have to wait until it has re-established contact with an external forwarder).
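
As a worked example of what "a guard period on either side" means in practice, here is a small Python sketch. The start time and the 15-minute guard are invented for illustration (only the two-hour length comes from the description in this thread), the function name is hypothetical, and all times are naive local times, so the timezone would still need stating separately.

```python
from datetime import datetime, time, timedelta


def network_allowed(now, blackout_start, blackout_length, guard):
    """True if `now` falls outside the blackout plus a guard period on each side."""
    # Check yesterday's, today's and tomorrow's window so that a blackout
    # running past midnight is handled correctly.
    for day_offset in (-1, 0, 1):
        day = now.date() + timedelta(days=day_offset)
        start = datetime.combine(day, blackout_start)
        if start - guard <= now < start + blackout_length + guard:
            return False
    return True


# Assumed values: router off from 23:00 for two hours, 15-minute guard.
start = time(23, 0)
length = timedelta(hours=2)
guard = timedelta(minutes=15)

print(network_allowed(datetime(2013, 6, 8, 22, 30), start, length, guard))
print(network_allowed(datetime(2013, 6, 9, 0, 30), start, length, guard))
```

With these assumed values, network activity would be held off from 22:45 until 01:15 the next morning.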

But please don't make any changes to your network or BOINC configuration until we've got as much data as we can. For reasons which I'll explain to Jacob privately, I'd really like to get to whatever the underlying cause of this problem is.

Edit - it would be helpful if you would tell us the timings of your network blackout, either in local time and timezone, or in UTC, so that we can try and compare them with the server log evidence.
