Can't get CPU tasks with client 7.0.27

RvP_LaN
Joined: 15 Apr 07
Posts: 12
Credit: 8150236
RAC: 0
Topic 196318

Hello,

Since I updated to BOINC client 7.0.25 (now 7.0.27), my Windows XP 64-bit hosts never seem to get CPU tasks anymore...
In the BOINC client, I reset my minimum work buffer to 5 days + 0.1 day reserve.
I didn't change my Einstein@Home prefs.
But in the BOINC log I noticed this new message: "no work sent, see scheduler log on ..."

So I followed the link to the host log, and some parameters there seem weird to me; I don't know how to fix that.

[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] CUDA: req 440640.00 sec, 1.00 instances; est delay 0.00

The client seems to only ask for GPU work.

[version] Don't need CPU jobs, skipping version 113 for einstein_S6LV1 (SSE2)
[version] Don't need CPU jobs, skipping version 23 for hsgamma_FGRP1 ()

Why is that? Why wouldn't it need CPU jobs???

[version] parsed project prefs setting 'also_run_cpu' : true : 1.000000
[version] project prefs setting 'also_run_cpu' (1.000000) prevents using plan class.

This one is super tricky for me to understand...
I tried setting and unsetting this in the Einstein@Home prefs: no change. No work is sent.
As I understand it, 'also_run_cpu' should be set to 'true/1', but if I tick "Run CPU versions of applications for which GPU versions are available" on the preferences page, the resulting value of 'also_run_cpu' in the local XML file is 0???

I also reset the project, since that's common advice when moving to a new client version, but nothing changed.

If my GPU is detected but possibly lacks what's needed for GPU tasks, why doesn't the scheduler ask for CPU tasks anymore?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691540657
RAC: 241468

Hi

You could tell your BOINC client to write out log messages about the CPU scheduling decisions: create a file cc_config.xml in the BOINC data directory with this in it:

[pre]
<cc_config>
  <log_flags>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>
[/pre]

See http://boinc.berkeley.edu/wiki/Client_configuration for more debugging flags.

Cheers
HB
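
For anyone who prefers to script this step, here is a minimal sketch. It assumes the flags of interest are cpu_sched_debug (CPU scheduling decisions) and work_fetch_debug (work-fetch decisions), both of which are log flags listed in the BOINC client configuration documentation. The helper name write_cc_config and the temporary directory standing in for the BOINC data directory are illustrative only.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed flag choice: cpu_sched_debug and work_fetch_debug are log flags
# described in the BOINC client configuration docs.
CC_CONFIG = """<cc_config>
  <log_flags>
    <cpu_sched_debug>1</cpu_sched_debug>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>
"""


def write_cc_config(data_dir: Path) -> Path:
    """Write cc_config.xml into data_dir after checking it is well-formed XML."""
    ET.fromstring(CC_CONFIG)  # raises ParseError if the template is malformed
    target = data_dir / "cc_config.xml"
    target.write_text(CC_CONFIG, encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the real BOINC data directory here.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_cc_config(Path(tmp))
        log_flags = ET.parse(path).getroot().find("log_flags")
        print([flag.tag for flag in log_flags])
```

The client only picks the file up once it rereads its config files or is restarted.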

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4274
Credit: 245308202
RAC: 11885

The numeric value coding of "also_run_cpu" is confusing, partly because of the oddities of how BOINC handles project-specific preferences and defaults, and partly because it is incorrectly named.

After unchecking the checkbox beside "Run CPU versions of applications for which GPU versions are available" and saving your preferences on the web page, you have to update the Einstein@Home project (preferences) on your client. Did you do that? If not, try that.

BM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691540657
RAC: 241468

But anyway, setting this value to true or false would not explain why no CPU tasks get scheduled at all. It's just a setting that affects searches that support both GPU and CPU, like the BRP4 search. Neither the Gravitational Wave search nor the Fermi LAT gamma-ray pulsar search has a GPU version, so they should not be affected by this setting.

Cheers
HB

RvP_LaN
Joined: 15 Apr 07
Posts: 12
Credit: 8150236
RAC: 0

Hi,

Quote:
You could tell your BOINC client to write out log messages about the CPU scheduling decisions: create a file cc_config.xml in the BOINC data directory with this in it

Thanks for pointing this out. In fact, before posting to the Einstein forum, I did my preliminary searches on the BOINC forum. There is a lot of talk about the new scheduler... The first thing I did was to create this cc_config.xml file with sched debug options in it.

The advice that commonly comes back about the new scheduler is to leave it alone for a week or more (!!!) and let it decide "what's good for your host"!!! Huh, OK. But not so OK.

I'm the first to say that the "common user" should let the client crunch and shouldn't dive into expert-level configuration, and I don't watch my hosts crunching all week. Still, when I check from time to time, I'm happy to see that all projects with available WUs send them to my hosts. I also know there is always an adaptation period between major versions of the BOINC client, but in all the time I've been crunching I've never seen anything like this: absolutely no response from projects while there are jobs available...

Since my first post, nothing from Einstein, but Albert@Home sent me 5 WUs! They have the same config items in account_*.xml, so by comparing and filtering the debug output for Einstein and Albert, I think I get it. It was described on the BOINC forum.

The debug option gives this (filtered for just Einstein@Home):

[work_fetch] ------- start work fetch state -------
[work_fetch] target work buffer: 432000.00 + 8640.00 sec
[work_fetch] REC 128.046 priority -1.306516
[work_fetch] CPU: shortfall 0.00 nidle 0.00 saturated 472108.93 busy 0.00
[work_fetch] CPU: fetch share 0.071 rsc backoff (dt 0.00, inc 0.00)
[work_fetch] NVIDIA: shortfall 440640.00 nidle 1.00 saturated 0.00 busy 0.00
[work_fetch] NVIDIA: fetch share 0.000 rsc backoff (dt 48182.15, inc 76800.00)
[work_fetch] ------- end work fetch state -------


The numbers I can recognize are the "target work buffer", which matches my work reserve settings: 5 days x 24 h x 3600 s = 432000 s; 0.1 day reserve = 8640 s. Then there is the "saturated" amount of 472108.93, which slowly but surely goes down.
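
The conversion above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name work_buffer_seconds is made up for illustration and is not part of BOINC.

```python
SECONDS_PER_DAY = 24 * 3600  # 86400


def work_buffer_seconds(min_days, extra_days):
    """Convert the two work-buffer preferences from days to seconds."""
    return min_days * SECONDS_PER_DAY, extra_days * SECONDS_PER_DAY


min_s, extra_s = work_buffer_seconds(5, 0.1)
# Same layout as the "[work_fetch] target work buffer" log line.
print(f"target work buffer: {min_s:.2f} + {extra_s:.2f} sec")
# prints: target work buffer: 432000.00 + 8640.00 sec
```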

For Albert, as soon as this number goes below the "target work buffer", WUs are sent:

[work_fetch] ------- start work fetch state -------
[work_fetch] target work buffer: 432000.00 + 8640.00 sec
[work_fetch] REC 61.051 priority -0.628076
[work_fetch] CPU: shortfall 8854.01 nidle 0.00 saturated 431785.99 busy 109008.03
[work_fetch] CPU: fetch share 0.109 rsc backoff (dt 0.00, inc 0.00)
[work_fetch] NVIDIA: shortfall 440640.00 nidle 1.00 saturated 0.00 busy 0.00
[work_fetch] NVIDIA: fetch share 0.000 rsc backoff (dt 3818.89, inc 9600.00)
[work_fetch] ------- end work fetch state -------
[work_fetch] request: CPU (8854.01 sec, 0.00 inst) NVIDIA (0.00 sec, 0.00 inst)
Sending scheduler request: To fetch work.
Requesting new tasks for CPU
[work_fetch] Request work fetch: Backoff ended for EDGeS@Home
Scheduler request completed: got 1 new tasks
[work_fetch] Request work fetch: RPC complete
Started download of p2030.20111210.G37.27-00.02.S.b6s0g0.00000_1080.bin4
Started download of p2030.20111210.G37.27-00.02.S.b6s0g0.00000_1081.bin4
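
To compare the two snapshots side by side, the log lines can be parsed with a small script. This is a sketch that assumes the line layout shown in the logs above; the function name summarize and the clamped difference it prints are illustrative, not the client's own calculation.

```python
import re

TARGET_RE = re.compile(
    r"target work buffer: (?P<min>[\d.]+) \+ (?P<extra>[\d.]+) sec"
)
RSC_RE = re.compile(
    r"\[work_fetch\] (?P<rsc>\w+): shortfall (?P<shortfall>[\d.]+) "
    r"nidle (?P<nidle>[\d.]+) saturated (?P<saturated>[\d.]+) "
    r"busy (?P<busy>[\d.]+)"
)
FIELDS = ("shortfall", "nidle", "saturated", "busy")


def summarize(log_text):
    """Return (total target buffer in seconds, per-resource numbers)."""
    target = None
    rows = {}
    for line in log_text.splitlines():
        m = TARGET_RE.search(line)
        if m:
            target = float(m["min"]) + float(m["extra"])
            continue
        m = RSC_RE.search(line)
        if m:
            rows[m["rsc"]] = {k: float(m[k]) for k in FIELDS}
    return target, rows


# CPU lines copied from the two log excerpts in this thread.
EINSTEIN = """\
[work_fetch] target work buffer: 432000.00 + 8640.00 sec
[work_fetch] CPU: shortfall 0.00 nidle 0.00 saturated 472108.93 busy 0.00
"""
ALBERT = """\
[work_fetch] target work buffer: 432000.00 + 8640.00 sec
[work_fetch] CPU: shortfall 8854.01 nidle 0.00 saturated 431785.99 busy 109008.03
"""

for name, text in (("Einstein", EINSTEIN), ("Albert", ALBERT)):
    target, rows = summarize(text)
    cpu = rows["CPU"]
    gap = max(0.0, target - cpu["saturated"])
    print(f"{name}: target {target:.2f}, saturated {cpu['saturated']:.2f}, "
          f"target - saturated = {gap:.2f}, "
          f"logged shortfall {cpu['shortfall']:.2f}")
```

In these two snapshots the clamped difference between target and "saturated" equals the logged CPU shortfall (0.00 and 8854.01). That is an observation about these particular numbers, not a statement of how the client computes shortfall in general.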


Other projects continue to send more than twenty WUs at a time... So, given the settings for minimum work and minimum + reserve work, I'm wondering whether the scheduler estimates that, with projects having already sent 20 WUs, it is saturated with work GLOBALLY and doesn't ask for more jobs. Yeah OK, but what about the other projects?!?

I don't understand yet whether this new scheduler is as "fair" as the 6.x versions... Don't misinterpret my words: I'm sure that the BOINC developers do care about equality between projects. But with previous versions, it seemed that ALL projects were getting jobs within the work reserve settings. With this new one, it seems that EACH project's threshold for getting jobs is the minimum work reserve in days??? I would rather have understood it as the minimum work reserve for ALL projects together. Shouldn't I?

So does this mean it's better to set the minimum work reserve to ONE day?

Cheers

RvP_LaN
Joined: 15 Apr 07
Posts: 12
Credit: 8150236
RAC: 0

Update to my previous post...
In the end, I really don't get the new scheduler, the minimum work reserve and the "saturated" notion...

I wondered what would happen if I added more work reserve, so that the "target work buffer" value would exceed the "saturated" amount calculated for Einstein. So I set 5.5 days minimum work + 0.5 days reserve.

And in the sched debug log, a few seconds after modifying the settings, we can see that the scheduler decided to raise the "saturated" value for Einstein (from 473851 to 496902)... So Einstein stays in "backoff"... Meanwhile, raising the work reserve amount allowed some OTHER projects to get WUs.

[work_fetch] ------- start work fetch state -------
[work_fetch] target work buffer: 475200.00 + 43200.00 sec
[work_fetch] REC 123.142 priority -1.206303
[work_fetch] CPU: shortfall 82231.37 nidle 0.00 saturated 473851.35 busy 9087.08
[work_fetch] CPU: fetch share 0.105 rsc backoff (dt 0.00, inc 0.00)
[work_fetch] NVIDIA: shortfall 518400.00 nidle 1.00 saturated 0.00 busy 0.00
[work_fetch] NVIDIA: fetch share 0.000 rsc backoff (dt 42862.21, inc 86400.00)
[work_fetch] ------- end work fetch state -------
[work_fetch] Request work fetch: RPC complete
[work_fetch] work fetch start
[work_fetch] ------- start work fetch state -------
[work_fetch] target work buffer: 475200.00 + 43200.00 sec
[work_fetch] REC 123.142 priority -1.174195
[work_fetch] CPU: shortfall 47428.46 nidle 0.00 saturated 496902.03 busy 8973.89
[work_fetch] CPU: fetch share 0.117 rsc backoff (dt 0.00, inc 0.00)
[work_fetch] NVIDIA: shortfall 518400.00 nidle 1.00 saturated 0.00 busy 0.00
[work_fetch] NVIDIA: fetch share 0.000 rsc backoff (dt 42856.11, inc 86400.00)
[work_fetch] ------- end work fetch state -------


And I still have this message from Einstein server:
"no work sent, see scheduler log on ..." (on several hosts with client 7.0.2x)

Any ideas are welcome!
Cheers

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
And I still have this message from Einstein server:
"no work sent, see scheduler log on ..." (on several hosts with client 7.0.2x)


And what does the scheduler log tell you? (We can't see it, since your computers are hidden.)

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

RvP_LaN
Joined: 15 Apr 07
Posts: 12
Credit: 8150236
RAC: 0

Hi,

Quote:
And what does the scheduler log tell you?


I copied extracts into the first message of this thread.
They come from the link given by the Einstein server in the standard BOINC log.
But here is the full version:

2012-05-12 03:57:18.8209 [PID=13987]   Request: [USER#xxxxx] [HOST#4279374] [IP xxx.xxx.xxx.22] client 7.0.27
2012-05-12 03:57:18.8311 [PID=13987]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2012-05-12 03:57:18.8311 [PID=13987]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2012-05-12 03:57:18.8311 [PID=13987]    [send] Not using matchmaker scheduling; Not using EDF sim
2012-05-12 03:57:18.8311 [PID=13987]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2012-05-12 03:57:18.8311 [PID=13987]    [send] CUDA: req 129600.00 sec, 0.38 instances; est delay 0.00
2012-05-12 03:57:18.8311 [PID=13987]    [send] work_req_seconds: 0.00 secs
2012-05-12 03:57:18.8311 [PID=13987]    [send] available disk 1.52 GB, work_buf_min 86400
2012-05-12 03:57:18.8312 [PID=13987]    [send] active_frac 0.999941 on_frac 0.996915 DCF 1.000000
2012-05-12 03:57:18.8318 [PID=13987]    [send] [HOST#4279374] is reliable
2012-05-12 03:57:18.8318 [PID=13987]    [send] set_trust: random choice for error rate 0.008100: no
2012-05-12 03:57:18.9497 [PID=13987]    [version] Checking plan class 'SSE2'
2012-05-12 03:57:18.9501 [PID=13987]    [version] reading plan classes from file '../plan_class_spec.xml'
2012-05-12 03:57:18.9501 [PID=13987]    [version] Don't need CPU jobs, skipping version 113 for einstein_S6LV1 (SSE2)
2012-05-12 03:57:18.9501 [PID=13987]    [version] no app version available: APP#20 (einstein_S6LV1) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9625 [PID=13987]    [version] Checking plan class 'BRP4SSE'
2012-05-12 03:57:18.9625 [PID=13987]    [version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
2012-05-12 03:57:18.9625 [PID=13987]    [version] Don't need CPU jobs, skipping version 122 for einsteinbinary_BRP4 (BRP4SSE)
2012-05-12 03:57:18.9626 [PID=13987]    [version] Checking plan class 'BRP4cuda32'
2012-05-12 03:57:18.9626 [PID=13987]    [version] parsed project prefs setting 'gpu_util_brp' : true : 1.000000
2012-05-12 03:57:18.9626 [PID=13987]    [version] GPU RAM required max: 314572800.000000, supplied: 268107776.000000
2012-05-12 03:57:18.9626 [PID=13987]    [version] Checking plan class 'BRP4cuda32nv301'
2012-05-12 03:57:18.9626 [PID=13987]    [version] parsed project prefs setting 'gpu_util_brp' : true : 1.000000
2012-05-12 03:57:18.9626 [PID=13987]    [version] driver version required min: -30100, supplied: 28558
2012-05-12 03:57:18.9626 [PID=13987]    [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9626 [PID=13987]    [version] Don't need CPU jobs, skipping version 23 for hsgamma_FGRP1 ()
2012-05-12 03:57:18.9626 [PID=13987]    [version] no app version available: APP#17 (hsgamma_FGRP1) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9661 [PID=13987]    [send] [HOST#4279374] is looking for work from a non-preferred application
2012-05-12 03:57:18.9715 [PID=13987] [debug]   [HOST#4279374] MSG(high) No work sent
2012-05-12 03:57:18.9716 [PID=13987] [debug]   [HOST#4279374] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/4279/4279374
2012-05-12 03:57:18.9716 [PID=13987]    Sending reply to [HOST#4279374]: 0 results, delay req 60.00
2012-05-12 03:57:18.9719 [PID=13987]    Scheduler ran 0.158 seconds


I still wonder why there's "cpu req(uest) 0s" and "don't need CPU jobs".
That's why I tried to explore the sched debug output from the client.
I really feel that BOINC's scheduler doesn't ask for CPU jobs...
I even tried disabling the GPU in cc_config.xml to be sure that it falls back to CPU jobs. But still no luck...

Following different leads, I tested other settings and set my work buffer to min=0.20d and reserve=0.20d. No more high-priority jobs, but still no jobs from projects which usually send WUs very regularly.

Cheers
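
The two "[send] ... req" lines in the scheduler log above record what the client asked for. A minimal sketch for pulling them out, assuming the line layout shown in that log (the function name requested_resources is illustrative only):

```python
import re

REQ_RE = re.compile(
    r"\[send\] (?P<rsc>\w+): req (?P<sec>[\d.]+) sec, "
    r"(?P<inst>[\d.]+) instances"
)


def requested_resources(log_text):
    """Map each resource to the (seconds, instances) the client requested."""
    requests = {}
    for line in log_text.splitlines():
        m = REQ_RE.search(line)
        if m:
            requests[m["rsc"]] = (float(m["sec"]), float(m["inst"]))
    return requests


# The two request lines copied from the scheduler log in this thread.
SAMPLE = """\
2012-05-12 03:57:18.8311 [PID=13987]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2012-05-12 03:57:18.8311 [PID=13987]    [send] CUDA: req 129600.00 sec, 0.38 instances; est delay 0.00
"""

for rsc, (sec, inst) in requested_resources(SAMPLE).items():
    status = "requested" if sec > 0 or inst > 0 else "not requested"
    print(f"{rsc}: {sec:.2f} s, {inst:.2f} instances -> {status}")
# prints:
# CPU: 0.00 s, 0.00 instances -> not requested
# CUDA: 129600.00 s, 0.38 instances -> requested
```

Read this way, the "Don't need CPU jobs" lines appear to follow from the zero CPU request in the same log, which would place the decision on the client side rather than the server side. That reading is an interpretation of the log, not something the log states outright.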

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2694028
RAC: 0

Quote:

Hi,

Quote:
And what does the scheduler log tell you?

I copied extracts into the first message of this thread.
They come from the link given by the Einstein server in the standard BOINC log.
But here is the full version:

Quote:
2012-05-12 03:57:18.8209 [PID=13987] Request: [USER#xxxxx] [HOST#4279374] [IP xxx.xxx.xxx.22] client 7.0.27
2012-05-12 03:57:18.8311 [PID=13987] [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2012-05-12 03:57:18.8311 [PID=13987] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2012-05-12 03:57:18.8311 [PID=13987] [send] Not using matchmaker scheduling; Not using EDF sim
2012-05-12 03:57:18.8311 [PID=13987] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2012-05-12 03:57:18.8311 [PID=13987] [send] CUDA: req 129600.00 sec, 0.38 instances; est delay 0.00
2012-05-12 03:57:18.8311 [PID=13987] [send] work_req_seconds: 0.00 secs
2012-05-12 03:57:18.8311 [PID=13987] [send] available disk 1.52 GB, work_buf_min 86400
2012-05-12 03:57:18.8312 [PID=13987] [send] active_frac 0.999941 on_frac 0.996915 DCF 1.000000
2012-05-12 03:57:18.8318 [PID=13987] [send] [HOST#4279374] is reliable
2012-05-12 03:57:18.8318 [PID=13987] [send] set_trust: random choice for error rate 0.008100: no
2012-05-12 03:57:18.9497 [PID=13987] [version] Checking plan class 'SSE2'
2012-05-12 03:57:18.9501 [PID=13987] [version] reading plan classes from file '../plan_class_spec.xml'
2012-05-12 03:57:18.9501 [PID=13987] [version] Don't need CPU jobs, skipping version 113 for einstein_S6LV1 (SSE2)
2012-05-12 03:57:18.9501 [PID=13987] [version] no app version available: APP#20 (einstein_S6LV1) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9625 [PID=13987] [version] Checking plan class 'BRP4SSE'
2012-05-12 03:57:18.9625 [PID=13987] [version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
2012-05-12 03:57:18.9625 [PID=13987] [version] Don't need CPU jobs, skipping version 122 for einsteinbinary_BRP4 (BRP4SSE)
2012-05-12 03:57:18.9626 [PID=13987] [version] Checking plan class 'BRP4cuda32'
2012-05-12 03:57:18.9626 [PID=13987] [version] parsed project prefs setting 'gpu_util_brp' : true : 1.000000
2012-05-12 03:57:18.9626 [PID=13987] [version] GPU RAM required max: 314572800.000000, supplied: 268107776.000000
2012-05-12 03:57:18.9626 [PID=13987] [version] Checking plan class 'BRP4cuda32nv301'
2012-05-12 03:57:18.9626 [PID=13987] [version] parsed project prefs setting 'gpu_util_brp' : true : 1.000000
2012-05-12 03:57:18.9626 [PID=13987] [version] driver version required min: -30100, supplied: 28558
2012-05-12 03:57:18.9626 [PID=13987] [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9626 [PID=13987] [version] Don't need CPU jobs, skipping version 23 for hsgamma_FGRP1 ()
2012-05-12 03:57:18.9626 [PID=13987] [version] no app version available: APP#17 (hsgamma_FGRP1) PLATFORM#2 (windows_intelx86) min_version 0
2012-05-12 03:57:18.9661 [PID=13987] [send] [HOST#4279374] is looking for work from a non-preferred application
2012-05-12 03:57:18.9715 [PID=13987] [debug] [HOST#4279374] MSG(high) No work sent
2012-05-12 03:57:18.9716 [PID=13987] [debug] [HOST#4279374] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/4279/4279374
2012-05-12 03:57:18.9716 [PID=13987] Sending reply to [HOST#4279374]: 0 results, delay req 60.00
2012-05-12 03:57:18.9719 [PID=13987] Scheduler ran 0.158 seconds

I still wonder why there's "cpu req(uest) 0s" and "don't need CPU jobs".
That's why I tried to explore the sched debug output from the client.
I really feel that BOINC's scheduler doesn't ask for CPU jobs...
I even tried disabling the GPU in cc_config.xml to be sure that it falls back to CPU jobs. But still no luck...

Following different leads, I tested other settings and set my work buffer to min=0.20d and reserve=0.20d. No more high-priority jobs, but still no jobs from projects which usually send WUs very regularly.

Cheers


Your 256 MB 8600GT on host 4279374 doesn't have enough memory for the BRP4cuda32 plan class, and its drivers aren't new enough for the BRP4cuda32nv301 plan class. Even if they were, you'd then find it doesn't have enough memory for that app either.

Is your cache full of CPU work from other projects?

Claggy
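
The two GPU rejections can be reproduced from the numbers in the scheduler log. This is a sketch only: the PlanClassReq type and rejection_reasons function are hypothetical, the driver minimum is taken as 30100 (the log prints it as -30100), and the memory requirement of BRP4cuda32nv301 is left unset because the log does not show it.

```python
from dataclasses import dataclass
from typing import List, Optional

MIB = 2 ** 20


@dataclass
class PlanClassReq:
    """Hypothetical container for the limits printed in the scheduler log."""
    name: str
    min_gpu_ram_bytes: Optional[float] = None
    min_driver_version: Optional[int] = None


def rejection_reasons(req, gpu_ram_bytes, driver_version) -> List[str]:
    """List the requirements of req that the host fails to meet."""
    reasons = []
    if req.min_gpu_ram_bytes is not None and gpu_ram_bytes < req.min_gpu_ram_bytes:
        reasons.append(
            f"GPU RAM {gpu_ram_bytes / MIB:.1f} MiB < "
            f"required {req.min_gpu_ram_bytes / MIB:.1f} MiB"
        )
    if req.min_driver_version is not None and driver_version < req.min_driver_version:
        reasons.append(
            f"driver {driver_version} < required {req.min_driver_version}"
        )
    return reasons


# Values copied from the scheduler log for host 4279374.
HOST_GPU_RAM = 268107776.0
HOST_DRIVER = 28558

PLAN_CLASSES = [
    PlanClassReq("BRP4cuda32", min_gpu_ram_bytes=314572800.0),
    # RAM requirement not shown in the log for this class, so it is left unset.
    PlanClassReq("BRP4cuda32nv301", min_driver_version=30100),
]

for req in PLAN_CLASSES:
    for reason in rejection_reasons(req, HOST_GPU_RAM, HOST_DRIVER):
        print(f"{req.name}: {reason}")
```

With these inputs each plan class fails exactly one check: about 255.7 MiB supplied against 300.0 MiB required for BRP4cuda32, and driver 28558 against 30100 for BRP4cuda32nv301.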

5pot
Joined: 8 Apr 12
Posts: 107
Credit: 7577619
RAC: 0

Have you tried deselecting GPU tasks under Einstein preferences? Maybe this would try and "force" BOINC to grab CPU tasks for this project.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
Maybe this would try and "force" BOINC to grab CPU tasks for this project.


Not if the client isn't asking for them (as is the case).

Regards,
Gundolf
[edit]That works if the client asks for both but the feeder doesn't have enough tasks ready to satisfy the request, as it often happens over at SETI.[/edit]

Computers aren't everything in life. (Just kidding)
