Application selection

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245219476
RAC: 12969

In accordance with what has been discussed here, I will do the following on Wednesday (or maybe Thursday):

- disable or remove "If no work for selected applications is available, accept work from other applications?" from the project settings

- run a DB query that removes the corresponding tag from people's project preferences (it never really worked as it should)

- run another query that adds the three new applications (BRP5, FGRP2 and S6BucketLVE) to the settings of all the people that ever made a manual selection of Apps. Sorry for the delay - people who already opted out intentionally on FGRP2 or S6BucketLVE will have to do this again.
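The actual queries are not shown in this thread. As a rough sketch of what the steps above amount to, assuming project preferences are stored as an XML fragment with one tag per selected app (the tag names `allow_non_preferred_apps` and `app_id` and the numeric IDs below are assumptions for illustration, not confirmed by the thread):

```python
import re
from typing import Iterable

# Assumed tag names; the real preference schema is not shown in the thread.
FALLBACK_TAG = "allow_non_preferred_apps"
APP_TAG = "app_id"


def update_prefs(prefs_xml: str, new_app_ids: Iterable[int]) -> str:
    """Drop the fallback tag, then add new apps to existing manual selections."""
    # Step 1: remove the "accept work from other applications" tag, if present.
    prefs_xml = re.sub(
        rf"\s*<{FALLBACK_TAG}>.*?</{FALLBACK_TAG}>", "", prefs_xml, flags=re.S
    )

    # Step 2: only users who made a manual selection have any app tags.
    selected = {int(n) for n in re.findall(rf"<{APP_TAG}>(\d+)</{APP_TAG}>", prefs_xml)}
    if not selected:
        return prefs_xml

    # Step 3: append the new apps right after the last existing app tag.
    additions = "".join(
        f"\n<{APP_TAG}>{app_id}</{APP_TAG}>"
        for app_id in new_app_ids
        if app_id not in selected
    )
    close = f"</{APP_TAG}>"
    pos = prefs_xml.rfind(close) + len(close)
    return prefs_xml[:pos] + additions + prefs_xml[pos:]


# Placeholder preferences and app IDs, for illustration only.
before = (
    "<project_specific>\n"
    "<allow_non_preferred_apps>1</allow_non_preferred_apps>\n"
    "<app_id>19</app_id>\n"
    "</project_specific>"
)
after = update_prefs(before, [22, 23, 24])
print(after)
```

Under this assumed layout a deselected app is simply absent from the list, so the update cannot tell "opted out" apart from "never offered". That would explain why earlier opt-outs have to be repeated.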

An explicit opt-out from having new Apps added as they are published will not be implemented. Whenever a new App is published, it will be added to the selection.

BM

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Thanks, it's good to know that we will need to check the settings in the next few days.

Quote:
- disable or remove "If no work for selected applications is available, accept work from other applications?" from the project settings


Question: How will this work?
I guess it is going to behave as if it were set to no... due to the mixed scheduling thing, but I'm not sure if I've understood it well...

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245219476
RAC: 12969

Quote:

Thanks, it's good to know that we will need to check the settings in the next few days.

Quote:
- disable or remove "If no work for selected applications is available, accept work from other applications?" from the project settings

Question: How will this work?
I guess it is going to behave as if it were set to no... due to the mixed scheduling thing, but I'm not sure if I've understood it well...

When set to "yes", a couple of things happened, most of which were unexpected and confusing.

Thus effectively it will be set to "no" without the possibility for the participant to change it.

Actually, I just did exactly that (removed the settings option and also scratched the setting from the DB).

BM

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Thanks! I've got it, now.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245219476
RAC: 12969

Quote:
- run another query that adds the three new applications (BRP5, FGRP2 and S6BucketLVE) to the settings of all the people that ever made a manual selection of Apps. Sorry for the delay - people who already opted out intentionally on FGRP2 or S6BucketLVE will have to do this again.

Done.

BM

|MatMan|
Joined: 22 Jan 05
Posts: 24
Credit: 249005261
RAC: 0

While you are working on the app selection, how about the ability to choose per app which resource (CPU, AMD-GPU, nVidia-GPU, Intel-GPU) it is allowed to use? The motivation behind this is that, for example, the AMD-GPU app does not run very well if the CPU load is high, which is not the case for the nVidia-GPU app. So it can be more efficient to run only certain types of apps on certain resources.
I know this is a special case as I'm mixing GPUs (nVidia and AMD) in some of my systems which is not common for the average user (but it might be if we get an Intel-GPU app). And I'm also well aware that this is most probably too much to ask :D

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109973503676
RAC: 29775493

Quote:
Quote:
- run another query that adds the three new applications (BRP5, FGRP2 and S6BucketLVE) to the settings of all the people that ever made a manual selection of Apps. Sorry for the delay - people who already opted out intentionally on FGRP2 or S6BucketLVE will have to do this again.

Done.

BM


I got caught by this because, for ongoing personal reasons, I wasn't able to pay sufficient attention to understand the full implications of what was about to happen.

I have monthly bandwidth problems that have been largely alleviated by the much appreciated compression of the BRP4 data files. That allowed me to add quite a few more GPUs without going too horrendously over my limits :-). I also try to keep a lid on monthly downloads by using one of the available 'venues' to 'exclude' S6BucketLVE from the CPUs of all my hosts that have capable GPUs, opting for the 'less download intensive' FGRP2 tasks instead.

Of course, being 'otherwise occupied', I didn't really pick up immediately on the full significance of "people who already opted out intentionally on FGRP2 or S6BucketLVE will have to do this again."

The upshot of this was that LVE was turned back on for all the hosts in the 'venue' where it had been deselected. Consequently, large numbers of copies of the LVE executable and large numbers of sets of large data files were being downloaded. Fortunately, I had preferences set to allow network access only in my ISP's 'off-peak' period where I have a much larger monthly allowance. This happens to be in the dead of the night here so I only got to see it the next morning.

So, having worked out what was happening, I changed the preferences for the venue in question to once again exclude LVE. That seemed to work OK as the hosts I looked at seemed to be getting only FGRP2 tasks from that point onwards. Then the 'off-peak' period ended and there were no further work requests during the 'peak' period when network access was again disabled.

As I compose this, I'm now in the next 'off-peak' period and new work is being requested to fill caches. There is no problem with BRP4 tasks but I'm not getting any CPU work at all. Here are the app selection preferences for this particular 'venue'.

Run only the selected applications
Binary Radio Pulsar Search (Arecibo): yes
Binary Radio Pulsar Search (Perseus Arm Survey): yes
Gravitational Wave S6 LineVeto search (extended): no
Gamma-ray pulsar search #2: yes

Here is the full scheduler log for a work request for this host which is a member of the 'venue' in question.

2013-03-28 21:42:17.2055 [PID=24852]   Request: [USER#xxxxx] [HOST#1598090] [IP xxx.xxx.xxx.214] client 6.12.43
2013-03-28 21:42:17.2169 [PID=24852]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2013-03-28 21:42:17.2169 [PID=24852]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2013-03-28 21:42:17.2169 [PID=24852]    [send] Not using matchmaker scheduling; Not using EDF sim
2013-03-28 21:42:17.2169 [PID=24852]    [send] CPU: req 295770.58 sec, 0.00 instances; est delay 0.00
2013-03-28 21:42:17.2170 [PID=24852]    [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2013-03-28 21:42:17.2170 [PID=24852]    [send] work_req_seconds: 295770.58 secs
2013-03-28 21:42:17.2170 [PID=24852]    [send] available disk 7.90 GB, work_buf_min 864
2013-03-28 21:42:17.2170 [PID=24852]    [send] active_frac 0.999964 on_frac 0.994101 DCF 1.041759
2013-03-28 21:42:17.2740 [PID=24852]    [send] [HOST#1598090] is reliable
2013-03-28 21:42:17.2741 [PID=24852]    [send] set_trust: random choice for error rate 0.000010: yes
2013-03-28 21:42:17.4621 [PID=24852]    [version]: App 'einstein_S6BucketLVE' (22) not selected
2013-03-28 21:42:17.4751 [PID=24852]    [version] Checking plan class 'BRP4cuda32nv270'
2013-03-28 21:42:17.4756 [PID=24852]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2013-03-28 21:42:17.4756 [PID=24852]    [version] parsed project prefs setting 'gpu_util_brp': 0.500000
2013-03-28 21:42:17.4756 [PID=24852]    [version] plan class ok
2013-03-28 21:42:17.4756 [PID=24852]    [version] Don't need CUDA jobs, skipping version 133 for einsteinbinary_BRP4 (BRP4cuda32nv270)
2013-03-28 21:42:17.4756 [PID=24852]    [version] Checking plan class 'opencl-ati'
2013-03-28 21:42:17.4756 [PID=24852]    [version] parsed project prefs setting 'gpu_util_brp': 0.500000
2013-03-28 21:42:17.4757 [PID=24852]    [version] No ATI devices found
2013-03-28 21:42:17.4757 [PID=24852]    [version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#1 (i686-pc-linux-gnu) min_version 0
2013-03-28 21:42:17.4819 [PID=24852] [debug]   [HOST#1598090] MSG(high) No work sent
2013-03-28 21:42:17.4820 [PID=24852] [debug]   [HOST#1598090] MSG(high) No work is available for Binary Radio Pulsar Search (Perseus Arm Survey)
2013-03-28 21:42:17.4820 [PID=24852] [debug]   [HOST#1598090] MSG(high) No work is available for Gamma-ray pulsar search #2
2013-03-28 21:42:17.4820 [PID=24852] [debug]   [HOST#1598090] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/1598/1598090
2013-03-28 21:42:17.4820 [PID=24852]    Sending reply to [HOST#1598090]: 0 results, delay req 60.00
2013-03-28 21:42:17.4823 [PID=24852]    Scheduler ran 0.284 seconds

I'm certainly no expert in interpreting scheduler log files but the above clearly shows that there is a shortfall of CPU work (295770.58 sec) and that LVE is excluded as per prefs. It doesn't make any comment about FGRP2 being deselected so presumably the scheduler knows that it is selected.
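For anyone wanting to check several hosts the same way, here is a minimal sketch that pulls those three pieces of information (CPU seconds requested, apps reported as not selected, and the high-priority messages) out of a log in the format shown above. The function name and the regular expressions are my own guesses based only on the lines in this one log, not on any documented scheduler format.

```python
import re


def summarize_sched_log(log_text):
    """Extract the CPU request, deselected apps and MSG(high) lines from a log."""
    summary = {"cpu_req_sec": None, "not_selected": [], "messages": []}
    for line in log_text.splitlines():
        # e.g. "[send] CPU: req 295770.58 sec, 0.00 instances; est delay 0.00"
        m = re.search(r"\[send\] CPU: req ([\d.]+) sec", line)
        if m:
            summary["cpu_req_sec"] = float(m.group(1))
        # e.g. "[version]: App 'einstein_S6BucketLVE' (22) not selected"
        m = re.search(r"App '([^']+)' \(\d+\) not selected", line)
        if m:
            summary["not_selected"].append(m.group(1))
        # e.g. "MSG(high) No work sent"
        m = re.search(r"MSG\(high\) (.+)$", line)
        if m:
            summary["messages"].append(m.group(1).strip())
    return summary


# Three lines copied from the log above.
sample = """\
2013-03-28 21:42:17.2169 [PID=24852]    [send] CPU: req 295770.58 sec, 0.00 instances; est delay 0.00
2013-03-28 21:42:17.4621 [PID=24852]    [version]: App 'einstein_S6BucketLVE' (22) not selected
2013-03-28 21:42:17.4820 [PID=24852] [debug]   [HOST#1598090] MSG(high) No work is available for Gamma-ray pulsar search #2
"""
summary = summarize_sched_log(sample)
print(summary)
```

A host showing a non-zero CPU request, only LVE in the "not selected" list, and a "No work is available" message for FGRP2 would match the symptom described here.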

So why wasn't the work request filled with FGRP2 tasks?

The log suggests that there was no work available but the server status page suggests quite differently. There were plenty of available tasks at the time.

I've checked a number of GPU hosts in this venue and all are having the same problem. Seems like there might be a bug somewhere....

EDIT:

On thinking about this further, it might be useful for me to document certain times when particular behaviour was recorded. For instance, the time when the first LVE tasks were downloaded which should correspond to just after the time when the above quoted database query (that reset the LVE app selection to 'yes') was run. On one host I looked at, an FGRP2 task was received at 15:46 UTC on 27-03-13. An LVE task was then received at 16:20 UTC so I presume the query was run somewhere in that 34 minute interval. You posted your "Done." message at 16:17 so everything fits very nicely.

Another LVE task was received at 17:18 UTC. After that, there were a series of 19 FGRP2 tasks, the first at 22:24 UTC and the last at 10:09 UTC on 28-03-13. I had noticed the problem with receiving LVE tasks and had worked out that I needed to turn off the selection of the LVE app around 22:00 UTC. This fits quite nicely with the resumption of FGRP2 tasks at 22:24. The next 19 FGRP2 tasks show that things were operating normally until at least 10:09 UTC, which is almost 12 hours later. I've made no changes on this particular host and no preference changes on the website which would affect it, subsequent to the disabling of LVE tasks at around 22:00 UTC on 27-03-13. I'm guessing that my failure to get any further FGRP2 tasks must have been caused by something that happened server-side sometime shortly after 10:09 UTC on 28-03-13.
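As a quick check of the intervals quoted above, a small sketch using only the timestamps already mentioned (the helper name `utc` is just for illustration, and the "Done." post time is assumed to be UTC like the others):

```python
from datetime import datetime, timedelta


def utc(stamp):
    """Parse a 'DD-MM-YY HH:MM' timestamp as used in this post."""
    return datetime.strptime(stamp, "%d-%m-%y %H:%M")


# Window in which the database query must have run.
last_fgrp2_before_query = utc("27-03-13 15:46")
done_post = utc("27-03-13 16:17")
first_lve = utc("27-03-13 16:20")
query_window = first_lve - last_fgrp2_before_query
post_in_window = last_fgrp2_before_query <= done_post <= first_lve

# Period of normal FGRP2 delivery after LVE was deselected again.
fgrp2_resumed = utc("27-03-13 22:24")
fgrp2_last = utc("28-03-13 10:09")
normal_period = fgrp2_last - fgrp2_resumed
almost_12h = normal_period < timedelta(hours=12)

print(query_window, post_in_window, normal_period, almost_12h)
# prints: 0:34:00 True 11:45:00 True
```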

At the moment, all affected hosts have sufficient tasks cached so that none will actually run out of the desired CPU work for at least another 24-48 hours. The bulk have 3 day caches and some have even more. A few have less.

I know it's Easter but I'm hoping it might be something 'simple' to fix and if that could happen, it certainly would be appreciated.

Cheers,
Gary.

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 162

Quote:

An explicit opt-out from having new Apps added as they are published will not be implemented. Whenever a new App is published, it will be added to the selection.

BM

As an enhancement, could you send a message to the BOINC client Notices tab whenever you add a new App, so that people who are avoiding an App for some reason are notified that they need to update their preferences?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109973503676
RAC: 29775493

Quote:
I know it's Easter but I'm hoping it might be something 'simple' to fix and if that could happen, it certainly would be appreciated.

Hi Bernd,

Thanks very much for fixing this - whatever it was :-).

I saw you post a message in the 'News' forum and I wondered if you might visit this thread as well. A short time later, I noticed a 'Server shut down for maintenance' message so I figured something might be about to happen! Then the FGRP2 tasks started flooding in! It's a bit hard to imagine that was just a coincidence :-).

Whatever you did, it was very much appreciated!

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109973503676
RAC: 29775493

Hi Bernd,

Well, it seems that the weirdness continues.

I did observe quite a few machines getting FGRP2 tasks at the time I last posted. I've recently had a look at a couple of hosts and they are back to asking for FGRP2 work but not getting any. A quick look at the server status page shows zero FGRP2 tasks ready to send and the work generator not running.

Then I noticed a host actually get one task. It was a resend (_2) rather than a freshly generated new task, which explains how it could arrive while the work generator was not running.

The status page has just refreshed (04:45 UTC) but still no FGRP2 work. If this is likely to continue until next Tuesday, please advise so I can start switching hosts to LVE in an orderly manner before they run out.

Cheers,
Gary.
