Fermi LAT Gamma-ray pulsar search #3 "FGRP3"

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110006939332
RAC: 24657537

In the message two before yours, A.M. noted that each task used about 600MB. I guess that's why you had problems running 5x (factor 0.2).

I note that you have no FGRP3 CPU tasks on that host. In this earlier message, Bernd said

Quote:
Currently FGRP is not at all affected by the 'Run CPU versions of applications for which GPU versions are available' setting, which is a bug we'll hope to fix in the next app version release.


I'm guessing that you have used that setting to exclude the FGRP3 CPU tasks which must mean the bug he referred to is now fixed. Please let us know if you are doing it some other way.

I have 10 hosts with 2GB HD7850 GPUs which have been running BRP5 tasks 4x and FGRP3 CPU tasks on the available CPU cores. They have all started getting FGRP3 GPU tasks which will take another day or two to reach the front of the queue. I'm actually using app_config.xml on these hosts to control GPU utilization so I've edited this file on each one to add similar code for FGRP3. I've found that I can suspend two of the running BRP5s and get two FGRP3 tasks to take their place. I assumed it might work as 2x600 + 2x300 is still less than 2GB. So far it has worked on multiple machines without any problem. Obviously running 4x is going to be problematic when a whole batch of FGRP3 tasks rises to the top of the queue. I'm certainly not going to try to micromanage each host and I don't want to run less than 4x so I think my only option is to disable FGRP3 completely on them.

I also have other hosts with 1GB cards that are running BRP5 2x and 3x. At the moment they are excluded from FGRP3 GPU tasks because of the 2GB limit but I'll have the same problem when Bernd lowers the memory limit from 2GB to 1GB next week.

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

@Gary
Have you thought about using the <max_concurrent> option in the app_config.xml to limit the number of FGRP3 tasks?
As long as there are other GPU tasks in the queue, that should still let you run x4 in total.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110006939332
RAC: 24657537

I already had a <max_concurrent> set to 4 (temporarily, anyway) while I was trying to work out what to do. I had no intention of allowing more than two to start but I set it to 4 in the (probably mistaken) belief that I should set it the same as I was using for BRP5. I somehow thought that to get 4 tasks in total running I would have to set both to 4.

But now that you've prompted me to actually think about it, if I set <max_concurrent> to 1 for FGRP3 and leave <gpu_usage> on 0.25 and <cpu_usage> on 0.5 (where they currently are), and leave the BRP5 options at 4, 0.25 and 0.5 respectively, I should always have 4 tasks running, one of which (but not more) could be a FGRP3 task. Is that how it will work? I've never had different GPU task types running before so I'm not quite sure what to expect.
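
For readers following along, here is a minimal sketch of an app_config.xml matching the values described above. It is an illustration rather than the actual file from these hosts: the BRP5 app name (einsteinbinary_BRP5) is an assumption not stated in this thread, while hsgamma_FGRP3 is the name used later in the thread.
[pre]
<app_config>
   <app>
      <!-- assumed app name for BRP5; check the name your client reports -->
      <name>einsteinbinary_BRP5</name>
      <!-- at most 4 BRP5 tasks at once -->
      <max_concurrent>4</max_concurrent>
      <gpu_versions>
         <!-- 0.25 of a GPU per task, i.e. 4x -->
         <gpu_usage>0.25</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>hsgamma_FGRP3</name>
      <!-- at most 1 FGRP3 task at once -->
      <max_concurrent>1</max_concurrent>
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
[/pre]
With both apps at 0.25 GPU per task, the client can still fill all four slots, and the FGRP3 limit of 1 decides how many of those slots FGRP3 may take.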

I'll go try some experiments now so thanks very much for the tip.

EDIT: That works quite nicely! I suspended the older unstarted BRP5 tasks, leaving a group of FGRP3 followed by more BRP5. There were 4 running BRP5. I started suspending those one at a time. The first suspension allowed a FGRP3 to start. I watched it crunch through several checkpoints. I suspended the second BRP5. This time a BRP5 task that was newer than other FGRP3 tasks that were available was started, so the <max_concurrent> of 1 was doing its job.

There was a slight gotcha. When the FGRP3 GPU task started, a FGRP3 CPU task stopped. I set <max_concurrent> to 2 and the CPU task restarted. I wonder if this could allow 2 GPU tasks to run and stop a running CPU task. I'd really like to avoid having 2 FGRP3 GPU tasks running. That really does start to impact on the BRP5 tasks. Maybe I'll end up doing what Jeroen has done and get rid of FGRP3 CPU tasks completely on these hosts.

I'm sure glad that FGRP3 has been limited to 2GB cards for the moment :-).

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Ah, forgot that the <max_concurrent> also affects CPU tasks.

To really take control you would have to run/upgrade to a quite recent (possibly beta, I'm not quite sure) version of Boinc and use the <app_version> and/or <plan_class> options in app_config.xml. That should allow for different settings for CPU and GPU tasks. I haven't tested that so can't really help with the how, I just know the option is there with newer versions of Boinc.

For anyone interested in app_config.xml, refer to http://boinc.berkeley.edu/wiki/Client_configuration for documentation on its use; it's at the end of the page.
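
As a rough, untested sketch of what such a per-version entry might look like: the client configuration documentation describes an <app_version> element that takes an app name, a plan class, and CPU/GPU usage figures. The plan class below is a placeholder, not a real value; it would need to be replaced with the plan class of the FGRP3 GPU app version, which is not given in this thread.
[pre]
<app_config>
   <app_version>
      <app_name>hsgamma_FGRP3</app_name>
      <!-- placeholder only: substitute the real plan class of the GPU app version -->
      <plan_class>GPU_PLAN_CLASS</plan_class>
      <!-- CPU share reserved per task of this version -->
      <avg_ncpus>0.5</avg_ncpus>
      <!-- GPU share per task: 0.25 means up to 4 tasks per GPU -->
      <ngpus>0.25</ngpus>
   </app_version>
</app_config>
[/pre]
Note that this sets resource usage for one app version rather than a task-count limit, so it would complement <max_concurrent> rather than replace it.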

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2773079709
RAC: 880851

Quote:

Ah, forgot that the <max_concurrent> also affects CPU tasks.

To really take control you would have to run/upgrade to a quite recent (possibly beta, I'm not quite sure) version of Boinc and use the <app_version> and/or <plan_class> options in app_config.xml. That should allow for different settings for CPU and GPU tasks. I haven't tested that so can't really help with the how, I just know the option is there with newer versions of Boinc.

For anyone interested in app_config.xml, refer to http://boinc.berkeley.edu/wiki/Client_configuration for documentation on its use; it's at the end of the page.


The extra controls for app_config.xml are available in the newly-recommended v7.2.39, though I'd still treat it as beta: I think there are a few more fixes in the pipeline, after they've all finished playing with their Android cellphones.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110006939332
RAC: 24657537

Quote:
Quote:
To really take control you would have to run/upgrade to a quite recent (possibly beta, I'm not quite sure) version of Boinc and use the <app_version> and/or <plan_class> options in app_config.xml...

The extra controls for app_config.xml are available in the newly-recommended v7.2.39, though I'd still treat it as beta: I think there are a few more fixes in the pipeline, after they've all finished playing with their Android cellphones.


Thanks to both of you for your input.

Claggy had alerted me elsewhere to the newer controls in app_config. The HD7850 hosts were built over a period. The older ones have 7.1.3 and the newer ones have 7.2.33. I'll need to go to 7.2.39 to get these new controls.

The Linux distro I use doesn't have the latest BOINCs in the repo so I always install manually from Berkeley. It's not hard - just time consuming. I tend not to upgrade BOINC unless I really need to.

My machines are spread over essentially three locations - my home and two at an industrial complex about 30 mins away. Last Friday afternoon, I happened to be at the complex when there was a very slight power fluctuation, followed about 20secs later by a somewhat bigger fluctuation which took everything out. The large bulk of machines have no keyboard, mouse or screen so they don't handle rebooting all that well :-). About two hours later, I had most of the machines restarted when there was a further and even more savage power drop which took everything out once again.

However, I did learn something very useful. For such events, I usually go to each machine, attach the peripherals, hit the reset and wait for the reboot. When attaching the screen, it's usually blank so .... This time, I tried hitting Ctrl+Alt+Backspace, which is the sequence for restarting X. I was pleasantly surprised to see the monitor spring to life and the KDE desktop load up. It saves a lot of time compared to a full restart, and the large majority of machines could be handled this way. Obviously, underneath it all, Linux had already booted but the startup sequence had stalled when X tried to run with no monitor attached. Restarting X allowed the monitor to be detected and the full sequence to complete.

As always with these events, some hardware and OS damage may occur. There's usually a couple of machines that are cranky about something or other. This time I'm still rebuilding a machine where the root partition was completely trashed. Fortunately, these power outages are quite rare normally.

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2773079709
RAC: 880851

Sounds like you need Santa to drop you off a bunch of dummy video plugs? Let X boot with those, replace with a live video cable when you need to see anything?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110006939332
RAC: 24657537

That's a good idea! Unfortunately I don't have any lying around to try so I'll make one up when I get a bit of spare time. I suspect it may not work as the startup sequence seems to probe the monitor in some fashion and complain/refuse to run anyway if it doesn't like what it finds.

If it does work, I'd need more than 80 :-). I think I might end up putting up with the (quite rare) power failures. It's pretty easy to hit Ctrl+Alt+Backspace :-).

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2773079709
RAC: 880851

I can't speak for Linux, but in Windows a simple passive plug is usually enough to get video displayed - though usually in a fail-safe default resolution (SVGA 800x600 for Windows 98, XGA 1024x768 for Windows XP and above). It's only the higher resolutions and bespoke refresh rates that need any digital 'probing' of the monitor.

But I've still got a bunch of D-15 dummy plugs I wired up in the days when Apple hard-wired the cables for their proprietary monitors, so the old Macs (we're talking System 7 days) could select the right resolution at startup.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

About the <max_concurrent> setting: I just added it to my own app_config.xml, setting it to 2, and by mistake inserted it after the <gpu_versions> tag. After doing an "advanced -> read config files", I have 2 CPU FGRP3 tasks and 2 GPU FGRP3 tasks running. The goal was to limit only the GPU to a maximum of 2 at a time, and I seem to have accomplished that.

So like this it appears that it only affects the GPU tasks:
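[pre]
<app>
<name>hsgamma_FGRP3</name>
<gpu_versions>
<max_concurrent>2</max_concurrent>
<gpu_usage>0.25</gpu_usage>
<cpu_usage>1</cpu_usage>
</gpu_versions>
</app>
[/pre]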
It's not compliant with the documentation but if it works it works! =)