Fermi LAT Gamma-ray pulsar search #3 "FGRP3"

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965465840
RAC: 30701179

Thanks very much for

Thanks very much for reporting this.

I wonder if <max_concurrent> could be used within the <app_version> variant of app_config.xml. It's not listed as one of the possible elements so I assume not. If it could be, it seems that you could just ditch the <app> ... </app> sections altogether.

I'm thinking ahead to when 1GB cards are allowed to do FGRP3. You wouldn't want to try to run 2 of these concurrently, but you could run a BRP5 and a FGRP3, or two BRP5s. You could even run three BRP5s and get an efficiency gain if the card was good enough. If you could set <max_concurrent> separately for each <app_version>, you should be able to cover the various possible combinations - 1xFGRP3+1xBRP5, 2xBRP5, 3xBRP5.
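For anyone wanting to experiment with the existing <app> form in the meantime, a minimal sketch (hsgamma_FGRP3 matches the task names reported in this thread; the einsteinbinary_BRP5 name is my assumption - check client_state.xml for the exact spellings):

<app_config>
  <app>
    <name>hsgamma_FGRP3</name>
    <!-- never more than one FGRP3 task on a 1GB card -->
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <!-- 0.67 + 0.33 lets one FGRP3 share the card with one BRP5 -->
      <gpu_usage>0.67</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <!-- 3 x 0.33 allows up to three BRP5s when no FGRP3 is running -->
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>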

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057034931
RAC: 1618437

OpenCL versions for Nvidia of

OpenCL versions for Nvidia of FGRP3 work showed up on both of my machines with GTX660 cards late on February 21 UTC, but I only spotted them a few hours ago. I pushed them into service artificially by putting some Perseus work on hold.

1. I got wildly different behavior on the two machines: one advanced at a steady, moderate rate (though with GPU utilization far below my accustomed 97%), while the other essentially stalled out for extended periods. Given that the two hosts behaved so differently, this is obviously dependent on some host hardware or software configuration issue.

2. On investigation, I found that the CPU-side support task, hsgamma_FGRP3_1.09_windows_x86_64__FGRPopencl-nvidia.exe, was running at "Idle" priority (in Windows terms) on the machine getting the stalls.

The question of the right priority for the CPU support tasks of GPU jobs is somewhat vexed. Commonly, for cases where the CPU job runs in very short bursts, each of which may be causing the precious GPU to stall, and the maximum CPU consumption is at most 25% or so of a core, people at SETI and (I think) here raise the priority of this support task even to "above normal" as a way to keep their GPU busier. On the other hand, in this case the CPU task is doing much more of the work, and on some systems the impact of an "above normal" setting on interactive performance might be pretty toxic.

Nevertheless, I used Process Lasso to raise the priority class of the support task from "idle" to "above normal". I was rewarded by the indicated CPU consumption rising from 0% to 12% (where 12.5% would mean it was getting a full virtual CPU on this 4-core hyperthreaded machine) and progress picking up greatly. The host where progress was already good turned out to be one where I had made this setting before starting the first job.

What with the high reserved GPU memory requirement, the high reserved and actual CPU consumption, priority issues, and the like, it seems to me that under current conditions this application will run extremely inefficiently on a significant fraction of hosts. Many owners may be surprised even to find the tasks on their machines, as not all will have caught on to the changeover from "opt-in" to "you must opt out after the new thing is offered, or it will just come to you". I'd predict that more of the heavy hitters who were lured into using the web-page application selection will move back to app_info-based control.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

My HD 7970 host seems to have

My HD 7970 host seems to have received 20 of the new GPU test tasks... gotta get to bed though b/c it's late here. We'll see if they validate (or at least complete without error) in the morning...

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965465840
RAC: 30701179

RE: About the <max_concurrent> setting, I

Quote:
About the <max_concurrent> setting, I just added it to my own app_config.xml, setting it to 2, and by mistake inserted it after the <gpu_versions> tag ....


As a result of a problem, I've just tried this and I have a different experience.

The machine is a 6-core with an HD7850. Prior to the FGRP3 GPU app, it was set to use 5 cores and to run 4x on the GPU. This was previously controlled with app_config (BRP5) set to max=4, CPU=0.5, GPU=0.25. Running 4 BRP5 automatically reserved 2 cores, leaving 3 for CPU tasks.

Yesterday I added FGRP3 to the file using the same values. I anticipated that max=4 would be needed to allow the 3 CPU tasks to continue, and that this would limit the FGRP3 GPU involvement to just 1 at a time.

The FGRP3 GPU tasks on board have now risen to the top of the queue, and a second one has started crunching while a CPU task has been relegated to 'waiting to run' status. So actually running are 2xBRP5, 2xFGRP3 GPU and 2xFGRP3 CPU. At least the max=4 is working :-).
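For reference, the file currently amounts to this (a reconstruction from the values above; the BRP5 app name is from memory, so verify it against client_state.xml):

<app_config>
  <app>
    <name>einsteinbinary_BRP5</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>hsgamma_FGRP3</name>
    <!-- counts CPU and GPU tasks of the app together, -->
    <!-- hence 2 GPU + 2 CPU = 4 satisfies the limit -->
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>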

So I edited app_config.xml to change <max_concurrent> to 1, and I put it below the <gpu_versions> tag like you indicated. When I reread the config file, all that happened was that the 'waiting' CPU task started running - shock, horror!! :-).

I now had 4 GPU tasks and 3 CPU tasks all running. The max=1 is certainly being ignored. I'm assuming that putting it below <gpu_versions> is effectively hiding it. Could your previous observation be explained by <max_concurrent> being effectively unset, so that tasks just start in cache order (oldest first)?
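To put the placement question concretely, I'm guessing the difference is between these two layouts (a sketch only; my assumption is that the client silently ignores elements it doesn't recognise inside <gpu_versions>):

<!-- what I have now: max_concurrent buried inside gpu_versions, apparently ignored -->
<app>
  <name>hsgamma_FGRP3</name>
  <gpu_versions>
    <gpu_usage>0.25</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
    <max_concurrent>1</max_concurrent>
  </gpu_versions>
</app>

<!-- the documented form: max_concurrent as a direct child of <app> -->
<app>
  <name>hsgamma_FGRP3</name>
  <max_concurrent>1</max_concurrent>
  <gpu_versions>
    <gpu_usage>0.25</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
  </gpu_versions>
</app>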

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: I now had 4 GPU tasks

Quote:
I now had 4 GPU tasks and 3 CPU tasks all running. The max=1 is certainly being ignored. I'm assuming that putting it below <gpu_versions> is effectively hiding it. Could your previous observation be explained by <max_concurrent> being effectively unset, so that tasks just start in cache order (oldest first)?

Sorry about that, I seem to have jumped to a faulty conclusion and you're right about the task order in the cache.
When I got up this morning I had 4 FGRP3 tasks running on my 660Ti and the computer was acting sluggish, with delays opening windows and the cursor making small jumps when moved around.

Lesson to be had is to not test new things and dish out advice right before going to bed! =)

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057034931
RAC: 1618437

A few additional comments on

A few additional comments on the opencl-nvidia FGRP3 jobs as running on my two hosts with GTX660 cards.

I think people may find that these jobs gain more from allowing simultaneous tasks on the GPU than they are used to. This makes sense, as the appreciable workload remaining on the CPU means the GPU is not very busy at all. Even with three tasks sharing the GPU, I'm showing an average GPU load (as reported by GPU-Z) of about 63% during the first 99% of reported progress. As mine are 2 GB memory cards, and GPU-Z reports 1609 MB static and 31 MB dynamic memory consumption (roughly 540 MB per task), I don't think my card is likely to support four simultaneous tasks. Were the memory available, I think it likely this would be yet more productive by an appreciable amount.

The GPU does not seem to be employed during the final phase which starts at about 99% completion ("spindown"?). Quite likely that means this phase is not accelerated compared to the pure CPU version of these same tasks. As the first 99% is accelerated (on this particular host by about a factor of 5), this will make the oddity of the last 1% taking a long and unpredictable time yet more pronounced on a fractional basis. Three is a tiny sample size, but on my first three jobs running as 3-up (with no other BOINC job active on this 4-core machine), the first 99 percent took almost exactly 99 minutes, with the remaining time to finish varying from 6 to 18 minutes. Happily, this variability should desynchronize the jobs, which I suspect will give a slight efficiency gain.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: The GPU does not seem

Quote:
The GPU does not seem to be employed during the final phase which starts at about 99% completion ("spindown"?).


Quoted from Bernd's initial post in this thread:

Quote:
- improved coherent (follow-up) stage: There will only be a single, though deeper, follow-up done at the end of each task. Previously, we had a less sensitive follow-up after every 11 sky points. However, the run-time of the new follow-up depends on the outcome of the previous semicoherent stage and is not predictable - it may vary between 30s and 30m. There is currently no checkpointing during that stage.


Adding to your observations on the number of tasks to run simultaneously: I accidentally ran 4x on my 660Ti and it worked, but it caused my desktop environment to act sluggish - delays when opening Windows Explorer, the mouse cursor not moving smoothly. Do note that I'm also running tasks on my Intel HD 4000, which probably puts more stress on my system.

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160342159
RAC: 0

RE: I think people may find

Quote:
I think people may find that these jobs gain more from allowing simultaneous tasks on the GPU than they are used to. This makes sense, as the appreciable workload remaining on the CPU means the GPU is not very busy at all. Even with three tasks sharing the GPU, I'm showing an average GPU load (as reported by GPU-Z) of about 63% during the first 99% of reported progress. As mine are 2 GB memory cards, and GPU-Z reports 1609 MB static and 31 MB dynamic memory consumption (roughly 540 MB per task), I don't think my card is likely to support four simultaneous tasks. Were the memory available, I think it likely this would be yet more productive by an appreciable amount.


I concur. The GPU utilization induced by the new FGRP3 tasks is substantially lower than that of a BRP5 or BRP4G task. While the latter will easily put my 7970 at 90%+ utilization running 3 at a time, the former has my 7970 at only ~55% utilization, and I'm running them 5 at a time! VRAM consumption is currently at ~2320 MB; I'm sure I could run 6 at a time without a problem. I know we should expect more of the calculations on the GPU and fewer on the CPU as the application matures over time, but I wonder if we should expect the FGRP3 tasks to eventually utilize our GPUs as well as the BRP tasks do...

Neil Newell
Joined: 20 Nov 12
Posts: 176
Credit: 169699457
RAC: 0

RE: Sounds like you need

Quote:
Sounds like you need Santa to drop you off a bunch of dummy video plugs? Let X boot with those, replace with a live video cable when you need to see anything?

Pretty sure they aren't required on X/Linux, at least for BOINC; all but one of my Linux hosts are headless, from 9600GTs to GTX580s (with a couple of AMDs in there), and they all restart automagically after power cuts. Curiously, the only one that gives trouble is the one that actually has monitors attached (I'd assumed it was some sort of race condition, as it's my fastest host - but now I'm wondering if it's because there ARE monitors attached!).

Basically, dummy plugs are just 75 ohm terminators, which is the standard for analogue video connections. They matter more on systems designed for mobile use, where power consumption is important and it's worth not powering up the video drivers if nothing is connected. Anything even vaguely modern will use EDID, but again X starts fine without EDID (though I vaguely recall there may be an X setting for that).
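For what it's worth, the options I half-remember are in the NVIDIA proprietary driver's xorg.conf Device section - untested as written here, so treat this as a starting point rather than a recipe:

Section "Device"
    Identifier "nvidia-gpu"
    Driver     "nvidia"
    # pretend a digital flat panel is attached to the first output
    Option     "ConnectedMonitor" "DFP-0"
    # don't insist on reading EDID from a real monitor
    Option     "UseEDID" "false"
EndSection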

Xandro BA
Joined: 23 Jul 13
Posts: 49
Credit: 4522731
RAC: 0

Odd: Got a new wu FGRP 3 1.11

Odd: Got a new WU, FGRP3 1.11 (had 1.09 before). It ran for 52 minutes to 7% or so and then dropped to 0.793%. WU: 184763722. No explanation for the sudden drop. Will see what it does until completion. Another task, version 1.09, running at the same time did not have a drop that I could see.
