I believe Gary is researching different CPU architectures, early vs modern, as a cause of the issues?
Yes, I've been doing that for a range of different Intel CPUs. Apart from a small number of Phenom II processors, the bulk of my machines are Intel based and most of those are older generations. Virtually all my machines have an AMD GPU and these fall into two groups. About half are the older GCN 1st gen (Southern Islands - SI) such as the HD 7850, R7 370 and HD 7950. The others are the newer GCN 4th gen (Polaris) such as the RX 460, RX 560, RX 570 and RX 580.
The very first tests I did used RX 570 GPUs, one in a Q6600 (2008) quad core host and another in a G4560 Kaby Lake machine (2C/4T). Essentially irrespective of task concurrency, the G4560 gave a very high proportion of valid results (just a single invalid) whilst the Q6600 gave zero valid results. These weren't compute errors, just failures to validate. At a later stage I followed up with around 40 tasks on the Q6600, all crunched at x1. Zero valid results. This behaviour was what made me wonder if CPU/motherboard architecture was somehow involved, since the GPUs were identical.
A second set of tests was to see if SI cards could give valid results. I ran a host under the old fglrx/proprietary OpenCL driver and got 100% invalid. I also tried the same host under the new amdgpu/OpenCL from AMDGPU-PRO and this also gave 100% invalid. The CPU was a Q8400 quad core (2009) and since I'd already seen 100% invalid using a modern GPU on a Q6600, this just seemed to confirm that there wasn't much use persisting with older CPU architectures. Bear in mind that all the above hosts have no trouble handling FGRPB1G tasks.
To further test the effect of CPU architecture, I ran GW tasks at x1 on hosts with the following combinations. I had started testing with V1.07 and then the unfortunate V1.08 came along so the comments below only refer to the observed behaviour under V1.07. There were no examples of 'real' computation errors, but see (*).
Q6600 (Core 2 Quad - 2008 - 2.4GHz) / RX 460 --> All that completed without MD5 issue were valid(*).
E6300 (Wolfdale dual core - 2009 - 2.8GHz) / RX 460 --> The majority were invalid (just 2 valid).
G640 (Sandy Bridge dual core - 2012 - 2.8GHz) / RX 570 --> All were invalid.
G3260 (Haswell dual core - 2015 - 3.3GHz) / RX 460 --> All were valid.
(*) With this test, the aim was to see if an older CPU architecture would work better with 'slower' GPUs. I had a batch of around 15 tasks initially and 3 completed without incident. The 4th also completed but then registered at the very end as a comp error because one of the large data files failed an MD5 checksum check. Of course, this immediately resulted in all unstarted tasks (which depended on the same data file) being declared as comp errors with no attempt to start any of them. The machine went into a multi-hour backoff with the failed tasks just sitting there.
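As an aside, if you want to check whether one of those large data files really is corrupt, you can compare its MD5 against the checksum the client recorded for it. A rough sketch is below - the paths and file name are placeholders only, and I'm assuming the expected checksum sits in the file's <file_info> block of client_state.xml, so check your own state file before relying on it.

```python
# Rough sketch only: compare a data file's MD5 against the checksum BOINC
# recorded for it. Paths, file name and XML layout are assumptions here.
import hashlib
import xml.etree.ElementTree as ET

STATE_FILE = "/var/lib/boinc-client/client_state.xml"            # typical Linux path
DATA_DIR = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu/"
FILE_NAME = "example_large_data_file"                             # hypothetical name

def expected_md5(state_file, file_name):
    """Find the <md5_cksum> recorded in the <file_info> block for file_name."""
    root = ET.parse(state_file).getroot()
    for fi in root.iter("file_info"):
        if fi.findtext("name") == file_name:
            return fi.findtext("md5_cksum")
    return None

def actual_md5(path):
    """MD5 of the file as it sits on disk, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    want = expected_md5(STATE_FILE, FILE_NAME)
    have = actual_md5(DATA_DIR + FILE_NAME)
    print("expected:", want)
    print("actual:  ", have)
    print("MATCH" if want == have else "MISMATCH")
```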
Einstein uses a 'resend lost tasks' feature. I reasoned that I should be able to make the failed tasks become 'lost' and get fresh copies, along with a fresh download of the 'supposedly suddenly corrupt' large data file. I have done this sort of thing before and I know the technique works. So I stopped BOINC and edited the state file to make all the failed tasks 'disappear'. It worked a treat, except for one small detail. The supposedly corrupt file was replaced and the lost tasks were all replaced, but there was an extra download going on - a copy of the CPU app. Yes, that's right, the lost GPU tasks were being replaced with the very same tasks but as CPU versions rather than fresh copies of the GPU versions, even though my preferences were set not to allow CPU tasks. It seems the rule of having 1 CPU and 1 GPU task in the quorum makeup doesn't take into account that there was already a CPU task in the quorum when the GPU task went missing :-).
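For the record, the 'technique' is nothing more than deleting the <result> entries for the tasks I want to 'lose' from client_state.xml while BOINC is stopped. A rough sketch of the idea is below - the path and task-name prefix are just placeholders for illustration, the exact layout of your state file may differ, and you should obviously keep a backup before touching it.

```python
# Sketch only: make selected tasks 'lost' by deleting their <result> blocks
# from client_state.xml while the client is stopped. Back the file up first.
# The path and task-name prefix are hypothetical - substitute your own.
import re
import shutil

STATE_FILE = "/var/lib/boinc-client/client_state.xml"   # typical Linux path
TASK_PREFIX = "example_task_name_prefix"                 # hypothetical prefix

def drop_results(xml_text, prefix):
    """Remove every <result>...</result> block whose <name> starts with prefix."""
    def keep(match):
        block = match.group(0)
        name = re.search(r"<name>(.*?)</name>", block, re.S)
        return "" if (name and name.group(1).startswith(prefix)) else block
    return re.sub(r"<result>.*?</result>", keep, xml_text, flags=re.S)

if __name__ == "__main__":
    shutil.copy2(STATE_FILE, STATE_FILE + ".bak")        # keep a backup copy
    with open(STATE_FILE) as f:
        text = f.read()
    with open(STATE_FILE, "w") as f:
        f.write(drop_results(text, TASK_PREFIX))
    # On the next scheduler contact the server should notice the missing
    # tasks and resend them (Einstein has 'resend lost tasks' enabled).
```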
So, since the aim was to help test validation of GPU tasks, I aborted this batch of CPU tasks and got a brand new batch of 10 which indeed were all GPU tasks this time. Crunching resumed normally and I moved on to other things. When I came back later to see how things were going, the very first task had crunched to the end but a different large data file had failed the MD5 check this time and the whole batch had been declared as comp errors. At that point I decided to give up with that machine (3 valid, 0 invalid, 22 MD5sum errors) and put it back on FGRPB1G where it continues to have no problems.
When Bernd announced that V1.08 was giving invalid results and was to be withdrawn, it wasn't clear if something new would take its place or if the previous V1.07 would be resurrected. I decided to wait and see so all the above hosts were simply put back on FGRPB1G until there was a further announcement. My reasoning was that V1.07 probably already has sufficient examples of the validation issues to be studied and that it would be more useful to help test the next iteration whenever that might be. I didn't think it would take this long.
Zalster wrote:
So, in short, GW on GPU requires more than 1 CPU thread.
Only for nvidia GPUs. On all my tests with AMD GPUs, the CPU time component has been well short of the elapsed time - quite often around half the elapsed time. I believe this may be a consequence of the "spin wait" behaviour of nvidia's OpenCL implementation. A significant fraction of the computation of GW GPU tasks is done on the CPU so I imagine the combination of the two things is why more than 100% of a CPU core is showing up in the crunch times for nvidia GPUs.
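To illustrate what I mean by 'spin wait' (a toy sketch only - it doesn't model any real OpenCL runtime): a runtime thread that busy-polls until the GPU finishes burns a whole CPU core for the duration of the wait, and all of that is reported as CPU time, whereas a blocking wait sleeps and costs almost nothing.

```python
# Toy illustration of why a spin-waiting runtime inflates reported CPU time.
# It just contrasts busy-polling with a blocking wait on the same
# 'GPU finished' event - nothing here talks to a real GPU or driver.
import threading
import time

def fake_gpu_kernel(done, seconds=2.0):
    """Stand-in for GPU work: finishes after a fixed wall-clock delay."""
    time.sleep(seconds)
    done.set()

def spin_wait(done):
    """Busy-poll until done: consumes ~100% of a core while waiting."""
    while not done.is_set():
        pass                      # tight loop, all of it counted as CPU time

def blocking_wait(done):
    """Block until done: the waiting thread sleeps, using almost no CPU."""
    done.wait()

if __name__ == "__main__":
    for waiter in (spin_wait, blocking_wait):
        done = threading.Event()
        threading.Thread(target=fake_gpu_kernel, args=(done,)).start()
        cpu0, wall0 = time.process_time(), time.time()
        waiter(done)
        print(f"{waiter.__name__}: wall {time.time() - wall0:.1f}s, "
              f"cpu {time.process_time() - cpu0:.1f}s")
```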
As far as setting cpu_usage to 1.5 in app_config.xml is concerned, that might be necessary if using nvidia GPUs and also using all spare threads for CPU crunching. For people with nvidia GPUs who don't want to go down the app_config.xml path, it would probably be easier to just lower the % of cores that BOINC is allowed to use, so that a spare thread or two is left free when crunching GW tasks on nvidia. That would be simpler for the average user to set up and change when necessary.
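For anyone who does want to go the app_config.xml route, a minimal sketch is below. The app name is only my guess at what the GW GPU search uses - check the <name> fields in client_state.xml or the project's applications page for the real one before using it.

```xml
<!-- Sketch only: app_config.xml goes in the Einstein project directory.
     The app name is a guess; gpu_usage 1.0 = one task per GPU (x1),
     cpu_usage 1.5 = budget 1.5 CPU threads per running task. -->
<app_config>
    <app>
        <name>einstein_O2MD1</name>
        <gpu_versions>
            <gpu_usage>1.0</gpu_usage>
            <cpu_usage>1.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

With something like that in place, BOINC budgets 1.5 CPU threads for each running GW GPU task, which has much the same effect as leaving a thread or two free via the '% of processors' preference.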
So, in short, GW on GPU requires more than 1 CPU thread.
Only for nvidia GPUs. On all my tests with AMD GPUs, the CPU time component has been well short of the elapsed time - quite often around half the elapsed time.
Actually, I've seen reported CPU time somewhat higher than 100% of reported elapsed time on GW 1.07 work done on AMD RX 570/Windows 10/Intel CPU hosts. Also, I've observed the reported CPU time per task to go quite a bit higher when the multiplicity is raised.
I have the impression, not carefully checked, that the current Linux implementation of the V1.07 GW GPU application uses substantially less CPU time than does the current Windows implementation.
In my case, I've found no need to make special accommodation, but I am running zero BOINC CPU tasks, and few enough BOINC GPU tasks that there is physically more than one CPU core available per GPU task. People who work the CPU side of their systems harder may find it more important to mitigate this.
I have the impression, not carefully checked, that the current Linux implementation of the V1.07 GW GPU application uses substantially less CPU time than does the current Windows implementation.
Yes, now that you mention it, I'm basing my comment pretty much entirely on my own Linux experience where even at 3x, I don't recall ever seeing a reported CPU time exceeding the elapsed time. I haven't really looked at any AMD results on Windows machines so I might easily have overlooked a difference in behaviour for Windows.
It seems that on each occasion I've looked at somebody's nvidia result, I've seen a higher CPU time than the elapsed time. I wasn't looking at the OS on those occasions so don't recall if there was any difference between Windows and Linux. They were probably all Windows observations anyway :-).
Only for nvidia GPUs. On all my tests with AMD GPUs, the CPU time component has been well short of the elapsed time - quite often around half the elapsed time. I believe this may be a consequence of the "spin wait" behaviour of nvidia's OpenCL implementation. A significant fraction of the computation of GW GPU tasks is done on the CPU so I imagine the combination of the two things is why more than 100% of a CPU core is showing up in the crunch times for nvidia GPUs.
Except that I have swan_sync=1 set as an environment variable for GPUGrid tasks, to put Nvidia cards into "no blocking sync" mode and avoid the "spin wait" default.
I still use the whole of a cpu core to support the CGW task, since my cpu_time matches my run_time.
I looked at run times on my Windows + Nvidia GTX 960 hosts. I noticed that cpu time has been less than run time... but only by a small amount. Run times were roughly 6600 and 6200 seconds while running 1x... and cpu time has consistently been about 20-30 seconds less than that per task.
Well, these 960s are not high end gpus anymore. If these hosts were running a newer series 'big block' Nvidia gpu, I wouldn't be surprised if it required more cpu support.
My machine, like Keith's, is running Linux and also has Swan_Sync=1 set. Unfortunately we had a severe thunderstorm last night and lost power. Once power came back, it seems to have shorted out my fiber-optic cable box, so I'm out of luck for Internet until late Thursday.
Edit: 132 pending, 97 valid, no invalids yet.
I don't think swan_sync does anything here.
https://einsteinathome.org/content/fgrpb1g-cpu-usage-factor#comment-168589
I had long forgotten those posts. OK, Swan_sync is not applicable to OpenCL tasks, so the cpu usage is totally dependent on how the app developer wrote the app's cpu support.
For my tasks, it looks to be 1:1 cpu_time:run_time, except for one task where I was playing with ncpus counts.
Both of my hosts have stopped getting new tasks. Anyone else?
Same here. No new GW GPU work for the past 7 hr on either of my hosts.
I stopped getting new GPU GW work sometime yesterday. I suspect there may be a backlog of gamma ray tasks piling up. Even my cache of CPU tasks is switching to gamma ray work.
My 2 machines are still crunching on the remaining work in the local cache, but the Linux box has only 4 GW GPU tasks left.