Running multiple WUs reduces performance

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0
Topic 218278

I discovered that, on recent data files, running multiple WUs concurrently actually reduces performance.

On 2/1, with my GTX 1080 Ti running 3 WUs, the elapsed time was 1268 s, i.e. 423 s per WU. I don't have records for running 1 WU or 2 WUs, but they were definitely longer.

On recent data files the crunch time is significantly longer. However, when I switched to a single WU, the crunch time became much better. Here's the data.

Concurrency  Elapsed time (s)  Time/WU (s)
3            1668              556
2            1140              570
1             455              455

On my host with GTX 1080 I observed the same behavior.

Concurrency  Elapsed time (s)  Time/WU (s)
3            2381              794
2            1680              840
1             680              680

However on GTX 980 Ti and Radeon VII, running 3 WUs is still optimal. Here's the data for Radeon VII.

Concurrency  Elapsed time (s)  Time/WU (s)
3             495              165
2             356              178
1             208              208

All of these GPUs have enough CPU resources.
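
If you want to compare these numbers directly, throughput in WUs per hour is just concurrency divided by elapsed time. Here is a minimal sketch in Python (the figures come from the tables above; the script itself is only illustrative):

    # Convert (concurrency, elapsed seconds) into WUs/hour.
    def wu_per_hour(concurrency, elapsed_s):
        return concurrency * 3600.0 / elapsed_s

    # GTX 1080 Ti figures from the first table above:
    for n, t in [(3, 1668), (2, 1140), (1, 455)]:
        print(f"{n}x: {wu_per_hour(n, t):.2f} WU/h")
    # -> 3x: 6.47, 2x: 6.32, 1x: 7.91 -- 1x wins on this card

The same arithmetic on the Radeon VII table gives 21.8 WU/h at 3x versus 17.3 WU/h at 1x, which is why 3x is still optimal there.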

Does anyone observe the same phenomenon?

 

Updated on 2/28:

I finally found the cause of the problem. SLI is the culprit.

When SLI is enabled, on my host with a 6700K and two GTX 1080 Tis running 2 CPU tasks and 6 GPU tasks, the CPU utilization of hsgamma_FGRPB1G is 4~5%. The CPU time is about one third of the run time.

When SLI is disabled, the CPU utilization of hsgamma_FGRPB1G is 11~12%. The CPU time is close to the run time, and the GPUs don't have to wait for the CPU.

I don't know how SLI affects this.
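
If anyone wants to check this on their own host, here is a minimal sketch using Python's psutil to sample the CPU usage of the app's processes. The process-name substring is an assumption; check the actual exe name in Task Manager:

    import time
    import psutil

    TARGET = "hsgamma_FGRPB1G"  # assumed substring of the exe name

    # Find the running Einstein@Home GPU app processes.
    procs = [p for p in psutil.process_iter(["name"])
             if TARGET in (p.info["name"] or "")]

    for p in procs:
        p.cpu_percent(None)  # prime the per-process counter
    time.sleep(10)           # sample over 10 seconds
    for p in procs:
        # Percent of one CPU; divide by psutil.cpu_count()
        # for a whole-machine figure.
        print(p.pid, p.info["name"], f"{p.cpu_percent(None):.1f}%")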

 

Updated on 3/1:

After disabling SLI, my initial conclusion remains true. Running 3x with SLI off isn't as bad as running 3x with SLI on, but it's still worse than running 1x.

On my host with 6700K and GTX 1080 Ti, with SLI on:

Task         Concurrency  Run Time (s)  CPU Time (s)  Time/WU (s)
LATeah1049M       1            452           440           452
LATeah1049M       3           1641           514           547

With SLI off:

Task         Concurrency  Run Time (s)  CPU Time (s)  Time/WU (s)
LATeah1049M       3           1494          1400           498
LATeah1049M       2           1090          1019           545
LATeah1049M       1            466           432           466
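
The CPU-time-to-run-time ratio makes the SLI effect plain. A quick check on the 3x rows from the two tables above:

    # (run time, CPU time) for the 3x rows above
    rows = {"SLI on,  3x": (1641, 514),
            "SLI off, 3x": (1494, 1400)}
    for label, (run, cpu) in rows.items():
        print(f"{label}: CPU/run = {cpu/run:.2f}")
    # -> SLI on: 0.31 (the GPU waits on the CPU), SLI off: 0.94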

Others are welcome to post their findings.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

When I used to run Windows, I ran 3 per card as the times were faster than with single work units.

There are a lot of factors that influence this: which cards, which MoBo, which CPU, RAM speed, OS.

The last thing you need to make sure of is that all the work units come from the same source. Since we know that some work units are faster than others, it doesn't make sense to compare a fast work unit against 2 slow + 1 fast on a GPU.

Linux, on the other hand, is the fastest no matter what you are running.

 

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

The data were taken within ten days. The data files range from LATeah1043L to LATeah1049M; they don't have much variation in complexity or crunch time. I am interested in others' results.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3229460631
RAC: 1125679

What is your GPU utilization when running 1x?

By "All of these GPUs have enough CPU resources." do you mean there is an open core for the GPU tasks are did you actually permanently set the CPU affinity for CPU and GPU tasks? It makes a difference. If I let windows handle the affinity with like 75%/2 open CPU threads the GPU utilization is worse than if I set the affinity with Process Lasso and keep all other CPU processes away from the GPU exe processes.

 

I mention this because the tasks on host 12699683 show CPU/run time variation when you're running 1 task or multiple.

These tasks have nearly identical CPU times even though the 2nd task looks to have run at 3x.

Run time (s)  CPU time (s)
  762           594
2,346           597

https://einsteinathome.org/task/827048827

https://einsteinathome.org/task/830422253

 

 

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

When running 1x, the GPU utilization is a constant 90%. When running 2x/3x, the utilization is 95%+.

I leave empty threads for the GPU tasks. I don't think "CPU run time variation" is the problem; if the GPU doesn't get enough CPU support, the utilization drops.

https://einsteinathome.org/host/12699683/tasks/4/40?sort=desc&order=Run+time

When I ran the 3x tasks reported on Feb 27th on this page, I suspended all CPU tasks, but the CPU time was still much less than the run time. And the run time of 2220 s is still longer than 3 × 660 s when running 1x.

Or does the cpu_usage set in app_config not only affect how BOINC arranges tasks, but also how much CPU time the GPU tasks ask for? I thought they just grab what they need no matter what is set in app_config.
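
For reference, the section I'm talking about looks roughly like this; the 0.33/1.0 values are just example numbers for running 3x:

    <app_config>
      <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
          <!-- three tasks share one GPU; one CPU thread budgeted per task -->
          <gpu_usage>0.33</gpu_usage>
          <cpu_usage>1.0</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>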

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023554931
RAC: 1810685

shuhui1990 wrote:
Does anyone observe the same phenomenon?

I do not. In commissioning my Nvidia RTX 2080 on a Windows 10 host in October, I observed a productivity improvement for 2X over 1X. Far more recently (yesterday), in commissioning an AMD RX 570 on a different Windows 10 host, I saw a clear performance improvement of 2X over 1X (about 8% throughput improvement at a system power cost of only 2%, so a clear, modest win overall). All my comments regard Einstein GPU GRP work on Windows, running an application which has not changed in many months.

I think your adverse results are not a consequence of some change in the behavior of recent data files, but the symptom of some fixable configuration problem in your system.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3229460631
RAC: 1125679

shuhui1990 wrote:

When running 1x, the GPU utilization is a constant 90%. When running 2x/3x, the utilization is 95%+.

I leave empty threads for the GPU tasks. I don't think "CPU run time variation" is the problem; if the GPU doesn't get enough CPU support, the utilization drops.

https://einsteinathome.org/host/12699683/tasks/4/40?sort=desc&order=Run+time

When I ran the 3x tasks reported on Feb 27th on this page, I suspended all CPU tasks, but the CPU time was still much less than the run time. And the run time of 2220 s is still longer than 3 × 660 s when running 1x.

Or does the cpu_usage set in app_config not only affect how BOINC arranges tasks, but also how much CPU time the GPU tasks ask for? I thought they just grab what they need no matter what is set in app_config.

At 90% I would expect some improvement with concurrent tasks. I run 2x, and some people have reported slight gains with 3x and 4x, but the returns really start to diminish past 2x.

The cpu_usage in an app_config only changes how many tasks BOINC will allow to run. If you have it set to 7 on an 8-thread machine, then to run 2 tasks you would need 14 threads. That's not possible, so even at 0.5 GPUs per task you're still limited to 1 concurrent task, because BOINC is requesting another 7 CPU threads. In no way does it change the CPU usage of a GPU exe.
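
In other words, cpu_usage and gpu_usage only feed BOINC's admission arithmetic, roughly like this sketch (not BOINC's actual scheduler code):

    import math

    def max_concurrent_gpu_tasks(n_gpus, n_threads, gpu_usage, cpu_usage):
        # Each task "reserves" gpu_usage of a GPU and cpu_usage CPU threads;
        # BOINC starts tasks only while both budgets hold.
        return min(math.floor(n_gpus / gpu_usage),
                   math.floor(n_threads / cpu_usage))

    # The example above: gpu_usage=0.5 alone would allow 2 tasks on one GPU,
    # but cpu_usage=7 on an 8-thread machine caps it at 1.
    print(max_concurrent_gpu_tasks(1, 8, 0.5, 7))  # -> 1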

 

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

mmonnin wrote:

What is your GPU utilization when running 1x?

By "All of these GPUs have enough CPU resources," do you mean there is an open core for the GPU tasks, or did you actually permanently set the CPU affinity for CPU and GPU tasks? It makes a difference. If I let Windows handle the affinity, with something like 75% CPU usage / 2 open CPU threads, the GPU utilization is worse than if I set the affinity with Process Lasso and keep all other CPU processes away from the GPU exe processes.

I mention this because the tasks on host 12699683 show CPU/run time variation when you're running 1 task or multiple.

These tasks have nearly identical CPU times even though the 2nd task looks to have run at 3x.

Run time (s)  CPU time (s)
  762           594
2,346           597

https://einsteinathome.org/task/827048827

https://einsteinathome.org/task/830422253

 

 

I tried Process Lasso, but it didn't make a difference. What made a difference was disabling SLI: when SLI is enabled, the CPU time is only a fraction of the run time.

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

archae86 wrote:
shuhui1990 wrote:
Does anyone observe the same phenomenon?

I do not. In commissioning my Nvidia RTX 2080 on a Windows 10 host in October, I observed a productivity improvement for 2X over 1X. Far more recently (yesterday), in commissioning an AMD RX 570 on a different Windows 10 host, I saw a clear performance improvement of 2X over 1X (about 8% throughput improvement at a system power cost of only 2%, so a clear, modest win overall). All my comments regard Einstein GPU GRP work on Windows, running an application which has not changed in many months.

I think your adverse results are not a consequence of some change in the behavior of recent data files, but the symptom of some fixable configuration problem in your system.

You're right. It's not about the data files; it's about SLI. I don't know exactly how SLI comes into play, though.

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

mmonnin wrote:
shuhui1990 wrote:

When running 1x, the GPU utilization is a constant 90%. When running 2x/3x, the utilization is 95%+.

I leave empty threads for the GPU tasks. I don't think "CPU run time variation" is the problem; if the GPU doesn't get enough CPU support, the utilization drops.

https://einsteinathome.org/host/12699683/tasks/4/40?sort=desc&order=Run+time

When I ran the 3x tasks reported on Feb 27th on this page, I suspended all CPU tasks, but the CPU time was still much less than the run time. And the run time of 2220 s is still longer than 3 × 660 s when running 1x.

Or does the cpu_usage set in app_config not only affect how BOINC arranges tasks, but also how much CPU time the GPU tasks ask for? I thought they just grab what they need no matter what is set in app_config.

At 90% I would expect some improvement with concurrent tasks. I run 2x, and some people have reported slight gains with 3x and 4x, but the returns really start to diminish past 2x.

The cpu_usage in an app_config only changes how many tasks BOINC will allow to run. If you have it set to 7 on an 8-thread machine, then to run 2 tasks you would need 14 threads. That's not possible, so even at 0.5 GPUs per task you're still limited to 1 concurrent task, because BOINC is requesting another 7 CPU threads. In no way does it change the CPU usage of a GPU exe.

 

Initially I thought I had messed up the app_config. I used app_config to stop BOINC from going into panic mode, where the CPU tasks pile up and it only allows one GPU task. But it turned out that what changes the CPU usage is SLI.

shuhui1990
Joined: 16 Sep 06
Posts: 27
Credit: 3631456971
RAC: 0

After disabling SLI, my initial conclusion remains true. Running 3x with SLI off isn't as bad as running 3x with SLI on, but it's still worse than running 1x. See the updates in the opening post for data.
