Generic GPU discussion

Tom M
Joined: 2 Feb 06
Posts: 5589
Credit: 7676276618
RAC: 1873720
Topic 226106

I hope this will be a good place for anyone to discuss the video cards we are using to process E@H data.  My current constraint (new to me) is no more than 2 GPUs per system, so I thought I would start a forum thread that doesn't exclude anyone with fewer than 3 GPUs :) (unlike my other thread, which started out that way).

I now have a pair of RX 5700s under Windows 10 that seem to be peaking at just past 900,000 RAC per GPU.  That is with two tasks per GPU.  I have just bumped it to 3 to see if I can squeeze out a bit more RAC.
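For anyone who wants to try the same thing: the tasks-per-GPU count is set with an app_config.xml in the BOINC project directory. A minimal sketch for 3 tasks per GPU; the <name> value is my assumption for the Gamma-ray app, so check client_state.xml for the exact app name on your host:

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name> <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage> <!-- 1/3 of a GPU per task = 3 concurrent tasks -->
      <cpu_usage>0.33</cpu_usage> <!-- CPU fraction budgeted per task; tune to taste -->
    </gpu_versions>
  </app>
</app_config>
```

Use Options -> Read config files (or restart the client) for it to take effect.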

I believe I am getting this level of performance thanks to the version 1.28 Gamma-ray app that Petri/Ian&SteveC./Bernard introduced for Windows.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Joseph Stateson
Joined: 7 May 07
Posts: 173
Credit: 2923120850
RAC: 1447485

I looked at your first 5 pages of valid results and sorted them by estimated completion time, ascending:

 6 values (334-350 s): I am guessing this represents a single work unit.

80 values (551-585 s): can I assume a pair of concurrent work units?

13 values (814-830 s): this is probably the time to run 3 concurrent work units.

It looks to me like running 3 work units concurrently takes about 822/3 = 274 seconds per work unit.

If I did the math right and am correct about your concurrency, then running 3 concurrent tasks finishes each task about 70 seconds faster than running them one at a time.
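Sanity-checking that arithmetic (the cluster midpoints are my eyeball estimates from the sorted times above):

```python
# Effective seconds per finished task at each concurrency level,
# using rough midpoints of the three timing clusters (my estimates).
single = 342.0   # ~334-350 s: one task at a time
double = 568.0   # ~551-585 s: two concurrent tasks
triple = 822.0   # ~814-830 s: three concurrent tasks

print(single / 1)           # 342.0 s per task at 1x
print(double / 2)           # 284.0 s per task at 2x
print(triple / 3)           # 274.0 s per task at 3x
print(single - triple / 3)  # 68.0 s saved per task at 3x vs 1x
```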

 

I just tried two concurrent tasks on my Nvidia P102-100 and my completion time more than doubled.  So for me there is no benefit to running more than one task on the equivalent of a GTX 1080 Ti under Ubuntu with driver 470.

elapsed time (cpu time)    config             % done   status
00:10:35 (00:10:33)        1C + 0.5NV (d2)    99.69    Reported: OK
00:09:46 (00:09:44)        1C + 0.5NV (d0)    99.66    Reported: OK
00:10:41 (00:10:40)        1C + 0.5NV (d1)    99.84    Reported: OK
00:10:36 (00:10:33)        1C + 0.5NV (d2)    99.53    Reported: OK
00:10:41 (00:10:38)        1C + 0.5NV (d1)    99.53    Reported: OK
00:10:08 (00:10:07)        1C + 1NV (d0)      99.84    Reported: OK
00:10:19 (00:10:17)        1C + 0.5NV (d0)    99.68    Reported: OK
00:07:15 (00:07:13)        1C + 1NV (d2)      99.54    Reported: OK
00:04:38 (00:04:36)        1C + 1NV (d1)      99.28    Reported: OK
00:04:39 (00:04:37)        1C + 1NV (d1)      99.28    Reported: OK
00:04:41 (00:04:39)        1C + 1NV (d2)      99.29    Reported: OK
00:04:40 (00:04:38)        1C + 1NV (d0)      99.29    Reported: OK
00:04:39 (00:04:37)        1C + 1NV (d1)      99.28    Reported: OK

nvidia-smi showed twice as much video memory used, but GPU utilization only went from 96% to 100%, so the card was obviously already maxed out.
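For anyone wanting to log the same numbers over time, nvidia-smi has a machine-readable query mode. A small sketch that parses it (the query flags are real; the two sample rows are invented, standing in for 1x vs 2x readings):

```python
import csv
import io

# Parse output of:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
# Sample rows are made up for illustration, not measured values.
sample = """0, 96, 4043
0, 100, 8121"""

for row in csv.reader(io.StringIO(sample), skipinitialspace=True):
    gpu, util, mem = int(row[0]), int(row[1]), int(row[2])
    print(f"GPU {gpu}: {util}% utilization, {mem} MiB memory used")
```

In a real script you would replace `sample` with the captured stdout of the nvidia-smi command.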

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3682
Credit: 33845769079
RAC: 36530633

AMD cards see a benefit from running more tasks at a time, as the CPU component is fairly low, around 20-30%, which allows many tasks to run at once on a single CPU thread.

Nvidia tasks use 100% of a CPU thread per task. Running multiple tasks on Nvidia cards is not only wasteful of CPU resources, but also results in lower overall GPU production.

1x is better for Nvidia with Gamma Ray tasks.

Gravitational Wave tasks can see a small benefit with 2x but only if you properly stagger them to cover the CPU-only portion of the computation. 
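A toy model of why the stagger matters, assuming (with invented numbers, not measurements) each GW task has a CPU-only stretch of C seconds and G seconds of actual GPU work:

```python
# Illustrative only: C and G are made-up phase lengths per task.
C, G = 60.0, 240.0  # seconds of CPU-only work vs. GPU work

# At 1x the GPU sits idle during the CPU-only stretch.
util_1x = G / (C + G)
# With 2x properly staggered, the second task's GPU work can fill that
# idle gap, up to the point where the GPU is simply saturated.
util_2x_staggered = min(1.0, 2 * G / (C + G))

print(round(util_1x, 2))            # 0.8
print(round(util_2x_staggered, 2))  # 1.0
```

If the two tasks run in lockstep instead, both hit the CPU-only stretch at the same time and the GPU idles just as it does at 1x, which is why the stagger is the whole point.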

_________________________________________________________________________

Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56155179
RAC: 0

Ian&Steve C. wrote:

Nvidia tasks use 100% of a CPU thread per task.

They do. But it doesn't really impact performance whether the task runs when a full core is available or the CPU is sharing the core via SMT/Hyperthreading with other tasks. I tested this by setting some nice values and putting my 4-core system with HT under a load of 24.00: no change in GPU task runtime. I only saw performance drop when starting new CPU tasks (the "warm-up phase" before they actually start getting work done), which seems more demanding and looks like a memory bandwidth issue.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3682
Credit: 33845769079
RAC: 36530633

I disagree. I've seen tasks slow down when the CPU is 100% occupied and also trying to run additional GPU tasks.  Some projects are more affected than others.

I make it a point to always allocate a full CPU thread for any Nvidia GPU task. I even leave 1-2 threads idle so that background processes have somewhere to run without impacting BOINC computation.

_________________________________________________________________________

Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56155179
RAC: 0

Ian&Steve C. wrote:

I disagree. I've seen tasks slow down when the CPU is 100% occupied and also trying to run additional GPU tasks.  Some projects are more affected than others.

That really depends on your scheduler. I use the zen kernel, which (among other things, like the bfq I/O scheduler) reduces scheduling latencies. It really keeps the important threads running at full capacity during loads >= cpu_count. That might be the difference, considering I also have rather old cores.

My new home server with a 5900X is coming this week, so I will also check this with 24 CPU threads + VMs + I/O at the same time. Proxmox ships with a stable or LTS kernel, but since I'm test-running everything anyway, I may just grab different kernels too.
