Generic GPU discussion

Tom M
Tom M
Joined: 2 Feb 06
Posts: 6,368
Credit: 9,370,075,510
RAC: 16,941,288
Topic 226106

I hope this will be a good place for anyone to discuss the video cards we are using to process E@H data.  My (new to me) current constraint is no more than 2 GPUs per system, so I thought I would start a forum thread that doesn't exclude anyone with fewer than 3 GPUs :) (unlike how my other one started out).

I now have a pair of RX 5700s under Windows 10 that seem to be peaking at just past 900,000 RAC per GPU.  That is with two tasks per GPU. I have just bumped it to 3 to see if I can squeeze out a bit more RAC.
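For reference, one common way to run multiple tasks per GPU is a BOINC app_config.xml placed in the Einstein@Home project directory; a minimal sketch for 3 tasks per GPU (the app name below is an assumption for the Gamma-ray GPU app, so check the names your client actually reports, and adjust the fractions to taste):

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>   <!-- assumed Gamma-ray GPU app name; verify on your host -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>  <!-- one third of a GPU per task = 3 tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>   <!-- CPU reservation per task; lower it if the app only needs a fraction -->
    </gpu_versions>
  </app>
</app_config>

After editing, use "Read config files" in the BOINC Manager Options menu (or restart the client) for the change to take effect.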

I believe I am getting this level of performance thanks to the version 1.28 Gamma-ray app that Petri/Ian&Steve C./Bernd have introduced for Windows.

Tom M

 

 

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Joseph Stateson
Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3,058,874,783
RAC: 300,876

I looked at your first 5 pages of valid data and sorted the estimated completion times in ascending order:

 6 values, 334-350 s   -- I am guessing this represents a single work unit
80 values, 551-585 s   -- Can I assume a pair of concurrent work units?
13 values, 814-830 s   -- This is probably the time to run 3 concurrent work units?

 

It looks to me like running 3 work units concurrently takes about 822/3 ≈ 274 seconds per task.

If I did the math right and am correct about your concurrency, then running 3 concurrent tasks finishes each task roughly 70 seconds faster than running them one at a time (about 344 s for a single task versus an effective 274 s per task at 3x).

 

I just tried two concurrent tasks on my NVIDIA P102-100 and my completion time more than doubled.  So for me there is no benefit to running more than one task on what is essentially a GTX 1080 Ti equivalent, under Ubuntu with driver 470.

elapsed time (CPU time)    resources per task (device)    CPU % (CPU time / elapsed)    status
("1C + 0.5NV" = 1 CPU thread + half a GPU per task, i.e. two tasks per GPU; "1C + 1NV" = one task per GPU; d0/d1/d2 = GPU device number)

00:10:35 (00:10:33)    1C + 0.5NV (d2)    99.69    Reported: OK
00:09:46 (00:09:44)    1C + 0.5NV (d0)    99.66    Reported: OK
00:10:41 (00:10:40)    1C + 0.5NV (d1)    99.84    Reported: OK
00:10:36 (00:10:33)    1C + 0.5NV (d2)    99.53    Reported: OK
00:10:41 (00:10:38)    1C + 0.5NV (d1)    99.53    Reported: OK
00:10:08 (00:10:07)    1C + 1NV (d0)      99.84    Reported: OK
00:10:19 (00:10:17)    1C + 0.5NV (d0)    99.68    Reported: OK
00:07:15 (00:07:13)    1C + 1NV (d2)      99.54    Reported: OK
00:04:38 (00:04:36)    1C + 1NV (d1)      99.28    Reported: OK
00:04:39 (00:04:37)    1C + 1NV (d1)      99.28    Reported: OK
00:04:41 (00:04:39)    1C + 1NV (d2)      99.29    Reported: OK
00:04:40 (00:04:38)    1C + 1NV (d0)      99.29    Reported: OK
00:04:39 (00:04:37)    1C + 1NV (d1)      99.28    Reported: OK


 

 

nvidia-smi showed twice as much video memory in use, but GPU utilization only went from 96% to 100%, so the card was obviously already maxed out.

 

Ian&Steve C.
Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,926
Credit: 45,530,622,642
RAC: 63,198,247

AMD cards see a benefit from running more tasks at a time, as the CPU component is fairly low, around 20-30%, allowing many tasks to run at once on a single CPU thread. 
 

Nvidia tasks use 100% of a CPU thread per task. Running multiples on Nvidia cards is not only wasteful of CPU resources, but also results in overall lower GPU production. 
 

1x is better for Nvidia with Gamma Ray tasks.

Gravitational Wave tasks can see a small benefit with 2x but only if you properly stagger them to cover the CPU-only portion of the computation. 
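For reference, in app_config.xml terms that looks roughly like the sketch below for 2x (the app name is a placeholder, so use whatever name your client reports for the GW GPU app; the staggering itself still has to be done by hand, e.g. by briefly suspending one of the two tasks):

<app_config>
  <app>
    <name>einstein_O3AS</name>    <!-- placeholder GW GPU app name; verify in your client -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- half a GPU per task = 2 tasks per GPU -->
      <cpu_usage>1.0</cpu_usage>  <!-- keep a full CPU thread reserved per task -->
    </gpu_versions>
  </app>
</app_config>

On AMD, where the CPU component is only 20-30%, the cpu_usage value can reasonably be dropped to something like 0.3 so several tasks share one thread.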

_________________________________________________________________________

Exard3k
Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56,155,179
RAC: 0

Ian&Steve C.

Ian&Steve C. wrote:

 

Nvidia tasks use 100% of a CPU thread per task.

 

They do. But it doesn't really impact performance whether the task gets a full core to itself or shares the core via SMT/Hyperthreading with other tasks. I tested this by resetting some nice values and putting my 4-core system with HT under a load of 24.00. No change in GPU task runtimes. I only saw performance drop when starting new CPU tasks (the "warmup phase" before they actually start getting work done), which seems more demanding and looks like a memory bandwidth issue.

Ian&Steve C.
Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,926
Credit: 45,530,622,642
RAC: 63,198,247

I disagree. I've seen tasks slow down when the CPU is 100% occupied and additional GPU tasks are trying to run.  Some projects are more affected than others.

 

I make it a point to always allocate a full CPU thread for any Nvidia GPU task. I even leave 1-2 threads doing nothing so that background processes have somewhere to run without impacting BOINC computation.
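One way to do that from a config file rather than the Manager preferences is a global_prefs_override.xml in the BOINC data directory (the percentage here is just an example for a 16-thread machine):

<global_prefs_override>
  <max_ncpus_pct>87.5</max_ncpus_pct>  <!-- use at most 14 of 16 threads, leaving 2 free -->
</global_prefs_override>

The usual route is simply the "Use at most X% of the CPUs" computing preference; the file just does the same thing in a form you can script.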

_________________________________________________________________________

Exard3k
Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56,155,179
RAC: 0

Ian&Steve C. wrote:I

Ian&Steve C. wrote:

I disagree. I've seen tasks slow down when the CPU is 100% occupied and additional GPU tasks are trying to run.  Some projects are more affected than others.

Really depends on your scheduler. I use the zen kernel, which (among other things, like the BFQ I/O scheduler) reduces scheduling latencies. It really keeps the important threads at full capacity when load >= cpu_count. That might be the difference, considering I also have rather old cores going.

 

My new home server with a 5900X is coming this week, so I will also check this with 24 CPU threads + VMs + IO running at the same time. Proxmox ships with a stable or LTS kernel, but as I'm test-running everything anyway, I may just grab different kernels too.
