Benchmarking List for GPUs and Gamma-ray Pulsar binary search

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117592553134
RAC: 35208049

petri33 wrote:
The variation in the tasks of 3-series seems to be related to the ...

The 884.0 parameter is to do with the spin frequency (in Hz) of the potential pulsar being teased out of the data.

The data file (LATeah3012L00.dat) gets scanned for a whole range of different potential frequencies in steps of 8 Hz. From my perhaps unreliable memory, the starting value was somewhere around 700 Hz and the highest value was around 900 Hz for this particular data set. Other data files can have different starting and ending values.

The crunch time shows a small increase with each higher frequency step being analysed.  I think the crunch time is reasonably constant if you use the same frequency value for different tasks that have the same "3012Lxx" part in the name, irrespective of the final two 'xx' digits, 00, 01, 02, etc.  I haven't specifically checked that, but it does appear that way.

If you grab a bunch of tasks while a particular frequency value is in play (each value usually stays in play for several hours) and abort any resends that have different frequency values, you should end up with tasks that have the same basic crunch time.
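
The selection described above could be sketched roughly as follows, in Python. The task-name layout used here (data file name, then the frequency, separated by underscores) is an assumption for illustration only, as are the example task names and the helper names `frequency_of` and `keep_only`; check them against the names your own client actually shows.

```python
import re

# ASSUMPTION: task names start with "<data file>_<frequency>_", for example
# "LATeah3012L00_884.0_...". This layout is illustrative, not confirmed here.
TASK_NAME = re.compile(r"^(LATeah\d+L\d+)_(\d+(?:\.\d+)?)_")


def frequency_of(task_name):
    """Return the frequency field as a float, or None if the name doesn't match."""
    match = TASK_NAME.match(task_name)
    return float(match.group(2)) if match else None


def keep_only(task_names, wanted_hz):
    """Split names into (keep, other) by whether they carry the wanted frequency."""
    keep, other = [], []
    for name in task_names:
        (keep if frequency_of(name) == wanted_hz else other).append(name)
    return keep, other


# Hypothetical task names, made up for this example.
tasks = [
    "LATeah3012L00_884.0_0_0.0_1111111_0",
    "LATeah3012L01_884.0_0_0.0_2222222_0",
    "LATeah3012L00_876.0_0_0.0_3333333_1",
]
keep, other = keep_only(tasks, 884.0)
print("same frequency:", keep)
print("different frequency (candidates to abort):", other)
```

This only sorts names into two lists; actually aborting the unwanted tasks is left to whatever you normally use to manage the client.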

Cheers,
Gary.

Exard3k
Joined: 25 Jul 21
Posts: 66
Credit: 56155179
RAC: 0

I wonder if we can get someone with an AMD MI100 or MI250 to provide some numbers. Not that I'll ever be able to afford them, but the stats look promising.

petri33
Joined: 4 Mar 20
Posts: 123
Credit: 4040315819
RAC: 7028452

Thanks Gary!

Now I understand a bit more.

--

petri33

tictoc
Joined: 1 Jan 13
Posts: 44
Credit: 7202322078
RAC: 7607123

Here are a few numbers at the far ends of clock/power levels on a Radeon VII.

Running 4x tasks, so times posted are the effective single task runtimes (1/4 of actual runtime).

 

Arch Linux | Kernel 5.17-rc3 | ROCm 5.0

CPU: Threadripper 3960X @ 4200 MHz all-core

GPU: Radeon VII

Effective Runtime - 98s | 1988 MHz Avg Core Clock | 1150 MHz Mem Clock | 358W Avg Power Consumption

Effective Runtime - 117s | 1621 MHz Avg Core Clock | 1000 MHz Mem Clock | 172W Avg Power Consumption
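
As a rough way to compare the two settings above, energy per task can be estimated as effective runtime times average power. The sketch below is in Python and takes the posted runtime and power figures at face value; the function name `energy_per_task_kj` and the labels are just for illustration, and nothing beyond the posted power figure (CPU, rest of the system) is counted.

```python
def energy_per_task_kj(effective_seconds, avg_watts):
    """Energy per task in kilojoules: seconds * watts = joules, then / 1000."""
    return effective_seconds * avg_watts / 1000.0


# (label, effective runtime in s, average power in W) as posted above.
runs = [
    ("1988 MHz core / 1150 MHz mem", 98, 358),
    ("1621 MHz core / 1000 MHz mem", 117, 172),
]

for label, seconds, watts in runs:
    kj = energy_per_task_kj(seconds, watts)
    print(f"{label}: {seconds}s at {watts}W = {kj:.1f} kJ per task")
```

On these numbers the lower setting comes to about 20.1 kJ per task against about 35.1 kJ for the higher one, so roughly 43% less energy per task for a runtime that is roughly 19% longer.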

 

The faster OC is not at all optimized, and I know I can dial back the volts and keep the clocks at nearly the same speed.  Something like 1980 MHz at 290-300W should be stable on Einstein, but I haven't had time to mess around with the OC. 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46839512642
RAC: 64339246

Ian&Steve C. wrote:

I run 3x per GPU with these current tasks. Times listed are 1/3 of actual runtime to reflect effective task speed.


My RTX 3080Tis do them in about 90 seconds per task. (+50 core, +1000 mem, 300W PL)

my RTX 3070Tis do them in about 128 seconds per task (+50 core, +1000 mem, 230W PL)

my RTX 2080Tis do them in about 142 seconds per task. (+50 core, +400 mem, 225W PL)

 

(note: these are times for the current LAH3000 series tasks, 4000 series run slower) 

Using Petri's further optimized code, I've now gotten this down to (average):

 

3080Ti: 82 seconds

3070Ti: 117 seconds

2080Ti: 127 seconds

 

at the same power levels as before. faster for free :)

_________________________________________________________________________

Tom M
Joined: 2 Feb 06
Posts: 6443
Credit: 9574007795
RAC: 8178231

Ian&Steve C. wrote:

Using Petri's further optimized code, I've now gotten this down to (average):

Sounds like a major argument for switching back to Linux.

Sigh.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46839512642
RAC: 64339246

Tom M wrote:

Ian&Steve C. wrote:

Using Petri's further optimized code, I've now gotten this down to (average):

Sounds like a major argument for switching back to Linux.

Sigh.

Tom M

about 10% improvement over his previous optimized code. only 5% on the 3070Ti, but that model seems to have hit a memory bandwidth bottleneck when looking at the GPU metrics.

 

he's a wizard with nvidia GPU code!

 

if Petri allows it, there's likely to be a more public release, but be advised that it will likely be Linux/Nvidia only and will require running his custom application under Anonymous Platform. (unless the project devs are able to decipher his code and roll it into their standard apps, but the SETI devs didn't have the time or ability to do this, so I won't be surprised if it's the same situation at Einstein)

_________________________________________________________________________

Markus Windisch
Joined: 23 Aug 21
Posts: 61
Credit: 97881372
RAC: 0

Ian&Steve C. wrote:

about 10% improvement over his previous optimized code. only 5% on the 3070Ti, but that model seems to have hit a memory bandwidth bottleneck when looking at the GPU metrics.

he's a wizard with nvidia GPU code!

This is amazing!

Tom M
Joined: 2 Feb 06
Posts: 6443
Credit: 9574007795
RAC: 8178231

Windows with RTX 3080 Ti (2) and RTX 3080

Running two threads per GPU, I appear to be getting a range of 208s to 226s.

208 / 2 = 104s equivalent (~1m44s).
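
The same conversion can be applied to both ends of the range. This is a minimal Python sketch, assuming (as in the post) that the effective per-task time is simply the wall-clock time divided by the number of tasks running at once on the GPU; the helper names are made up for the example.

```python
def effective_seconds(wall_seconds, tasks_per_gpu):
    """Effective per-task time when several tasks share one GPU."""
    return wall_seconds / tasks_per_gpu


def as_min_sec(seconds):
    """Format a duration in seconds as e.g. '1m44s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m{secs:02d}s"


for wall in (208, 226):
    eff = effective_seconds(wall, 2)
    print(f"{wall}s at 2x -> {eff:.0f}s effective ({as_min_sec(eff)})")
```

For the 208s to 226s range at 2x, that works out to 104s to 113s effective, or about 1m44s to 1m53s.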

I have switched to Zorin (Core), a Windows-like flavor of Ubuntu, and seem to be getting somewhere around 95s equivalent.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

RTX 3060 Ti LHR 8GB

Win10P, tried up to 9x tasks on the GPU.
Don't run 1x or 2x; the GPU is underutilized.

4x does better than 3x due to the slow last 10% of any single GPU task.

4x up to 9x: pretty much no difference in average time per task.

Normal clock: around 180s/WU

OC: around 160s to 165s average per WU
