Benchmarking List for GPUs and Gamma-ray Pulsar binary search

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117483933806

RAC: 35497203

petri33 wrote: The variation

11 Feb 2022 1:14:00 UTC

Message 192585 in response to message 192554

petri33 wrote:

The variation in the tasks of 3-series seems to be related to the ...

The 884.0 parameter is to do with the spin frequency (in Hz) of the potential pulsar being teased out of the data.

The data file (LATeah3012L00.dat) gets scanned for a whole range of different potential frequencies in steps of 8Hz. From my perhaps unreliable memory, the starting value was somewhere around 700Hz and the highest value was around 900Hz for this particular data set. It can have different starting and ending values for different data files.

The crunch time shows a small increase with each higher frequency step being analysed. I think the crunch time is reasonably constant if you use the same frequency value for different tasks that have the same "3012Lxx" part in the name, irrespective of the final two 'xx' digits, 00, 01, 02, etc. I haven't specifically checked that, but it does appear that way.

If you grab a bunch of tasks while a particular frequency value is in play (each value usually lasts for several hours) and abort any resends that have different frequency values, you should end up with tasks that have the same basic crunch time.

Cheers,
Gary.
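
A minimal sketch of the selection described above: bucket task names by series and frequency so that only like-for-like tasks are compared. The exact task name layout is an assumption here (data file name, then the frequency, separated by underscores, as in "LATeah3012L00_884.0_..."), and the helper names and sample names are hypothetical.

```python
from collections import defaultdict


def split_task_name(name):
    """Return (data_file, frequency_hz) from a task name.

    ASSUMPTION: the name starts with '<data file>_<frequency>_', for
    example 'LATeah3012L00_884.0_...'. Adjust if real names differ.
    """
    parts = name.split("_")
    return parts[0], float(parts[1])


def group_by_frequency(names):
    """Group task names by (series, frequency).

    The series is the data file name without its final two digits, since
    tasks that differ only in those digits appear to take similar time.
    """
    groups = defaultdict(list)
    for name in names:
        data_file, freq = split_task_name(name)
        series = data_file[:-2]  # 'LATeah3012L00' -> 'LATeah3012L'
        groups[(series, freq)].append(name)
    return dict(groups)


# Hypothetical task names, for illustration only.
sample = [
    "LATeah3012L00_884.0_0_0.0_1_0",
    "LATeah3012L01_884.0_0_0.0_2_0",
    "LATeah3012L00_892.0_0_0.0_3_0",
]
for key, members in group_by_frequency(sample).items():
    print(key, len(members))
```

Benchmarking only within one bucket keeps the frequency-dependent drift in crunch time out of the comparison.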

Exard3k

Joined: 25 Jul 21

Posts: 66

Credit: 56155179

RAC: 0

I wonder if we get someone

11 Feb 2022 22:01:10 UTC

Message 192618

I wonder if we can get someone with an AMD MI100 or MI250 to provide some numbers. Not that I'll ever be able to afford them, but the stats look promising.

petri33

Joined: 4 Mar 20

Posts: 123

Credit: 4018795819

RAC: 7093036

Thanks Gary! Now I

11 Feb 2022 22:09:17 UTC

Message 192619 in response to message 192585

Thanks Gary!

Now I understand a bit more.

petri33

tictoc

Joined: 1 Jan 13

Posts: 44

Credit: 7180075428

RAC: 7805409

Here are a few numbers at the

12 Feb 2022 2:17:41 UTC

Message 192629

Here are a few numbers at the far ends of clock/power levels on a Radeon VII.

Running 4x tasks, so times posted are the effective single task runtimes (1/4 of actual runtime).

Arch Linux | Kernel 5.17-rc3 | ROCm 5.0

CPU: Threadripper 3960X @ 4200 MHz all-core

GPU: Radeon VII

Effective Runtime - 98s | 1988 MHz Avg Core Clock | 1150 MHz Mem Clock | 358W Avg Power Consumption

Effective Runtime - 117s | 1621 MHz Avg Core Clock | 1000 MHz Mem Clock | 172W Avg Power Consumption

The faster OC is not at all optimized, and I know I can dial back the volts and keep the clocks at nearly the same speed. Something like 1980 MHz at 290-300W should be stable on Einstein, but I haven't had time to mess around with the OC.
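
As a rough illustration of the trade-off in the two settings above, the effective runtime and average power can be combined into energy per task. This is only a sketch: it assumes the posted average power is representative of the whole run and counts GPU power only, and the function names are made up for this example.

```python
def effective_runtime(actual_seconds, concurrent_tasks):
    """Per-task time when several tasks share one GPU."""
    return actual_seconds / concurrent_tasks


def energy_per_task_wh(effective_seconds, avg_power_watts):
    """Energy per task in watt-hours (ASSUMES constant average power)."""
    return effective_seconds * avg_power_watts / 3600.0


# (effective runtime in s, average power in W) as posted for the Radeon VII
runs = {
    "1988 MHz core / 1150 MHz mem": (98, 358),
    "1621 MHz core / 1000 MHz mem": (117, 172),
}
for label, (secs, watts) in runs.items():
    print(f"{label}: {energy_per_task_wh(secs, watts):.2f} Wh per task")
```

With the posted figures this works out to roughly 9.7 Wh versus 5.6 Wh per task, so the lower setting takes about 19% longer per task while using a little under 60% of the energy.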

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46635632642

RAC: 64209435

Ian&Steve C. wrote: I run 3x

12 Feb 2022 21:18:33 UTC

Message 192650 in response to message 192547

Ian&Steve C. wrote:

I run 3x per GPU with these current tasks. Times listed are 1/3 of actual runtime to reflect effective task speed.

My RTX 3080Tis do them in about 90 seconds per task. (+50 core, +1000 mem, 300W PL)

my RTX 3070Tis do them in about 128 seconds per task (+50 core, +1000 mem, 230W PL)

my RTX 2080Tis do them in about 142 seconds per task. (+50 core, +400 mem, 225W PL)

(note: these are times for the current LAH3000 series tasks, 4000 series run slower)

Using Petri's further optimized code, I've now gotten this down to (average):

3080Ti: 82 seconds

3070Ti: 117 seconds

2080Ti: 127 seconds

at the same power levels as before. faster for free :)
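
For anyone wanting to turn the two sets of times above into percentages, a small sketch follows. It uses only the rounded averages posted here, so the results are approximate, and the function name is hypothetical.

```python
def runtime_reduction_pct(old_seconds, new_seconds):
    """Percentage reduction in per-task runtime going from old to new."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


# (earlier time, newer time) in effective seconds per task, as posted
times = {
    "3080Ti": (90, 82),
    "3070Ti": (128, 117),
    "2080Ti": (142, 127),
}
for gpu, (old, new) in times.items():
    print(f"{gpu}: {runtime_reduction_pct(old, new):.1f}% less time per task")
```

Because the inputs are "about" figures, a second or two either way shifts each result by roughly a percentage point.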

_________________________________________________________________________

Tom M

Joined: 2 Feb 06

Posts: 6432

Credit: 9562197887

RAC: 9768072

Ian&Steve C. wrote: Using

13 Feb 2022 16:08:23 UTC

Message 192678 in response to message 192650

Ian&Steve C. wrote:

Using Petri's further optimized code, I've now gotten this down to (average):

Sounds like a major argument for switching back to Linux.

Sigh.

Tom M

A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46635632642

RAC: 64209435

Tom M wrote: Ian&Steve C.

13 Feb 2022 17:21:38 UTC

Message 192689 in response to message 192678

Tom M wrote:

Ian&Steve C. wrote:

Using Petri's further optimized code, I've now gotten this down to (average):

Sounds like a major argument for switching back to Linux.

Sigh.

Tom M

about 10% improvement over his previous optimized code. only 5% on the 3070Ti, but that model seems to have hit a memory bandwidth bottleneck when looking at the GPU metrics.

he's a wizard with nvidia GPU code!

if Petri allows it, there's likely to be a more public release, but be advised that it will likely be Linux/Nvidia only and will require running his custom application under Anonymous Platform. (unless the project devs are able to decipher his code and roll it into their standard apps, but SETI devs didn't have the time or ability to do this, so I won't be surprised if it's the same situation at Einstein)

_________________________________________________________________________

Markus Windisch

Joined: 23 Aug 21

Posts: 61

Credit: 97881372

RAC: 0

Ian&Steve C. wrote: about

13 Feb 2022 22:48:07 UTC

Message 192699 in response to message 192689

Ian&Steve C. wrote:

about 10% improvement over his previous optimized code. only 5% on the 3070Ti, but that model seems to have hit a memory bandwidth bottleneck when looking at the GPU metrics.

he's a wizard with nvidia GPU code!

This is amazing!

Tom M

Joined: 2 Feb 06

Posts: 6432

Credit: 9562197887

RAC: 9768072

Windows with RTX 3080 Ti (2)

14 Feb 2022 2:07:26 UTC

Message 192712

Windows with RTX 3080 Ti (2) and RTX 3080

Running two tasks per GPU, I appear to be getting a range of 208s to 226s.

208 / 2 = 104s equivalent (~1m44s).

I have switched to Zorin (Core), a Windows-like flavor of Ubuntu, and seem to be getting somewhere around 95s equivalent.

Tom M

A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!

HWpecker

Joined: 27 Jan 22

Posts: 25

Credit: 77748827

RAC: 0

RTX 3060ti LHR 8GB Win10P

15 Feb 2022 19:26:34 UTC

Message 192777

RTX 3060ti LHR 8GB

Win10P, tried up to 9x tasks on the GPU
Don't run 1x or 2x; the GPU is underutilized.

4x does better than 3x due to the slow last 10% of any single GPU task.

4x up to 9x: pretty much no difference in average time per task

normal clock around 180s/WU

OC around 160s / 165s average per WU
