petri33 wrote: The variation in the tasks of 3-series seems to be related to the ...
The 884.0 parameter is the spin frequency (in Hz) of the potential pulsar being teased out of the data.
The data file (LATeah3012L00.dat) gets scanned for a whole range of different potential frequencies in steps of 8 Hz. From my perhaps unreliable memory, the starting value was somewhere around 700 Hz and the highest value was around 900 Hz for this particular data set. Different data files can have different starting and ending values.
The crunch time shows a small increase with each higher frequency step being analysed. I think the crunch time is reasonably constant if you use the same frequency value for different tasks that have the same "3012Lxx" part in the name, irrespective of the final two 'xx' digits, 00, 01, 02, etc. I haven't specifically checked that, but it does appear that way.
If you grab a bunch of tasks while a particular frequency value is in play (each value usually lasts for several hours) and abort any resends that have different frequency values, you should end up with tasks that have the same basic crunch time.
Cheers,
Gary.
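The idea of keeping only tasks that share one frequency value can be sketched in a few lines of Python. This is only an illustration: it assumes the frequency is the second underscore-separated field of the task name, and the task names below are made up, so check both against your own task list before relying on it.

```python
from collections import defaultdict


def frequency_of(task_name):
    """Return the frequency field of a task name, or None if it cannot be parsed.

    ASSUMPTION: names look like '<datafile>_<frequency>_...'. That layout is
    a guess for illustration, not something confirmed in this thread.
    """
    parts = task_name.split("_")
    if len(parts) < 2:
        return None
    try:
        return float(parts[1])
    except ValueError:
        return None


def group_by_frequency(task_names):
    """Group task names by their parsed frequency value."""
    groups = defaultdict(list)
    for name in task_names:
        groups[frequency_of(name)].append(name)
    return dict(groups)


# Hypothetical task names, invented for this example.
tasks = [
    "LATeah3012L00_884.0_0_0.0_1001_0",
    "LATeah3012L01_884.0_0_0.0_1002_0",
    "LATeah3012L00_876.0_0_0.0_1003_1",
]

groups = group_by_frequency(tasks)
for freq, names in groups.items():
    print(freq, len(names))
```

Tasks that land in the same group should be the ones with comparable crunch times. Anything in a different group would be a candidate for aborting if the goal is a clean benchmark.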
I wonder if we can get someone with an AMD MI100 or MI250 to provide some numbers. Not that I'll ever be able to afford them, but the stats look promising.
Thanks Gary!
Now I understand another bit more.
--
petri33
Here are a few numbers at the far ends of clock/power levels on a Radeon VII.
Running 4x tasks, so times posted are the effective single task runtimes (1/4 of actual runtime).
Arch Linux | Kernel 5.17-rc3 | ROCm 5.0
CPU: Threadripper 3960X @ 4200 MHz all-core
GPU: Radeon VII
Effective Runtime - 98s | 1988 MHz Avg Core Clock | 1150 MHz Mem Clock | 358W Avg Power Consumption
Effective Runtime - 117s | 1621 MHz Avg Core Clock | 1000 MHz Mem Clock | 172W Avg Power Consumption
The faster OC is not at all optimized, and I know I can dial back the volts and keep the clocks at nearly the same speed. Something like 1980 MHz at 290-300W should be stable on Einstein, but I haven't had time to mess around with the OC.
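To put the two Radeon VII data points side by side, energy per task is a handy figure: effective runtime multiplied by average power. A minimal sketch, assuming the posted wattage is the GPU's average draw while crunching (CPU and the rest of the system are not counted):

```python
def effective_runtime(wall_seconds, concurrent_tasks):
    """Effective single-task runtime when several tasks share one GPU."""
    return wall_seconds / concurrent_tasks


def energy_per_task_wh(effective_seconds, avg_watts):
    """Energy per task in watt-hours (seconds * watts gives joules; 3600 J = 1 Wh)."""
    return effective_seconds * avg_watts / 3600.0


# (label, effective runtime in s, average power in W), taken from the post above
runs = [
    ("1988 MHz core / 1150 MHz mem", 98, 358),
    ("1621 MHz core / 1000 MHz mem", 117, 172),
]

for label, secs, watts in runs:
    print(f"{label}: {energy_per_task_wh(secs, watts):.2f} Wh per task")
```

With these numbers, the low-clock setting takes roughly 19% longer per task but uses a little under 60% of the energy per task.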
Ian&Steve C. wrote: I run 3x ...
Using Petri's further optimized code, I've now gotten this down to (average):
3080Ti: 82 seconds
3070Ti: 117 seconds
2080Ti: 127 seconds
at the same power levels as before. faster for free :)
_________________________________________________________________________
Ian&Steve C. wrote: Using ...
Sounds like a major argument for switching back to Linux.
Sigh.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor)
Tom M wrote: Ian&Steve C. ...
about 10% improvement over his previous optimized code. only 5% on the 3070Ti, but judging by the GPU metrics that model seems to have hit a memory bandwidth bottleneck.
he's a wizard with Nvidia GPU code!
if Petri allows it, there's likely to be a more public release, but be advised that it will likely be Linux/Nvidia only and will require running his custom application under Anonymous Platform. (unless the project devs are able to decipher his code and roll it into their standard apps, but SETI devs didn't have the time or ability to do this, so I won't be surprised if it's the same situation at Einstein)
_________________________________________________________________________
Ian&Steve C. wrote: about ...
This is amazing!
Windows with RTX 3080 Ti (2) and RTX 3080
Running two threads per GPU, I appear to be getting a range of 208s to 226s.
208 / 2 = 104s equivalent (~1m44s).
I have switched to Zorin (Core), a Windows-like flavor of Ubuntu, and seem to be getting somewhere around ~95s equivalent.
Tom M
RTX 3060 Ti LHR 8GB
Win10P, tried up to 9x tasks on the GPU
Don't run 1x or 2x; the GPU is underutilized
4x does better than 3x due to the slow last 10% of any single GPU task.
4x up to 9x makes pretty much no difference in the average time per task
Normal clock: around 180s/WU
OC: around 160s to 165s average per WU
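For comparing cards, it can help to turn seconds per WU into tasks per day. A quick sketch, assuming the figures above are effective per-task times (wall time divided by the number of concurrent tasks) and that the GPU crunches around the clock:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(effective_seconds_per_task):
    """Tasks completed per day at a given effective per-task runtime."""
    return SECONDS_PER_DAY / effective_seconds_per_task


for label, secs in [
    ("normal clock", 180),
    ("OC, fast end", 160),
    ("OC, slow end", 165),
]:
    print(f"{label}: {tasks_per_day(secs):.0f} tasks/day")
```

On those assumptions, the overclock is worth roughly 9% to 13% more tasks per day.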