Brp7/MeerKat 1x vs 2x crunching speeds

GWGeorge007
Joined: 8 Jan 18
Posts: 3065
Credit: 4971117686
RAC: 1427890

Tom M wrote:

GWGeorge007 wrote:

I just don't get why you are so obsessive with higher RAC values

Or lower RAC values, since I have been staring at the bottom RAC of the top 50 list too.

After all, George bought a dual-CPU MB for the purpose of possibly becoming very competitive in the Universe@Home project, and an RTX 3090 hybrid to possibly compete with Freewill's RTX 3090 on E@H tasks....

I claim "kettle, pot, black"  ;)

Well... Okay... Yes... I did buy an EVGA RTX 3090 Hybrid to possibly compete with Freewill's RTX 3090 on E@H tasks...

But that was the only(?) one I did that for.  And don't forget, my Supermicro dual-socket EPYC MB & EPYC 7532 (32C/64T) CPU combo has a bad socket, and I can't find a place to get it replaced.  So...  I'm down to a single EPYC 7532 CPU with 128 GB of DRAM (instead of 256).

So...  I will lay down my accusation of "why you are so obsessive" for just this one instance.

My original statement remains intact.  "Why..."  Can you address the rest of my original statement?

George

Proud member of the Old Farts Association

Tom M
Joined: 2 Feb 06
Posts: 6460
Credit: 9583167187
RAC: 7003495

GWGeorge007 wrote:

Well... Okay... Yes... I did buy an EVGA RTX 3090 Hybrid to possibly compete with Freewill's RTX 3090 on E@H tasks...

But that was the only(?) one I did that for.  And don't forget, my Supermicro dual-socket EPYC MB & EPYC 7532 (32C/64T) CPU combo has a bad socket, and I can't find a place to get it replaced.  So...  I'm down to a single EPYC 7532 CPU with 128 GB of DRAM (instead of 256).

So...  I will lay down my accusation of "why you are so obsessive" for just this one instance.

My original statement remains intact.  "Why..."  Can you address the rest of my original statement?

Ah, to explain away the rest of your statement would probably require a deep psychoanalysis.  And I am not prepared to lie on a couch that long...  ;)

While I have pondered leaving the dual-CPU MB as a CPU-only cruncher, I have also pondered committing to running at least one RTX 3080 Ti on "every" system I usually run production BOINC on.  That would mean at least 5 RTX 3080 Tis and systems.

And oh my aching power bill.

So I think I really am only going to be able to sustain 3-4 systems at most.  And probably 2 or 3 systems.  One dual cpu cruncher.  One gpu cruncher and one single gpu cruncher.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

I decided to have a small spin on BOINC for a bit with BRP7/Meerkat.

OpenSUSE 15.5 with an RTX 3060 Ti LHR and a small OC: the different concurrencies I tried (2/4/6) all have roughly the same crunching time per WU. I didn't try a single unit (yet).

It is noticeable that running 4/6 units temporarily suspends the CPU. The few WUs so far didn't finish in a train, one by one, but all finished at the same time.

 

greetings :)

Tom M
Joined: 2 Feb 06
Posts: 6460
Credit: 9583167187
RAC: 7003495

HWpecker wrote:

I decided to have a small spin on BOINC for a bit with BRP7/Meerkat.

OpenSUSE 15.5 with an RTX 3060 Ti LHR and a small OC: the different concurrencies I tried (2/4/6) all have roughly the same crunching time per WU. I didn't try a single unit (yet).

It is noticeable that running 4/6 units temporarily suspends the CPU. The few WUs so far didn't finish in a train, one by one, but all finished at the same time.

 

greetings :)

Hello, Peter,

Out of curiosity I tried 3 All-Sky tasks on an RTX 3080 Ti, since the tasks use so little GPU RAM and often don't use the GPU fully.  After a few seconds the tasks started erroring out with computation errors. :(   Oh well.

Tom M

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1176619349
RAC: 994628

Ian&Steve C. wrote:

AMD historically has benefited from multiples, while Nvidia has not (or not much). that probably still holds true here.

Well, I haven't tried 2 BRP7 WUs at the same time with my AMD Radeon PRO W7600 yet, for the simple reason that a single BRP7 task already leads to higher GPU utilisation than 2 GW tasks, so I see no point in even trying. When the GPU load is at 95% and its power draw is at the TDP, there is not much to expect.

Maybe this is different with more powerful cards; maybe there is a difference between the pro cards and the consumer ones.

Also the BRP7 tasks are more RAC efficient on this GPU than the GW tasks, but that historically always was the case.

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

I have been trying different concurrencies some more today, including 1 WU:

Good ol' X79/Xeon E5-1680v2 underclocked @ 1,5 GHz
3060ti LHR running on OpenSUSE 15.5 KDE having fun with Meerkat WUs.
Couldn't limit the power (no nvidia-smi), so I might as well OC that GPU with coolbits.

NVidia Settings
fanspeed 50/50
preferred mode: max performance (runs level 3, 4 not needed)
graphics clock offset +170 (Clock reports 2100)
Mem offset +1700 (Rate reports 15300)
Room temp 20C, GPU mostly 61C drops a bit when finishing WU
1 WU ~620s OC, system+ 490 Watt
2 WU ~651s OC, system+ 499 Watt
4 WU ~641s OC, system+ 508 Watt
6 WU ~637s OC, system+ 512 Watt

A single BRP7/Meerkat WU seems to average around 910+MB in size.

 

Edit: system without that plus at 1 WU 238 Watt :)

Greetings :)

 

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

Now I did 1x/2x on win10p with the same clockspeeds as OpenSUSE Leap 15.5

1x was way slower than 2x, so I quickly upped it; 6x was even better, at 240 Watts.

CPU behaved way better under win10p, no suspending/busy.

Same NVidia Settings ASUS GPU Tweak
fanspeed 50/50
preferred mode: max performance
graphics clock offset +170 (Clock reports 2100)
Mem offset +1700 (Rate reports 15300)
Room temp 20C, GPU mostly 59C drops a bit when finishing WU

1 WU ~700s OC, system 240 Watt quickly moved on to 2 WUs
2 WU ~547s OC, system 240 Watt
4 WU ~53xish OC, system 240 Watt quickly moved to 6 WUs
6 WU ~531s OC, system 240 Watt
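
As a rough sketch of what these numbers imply (not part of the original measurements), the snippet below compares each concurrency against 1x and estimates the energy per WU. It assumes that the quoted times are effective seconds per WU (batch time divided by the number of concurrent tasks) and that the 240 Watt system draw is steady; the `runs` table and the `summarize` helper are made-up names for illustration.

```python
# Windows 10 figures quoted above: concurrency -> (seconds per WU, system watts).
# The 4x run is left out because its time was only given as "~53xish".
# Assumption: the times are effective per-WU times, not per-task elapsed times.
runs = {1: (700, 240), 2: (547, 240), 6: (531, 240)}


def summarize(runs, baseline=1):
    """Return (concurrency, speedup vs baseline, Wh per WU) for each run."""
    base_seconds, _ = runs[baseline]
    rows = []
    for n, (seconds, watts) in sorted(runs.items()):
        speedup = base_seconds / seconds          # >1 means faster than baseline
        wh_per_wu = watts * seconds / 3600.0      # watt-seconds -> watt-hours
        rows.append((n, round(speedup, 2), round(wh_per_wu, 1)))
    return rows


for n, speedup, wh in summarize(runs):
    print(f"{n}x: {speedup:.2f}x vs 1x, {wh:.1f} Wh per WU")
```

Under those assumptions, 2x is about 1.28 times as fast as 1x and 6x about 1.32 times, so most of the gain on this card comes from the step from 1x to 2x.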

Now crunching 10 WUs concurrently, with the CPU clocked down to 1,2 GHz.

 

A single BRP7-cuda55 Win10P WU is 555MB in size for the same 3333 credits. Does that mean there's an Einstein@Home RAC subsidy for Windows users?

 

:P

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47026562642
RAC: 65080395

you should have compared the Linux CUDA app to the windows CUDA app. on your linux tests, you ran the opencl app, which has much higher CPU utilization per task (and will cause more system power use due to higher number of tasks and higher overall CPU use). to get the linux cuda app, you need to enable test/beta applications in your project preferences.

in my testing, the CUDA app is fairly low CPU utilization, and a bit faster too.

_________________________________________________________________________

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

Thank you for your reply,

I simply slapped on Leap 15.5 twice, with KDE and with IceWM (no different results there), got the updates and BOINC working, and saw how it went.

Now I'm back on Win10p and sticking with it for a while, since it is more convenient for the other users here, while this machine also acts as a small, more useful crunching heater as autumn kicks in.

Indeed, the OpenCL app was using 100% of a CPU core, triggering suspension/busy status messages. I don't know whether PCIe NVMe could play a role on this older system (updating the BIOS enabled NVMe, yay).

 

greetings :)

HWpecker
Joined: 27 Jan 22
Posts: 25
Credit: 77748827
RAC: 0

Not touching that machine for a while gets the average time per WU to ~520s with 10 concurrent.

Another difference from my previous post is that the CPU is clocked further down, from 1,5 to 1,2 GHz.

Should be a good 160+ BRP7 units a day.
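
The 160+ estimate can be checked with a back-of-the-envelope sketch. It assumes that ~520 s is the effective time per WU (so the concurrency is already folded in) and uses the 3333 credits per WU mentioned earlier in the thread; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wus_per_day(seconds_per_wu):
    """WUs finished in 24 hours at a steady effective time per WU."""
    return SECONDS_PER_DAY / seconds_per_wu


def credit_per_day(seconds_per_wu, credit_per_wu=3333):
    """Daily credit, assuming every WU validates at the same credit."""
    return wus_per_day(seconds_per_wu) * credit_per_wu


print(f"{wus_per_day(520):.0f} WUs/day")
print(f"{credit_per_day(520):,.0f} credits/day")
```

With 520 s per WU this gives roughly 166 WUs a day, in line with the estimate above.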

 

@Ian&Steve,

How big are those CUDA chunks? What works better for you, 1 at a time? Or multiple?

 

greetings :)
