Top Production apps OS3GW or Brp7-meerKat - Discussion

Tom M
Joined: 2 Feb 06
Posts: 6528
Credit: 9631291505
RAC: 2870282

Ian& SteveC,

Right now I am running at 20 percent or less CPU load, which gives 3.1+ GHz on the active tasks. I hope to get up to a higher CPU load later.

Running WCG since Rosetta has been down.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Tom M
Joined: 2 Feb 06
Posts: 6528
Credit: 9631291505
RAC: 2870282

It is clear that the tasks run more quickly on "real" cores when you are running the 1.07/1.08/1.15 versions.

I am running a 4 Titan V box with no other tasks.  It has an Epyc 7282 (16c/32t) cpu right now.

When I run it on Nvidia MPS server with 4x per GPU the time each task takes approximates the top performing Titan V boxes of the Top 50.

They are running 5x, not 4x.  When I run 5x, my task times split between under 2000s and over 2000s, both of which are slower than the top performers' task times.

So it looks like I need more "real" CPU cores.  The choices, in ascending order of cost, are the 7402 (24c/48t), the 7F72 (24c/48t, 3.6GHz) and a 7742-class CPU (64c/128t).

Any one of these choices will raise production.  In theory, and probably in practice, the 7F72 will raise production the most, but it will provide the smallest increase in cores available to run any CPU-only tasks.
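
A quick way to sanity-check the core-count argument, as a minimal sketch. It assumes (this is not stated above) that each concurrent GPU task wants one physical core to itself. The helper names are made up for illustration; the core counts are the ones listed in the post.

```python
# Assumption: one physical ("real") core per concurrent GPU task.
GPUS = 4  # the 4 Titan V box described above

# Physical core counts as listed in the post.
CPUS = {"7282": 16, "7402": 24, "7F72": 24, "7742-class": 64}


def cores_needed(gpus: int, tasks_per_gpu: int) -> int:
    """Physical cores wanted if every GPU task gets its own core."""
    return gpus * tasks_per_gpu


def spare_cores(physical_cores: int, gpus: int, tasks_per_gpu: int) -> int:
    """Cores left for CPU-only work; a negative value means oversubscribed."""
    return physical_cores - cores_needed(gpus, tasks_per_gpu)


if __name__ == "__main__":
    for name, cores in CPUS.items():
        for n in (4, 5):
            print(f"{name}: {n}x per GPU -> spare physical cores: "
                  f"{spare_cores(cores, GPUS, n)}")
```

Under that assumption the 16-core 7282 is exactly full at 4x and four cores short at 5x, which would be consistent with the slower 5x times. The 24-core parts leave four cores spare at 5x, and the 64-core part leaves 44.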

Comments?

 


pututu
Joined: 6 Apr 17
Posts: 67
Credit: 653417392
RAC: 1

Tom M wrote:

It is clear that the tasks run more quickly on "real" cores when you are running the 1.07/1.08/1.15 versions.

I am running a 4 Titan V box with no other tasks.  It has an Epyc 7282 (16c/32t) cpu right now.

When I run it on Nvidia MPS server with 4x per GPU the time each task takes approximates the top performing Titan V boxes of the Top 50.

They are running 5x, not 4x.  When I run 5x, my task times split between under 2000s and over 2000s, both of which are slower than the top performers' task times.

So it looks like I need more "real" CPU cores.  The choices, in ascending order of cost, are the 7402 (24c/48t), the 7F72 (24c/48t, 3.6GHz) and a 7742-class CPU (64c/128t).

Any one of these choices will raise production.  In theory, and probably in practice, the 7F72 will raise production the most, but it will provide the smallest increase in cores available to run any CPU-only tasks.

Comments?

 

When I ran my VII, I disabled HT on my 7950X. I could have set CPU utilization to 50% instead, but that's just me. I suspect the CPU portion of the statistical recalculation in O3AS uses heavy floating point arithmetic, so each real core is so fully occupied that enabling hyperthreading or running the CPU at 100% (HT on) doesn't really help. Projects like CPDN (climateprediction.net) and the PrimeGrid (PG) LLR CPU subprojects also use heavy floating point arithmetic. A few posts earlier in the Einstein forum reported the CPU portion of O3AS taking much longer than expected, even on hosts with fast CPUs. I suspect those hosts may have been running other BOINC projects that need heavy floating point arithmetic.

Tom M
Joined: 2 Feb 06
Posts: 6528
Credit: 9631291505
RAC: 2870282

It looks like 3x may be the newest maximum production point with latest LF data being processed.

I would be interested in the 1x results from a Nvidia Windows box running the GW task.

As well as the 1x results from running the 1.15/1.08 version under Linux. (All GPU)

After all there might be a way to regain much more production under the new LF data.

Tom M

===edit===

In other conversations it has been suggested you explore BRP7/MeerKAT to see if you like its production better.  This is especially true if you have the Petri CUDA-optimized version available (Linux).


stsfred
Joined: 1 Apr 23
Posts: 5
Credit: 90750270
RAC: 269465

The new workunits take ~40% longer to complete and consume 40% more power, while granting only 4000 credits instead of 10000. This is under Windows 10 with 4 tasks running in parallel. The GPU is a 4070 Ti Super and the CPU is a 3950X, which is also running WCG MCM.

I switched back to MeerKAT BRP7, as the new work is no longer as effective as BRP7.
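
To put those figures together, here is a rough sketch of the relative credit rate. It assumes "40% more power" means average power draw (not total energy) and uses only the ratios quoted above; the function name is hypothetical.

```python
def relative_efficiency(time_ratio: float, power_ratio: float,
                        credit_ratio: float) -> tuple[float, float]:
    """Return (credit per hour, credit per unit energy), new relative to old.

    Each argument is new/old. Energy per task is taken as power * time,
    which assumes the quoted power increase is average draw.
    """
    credit_per_hour = credit_ratio / time_ratio
    energy_ratio = time_ratio * power_ratio
    credit_per_energy = credit_ratio / energy_ratio
    return credit_per_hour, credit_per_energy


# Ratios from the post: ~40% longer, 40% more power, 4000 vs 10000 credits.
per_hour, per_energy = relative_efficiency(1.4, 1.4, 4000 / 10000)

if __name__ == "__main__":
    print(f"credit per hour:   {per_hour:.2f} of the old rate")
    print(f"credit per energy: {per_energy:.2f} of the old rate")
```

With these inputs the new work earns roughly 0.29 of the old credit per hour and roughly 0.20 of the old credit per unit of energy.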

 

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 263
Credit: 10818365813
RAC: 13275146

Tom M wrote:

It looks like 3x may be the newest maximum production point with latest LF data being processed.

I would be interested in the 1x results from a Nvidia Windows box running the GW task.

As well as the 1x results from running the 1.15/1.08 version under Linux. (All GPU)

After all there might be a way to regain much more production under the new LF data.

Tom M

===edit===

In other conversations it has been suggested you explore BRP7/MeerKAT to see if you like its production better.  This is especially true if you have the Petri CUDA-optimized version available (Linux).

 

Here is what we came up with to help answer your question. This comparison is not exact (somewhat apples to oranges). The times shown include ONLY the time on the GPU, in order to negate the impact of different-speed CPUs (I did not include the recalc step; I stopped timing when the task hit 99%).

Host 1: Threadripper PRO 5965WX, Windows 11 Pro, RTX A4500, 1.07 opencl running at 1x = 2040 seconds  

Host 2: Xeon(R) Gold 6258R, Linux Mint, RTX A4500, 1.08 cuda, no MPS running 1x = 1700 seconds

Host 3: Threadripper PRO 5995WX, Linux Mint, RTX A6000, 1.08 cuda, no MPS running 1x = 1440 seconds

I think the comparison of host 1 to host 2 is the most valid, since they are using the same GPU. Host 3 was included for the sake of additional information only. Initial observation: although the CUDA app on Linux is still more productive, the gap has closed significantly versus the HF work units.
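
For reference, the size of that gap can be worked out from the three GPU-only times above. This is just a sketch of the arithmetic; the dictionary keys and the helper name are illustrative labels, not anything from the apps.

```python
# GPU-only seconds per task at 1x, as reported above.
GPU_SECONDS = {
    "host1_win11_a4500_opencl_1.07": 2040,
    "host2_linux_a4500_cuda_1.08": 1700,
    "host3_linux_a6000_cuda_1.08": 1440,
}


def speedup(baseline_s: float, other_s: float) -> float:
    """How many times faster `other` is than `baseline`."""
    return baseline_s / other_s


base = GPU_SECONDS["host1_win11_a4500_opencl_1.07"]
same_gpu = speedup(base, GPU_SECONDS["host2_linux_a4500_cuda_1.08"])
other_gpu = speedup(base, GPU_SECONDS["host3_linux_a6000_cuda_1.08"])

if __name__ == "__main__":
    print(f"host 2 vs host 1 (same GPU):      {same_gpu:.2f}x")
    print(f"host 3 vs host 1 (different GPU): {other_gpu:.2f}x")
```

On the same A4500, the Linux CUDA run is about 1.2x faster than the Windows OpenCL run, or roughly 17% less GPU time per task. The host 3 figure mixes in a different GPU, so it is not a like-for-like app comparison.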

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4429
Credit: 3247192722
RAC: 1743199

My RTX 4070 Ti Super on Win 11 Pro with an AMD 7950X used to do O3AS tasks in 550 seconds, and they now take 1270 seconds. I tested a few LF tasks with 2 at a time and they took about 2650 seconds each. So running one at a time is more productive.
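
The 1x versus 2x conclusion follows from comparing throughput rather than per-task time. A minimal sketch using the times above (the function name is illustrative):

```python
def tasks_per_hour(concurrent: int, seconds_per_task: float) -> float:
    """Completed tasks per hour when `concurrent` tasks run side by side,
    each taking `seconds_per_task` of wall-clock time."""
    return concurrent * 3600.0 / seconds_per_task


one_at_a_time = tasks_per_hour(1, 1270)
two_at_a_time = tasks_per_hour(2, 2650)

if __name__ == "__main__":
    print(f"1x: {one_at_a_time:.2f} tasks/hour")
    print(f"2x: {two_at_a_time:.2f} tasks/hour")
```

That works out to about 2.83 tasks per hour at 1x against about 2.72 at 2x, so 1x is ahead by roughly 4% on this host.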

Tom M
Joined: 2 Feb 06
Posts: 6528
Credit: 9631291505
RAC: 2870282

Boca Raton HS has a single Titan V GPU getting really good results now that the os3gw tasks are getting 20,000 credits per valid task.

1.5M+ RAC 
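
As a back-of-the-envelope check on what that implies, here is a sketch that treats RAC as a steady-state credits-per-day figure and assumes all of the host's credit comes from these tasks. RAC is a decaying average, so this is only approximate.

```python
SECONDS_PER_DAY = 86_400


def implied_tasks_per_day(rac: float, credit_per_task: float) -> float:
    """Valid tasks per day needed to sustain `rac`, assuming steady state."""
    return rac / credit_per_task


def implied_seconds_per_task(rac: float, credit_per_task: float) -> float:
    """Average wall-clock seconds per completed task at that rate,
    regardless of how many tasks run concurrently."""
    return SECONDS_PER_DAY / implied_tasks_per_day(rac, credit_per_task)


# Figures from the post: 1.5M RAC (a lower bound) and 20,000 credits per task.
per_day = implied_tasks_per_day(1_500_000, 20_000)
per_task_s = implied_seconds_per_task(1_500_000, 20_000)

if __name__ == "__main__":
    print(f"{per_day:.0f} valid tasks/day, one every {per_task_s:.0f} s")
```

Under those assumptions, 1.5M RAC corresponds to about 75 valid tasks per day, or one completed task roughly every 1152 seconds.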


Boca Raton Community HS
Joined: 4 Nov 15
Posts: 263
Credit: 10818365813
RAC: 13275146

Tom M wrote:

Boca Raton HS has a single Titan V GPU getting really good results now that the os3gw tasks are getting 20,000 credits per valid task.

1.5M+ RAC 

 

It has been an interesting (and difficult) system to work with but a great learning experience. There have been some hiccups along the way and I am sure there are more to come. The Titan V is great; the 14900KS has been like trying to train a 100m sprinter to run a marathon. Undervolting, massive cooling, and running only E@H work have been key (along with lots of BIOS updates...). It can easily maintain 5.9GHz across the active cores while completing E@H work, and with the 8400MT/s DDR5 it really is impressive during the recalc steps. It will hit higher than 5.9GHz (up to 6.2GHz), but only in small bursts across 2 cores.

The system is part experiment, part learning experience for the students (and me), and hopefully can make a good contribution. 

Also, not directly related to this thread, but I started using the Linux application "s-tui". It is the best program I have used for monitoring core speed and temp. There might be other programs out there that are better, but none that I have used yet. And it can stress test. Really, really great, and it has been absolutely essential for figuring out this system.

GWGeorge007
Joined: 8 Jan 18
Posts: 3109
Credit: 4995456820
RAC: 1154178

Boca Raton Community HS wrote:

Also, not directly related to this thread, but I started using the Linux application "s-tui". It is the best program I have used for monitoring core speed and temp. There might be other programs out there that are better, but none that I have used yet. And it can stress test. Really, really great, and it has been absolutely essential for figuring out this system.

I agree, yes, "s-tui" is one of the best apps out there.  I use it too, but I don't leave it up like Zenmonitor; it takes up too much room if you want to see everything.  And I do, but I'm not quite the purist at getting the most out of my systems.  At least not yet.

George

Proud member of the Old Farts Association
