Top Production apps OS3GW or Brp7-meerKat - Discussion

Tom M

Joined: 2 Feb 06

Posts: 6529

Credit: 9631474836

RAC: 2861975

14 Aug 2024 17:32:40 UTC

Message 227531 in response to message 227530

Ian& SteveC,

Right now I am running at 20 percent or less CPU load, which gives 3.1+ GHz on the active tasks. I hope to get up to a higher CPU load later.

Running WCG since Rosetta has been down.

A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!

Tom M

Joined: 2 Feb 06

Posts: 6529

Credit: 9631474836

RAC: 2861975

23 Aug 2024 18:54:32 UTC

Message 227709

It is clear that tasks run more quickly on "real" cores when you are running the 1.07/1.08/1.15 versions.

I am running a 4x Titan V box with no other tasks. It has an Epyc 7282 (16c/32t) CPU right now.

When I run it on the Nvidia MPS server with 4x per GPU, the time each task takes approximates that of the top-performing Titan V boxes in the Top 50.

They are running 5x, not 4x. When I run 5x, my tasks split between under 2000s and over 2000s, both of which are slower than the top performers' task times.

So it looks like I need more "real" CPU cores. The choices, in ascending order of cost, are the 7402 (24c/48t), the 7F72 (24c/48t, 3.6GHz) and a 7742-class CPU (64c/128t).

Any one of these choices will raise production. In theory, and probably in practice, the 7F72 will raise production the most, but it will provide the least increase in capacity available for CPU-only tasks.
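
For anyone who wants to sanity-check the core counts, here is a minimal sketch of the arithmetic. It assumes (not measured) that each concurrent GPU task keeps one physical core busy; the function names are made up for illustration, and the core counts are the ones quoted above.

```python
# Rough core-budget check for a multi-GPU host.
# ASSUMPTION (not measured): each concurrent GPU task keeps one physical core busy.

GPUS = 4  # Titan V cards in the box

# Physical core counts as quoted in the post.
CPUS = {
    "Epyc 7282": 16,
    "Epyc 7402": 24,
    "Epyc 7F72": 24,
    "Epyc 7742": 64,
}


def core_budget(physical_cores: int, gpus: int, tasks_per_gpu: int) -> int:
    """Physical cores left over after giving every GPU task its own core.

    A negative result means some tasks must share a core or land on an
    SMT sibling thread.
    """
    return physical_cores - gpus * tasks_per_gpu


def report(tasks_per_gpu: int) -> dict[str, int]:
    """Spare physical cores per CPU model at the given tasks-per-GPU setting."""
    return {name: core_budget(cores, GPUS, tasks_per_gpu) for name, cores in CPUS.items()}


if __name__ == "__main__":
    for n in (4, 5):
        print(f"{n}x per GPU:", report(n))
```

Under that assumption the 7282 is exactly full at 4x and four cores short at 5x, which would fit the slower 5x times. Core count alone does not separate the 7402 from the 7F72; any difference there would come from clock speed.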

Comments?

pututu

Joined: 6 Apr 17

Posts: 67

Credit: 653417392

RAC: 1

25 Aug 2024 4:00:14 UTC

Message 227729 in response to message 227709

Tom M wrote:

It is clear that tasks run more quickly on "real" cores when you are running the 1.07/1.08/1.15 versions.

I am running a 4x Titan V box with no other tasks. It has an Epyc 7282 (16c/32t) CPU right now.

When I run it on the Nvidia MPS server with 4x per GPU, the time each task takes approximates that of the top-performing Titan V boxes in the Top 50.

They are running 5x, not 4x. When I run 5x, my tasks split between under 2000s and over 2000s, both of which are slower than the top performers' task times.

So it looks like I need more "real" CPU cores. The choices, in ascending order of cost, are the 7402 (24c/48t), the 7F72 (24c/48t, 3.6GHz) and a 7742-class CPU (64c/128t).

Any one of these choices will raise production. In theory, and probably in practice, the 7F72 will raise production the most, but it will provide the least increase in capacity available for CPU-only tasks.

Comments?

When I ran my VII, I disabled HT on my 7950X. I could have set 50% CPU utilization instead, but that's just me. I suspect the CPU portion of the statistical recalculation in O3AS uses heavy floating point arithmetic, so each real core is so fully occupied that enabling hyperthreading or running the CPU at 100% (HT on) doesn't really help. Projects like CPDN (climateprediction.net) and the PrimeGrid (PG) LLR CPU subprojects also use heavy floating point arithmetic. There were a few posts in the Einstein forum earlier on about the CPU portion of O3AS taking much longer than expected, even on hosts with fast CPUs. I suspect those hosts could have been running other BOINC projects that need heavy floating point arithmetic.

Tom M

Joined: 2 Feb 06

Posts: 6529

Credit: 9631474836

RAC: 2861975

6 Dec 2024 14:42:03 UTC

Message 230603

It looks like 3x may be the new maximum production point with the latest LF data being processed.

I would be interested in the 1x results from an Nvidia Windows box running the GW task.

As well as the 1x results from running the 1.15/1.08 version under Linux. (All GPU)

After all there might be a way to regain much more production under the new LF data.

Tom M

===edit===

In other conversations it has been suggested that you explore brp7/meerKat to see if you like its production better. This is especially true if you have the Petri CUDA-optimized version available (Linux).

stsfred

Joined: 1 Apr 23

Posts: 5

Credit: 90750270

RAC: 269465

6 Dec 2024 14:59:11 UTC

Message 230623

The new workunits take ~40% longer to complete and consume 40% more power, while granting only 4000 credits instead of 10000. This is under Windows 10, with 4 tasks running in parallel. The GPU is a 4070 Ti Super and the CPU is a 3950X, which is running WCG MCM, too.

I switched back to MeerKAT BRP7, as the new work is no longer as effective as BRP7.
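
To put rough numbers on that drop, here is a small sketch of the ratios. It reads "40% more power" as a 40% higher average draw while a task runs, which is an assumption; if it was meant as energy per task, the credit-per-energy figure equals the credit-rate figure instead. The function names are illustrative only.

```python
def relative_credit_rate(time_factor: float, credit_factor: float) -> float:
    """New credit per unit of wall time, relative to the old work (1.0 = unchanged)."""
    return credit_factor / time_factor


def relative_credit_per_energy(time_factor: float, power_factor: float,
                               credit_factor: float) -> float:
    """New credit per unit of energy, relative to the old work.

    ASSUMPTION: power_factor is the change in average draw while running,
    so energy per task scales with time_factor * power_factor.
    """
    return credit_factor / (time_factor * power_factor)


TIME_FACTOR = 1.4             # ~40% longer per task
POWER_FACTOR = 1.4            # ~40% more power (interpreted as draw)
CREDIT_FACTOR = 4000 / 10000  # 4000 credits instead of 10000

if __name__ == "__main__":
    print(f"credit per hour:   {relative_credit_rate(TIME_FACTOR, CREDIT_FACTOR):.2f}x")
    print(f"credit per energy: "
          f"{relative_credit_per_energy(TIME_FACTOR, POWER_FACTOR, CREDIT_FACTOR):.2f}x")
```

On those figures the new work earns a bit under 30% of the old credit per hour, and roughly 20% of the old credit per unit of energy.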

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 263

Credit: 10819719135

RAC: 13281979

6 Dec 2024 15:22:38 UTC

Message 230626 in response to message 230603

Tom M wrote:

It looks like 3x may be the new maximum production point with the latest LF data being processed.

I would be interested in the 1x results from an Nvidia Windows box running the GW task.

As well as the 1x results from running the 1.15/1.08 version under Linux. (All GPU)

After all there might be a way to regain much more production under the new LF data.

Tom M

===edit===

In other conversations it has been suggested that you explore brp7/meerKat to see if you like its production better. This is especially true if you have the Petri CUDA-optimized version available (Linux).

Here is what we came up with to help answer your question. This comparison is not exact (somewhat apples to oranges). The times included show ONLY the time on the GPU, in order to negate the impact of different speed CPUs (I did not include the recalc step; I stopped timing when it hit 99%).

Host 1: Threadripper PRO 5965WX, Windows 11 Pro, RTX A4500, 1.07 opencl running at 1x = 2040 seconds

Host 2: Xeon(R) Gold 6258R, Linux Mint, RTX A4500, 1.08 cuda, no MPS running 1x = 1700 seconds

Host 3: Threadripper PRO 5995WX, Linux Mint, RTX A6000, 1.08 cuda, no MPS running 1x = 1440 seconds

I think the comparison of host 1 to host 2 is the most valid, since they use the same GPU. Host 3 was included for the sake of additional information only. Initial observation: although the cuda app on Linux is still more productive, the gap has closed significantly versus the HF work units.
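
For a quick sense of the gap, the three GPU-only times can be turned into relative throughput. This is only a sketch of the arithmetic on the numbers above; the helper name is made up, and it ignores the recalc step just as the timings do.

```python
# GPU-only times at 1x, in seconds, as reported for the three hosts.
gpu_seconds = {
    "Host 1 (Windows 11, RTX A4500, 1.07 opencl)": 2040,
    "Host 2 (Linux Mint, RTX A4500, 1.08 cuda)": 1700,
    "Host 3 (Linux Mint, RTX A6000, 1.08 cuda)": 1440,
}


def percent_more_throughput(slow_seconds: float, fast_seconds: float) -> float:
    """Extra work per unit time done by the faster host, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


h1, h2, h3 = gpu_seconds.values()

if __name__ == "__main__":
    print(f"Host 2 vs Host 1: {percent_more_throughput(h1, h2):.0f}% more throughput")
    print(f"Host 3 vs Host 1: {percent_more_throughput(h1, h3):.0f}% more throughput")
```

On the same GPU, the Linux cuda host comes out about 20% ahead of the Windows opencl host for the GPU portion.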

Harri Liljeroos

Joined: 10 Dec 05

Posts: 4429

Credit: 3247319388

RAC: 1737352

6 Dec 2024 16:59:32 UTC

Message 230632

My RTX 4070 Ti Super on Win 11 Pro with an AMD 7950X used to do O3AS tasks in 550 seconds, and they now take 1270 seconds. I tested a few LF tasks with 2 at a time and they took about 2650 seconds each. So running one at a time is more productive.
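
The reasoning is just wall time divided by the number of tasks running at once. A minimal sketch with the figures above (the function name is illustrative, and the times are the approximate ones quoted):

```python
SECONDS_PER_DAY = 86_400


def effective_seconds_per_task(wall_seconds: float, concurrent: int) -> float:
    """Average time per finished task when `concurrent` tasks run side by side."""
    return wall_seconds / concurrent


one_at_a_time = effective_seconds_per_task(1270, 1)
two_at_a_time = effective_seconds_per_task(2650, 2)

if __name__ == "__main__":
    for label, secs in (("1x", one_at_a_time), ("2x", two_at_a_time)):
        print(f"{label}: {secs:.0f} s per task, about {SECONDS_PER_DAY / secs:.1f} tasks/day")
```

At 2x each task effectively costs 1325 seconds against 1270 seconds at 1x, so 1x wins by roughly 4%.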

Tom M

Joined: 2 Feb 06

Posts: 6529

Credit: 9631474836

RAC: 2861975

14 Dec 2024 13:11:57 UTC

Message 230957

Boca Raton HS has a single Titan V GPU getting really good results now that the os3gw tasks are getting 20,000 credits per valid task.

1.5M+ RAC
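
As a back-of-envelope check, RAC is a recent average of credit per day, so it can be converted into valid tasks per day. The sketch below assumes the RAC has levelled off and that all of the credit comes from these tasks; neither is confirmed, and since the figure is "1.5M+" the results are lower bounds on task count.

```python
SECONDS_PER_DAY = 86_400


def tasks_per_day(rac: float, credit_per_task: float) -> float:
    """Valid tasks per day implied by a steady RAC (ASSUMES a settled RAC)."""
    return rac / credit_per_task


def seconds_per_task(rac: float, credit_per_task: float) -> float:
    """Effective seconds per valid task across the whole host."""
    return SECONDS_PER_DAY / tasks_per_day(rac, credit_per_task)


if __name__ == "__main__":
    rac, credit = 1_500_000, 20_000
    print(f"{tasks_per_day(rac, credit):.0f} valid tasks/day")
    print(f"{seconds_per_task(rac, credit):.0f} s per valid task (all concurrent tasks combined)")
```

That works out to about 75 valid tasks a day, or one finished task roughly every 1150 seconds across however many tasks the card runs at once.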

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 263

Credit: 10819719135

RAC: 13281979

15 Dec 2024 1:53:28 UTC

Message 230980 in response to message 230957

Tom M wrote:

Boca Raton HS has a single Titan V GPU getting really good results now that the os3gw tasks are getting 20,000 credits per valid task.

1.5M+ RAC

It has been an interesting (and difficult) system to work with but a great learning experience. There have been some hiccups along the way and I am sure there are more to come. The Titan V is great; the 14900KS has been like trying to train a 100m sprinter to run a marathon. Undervolting, massive cooling, and running only E@H work have been key (along with lots of BIOS updates...). It can easily maintain 5.9GHz across the active cores completing E@H work and, with the 8400MT/s DDR5, it really is impressive during the recalc steps. It will hit higher than 5.9GHz (up to 6.2GHz) but only in small bursts across 2 cores.

The system is part experiment, part learning experience for the students (and me), and hopefully can make a good contribution.

Also, not directly related to this thread, but I started using the Linux application "s-tui". It is the best program I have used for monitoring core speed, temp, and speed. There might be other programs out there that are better, but not that I have used yet. And it can stress test. Really, really great and has been absolutely essential for figuring out this system.

GWGeorge007

Joined: 8 Jan 18

Posts: 3109

Credit: 4995473485

RAC: 1145899

15 Dec 2024 3:42:53 UTC

Message 230985 in response to message 230980

Boca Raton Community HS wrote:

Also, not directly related to this thread, but I started using the Linux application "s-tui". It is the best program I have used for monitoring core speed, temp, and speed. There might be other programs out there that are better, but not that I have used yet. And it can stress test. Really, really great and has been absolutely essential for figuring out this system.

I agree, "s-tui" is one of the best apps out there. I use it also, but I don't leave it up like Zenmonitor; it takes up too much room if you want to see everything. And I do, but I'm not quite the purist at getting the most out of my systems. At least not yet.

George

Proud member of the Old Farts Association
