O3ASE Questions - Issues - Advice

Cherokee150

Joined: 13 May 11

Posts: 24

Credit: 909114324

RAC: 299517

13 May 2021 9:01:51 UTC

Topic 225387

O3ASE: Post Questions, Answers, Issues, and Advice for O3ASE GW.

Cherokee150

Joined: 13 May 11

Posts: 24

Credit: 909114324

RAC: 299517

13 May 2021 10:02:17 UTC

Message 185767

I would like to process 4 O3ASE GW units simultaneously on my computer 4213062. However, I can only run 3 at a time.

Computer 4213062 is:

CPU type:
GenuineIntel Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz [Family 6 Model 23 Stepping 10]
Number of processors: 4
Coprocessors:
NVIDIA GeForce GTX 1070 (4095MB) driver: 456.71
Operating system:
Microsoft Windows 10 Professional x64 Edition, (10.00.19041.00)
--------------------------------

My global_prefs_override.xml file is:

<global_preferences>
   <run_on_batteries>1</run_on_batteries>
   <run_if_user_active>1</run_if_user_active>
   <run_gpu_if_user_active>1</run_gpu_if_user_active>
   <suspend_cpu_usage>0.000000</suspend_cpu_usage>
   <start_hour>0.000000</start_hour>
   <end_hour>0.000000</end_hour>
   <net_start_hour>0.000000</net_start_hour>
   <net_end_hour>0.000000</net_end_hour>
   <leave_apps_in_memory>0</leave_apps_in_memory>
   <confirm_before_connecting>0</confirm_before_connecting>
   <hangup_if_dialed>0</hangup_if_dialed>
   <dont_verify_images>0</dont_verify_images>
   <work_buf_min_days>5.000000</work_buf_min_days>
   <work_buf_additional_days>0.100000</work_buf_additional_days>
   <max_ncpus_pct>100.000000</max_ncpus_pct>
   <cpu_scheduling_period_minutes>600.000000</cpu_scheduling_period_minutes>
   <disk_interval>60.000000</disk_interval>
   <disk_max_used_gb>10.000000</disk_max_used_gb>
   <disk_max_used_pct>75.000000</disk_max_used_pct>
   <disk_min_free_gb>2.000000</disk_min_free_gb>
   <vm_max_used_pct>50.000000</vm_max_used_pct>
   <ram_max_used_busy_pct>50.000000</ram_max_used_busy_pct>
   <ram_max_used_idle_pct>80.000000</ram_max_used_idle_pct>
   <max_bytes_sec_up>0.000000</max_bytes_sec_up>
   <max_bytes_sec_down>0.000000</max_bytes_sec_down>
   <cpu_usage_limit>100.000000</cpu_usage_limit>
   <daily_xfer_limit_mb>0.000000</daily_xfer_limit_mb>
   <daily_xfer_period_days>0</daily_xfer_period_days>
</global_preferences>
----------------------------------

My app_config.xml file is:

<app_config>
   <project_max_concurrent>3</project_max_concurrent>
   <app>
      <name>hsgamma_FGRPB1G</name>
         <max_concurrent>3</max_concurrent>
         <gpu_versions>
           <gpu_usage>0.24</gpu_usage>
           <cpu_usage>0.15</cpu_usage>
         </gpu_versions>
   </app>
   <app>
      <name>einstein_O2MDF</name>
         <max_concurrent>2</max_concurrent>
         <gpu_versions>
           <gpu_usage>0.5</gpu_usage>
           <cpu_usage>1</cpu_usage>
         </gpu_versions>
   </app>
   <app>
      <name>einstein_O3ASE</name>
         <max_concurrent>4</max_concurrent>
         <gpu_versions>
           <gpu_usage>0.25</gpu_usage>
           <cpu_usage>0.25</cpu_usage>
         </gpu_versions>
   </app>
</app_config>
--------------------------

I did notice that my NVIDIA card (from NVIDIA, not a third party) claims to have 8 GB of RAM. GPU-Z says its Memory Size is 8192 MB. However, Einstein's computer description shows the GPU as having only 4095 MB.

With 3 O3ASE units my GPU runs only about 65-75%, and my CPU is only running at 75%. I prefer to run BOINC as close to 100% CPU and GPU as possible on all my computers.

Can you please help me resolve this? Thank you for any assistance you can give me!

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3045654182

RAC: 2040156

13 May 2021 10:42:01 UTC

Message 185768 in response to message 185767

I'll check one off that list.

Cherokee150 wrote:

I did notice that my NVIDIA card (from NVIDIA, not a third party) claims to have 8 GB of RAM. GPU-Z says its Memory Size is 8192 MB. However, Einstein's computer description shows the GPU as having only 4095 MB.

The server can only report what the client on your machine tells it.

For historical reasons (GPU cards had much less RAM in the early days), the detection code was written as a 32-bit application. Card sizes have grown, but the BOINC client code hasn't kept pace with them. A 32-bit variable counting bytes tops out just under 4 GB (2^32 - 1 bytes), which is why the figure shows as 4095 MB.

So, 4095 MB is a consequence of BOINC's message handling. The 8192 MB reported by NVidia will be authoritative. Once the application is launched onto the card, it will see the full 8 GB.

San-Fernando-Valley

Joined: 16 Mar 16

Posts: 561

Credit: 10930389501

RAC: 16012598

13 May 2021 12:02:54 UTC

Message 185771 in response to message 185767

Cherokee150 wrote:

...

My app_config.xml file is:

<app_config>
<project_max_concurrent>3</project_max_concurrent>

...

Try setting project_max_concurrent to 4, not 3.

Also, I would give each O3ASE WU at least one full CPU core (you have 4 real cores).
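As a rough sketch of that suggestion (not a tested configuration), the project-wide cap and the O3ASE section of app_config.xml might look like the fragment below. The app name and the 0.25 gpu_usage come from the file posted above; the cpu_usage of 1.0 is an assumed value standing in for "one full core per WU", and the other <app> sections are left out for brevity.

```xml
<app_config>
   <!-- Raised from 3 so the project-wide cap no longer blocks a 4th task -->
   <project_max_concurrent>4</project_max_concurrent>
   <app>
      <name>einstein_O3ASE</name>
      <max_concurrent>4</max_concurrent>
      <gpu_versions>
         <!-- 0.25 of the GPU per task allows up to 4 tasks on one card -->
         <gpu_usage>0.25</gpu_usage>
         <!-- Assumed value: budget one full CPU core per task -->
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
```

With four tasks each budgeted a full core, all four cores of the Q8400 would be committed to GPU support, which should leave nothing for CPU-only tasks.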

cecht

Joined: 7 Mar 18

Posts: 1618

Credit: 3031000236

RAC: 1446820

13 May 2021 12:48:20 UTC

Message 185773 in response to message 185771

San-Fernando-Valley wrote:

Cherokee150 wrote:

...

My app_config.xml file is:

<app_config>
<project_max_concurrent>3</project_max_concurrent>

...

Try setting project_max_concurrent to 4, not 3.

I had to chuckle at that because I have been caught out by project_max_concurrent not once, not twice, but, well, an embarrassing number of times. I set it for testing one thing or another, then forget about it when I later increase gpu_usage and run around in circles trying to figure out why more tasks aren't running.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

San-Fernando-Valley

Joined: 16 Mar 16

Posts: 561

Credit: 10930389501

RAC: 16012598

13 May 2021 14:43:10 UTC

Message 185775 in response to message 185767

Cherokee150 wrote:

...

I did notice that my NVIDIA card (from NVIDIA, not a third party) claims to have 8 GB of RAM. GPU-Z says its Memory Size is 8192 MB. However, Einstein's computer description shows the GPU as having only 4095 MB.

...

If you look in your list of computers and click on "details" you will see the "correct" memory size of 8GB.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119686137234

RAC: 25333416

14 May 2021 5:25:43 UTC

Message 185796 in response to message 185767

Cherokee150 wrote:

I would like to process 4 O3ASE GW units simultaneously on my computer 4213062. However, I can only run 3 at a time.

You should be very grateful that at least BOINC has some common sense ;-) :-).

Past experience with GW GPU tasks shows that a lot of support from a fast modern CPU is needed for the best performance. The new O3ASE run is a test run using dummy data so we don't yet have a clue as to how the real run might perform. We don't yet know about VRAM requirements but, based on past experience, even 8GB will probably lead to memory errors with at least some tasks at x4.

Your CPU (Q8400 2.66GHz) was a great CPU in its day (i.e. 2009). I have about 20 of them still running just to support GRP GPU tasks at x2. I use only AMD GPUs so I don't have direct experience with nvidia. Q8400 CPUs keep up OK with AMD for GRP tasks. I doubt that's true for GW on nvidia. I'm just reiterating what others have mentioned.

There have been many reports that running multiple concurrent Einstein tasks on nvidia GPUs leads to a degradation in output and not a gain. You really should make an effort to test GW GPU tasks at x1, x2, x3 very carefully to see for sure that you get any benefit at all from increasing multiplicity. I suspect you may not if you go past x2 and that using x4 would be even worse.

In your global_prefs_override.xml, you have minimum days (work_buf_min_days) set at 5!! Are you crazy?? :-). The GW task deadline has been 7 days and if that continues (it probably will) you will have BOINC in permanent panic mode, even if some fast GRP GPU tasks haven't lowered the time estimate to some stupidly low value. If you run GW only, it might be safe to sneak that value up to 2-3 days at the most, provided you have sensible, stable crunch times with correct, stable estimates. If things are not stable, keep it quite low.
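For illustration, a smaller cache can be expressed with the same two elements already present in the posted file. The 1.0 day figure below is an assumed starting point, not a value anyone in this thread has tested; the only firm constraint from the discussion is to stay well under the 7-day GW deadline.

```xml
<global_preferences>
   <!-- Illustrative value only (assumed): a 1-day cache instead of 5,
        well inside the 7-day GW deadline -->
   <work_buf_min_days>1.000000</work_buf_min_days>
   <work_buf_additional_days>0.100000</work_buf_additional_days>
   <!-- all other settings from the posted file stay as they are -->
</global_preferences>
```

Once crunch times and estimates have been stable for a while, the first value could be nudged toward the 2-3 day upper bound mentioned above.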

Cherokee150 wrote:

With 3 O3ASE units my GPU runs only about 65-75%, and my CPU is only running at 75%. I prefer to run BOINC as close to 100% CPU and GPU as possible on all my computers.

In simple terms, both CPU and GPU are complex devices containing multiple subsystems. GPU crunching doesn't uniformly use all subsystems. If one subsystem is maxed out whilst others are only lightly used, do you really expect that running more instances (that undoubtedly need the very same maxed out subsystem) is going to result in greater output?

The CPU is needed for rapid support, exactly when it's requested (i.e. a fast modern CPU is best) so make sure you have minimal other things stealing CPU cycles. The best you can do is carefully experiment running x1 to get a baseline. Repeat with x2 until you are sure that the average of quite a few tasks is really an improvement. If so, rinse and repeat with x3, etc., until you get the best output.
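A minimal sketch of one step in that experiment, assuming the usual convention that gpu_usage is set to roughly 1/N for N concurrent tasks (1.0 for x1, 0.5 for x2, 0.33 for x3). The fragment shows the x2 case; the app name is the one from the original post, and the cpu_usage of 1.0 is an assumption, not a measured requirement.

```xml
<app_config>
   <app>
      <name>einstein_O3ASE</name>
      <gpu_versions>
         <!-- Assumed convention: gpu_usage = 1/N; 0.5 lets two tasks share the GPU -->
         <gpu_usage>0.5</gpu_usage>
         <!-- Assumed: one full CPU core budgeted per GPU task -->
         <cpu_usage>1.0</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
```

Changing only gpu_usage between runs keeps the comparison clean. If project_max_concurrent or max_concurrent is present in the file, it needs to be at least N, or the cap will quietly hold the task count down.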

Please take into account that the real run (when it starts) may give a different outcome. Life wasn't meant to be easy :-).

Cheers,
Gary.

cecht

Joined: 7 Mar 18

Posts: 1618

Credit: 3031000236

RAC: 1446820

14 May 2021 12:15:10 UTC

Message 185803

Invalid rates for O3ASE tasks are running a bit over 7% on my two hosts. It is nearly the same with an RX 570 (4GB) running 3X concurrent tasks and an RX 5600 XT (6GB) running 4X. This is quite a bit higher than with the former O2 tasks, which gave much less than 1% invalid. Hopefully invalid rates will improve once the Engineering phase is completed.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1645957429

RAC: 657648

14 May 2021 18:52:54 UTC

Message 185820 in response to message 185803

I am also getting more invalid results than I normally get, both on my twin 3GB GTX 1060s running 1X and on my GTX 1660s running 2X.

Ron Kosinski

Joined: 23 Mar 05

Posts: 57

Credit: 1131120643

RAC: 820491

16 May 2021 12:52:22 UTC

Message 185865

I am also getting an unusually high number of crash-and-burn GW O3ASE w/u on the two boxes that are running them. One box has a single GTX 1050 Ti card, the other box has dual RX 580 cards. My app_config file is set for one GPU and one CPU per w/u. It seems like the box with the NVIDIA card is getting more "errors" and the box with the AMD cards is getting more "invalids".

earthbilly

Joined: 4 Apr 18

Posts: 59

Credit: 1140229967

RAC: 0

18 May 2021 0:21:09 UTC

Message 185919

I just ran a large batch of engineering tasks, 4000-5000, and had very little trouble except when I was experimenting too much with hardware. Even some at 2X strings on two RX 570 4GB cards did well, but I had to watch that closely and decided next time to just crunch along at one string per GPU. I am in no hurry. I am retired, every morning.

Oh! I had much better luck with multiple strings per GPU when I staggered the start times by half the completion time.

Work runs fine on Bosons reacted into Fermions,

Sunny regards,

earthbilly
