Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519321801
RAC: 14174

Tom M wrote:
I am NNT Cosmology@Home to free up that thread for CPU GW processing.  I was going to do the same with WCG (4 threads) until I discovered that I am now processing a Rosetta Beta task on that project.

So Rosetta farms out to other projects as well?  I'm running Rosetta directly on almost all CPU cores, with a couple doing LHC and Universe.  The Rosetta server status shows a queue of up to 8.5 million WUs!  But they're getting ploughed through very quickly indeed; many people are doing it.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519321801
RAC: 14174

Ian&Steve C. wrote:
I don't think it has to do with the driver. Some GW workunits require more memory than others, even under the same "type", such as VelaJrs: I've seen some use more than 3GB, and I've seen some use less than 1GB. You will probably still fail if you get sent another one requiring more than 3GB again.

Mine did not fail when I was running two GW WUs on a 4GB card and exceeded 4GB.  It just used system RAM and went a lot slower (about 3-4 times slower).


Tom M
Joined: 2 Feb 06
Posts: 6461
Credit: 9585887871
RAC: 6896470

Peter Hucker wrote:
Tom M wrote:
I am NNT Cosmology@Home to free up that thread for CPU GW processing.  I was going to do the same with WCG (4 threads) until I discovered that I am now processing a Rosetta Beta task on that project.

So Rosetta farms out to other projects as well?  I'm running Rosetta directly on almost all CPU cores, with a couple doing LHC and Universe.  The Rosetta server status shows a queue of up to 8.5 million WUs!  But they're getting plowed through very quickly indeed, many people are doing it.

Sorry, I miswrote.  I should have written "Covid-19", not Rosetta.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)  I want some more patience. RIGHT NOW!

Tom M
Joined: 2 Feb 06
Posts: 6461
Credit: 9585887871
RAC: 6896470

Peter Hucker wrote:
I've got 1010 Gamma awaiting validation!  My wingmen are taking too long.... I've reduced my buffer to 3+3 hours, so WUs are returned faster, which I assume is better for the project, as well as keeping things tidier on my end, especially if I want to change projects or weights.

How about changing it to 0.5 + 0.1? It will poll more frequently while not carrying as big a cache. And your production will be the same.

Tom M


Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519321801
RAC: 14174

Tom M wrote:
How about changing it to 0.5 + 0.1? It will poll more frequently while not carrying as big a cache. And your production will be the same.

Note I wrote hours, not days; it's entered in the options in days, which in my case is 0.13 + 0.13.

The reason I set it to x+x instead of x+y is to give it a bigger range, so:
a) It doesn't pester the server more than necessary.
b) If I'm running two projects like I am now, it can get a bigger chunk from one that needs to catch up to meet the weighting I set.
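
For reference, those buffer values correspond to BOINC's "store at least / store up to an additional" preferences, which can also be set locally in global_prefs_override.xml in the BOINC data directory. A sketch of the 0.13 + 0.13 day setup (tag names per standard BOINC client preferences):

```xml
<!-- global_prefs_override.xml, in the BOINC data directory -->
<global_preferences>
  <!-- "store at least" this many days of work -->
  <work_buf_min_days>0.13</work_buf_min_days>
  <!-- "store up to an additional" this many days -->
  <work_buf_additional_days>0.13</work_buf_additional_days>
</global_preferences>
```

The client re-reads this file on "Read local prefs file", overriding the web preferences.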


Tom M
Joined: 2 Feb 06
Posts: 6461
Credit: 9585887871
RAC: 6896470

Lovely, missing reference to both GW tasks. Fixed.

Tom M wrote:

<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <save_stats_days>365</save_stats_days>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>0</device_num>
      <app>einstein_O2MDF</app>
      <app>einstein_O2MD1</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
      <app>einstein_O2MDF</app>
      <app>einstein_O2MD1</app>
    </exclude_gpu>
  </options>
</cc_config>

Tom M


Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Not sure, but I think each app needs its own exclude; i.e., you can't put both in the same exclude.  Let me know if it does or doesn't work.
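
If that's right, the fix would be to split each app into its own <exclude_gpu> block, one per app per device. A sketch for device 0, reusing the URL and app names from Tom's config above (the same pair would be repeated for device_num 1):

```xml
<exclude_gpu>
  <url>http://einstein.phys.uwm.edu/</url>
  <device_num>0</device_num>
  <app>einstein_O2MDF</app>
</exclude_gpu>
<exclude_gpu>
  <url>http://einstein.phys.uwm.edu/</url>
  <device_num>0</device_num>
  <app>einstein_O2MD1</app>
</exclude_gpu>
```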

Tom M
Joined: 2 Feb 06
Posts: 6461
Credit: 9585887871
RAC: 6896470

Zalster wrote:
Not sure, but I think each app needs its own exclude; i.e., you can't put both in the same exclude.  Let me know if it does or doesn't work.

I have thrown up my hands, dropped the excludes, and switched on GW GPU only.  It's beginning to look like you can have one or the other, but running both on the same machine's GPUs is a major hassle.


Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519321801
RAC: 14174

Tom M wrote:
I have thrown up my hands, dropped the excludes, and switched on GW GPU only.  It's beginning to look like you can have one or the other, but running both on the same machine's GPUs is a major hassle.

What is it you're trying to achieve?

If you have different types of GPU, and want GW on one model and gamma on the other, can you put the different types of GPU in different computers?

If you want to do an even amount of work for both, it should manage that itself on the defaults, which is what I do on one of my machines.  Sometimes it runs two gammas at once, sometimes one GW.  All I changed was to say gamma needs 0.5 GPUs, and GW needs 1 GPU (I only did this because two GWs makes the GPU run out of memory and slow down drastically).

If you're trying to make the GPU have full usage, and this is best achieved by running one of each task on it, then you could adjust the usage of GW and gamma tasks, perhaps 0.65 and 0.35, so it would never run two gravities and should usually run one of each.  You might sometimes get 3 gammas, but I assume that would also produce full load.
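
Those per-app GPU shares go in app_config.xml in the Einstein project directory. A sketch of the 0.5/1.0 split, assuming the gamma-ray app is named hsgamma_FGRPB1G and the GW app einstein_O2MDF (check the app names your client actually reports):

```xml
<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>   <!-- gamma-ray pulsar app; name assumed -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- two gamma tasks can share one GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MDF</name>    <!-- GW app; name assumed -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>   <!-- GW gets a whole GPU to avoid running out of VRAM -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client picks up changes after "Read config files" or a restart.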


cecht
Joined: 7 Mar 18
Posts: 1535
Credit: 2910288722
RAC: 2110329

I've been running the beta app (v2.08, GW-opencl-ati-Beta) overnight on my 4-thread Linux system and have seen a good performance boost over the standard app. All the runs have been with VelaJr1 tasks, so I don't yet know about validations, but there have been no errors so far from about 100 completed tasks.

Comparisons of realized single run times (minutes) for 2 RX 570s, running O1MDFV2g_VelaJr1_1475.xxHz:

ver. ->       2.08    2.08_Beta
2 GPU @ 1x    20      17
2 GPU @ 2x    22      10.2
1 GPU @ 3x    n/a     8.7
2 GPU @ 3x    n/a     16

Times for the standard app differ from what I previously posted because the tasks spanned a different set of analysis frequencies, I'm guessing.

The Beta can run a single task about 10% faster, but the biggest improvement is more efficient use of system resources, allowing higher task multiplicities and greater task productivity. Previously, I was not able to run 2 GPU @ 2x with the std app because either CPU or GPU resources were limiting, which resulted in longer realized task times. With the Beta app, however, at least with this current batch of tasks, I can run four concurrent tasks and see shortened task times.

Even better task times can be had at 3x tasks on a single GPU, but running both GPUs at 3x resulted in a nearly doubled increase of task time. I found a trick, however, to run 5 concurrent tasks across two GPUs on my system to realize the best of all possible worlds: in app_config.xml set gpu_usage to 0.33 and cpu_usage to the default 0.9 (or 1). I usually have cpu_usage set to 0.4 or 0.5, which allows boinc-client to run 6, or even 8, concurrent tasks across two GPUs. Increasing cpu_usage restricts the number of concurrent tasks to 5, thus maximizing task productivity on my system. I expect that systems with different CPU and GPU capacities would need different boinc-client configurations to maximize productivity. I don't know how well this odd-ball configuration works with the non-VelaJr GW tasks.   
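
cecht's 5-task trick, expressed as an app_config.xml sketch (GW beta app name assumed; the 0.33 GPU share allows up to 3 tasks per GPU, and per his report the higher cpu_usage is what caps the client at 5 concurrent tasks on his 4-thread system):

```xml
<app_config>
  <app>
    <name>einstein_O2MDF</name>    <!-- GW beta app; name assumed -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>  <!-- up to 3 concurrent tasks per GPU -->
      <cpu_usage>0.9</cpu_usage>   <!-- higher CPU reservation limits total concurrency -->
    </gpu_versions>
  </app>
</app_config>
```

As he notes, the right values depend on CPU and GPU capacities, so other systems would need different numbers.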

Arcana note: The background system processes sdma0 and comp_1 now use far less CPU time than previously (i.e. last month), when they would take up the majority of CPU time at higher multiplicities for GW GPU tasks.  Through observation I have learned that these two processes are related to AMD GPU activity, and I guess they are called by the AMD drivers. One sdma0 process runs per active GPU: one for one active GPU, two for two ('active' meaning a non-zero GPU load). One comp_1 process is needed for each active BOINC task on a GPU, associated with one sdma0 process; i.e. 3 concurrent tasks have 3 comp_1 processes running (comp_1.0.0, comp_1.1.0, comp_1.2.0), and a second GPU running @ 3x doubles up on those comp processes.
I am not sure whether the lighter CPU use of these processes is because of the current set of tasks, or because of a recent AMDGPU driver package update that I did for the OpenCL component (amdgpu-pro-20.10-1048554-ubuntu-18.04, updated from amdgpu-pro-19.50-967956-ubuntu-18.04).

Ideas are not fixed, nor should they be; we live in model-dependent reality.
