I have set Cosmology@Home to NNT (No New Tasks) to free up that thread for CPU GW processing. I was going to do the same with WCG (4 threads) until I discovered that I am now processing a Rosetta Beta task on that project.
So Rosetta farms out to other projects as well? I'm running Rosetta directly on almost all CPU cores, with a couple doing LHC and Universe. The Rosetta server status shows a queue of up to 8.5 million WUs! But they're getting ploughed through very quickly indeed; many people are doing it.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
I don't think it has to do with the driver. Some GW workunits require more memory than others, even under the same "type", such as the VelaJr tasks: I've seen some use more than 3 GB, and I've seen some use less than 1 GB. You will probably still fail if you get sent another one requiring more than 3 GB.
Mine did not fail when I was running two GW WUs on a 4GB card and exceeded 4GB. It just used system RAM and went a lot slower (about 3-4 times slower).
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
Sorry. I miswrote. I should have written "COVID-19", not Rosetta.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I've got 1010 Gamma awaiting validation! My wingmen are taking too long... I've reduced my buffer to 3+3 hours, so WUs are returned faster, which I assume is better for the project, as well as keeping things tidier on my end, especially if I want to change projects or weights.
How about changing it to 0.5 + 0.1? It will poll more frequently while not carrying as big a cache, and your production will be the same.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Note I wrote hours, not days; it's entered in the options in days, which in my case is 0.13 + 0.13.
The reason I set it to x+x instead of x+y is to give it a bigger range, so:
a) It doesn't pester the server more than necessary.
b) If I'm running two projects like I am now, it can get a bigger chunk from one that needs to catch up to meet the weighting I set.
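For anyone who prefers to set this in a file rather than through the Manager options, a minimal global_prefs_override.xml sketch with my 0.13 + 0.13 values would look something like this (place it in the BOINC data directory and tell the client to re-read preferences, or restart it):

<global_preferences>
    <!-- store at least this many days of work: 0.13 days is about 3 hours -->
    <work_buf_min_days>0.13</work_buf_min_days>
    <!-- store up to this many additional days of work -->
    <work_buf_additional_days>0.13</work_buf_additional_days>
</global_preferences>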
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
Lovely, missing reference to both GW tasks. Fixed.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Not sure, but I think each work unit type needs its own exclude; i.e., you can't put both in the same exclude. Let me know if it does or doesn't work.
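For example, a cc_config.xml sketch with one exclude_gpu block per application could look like the one below. The app names are only guesses on my part; take the exact names from the Einstein@Home applications page or from the task properties, and make sure the URL matches the project URL shown in BOINC Manager:

<cc_config>
  <options>
    <!-- keep gamma-ray GPU tasks off device 0 (app name is a guess) -->
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>
      <device_num>0</device_num>
      <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>
    <!-- keep GW GPU tasks off device 1 (app name is a guess) -->
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>
      <device_num>1</device_num>
      <app>einstein_O2MD1</app>
    </exclude_gpu>
  </options>
</cc_config>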
I have thrown up my hands, dropped the excludes, and switched to GW GPU only. It's beginning to look like you can have one or the other, but running both on the same machine's GPUs is a major hassle.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
What is it you're trying to achieve?
If you have different types of GPU, and want GW on one model and gamma on the other, can you put the different types of GPU in different computers?
If you want to do an even amount of work for both, it should manage that itself on the defaults, which is what I do on one of my machines. Sometimes it runs two gammas at once, sometimes one GW. All I changed was to say gamma needs 0.5 GPUs and GW needs 1 GPU (I only did this because two GWs make the GPU run out of memory and slow down drastically).
If you're trying to make the GPU have full usage, and this is best achieved by running one of each task on it, then you could adjust the usage of GW and gamma tasks, perhaps 0.65 and 0.35, so it would never run two gravities and should usually run one of each. You might sometimes get 3 gammas, but I assume that would also produce full load.
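Those per-app GPU shares go in app_config.xml in the Einstein@Home project directory. A minimal sketch of what I described would be something like the following; the gamma app name should be hsgamma_FGRPB1G, but the GW app name below is a placeholder, so copy it from your own task list:

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>  <!-- gamma-ray pulsar GPU search -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>  <!-- two gamma tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einstein_O2MD1</name>   <!-- GW GPU search; name is a guess -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>  <!-- GW gets a whole GPU so it doesn't run out of memory -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>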
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
I've been running the beta app (v2.08, GW-opencl-ati-Beta) overnight on my 4-thread Linux system and have seen a good performance boost over the standard app. All the runs have been with VelaJr1 tasks, so I don't yet know about validations, but there have been no errors so far from about 100 completed tasks.
Comparisons of realized single run times (minutes) for 2 RX 570s, running O1MDFV2g_VelaJr1_1475.xxHz:
ver. ->     2.08    2.08_Beta
2GPU@1x     20      17
2GPU@2x     22      10.2
1GPU@3x     n/a     8.7
2GPU@3x     n/a     16
Times for the standard app differ from what I previously posted because the tasks spanned a different set of analysis frequencies, I'm guessing.
Beta can run a single task about 10% faster, but the biggest improvement was a more efficient use of system resources to allow higher task multiplicities and greater task productivity. Previously, I was not able to run 2GPU@2x with the std app because either CPU or GPU resources were limiting, which resulted in longer realized task times. With the Beta app, however, at least with this current batch of tasks, I can run four concurrent tasks and see shortened task times.
Even better task times can be had at 3x tasks on a single GPU, but running both GPUs at 3x nearly doubled task times. I found a trick, however, to run 5 concurrent tasks across two GPUs on my system and realize the best of all possible worlds: in app_config.xml set gpu_usage to 0.33 and cpu_usage to the default 0.9 (or 1). I usually have cpu_usage set to 0.4 or 0.5, which allows boinc-client to run 6, or even 8, concurrent tasks across two GPUs. Increasing cpu_usage restricts the number of concurrent tasks to 5, thus maximizing task productivity on my system. I expect that systems with different CPU and GPU capacities would need different boinc-client configurations to maximize productivity. I don't know how well this odd-ball configuration works with the non-VelaJr GW tasks.
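As a sketch, the app_config.xml entry for that 5-task trick looks roughly like this on my machine (the GW app name here is a placeholder; use the one from your own tasks):

<app_config>
  <app>
    <name>einstein_O2MD1</name>    <!-- GW GPU app; name is a guess -->
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>  <!-- allows up to 3 tasks per GPU -->
      <cpu_usage>0.9</cpu_usage>   <!-- the higher CPU reservation is what caps total concurrency at 5 -->
    </gpu_versions>
  </app>
</app_config>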
Arcana note: the background system processes sdma0 and comp_1 now use far less CPU time than they did last month, when they would take up the majority of CPU time at higher multiplicities for GW GPU tasks. Through observation I have learned that these two processes are related to AMD GPU activity, and I guess they are called by the AMD drivers. One sdma0 process runs for one active GPU, two for two active GPUs; 'active' meaning a non-zero GPU load. One comp_1 process is needed for each active BOINC task on a GPU, each associated with one sdma0 process; i.e. 3 concurrent tasks have 3 comp_1 processes running (comp_1.0.0, comp_1.1.0, comp_1.2.0); a second GPU running @ 3x doubles up on those comp processes.
I am not sure whether the lighter CPU use of these processes is because of the current set of tasks, or because of a recent AMDGPU driver package update that I did for the OpenCL component (amdgpu-pro-20.10-1048554-ubuntu-18.04, updated from amdgpu-pro-19.50-967956-ubuntu-18.04).
Ideas are not fixed, nor should they be; we live in model-dependent reality.