Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109385760148
RAC: 35924014

Bernd Machenschalk wrote:
... it was reported somewhere ...

I sent you a PM about the warning that gets issued (clFFT resources not freed) if a GPU task gets suspended, e.g. if the multiplicity of concurrent tasks gets reduced and a task gets suspended as a result.

For more background about why I sent the PM, you can check this message, and the messages that led up to it.  The warning was in the stderr.txt output and I wasn't sure if it was important or not.

Thanks for the new test app.  I guess cecht will be interested to see if it performs any differently when his script is reducing task multiplicity :-).  Is there expected to be any benefit (reduced memory consumption) for running multiple tasks under conditions where multiplicity is being changed like this?

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244922893
RAC: 16808

Richie wrote:

I'm receiving that only for AMD cards (for about 10 hours now). My Nvidia cards still get only 2.07.

Thanks for reporting. Indeed there was something wrong on the server side. Should be fixed.

BM

cecht
Joined: 7 Mar 18
Posts: 1421
Credit: 2444685658
RAC: 1502355

Gary Roberts wrote:
...Thanks for the new test app.  I guess cecht will be interested to see if it performs any differently when his script is reducing task multiplicity :-).  Is there expected to be any benefit (reduced memory consumption) for running multiple tasks under conditions where multiplicity is being changed like this?

So far as I can tell, 2.09 is running tasks with the same memory usage as 2.08. My fancy little script-in-development hasn't yet been put to the test to resolve excessive GPU memory use because the Spotlight tasks to date have had relatively light memory requirements. Wouldn't it be wonderful if Bernd's new app rendered obsolete my fancy little script?  :-\ 

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4265
Credit: 244922893
RAC: 16808

Apparently there's a problem with our GW app and the new OSX version "Big Sur" (described here). For now we'll stop delivering GW work to such systems.

BM

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673272876
RAC: 1751098

I have found it possible to run a mix of RX 5700s and RX 580s under Windows 10.

However, the RX 580s are not real happy running 3 threads per GPU with Gamma Ray.  They might run 1 or 2 GW tasks, though.

I thought I would try segmenting the RX 5700s to not run GW and the RX 580s to not run GR.

I think I have the profile set up to download both GR and GW.

Right now I am getting a "GW app not found" error because I don't have any downloaded.

According to the BOINC log, GPUs 0 and 1 are the RX 580s.

The cc_config.xml looks like this:

<cc_config>
  <log_flags>
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
  <options>
    <max_tasks_reported>150</max_tasks_reported>
    <max_file_xfers>4</max_file_xfers>
    <max_file_xfers_per_project>4</max_file_xfers_per_project>
    <save_stats_days>180</save_stats_days>
    <ncpus>16</ncpus>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>0</device_num>
      <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
      <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>2</device_num>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>3</device_num>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>4</device_num>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>5</device_num>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
  </options>
</cc_config>

Am I missing anything or screwing something up?
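
One way to double-check a file like this before restarting BOINC is to parse it and list which apps are excluded on which device. What follows is only a minimal Python sketch. The helper name excluded_apps_by_device and the shortened SAMPLE text are made up for illustration. It only confirms that the XML is well-formed and reports what the <exclude_gpu> blocks say; it cannot tell whether the app names match what the project actually calls its apps.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened, hypothetical sample in the same shape as the config above.
SAMPLE = """
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>0</device_num>
      <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>2</device_num>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
  </options>
</cc_config>
"""


def excluded_apps_by_device(xml_text):
    """Map each GPU device number to the set of app names excluded on it."""
    # ET.fromstring raises xml.etree.ElementTree.ParseError if the XML is malformed.
    root = ET.fromstring(xml_text.strip())
    excluded = defaultdict(set)
    for block in root.iter("exclude_gpu"):
        device = block.findtext("device_num")
        app = block.findtext("app")
        # Sketch convention: a missing <device_num> is keyed as None,
        # and a missing <app> is recorded as "*".
        key = int(device.strip()) if device is not None else None
        excluded[key].add(app.strip() if app else "*")
    return dict(excluded)


if __name__ == "__main__":
    for device, apps in excluded_apps_by_device(SAMPLE).items():
        print(f"GPU {device}: excluded apps = {sorted(apps)}")
```

Run against the full file instead of SAMPLE, the listing should show hsgamma_FGRPB1G excluded on GPUs 0 and 1 and einstein_O2MDF excluded on GPUs 2 to 5, if the file says what was intended.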

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

363fc9cda368b2e14d4322e60afde27d
Joined: 4 Sep 17
Posts: 10
Credit: 172312824
RAC: 0

maybe try removing exclude

try restarting boinc after you’ve saved the config file if you haven’t already

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673272876
RAC: 1751098

Because I wasn't getting any GR tasks, I toggled the "run non-preferred if preferred are not available" setting and it started downloading GW GPU tasks.

Which leads me to the question of how many GPU threads I should be running on my RX 5700s.

So far, running two or even three tasks at a time has not pushed my GPU utilization above 80% on a regular basis.

And with 2 tasks running I am noticing a very high level of "copy" activity in Windows Task Manager for the GPUs.  I am used to 1-4%.  I am getting 7-21%.

That leads me to wonder whether I am getting "thrashing", with two different tasks being shuttled in and out of GPU memory.  Usually, that implies a slowdown in processing.

So far the tasks seem to be executing ok.

Any guidance on if I should be running 3-4 tasks for GW or 1 only would be appreciated.  As usual, the goal is maximizing production.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33822140648
RAC: 37784974

GW tasks are much more dependent on CPU speed/capability. Make sure you are leaving enough free CPU resources. I'm not sure about the behavior of the Windows_AMD app, but the Linux_nvidia app routinely uses more than 1 CPU thread per GPU_GW task. In my mind, your 3950X should be more than capable of feeding the GPU, unless there's some major difference in the behavior of the Windows_AMD app.

I wouldn't run 4x on your 5700s, since they only have 8GB of VRAM. I'd stick to 2-3x. Again, make sure that you have enough spare CPU available, and reduce your CPU work if you have to.
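
For reference, multiplicity in BOINC is commonly set with an app_config.xml file in the project directory, where <gpu_usage> is the fraction of a GPU each task claims and <cpu_usage> is the CPU share budgeted per task. Below is a rough Python sketch that generates such a file for 2x or 3x. The app name einstein_O2MDF is copied from the cc_config.xml posted earlier in the thread and is an assumption, the helper make_app_config is hypothetical, and one full CPU thread per task is just a starting guess based on the advice above.

```python
import math
import xml.etree.ElementTree as ET


def make_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Return app_config.xml text that asks BOINC to run N tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    # Round DOWN to 3 decimals so N tasks never add up to more than 1.0 GPU.
    gpu_usage = math.floor(1000 / tasks_per_gpu) / 1000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{gpu_usage:.3f}"
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_per_task:.1f}"

    # Pretty-print where available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 3 concurrent GW tasks per GPU, one CPU thread budgeted for each.
    print(make_app_config("einstein_O2MDF", 3))
```

Note that app_config.xml applies per app, not per GPU, so on a mixed host the same multiplicity would apply to every card that runs that app.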

 

I run 1x on my GPUs and get good GPU utilization on my systems, so there's no reason for me to run 2x:

RTX 2080 Ti @225W (4.2GHz Ryzen 3900X): ~90-95%
RTX 2080 Ti @215W (3.3GHz EPYC 7402P): ~85-90%
GTX 1650 @75W (4.2GHz Ryzen 3900X): ~95%
RTX 2070 @150W (3.3GHz EPYC 7402P): ~90%
RTX 3070 @200W (2.8GHz EPYC 7642): ~90%
GTX 1660 Super @100W (2.8GHz EPYC 7642): ~90-95%

_________________________________________________________________________

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33822140648
RAC: 37784974

A quick look at various systems suggests that Nvidia has a significant edge over AMD for these GW tasks.

_________________________________________________________________________

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109385760148
RAC: 35924014

Tom M wrote:
Any guidance on if I should be running 3-4 tasks for GW or 1 only would be appreciated.

About a month ago, I set up 5 hosts with AMD RX 570 8GB GPUs specifically for GW tasks.  Each machine runs tasks 4x and none have had any 'lack of memory' issues with the 8GB GPUs at 4x multiplicity.  At the start, when I first tested, there were distinct improvements in going from 1x right through to 4x, although going from 1x to 2x was the biggest gain.

The CPUs are quite basic and do not crunch.  They include AMD 3000G, Intel G4560, and Intel i3-9100F, all with 4 CPU threads which support the 4x GPU tasks.  There is also a 6 core Ryzen 5 2600 for comparison.  Somewhat surprisingly, the hosts with weaker CPUs keep up with and often beat the Ryzen.  4 threads are certainly enough to support 4 concurrent GPU tasks it would seem.

The machine performing the best is actually the 3000G, which completes 4 tasks in about 32 to 36 mins most of the time (i.e. 8-9 mins on a per task basis).  All the others take around 35 to 39 mins most of the time.  There are a very small number of 'outliers' - tasks that take 40-50% longer than normal.  Out of around 2500-3000 tasks in a particular frequency 'series', I've noticed perhaps 30 or so tasks so far that show this behaviour.  I don't have full coverage of all the issue numbers in a full series so there could be a few more that show up.
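
The per-task arithmetic above can be written out as a short Python sketch. The function names are made up for illustration, and the only inputs are the batch times quoted in the paragraph above; nothing here is measured independently.

```python
def per_task_minutes(batch_minutes, multiplicity):
    """Effective minutes per task when `multiplicity` tasks finish together."""
    return batch_minutes / multiplicity


def tasks_per_day(batch_minutes, multiplicity):
    """Tasks completed in 24 hours if every batch takes `batch_minutes`."""
    return multiplicity * 24 * 60 / batch_minutes


if __name__ == "__main__":
    # Batch times (minutes for 4 concurrent tasks) taken from the post above.
    cases = [
        ("3000G, fast end", 32),
        ("3000G, slow end", 36),
        ("other hosts, fast end", 35),
        ("other hosts, slow end", 39),
    ]
    for label, batch in cases:
        print(
            f"{label}: {per_task_minutes(batch, 4):.2f} min/task, "
            f"{tasks_per_day(batch, 4):.0f} tasks/day"
        )
```

Comparing multiplicities this way only makes sense when the batch time is measured at each setting, since running more tasks at once stretches the batch time as well.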

Something similar happened in the previous run and it seems due to this small number simply having that much more work to do than the standard task.  I deliberately crunched some of these at lower multiplicities and they took that much longer than the standard tasks there as well.  So not a behaviour caused by lack of memory, it would seem.

These hosts are currently crunching frequencies in the range 450Hz to 475Hz.  Like last time, memory issues might creep in at much higher frequencies but everything seems OK at 4x for an 8GB AMD GPU at the moment.

Cheers,
Gary.
