Maximizing Nvidia production

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673909490
RAC: 1783353
Topic 222009

I just noticed that the processing load on my Nvidia GPUs is often under 50%.

Does this mean I would get more total production if I were to increase the number of tasks per GPU to 2?

Tom M
A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33828773540
RAC: 37776826

I assume you mean with the Gravity Wave tasks? 

The GW tasks need a lot of CPU support, so unless you have a powerful CPU you won't get very high GPU utilization.

Running 2x tasks per GPU helps, at the expense of using more CPU resources. Do you really want to use 2+ threads for each GPU? I know you like to run CPU work too, so using so much of the CPU just to feed the GPUs seems like a waste.
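For reference, running 2x per GPU is normally done with an app_config.xml in the Einstein@Home project directory. A minimal sketch (the app name below is only an example; check the actual name in your client's app list):

```xml
<app_config>
  <app>
    <name>einstein_O2MDF</name>        <!-- example GW GPU app name; use yours -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>       <!-- 0.5 = two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>       <!-- CPU threads budgeted per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```

Note that <cpu_usage> only affects the client's scheduling budget; the science app will use however much CPU it actually needs.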

_________________________________________________________________________

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673909490
RAC: 1783353

You're right.  The CPU work has priority, so I need to run 1 GPU task on 1 thread.

---edit---

I am trying 1.25 CPUs per GPU to see if I can drive the GPU utilization trend up for Gravity Wave.

I have also dropped GW from my selected GPU apps list, and disabled the "run non-selected apps" option.

Unfortunately, E@H has buried me in "several" tasks, so it will be a while before I get the GWs processed.

---edit---

Thank you.

Tom M


DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 1580

If you don't want to run the GW GPU tasks you can just abort them.  As long as you've also got at least a few Fermi GPU tasks that will report success when finished there won't be any major disruption to your supply of new tasks.

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673909490
RAC: 1783353

That is odd.  I have turned off Gravity Wave, and then restricted the Gravity Wave GPU app to 2 of my 5 available GPUs, yet as of this morning my system wasn't crunching Pulsar GPU tasks at all.

I checked available tasks and there were still Pulsar tasks available.

I had reset the cache to 0.1 days and 0.1 additional, so the backoff was very high.  I did a manual update.

When I renamed app_config.xml to app_config_stop.xml and re-read the config files, 3 more GPUs started up running Gravity Wave.

So why was BOINC Manager not running the Pulsar app, for which I had all sorts of data files?

---edit---

Add cold boots and setting the cache to 1 to the list of things that don't start the Pulsar GPU tasks.

Examined the log; no errors listed.

Switched to NNT so that, if the last resort of resetting the project becomes necessary, I can run out as many tasks as I can before I do.

---edit---

Tom M

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Probably would help to see your cc_config.  Also, is this a dedicated Einstein Machine?  I've had many talks with Keith over the years about the value of dedicated machines for each project. Having multi-project machines injects unknown variables into the mix.

Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673909490
RAC: 1783353

Zalster wrote:
Probably would help to see your cc_config.  Also, is this a dedicated Einstein Machine?  I've had many talks with Keith over the years about the value of dedicated machines for each project. Having multi-project machines injects unknown variables into the mix.

The GPUs run E@H, and optionally S@H if something shows up. It is a multi-project machine.

This cc_config.xml file has not been changed since before the Pulsar tasks suddenly stopped running.

<cc_config>
 <log_flags>
   <sched_op_debug>1</sched_op_debug>
 </log_flags>
 <options>
   <use_all_gpus>1</use_all_gpus>
   <save_stats_days>90</save_stats_days>
   <max_file_xfers>4</max_file_xfers>
   <max_file_xfers_per_project>2</max_file_xfers_per_project>   
   <no_alt_platform>1</no_alt_platform>
 </options>
</cc_config>

<max_tasks_reported>50</max_tasks_reported>

Tom M


Tom M
Joined: 2 Feb 06
Posts: 5586
Credit: 7673909490
RAC: 1783353

The good news is that with NNT set, the Gravity Wave supply of tasks is showing real signs of running out, within a few more days at worst, or maybe even today.

This could be a self-correcting issue.

Tom


Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

@Tom

I don't see where you are excluding 3 of 5 GPUs from your machine in the cc_config file.  From what I see, you would be using all GPUs for GW. Where did you put the <exclude>?
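For reference, GPU exclusions do not appear in the posted file; they go inside <options> in cc_config.xml, one <exclude_gpu> block per excluded device. A minimal sketch (the URL and app name are examples only; copy the project URL exactly as it appears in your client state, and omit <app> to exclude the device for all apps):

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>  <!-- example; match your client state -->
      <device_num>2</device_num>                <!-- GPU index to exclude -->
      <app>einstein_O2MDF</app>                 <!-- example app name; optional -->
    </exclude_gpu>
  </options>
</cc_config>
```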
Z

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109391646756
RAC: 35875760

Tom M wrote:
So why was BOINC Manager not running the Pulsar app, for which I had all sorts of data files?

Firstly, data files (.dat extensions) aren't tasks.  Do you mean data files or do you mean tasks?  None of the physical files you can see are tasks.  Tasks are essentially just parameter entries in the state file.

If you do mean tasks, they will be done in FIFO order.  You can override the FIFO queue by suspending the entries that would run next, if your aim is to run a different job that is further down the queue.

If you do actually mean that you have data files but you don't have any jobs to run, the following may help.  You mentioned you had settings of 0.1 and 0.1 for your work cache.  The way that works is that your client will initially request GPU work (of whatever description) so that you have at least 0.2 days worth (at current estimates) on board.  No further requests for GPU work (of any description) will occur until you have less than 0.1 days worth (at current estimates) remaining.  If you want to always have at least 0.2 days worth on hand, it's probably better to put all of that in the 1st setting and leave the 2nd setting at zero.  Otherwise, the amount on hand may tend to cycle up and down a bit.  Admittedly, there's not much 'cycling' that can happen between 0.1 and 0.2 so it's a minor point only.  It would become much more visible if someone set 0.1 and 1.0 as the two values.

So if you currently have more than 0.1 days worth of work on board, your client won't request any sort of GPU work even if there are currently no FGRPB1G tasks on board.

The other thing you need to realise is that when there is a choice between the FGRP and GW versions of GPU tasks, there is a 'server-side mechanism' (set by staff) that will influence the choice made by the scheduler.  At the moment, that mechanism seems to be set so as to prefer the sending of GW work, a lot of the time.  It's quite easy to imagine periods where there is no FGRPB1G work on board.  That is quite understandable, since the 'holy grail' for this project is the first ever detection of continuous GW.  The scientists will want to get the GW work done as quickly as possible so sending it is likely to be prioritised.

If you want to get some FGRPB1G tasks, just deselect the GW type (perhaps temporarily) and once that change is in place, increase your work cache size enough to allow your client to make a new work request where the scheduler has only one choice in what it can send.  If you don't do something like that, chances are that you will just get more GW work.

I don't know if I'm properly understanding what you are trying to do.  Please correct me if I'm not.

Cheers,
Gary.

petri33
petri33
Joined: 4 Mar 20
Posts: 117
Credit: 3341045819
RAC: 0

To decrease CPU usage I have tried this (result: lower CPU usage, higher GPU utilization, but lower throughput :( ):

1) Modify this:


#define _GNU_SOURCE
#include <time.h>
#include <errno.h>
#include <dlfcn.h>
#include <poll.h>
#include <CL/cl.h>


/*
 * To compile run:
 * gcc -O2 -fPIC -shared -Wl,-soname,libsleep.so -o libsleep.so libsleep.c
 *
 * To use (with seti@home): (modify to suit your needs)
 *  daemon --check $BOINCEXE --user $BOINCUSER +10 "LD_PRELOAD=/etc/init.d/libsleep.so $BOINCEXE $BOINCOPTS --dir $BOINCDIR >>$LOGFILE 2>>$ERRORLOG &" >& /dev/null
*
*
*
*/

void testing(void) __attribute__((constructor));

void testing(void)
{
  //
}

int myInternalSleep(int us)
{
  struct timespec t;
  struct timespec r;

  t.tv_sec  = 0;
  t.tv_nsec = 1000L * (long)us; // convert microseconds to nanoseconds

  nanosleep(&t, &r); // never mind if interrupted
 
  /*
  while(nanosleep(&t, &r) == -1 && errno == EINTR)
    { //continue sleeping if interrupted
      t.tv_sec = r.tv_sec;
      t.tv_nsec = r.tv_nsec;
    }
  */
 
  return 0;
}


int sched_yield(void)
{
  myInternalSleep(1);
  return 0; // sched_yield() returns 0 on success
}

/*
static int (*original_clFinish)(cl_command_queue a) = NULL;
*/

int clFinish(cl_command_queue a)
{
  int (*original_clFinish)(cl_command_queue a);
  //if(original_clFinish == NULL)
  original_clFinish = dlsym(RTLD_NEXT, "clFinish");
 
  myInternalSleep(1);
 
  int ret = (*original_clFinish)(a);

  return ret;
}

/*
static int (*original_pthread_cond_wait)(void *cond, void *mutex) = NULL;

int pthread_cond_wait(void *cond, void *mutex)
{
  if(original_pthread_cond_wait == NULL)
    original_pthread_cond_wait = dlsym(RTLD_NEXT, "pthread_cond_wait");
 
  myInternalSleep(10);
 
  return original_pthread_cond_wait(cond, mutex);
}


static int (*original_poll)(struct pollfd fds[], nfds_t nfds, int timeout) = NULL;

int poll(struct pollfd fds[], nfds_t nfds, int timeout)
{
  if(original_poll == NULL)
    original_poll = dlsym(RTLD_NEXT, "poll");
 
  myInternalSleep(10);
 
  return original_poll(fds, nfds, timeout);
}
*/

/*

static cl_int (*original_clEnqueueNDRangeKernel)(cl_command_queue command_queue,
                    cl_kernel kernel,
                    cl_uint work_dim,
                    const size_t *global_work_offset,
                    const size_t *global_work_size,
                    const size_t *local_work_size,
                    cl_uint num_events_in_wait_list,
                    const cl_event *event_wait_list,
                    cl_event *event) = NULL;
*/
static int w = 3;
cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                    cl_kernel kernel,
                    cl_uint work_dim,
                    const size_t *global_work_offset,
                    const size_t *global_work_size,
                    const size_t *local_work_size,
                    cl_uint num_events_in_wait_list,
                    const cl_event *event_wait_list,
                    cl_event *event)
{
  cl_int (*original_clEnqueueNDRangeKernel)(cl_command_queue command_queue,
                        cl_kernel kernel,
                        cl_uint work_dim,
                        const size_t *global_work_offset,
                        const size_t *global_work_size,
                        const size_t *local_work_size,
                        cl_uint num_events_in_wait_list,
                        const cl_event *event_wait_list,
                        cl_event *event) = NULL;
  //if(original_clEnqueueNDRangeKernel == NULL)
  original_clEnqueueNDRangeKernel = dlsym(RTLD_NEXT, "clEnqueueNDRangeKernel");
 
  cl_int i = original_clEnqueueNDRangeKernel(command_queue,
                     kernel,
                     work_dim,
                     global_work_offset,
                     global_work_size,
                     local_work_size,
                     num_events_in_wait_list,
                     event_wait_list,
                     event);

  myInternalSleep(w);

  w = (w + 191) & 255;
 
  return i;
}

/*
CLFFTAPI clfftStatus    clfftEnqueueTransform(
                                                clfftPlanHandle plHandle,
                                                clfftDirection dir,
                                                cl_uint numQueuesAndEvents,
                                                cl_command_queue* commQueues,
                                                cl_uint numWaitEvents,
                                                const cl_event* waitEvents,
                                                cl_event* outEvents,
                                                cl_mem* inputBuffers,
                                                cl_mem* outputBuffers,
                                                cl_mem tmpBuffer
                                                );
*/

//To be implemented

2) Suspend GPU work.

# gcc -I/usr/local/cuda-10.2/targets/x86_64-linux/include/ -O2 -fPIC -shared -Wl,-soname,libsleep.so -o libsleep.so libsleep.c

# cp ~myhome/libsleep/libsleep.so /usr/lib/libsleep.so
# sync

3) Resume GPU work.
