Maximizing Nvidia production

Tom M

Joined: 2 Feb 06

Posts: 6831

Credit: 9756068855

RAC: 2526655

18 Apr 2020 14:12:56 UTC

Topic 222009

I just noticed that often the processing load on my Nvidia gpus is under 50%.

Does this mean I would get more total production if I were to increase the # of tasks per gpu to 2?

Tom M

A Proud member of the O.F.A. (Old Farts Association).

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4149

Credit: 49595441888

RAC: 36943993

18 Apr 2020 14:47:51 UTC

Message 176799

I assume you mean with the Gravity Wave tasks?

The GW tasks need a lot of CPU support, so unless you have a powerful CPU you won’t get very high GPU utilization.

Running 2x tasks per GPU helps, at the expense of using more CPU resources. Do you really want to use 2+ threads for each GPU? I know you like to run CPU work too, so using so much of the CPU just to feed the GPU seems like a waste.
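
For reference, 2x tasks per GPU is normally set with an app_config.xml file in the Einstein@Home project directory. A minimal sketch, assuming the GW GPU app is named einstein_O2MDF (that name is an assumption; check the app names listed in client_state.xml on your own host):

```xml
<app_config>
  <app>
    <!-- App name is an assumption; confirm it on your host -->
    <name>einstein_O2MDF</name>
    <gpu_versions>
      <!-- 0.5 GPU per task lets two tasks share each GPU -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPUs the client budgets per task; two tasks reserve 2 threads per GPU -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving the file, have the client re-read its config files (Options > Read config files in BOINC Manager).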

_________________________________________________________________________

Tom M

Joined: 2 Feb 06

Posts: 6831

Credit: 9756068855

RAC: 2526655

18 Apr 2020 18:00:42 UTC

Message 176803 in response to message 176799

You're right. The cpu work has the priority so I need to run 1 gpu task on 1 thread.

---edit---

I am trying 1.25 cpus per gpu to see if I can drive the trend of the gpu utilization up for Gravity Waves.
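
A sketch of how a 1.25 figure like this would be written in app_config.xml, again assuming the app name einstein_O2MDF (an assumption). Keep in mind that cpu_usage is only the amount the BOINC client budgets when deciding how many CPU tasks to run alongside; it does not by itself change how much CPU the GPU app actually uses.

```xml
<app_config>
  <app>
    <!-- App name is an assumption; confirm it on your host -->
    <name>einstein_O2MDF</name>
    <gpu_versions>
      <!-- One task per GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- Budget 1.25 CPUs per GPU task; e.g. four such tasks budget 5 CPUs -->
      <cpu_usage>1.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```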

I have also dropped GW from my selected gpu apps list, and disabled the "run non-selected apps" option.

Unfortunately, E@H has buried me in "several" tasks, so it will be a while before I get the GW's processed.

--edit--

Thank you.

Tom M

A Proud member of the O.F.A. (Old Farts Association).

DanNeely

Joined: 4 Sep 05

Posts: 1364

Credit: 3594328935

RAC: 669469

19 Apr 2020 3:56:58 UTC

Message 176836

If you don't want to run the GW GPU tasks you can just abort them. As long as you've also got at least a few Fermi GPU tasks that will report success when finished there won't be any major disruption to your supply of new tasks.

Tom M

Joined: 2 Feb 06

Posts: 6831

Credit: 9756068855

RAC: 2526655

20 Apr 2020 11:31:43 UTC

Message 176882

That is odd. While I have turned off Gravity Wave and then restricted Gravity Wave gpu to 2 gpus out of 5 available gpus, as of this morning my system wasn't crunching Pulsar GPU at all.

I checked available tasks and there were still Pulsar tasks available.

I had reset the cache to 0.1 day and 0.1 additional so the backoff was very high. I did a manual update.

When I renamed app_config.xml to app_config_stop.xml and read the config files again, 3 more gpu's started up running Gravity Wave.

So why was Boinc Manager not running the Pulsar app for which I had all sorts of data files?

---edit----

Add cold boots and setting the cache to 1 to the list of things that don't start the Pulsar gpu tasks.

Examined the log, no errors listed.

Switched to NNT so that, if the last resort of resetting the project becomes necessary, I can run out as many tasks as I can before I do.

-edit---

Tom M

A Proud member of the O.F.A. (Old Farts Association).

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

20 Apr 2020 15:16:04 UTC

Message 176893

Probably would help to see your cc_config. Also, is this a dedicated Einstein Machine? I've had many talks with Keith over the years about the value of dedicated machines for each project. Having multi-project machines injects unknown variables into the mix.

Tom M

Joined: 2 Feb 06

Posts: 6831

Credit: 9756068855

RAC: 2526655

20 Apr 2020 17:21:48 UTC

Message 176898 in response to message 176893

Zalster wrote:

Probably would help to see your cc_config. Also, is this a dedicated Einstein Machine? I've had many talks with Keith over the years about the value of dedicated machines for each project. Having multi-project machines injects unknown variables into the mix.

The GPU's run E@H and optionally S@H if something shows up. It is a multi-project machine.

This cc_config.xml file has not been changed since before the Pulsar tasks suddenly stopped running.

<cc_config>
<log_flags>
   <sched_op_debug>1</sched_op_debug>
</log_flags>
<options>
   <use_all_gpus>1</use_all_gpus>
   <save_stats_days>90</save_stats_days>
   <max_file_xfers>4</max_file_xfers>
   <max_file_xfers_per_project>2</max_file_xfers_per_project>
   <no_alt_platform>1</no_alt_platform>
</options>
</cc_config>

<max_tasks_reported>50</max_tasks_reported>

Tom M

A Proud member of the O.F.A. (Old Farts Association).

Tom M

Joined: 2 Feb 06

Posts: 6831

Credit: 9756068855

RAC: 2526655

20 Apr 2020 17:34:21 UTC

Message 176900

The good news is that with the NNT the supply of Gravity Wave tasks is showing real signs of running out, within a few more days at worst, or maybe today.

This could be a self-correcting issue.

Tom

A Proud member of the O.F.A. (Old Farts Association).

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

20 Apr 2020 19:32:08 UTC

Message 176902 in response to message 176898

@Tom

I don't see where you are excluding 3 of 5 GPUs from your machine in the cc_config file. From what I see, you would be using all GPUs for GW. Where did you put the <exclude>?
Z
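
For anyone following along, a GPU exclusion goes inside <options> in cc_config.xml, one <exclude_gpu> block per device. A minimal sketch, assuming the project URL https://einsteinathome.org/ (it must match the URL your client shows for the project), device numbers 2 to 4, and the app name einstein_O2MDF; all three are assumptions to adapt to your host:

```xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <!-- Keep the GW app off devices 2, 3 and 4 so it runs on 2 of 5 GPUs -->
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>
      <device_num>2</device_num>
      <type>NVIDIA</type>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>
      <device_num>3</device_num>
      <type>NVIDIA</type>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>
      <device_num>4</device_num>
      <type>NVIDIA</type>
      <app>einstein_O2MDF</app>
    </exclude_gpu>
  </options>
</cc_config>
```

Leaving out <app> would exclude those devices for every app of that project. A client restart may be needed before changed exclusions take effect.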

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119413888975

RAC: 25923026

20 Apr 2020 23:38:28 UTC

Message 176918 in response to message 176882

Tom M wrote:

So why was Boinc Manager not running the Pulsar app for which I had all sorts of data files?

Firstly, data files (.dat extensions) aren't tasks. Do you mean data files or do you mean tasks? None of the physical files you can see are tasks. Tasks are essentially just parameter entries in the state file.

If you do mean tasks, they will be done in FIFO order. You can override the FIFO queue by suspending the entries that would run next, if your aim is to run a different job that is further down the queue.

If you do actually mean that you have data files but you don't have any jobs to run, the following may help. You mentioned you had settings of 0.1 and 0.1 for your work cache. The way that works is that your client will initially request GPU work (of whatever description) so that you have at least 0.2 days worth (at current estimates) on board. No further requests for GPU work (of any description) will occur until you have less than 0.1 days worth (at current estimates) remaining. If you want to always have at least 0.2 days worth on hand, it's probably better to put all of that in the 1st setting and leave the 2nd setting at zero. Otherwise, the amount on hand may tend to cycle up and down a bit. Admittedly, there's not much 'cycling' that can happen between 0.1 and 0.2 so it's a minor point only. It would become much more visible if someone set 0.1 and 1.0 as the two values.

So if you currently have more than 0.1 days worth of work on board, your client won't request any sort of GPU work even if there are currently no FGRPB1G tasks on board.

The other thing you need to realise is that when there is a choice between the FGRP and GW versions of GPU tasks, there is a 'server-side mechanism' (set by staff) that will influence the choice made by the scheduler. At the moment, that mechanism seems to be set so as to prefer the sending of GW work, a lot of the time. It's quite easy to imagine periods where there is no FGRPB1G work on board. That is quite understandable, since the 'holy grail' for this project is the first ever detection of continuous GW. The scientists will want to get the GW work done as quickly as possible so sending it is likely to be prioritised.

If you want to get some FGRPB1G tasks, just deselect the GW type (perhaps temporarily) and once that change is in place, increase your work cache size enough to allow your client to make a new work request where the scheduler has only one choice in what it can send. If you don't do something like that, chances are that you will just get more GW work.

I don't know if I'm properly understanding what you are trying to do. Please correct me if I'm not.

Cheers,
Gary.

petri33

Joined: 4 Mar 20

Posts: 129

Credit: 4399543144

RAC: 5088435

25 Apr 2020 20:25:41 UTC

Message 177109

To decrease CPU usage I have tried this (with lowered CPU usage, higher GPU utilization and lower throughput :( ):

1) Modify this:


#define _GNU_SOURCE
#include <time.h>
#include <errno.h>
#include <dlfcn.h>
#include <poll.h>
#include <CL/cl.h>


/*
 * To compile run:
 * gcc -O2 -fPIC -shared -Wl,-soname,libsleep.so -o libsleep.so libsleep.c
 *
 * To use (with seti@home): (modify to suit your needs)
 *  daemon --check $BOINCEXE --user $BOINCUSER +10 "LD_PRELOAD=/etc/init.d/libsleep.so $BOINCEXE $BOINCOPTS --dir $BOINCDIR >>$LOGFILE 2>>$ERRORLOG &" >& /dev/null
*
*
*
*/

void testing(void) __attribute__((constructor));

void testing(void)
{
  //
}

int myInternalSleep(int us)
{
  struct timespec t;
  struct timespec r;

  t.tv_sec  = 0;
  t.tv_nsec = 1000L * (long)us; // microseconds to nanoseconds (us must stay below 1,000,000)

  nanosleep(&t, &r); // never mind if interrupted
 
  /*
  while(nanosleep(&t, &r) == -1 && errno == EINTR)
    { //continue sleeping if interrupted
      t.tv_sec = r.tv_sec;
      t.tv_nsec = r.tv_nsec;
    }
  */
 
  return 0;
}


int sched_yield(void)
{
  myInternalSleep(1); // sleep about 1 us instead of yielding
  return 0;           // sched_yield() reports success as 0
}

/*
static int (*original_clFinish)(cl_command_queue a) = NULL;
*/

cl_int clFinish(cl_command_queue a)
{
  // Use cl_int so the wrapper matches the prototype in CL/cl.h
  cl_int (*original_clFinish)(cl_command_queue a);
  //if(original_clFinish == NULL)
  original_clFinish = dlsym(RTLD_NEXT, "clFinish");
 
  myInternalSleep(1);
 
  cl_int ret = (*original_clFinish)(a);

  return ret;
}

/*
static int (*original_pthread_cond_wait)(void *cond, void *mutex) = NULL;

int pthread_cond_wait(void *cond, void *mutex)
{
  if(original_pthread_cond_wait == NULL)
    original_pthread_cond_wait = dlsym(RTLD_NEXT, "pthread_cond_wait");
 
  myInternalSleep(10);
 
  return original_pthread_cond_wait(cond, mutex);
}


static int (*original_poll)(struct pollfd fds[], nfds_t nfds, int timeout) = NULL;

int poll(struct pollfd fds[], nfds_t nfds, int timeout)
{
  if(original_poll == NULL)
    original_poll = dlsym(RTLD_NEXT, "poll");
 
  myInternalSleep(10);
 
  return original_poll(fds, nfds, timeout);
}
*/

/*

static cl_int (*original_clEnqueueNDRangeKernel)(cl_command_queue command_queue,
                    cl_kernel kernel,
                    cl_uint work_dim,
                    const size_t *global_work_offset,
                    const size_t *global_work_size,
                    const size_t *local_work_size,
                    cl_uint num_events_in_wait_list,
                    const cl_event *event_wait_list,
                    cl_event *event) = NULL;
*/
static int w = 3;
cl_int clEnqueueNDRangeKernel(cl_command_queue command_queue,
                    cl_kernel kernel,
                    cl_uint work_dim,
                    const size_t *global_work_offset,
                    const size_t *global_work_size,
                    const size_t *local_work_size,
                    cl_uint num_events_in_wait_list,
                    const cl_event *event_wait_list,
                    cl_event *event)
{
  cl_int (*original_clEnqueueNDRangeKernel)(cl_command_queue command_queue,
                        cl_kernel kernel,
                        cl_uint work_dim,
                        const size_t *global_work_offset,
                        const size_t *global_work_size,
                        const size_t *local_work_size,
                        cl_uint num_events_in_wait_list,
                        const cl_event *event_wait_list,
                        cl_event *event) = NULL;
  //if(original_clEnqueueNDRangeKernel == NULL)
  original_clEnqueueNDRangeKernel = dlsym(RTLD_NEXT, "clEnqueueNDRangeKernel");
 
  cl_int i = original_clEnqueueNDRangeKernel(command_queue,
                     kernel,
                     work_dim,
                     global_work_offset,
                     global_work_size,
                     local_work_size,
                     num_events_in_wait_list,
                     event_wait_list,
                     event);

  myInternalSleep(w);

  w = (w + 191) & 255;
 
  return i;
}

/*
CLFFTAPI clfftStatus    clfftEnqueueTransform(
                                                clfftPlanHandle plHandle,
                                                clfftDirection dir,
                                                cl_uint numQueuesAndEvents,
                                                cl_command_queue* commQueues,
                                                cl_uint numWaitEvents,
                                                const cl_event* waitEvents,
                                                cl_event* outEvents,
                                                cl_mem* inputBuffers,
                                                cl_mem* outputBuffers,
                                                cl_mem tmpBuffer
                                                );
*/

//To be implemented

2) Suspend GPU work.

# gcc -I/usr/local/cuda-10.2/targets/x86_64-linux/include/ -O2 -fPIC -shared -Wl,-soname,libsleep.so -o libsleep.so libsleep.c

# cp ~myhome/libsleep/libsleep.so /usr/lib/libsleep.so
# sync

3) Resume GPU work.
