I'm in the process of moving E@H to a backfill Condor job. The good news is that we'll have more cores processing; the bad news is that I can't figure out how to run multiple jobs on each GPU, and I'll probably have a smaller percentage of time allocated to E@H.
How is this done in BRP4? Do we have something special that allows more than one WU per GPU, or can I just run 2 or 3 jobs per GPU?
Has anyone implemented this already who can explain how to do it?
Joe
Copyright © 2024 Einstein@Home. All rights reserved.
BOINC, Condor, CUDA and multiple tasks per GPU
In the old days this called for an app_info.xml file, so it was not so easy. Now, however, just go to your account at Einstein@Home, select Einstein@Home preferences, and edit the following parameter for the location (a.k.a. venue) of your host:
GPU utilization factor of BRP apps
A value of 1.0 gives one active WU, 0.5 gives two, and 0.33 gives three. You are not likely to find more than three helpful (the big leap is often from one to two), and may not find it possible.
The current Einstein BRP app requires rather a lot of support from the associated CPU task. Many of us have found total system output to be higher if we restrict BOINC to fewer than the maximum number of CPU tasks when running BRP on a GPU. The reduction in latency-imposed waiting of the GPU task in these cases increases productivity more than enough to pay for the loss of CPU work. The detailed tradeoffs are quite dependent on host characteristics, so experimentation is key. Happily the Einstein BRP jobs seem to be very consistent in computation requirement--so a small sample can be enough in many cases.
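For reference, the old app_info.xml route mentioned above did the same job with a fractional <count> in the <coproc> element. A rough sketch only -- the app and file names here are placeholders, not the real Einstein@Home file names:

```xml
<!-- Hypothetical app_info.xml sketch; names are placeholders. -->
<app_info>
  <app>
    <name>einsteinbinary_BRP4</name>
  </app>
  <file_info>
    <name>example_brp4_cuda_app</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>100</version_num>
    <avg_ncpus>0.2</avg_ncpus>
    <coproc>
      <type>CUDA</type>
      <!-- 0.5 GPU per task, i.e. two tasks share one GPU -->
      <count>0.5</count>
    </coproc>
    <file_ref>
      <file_name>example_brp4_cuda_app</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

The project preference simply spares you from maintaining a file like this by hand through every app update.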
Thanks archae86. I have that set up when running BOINC as a regular user, but my question is really "will that work if I'm running BOINC as a Condor backfill task?"
Condor (http://research.cs.wisc.edu/condor/) is similar to BOINC in theory but much more flexible (read: more complicated to set up). It assigns jobs to cores or slots, and as far as I can tell so far a GPU has to be assigned to one and only one of these.
We think the Einstein developers did something special to get more than one WU on a GPU, and if so I wonder how special.
Joe
Joe--sorry for misconstruing your question--and I know nothing about the actual one.
What is your setup like? AFAIK Condor is a kind of cluster, so are you running a separate instance of BOINC in each core/slot, or are you running BOINC on a kind of virtual supercomputer made of all the available resources? (I guess you are using BOINC, right?)
Anyway, the apps have nothing special; it's the BOINC client that reads the utilization factor (or the tags in app_info.xml) and then starts as many instances of the GPU app as it can until the whole GPU is used (or the unused fraction is not enough for another instance).
The only special thing at Einstein is that they use a customized version of the BOINC server software that allows setting the utilization factor on the preferences page, which saves us from the burden of app_info.xml maintenance...
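In other words, the client just keeps packing instances until the GPU is "full". A minimal Python sketch of that arithmetic (my own illustration, not actual BOINC code):

```python
def max_instances(gpu_usage: float) -> int:
    """Number of task instances that fit on one GPU, given the GPU
    fraction each task claims (the project's utilization factor)."""
    if gpu_usage <= 0:
        raise ValueError("gpu_usage must be positive")
    # Keep starting instances while the unused fraction is still big
    # enough for another one; the tiny epsilon absorbs float error.
    return int((1.0 + 1e-9) // gpu_usage)

print(max_instances(1.0))   # 1 WU on the GPU
print(max_instances(0.5))   # 2 WUs
print(max_instances(0.33))  # 3 WUs
```

This also matches archae86's note that a factor of 0.33 (not exactly 1/3) still yields three instances: the leftover 0.01 of the GPU is simply not enough for a fourth.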
Thanks Horacio,
Just getting started with this. My current setup has each hyperthread assigned to a slot. BOINC will be added as backfill, so when a slot is idle BOINC will run as a separate instance in that slot.
I believe we can assign multiple cores/hyperthreads to some slots, but I haven't implemented that yet.
I'll give it a try with 2 or 3 slots advertising that they have a GPU available and see what happens.
Joe
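A setup along the lines Joe describes might be sketched in condor_config roughly as below. The backfill knob names come from the HTCondor manual's BOINC backfill section, but the paths and the custom GPU attribute are placeholders/assumptions, not a tested configuration -- check the manual for your Condor version:

```
## Hypothetical condor_config sketch; paths and the HasGPU
## attribute are placeholders, not tested settings.

# Turn on backfill and select BOINC as the backfill system
ENABLE_BACKFILL  = TRUE
BACKFILL_SYSTEM  = BOINC

# Where the BOINC client lives and how to start it
BOINC_HOME       = /opt/boinc
BOINC_Executable = $(BOINC_HOME)/boinc_client
BOINC_InitialDir = $(BOINC_HOME)
BOINC_Owner      = boinc

# Advertise a custom ClassAd attribute so selected slots
# claim to have the GPU available
HasGPU       = True
STARTD_ATTRS = $(STARTD_ATTRS) HasGPU
```

Note that Condor itself only advertises the attribute; whether two or three backfill BOINC instances can actually share the GPU is still governed by the utilization factor discussed above.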
Try this:
https://nmi.cs.wisc.edu/node/1753
Alexander
Thanks Alexander, that's one of the better descriptions of backfill that I've seen.
Joe