BOINC, Condor, CUDA and multiple tasks per GPU

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0
Topic 196490

I'm in the process of moving E@H to a backfill Condor job. The good news is that we'll have more cores processing. The bad news is that I can't figure out how to run multiple jobs on each GPU, and I'll probably have a smaller percentage of time allocated to E@H.

How is this done in BRP4? Do we have something special that allows more than one WU per GPU, or can I just run 2 or 3 jobs per GPU?

Has anyone implemented this already who can explain how to do it?

Joe

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7062874931
RAC: 1214264

In the old days this required an app_info.xml file, so it was not so easy. Now, however, just go to your account at Einstein@Home, select Einstein@Home preferences, and edit the following parameter for the location (a.k.a. venue) of your host:

GPU utilization factor of BRP apps

A value of 1.0 gives one active WU, 0.5 gives two, and 0.33 gives three. You are not likely to find more than three helpful (the big leap is usually from one to two), and running more may not even be possible.

The current Einstein BRP app requires rather a lot of support from its associated CPU task. Many of us have found total system output to be higher if we restrict BOINC to fewer than the maximum number of CPU tasks when running BRP on a GPU. In these cases, the reduction in latency-imposed waiting by the GPU task increases productivity more than enough to pay for the lost CPU work. The detailed tradeoffs depend heavily on host characteristics, so experimentation is key. Happily, the Einstein BRP jobs seem to be very consistent in their computational requirements, so a small sample can be enough in many cases.

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Thanks, archae86. I have that set up when running BOINC as a regular user, but my question is really "will that work if I'm running BOINC as a Condor backfill task?"

Condor (http://research.cs.wisc.edu/condor/) is similar to BOINC in concept but much more flexible (read: more complicated to set up). It assigns jobs to cores or slots, and as far as I can tell so far, a GPU has to be assigned to one and only one of these.

We think the Einstein developers did something special to get more than one WU on a GPU, and if so, I wonder how special it is.

Joe

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7062874931
RAC: 1214264

Joe, sorry for misconstruing your question. I know nothing about the actual one.

Horacio
Joined: 3 Oct 11
Posts: 205
Credit: 80557243
RAC: 0

Quote:

Thanks archae86. I have that set up when running boinc as a regular user but my question is really "will that work if I'm running boinc as a Condor backfill task?"

Condor (http://research.cs.wisc.edu/condor/) is similar to Boinc in theory but much more flexible (read complicated to set up). It assigns jobs to cores or slots and as far as I can tell so far a GPU has to be assigned to one and only one of these.

We think the Einstein developers did something special to get more than one WU on a GPU and if so I wonder how special.

Joe

What does your setup look like? AFAIK Condor is a kind of cluster, so are you running a separate instance of BOINC in each core/slot, or are you running BOINC on a kind of virtual supercomputer made of all the available resources? (I assume you are using BOINC, right?)

Anyway, the app has nothing special in it. It's the BOINC client that reads the utilization factor (or the tags in app_info.xml) and then starts as many instances of the GPU app as it can until the whole GPU is used (or the unused fraction is not enough for another instance).
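To make the arithmetic concrete, here is a minimal Python sketch of the rule described above: keep starting instances while each one's share of the GPU still fits in the unused fraction. It is a simplified model for illustration, not the BOINC client's actual scheduling code, and the function name `concurrent_gpu_tasks` is hypothetical. It reproduces the 1.0 / 0.5 / 0.33 values mentioned earlier in the thread.

```python
def concurrent_gpu_tasks(utilization_factor: float, tolerance: float = 1e-9) -> int:
    """Hypothetical helper: how many tasks with this per-task GPU share fit on one GPU.

    Simplified model (assumption): start another instance only while the
    total share stays <= 1.0 (one whole GPU).
    """
    if not 0.0 < utilization_factor <= 1.0:
        raise ValueError("utilization factor must be in (0, 1]")
    used = 0.0
    count = 0
    # The small tolerance absorbs floating-point rounding (e.g. 0.33 + 0.33 + 0.33).
    while used + utilization_factor <= 1.0 + tolerance:
        used += utilization_factor
        count += 1
    return count


if __name__ == "__main__":
    for factor in (1.0, 0.5, 0.33):
        print(f"factor {factor}: {concurrent_gpu_tasks(factor)} task(s) per GPU")
```

With 0.33, three tasks use 0.99 of the GPU and a fourth would need 1.32, so the model stops at three, matching the "0.33 gives three" rule of thumb.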

The only special thing in Einstein is that they use a customized version of the BOINC server software that lets you set the utilization factor on the preferences page, which saves us from the burden of app_info.xml maintenance...

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Thanks, Horacio,

Just getting started with this. My current setup has each hyperthread assigned to a slot. BOINC will be added as a backfill, so whenever a slot is idle, a separate BOINC instance will run in it.

I believe we can assign multiple cores/hyperthreads to some slots, but I haven't implemented that yet.

I'll give it a try with 2 or 3 slots advertising that they have a GPU available and see what happens.

Joe

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 500988352
RAC: 63086

Try this:
https://nmi.cs.wisc.edu/node/1753

Alexander

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Quote:

Try this:
https://nmi.cs.wisc.edu/node/1753

Alexander


Thanks, Alexander. That's one of the better descriptions of backfill that I've seen.

Joe
