Would like to volunteer some time on an HPC

teslatech
Joined: 29 Jan 11
Posts: 14
Credit: 48,977,535
RAC: 8,976
Topic 196751

At my job I have access to a decently powerful cluster (currently 1048 cores across 25 nodes) that uses TORQUE with PBS scripts to submit jobs. Most of the time the cluster is running at under 50% load. Has anyone done anything with a system like this before? I would like to submit jobs (single work units) to single nodes when they are not being used.

Now, this sort of system does not fit the normal BOINC client model.

We have single node queues and multi node queues.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,906
Credit: 191,219,015
RAC: 51,098


There are a few configuration and command-line options in recent BOINC clients that support processing single jobs. I would suggest:

--attach_project
Attach the client to a project. Alternatively, supply an account_.xml file in the client's CWD with account information. I would suggest creating a new account for these cluster jobs and reviewing the computing- and project-specific settings of that account before attaching any client to it.

--fetch_minimal_work
Get only one task per CPU core or GPU

--exit_when_idle
Exit the client when there are no more tasks, and report completed tasks immediately.

--no_gui_rpc
Don't make a socket for GUI communication.

--no_priority_change
Run apps at same priority as client (or else they would be niced).

--redirectio
Redirect stdout and stderr to log files. Else they'll output to the command window.

Much of this, and more, can also be configured in a client configuration file placed in the same directory where the client is run.
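Put together, a one-shot batch-style invocation might look like the sketch below. The flag names are from the list above; the binary name `boinc` and the working directory are assumptions, so check your client version's own --help output first.

```shell
# Sketch: collect the batch-friendly flags from the list above into one
# command line. The working directory path is a made-up example.
BOINC_FLAGS="--fetch_minimal_work --exit_when_idle --no_gui_rpc \
--no_priority_change --redirectio"
echo "boinc $BOINC_FLAGS"
# To actually run it, from a directory containing the account file:
#   (cd "$HOME/boinc_job" && boinc $BOINC_FLAGS)
```

The client is started once per batch, drains its queue, and exits, which is what a scheduler like TORQUE expects of a job.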

Alternatively, you may want to take a look at BoincLite, a much-simplified BOINC client that might be more suitable for this purpose than the full-featured BOINC core client.

BM

PS:

This assumes you submit a BOINC client as a cluster job. On launch the client will contact the project scheduler, download the application and data via HTTP, and also (try to) upload the result file(s) and report the result itself. This requires HTTP access to the outside world from the cluster nodes.

To avoid such communication from the nodes, you could use --exit_before_start and --exit_after_finish to interrupt the client before and after processing the job. The procedure would be:

* On a head node / submit machine / workstation with web access, start a client with the above configuration plus --exit_before_start. When it exits, it should have downloaded a task and all the data necessary to process it. tar / zip / whatever together the CWD of the client, preserving the directory structure (in particular the projects/ and slots/ directories).

* Submit a job that unpacks this directory on the node, starts the client (this time with --exit_after_finish), and after the client exits packs the whole directory structure back together.

* Back on the workstation, unpack the structure again and run the client a third time, now with --exit_when_idle, to finally upload and report the result.
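The three steps above can be sketched as a small stage script; the paths, archive name, and `boinc` binary location below are assumptions to adapt to your site and PBS setup, not a tested recipe.

```shell
#!/bin/sh
# Sketch of the fetch / compute / report procedure described above.
# WORKDIR, ARCHIVE and the `boinc` binary location are assumptions.
set -eu
WORKDIR="${WORKDIR:-$HOME/boinc_job}"
ARCHIVE=job.tar.gz

stage_fetch() {    # head node: download one task, stop before running it
    (cd "$WORKDIR" && boinc --fetch_minimal_work --exit_before_start)
    tar czf "$ARCHIVE" -C "$WORKDIR" .   # keeps projects/ and slots/
}

stage_compute() {  # compute node (inside the PBS job): crunch the task
    mkdir -p "$WORKDIR"
    tar xzf "$ARCHIVE" -C "$WORKDIR"
    (cd "$WORKDIR" && boinc --exit_after_finish)
    tar czf "$ARCHIVE" -C "$WORKDIR" .
}

stage_report() {   # head node: upload and report the finished result
    tar xzf "$ARCHIVE" -C "$WORKDIR"
    (cd "$WORKDIR" && boinc --exit_when_idle)
}

case "${1:-}" in
    fetch)   stage_fetch ;;
    compute) stage_compute ;;
    report)  stage_report ;;
    *)       echo "usage: $0 fetch|compute|report" ;;
esac
```

The PBS job itself would then just call the `compute` stage on whatever node it lands on, while `fetch` and `report` run on the machine with web access.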

BM

teslatech
Joined: 29 Jan 11
Posts: 14
Credit: 48,977,535
RAC: 8,976

Awesome!!! I will look into that!

Would love to put our nodes to work when no one else is using them.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,906
Credit: 191,219,015
RAC: 51,098

You're welcome.

I've gotten a few such requests over time. I would appreciate it if you could post your experiences here (what you ended up doing and found to work, etc.) for others to learn from.

BM

teslatech
Joined: 29 Jan 11
Posts: 14
Credit: 48,977,535
RAC: 8,976

Thanks for that addition. That is exactly what I needed to know.

I will let you know how it goes.

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 290,771,771
RAC: 36,537

I'd just like to add one alternative that might be of interest.

If your cluster is running Condor as the job manager, you can configure it to backfill with E@H jobs (probably other BOINC projects too, but I haven't tried). That way any real jobs take precedence and kick out the E@H job (properly checkpointed, of course), but E@H can automagically use whatever portion of the idle time you wish.
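For reference, Condor's backfill support is driven by a handful of condor_config settings. The fragment below is a sketch from memory of the Condor manual's backfill section; the knob names, paths, and expressions should all be verified against your Condor version's documentation before use.

```
## Hypothetical condor_config fragment for BOINC backfill
## (verify every name and expression against your Condor manual)
ENABLE_BACKFILL  = TRUE
BACKFILL_SYSTEM  = BOINC
START_BACKFILL   = $(StateTimer) > (10 * $(MINUTE))  # idle long enough
EVICT_BACKFILL   = $(MachineBusy)                    # a real job arrived
BOINC_Executable = /usr/local/bin/boinc_client
BOINC_InitialDir = /var/lib/boinc
BOINC_Owner      = boinc
```

The START_BACKFILL / EVICT_BACKFILL expressions are what give real jobs precedence over the backfill client.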

If that would help, I'll dig up the documentation on how to do it. It may be on private web pages; I'm not the one who set it up on the clusters I use.

Joe

Gaurav Khanna
Joined: 8 Nov 04
Posts: 42
Credit: 13,830,821,791
RAC: 17,716,429

This was helpful to me too. Thanks, Bernd.

One question. Is there a way to control how much work is downloaded with --exit_before_start? I'd like to download extra work (not just minimum) to prepare reasonable duration jobs.

Thanks,
Gaurav

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,906
Credit: 191,219,015
RAC: 51,098

Quote:
Is there a way to control how much work is downloaded with --exit_before_start?

None that I know of. Once the first task has been downloaded (and, with the wrong client version, even before that), the client will start that task (or exit, when given --exit_before_start), even if it has not yet finished downloading the data for additional tasks.

You'll either need to run the client manually on the workstation and watch until it has finished downloading all the tasks it got; the amount of work fetched then depends on your "work cache" preference settings.

Or you could wrap a script around the procedure described above that downloads a fixed number n of tasks by running n clients with --fetch_minimal_work --exit_before_start, submits a job that unpacks, runs, and packs these tasks one after another, and finally runs all the clients again with --exit_when_idle to upload and report the tasks.
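The staging half of such a wrapper might look like the sketch below. The count N and the client_* directory naming are made up for illustration, and the actual boinc calls are left as commented placeholders since they depend on your setup.

```shell
# Sketch: stage N single-task client directories, one client per task.
# N and the directory names are assumptions; boinc calls are placeholders.
N=4
i=1
while [ "$i" -le "$N" ]; do
    dir="client_$i"
    mkdir -p "$dir"
    # cp account_*.xml "$dir"/     # each client needs its own account file
    # (cd "$dir" && boinc --fetch_minimal_work --exit_before_start)
    echo "staged $dir"
    i=$((i + 1))
done
# Then: submit one job that runs each client_* with --exit_after_finish,
# and afterwards run each again with --exit_when_idle to upload and report.
```

Since each client only ever holds one task, the job's total runtime is roughly n times one task's duration, which makes it easy to size against your queue limits.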

BM
