Limiting BRP4 to CUDA machines

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250458547
RAC: 35072
Topic 195884

As we continue to have problems generating enough BRP4 work, we will for now disable the BRP4 CPU application versions and ship BRP4 work only to CUDA machines, so that at least these are fed.

I've done everything I could on the software side, but I'm afraid the problems with work generation can only be resolved by a serious hardware upgrade to the machines running the work unit generator and serving the data files. This will need coordination (it's vacation time in Germany), planning, ordering new hardware etc., and thus won't be done in the next few days.

BM


Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0


That's OK. There is so much work for CPUs. Nobody needs to be afraid of running out of CPU work.

Allen Clifford
Joined: 24 Aug 10
Posts: 17
Credit: 827193
RAC: 0


No problem.

I'd rather find a gravitational wave than a pulsar anyways ;)

oz
Joined: 28 Feb 05
Posts: 7
Credit: 54902288
RAC: 0


As far as I know, BRPs are eligible g-wave candidates. So go on, Einstein@Home.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429644839
RAC: 76303


OTOH, maybe it would be better to generate workunits only for CPU machines instead of CUDA, since they consume fewer WUs per second. Am I right, Bernd?

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0


Quote:
OTOH, maybe it would be better to generate workunits only for CPU machines instead of CUDA, since they consume fewer WUs per second. Am I right, Bernd?

Except that would mean that the volunteer GPU resource would go completely unused. Everyone would take their CUDA cards somewhere else if there was no work here for long stretches of time. I doubt that is what you intend, though.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429644839
RAC: 76303


Quote:
Quote:
OTOH, maybe it would be better to generate workunits only for CPU machines instead of CUDA, since they consume fewer WUs per second. Am I right, Bernd?

Except that would mean that the volunteer GPU resource would go completely unused. Everyone would take their CUDA cards somewhere else if there was no work here for long stretches of time. I doubt that is what you intend, though.

Yes, this would lead to GPUs moving somewhere else, where the work queue is stable. But we here are not cobblestone freaks. We just do the science. So BRP would just move a little more slowly, but it would help us eliminate the problem with the WU cache and concentrate on more pressing things. Once new machines are installed (I suppose it will take no more than half a year), it will become possible to build new WU generators that can produce enough work to support CUDA machines even with new searches.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250458547
RAC: 35072


* As short-term relief we hastily set up a new machine that allows us to run another six WUG instances. This means that we are now sending out about 1500 BRP4 tasks per hour (compared to 1000 previously). Unfortunately this still doesn't seem to be enough to feed even our GPUs.

* We plan to implement a new WUG that would scale much better, but this will take some more time. With all the things currently going on at AEI (around the BOINC workshop), I would not expect this to be done before the end of next week.

BM


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956596422
RAC: 715297


In the meantime, there's nothing to stop us leaving CUDA cards connected to Einstein to download work whenever it might be available, but also attaching to other CUDA projects to share the resource around.

I have, however, increased the 'Task Switch Interval' on my Windows 7 machines, so that BRP4 tasks run to completion in a single session - in an attempt to minimise the downclocking and other problems caused by the non-threadsafe exit in the current app. I had one GPU become unusable (until reboot) yesterday after a BRP4 task was switched out in favour of another project.

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0


Too bad we can't distribute the WU generation to the Grid, but I suspect the size of the datasets needed is too big to make that practical. I am curious, though: what kind of equipment is used to generate the WUs, and how much on-line data is needed?

I used to use my daily credit as a sign that there were problems to be addressed, but it seems that is also very dependent on outside factors such as how much GPU work is available and how long it takes to validate a returned result. I now have 765 tasks waiting to be validated, with the oldest one returned on July 7.

I do enjoy participating in this project. I am more interested in the E@H work than the number of credits I get. Now if we could redeem those credits for something like BOINC T-shirts or pocket protectors it might be different.

Joe

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250458547
RAC: 35072


The original instrument data is about 2 GB per beam. You probably don't want to download that, especially not for just a few hours of computing time. Furthermore, the dedispersion takes a lot of memory that the average user doesn't have or at least doesn't want to donate.

The mid-term plan is to use parts of the Atlas cluster for dedispersion and compression, but for this to work, parts of the current workunit generation need to be made more scalable. Rest assured that we're working on that.

BM

