Limiting BRP4 to CUDA machines

robertmiles

Joined: 8 Oct 09

Posts: 127

Credit: 29287549

RAC: 22618

RE: The original instrument

17 Aug 2011 15:50:00 UTC

Message 106215 in response to message 106214

Quote:

The original instrument data is about 2 GB per beam. You probably don't want to download that, especially not for just a few hours of computing time. Furthermore, the dedispersion takes a lot of memory that the average user doesn't have, or at least doesn't want to donate.

The mid-term plan is to use parts of the Atlas cluster for dedispersion and compression, but for this to work, parts of the current workunit generation need to be made more scalable. Rest assured that we're working on that.

BM

If you decide to make that into workunits, I'd want an option to download them only at certain times during the night here, and possibly to have them run only at certain hours of the night. I might consider providing the memory if you make it a 64-bit application and get the BOINC developers to allow setting separate memory limits for total use by BOINC and for 32-bit application use by BOINC. I keep seeing assorted problems if the 32-bit use gets too close to 4 GB, without similar problems when 64-bit use pushes the total past 7 GB. (I have 8 GB installed, and have recently ordered another computer with 16 GB installed.)

Or is even that still too little memory?

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429427621

RAC: 75743

Actually the problem is not

17 Aug 2011 18:40:24 UTC

Message 106216

Actually, the problem is not computer memory at all. The problem is splitting the data files into WUs. To act as a WU generator, you would have to download the huge database files, not the "small" files you receive as WUs, and after splitting them you would have to upload the results back again. This would consume a lot of traffic and might even create a bottleneck in the AEI network. However, this is not necessary right now, as there is a better way to solve the problem: just add more machines running WU generators on the server side, or split the WUs in some different way. But I think that is suitable only for the next run.

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429427621

RAC: 75743

Looks like it works. "Tasks

17 Aug 2011 19:34:31 UTC

Message 106217

Looks like it works. The "Tasks to send" queue for BRP4 is growing now, as is the FGRP1 queue. Nice job!

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

I was just curious. If

18 Aug 2011 1:45:52 UTC

Message 106218

I was just curious.

If it comes to that, I can take an external disk over to Caltech (just down the road, and I'm familiar with the Bridge Bldgs), pick up a couple of TB, and donate a good chunk of time on an i7 and/or Phenom II X6 machine with 16 GB RAM.

I was kidding about redeeming credits, but I looked and found a place to get logos imprinted on pocket protectors for about $0.75 each in batches of 300. I'd be willing to spring for half if people think it's worthwhile to give stuff like that away. I can't think of anything geekier.

Joe

telegd

Joined: 17 Apr 07

Posts: 91

Credit: 10212522

RAC: 0

RE: As a short term relief

19 Aug 2011 6:20:01 UTC

Message 106219 in response to message 106211

Quote:

As a short term relief we hastily set up a new machine that allows us to run another six WUG instances.

Thanks for the update!

Jeroen

Joined: 25 Nov 05

Posts: 379

Credit: 740030628

RAC: 0

RE: * As a short term

19 Aug 2011 18:01:06 UTC

Message 106220 in response to message 106211

Quote:

* As a short term relief we hastily set up a new machine that allows us to run another six WUG instances. This means that we are now sending out about 1500 BRP4 tasks per hour (compared to previously 1000). Unfortunately this still doesn't seem to be enough to feed even our GPUs.

* We plan to implement a new WUG that would scale much better, but this will take some more time. With all the things currently going on @AEI (around the BOINC workshop), I would not expect this to be done before the end of next week.

BM

Thanks for the updates.

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429427621

RAC: 75743

There's something strange

25 Aug 2011 20:59:01 UTC

Message 106221

Something strange has happened to the BRP4 queue. It is empty now. Has something happened to the WU generators?

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

RE: There's something

26 Aug 2011 2:11:11 UTC

Message 106222 in response to message 106221

Quote:

Something strange has happened to the BRP4 queue. It is empty now. Has something happened to the WU generators?

I've been watching it for days; it often goes to zero and sometimes gets up to 1000 or more.

I think the WU generators are having trouble keeping up with demand, but they do seem to be feeding a steady stream of CUDA tasks.

I wait alongside you for a definitive answer. This was just my observation from the same seat you have.

Joe

Stranger7777

Joined: 17 Mar 05

Posts: 436

Credit: 429427621

RAC: 75743

RE: RE: There's something

26 Aug 2011 7:03:46 UTC

Message 106223 in response to message 106222

Quote:

Quote:
Something strange has happened to the BRP4 queue. It is empty now. Has something happened to the WU generators?

I've been watching it for days; it often goes to zero and sometimes gets up to 1000 or more.

I think the WU generators are having trouble keeping up with demand, but they do seem to be feeding a steady stream of CUDA tasks.

I wait alongside you for a definitive answer. This was just my observation from the same seat you have.

Sometimes I have to manually force my machines to request new work when they run out of BRP4 work and occasionally suspend all communication for a long time, as if there is not enough work on the server side for me. This leads me to conclude that the capacity of the current generators is sometimes not enough but otherwise about right, and that setting up more machines may not help because of limited network bandwidth or something similar. Is it possible to make each BRP4 task bigger than it is now, so that the data files are used even more efficiently (I guess they already are)?

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2954606621

RAC: 714505

RE: RE: There's something

26 Aug 2011 9:03:27 UTC

Message 106224 in response to message 106223

Quote:

Quote:
Something strange has happened to the BRP4 queue. It is empty now. Has something happened to the WU generators?

This leads me to conclude that the capacity of the current generators is sometimes not enough but otherwise about right, and that setting up more machines may not help because of limited network bandwidth or something similar.

I think Bernd has already documented the difficulties he's been having with the current workunit generation code, and his plans (both hardware and software) to overcome them.

But just at the moment, I think we're suffering fallout from events outside our control. One of the other high-volume BOINC CUDA projects (SETI@Home) has been unable to generate new work for the best part of a week following a storage unit failure, and I suspect many users will have switched to Einstein as a backup project. SETI has now started generating work again, so I expect the Einstein demand will decrease over the course of the weekend as users switch back - that will allow our WUGs to have a better go at maintaining a supply.

Quote:

Is it possible to make each BRP4 task bigger than it is now, so that the data files are used even more efficiently (I guess they already are)?

It's only the Gravity Wave search which re-uses data files. The data for BRP4 is downloaded afresh for each task - and I think 32 MB of download for under two hours computing is probably already as large as can reasonably be handled.
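For a rough sense of scale, here is a back-of-envelope sketch that combines the 32 MB per task mentioned above with the send rates quoted earlier in this thread (about 1000 and 1500 BRP4 tasks per hour). It assumes every task sent triggers one full, uncompressed 32 MB download and ignores retries and protocol overhead; the function name is made up for illustration, and none of this comes from the project's own measurements.

```python
# Back-of-envelope estimate of outbound download traffic for BRP4 tasks.
# Assumption: each task sent means one full 32 MB download (figure quoted
# in this thread); retries, compression and protocol overhead are ignored.

MB_PER_TASK = 32


def aggregate_download_rate(tasks_per_hour, mb_per_task=MB_PER_TASK):
    """Return (MB/s, Mbit/s) needed to ship tasks at the given hourly rate."""
    mb_per_second = tasks_per_hour * mb_per_task / 3600  # 3600 seconds per hour
    return mb_per_second, mb_per_second * 8  # 8 bits per byte


# Send rates quoted in the thread: before and after the extra WUG instances.
for rate in (1000, 1500):
    mb_s, mbit_s = aggregate_download_rate(rate)
    print(f"{rate} tasks/hour -> {mb_s:.1f} MB/s ({mbit_s:.0f} Mbit/s)")
```

Under those assumptions this works out to about 9 MB/s (roughly 71 Mbit/s) at 1000 tasks per hour and about 13 MB/s (roughly 107 Mbit/s) at 1500, which gives an idea of why making each task larger is not an obvious win.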
