Limiting BRP4 to CUDA machines

robertmiles
Joined: 8 Oct 09
Posts: 127
Credit: 29370881
RAC: 23831

Quote:

The original instrument data is about 2 GB per beam. You probably don't want to download that, especially not for just a few hours of computing time. Furthermore, the dedispersion takes a lot of memory that the average user doesn't have, or at least doesn't want to donate.

The mid-term plan is to use parts of the Atlas cluster for dedispersion and compression, but for this to work, parts of the current workunit generation need to be made more scalable. Rest assured that we're working on that.

BM

If you decide to make that into workunits, I'd want an option to download them only at certain times during the night here, and possibly to have them run only at certain hours of the night. I might consider providing the memory if you make it a 64-bit application and get the BOINC developers to allow separate memory limits for BOINC's total use and for its 32-bit applications. I keep seeing assorted problems if the 32-bit use gets too close to 4 GB, without similar problems when 64-bit use pushes the total past 7 GB. (I have 8 GB installed, and have recently ordered another computer with 16 GB installed.)

Or is even that still too little memory?

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429711082
RAC: 79406

Actually, the problem is not computer memory at all. The problem is splitting the data files into WUs. To act as a WU generator, you would have to download the huge database files, not the "small" files you receive as WUs, and after splitting them you would have to upload the results back again. This would consume a lot of traffic and might even create a bottleneck in the AEI network. However, this is not necessary right now, as there is a better way to solve the problem: just add more machines with WU generators on the server side, or split the WUs in some different way. But I think that is suitable only for the next run.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429711082
RAC: 79406

Looks like it works. The "Tasks to send" queue for BRP4 is growing now, as is the FGRP1 queue. Nice job!

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

I was just curious.

If it comes to that, I can take an external disk over to Caltech (just down the road, and I'm familiar with the Bridge Bldgs), pick up a couple of TB, and donate a good chunk of time on an i7 and/or Phenom II X6 machine with 16 GB RAM.

I was kidding about redeeming credits, but I looked and found a place to get logos imprinted on pocket protectors for about $0.75 each in batches of 300. I'd be willing to spring for half if people think it's worthwhile to give stuff like that away. I can't think of anything geekier.

Joe

telegd
Joined: 17 Apr 07
Posts: 91
Credit: 10212522
RAC: 0

Quote:
As a short term relief we hastily set up a new machine that allows us to run another six WUG instances.

Thanks for the update!

Jeroen
Joined: 25 Nov 05
Posts: 379
Credit: 740030628
RAC: 0

Quote:

* As a short term relief we hastily set up a new machine that allows us to run another six WUG instances. This means that we are now sending out about 1500 BRP4 tasks per hour (compared to 1000 previously). Unfortunately this still doesn't seem to be enough to feed even our GPUs.

* We plan to implement a new WUG that would scale much better, but this will take some more time. With all the things currently going on @AEI (around the BOINC workshop) I would not expect this to be done before the end of next week.

BM

Thanks for the updates.

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429711082
RAC: 79406

Something strange has happened to the BRP4 queue. It is empty now. Did something happen to the WU generators?

joe areeda
Joined: 13 Dec 10
Posts: 285
Credit: 320378898
RAC: 0

Quote:
Something strange has happened to the BRP4 queue. It is empty now. Did something happen to the WU generators?


I've been watching it for days. It often goes to zero and sometimes gets up to 1000 or more.

I think the WU generators are having trouble keeping up with demand but they do seem to be feeding a steady stream of CUDA tasks.

I'm waiting alongside you for a definitive answer. This is just my observation from the same seat you have.

Joe

Stranger7777
Joined: 17 Mar 05
Posts: 436
Credit: 429711082
RAC: 79406

Quote:
Quote:
Something strange has happened to the BRP4 queue. It is empty now. Did something happen to the WU generators?

I've been watching it for days. It often goes to zero and sometimes gets up to 1000 or more.

I think the WU generators are having trouble keeping up with demand but they do seem to be feeding a steady stream of CUDA tasks.

I'm waiting alongside you for a definitive answer. This is just my observation from the same seat you have.

Sometimes I have to manually force my machines to request new work when they run out of BRP4 work, and occasionally they suspend all communication for a long time, as if there were not enough work on the server side for me. This leads me to conclude that the capacity of the current generators is sometimes not enough, but that they are already running optimally and that setting up more machines may not help because of limited network bandwidth or something similar. Is it possible to make each BRP4 task bigger than it is now, so that the data files are used even more efficiently (I guess they already are)?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2956976384
RAC: 719958

Quote:
Quote:
Something strange has happened to the BRP4 queue. It is empty now. Did something happen to the WU generators?

This leads me to conclude that the capacity of the current generators is sometimes not enough, but that they are already running optimally and that setting up more machines may not help because of limited network bandwidth or something similar.


I think Bernd has already documented the difficulties he's been having with the current workunit generation code, and his plans (both hardware and software) to overcome them.

But just at the moment, I think we're suffering fallout from events outside our control. One of the other high-volume BOINC CUDA projects (SETI@Home) has been unable to generate new work for the best part of a week following a storage unit failure, and I suspect many users will have switched to Einstein as a backup project. SETI has now started generating work again, so I expect the Einstein demand will decrease over the course of the weekend as users switch back - that will allow our WUGs to have a better go at maintaining a supply.

Quote:
Is it possible to make each BRP4 task bigger than it is now, so that the data files are used even more efficiently (I guess they already are)?


It's only the Gravity Wave search which re-uses data files. The data for BRP4 is downloaded afresh for each task, and I think 32 MB of download for under two hours of computing is probably already as large as can reasonably be handled.
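
To put rough numbers on the bandwidth side, here is a back-of-the-envelope sketch using only the two figures mentioned in this thread: about 1500 BRP4 tasks sent per hour and about 32 MB downloaded per task. The function name `outbound_rate` and the decimal unit conversion (1 GB = 1000 MB) are my own assumptions, and it ignores protocol overhead, retries, and every other application sharing the same link, so treat the result as an order-of-magnitude estimate only.

```python
# Figures quoted in this thread; everything else here is an assumption.
TASKS_PER_HOUR = 1500   # approximate BRP4 send rate after the extra WUG machine
MB_PER_TASK = 32        # approximate download size of one BRP4 task


def outbound_rate(tasks_per_hour: float, mb_per_task: float) -> dict:
    """Estimate sustained download traffic for a given task send rate.

    Hypothetical helper: uses decimal units (1 GB = 1000 MB) and ignores
    protocol overhead, retries, and traffic from other applications.
    """
    mb_per_hour = tasks_per_hour * mb_per_task
    mb_per_second = mb_per_hour / 3600
    return {
        "gb_per_hour": mb_per_hour / 1000,
        "mb_per_second": mb_per_second,
        "mbit_per_second": mb_per_second * 8,
    }


rate = outbound_rate(TASKS_PER_HOUR, MB_PER_TASK)
print(f"{rate['gb_per_hour']:.0f} GB/hour")
print(f"{rate['mb_per_second']:.1f} MB/s")
print(f"{rate['mbit_per_second']:.0f} Mbit/s")
```

Under those assumptions, BRP4 alone would already account for roughly 48 GB per hour, or a bit over 100 Mbit/s sustained, which may be part of why larger tasks or more generators don't automatically help.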
