BRP4, BRP3, BRP3cuda32

Steveplanetary
Joined: 23 Jul 11
Posts: 41
Credit: 32,319,229
RAC: 0
Topic 195961

1) It is my understanding (correct me if I am wrong) that nearly all binary pulsar searches are being performed on new Arecibo data using BRP4, which is a significant improvement over BRP3. Then why (this is a question of a rather new E@H participant) does BOINC Manager indicate the application is BRP3cuda32 (einsteinbinary_BRP4_1.00_windows_intelx86_BRP3cuda32 according to Task Manager)?

2) It is obvious that the 32 in BRP3cuda32 refers to 32 bits, but what I don't know is if only 32 bits at a time are actually being processed. My computer only gets binary pulsar search WUs for my video card, which has a bandwidth of 128 bits. While processing 32 bits at a time is far better than that which could be done by CPU, my questions are these: Is only 1/4 of my video card being used, and if so could that be increased to 1/1?

Thank you for your forbearance and TIA for taking the time to educate me.

Steve

"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,818,703,029
RAC: 884,152

BRP4, BRP3, BRP3cuda32

Quote:
1) It is my understanding (correct me if I am wrong) that nearly all binary pulsar searches are being performed on new Arecibo data using BRP4, which is a significant improvement over BRP3. Then why (this is a question of a rather new E@H participant) does BOINC Manager indicate the application is BRP3cuda32 (einsteinbinary_BRP4_1.00_windows_intelx86_BRP3cuda32 according to Task Manager)?


What BOINC is actually showing you with "BRP3cuda32" is something called a 'plan_class' - a set of specifications for how BOINC is going to manage the tasks (what files are needed, what resources will be used, and so on). The BOINC developers implemented plan classes as hard-coded sections of the applications running on the server, rather than as configuration data: that makes them harder, and riskier, to modify. Bernd has explained on these boards that it was quicker, easier and safer to recycle the existing BRP3cuda32 plan class and hook it up to the new app than to modify and re-compile the server source code. The confusion between BRP3 and BRP4 is an unfortunate side-effect - I'm not surprised you were caught out.

Quote:

2) It is obvious that the 32 in BRP3cuda32 refers to 32 bits, but what I don't know is if only 32 bits at a time are actually being processed. My computer only gets binary pulsar search WUs for my video card, which has a bandwidth of 128 bits. While processing 32 bits at a time is far better than that which could be done by CPU, my questions are these: Is only 1/4 of my video card being used, and if so could that be increased to 1/1?

Thank you for your forbearance and TIA for taking the time to educate me.

Steve


Actually, no. The 32 in 'BRP3cuda32' refers to CUDA version 3.2, and hence determines which version of the NVidia runtime and Fourier Transform libraries to supply.

The app is, as it happens, also a 32-bit app. That means it can only access GPU memory 32 bits at a time, and can only address a maximum of 4 gigabytes of GPU memory, but I don't imagine you're anywhere near that limit yet....

It turns out that GPU processors are so fast that the biggest bottleneck in GPU processing is memory access. Since the numerical values involved in Einstein processing are small enough to be represented in 32 bits, it's more efficient to stick to the smaller memory transfers. (64-bit GPU apps have been tried on other projects, and turned out slower than their 32-bit counterparts.) And just because the data arrives from memory into the GPU registers 32 bits at a time, it doesn't follow that all subsequent processing is restricted to 32 bits: I think we'll find (though it will take input from someone familiar with the Einstein code to confirm this) that your full 128-bit processing power is used when needed.

Steveplanetary
Joined: 23 Jul 11
Posts: 41
Credit: 32,319,229
RAC: 0

Thanks for the in depth

Thanks for the in-depth explanation, Richard. It is appreciated.

"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 692,158,372
RAC: 2,427

RE: The app is, as it

Quote:

The app is, as it happens, also a 32-bit app. That means it can only access GPU memory 32 bits at a time, and can only address a maximum of 4 gigabytes of GPU memory, but I don't imagine you're anywhere near that limit yet....


Exactly

Quote:

It turns out that GPU processors are so fast that the biggest bottleneck in GPU processing is memory access. Since the numerical values involved in Einstein processing are small enough to be represented in 32 bits, it's more efficient to stick to the smaller memory transfers. (64-bit GPU apps have been tried on other projects, and turned out slower than their 32-bit counterparts.) And just because the data arrives from memory into the GPU registers 32 bits at a time, it doesn't follow that all subsequent processing is restricted to 32 bits: I think we'll find (though it will take input from someone familiar with the Einstein code to confirm this) that your full 128-bit processing power is used when needed.

32-bit apps just limit the address space, as mentioned above. The width of the data bus connecting the CPU or the GPU to RAM or video RAM respectively is fixed in hardware. All modern GPUs (and CPUs as well) have data buses considerably wider than 32 bits, and will thus load more than one 32-bit word at a time in each transaction - even in apps compiled in 32-bit mode. Otherwise there would be no way to get the kind of performance we see with hundreds of cores executing in parallel.

This is sometimes confused with "single precision" (32-bit floating point) versus double-precision floating-point arithmetic, which is a completely separate issue: code compiled in 32-bit mode can just as well include double-precision arithmetic.

HB

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9,562,235
RAC: 0

Although just having really

Although just having really fast RAM on your GPU can compensate for a narrower bus - which is why you don't see cards with 512-pin buses anymore (that many pins are hard, and thus expensive, to manufacture).

Robert Pick
Joined: 24 Nov 05
Posts: 9
Credit: 8,229,446
RAC: 0

I've noticed lately that your

I've noticed lately that your server status board has dropped the BRP3cuda32 WUs. About a week ago I saw that there were 19 BRP3s left to run. I set my system to allow new WUs to download whatever I could get. Well, I got all 19 of them. Now BRP3s aren't even listed. But today I asked for whatever your system would give me and it downloaded 22 BRP3cuda tasks! Are these still being generated, or am I picking up reruns???

Amauri
Joined: 12 Jul 11
Posts: 7
Credit: 38,582,845
RAC: 16,326

RE: I've noticed lately

Quote:
I've noticed lately that your server status board has dropped the BRP3cuda32 WUs. About a week ago I saw that there were 19 BRP3s left to run. I set my system to allow new WUs to download whatever I could get. Well, I got all 19 of them. Now BRP3s aren't even listed. But today I asked for whatever your system would give me and it downloaded 22 BRP3cuda tasks! Are these still being generated, or am I picking up reruns???

No, BRP3cuda32 is the app that runs BRP4 (and BRP3) tasks...
