BRP4, BRP3, BRP3cuda32

Steveplanetary
Joined: 23 Jul 11
Posts: 41
Credit: 32,319,229
RAC: 0
Topic 195961

1) It is my understanding (correct me if I am wrong) that nearly all binary pulsar searches are being performed on new Arecibo data using BRP4, which is a significant improvement over BRP3. Then why (this is a question of a rather new E@H participant) does BOINC Manager indicate the application is BRP3cuda32 (einsteinbinary_BRP4_1.00_windows_intelx86_BRP3cuda32 according to Task Manager)?

2) It is obvious that the 32 in BRP3cuda32 refers to 32 bits, but what I don't know is if only 32 bits at a time are actually being processed. My computer only gets binary pulsar search WUs for my video card, which has a bandwidth of 128 bits. While processing 32 bits at a time is far better than that which could be done by CPU, my questions are these: Is only 1/4 of my video card being used, and if so could that be increased to 1/1?

Thank you for your forbearance and TIA for taking the time to educate me.

Steve

"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,142
Credit: 2,818,703,029
RAC: 884,152

BRP4, BRP3, BRP3cuda32

Quote:
1) It is my understanding (correct me if I am wrong) that nearly all binary pulsar searches are being performed on new Arecibo data using BRP4, which is a significant improvement over BRP3. Then why (this is a question of a rather new E@H participant) does BOINC Manager indicate the application is BRP3cuda32 (einsteinbinary_BRP4_1.00_windows_intelx86_BRP3cuda32 according to Task Manager)?


What BOINC is actually showing you with "BRP3cuda32" is something called a 'plan_class' - a set of specifications for how BOINC is going to manage the tasks (what files are needed, what resources will be used, and so on). The BOINC developers implemented plan classes as hard-coded sections of the applications running on the server, rather than as configuration data: that makes them harder, and riskier, to modify. Bernd has explained on these boards that it was quicker, easier and safer to recycle the existing BRP3cuda32 plan class and hook it up to the new app than to modify and re-compile the server source code. The confusion between BRP3 and BRP4 is an unfortunate side-effect - I'm not surprised you were caught out.

Quote:

2) It is obvious that the 32 in BRP3cuda32 refers to 32 bits, but what I don't know is if only 32 bits at a time are actually being processed. My computer only gets binary pulsar search WUs for my video card, which has a bandwidth of 128 bits. While processing 32 bits at a time is far better than that which could be done by CPU, my questions are these: Is only 1/4 of my video card being used, and if so could that be increased to 1/1?

Thank you for your forbearance and TIA for taking the time to educate me.

Steve


Actually, no. The 32 in 'BRP3cuda32' refers to CUDA version 3.2, and hence determines which version of the NVidia runtime and Fourier Transform libraries to supply.

The app is, as it happens, also a 32-bit app. That means it can only access GPU memory 32 bits at a time, and can only address a maximum of 4 gigabytes of GPU memory, but I don't imagine you're anywhere near that limit yet....

It turns out that GPU processors are so fast that the biggest bottleneck in GPU processing is memory access. Since the numerical values involved in Einstein processing are small enough to be represented in 32 bits, it's more efficient to stick to the smaller memory transfers. (64-bit GPU apps have been tried on other projects, and turned out slower than their 32-bit counterparts.) And just because the data arrives from memory into the GPU registers 32 bits at a time, it doesn't follow that all subsequent processing is restricted to 32 bits: I think we'll find (though it will take input from someone familiar with the Einstein code to confirm this) that your full 128-bit processing power is used when needed.

Steveplanetary
Joined: 23 Jul 11
Posts: 41
Credit: 32,319,229
RAC: 0

Thanks for the in depth

Thanks for the in-depth explanation, Richard. It is appreciated.

"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 692,158,372
RAC: 2,427

RE: The app is, as it

Quote:

The app is, as it happens, also a 32-bit app. That means it can only access GPU memory 32 bits at a time, and can only address a maximum of 4 gigabytes of GPU memory, but I don't imagine you're anywhere near that limit yet....


Exactly

Quote:

It turns out that GPU processors are so fast that the biggest bottleneck in GPU processing is memory access. Since the numerical values involved in Einstein processing are small enough to be represented in 32 bits, it's more efficient to stick to the smaller memory transfers. (64-bit GPU apps have been tried on other projects, and turned out slower than their 32-bit counterparts.) And just because the data arrives from memory into the GPU registers 32 bits at a time, it doesn't follow that all subsequent processing is restricted to 32 bits: I think we'll find (though it will take input from someone familiar with the Einstein code to confirm this) that your full 128-bit processing power is used when needed.

32-bit apps just limit the address space, as mentioned above. The width of the data bus connecting the CPU or the GPU to RAM or video RAM respectively is fixed in hardware. All modern GPUs (and CPUs as well) have data buses considerably wider than 32 bits, and will thus load more than one 32-bit word at a time in each transaction - even in apps compiled in 32-bit mode. Otherwise there would be no way to get the kind of performance we see with hundreds of cores executing in parallel.

This is sometimes confused with "single precision" (32-bit floating point) versus double-precision floating-point arithmetic, which is a completely separate issue: code compiled in 32-bit mode can just as well include double-precision arithmetic.

HB

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9,562,235
RAC: 0

Although just having really

Although just having really fast RAM on your GPU can compensate for a narrower bus - which is why you don't see cards with 512-pin buses anymore (that many pins are hard, and thus expensive, to manufacture).

Robert Pick
Joined: 24 Nov 05
Posts: 9
Credit: 8,229,446
RAC: 0

I've noticed lately that your

I've noticed lately that your server status board has dropped the BRP3cuda32 WUs. About a week ago I saw that there were 19 BRP3s left to run. I set my system to allow new WUs to download whatever I could get. Well, I got all 19 of them. Now BRP3s aren't even listed. But today I asked for whatever your system would give me and it downloaded 22 BRP3cuda tasks! Are these still being generated, or am I picking up reruns???

Amauri
Joined: 12 Jul 11
Posts: 7
Credit: 38,582,845
RAC: 16,326

RE: I've noticed lately

Quote:
I've noticed lately that your server status board has dropped the BRP3cuda32 WUs. About a week ago I saw that there were 19 BRP3s left to run. I set my system to allow new WUs to download whatever I could get. Well, I got all 19 of them. Now BRP3s aren't even listed. But today I asked for whatever your system would give me and it downloaded 22 BRP3cuda tasks! Are these still being generated, or am I picking up reruns???

No, BRP3cuda32 is the app that runs BRP4 (and BRP3) tasks...
