1) It is my understanding (correct me if I am wrong) that nearly all binary pulsar searches are being performed on new Arecibo data using BRP4, which is a significant improvement over BRP3. Then why (this is a question of a rather new E@H participant) does BOINC Manager indicate the application is BRP3cuda32 (einsteinbinary_BRP4_1.00_windows_intelx86_BRP3cuda32 according to Task Manager)?
2) It is obvious that the 32 in BRP3cuda32 refers to 32 bits, but what I don't know is if only 32 bits at a time are actually being processed. My computer only gets binary pulsar search WUs for my video card, which has a bandwidth of 128 bits. While processing 32 bits at a time is far better than that which could be done by CPU, my questions are these: Is only 1/4 of my video card being used, and if so could that be increased to 1/1?
Thank you for your forbearance and TIA for taking the time to educate me.
Steve
"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison
Copyright © 2024 Einstein@Home. All rights reserved.
BRP4, BRP3, BRP3cuda32
What BOINC is actually showing you with "BRP3cuda32" is something called a 'plan_class' - a set of specifications for how BOINC is going to manage the tasks (what files are needed, what resources are going to be used, etc.). The BOINC developers introduced plan classes as hard-coded sections of the applications running on the server, rather than as configuration data: that makes them harder, and riskier, to modify. Bernd has explained on these boards that it was quicker, easier and safer to recycle the existing BRP3cuda32 and hook it up to the new app, rather than modify and re-compile source code. The confusion between BRP3 and BRP4 is an unfortunate side-effect - I'm not surprised you were caught out.
Actually, no. The 32 in 'BRP3cuda32' refers to CUDA version 3.2, and hence determines which version of the NVIDIA runtime and Fourier Transform libraries to supply.
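To make the naming concrete, here is a minimal sketch of how such a name could be read. The helper `parse_plan_class` is hypothetical and written only for illustration; it is not how BOINC itself handles plan classes, and it assumes the two trailing digits are the CUDA major and minor version as described above.

```python
import re


def parse_plan_class(name):
    """Split a name like 'BRP3cuda32' into (prefix, api, version).

    Hypothetical helper: assumes the two trailing digits are the
    CUDA major and minor version numbers.
    """
    match = re.fullmatch(r"(.*?)(cuda)(\d)(\d)", name)
    if match is None:
        raise ValueError(f"not a CUDA plan class: {name!r}")
    prefix, api, major, minor = match.groups()
    return prefix, api, f"{major}.{minor}"


print(parse_plan_class("BRP3cuda32"))
```

Read this way, the "32" is a toolkit version (3.2), not a word size.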
The app is, as it happens, also a 32-bit app. That means it can only access GPU memory 32 bits at a time, and can only support a maximum of 4 gigabytes of GPU memory, but I don't imagine you're anywhere near that limit yet....
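The 4 gigabyte figure follows directly from the pointer size. A quick check of the arithmetic, assuming byte-addressable memory and a flat 32-bit address space:

```python
POINTER_BITS = 32  # pointer width of a 32-bit app

# Every distinct pointer value names one byte, so the number of
# addressable bytes is 2 to the power of the pointer width.
addressable_bytes = 2 ** POINTER_BITS
addressable_gib = addressable_bytes / 2 ** 30

print(addressable_bytes, "bytes =", addressable_gib, "GiB")
```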
It turns out that GPU processors are so fast that the biggest bottleneck in GPU processing is the memory access. Since the numerical values involved in Einstein processing are small enough to be represented in 32 bits, it's actually more efficient to stick to the smaller size of memory transfer. (64-bit GPU apps have been tried on other projects, and are actually slower than their 32-bit counterparts). And just because the data arrives from memory into the GPU registers 32 bits at a time, it doesn't follow that all subsequent processing is restricted to 32-bit: I think we'll find (though it will take input from someone familiar with the Einstein code to confirm this) that your full 128-bit processing power is used when needed.
Thanks for the in-depth explanation, Richard. It is appreciated.
"Remember, nothing that's good works by itself, just to please you. You have to make the damn thing work." Thomas A. Edison
RE: The app is, as it
Exactly
32-bit apps just limit the address space, as mentioned above. The width of the data bus connecting the CPU or the GPU to RAM or video RAM respectively is fixed in hardware. All modern GPUs (and CPUs as well) have data buses considerably wider than 32 bits and will thus load more than one 32-bit word at a time in each transaction, even in apps compiled in 32-bit mode. Otherwise there would be no way to get the kind of performance we see with hundreds of cores executing in parallel.
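As a rough illustration of that point, the sketch below counts how many 32-bit words fit in one bus-wide transfer. It is a deliberate simplification: the function name is made up for this post, and real memory controllers also use bursts and coalescing, which are ignored here.

```python
def words_per_transfer(bus_width_bits, word_bits=32):
    """Number of whole words that fit in one bus-wide transfer."""
    return bus_width_bits // word_bits


# The 128-bit card mentioned at the top of the thread:
print(words_per_transfer(128))
```

So a 128-bit bus can move four 32-bit values at once, whatever the pointer size of the app.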
This is sometimes confused with "single precision" (32-bit floating point data) versus double precision floating point arithmetic, but that is a completely separate issue: code compiled in 32-bit mode can just as well include double precision arithmetic.
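A small sketch of the difference, using Python's standard `struct` module to store the same value as a 32-bit and as a 64-bit float. The `round_trip` helper is just for this example. The last line prints the pointer width of whichever interpreter runs it, which has no bearing on the two float sizes.

```python
import struct


def round_trip(value, fmt):
    """Store value in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


single = round_trip(0.1, "<f")  # 32-bit single precision
double = round_trip(0.1, "<d")  # 64-bit double precision

print("single:", struct.calcsize("<f") * 8, "bits,", repr(single))
print("double:", struct.calcsize("<d") * 8, "bits,", repr(double))
print("pointer width of this build:", struct.calcsize("P") * 8, "bits")
```

The single precision copy loses some digits of 0.1, while the double precision copy comes back unchanged, on a 32-bit and a 64-bit build alike.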
HB
Although just having really fast RAM on your GPU compensates for having a narrower bus, which is why you don't see cards with 512-pin buses anymore (that many pins are hard, and thus expensive, to make).
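The trade-off can be put in numbers: peak bandwidth is roughly bus width times transfer rate. The figures below are made-up round numbers for illustration, not the specifications of any real card, and the function name is my own.

```python
def bandwidth_gb_per_s(bus_width_bits, transfers_per_second):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfers_per_second / 1e9


# Illustrative numbers only.
wide_slow = bandwidth_gb_per_s(512, 2.0e9)    # wide bus, slower memory
narrow_fast = bandwidth_gb_per_s(256, 4.0e9)  # half the width, twice the rate

print(wide_slow, narrow_fast)
```

Halving the bus width while doubling the memory transfer rate leaves the peak bandwidth the same.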
I've noticed lately that your server status board has dropped the BRP3cuda32 WUs. About a week ago I saw that there were 19 BRP3s to run. I set my system to allow new WUs and download whatever I could get. Well, I got all 19 of them. Now BRP3s aren't even listed. But today I asked for whatever your system would give me and it downloaded 22 BRP3cuda tasks! Are these still being generated or am I picking up reruns?
RE: I've noticed lately
No, BRP3cuda32 is the app that runs BRP4 (and BRP3) tasks...