Not until Bernd compiles and releases one. Or compile the source code and make one for yourself.
Keith Myers wrote: Not until Bernd compiles and releases one. Or compile the source code and make one for yourself.
Oh well, I thought I'd ask, since there's an unofficial Nvidia app. Maybe I'd get lucky.
Unfortunately, compiling/programming/etc. is really not my thing.
It's a custom one-off CUDA app, not even OpenCL. It was really just made for learning purposes and to see what a high-end GPU could do.
The BRP source code is public; you could create an ATI/OpenCL app if you were so inclined.
I don't really have plans to release this one either. Bernd has implied that the BRP4 server/database is already being hammered by the many devices running it; I'm sure they don't need thousands of even faster devices hammering the BRP4G system. That would be many times worse than the BRP4 situation.
But I hope this gives the team some perspective on how they could size the upcoming BRP7 tasks so that these fast GPUs don't hammer that system too.
Bernd, will the BRP7 application be different from the BRP4/G applications? Or will the existing BRP4 apps process BRP7 as well, only differing in the dataset?
_________________________________________________________________________
Ian&Steve C. wrote: But I hope this gives the team some perspective on how they could size the upcoming BRP7 tasks so that these fast GPUs don't hammer that system too.
Ian, I wouldn't be surprised at all if Bernd & company need to update their server again, or at least expand it to accommodate those of us with the newer CPU/GPU systems.
I'm just sayin'... It's only my opinion.
Proud member of the Old Farts Association
Ian&Steve C. wrote: Bernd, will the BRP7 application be different from the BRP4/G applications? Or will the existing BRP4 apps process BRP7 as well, only differing in the dataset?
In terms of calculation the app will be the same, but we'll use a different format for one of the input files (the template bank), so we'll need to make some changes at least to the I/O code.
BM
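To make that concrete, here is a minimal sketch of a reader that branches on the template-bank format, in Python rather than the project's C/C++. The magic bytes, record layout, and field names are invented for illustration; the real formats are defined in the published BRP source.

import struct

def read_template_bank(path):
    """Hypothetical reader: binary "TB02" files are treated as the new
    format, anything else as the old whitespace-separated text format."""
    with open(path, "rb") as f:
        if f.read(4) == b"TB02":                       # invented magic for the new format
            (count,) = struct.unpack("<I", f.read(4))  # number of templates
            return [struct.unpack("<2f", f.read(8))    # one hypothetical parameter pair per template
                    for _ in range(count)]
        f.seek(0)                                      # old format: plain text lines
        return [tuple(map(float, line.split()))
                for line in f.read().decode().splitlines() if line.strip()]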
tullio wrote: On a Linux Virtual Machine running SuSE Tumbleweed with a 5.17.4 kernel, the first Arecibo fast task failed with EXIT_TIME_LIMIT_EXCEEDED. Its run time was 65k s and its CPU time 59k s. The CPU is an AMD Ryzen 5 1400F. All preceding Arecibo tasks and GW tasks on the CPU were successful.
Thanks for reporting!
Indeed, the first workunits were generated with a "flops estimation" value that was too low. This has been fixed, but only for workunits generated from now on.
BM
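As a rough illustration of the mechanism (this is not BOINC client code): the client aborts a task with EXIT_TIME_LIMIT_EXCEEDED once its elapsed time exceeds roughly rsc_fpops_bound divided by the host's measured speed, and rsc_fpops_bound is derived from the flops estimate. rsc_fpops_est and rsc_fpops_bound are real BOINC workunit fields, but every number below is hypothetical.

def runtime_limit_s(rsc_fpops_bound, host_flops):
    # Simplified form of the client's sanity check: tasks running longer
    # than this are aborted with EXIT_TIME_LIMIT_EXCEEDED.
    return rsc_fpops_bound / host_flops

host_flops = 4e9      # hypothetical ~4 GFLOPS for one CPU core
brp4_est = 2.5e13     # hypothetical per-task FLOP estimate for BRP4
bound_factor = 10     # bound as a fixed multiple of the estimate

# A BRP4G task is ~8x a BRP4 task; if a BRP4-sized estimate is left in
# place, the limit is 8x too small and long-running tasks get killed:
print(runtime_limit_s(bound_factor * 8 * brp4_est, host_flops))  # 500000 s: plenty
print(runtime_limit_s(bound_factor * brp4_est, host_flops))      # 62500 s: aborts a ~65 ks task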
It looks like “Binary Radio Pulsar Search (Arecibo, fast) v1.33 () windows_x86_64” tasks generate ~30% less credit per unit time than “Gamma-ray pulsar search #5 v1.08 () windows_intelx86” tasks. I'm afraid some people may deselect the Arecibo project because of this.
Thanks, Bernd. The next task on the Linux Virtual Machine was completed and validated.
EDIT: Well, not yet validated, but the times seem good.
Tullio
Bernd Machenschalk wrote: Well, a few are (Intel GPUs). That naming struck me, too, but I couldn't think of a much better name, so I largely left it; I just modified it to "Arecibo, fast", which is not much more descriptive but should be less confusing.
How about "large" and "small", or, more completely, "(Arecibo, large WUs)" and "(Arecibo, small WUs)"? That's how Moo! Wrapper names their WUs depending on how many blocks are bundled into one WU, so it's pretty much the same thing you do. They have tiny, small, normal and huge WUs; I think that's a much better description than "fast" (fast computation with lower precision, or for fast devices, or...). Considering there are GPUs in the normal BRP4 and CPUs in BRP4G, this "fast" might be very confusing for the average user.
Bernd Machenschalk wrote: Indeed, the first workunits were generated with a "flops estimation" value that was too low. This has been fixed, but only for workunits generated from now on.
It looks like the initial "low flops estimates" tasks are still being sent out as resends when/if they fail.
If they continue being resent, I imagine they might keep failing until the hard error limit is reached.
I just fired up a spare i5-6500 machine with a very small work cache, limited to a single core, for testing purposes. It got one of the tasks (a _3 resend) estimated to take 40 mins. It's now at 30% done after 2 hrs, so this must be one of those bad tasks and might end up as "TIME LIMIT EXCEEDED" as well.
I can't check the status of the _0, _1 and _2 copies of this task, since clicking on the workunit on the website just (very unhelpfully) says "Tasks are pending for this workunit." without showing whether any have succeeded or what the errors might be.
EDIT: Since these tasks are 8x the size of BRP4, why don't you call them something like "BRP4_x8"? In any case, I agree that "large" would be much better than "fast".
Cheers,
Gary.
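On the resend point above: a workunit stops being resent once it accumulates too many failed results. max_error_results and min_quorum are real fields in the BOINC server's workunit table, but the simulation below is only a sketch of that lifecycle, with an invented failure probability.

import random

def simulate_workunit(p_fail, min_quorum=2, max_error_results=20):
    # Resend tasks (_0, _1, _2, ...) until a quorum of successes
    # validates the workunit or the hard error limit is reached.
    errors = successes = 0
    while successes < min_quorum and errors < max_error_results:
        if random.random() < p_fail:
            errors += 1    # e.g. EXIT_TIME_LIMIT_EXCEEDED on a bad-bound task
        else:
            successes += 1
    return "validated" if successes >= min_quorum else "error limit reached"

random.seed(1)
# A workunit generated with the bad flops bound fails on nearly every host:
print(simulate_workunit(p_fail=0.95))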