Indeed, I disabled all app versions; there is no use for them right now. The CUDA App needs fixing beyond simple configuration, and we got enough results from the OpenCL App versions. I'm currently focusing on validation. It looks like the OpenCL results from NVidia don't agree very well with those from AMD; this will also need a deeper look.
However, there's something more urgent that I have to work on first, with a deadline next Wednesday.
Thank you for your contributions, testing, reports and patience!
BM
Thanks for the update, Bernd. Looking forward to the next round of testing :)
_________________________________________________________________________
Bernd Machenschalk wrote: The CUDA App needs fixing beyond simple configuration
If you have time, reach out to Petri33. He's well versed in CUDA applications (he hand-wrote the SETI special CUDA app and successfully built a BRP4G CUDA app) and might be able to help get the CUDA app at least working. He doesn't really write Windows apps, but he knows a lot of the best practices for CUDA optimization and compiling that your team could translate to Windows. At the very least you could get a nice Linux CUDA app going, if you are interested in that.
But is there a reason you are going the CUDA route for Windows/Nvidia but OpenCL for Linux? Or are you just trying different configurations to test the waters before deciding which to stick with?
CUDA has a speed advantage on Nvidia devices, with the caveat that new devices need a new application build (unless the application is built to include PTX versions of its kernels).
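To make the PTX caveat concrete, here is a minimal Python sketch of the compatibility rule as I understand it from NVIDIA's documentation: native machine code (SASS) only runs on devices of the same major compute capability with an equal or higher minor version, while embedded PTX can be JIT-compiled by the driver for any device of equal or higher compute capability. The function name `can_run` and the compute capabilities used below are illustrative assumptions, not the actual build targets of any Einstein@Home app, and the sketch ignores driver and toolkit version limits.

```python
def can_run(device_cc, sass_targets, ptx_targets):
    """Say how a CUDA binary could run on a device: "native", "jit" or None.

    device_cc and every target are (major, minor) compute capability tuples.
    Hypothetical helper for illustration only; it is not part of any CUDA API.
    """
    major, minor = device_cc

    # SASS is binary compatible only within one major version,
    # and only towards equal or higher minor versions.
    for t_major, t_minor in sass_targets:
        if t_major == major and t_minor <= minor:
            return "native"

    # PTX is forward compatible: the driver can JIT-compile it for any
    # device whose compute capability is equal or higher.
    for target in ptx_targets:
        if target <= device_cc:
            return "jit"

    return None


# Assumed example: an app built only for compute capability 3.5.
print(can_run((8, 6), [(3, 5)], []))        # no PTX embedded
print(can_run((8, 6), [(3, 5)], [(3, 5)]))  # PTX embedded
```

With nvcc, PTX is embedded by adding a `-gencode arch=compute_XX,code=compute_XX` entry next to the usual `code=sm_XX` ones; whether that is practical for this app's build is for the developers to judge.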
OpenCL will always incur some overhead on Nvidia in the translation from OpenCL to CUDA, but it has the advantage of being more portable across devices, so fewer application updates would be necessary. That overhead can at least be reduced with optimization, however.
_________________________________________________________________________
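Ian&Steve C. wrote: If you have time, reach out to Petri33. he's well versed in CUDA applications (he hand wrote the SETI special CUDA app, and successfully built a BRP4G CUDA app) and might be able to help getting the CUDA app at least working.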
If anyone knows how to contact him, he might be interested in the SiDock work also. They could use his skills.
https://www.sidock.si/sidock/forum_thread.php?id=207&postid=1679#1679
Jim1348 wrote: If anyone knows how to contact him, he might be interested in the SiDock work also. They could use his skills.
I believe he is or was a teammate of Keith Myers, but I don't know how much influence Keith has on what Petri wants to work on. I do agree that EVERY project could use his skills, at least to look into making things more efficient, but I also understand that this could involve A LOT of work on top of whatever else he does.
The last I heard from Petri was back in April, when he finished up the FGRPB1G "special sauce" "AIO" app that he released to the Einstein general public.
He said he was going to start tackling the BRP source code. I know that he has been noodling with a custom BRP4G GPU application. You can see the results from his computers. Typical 200X speed improvement over the stock apps.
The validator for the BRP4G apps is very picky. It only likes results paired with like platforms, apps and OSes. I believe Bernd has mentioned that this is an issue they still haven't resolved well. And the current issues with the beta BRP7 apps point to a long development time, I think.
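As a rough illustration of why results from different platforms can disagree, here is a small Python sketch. It is not the project's validator; the function name `results_agree` and the tolerance value are assumptions made up for this example. It shows that floating-point addition depends on evaluation order (different GPUs, drivers and compilers may order operations differently) and how a tolerance-based comparison can accept such small differences where an exact comparison would not.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """Compare two result lists element by element within a relative tolerance.

    Hypothetical helper: the real validator's criteria are not described
    in this thread.
    """
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


# The same three numbers summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)                    # exact comparison
print(results_agree([left_to_right], [right_to_left]))   # tolerant comparison
```

A looser tolerance lets more platform pairs validate against each other, but it also makes it easier for a genuinely wrong result to slip through, which is presumably part of why tuning a validator takes time.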
But I don't know whether he has started on anything related to BRP7 yet. I would expect that, unless the BRP7 tasks/apps are completely different from the BRP4G tasks/apps, the code he is currently developing would port over fairly fast.
So far he has only worked on one project/app at a time. I doubt he would split resources/time between two or more projects simultaneously. He has tended to take a project app as far as it can go before moving on to the next challenge. I believe BRP4G/BRP7 is his current focus.
The Einstein applications page:
https://einsteinathome.org/apps.php
currently shows under BRP7 a Linux BRP7-cuda55 application marked as having been created a bit over two hours ago.
For days now the BRP7 section has been empty each time I looked, so this may indicate Bernd has found some time to get back to this particular matter.
i got a few dozen of them. all failed with the same errors from the windows systems. first RSA checksum errors, then a
like before.
cuda55 is a bust i think. try a recent cuda version, or stick to the openCL version.
_________________________________________________________________________
attempts with a cuda55-appropriate GTX 550Ti failed in a similar fashion.
_________________________________________________________________________
The curious thing is that the same tasks run successfully with the very same (Linux) app even on our modern GPU machines - in standalone mode, i.e. without the BOINC client.
The CUDA code of the BRP7 app hasn't changed a bit since BRP4(G) times, where it also worked perfectly. The build process is also unchanged (well, libz and binutils might have been updated, but that shouldn't be relevant here). All that changed is the CPU code (it generates a template bank in memory rather than reading it from a file).
We did re-compile the (Linux) app with CUDA 10 and CUDA 11 (the versions that we use on Atlas) and didn't see a speedup of more than 10%. That isn't worth losing compatibility for. And the Windows cross-build of the CUDA app depends on a couple of patches that I'm not at all sure would work with more recent versions of the headers, nvcc and cudart.
Of course, the BOINC version is a newer one; the one we used for the BRP4 App doesn't even build nowadays. That's more likely to be the reason.
BM