... cecht is commenting that it's odd that his 5600XT GPU is taking so long.
Linux is no longer a requirement for getting these:
I'm all Windows/AMD, but I do have "Beta settings | Run test applications?" set to Yes.
A bit under two hours ago, the project sent one of my machines 7 MeerKAT tasks:
https://einsteinathome.org/host/10659288
Six of them ran to completion, with elapsed times between 24 and 46 seconds and CPU times between 11 and 14 seconds. They were reported and currently show a status of "waiting for validation". Several have quorum partners that have also successfully returned their tasks, with a mix of Linux and Windows partners. The seventh is held up by a daily suspension window I have set in BOINC to limit room heating and power consumption.
This is a machine with an AMD 5700 GPU running under heavy throttling, so if these ran properly, they are very low compute-effort tasks.
On the tasks page, the application is reported as having been:
Binary Radio Pulsar Search (MeerKAT) v0.01 (BRP7-opencl-ati)
windows_x86_64
A representative WU name is:
M22_1_cfbf00002_segment_1_dms_200_2_4800
(edit to add:
After I posted that first bit, I noticed that a second one of my machines has received and run 12 MeerKAT tasks:
https://einsteinathome.org/host/12260865
This one has a more capable GPU, AMD RX 6800, which is running under a very heavy clock rate reduction:
Those tasks all ran to completion, reporting 30 to 32 elapsed seconds and 12 to 14 CPU seconds. All were reported successfully and are currently listed as "completed, waiting for validation".
I have been running GRP GPU tasks exclusively, with target task queues of nearly 3 days. Because of the short deadlines on the MeerKAT tasks, my GRP tasks are suspended as soon as the MeerKAT tasks arrive, and the MeerKAT tasks then run in High Priority mode.)
These are undoubtedly shorter-than-production test units. My 3080Ti is going through them in about 10-15 seconds. My guess is that they will ship them out in bundles, as they do with BRP4G.
_________________________________________________________________________
The next new CUDA attempt failed rather more seriously with the changed <open_name> in app_version:
<message>
An I/O operation initiated by the registry failed unrecoverably. The registry could not read in, or write out, or flush, one of the files that contain the system's image of the registry.
(0x3f8) - exit code 1016 (0x3f8)</message>
<stderr_txt>
Activated exception handling...
[13:51:53][4964][INFO ] Starting data processing...
[13:51:53][4964][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 345 MB (5802 MB free / 6147 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[13:51:53][4964][INFO ] Using CUDA device #0 "NVIDIA GeForce GTX 1660 SUPER" (0 CUDA cores / 0.00 GFLOPS)
[13:51:53][4964][INFO ] Version of installed CUDA driver: 11040
[13:51:53][4964][INFO ] Version of CUDA driver API used: 3020
[13:51:53][4964][ERROR] Couldn't load main CUDA device module (error: 301)!
[13:51:53][4964][ERROR] Demodulation failed (error: 1016)!
13:51:53 (4964): called boinc_finish(1016)
(task 1331624432)
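The two numbers in that log can be looked up directly. Below is a minimal Python sketch; the two code tables are small excerpts I'm assuming from the CUDA driver API's CUresult enum and from the Windows system error codes, and the helper names are hypothetical. If those tables are right, error 301 is CUDA_ERROR_FILE_NOT_FOUND (the device module file wasn't where the app looked for it), and the "registry" text is just Windows' stock message for error number 1016 (0x3F8), which BOINC appears to print for the app's own exit status. So the registry itself is probably not involved.

```python
# Hypothetical helpers to decode the numbers in the stderr above.
# ASSUMPTION: these tables are hand-copied excerpts of the CUDA driver
# API's CUresult codes and of Windows' error codes; nothing is read
# from an installed SDK.

CUDA_DRIVER_ERRORS = {
    0: "CUDA_SUCCESS",
    300: "CUDA_ERROR_INVALID_SOURCE",
    301: "CUDA_ERROR_FILE_NOT_FOUND",
    302: "CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND",
    303: "CUDA_ERROR_SHARED_OBJECT_INIT_FAILED",
}

# Windows error that happens to share the app's exit code 1016.
WINDOWS_ERRORS = {
    0x3F8: "ERROR_REGISTRY_IO_FAILED",
}


def decode_cuda_error(code: int) -> str:
    """Return the CUDA driver API enum name for a numeric error code."""
    return CUDA_DRIVER_ERRORS.get(code, f"unknown CUDA error {code}")


def decode_exit_code(code: int) -> str:
    """Return the Windows error name with the same number, if known."""
    return WINDOWS_ERRORS.get(code, f"no Windows name for {code}")


if __name__ == "__main__":
    print(decode_cuda_error(301))   # CUDA_ERROR_FILE_NOT_FOUND
    print(decode_exit_code(1016))   # ERROR_REGISTRY_IO_FAILED
    print(hex(1016))                # 0x3f8
```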
I think the downloaded DLLs also have to be marked 'executable' in the file_info (same as the main program files). That could be it.
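Richard's suggestion amounts to checking each DLL's <file_info> for BOINC's <executable/> flag. Here is a rough Python sketch. It assumes an app_info.xml-style layout where each <file_info> has a <name> and an optional <executable/> element. The file names and the helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical app_info.xml-style fragment; the file names are made up.
SAMPLE = """
<app_info>
  <file_info>
    <name>brp7_app.exe</name>
    <executable/>
  </file_info>
  <file_info>
    <name>cudart_example.dll</name>
  </file_info>
</app_info>
"""


def dlls_missing_executable(xml_text: str) -> List[str]:
    """List <file_info> names ending in .dll that lack an <executable/> flag."""
    root = ET.fromstring(xml_text.strip())
    missing = []
    for info in root.iter("file_info"):
        name = info.findtext("name", default="")
        # find() returns None when the flag element is absent entirely.
        if name.lower().endswith(".dll") and info.find("executable") is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(dlls_missing_executable(SAMPLE))  # ['cudart_example.dll']
```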
@ Ian&Steve - your cuda ran correctly? That can happen if you have another copy of the DLLs somewhere in the PATH - e.g. if you have the CUDA toolkit installed. Could you check with Process Explorer (you'll have to be quick!) where your machine is loading the DLLs from?
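Short of catching it in Process Explorer, a rough first check for the PATH theory is to list every PATH directory that holds a copy of a given DLL. Below is a Python sketch. The DLL name is a placeholder, since the thread doesn't name the files. PATH is only one of the places Windows searches, because the application directory and the system directories come first. So a hit here is a candidate, not proof of which copy was actually loaded.

```python
import os
from typing import List, Optional


def find_on_path(filename: str, path: Optional[str] = None) -> List[str]:
    """Return every directory on PATH that contains `filename`.

    Only PATH is checked; Windows also searches the application's own
    directory and the system directories before PATH.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # "cudart64_example.dll" is a placeholder; substitute the real DLL name.
    for hit in find_on_path("cudart64_example.dll"):
        print(hit)
```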
Ian&Steve C. wrote: ...
Right. That M22 task was still "running" this morning, although it was 100% completed, so I aborted it. Got another one waiting to run on that same machine. *fingers crossed*
Ideas are not fixed, nor should they be; we live in model-dependent reality.
Richard Haselgrove wrote: ...
Thanks, Richard, for your detective work. The problem is something different, though. In the <file_ref>s (in the DB, app_info.xml, client_state.xml etc.), the name the file should have in the slot directory is tagged as <open_name>, whereas in the version.xml for "AppVersionNew" specifications it has to be <logical_name>. My bad, I'll fix this.
BM
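Bernd's fix comes down to renaming one tag. Here is a Python sketch that assumes a version.xml laid out as <version>, then <file>, then <physical_name> plus an optional <logical_name>, as I understand BOINC's AppVersionNew layout. The file names and the helper are hypothetical, and in practice you would just edit the file by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical version.xml fragment written with the file_ref-style tag.
# ASSUMPTION: layout follows BOINC's AppVersionNew version.xml format.
VERSION_XML = """
<version>
  <file>
    <physical_name>example_cuda_module.dll</physical_name>
    <open_name>module.dll</open_name>
  </file>
</version>
"""


def fix_open_names(xml_text: str) -> str:
    """Rename <open_name> to <logical_name> inside each <file> element."""
    root = ET.fromstring(xml_text.strip())
    for file_el in root.iter("file"):
        for child in file_el:
            if child.tag == "open_name":
                child.tag = "logical_name"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fix_open_names(VERSION_XML))
```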
Richard Haselgrove wrote: ...
The Linux BRP7 app is still OpenCL, so it doesn’t have this CUDA issue; only the Windows app is CUDA. I did/do have a custom CUDA app for BRP4G that petri built (I think it’s CUDA 11.6?), and it processes fine-ish. Petri built it from the stock BRP4 source code with a modern CUDA and made no other changes. It doesn’t validate well, but it at least computes.
I still think the admins/devs should recompile this app with a much more modern version of CUDA, to support newer GPUs and modern CUDA features. CUDA 5.5 supports Tesla, Fermi, and Kepler cards, which are borderline ancient in today’s world. I’d argue that more people have GPUs newer than these than not.
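To make Ian's point concrete, you can compare each card's compute capability with the newest architecture CUDA 5.5 could target. Below is a Python sketch. The architecture table is a simplified summary. The sm_35 cutoff reflects my understanding of CUDA 5.5's limits and is not stated anywhere in this thread. Whether an old build still runs on a newer card also depends on whether PTX was embedded for the driver to JIT-compile, which the thread doesn't say.

```python
from typing import Tuple

# Simplified map from compute capability major version to architecture.
ARCH_BY_MAJOR = {
    1: "Tesla",
    2: "Fermi",
    3: "Kepler",
    5: "Maxwell",
    6: "Pascal",
    7: "Volta/Turing",
    8: "Ampere/Ada",
}

# ASSUMPTION: CUDA 5.5 could emit native code up to sm_35 (Kepler).
CUDA55_MAX_NATIVE = (3, 5)


def describe(cc: Tuple[int, int]) -> str:
    """Summarise whether CUDA 5.5 could build native code for this card."""
    arch = ARCH_BY_MAJOR.get(cc[0], "unknown")
    # Tuples compare element-wise, so (2, 1) <= (3, 5) but (7, 5) is not.
    native = "yes" if cc <= CUDA55_MAX_NATIVE else "no (needs PTX JIT)"
    return f"sm_{cc[0]}{cc[1]} {arch}: CUDA 5.5 native code {native}"


if __name__ == "__main__":
    # Compute capabilities of cards mentioned in this thread.
    for cc in [(2, 1), (6, 1), (7, 5), (8, 6)]:
        print(describe(cc))
```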
_________________________________________________________________________
Hmm. 'Executable' wasn't enough - I'm still getting the registry operation error. Maybe Ian's right - CUDA 5.5 is too early for this machine? Startup specs for the card I'm using are:
03/08/2022 16:06:57 | | CUDA: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 SUPER (driver version 472.12, CUDA version 11.4, compute capability 7.5, 6144MB, 6144MB available, 5153 GFLOPS peak)
Apart from that, I'm running the BOINC v7.20.2 release on Windows 7. Anything else I can add?
I have tasks waiting on four other machines, but they're all of similar specification to this one - I don't think they'll add much.
Nope, didn't work on a machine with this card, either.
03/08/2022 16:39:57 | | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 442.74, CUDA version 10.2, compute capability 6.1, 4096MB, 4096MB available, 2138 GFLOPS peak)
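When comparing machines, the relevant numbers can be pulled out of BOINC startup lines like the two above. Here is a Python sketch. The regular expression is fitted only to the two lines quoted in this thread, so treat it as an assumption about the message format. The helper name is hypothetical.

```python
import re

# ASSUMPTION: pattern fitted only to the two BOINC startup lines quoted
# in this thread; other client versions may word the line differently.
CUDA_LINE = re.compile(
    r"CUDA: NVIDIA GPU (?P<index>\d+): (?P<name>.+?) \("
    r"driver version (?P<driver>[\d.]+), "
    r"CUDA version (?P<cuda>[\d.]+), "
    r"compute capability (?P<cc>\d+\.\d+)"
)


def parse_cuda_line(line: str) -> dict:
    """Extract GPU name, driver, CUDA version and compute capability."""
    m = CUDA_LINE.search(line)
    if m is None:
        raise ValueError("not a BOINC CUDA startup line")
    major, minor = (int(x) for x in m["cc"].split("."))
    return {
        "name": m["name"],
        "driver": m["driver"],
        "cuda": m["cuda"],
        "compute_capability": (major, minor),
    }


if __name__ == "__main__":
    lines = [
        "03/08/2022 16:06:57 | | CUDA: NVIDIA GPU 0: NVIDIA GeForce GTX 1660 SUPER "
        "(driver version 472.12, CUDA version 11.4, compute capability 7.5, "
        "6144MB, 6144MB available, 5153 GFLOPS peak)",
        "03/08/2022 16:39:57 | | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti "
        "(driver version 442.74, CUDA version 10.2, compute capability 6.1, "
        "4096MB, 4096MB available, 2138 GFLOPS peak)",
    ]
    for line in lines:
        print(parse_cuda_line(line))
```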
I have an old GTX 550Ti I can put on my test bench with Windows 10 to try out. It's at least the right generation for CUDA 5.5.
_________________________________________________________________________