Thank you for the explanation, Bernd. As my Mac seems happy enough crunching (or perhaps only nibbling at) FGRPB1G, I didn't realise the situation was different for more recent apps.
Apple deprecated OpenCL and shipped its last machine with an NVIDIA GPU what, ten years ago? Unless some enthusiastic volunteer shows up who wants to port the app to Metal and the Apple Silicon GPU, I don't see that happening at all. As far as E@H is concerned, GPU computing on Apple is dead. We simply don't have the manpower to dive into Apple's own universe.
That sounds like a call to Richard Haselgrove and his contacts might be in order, first to see if it's feasible and second to see if the time and energy are worth it for the number of users who would potentially use it. I hate to see users excluded, but time marches on and technology changes, but...
I think Bernd was referencing the newer Macs with their own silicon (M1, M2, etc.), not the older x86-based Macs. Really old x86 Macs and macOS versions could support NVIDIA GPUs, and the more recent ones only supported AMD.
The x86 AMD OpenCL app could likely be compiled for macOS with little porting necessary, but I wouldn't bother with an NVIDIA app.
And as Bernd says, there's no chance of any OpenCL Apple Silicon app for the newer Macs.
FAST data was and still is in planning and discussion; however, I haven't seen a bit of real data yet, not even simulated. I don't know whether or when we'll get it at all, nor what it will look like. We needed to develop and change our pre-processing and application code quite a bit for the MeerKAT data, so I really don't know whether we could process FAST data with the same pipeline.
Got it, thanks!
Ian&Steve C. wrote:
The tensor cores are not the same as FP cores. Tensor cores are specialized matrix-math hardware for ML and AI workloads. No BOINC project (yet) uses this hardware.
The GA102 die in your A6000 or the higher-end GeForce 30-series cards (or any GA10x, really) doesn't really have dedicated FP64 hardware. Pretty sure they just double up FP32 cores for that. But the higher-end NVIDIA cards based on the GA100 core, like the A100, do have dedicated FP64 cores.
edit, correction:
The GA10x cards (GeForce 30x0, Ax000 "Quadro", etc.) have only 2 FP64 cores per SM. This is not depicted in most architecture diagrams, so I missed it and had to dig into the white paper to find it. With 128 FP32 cores per SM, that explains the 1:64 FP64:FP32 performance ratio.
The GA100 (A100) cards, meanwhile, have 32 FP64 cores and 64 FP32 cores per SM, for that nice 1:2 ratio in performance.
They basically swapped out the FP64 cores for the ray tracing cores on GA10x, which are not present on GA100.
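For anyone who wants to check the ratios quoted above, here is a minimal sketch of the arithmetic. It assumes every core retires the same number of operations per clock, which is a simplification; the function and dictionary names are made up for illustration, and only the per-SM core counts come from the post.

```python
# Cores per SM as quoted in the post above; everything else is illustrative.
CORES_PER_SM = {
    "GA10x": {"fp32": 128, "fp64": 2},
    "GA100": {"fp32": 64, "fp64": 32},
}


def fp64_to_fp32_ratio(fp32_cores: int, fp64_cores: int) -> str:
    """Return the FP64:FP32 ratio as '1:N'.

    Assumes each core does the same work per clock, so the throughput
    ratio is simply the ratio of core counts.
    """
    return f"1:{fp32_cores // fp64_cores}"


for chip, cores in CORES_PER_SM.items():
    print(chip, fp64_to_fp32_ratio(cores["fp32"], cores["fp64"]))
```

With those counts, GA10x comes out at 1:64 and GA100 at 1:2, matching the figures in the post.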
I just saw this edit; it makes sense. Thanks for digging into it. So, and correct me if I am wrong, the GA100 would be fantastic with the MeerKAT data and not great at work like Gamma-ray pulsar binary search #1 or any other GPU work on E@H? Does the project code have to be different to take advantage of these FP64 cores?
The A100 (GA100 core) is good at everything. It's the fastest card available now, but it also costs something like $10,000 each. Well, maybe the H100 is faster, but it's not really available.
The new app isn't totally dependent on FP64 speed; it seems to have some FP32 component and also seems to scale well with memory bandwidth. The A100 excels in all three.
Well, for this rig at least, the Binary Radio Pulsar Search (MeerKAT) v0.13 (BRP7-opencl-nvidia) x86_64-pc-linux-gnu has been working a treat since 4th September: consistently validating against Windows, ATI and the v0.12 (BRP7-cuda55) for that matter. It has the usual 1 - 2 invalids per day, i.e. about 5%, so good job! Maybe not too much to adjust on your return after all? ;-)
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I also have an AMD 580, and it's doing 1140 seconds run time with 186 seconds of CPU time, for 3333 credits. Mine is also running Windows. I would suggest you stop all other CPU tasks and see if yours speeds up; if it's showing 0% GPU utilization, it could be running out of CPU resources.
I have an ASUS mining GPU with 4GB of RAM which I bought as an RX 470 but which reports itself as an RX 570. For the MeerKAT tasks the runtime is around 1450 s with CPU time around 120 s. I also had a few outlier tasks that completed in around 430 s. Here are my stats for those curious.
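To put those runtimes side by side, here is a rough sketch that turns a per-task runtime into credits per day. It assumes the GPU crunches one task at a time around the clock and that every MeerKAT task pays the 3333 credits mentioned above; the 3333 figure for the mining card is an assumption, not something reported, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def credits_per_day(runtime_s: float, credits_per_task: float,
                    concurrent_tasks: int = 1) -> float:
    """Estimate daily credit for a GPU running tasks back to back.

    Assumes constant runtime per task and no idle time between tasks.
    """
    tasks_per_day = SECONDS_PER_DAY / runtime_s * concurrent_tasks
    return tasks_per_day * credits_per_task


# Runtimes reported in the thread; 3333 credits per task is assumed for both.
print(round(credits_per_day(1140, 3333)))  # RX 580 at 1140 s per task
print(round(credits_per_day(1450, 3333)))  # mining card at 1450 s per task
```

Under those assumptions the two cards land at roughly 253,000 and 199,000 credits per day respectively.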
(MeerKAT) v0.12 runs extremely slowly on my AMD Radeon RX 580: run time is 29,256 s and GPU utilization is close to 0%.
https://einsteinathome.org/task/1348804697
Gamma-ray pulsar binary search #1 on GPUs v1.22 (FGRPopencl1K-ati) runs ~ 520 s on the same computer.
I am beginning to like the 0.12 (BRP7-cuda55). The first one ran for 1,289 seconds on my GTX 1650 Super under Win10.
But I especially like the fact that it used less than 7% of a CPU core (Ryzen 5700X), and the current one less than 4% when I reserve two cores for it. And the card is only at 61C, which makes it feasible for the summer.
As for valids, it looks like they will take care of themselves, though only against Windows and Nvidia thus far. (Maybe Bernd limited them to that?)
https://einsteinathome.org/host/12871756/tasks/4/0