DAMMIT! Also fails with the same error on Windows (cuLaunchGrid() fails with error 1, although the host code and the call itself are identical to those in app version 0.03, which works).
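For anyone following along: error 1 from the driver API is CUDA_ERROR_INVALID_VALUE, and with the old launch path it's often something in the grid dimensions or the cuParamSet*/cuFuncSetBlockShape setup that cuLaunchGrid() ends up reporting, rather than the launch call itself. A minimal sketch of checking each step (hypothetical kernel handle and arguments, not the actual BRP host code):

/* Check every legacy driver-API call so the failing step is logged,
 * instead of only seeing the error at cuLaunchGrid(). */
#include <cuda.h>
#include <stdio.h>

#define CU_CHECK(call)                                                    \
    do {                                                                  \
        CUresult err_ = (call);                                           \
        if (err_ != CUDA_SUCCESS) {                                       \
            fprintf(stderr, "%s failed with error %d\n", #call, (int)err_); \
            return err_;                                                  \
        }                                                                 \
    } while (0)

static CUresult launch_demod_kernel(CUfunction kernel, CUdeviceptr in,
                                    CUdeviceptr out, int n)
{
    int offset = 0;

    /* Legacy (pre-CUDA-4.0) parameter setup: every offset and size here must
     * match the kernel signature, otherwise the launch reports error 1. */
    CU_CHECK(cuParamSetv(kernel, offset, &in, sizeof(in)));   offset += (int)sizeof(in);
    CU_CHECK(cuParamSetv(kernel, offset, &out, sizeof(out))); offset += (int)sizeof(out);
    CU_CHECK(cuParamSeti(kernel, offset, n));                 offset += (int)sizeof(n);
    CU_CHECK(cuParamSetSize(kernel, offset));

    CU_CHECK(cuFuncSetBlockShape(kernel, 256, 1, 1));
    CU_CHECK(cuLaunchGrid(kernel, (n + 255) / 256, 1));
    return CUDA_SUCCESS;
}

Logging each legacy call separately at least narrows down which step the new kernel build objects to.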
BM
v0.08 cuda55 works on Linux (except for the tasks that get "stuck"), but like v0.05 and v0.07 it's noticeably slower than v0.03.
_________________________________________________________________________
It seems with v. 0.11 we now have a working CUDA version on Windows. The first step is to get something working at all, then to improve validation; only then will we optimize for speed. The 0.03 version is all single precision, which is fast and good enough for the Arecibo data (BRP4), but not accurate enough for the MeerKAT (BRP7) data. 0.08/0.11 seems to offer the best compromise of speed and accuracy; let's see how validation goes.
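To illustrate the kind of compromise meant here (a generic sketch, not the actual BRP7 kernel): the bulk of the per-sample arithmetic can stay in fast fp32, while only the long, accuracy-sensitive accumulation is carried in fp64, which costs far less than making everything double precision.

// Generic mixed-precision sketch (illustrative only): per-element work in fp32,
// the running sum in fp64. Assumes a launch with a single block of 256 threads.
__global__ void power_sum_mixed(const float2 *samples, int n, double *out)
{
    __shared__ double partial[256];
    double acc = 0.0;                       // only the accumulator is double
    for (int i = threadIdx.x; i < n; i += blockDim.x) {
        float2 s = samples[i];
        float p = s.x * s.x + s.y * s.y;    // bulk arithmetic stays in fp32
        acc += (double)p;
    }
    partial[threadIdx.x] = acc;
    __syncthreads();
    // block-wide reduction in double (blockDim.x must be a power of two <= 256)
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        *out = partial[0];
}

Whether this is close to what 0.08/0.11 actually do I don't know, but it shows why the fp64 hit can stay small even on cards with weak double-precision throughput.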
BM
What percentage (roughly) of the calculation is double precision?
_________________________________________________________________________
Bernd Machenschalk wrote: It seems with v. 0.11 we now have a working CUDA version on Windows. [...]
Bernd, I had shut down my AMD machines due to the high invalid rate. I don't care about the points; I just wasn't sure if it was helpful to keep piling up invalid tasks. I'm happy to turn them back on and crunch away.
The fp64 calcs explain some of the additional performance on my VIIs.
tictoc wrote: Bernd, I had shut down my AMD machines [...]
Were you running your tasks at 1x per GPU? Or some multiple?
_________________________________________________________________________
Bernd Machenschalk wrote: It seems with v. 0.11 we now have a working CUDA version on Windows. [...]
Throw as much double precision as you like into the AMD ones; I've got the non-deliberately-inhibited proper cards.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
We will start attempting to run these again on the RTX A6000 GPUs, which should excel at these double-precision work units. We had some success with 0.05, but mixed results.
Edit: What is the main limiting factor for these work units? In the end I will just have to try and see how many I can run concurrently, but I was curious.
Ian&Steve C. wrote: Were you running your tasks at 1x per GPU? Or some multiple?
I only ran 122 tasks, and they were a mix of 1x and 2x.
Average runtime at 1x was 254 s, and 440 s at 2x, on an overclocked Radeon VII. That is about 20-23% faster than the tasks run on the stock-clocked 6900 XT.
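Taking those numbers at face value (at 2x, each task takes 440 s but two run at once), 2x still comes out ahead on throughput:

    440 s / 2 tasks = 220 s of wall time per task at 2x, vs. 254 s at 1x  ->  roughly 15% more tasks per hour

so running two at a time looks like the better setting on the VII, at least for these tasks.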
If I fire the beta tasks back up, I'll split the different GPUs into their own clients, so that the results page will be a bit less of a mess.
Boca Raton Community HS wrote: What is the main limiting factor for these work units? [...]
With most projects I find the CPU limits the GPU from running at full power. Hence if you run two tasks on the GPU, the GPU now has access to two CPU cores, and can also be running one task while waiting for the CPU to feed the other. What I do is watch the GPU usage % in something like MSI Afterburner, and if it's not very high, I add more tasks until it is. Also watch that you don't run out of GPU RAM, as that really slows things down when it has to fall back to system RAM. If you're running CPU tasks as well, reduce the number of those so there's enough spare CPU time to help the GPU, again watching the CPU usage graph in MSI Afterburner. For some reason you don't need an MSI card to use MSI Afterburner.
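For anyone wanting to try this, task multiplicity is set with an app_config.xml in the project directory. Something like the following runs two tasks per GPU; the app name shown is only a guess at the BRP7 beta name, so check the name your BOINC event log or tasks page reports and adjust:

<!-- app_config.xml in the Einstein@Home project directory,
     e.g. .../projects/einstein.phys.uwm.edu/ -->
<app_config>
  <app>
    <name>einsteinbinary_BRP7</name>   <!-- guess: use the app name from your client log -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>   <!-- 0.5 = two tasks share one GPU -->
      <cpu_usage>1.0</cpu_usage>   <!-- reserve a full core per task to feed the GPU -->
    </gpu_versions>
  </app>
</app_config>

After saving it, Options -> Read config files in the BOINC Manager (or a client restart) picks it up.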
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.