FYI:
In the meantime I have now calculated over 700 BRP7 WUs on the 4090.
About 11% of these are currently invalid.
Of the remaining FGRPB WUs, about 12% are invalid. With the optimized AIO app from Petri, up to 20% were invalid here.
So far, not a single error with GW-WUs.
If the "invalid" rate on BRP7 is also higher on the 4090, I would also be interested in whether validation differs between the Windows (CUDA) and the Linux (OpenCL) app.
Both the FGRP and BRP apps are mainly FFT-bound, and the FFT happens in a fairly early step. Thus the result of the FFT has a much higher impact on the overall result than, e.g., in the GW app.
However, the different apps use different libraries for the FFT:
* FGRP uses "clFFT", originally developed by AMD for their cards, now open source on GitHub
* BRP CUDA (BRP7 Windows) uses cuFFT
* BRP OpenCL uses an in-house implementation based on an Apple OpenCL code example, which seems to be derived from an early cuFFT version
BM
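To illustrate why the choice of FFT library can matter for validation, here is a minimal sketch in plain Python. It is not code from any of the apps named above; the function names and the test signal are made up for illustration. It evaluates the same transform in two mathematically equivalent ways (a direct DFT and a recursive radix-2 FFT) and reports how far the floating-point results drift apart. The deviation is tiny in double precision, but the same effect is larger in single precision on a GPU and is carried through every later processing step.

```python
import cmath
import math

def naive_dft(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def radix2_fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = radix2_fft(x[0::2])
    odd = radix2_fft(x[1::2])
    out = [0j] * n
    for j in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * math.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out

def max_abs_diff(a, b):
    """Largest element-wise deviation between two spectra."""
    return max(abs(p - q) for p, q in zip(a, b))

# Hypothetical test signal; any real-valued input of power-of-two length works.
signal = [math.sin(0.37 * k) + 0.5 * math.cos(1.9 * k) for k in range(64)]
diff = max_abs_diff(naive_dft(signal), radix2_fft(signal))
print(f"largest deviation between the two transforms: {diff:.3e}")
```

Both functions compute the same transform, so any deviation printed comes purely from the different order of floating-point operations, which is the kind of difference a validator has to tolerate between implementations.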
Hm. The overall "invalid" rate of 4090s on BRP7 is <3%, which is even lower than the overall "invalid" average there (~3.4%). However, in the DB I currently only have 4090 results from hosts running Windows. Actually, from Linux I only have 5(!) valid results in total from BRP7.
BM
Looks like I have some learning to do about FFT! It is interesting that different FFT implementations are used.
I am going to try and get one of the 4090 systems working on BRP7 this week.
Got it working on this host. It will crunch BRP7 full-time for the rest of the week to give us a good sample size (adding to DF1DX's completed work units). It is finishing a BRP7 work unit in ~3:09.
The OS distinction/selection in my query was somewhat wrong. Actually, on BRP7, Linux hosts have roughly 10% invalid results, while Windows hosts only have 0.5%. Judging from the above, I'd guess that the problem lies in the OpenCL driver (or the compiler in it); the CUDA version of BRP7 seems to work fine.
So if you are on Windows and want to avoid these invalid rates, my recommendation for now would be to restrict yourself (or your hosts) to running BRP7.
Here's the thing with the Linux CUDA version: we found that the gcc version used to build the CPU part of the application is crucial for validation (some data preparation is done beforehand on the CPU, and this needs to yield the exact same results). However, I couldn't get the libgcc to link with the CUDA libraries, at least not with CUDA 5.5. I'll see if I can get this app to link with a newer CUDA version.
BM
Petri has Linux CUDA 11 and 12 versions of BRP7 that validate well, at least on 30-series cards and earlier. Not sure about the 40-series.
I published a BRP7 Linux app version (0.16) with CUDA 10.2. This was built on Ubuntu 18.04 and may not run on other systems with an older libc. It's Beta anyway. You may want to give it a try.
BM
On it! The older version of the app gave us the following results (some pending):
Pending (85)
Valid (293)
Invalid (57)
Error (0)
I will enable beta apps and then run more of these on the 4090 this week. Will it automatically receive version 0.16 when it requests tasks?
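For reference, the counts above can be turned into an invalid rate with a rough uncertainty estimate. The following is only a sketch: it assumes pending tasks are left out of the denominator until they have been validated, and it uses a standard Wilson score interval (my choice, not something the project's validator reports). The function names are hypothetical.

```python
import math

def invalid_rate(valid, invalid):
    """Share of decided tasks that were marked invalid (pending excluded)."""
    return invalid / (valid + invalid)

def wilson_interval(failures, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = failures / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Counts taken from the post above; pending (85) and error (0) are not used.
valid, invalid = 293, 57
rate = invalid_rate(valid, invalid)
low, high = wilson_interval(invalid, valid + invalid)
print(f"invalid rate: {rate:.1%} (95% interval roughly {low:.1%} to {high:.1%})")
```

The interval narrows as more tasks are validated, which is why a full week of BRP7 results gives a more reliable comparison than a handful of tasks.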
I only get the previous version 0.15 with OpenCL at the moment. Yes, beta is enabled.