I went through a similar conversation with Keith Uplinger when World Community Grid was still under IBM management. They were introducing an iGPU/OpenCL application for their COVID research and hitting even greater problems than we have here, though those aren't relevant to this thread. To investigate that problem, I ran some offline tests on their app in a terminal window and saw the -cl-mad-enable parameter in the command-line output. It isn't referenced in the stderr text returned through BOINC.
I'll try to do the same for this 'newer' Beta, and for any of the older ones I can still find, and see if the same applies here. That would give us a more certain diagnostic for the presence or absence of the suspect directive.
The finger was pointed at -cl-mad-enable by an intensive period of offline testing of many versions of the equivalent SETI@home app. The SETI volunteers have an offline version of the project validator, which could assess the degree of variation from a reference result for the same task - I don't think we have that here?
Perhaps, as previously mentioned, those cases where mad(a, b, -a*b) = a*b + (-a*b) comes out seriously non-zero could be used to test the 'sanity' of FMA at runtime, and maybe return a diagnostic message via stderr ...
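A minimal host-side sketch of the arithmetic behind that idea, not the project's code. An unfused multiply-then-add rounds the product first, so a*b + (-a*b) is exactly zero; a correctly rounded fused operation returns the rounding error of the product, which is at most half a unit in the last place. The function names and the tolerance rule are hypothetical, and the fused result is emulated in double precision (the product of two finite floats is exact in double) rather than taken from a real device.

```c
#include <float.h>
#include <stdbool.h>

/* Unfused: the product is rounded to float before the add,
 * so the rounded product minus itself is exactly 0. */
static float unfused_residual(float a, float b)
{
    volatile float p = a * b;   /* rounded product */
    volatile float q = a * b;   /* the same rounded product again */
    return q - p;
}

/* Fused (emulated): a*b is exact in double for float inputs, so
 * subtracting the rounded product leaves the product's rounding
 * error, which is what a correctly rounded fma(a, b, -p) returns. */
static float fused_residual(float a, float b)
{
    volatile float p = a * b;
    double exact = (double)a * (double)b;
    return (float)(exact - (double)p);
}

/* Hypothetical sanity rule: a correct residual never exceeds half an
 * ulp of the product, which is bounded by 0.5 * FLT_EPSILON * |p|.
 * Anything larger is "seriously non-zero" and worth reporting. */
static bool residual_is_sane(float residual, float product)
{
    float r = residual < 0.0f ? -residual : residual;
    float p = product < 0.0f ? -product : product;
    return r <= 0.5f * FLT_EPSILON * p;
}
```

In a real diagnostic the residual would come from the kernel's own mad() call on the device, and a failed residual_is_sane() check would be written to stderr.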
Cheers, Mike.
(edit) Having this issue inside the FFT is especially acute here at E@H. If one can't trust the transform ...
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
I ran the offline tests last night, but got no additional compiler output in the command window.
I used a cut-down init_data.xml, which I'd prepared for the WCG tests, and which correctly selected the Intel GPU:
12:02:00 (4540): Can't set up shared mem: -1. Will run in standalone mode.
wcgrid_beta29_autodockgpu_7.28_windows_x86_64__opencl_intel_gpu_102 -jobs OPNG_0000025_00056.job -input OPNG_0000025_00056.zip -seed 160279976 -wcgruns 1700 -wcgdpf 34
INFO: Using gpu device from app init data 0
INFO:[12:02:17] Start AutoGrid...
autogrid4: Successful Completion.
INFO:[12:02:41] End AutoGrid...
INFO:[12:02:57] Start AutoDock for ZINC000309335454-ACR2.13_RX1--fr2266benz_001--CYS114.dpf(Job #0)...
OpenCL device: Intel(R) HD Graphics 4600
But at Einstein, with the same file, it reports differently:
Activated exception handling...
19:09:10 (7012): Can't set up shared mem: -1. Will run in standalone mode.
[19:09:11][7012][INFO ] Starting data processing...
Trying OpenCL platform provided by: NVIDIA Corporation
[19:09:11][7012][INFO ] Using OpenCL platform provided by: NVIDIA Corporation
[19:09:11][7012][INFO ] Using OpenCL device "GeForce GTX 1050 Ti" by: NVIDIA Corporation
However, in live running, it uses the expected platform:
I don't think any of that will affect the lack of compiler output, so that's as far as I can take this line of enquiry.
While I'm here, you may as well have a summary of my results:
All (520)
In progress (77)
Pending (109)
Validation inconclusive (77)
Valid (233)
Invalid (24)
Error (0)
Hi Bernd!
1) I may have found another possible bug in the source code. demod_binary.c calculates fft_size in two places, and it may get it wrong.
Current code:
// if input is N real numbers, output has N/2 + 1 non-redundant entries
//fft_size = (unsigned int)(0.5*data_head.nsamples + 0.5) + 1;
With 11 samples this gives (5.5 + 0.5) + 1 = 7. For any odd number 2n + 1 it gives floor((2n+1)/2 + 0.5) + 1 = floor(n + 0.5 + 0.5) + 1 = n + 1 + 1 = n + 2, so every odd number comes out one too large. Even numbers 2n give floor(2n/2 + 0.5) + 1 = n + 1, which is correct. If data_head.nsamples is always even, the error does not show up.
I ran separate FFT tests (fftw and cufft) with 11 samples: the output size was 6. Then with 12: the output size was 7.
Suggested change:
// if input is N real numbers, output has floor(N/2) + 1 non-redundant entries
fft_size = data_head.nsamples/2 + 1;
Integer division does 'floor', i.e. it rounds down, as it should.
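The difference between the two formulas can be checked with a small standalone sketch. The function names below are made up for illustration; only the two expressions come from the code quoted above, with nsamples standing in for data_head.nsamples.

```c
/* Formula from the "Current code" line: rounds N/2 to nearest, then adds 1.
 * For odd N this rounds up and yields one entry too many. */
static unsigned int fft_size_rounded(unsigned int nsamples)
{
    return (unsigned int)(0.5 * nsamples + 0.5) + 1;
}

/* Suggested formula: integer division floors N/2, then adds 1. */
static unsigned int fft_size_floored(unsigned int nsamples)
{
    return nsamples / 2 + 1;
}
```

For 11 samples the first function returns 7 and the second returns 6; for 12 samples both return 7, which matches the output sizes seen in the fftw and cufft tests.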
2) Could I get a link to the beta/new code?
3) I can compile with CUDA 11.7, but there are errors in the NVIDIA cufft library 11.7.1 and 11.7.2 implementations: they do not produce accurate results from the FFT.
4) I have added an info logMessage to stderr.txt showing possible signal candidates (BRP4 Arecibo, large).
Petri
So BRP 7?
Where are we?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
I got 120 of them on my Intel i5 8600K's built-in graphics. I'm not on whatever beta test Richard is presumably on, but I have ticked all the boxes in settings. It usually does gamma but got 120 BRP (I think they're the normal BRP from Arecibo, not the BRP7 from Meerkat?)
Pending 13
Valid 73
Invalid 14
Error 0
https://einsteinathome.org/host/12862563/tasks/0/19
As with Richard, my invalids seem to have been overridden by 2 ARM chips. But my valids are also compared against ARM and aarch64 hosts.
If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.
Your last Meerkat tasks were on the 3rd of August, like the rest of us except Richard. At least, that's all we know of right now.
I'd forgotten about those, the AMD ones. They haven't been validated, so I guess that program is earlier in beta and they're concentrating on the Intel chips just now.
That (filtered) link shows the BRP4 tasks only. They show with the v1.70 version number, the same as I'm testing. Either you have the Beta box ticked, or Bernd has made the 'newer' app official.
Apart from 3 Aug Meerkat AMD GPU ones, I've only had those. I assume you can click the menus to get the whole set.
I have beta ticked. I had a problem with my AMD Fury failing beta gamma tasks, but Ian&Steve suggested I report I don't have OpenCL 2.0, which has worked, so beta is on for all machines.