Bernd Machenschalk wrote: On a larger scale (overall project), 0.15 (Linux) had an invalid rate of >10% when paired with 0.12 (Windows), while 0.17 has 3-4%. From my perspective this looks like an improvement, though possibly not for each and every host.
Since it's early days, are you also tracking the rate for inconclusives? Whatever that rate is now, when enough time has elapsed for the 'decider' to be crunched and returned, 50% of current inconclusives will be extra invalids.
As a case in point, here are my latest figures for the 0.17 test machine:-
Pending = 14
Valid = 64
Invalid = 5
Error = 0
Inconclusive = 19
Because of the Windows/Linux imbalance, my guess is that ~80% of my inconclusives will be future invalids and ~20% will be valid - something like 15 to 4 tasks split for the 19 above. So, my current situation is more likely to become something like 68 valid to 20 invalid. This is actually worse than what I was seeing with the 0.15 app.
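The arithmetic behind that projection can be written out as a small sketch. The function name `project_final_counts` and the `invalid_share` parameter are made up for illustration; the 80/20 split is the guess stated above, not a measured figure.

```python
def project_final_counts(valid, invalid, inconclusive, invalid_share):
    """Split the current inconclusives into future invalids and valids.

    invalid_share is the assumed fraction of inconclusives that will
    end up invalid once the 'decider' task has been returned.
    """
    extra_invalid = round(inconclusive * invalid_share)
    extra_valid = inconclusive - extra_invalid
    return valid + extra_valid, invalid + extra_invalid


# Figures for the 0.17 test machine quoted above, with the ~80% guess.
final_valid, final_invalid = project_final_counts(
    valid=64, invalid=5, inconclusive=19, invalid_share=0.80
)
invalid_rate = final_invalid / (final_valid + final_invalid)

print(final_valid, final_invalid)   # 68 20
print(f"{invalid_rate:.1%}")        # 22.7%
```

Pending and in-progress tasks are left out on purpose, since nothing is known about how they will turn out.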
Of course, a Windows user won't really see a problem but as a Linux user, I do.
Cheers,
Gary.
On thinking about the 'inconclusives' problem, I wonder if there might be two possible options for a solution, if there isn't a 'fix the app or validator' way to get a validation match for properly computed results.
Option 1:- Break up BRP7 into 2 separate searches - BRP7_Win and BRP7_Lin, with others (MacOS, etc.) added to whichever is the best likely match for them.
Option 2:- Add functionality to the scheduler to 'know' the OS of the _0 task in an 'unfilled' quorum so that when it looks at allocating the _1 task, it attempts to match the OS to that of the _0. If that could be done (even just most of the time) it should largely eliminate the Windows/Linux pairing that appears to cause so many inconclusives.
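To make Option 2 a little more concrete, here is a rough sketch of the matching logic in Python. It is not the real BOINC scheduler (which is written in C++ and structured quite differently); `Workunit`, `assigned_os` and `pick_workunit` are hypothetical names, and the OS labels are plain strings chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workunit:
    """Hypothetical stand-in for a workunit with a quorum of tasks."""
    name: str
    quorum: int = 2
    # OS of each host a task has already been sent to (_0 first, then _1, ...).
    assigned_os: List[str] = field(default_factory=list)


def pick_workunit(host_os: str, workunits: List[Workunit]) -> Optional[Workunit]:
    """Choose a workunit for a requesting host, preferring a same-OS quorum."""
    open_wus = [wu for wu in workunits if len(wu.assigned_os) < wu.quorum]

    # 1. Best case: an unfilled quorum whose _0 task went to the same OS.
    for wu in open_wus:
        if wu.assigned_os and wu.assigned_os[0] == host_os:
            return wu

    # 2. Next best: a fresh workunit, so this host supplies the _0 task.
    for wu in open_wus:
        if not wu.assigned_os:
            return wu

    # 3. Fallback: accept a mixed-OS pairing rather than send no work.
    return open_wus[0] if open_wus else None


workunits = [
    Workunit("wu_a", assigned_os=["windows"]),
    Workunit("wu_b", assigned_os=["linux"]),
    Workunit("wu_c"),
]
print(pick_workunit("linux", workunits).name)    # wu_b
print(pick_workunit("windows", workunits).name)  # wu_a
```

The fallback in step 3 reflects the "even just most of the time" idea: matching is a preference, not a hard rule, so a host is never left without work just because no same-OS quorum is open.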
My personal preference would be for option 2.
I'm not a programmer so I have no real idea of how complicated this might be. If there are people reading who have understanding of how the scheduler works, I'd like to hear comments on whether or not any of this is feasible.
Judging by how long FGRPB1G lasted, it would seem that BRP7 might be around for a long time as well. Solving the wastage of volunteered contributions (both hardware and electricity) should be not only worthwhile, but pretty much an imperative, in my humble opinion.
Cheers,
Gary.
I decided to look at the quorum for one of my 5 current invalids. There were 4 hosts, 2xWin 2xLinux. The 2 Win tasks were validated. Out of curiosity, I checked the task stats for the other Linux host. Below are the numbers at the time I looked:-
Pending = 395
Valid = 518
Invalid = 225
Error = 5
Inconclusive = 150
In progress = 25
All tasks = 1,318
The host has dual 1070Ti GPUs and is running an 'anonymous platform' app so maybe that has something to do with the 375 total for invalid plus inconclusive. By looking at the stderr output, the app is listed as:-
BRP7_einsteinbinary_x86_64-pc-linux-gnu__cuda1200
if that means anything to anybody. Is that the 0.16 beta test app?
Cheers,
Gary.
the beta app is cuda 102, and actually validates really well on my test (3 invalids out of 200 tasks). slightly better than the v0.17 opencl app which has 5 invalids so far for the same number of tasks.
in my analysis, the invalids (as a linux/nvidia user) almost always come from a pair of windows hosts. either win_cuda55+win_cuda55 or win_cuda55+win_ati or win_ati+win_ati. I had one invalid from a win_cuda55+linux_nvopencl pair, but that was an outlier. it's definitely a windows vs linux thing. and due to the relative spread of windows vs linux hosts (many more windows) that puts the linux hosts at a disadvantage.
that user is also running the older version (mostly for speed testing purposes). I recompiled it with cuda1222 using GCC 7.3 per Bernd's comments earlier in this thread.
And from what I have seen, the new app compiled with GCC 7.3 has really helped bring down the inconclusives.
Massive improvement for the one host, which started with an inconclusive rate of over 38% on the older 1830 version I had on it because I never got around to updating to the 1200 version. Now on your 1222 version,
I thought it was a good idea to match how Bernd compiled the latest beta.
completed and validated. ran for 4 hours.
https://einsteinathome.org/task/1530744930
_________________________________________________________________________
Just a quick question.
I've switched everything over to BRP7, except the one machine running BRP4 (Intel GPU), and my RAC has dropped from 2.3 million to under 1 million.
Is this to be expected?
Thanks,
Allen
yes, it's expected
_________________________________________________________________________
Ian&Steve C.
Ha! That is impressive (both that it worked and how long it actually took!).
No, that (BRP7_einsteinbinary_x86_64-pc-linux-gnu__cuda1200) is one of Petri's optimized app versions, not the 0.16 beta test app.