Important news on BRP7 and FGRPB1 work on E@H

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46828162642
RAC: 64304993

completed and validated. ran for 4 hours.

https://einsteinathome.org/task/1530744930

_________________________________________________________________________

Allen
Joined: 23 Jan 06
Posts: 75
Credit: 657551213
RAC: 1217346

Just a quick question.

I've switched everything over to BRP7, except the one machine running BRP4 (Intel GPU), and my RAC has dropped from 2.3 million to under 1 million.

Is this to be expected?

Thanks,

Allen

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46828162642
RAC: 64304993

Allen wrote:

Just a quick question.

I've switched everything over to BRP7, except the one machine running BRP4 (Intel GPU), and my RAC has dropped from 2.3 million to under 1 million.

Is this to be expected?

Thanks,

Allen

yes, it's expected

_________________________________________________________________________

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 240
Credit: 10567715586
RAC: 24033833

Ian&Steve C. wrote:

completed and validated. ran for 4 hours.

https://einsteinathome.org/task/1530744930

Ha! That is impressive (both that it worked and how long it actually took!). 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117586806503
RAC: 35240913

Bernd Machenschalk wrote:

On a larger scale (overall project), 0.15 (Linux) had an invalid rate of >10% when paired with 0.12 (Windows), while 0.17 has 3-4%. From my perspective this looks like an improvement, though possibly not for each and every host.

Since it's early days, are you also tracking the rate for inconclusives?  Whatever that rate is now, when enough time has elapsed for the 'decider' to be crunched and returned, 50% of current inconclusives will be extra invalids.

As a case in point, here are my latest figures for the 0.17 test machine:-

  • Pending       = 14
  • Valid         = 64
  • Invalid       = 5
  • Error         = 0
  • Inconclusive  = 19

Because of the Windows/Linux imbalance, my guess is that ~80% of my inconclusives will be future invalids and ~20% will be valid - something like 15 to 4 tasks split for the 19 above.  So, my current situation is more likely to become something like 68 valid to 20 invalid.  This is actually worse than what I was seeing with the 0.15 app.
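
As a rough check of that arithmetic, here is a minimal Python sketch. The function name `project_outcomes` is made up for illustration, and the 80% share is only the guess stated above, not a measured figure.

```python
def project_outcomes(valid, invalid, inconclusive, invalid_share):
    """Split the current inconclusives into projected valid/invalid totals.

    invalid_share is the assumed fraction of inconclusives that will
    end up invalid once the 'decider' task has been returned.
    """
    future_invalid = round(inconclusive * invalid_share)
    future_valid = inconclusive - future_invalid
    return valid + future_valid, invalid + future_invalid


# Figures from the 0.17 test machine, with the guessed 80/20 split.
projected_valid, projected_invalid = project_outcomes(
    valid=64, invalid=5, inconclusive=19, invalid_share=0.80
)
print(projected_valid, projected_invalid)  # 68 20
```

Changing `invalid_share` shows how sensitive the projection is to that guess.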

Of course, a Windows user won't really see a problem, but as a Linux user, I do.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117586806503
RAC: 35240913

On thinking about the 'inconclusives' problem, I wonder if there might be two possible options for a solution, if there isn't a 'fix the app or validator' way to get a validation match for properly computed results.

Option 1:-  Break up BRP7 into 2 separate searches - BRP7_Win and BRP7_Lin, with others (MacOS, etc.) added to whichever is the best likely match for them.

Option 2:-  Add functionality to the scheduler to 'know' the OS of the _0 task in an 'unfilled' quorum so that when it looks at allocating the _1 task, it attempts to match the OS to that of the _0.  If that could be done (even just most of the time) it should largely eliminate the Windows/Linux pairing that appears to cause so many inconclusives.

My personal preference would be for option 2.
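
To make option 2 a little more concrete, here is a minimal Python sketch of the matching idea. It is not taken from the BOINC scheduler and says nothing about how the real one is structured; `Quorum`, `first_task_os` and `pick_quorum` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quorum:
    workunit: str
    # OS of the host that received the _0 task; None if nothing sent yet.
    first_task_os: Optional[str] = None


def pick_quorum(open_quorums: List[Quorum], host_os: str) -> Optional[Quorum]:
    """Choose an unfilled quorum for a host, preferring an OS match."""
    # First choice: a quorum whose _0 task went to the same OS.
    for quorum in open_quorums:
        if quorum.first_task_os == host_os:
            return quorum
    # Second choice: a quorum that has not been started at all.
    for quorum in open_quorums:
        if quorum.first_task_os is None:
            return quorum
    # Fallback: accept a mismatch so the host is not left without work.
    return open_quorums[0] if open_quorums else None


open_quorums = [Quorum("wu_a", "windows"), Quorum("wu_b"), Quorum("wu_c", "linux")]
print(pick_quorum(open_quorums, "linux").workunit)  # wu_c
```

The fallback branch reflects the "even just most of the time" idea: a mismatched pairing is still allowed when nothing better is available.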

I'm not a programmer so I have no real idea of how complicated this might be.  If there are people reading who have an understanding of how the scheduler works, I'd like to hear comments on whether or not any of this is feasible.

Judging by how long FGRPB1G lasted, it would seem that BRP7 might be around for a long time as well. Solving the wastage of volunteered contributions (both hardware and electricity) should be not only worthwhile, but pretty much an imperative, in my humble opinion.

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117586806503
RAC: 35240913

I decided to look at the quorum for one of my 5 current invalids.  There were 4 hosts, 2xWin 2xLinux.  The 2 Win tasks were validated.  Out of curiosity, I checked the task stats for the other Linux host.  Below are the numbers at the time I looked:-

  • Pending       = 395
  • Valid         = 518
  • Invalid       = 225
  • Error         = 5
  • Inconclusive  = 150
  • In progress   = 25
  • All tasks     = 1,318
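
The tallies above can be cross-checked with a few lines of Python. This is only a sketch: the dictionary keys are illustrative, and "invalid among decided tasks" is one possible way to express a rate, not a figure the project reports.

```python
# Task counts for the other Linux host, as listed above.
counts = {
    "pending": 395,
    "valid": 518,
    "invalid": 225,
    "error": 5,
    "inconclusive": 150,
    "in_progress": 25,
}

total = sum(counts.values())                          # should match "All tasks"
problem = counts["invalid"] + counts["inconclusive"]  # invalid plus inconclusive
decided = counts["valid"] + counts["invalid"]         # tasks with a final verdict
invalid_rate = counts["invalid"] / decided

print(f"total={total}, invalid+inconclusive={problem}")
print(f"share of all tasks: {problem / total:.1%}")
print(f"invalid among decided tasks: {invalid_rate:.1%}")
```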

The host has dual 1070Ti GPUs and is running an 'anonymous platform' app so maybe that has something to do with the 375 total for invalid plus inconclusive.  By looking at the stderr output, the app is listed as:-

BRP7_einsteinbinary_x86_64-pc-linux-gnu__cuda1200

if that means anything to anybody.  Is that the 0.16 beta test app?

 

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18724477307
RAC: 6563992

No, that is one of Petri's optimized app versions.

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3953
Credit: 46828162642
RAC: 64304993

the beta app is cuda 102, and actually validates really well on my test (3 invalids out of 200 tasks). slightly better than the v0.17 opencl app which has 5 invalids so far for the same number of tasks.

in my analysis, the invalids (as a linux/nvidia user) almost always come from a pair of windows hosts. either win_cuda55+win_cuda55 or win_cuda55+win_ati or win_ati+win_ati. I had one invalid from a win_cuda55+linux_nvopencl pair, but that was an outlier. it's definitely a windows vs linux thing. and due to the relative spread of windows vs linux hosts (many more windows) that puts the linux hosts at a disadvantage.

 

that user is also running the older version (mostly for speed testing purposes). I recompiled it with cuda1222 using GCC 7.3 per Bernd's comments earlier in this thread.

 

_________________________________________________________________________

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18724477307
RAC: 6563992

And from what I have seen, the new app compiled with GCC 7.3 has really helped bring down the inconclusives.

Massive improvement for the one host that started with an inconclusive rate of over 38% on the older 1830 version, which I still had on that host because I never got around to updating to the 1200 version. Now on your 1222 version,

I thought it was a good idea to match how Bernd compiled the latest beta.

 
