FGRP - High invalid rate on Nvidia 4090?

Boca Raton Community HS
Boca Raton Comm...
Joined: 4 Nov 15
Posts: 216
Credit: 8,591,848,661
RAC: 1,997,583

Bernd Machenschalk

Bernd Machenschalk wrote:

GWGeorge007 wrote:

I just checked the two 4090s running Einstein's Gamma-Ray Pulsar Binary search #1 on GPUs, a.k.a. FGRPB1G, and their invalid rate is 17% - 18%, though both of them are running Windows.

That's a bit above the average - 4090s seem to have about 15% invalids on average (overall FGRPB1G invalid average is 2.5%).

It's pretty hard to track which card produced which result in ~200k results per day. I looked only into a few such results, and it doesn't look like a precision problem to me right now. It could be, though, that if the driver (=compiler) or the kernel scheduler changes the execution order of operations too much, the comparisons might yield a different result. No idea how to prevent this, though.
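The execution-order effect Bernd mentions can be illustrated with a tiny, hypothetical example (this is not the FGRPB1G code, just a demonstration that floating-point addition is not associative, so a compiler or scheduler that reorders operations can change the final bits):

```python
# Hypothetical illustration: reordering floating-point operations
# changes the result, because FP addition is not associative.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # the large terms cancel first, then 1.0 is added
right = a + (b + c)  # 1.0 is absorbed (lost) into -1e16 before the cancellation

print(left)   # 1.0
print(right)  # 0.0
```

Two hosts computing the "same" sum in a different order can therefore legitimately disagree in the low bits, which is why validators compare within tolerances rather than bit-for-bit.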

 

It seems that my invalid rates on the two 4090 systems have dropped a little - maybe a recent Nvidia driver update had a small impact - I am really not sure (host and host). I have not really changed anything else. If you ever want us to try anything on our end, we are more than willing to experiment with these GPUs. This is definitely a 40xx series issue - take a look at this host running two 4070 GPUs (Pokey - not trying to pick on your PC, just trying to figure out the invalid issue on these GPUs).

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,866
Credit: 112,139,801,598
RAC: 35,722,947

Boca Raton Community HS

Boca Raton Community HS wrote:

...

It seems that my invalid rates on the two 4090 systems have dropped a little ...

This is definitely a 40xx series issue- take a look at this host running two 4070 gpus ...

When you examine results on the website, there's an important bit of context that is missing - the ability to easily monitor inconclusive results.  It would be very nice to have a separate column to list these, as well as the errors and invalids.  At any point in time, if you take the number for "All" and subtract all the other categories (In Progress, Pending, Valid, etc), there may be a 'left-over' amount (hopefully very small) which can be labeled as 'Inconclusive'.  These are results described as "Checked but no consensus yet", if you search through the entire list to find them.
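The 'left-over' arithmetic described above can be sketched in a few lines. The category names and the numbers below are taken from host 13125618 in the table that follows; there is no actual Einstein@Home API being called here, this is just the subtraction spelled out:

```python
# Inconclusive = "All" minus every listed category (sketch of the
# arithmetic described above; numbers from host 13125618).
counts = {
    "All": 3946,
    "In Progress": 162,
    "Pending": 1159,
    "Valid": 2284,
    "Invalid": 177,
    "Error": 16,
}

inconclusive = counts["All"] - sum(
    v for k, v in counts.items() if k != "All"
)
print(inconclusive)  # 148
```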

For the three hosts you mention, here are the numbers that existed at a certain point a little earlier today:-

HostID     All    In Progress  Pending  Valid  Invalid  Error  Inconclusive
13125618   3946   162          1159     2284   177      16     148
13126606   3820   177          1136     2197   167      0      143
12986942   11854  888          2180     7375   1066     0      345

As you can see, there are quite a few Inconclusives.  Some will ultimately become valid whilst others get rejected.  A lot depends on the third host that gets selected as the 'deadlock breaker'.

I've noticed increasing numbers of inconclusives on my hosts.  I ran some very limited checks and found a tendency for inconclusives to happen when the normal app was being matched against the Petri app being run under the 'anonymous platform' mechanism.  In that case, the outcome depends on the third task.  Since there is a greater chance that the resend will use the regular app, the host that suffers the most will likely be the one using the anonymous platform app.  Since those hosts are faster, and since their numbers are likely to increase, this situation might reverse in the future if the FGRPB1G search keeps going for a while.

There might be a solution but it's probably quite unlikely to happen.  If the FGRPB1G search was split into two separate streams, one for the regular apps and one for anonymous platform apps, the ultimate rate for invalid results might improve quite a bit.  At least it might reverse the rising level of inconclusives that seems to be occurring :-).

Cheers,
Gary.

Bernd Machenschalk
Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,289
Credit: 245,928,800
RAC: 10,921

I'm curious - does the 4090

I'm curious - does the 4090 exhibit the same behavior on BRP7 and GW (currently O3MDF)?

BM

DF1DX
DF1DX
Joined: 14 Aug 10
Posts: 104
Credit: 3,215,074,491
RAC: 4,695,533

Bernd Machenschalk

Bernd Machenschalk wrote:

I'm curious - does the 4090 exhibit the same behavior on BRP7 and GW (currently O3MDF)?

Thanks for the feedback. I will try BRP7 over the weekend.
BTW: There are no errors with GW-WUs on the 4090.

Boca Raton Community HS
Boca Raton Comm...
Joined: 4 Nov 15
Posts: 216
Credit: 8,591,848,661
RAC: 1,997,583

Gary Roberts wrote: When you

Gary Roberts wrote:

When you examine results on the website, there's an important bit of context that is missing - the ability to easily monitor inconclusive results.  It would be very nice to have a separate column to list these, as well as the errors and invalids.  At any point in time, if you take the number for "All" and subtract all the other categories (In Progress, Pending, Valid, etc.), there may be a 'left-over' amount (hopefully very small) which can be labeled as 'Inconclusive'.  These are the results described as "Checked but no consensus yet", if you search through the entire list to find them.

For the three hosts you mention, here are the numbers that existed at a certain point a little earlier today:-

HostID     All    In Progress  Pending  Valid  Invalid  Error  Inconclusive
13125618   3946   162          1159     2284   177      16     148
13126606   3820   177          1136     2197   167      0      143
12986942   11854  888          2180     7375   1066     0      345

As you can see, there are quite a few Inconclusives.  Some will ultimately become valid whilst others get rejected.  A lot depends on the third host that gets selected as the 'deadlock breaker'.

I've noticed increasing numbers of inconclusives on my hosts.  I ran some very limited checks and found a tendency for inconclusives to happen when the normal app was being matched against the Petri app being run under the 'anonymous platform' mechanism.  In that case, the outcome depends on the third task.  Since there is a greater chance that the resend will use the regular app, the host that suffers the most will likely be the one using the anonymous platform app.  Since those hosts are faster, and since their numbers are likely to increase, this situation might reverse in the future if the FGRPB1G search keeps going for a while.

There might be a solution but it's probably quite unlikely to happen.  If the FGRPB1G search was split into two separate streams, one for the regular apps and one for anonymous platform apps, the ultimate rate for invalid results might improve quite a bit.  At least it might reverse the rising level of inconclusives that seems to be occurring :-).

 

Thank you for this post! Definitely helpful. Here is my question (and it is hard to put into words): could two systems come up with the SAME wrong/inconclusive result? Is a result being inconclusive (because it doesn't match the other system and the deadlock breaker) a product of the calculation being done incorrectly on the local machine? Would this calculation be done incorrectly, in the same way, with the same result, on a different system?

I hope that makes sense. 

Bernd Machenschalk wrote:

I'm curious - does the 4090 exhibit the same behavior on BRP7 and GW (currently O3MDF)?

I can try next week - these two systems are powered down over the weekend (no AC in the building).

 

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,866
Credit: 112,139,801,598
RAC: 35,722,947

Boca Raton Community HS

Boca Raton Community HS wrote:
... Could two systems come up with the SAME wrong/inconclusive result?

My simple answer would be NO.  If two systems came up with exactly the SAME set of results, they would be declared 'valid' even if (technically) they were actually wrong :-).  An inconclusive result is neither right nor wrong.  It's impossible to know until further results are analysed.

I'm just an ordinary volunteer like yourself.  I have no background in theoretical physics so my knowledge (such as it is) is just from what I've read or listened to.  With that disclaimer in mind, here is my take on the validation process. 

The aim of a task seems to be to return 'candidate signals', which I interpret to be related to gamma-ray counts coming from parts of the sky where there seems to be a potential peak in the count rate above the supposed background rate.  The last part of a task is to re-evaluate the top ten candidates in double precision.  This gives the immediate impression that there will always be some variability arising from different hardware/software environments, so every attempt needs to be made to minimise the variations.

I don't know exactly what parameters are compared between the two results undergoing validation but it is expected that there will always be small discrepancies.  The validator uses certain 'tolerances' when doing the comparison.  If the differences are within these tolerances for all the parameters being assessed, then both tasks are declared valid.  If not, the status of both results becomes "Checked but no consensus yet" - in other words 'Inconclusive', rather than immediately invalid.

A third task is then sent out and when those results are sent back, all three sets are compared again.  The most likely outcome is that two will be found that do agree within the tolerances.  However it's entirely possible that all three might 'agree' - the third result fell in between the other two so all three are now close enough - or it could be that there is still no 'close enough' agreement and a 4th task is sent out.
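The two steps above - pairwise comparison within tolerances, then a resend to break the deadlock - can be sketched as follows. The parameter lists, the tolerance value, and the function names are all made up for illustration; the real FGRPB1G validator is certainly more involved:

```python
# Hypothetical sketch of tolerance-based validation with a quorum:
# a workunit validates once any two returned results agree within
# tolerance.  The tolerance value here is an assumption, not the
# project's actual setting.
from itertools import combinations

TOLERANCE = 1e-5  # assumed relative tolerance

def agree(res_a, res_b, tol=TOLERANCE):
    """True if every compared parameter matches within the tolerance."""
    return all(
        abs(x - y) <= tol * max(abs(x), abs(y), 1.0)
        for x, y in zip(res_a, res_b)
    )

def find_consensus(results):
    """Return the first pair of results that agree, or None.

    With two results this models the initial comparison; with three it
    models the 'deadlock breaker' resend described above.
    """
    for a, b in combinations(results, 2):
        if agree(a, b):
            return (a, b)
    return None  # still inconclusive - another task would be sent out

# Two results that differ slightly -> inconclusive; a third breaks the tie.
r1, r2, r3 = [1.00000, 2.5], [1.00010, 2.5], [1.00001, 2.5]
print(find_consensus([r1, r2]))      # None ("Checked but no consensus yet")
print(find_consensus([r1, r2, r3]))  # r1 and r3 agree within tolerance
```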

The point of my post was to suggest that the standard app and the anonymous platform app might be returning results with just enough of a difference to be causing a rise in inconclusives.  If so, all parties are being disadvantaged.  The project needs to send out more 'resend' tasks than otherwise and each of the volunteers involved has the chance of an otherwise good result being rejected based on the chance event of what type of app is used to process the resend.

I don't know for sure if there really is a 'rising inconclusives' problem with the FGRPB1G search.  If people responding to Bernd's request for information are not seeing rising numbers of inconclusives (which ultimately lead to rising invalids) on the other GPU searches, then it tends to suggest that FGRPB1G might have one.

Cheers,
Gary.

Tom M
Tom M
Joined: 2 Feb 06
Posts: 5,962
Credit: 8,045,962,397
RAC: 5,721,325

It would be interesting to

It would be interesting to see if the "inconclusives" would go down if a 4090 were using the standard/stock Linux app rather than the AIO/Optimized app.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Ian&Steve C.
Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,799
Credit: 37,399,711,831
RAC: 50,537,603

Tom M wrote:It would be

Tom M wrote:

It would be interesting to see if the "inconclusives" would go down if a 4090 were using the standard/stock Linux app rather than the AIO/Optimized app.

Tom M

Read back in the thread. That's been tried, and no, it makes no difference. Still high invalids with the stock app. Same with Windows, which does not have an optimized app.

I have a gut feeling that the problem lies in the Nvidia driver, not in the application(s).


Tom M
Tom M
Joined: 2 Feb 06
Posts: 5,962
Credit: 8,045,962,397
RAC: 5,721,325

Ian&Steve C. wrote: Tom M

Ian&Steve C. wrote:

Tom M wrote:

It would be interesting to see if the "inconclusives" would go down if a 4090 were using the standard/stock Linux app rather than the AIO/Optimized app.

Tom M

Read back in the thread. That's been tried, and no, it makes no difference. Still high invalids with the stock app. Same with Windows, which does not have an optimized app.

I have a gut feeling that the problem lies in the Nvidia driver, not in the application(s).

Thank you. I missed that post.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Boca Raton Community HS
Boca Raton Comm...
Joined: 4 Nov 15
Posts: 216
Credit: 8,591,848,661
RAC: 1,997,583

Gary Roberts wrote: Boca

Gary Roberts wrote:

Boca Raton Community HS wrote:
... Could two systems come up with the SAME wrong/inconclusive result?

My simple answer would be NO.  If two systems came up with exactly the SAME set of results, they would be declared 'valid' even if (technically) they were actually wrong :-).  An inconclusive result is neither right nor wrong.  It's impossible to know until further results are analysed.

I'm just an ordinary volunteer like yourself.  I have no background in theoretical physics so my knowledge (such as it is) is just from what I've read or listened to.  With that disclaimer in mind, here is my take on the validation process. 

The aim of a task seems to be to return 'candidate signals', which I interpret to be related to gamma-ray counts coming from parts of the sky where there seems to be a potential peak in the count rate above the supposed background rate.  The last part of a task is to re-evaluate the top ten candidates in double precision.  This gives the immediate impression that there will always be some variability arising from different hardware/software environments, so every attempt needs to be made to minimise the variations.

I don't know exactly what parameters are compared between the two results undergoing validation but it is expected that there will always be small discrepancies.  The validator uses certain 'tolerances' when doing the comparison.  If the differences are within these tolerances for all the parameters being assessed, then both tasks are declared valid.  If not, the status of both results becomes "Checked but no consensus yet" - in other words 'Inconclusive', rather than immediately invalid.

A third task is then sent out and when those results are sent back, all three sets are compared again.  The most likely outcome is that two will be found that do agree within the tolerances.  However it's entirely possible that all three might 'agree' - the third result fell in between the other two so all three are now close enough - or it could be that there is still no 'close enough' agreement and a 4th task is sent out.

The point of my post was to suggest that the standard app and the anonymous platform app might be returning results with just enough of a difference to be causing a rise in inconclusives.  If so, all parties are being disadvantaged.  The project needs to send out more 'resend' tasks than otherwise and each of the volunteers involved has the chance of an otherwise good result being rejected based on the chance event of what type of app is used to process the resend.

I don't know for sure if there really is a 'rising inconclusives' problem with the FGRPB1G search.  If people responding to Bernd's request for information are not seeing rising numbers of inconclusives (which ultimately lead to rising invalids) on the other GPU searches, then it tends to suggest that FGRPB1G might have one.

 

This is an amazingly well-thought-out explanation and makes complete sense. Thank you for taking the time to write this post. I wonder if the 4090 is coming up with a [somewhat] different list of candidates, or if the issue is in the double-precision evaluation.

I will be trying one of the 4090 GPUs on some of the other GPU tasks this upcoming week to see what happens. 

Ian&Steve C. wrote:

I have a gut feeling that the problem lies in the Nvidia driver, not in the application(s).

I just installed new drivers a day or two ago - it will be interesting to see if anything changes. Just out of curiosity - would the workstation/professional version of the Nvidia driver work on the 4090 (Linux Mint)? The A6000 Ada is the same chip, but I have no idea if anything would change or if this is possible on Linux.

 
