I just checked the two 4090s running Einstein's Gamma-Ray Pulsar Binary search #1 on GPUs, a.k.a. FGRPB1G, and their invalid rate is 17% - 18%, though both of them are running Windows.
That's a bit above the average - 4090s seem to have about 15% invalids on average (overall FGRPB1G invalid average is 2.5%).
It's pretty hard to track which card produced which result in ~200k results per day. I looked into only a few such results, and it doesn't look like a precision problem to me right now. It could be, though, that if the driver (=compiler) or the kernel scheduler changes the execution order of operations too much, the comparisons might yield a different result. No idea how to prevent this, though.
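To illustrate what I mean (a minimal sketch with made-up numbers, not anything from the actual application): single-precision addition is not associative, so the same inputs combined in a different order can give a slightly different result, and a strict comparison of two otherwise correct results can then disagree.

#include <stdio.h>

int main(void)
{
    /* made-up values; the real task data is of course different */
    float v[4] = { 1.0e8f, 4.0f, 4.0f, -1.0e8f };

    /* plain left-to-right sum */
    float serial = 0.0f;
    for (int i = 0; i < 4; i++)
        serial += v[i];

    /* the same numbers combined in a different (tree-like) order,
       the way a parallel reduction on a GPU might group them */
    float reordered = (v[0] + v[3]) + (v[1] + v[2]);

    printf("left-to-right sum: %g\n", serial);    /* prints 0 */
    printf("reordered sum:     %g\n", reordered); /* prints 8 */
    return 0;
}

Same numbers, same precision, different grouping, different answer - so two hosts can both be "correct" and still not match bit for bit.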
It seems that my invalid rates on the two 4090 systems have dropped a little - maybe a recent Nvidia driver update had a small impact - I am really not sure (host and host). I have not really changed anything else. If you ever want us to try anything on our end, we are more than willing to experiment with these GPUs. This is definitely a 40xx series issue - take a look at this host running two 4070 GPUs (Pokey - not trying to pick on your PC, just trying to figure out the invalid issue on these GPUs).
Boca Raton Community HS wrote:
It seems that my invalid rates on the two 4090 systems have dropped a little ...
This is definitely a 40xx series issue - take a look at this host running two 4070 GPUs ...
When you examine results on the website, there's an important bit of context that is missing - the ability to easily monitor inconclusive results. It would be very nice to have a separate column to list these, as well as the errors and invalids. At any point in time, if you take the number for "All" and subtract all the other categories (In Progress, Pending, Valid, etc), there may be a 'left-over' amount (hopefully very small) which can be labeled as 'Inconclusive'. These are results described as "Checked but no consensus yet", if you search through the entire list to find them.
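To make the arithmetic explicit, here is a trivial sketch of that 'left-over' calculation, using the figures for the first host in the table below:

#include <stdio.h>

int main(void)
{
    /* counts for host 13125618, taken from the table below */
    int all = 3946;
    int in_progress = 162, pending = 1159, valid = 2284, invalid = 177, error = 16;

    /* whatever "All" doesn't account for in the listed categories is the
       inconclusive count */
    int inconclusive = all - (in_progress + pending + valid + invalid + error);

    printf("inconclusive = %d\n", inconclusive);  /* prints 148 */
    return 0;
}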
For the three hosts you mention, here are the numbers that existed at a certain point a little earlier today:-
HostID      All     In_Progress  Pending  Valid  Invalid  Error  Inconclusive
13125618    3946    162          1159     2284   177      16     148
13126606    3820    177          1136     2197   167      0      143
12986942    11854   888          2180     7375   1066     0      345
As you can see, there are quite a few Inconclusives. Some will ultimately become valid whilst others get rejected. A lot depends on the third host that gets selected as the 'deadlock breaker'.
I've noticed increasing numbers of inconclusives on my hosts. I ran some very limited checks and found a tendency for inconclusives to happen when the normal app was matched against the Petri app run under the 'anonymous platform' mechanism. In that case the outcome depends on the third task. Since the resend is more likely to go to a host using the regular app, the host most likely to lose out is the one running the anonymous platform app. Because those hosts are faster and their numbers are likely to grow, the situation could reverse in future if the FGRPB1G search keeps going for a while.
There might be a solution but it's probably quite unlikely to happen. If the FGRPB1G search was split into two separate streams, one for the regular apps and one for anonymous platform apps, the ultimate rate for invalid results might improve quite a bit. At least it might reverse the rising level of inconclusives that seems to be occurring :-).
Thank you for this post! Definitely helpful. Here is my question (and it is hard to put into words). Could two systems come up with the SAME wrong/inconclusive result? Is a result being inconclusive (because it doesn't match the other system and the tiebreaker) a product of the calculation being done incorrectly on the local machine? Could that calculation be done incorrectly, in the same way, with the same result, on a different system?
I hope that makes sense.
Bernd Machenschalk wrote:
I'm curious - does the 4090 exhibit the same behavior on BRP7 and GW (currently O3MDF)?
I can try next week - these two systems are powered down over the weekend (no AC in the building).
Boca Raton Community HS wrote:
... Could two systems come up with the SAME wrong/inconclusive result?
My simple answer would be NO. If two systems came up with exactly the SAME set of results, they would be declared 'valid' even if (technically) they were actually wrong :-). An inconclusive result is neither right nor wrong. It's impossible to know until further results are analysed.
I'm just an ordinary volunteer like yourself. I have no background in theoretical physics so my knowledge (such as it is) is just from what I've read or listened to. With that disclaimer in mind, here is my take on the validation process.
The aim of a task seems to be to return 'candidate signals', which I interpret to be related to gamma-ray counts coming from parts of the sky where there seems to be a potential peak in the count rate above the supposed background rate. The last part of a task is to re-evaluate the top ten candidates in double precision. This gives the immediate impression that there will always be some variability arising from different hardware/software environments, so every attempt needs to be made to minimise the variations.
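To show the sort of small difference I mean (a toy example with invented numbers, nothing to do with the real search code), the same few terms summed in single and then in double precision already disagree in the decimals:

#include <stdio.h>

int main(void)
{
    /* invented numbers standing in for one candidate's contributions */
    float terms[3] = { 1.0e7f, 3.25f, 0.1f };

    float  single_sum = 0.0f;
    double double_sum = 0.0;
    for (int i = 0; i < 3; i++) {
        single_sum += terms[i];   /* float spacing near 1e7 is 1.0, so the fractions are lost */
        double_sum += terms[i];   /* double precision keeps them */
    }

    printf("single precision: %.9f\n", (double)single_sum);  /* 10000003.000... */
    printf("double precision: %.9f\n", double_sum);          /* 10000003.350... */
    return 0;
}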
I don't know exactly what parameters are compared between the two results undergoing validation but it is expected that there will always be small discrepancies. The validator uses certain 'tolerances' when doing the comparison. If the differences are within these tolerances for all the parameters being assessed, then both tasks are declared valid. If not, the status of both results becomes "Checked but no consensus yet" - in other words 'Inconclusive', rather than immediately invalid.
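I don't know the actual parameters or tolerance values the project uses, but purely to illustrate the idea, a tolerance-based comparison of two results might look something like this (the field names, numbers and tolerances below are all invented):

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

struct candidate {            /* hypothetical per-candidate summary */
    double frequency;
    double power;
};

/* two numbers "agree" if they differ by less than an absolute floor or a
   relative fraction of their magnitude */
static bool close_enough(double a, double b, double rel_tol, double abs_tol)
{
    double diff  = fabs(a - b);
    double scale = fmax(fabs(a), fabs(b));
    return diff <= fmax(abs_tol, rel_tol * scale);
}

static bool candidates_match(const struct candidate *x, const struct candidate *y)
{
    /* invented tolerances - the real ones could be anything */
    return close_enough(x->frequency, y->frequency, 1e-9, 1e-12) &&
           close_enough(x->power,     y->power,     1e-4, 1e-6);
}

int main(void)
{
    struct candidate a = { 123.456789012, 41.302 };   /* invented numbers */
    struct candidate b = { 123.456789015, 41.305 };

    printf("%s\n", candidates_match(&a, &b) ? "within tolerance -> valid pair"
                                            : "outside tolerance -> inconclusive");
    return 0;
}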
A third task is then sent out and when those results are sent back, all three sets are compared again. The most likely outcome is that two will be found that do agree within the tolerances. However it's entirely possible that all three might 'agree' - the third result fell in between the other two so all three are now close enough - or it could be that there is still no 'close enough' agreement and a 4th task is sent out.
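Again, this is only a sketch of that quorum idea as I understand it - certainly not the project's real validator - with each result boiled down to a single made-up number:

#include <math.h>
#include <stdio.h>

#define TOL 1e-3   /* invented tolerance */

/* look for any two results that agree within TOL; returns 0 and their
   (zero-based) indices if found, -1 if another task is needed */
static int find_consensus(const double *results, int n, int *i_out, int *j_out)
{
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (fabs(results[i] - results[j]) <= TOL) {
                *i_out = i;
                *j_out = j;
                return 0;
            }
    return -1;
}

int main(void)
{
    /* hypothetical returns from three hosts: the first two disagree with
       each other, but the third lands close enough to the second */
    double results[3] = { 41.302, 41.309, 41.3091 };
    int i, j;

    if (find_consensus(results, 3, &i, &j) == 0)
        printf("results %d and %d agree -> they set the canonical result\n", i, j);
    else
        printf("no consensus yet -> send out another task\n");
    return 0;
}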
The point of my post was to suggest that the standard app and the anonymous platform app might be returning results with just enough of a difference to be causing a rise in inconclusives. If so, all parties are being disadvantaged. The project needs to send out more 'resend' tasks than it otherwise would, and each of the volunteers involved runs the risk of an otherwise good result being rejected, based purely on which type of app happens to process the resend.
I don't know for sure if there really is a 'rising inconclusives' problem with the FGRPB1G search. If people responding to Bernd's request for information about other GPU searches are not seeing rising numbers of inconclusives that ultimately lead to rising invalids, then it tends to suggest that there might be one that is specific to FGRPB1G.
It would be interesting to see if the "inconclusive" were to go down if a 4090 was using the standard/stock Linux app rather than the AIO/Optimized app.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
read back in the thread. that's been tried and no it makes no difference. still high invalids with the stock app. Same with Windows, which does not have an optimized app.
I have a gut feeling that the problem lies in the Nvidia driver, not in the application(s).
Thank you. I missed that post.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Gary Roberts wrote:
My simple answer would be NO. If two systems came up with exactly the SAME set of results, they would be declared 'valid' even if (technically) they were actually wrong ...
This is an amazingly thought-out explanation and makes complete sense. Thank you for taking the time to write this post. I wonder if the 4090 is coming up with a [somewhat] different list of candidates or if the issue is in the double-precision evaluation.
I will be trying one of the 4090 GPUs on some of the other GPU tasks this upcoming week to see what happens.
Ian&Steve C. wrote:
I have a gut feeling that the problem lies in the Nvidia driver, not in the application(s).
I just installed new drivers a day or two ago - it will be interesting to see if anything changes. Just out of curiosity, would the workstation/professional version of the Nvidia driver work on the 4090 (Linux Mint)? The A6000 Ada is the same chip, but I have no idea if anything would change or if this is possible on Linux.
Thanks for the feedback. I will try BRP7 over the weekend.
BTW: There are no errors with GW-WUs on the 4090.