Why did I get zero credit on this WU?
Because you had hosts running Linux as wingmen on the WU.
This is a known issue with the S5R2 app at this point, and the team has been working hard for about a month to nail down the exact cause and fix it.
Keep in mind that S5R2 is essentially a Beta run for the next full scale science runs to follow.
Alinator
RE: Because you had hosts
Thanks for the quick reply. Does this mean that I am not going to get any credit for that WU? I hope not, as it takes about a day to process these WUs. Anything I can do?
Unfortunately, that's what it means at this point, and most of us have gotten burned on the cross-platform validation issue at least once. OTOH, since you are on Windows, the odds are in your favor that your wingman will be another Winbox.
As I said, the team is working on it, but this app is an all-new approach to the way we analyze the data, and getting all the 'pieces/parts' to work together smoothly seems to be proving a little tough. Also, there really isn't anything you can do at your end except keep an eye on the NC forum for any news from Bernd or the rest of the team regarding the status.
I know it's a little discouraging to get 'zipped' with the runtimes such as they are right at the moment, but the payback will be a much better app when we get to the next full-length science run later on.
Alinator
RE: http://einstein.phys.uw
As Alinator explained, Windows and Linux sometimes produce slightly different answers, just different enough for the result to be sent to a third computer to get another opinion. The law of averages says that the third opinion is likely to come from another Windows box so it is usually the Linux box that gets shafted.
If you look in detail at other "pendings" in your results list you will see that this is going to happen again but you are unlikely to miss out again. This time the third box is another Windows box so you should be safe and the Linux guy will be shafted - as usual :).
EDIT: To help you find it quickly, here is a link to the WU where the Linux guy is likely to miss out. If you actually look at his full results list, you will find he has 16 completed results showing. There is one where he has already received zero credit and there are three further pendings which are "checked but no consensus yet". It is quite likely that he will miss out on 25% of the credit in his current list because of problems with the validator. As a significant Linux user myself, I'm very frustrated with this ongoing waste of resources.
Cheers,
Gary.
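To make the "third opinion" mechanism concrete, here is a minimal sketch of quorum validation. It is not the project's actual validator: the function names, the tolerance value, and the host labels are all assumptions made up for illustration. Two results that disagree leave the WU without consensus, and a third result then settles it in favor of whichever pair agrees.

```python
import math

def results_agree(a, b, rel_tol=1e-5):
    """Two results agree if every value matches within a relative tolerance.
    The tolerance is an illustrative assumption, not the project's real one."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def validate(results, min_quorum=2, rel_tol=1e-5):
    """Return the hosts in the largest agreeing group, or an empty set
    if no group reaches the quorum ("checked but no consensus yet")."""
    best = set()
    for anchor in results:
        group = {
            host for host in results
            if results_agree(results[anchor], results[host], rel_tol)
        }
        if len(group) > len(best):
            best = group
    return best if len(best) >= min_quorum else set()

# Hypothetical result values: one Windows box and one Linux box disagree slightly.
wu = {"windows_a": [1.000000, 2.500000], "linux_b": [1.000200, 2.500000]}
print(validate(wu))          # no consensus, so a third copy gets sent out

# The third opinion comes from another Windows box that matches the first.
wu["windows_c"] = [1.0000001, 2.5]
print(sorted(validate(wu)))  # the two Windows hosts form the quorum
```

In this sketch the Linux host is the odd one out and would be granted zero credit, even though its answer is no less plausible than the others.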
RE: EDIT: To help you
As a Linux user myself, I would suggest that this user upgrade his 2.6.14.xx kernel to a 2.6.16.xx kernel. I have never had any validation problems with Windows users on my PII Deschutes running SuSE Linux 10.1.
Tullio
Besides, it's not really a platform validation problem.
http://einsteinathome.org/workunit/33732455
In this workunit, all three computers are Windows Intel boxes (the first two are Pentium 4s on Windows XP, the third is a Celeron on W2k).
Even the version (4.17) of the app was the same between the boxes.
Maybe they are focusing on the wrong issue...
Nice hit!
1. Intel Pentium 4 CPU 2.80GHz [x86 Family 15 Model 2 Stepping 9]
   Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
   core client version: 5.8.15
   time: 245,290.30
   granted credit: 0.00
2. Intel Pentium 4 CPU 3.00GHz [x86 Family 15 Model 6 Stepping 2]
   Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)
   core client version: 5.9.10
   time: 254,179.84
   granted credit: 502.45
3. Intel Celeron CPU 2.60GHz [x86 Family 15 Model 2 Stepping 9]
   Microsoft Windows 2000 Professional Edition, Service Pack 4, (05.00.2195.00)
   core client version: 5.10.1
   time: 222,296.13
   granted credit: 502.45
RE: Even the version
IIRC, Bernd mentioned that there are two known issues wrt validation:
- One is the x-platform validation problem. This is about Windows boxes computing different results than Linux or Darwin boxes in some cases. It's a fact and can be reproduced. This issue is not solved yet and is under investigation.
- Another issue has something to do with the way the app writes and reads checkpoint files. AFAIK you can get slightly different results depending on the number of times your app terminated and restarted during the course of a workunit. This can prevent hosts of the same platform from validating against each other. Hosts that are configured not to retain the application in memory while suspended should be affected more frequently. This particular issue was fixed in the beta, IIRC.
Right??
Apart from that, you will always see a few same-platform validation errors from boxes that were overclocked just a bit too aggressively.
CU
BRM
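For anyone wondering how a checkpoint could change the answer at all, here is a toy sketch of one possible mechanism. It is purely an assumption for illustration, not a description of what the S5R2 app really does: the hypothetical `run` function writes its running total to a text checkpoint with limited precision, so every restart reloads a slightly rounded state.

```python
def run(terms, restart_at=(), digits=6):
    """Sum `terms`, simulating a restart after each index in `restart_at`.

    On a restart the running total is round-tripped through a text
    checkpoint that keeps only `digits` significant digits. This
    checkpoint format is a made-up example, not the app's real one.
    """
    restarts = set(restart_at)
    total = 0.0
    for i, term in enumerate(terms):
        total += term
        if i in restarts:
            checkpoint = f"{total:.{digits}g}"  # write state to "disk"
            total = float(checkpoint)           # read it back after restart
    return total

terms = [1.0 / (k * k) for k in range(1, 2001)]

uninterrupted = run(terms)
restarted_once = run(terms, restart_at=[500])
restarted_thrice = run(terms, restart_at=[300, 900, 1500])

print(uninterrupted, restarted_once, restarted_thrice)
print(uninterrupted == restarted_once)  # False: the restart changed the result
```

The differences are tiny, but a validator that compares results within a tight tolerance can still reject one of two hosts on the same platform. That would also fit with hosts that unload the app while suspended being hit more often, since they restart from the checkpoint more frequently.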
RE: RE: Even the version
No overclocking here :(
RE: As a Linux user myself,
That won't help. I just got 0 credit running the 2.6.20 kernel. Besides, the OP is running Windows, not Linux.
http://einsteinathome.org/workunit/34017383