Hello, recently some of my WUs have been marked as invalid. Checking the computers my results were validated against, they're all Radeons. Couldn't the cause be a bug in the NVIDIA application?
I've clocked the GPUs back to factory settings (though they're factory-OC'ed). OS: Ubuntu 16.04 LTS, driver 375.26. Regards.
Not likely. If there were such a bug, wouldn't all NVIDIA GPUs be returning invalid results?
Your computers are hidden so none of us can look at your returned results to see if there might be any hints as to other possible reasons.
Cheers,
Gary.
Sorry, I've changed the necessary privacy setting; now it should be visible. Given my high ratio of successful to invalid results, I'm just trying to nail down the problem. Regards.
It seems that the WUs' duration has increased since yesterday - is that correct?

Bernd Machenschalk wrote:
Not systematically, i.e. not in the "size" of the workunits, which is what we control. However, the duration of the last part of the computation is data-dependent. If your GPU isn't capable of double-precision computation, this part is done on the CPU and will make a noticeable contribution to the overall runtime.
I've noted around a 35% increase on my HD7950 (crunching 3 simultaneous units) and around 15% on my GTX1080 (crunching 4 at a time). There have been no changes on my side, rebooting did not restore the previous times, and the HD7950's double-precision capability is very good even compared to the GTX 1080's.
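(For anyone wanting to check this on their own cards: the double-precision support the app sees can be queried through OpenCL. A minimal sketch in Python, assuming the pyopencl package is installed; the throughput ratios in the closing comment are the published FP64 figures for these chips, not something the query returns.)

import pyopencl as cl

# List every GPU visible to OpenCL and whether it advertises the
# cl_khr_fp64 extension, i.e. double-precision support.
for platform in cl.get_platforms():
    for dev in platform.get_devices(cl.device_type.GPU):
        fp64 = "cl_khr_fp64" in dev.extensions
        print(f"{dev.name}: fp64 {'yes' if fp64 else 'no'}")

# The query only reports support, not speed: the HD 7950 (Tahiti) runs
# FP64 at 1/4 of its FP32 rate, while the GTX 1080 is capped at 1/32,
# which is why the older AMD card holds up so well on this part.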
Arif Mert Kapicioglu
)
I had a look at your host with the GTX 1080 GPU. Your original message didn't really make clear just how low the invalid rate is - 4 tasks (each with about 15 mins crunch time) in the last 6 days. That's an extremely low rate and is most likely due to some implementation differences between the NVIDIA and AMD versions of OpenCL. I didn't delve into the individual quorums to check your statement that "they're all Radeons", as that seems a likely scenario :-).
At this very early stage of testing these new apps, the validation process on the results returned may be overly sensitive to minor differences. The Devs will be monitoring this closely (as they always have in the past) to optimise the validator so that 'good' results with minor and unimportant differences are not needlessly rejected.
It's a similar story with the GTX 980 - 6 invalid results over the same time period. Apart from one aborted task, you have no 'error' results, so your computers are in good shape. Keep monitoring your results and raise a query if you suddenly see a larger fraction (for example, several percent) of them being marked as invalid.
Cheers,
Gary.
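(To put that "extremely low rate" in perspective, here is a rough back-of-the-envelope calculation in Python. Only the 4-6 invalids and the ~15-minute crunch time come from the thread; the task totals are assumptions for illustration.)

# Hypothetical totals for a GTX 1080 running 4 tasks at a time,
# ~15 minutes each, over the 6 days Gary looked at.
tasks_per_day = 4 * (24 * 60) / 15      # ~384 tasks/day (assumed)
total_tasks = tasks_per_day * 6         # ~2304 tasks in 6 days

for invalids in (4, 6):                 # GTX 1080 and GTX 980 counts
    rate = 100.0 * invalids / total_tasks
    print(f"{invalids} invalid of ~{total_tasks:.0f} tasks -> {rate:.2f}%")

# Both come out well under 0.3%, far below the "several percent" level
# at which Gary suggests raising a query.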
Could somebody please check these numbers and explain to me what is happening?
1 WU crunches in about 800 secs, running 3 at a time on each 1070 GPU (2 of them on this host).
So that's 6 WUs crunched every 800 secs, or 648 WUs in a day!
With 693 credits each, that gives about 440K per day - almost double what BRP4G gave.
But somebody decided that's not enough, and the host started showing this message... and gets no new work:
17/12/2016 15:17:55 | Einstein@Home | (reached daily quota of 768 tasks)
Is my mind playing tricks on me?
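(juan's arithmetic checks out; here it is as a small Python sketch, using only the numbers quoted in his post.)

SECONDS_PER_DAY = 24 * 60 * 60           # 86400

tasks_in_flight = 2 * 3                  # two GTX 1070s, 3 tasks each
seconds_per_task = 800
credit_per_task = 693

batches_per_day = SECONDS_PER_DAY / seconds_per_task    # 108
tasks_per_day = tasks_in_flight * batches_per_day       # 648
credit_per_day = tasks_per_day * credit_per_task        # ~449,000

print(f"{tasks_per_day:.0f} tasks/day, about {credit_per_day:,.0f} credits/day")

# 648 tasks/day sits just under the host's 768-task daily quota,
# which is why the scheduler message above appears.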
juan BFP wrote:
(reached daily quota of 768 tasks)
There is a daily maximum task download limit enforced by the project. It varies a bit by machine (I have a quad-core host with a 1060 and a 1070 which has been restricted to a limit of 640).
If you reach the point of running out of work, try a manual update in case the "new day" has arrived. The automatically imposed deferral was not properly matched to the actual server "new day" boundary on my system, and it might not be on yours either. I don't know exactly where the boundary lies, but it was not at midnight UTC - somewhere between 03:00 and 13:00 UTC in my case.
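(A small Python sketch of the timing question archae86 raises: will the local queue run dry before the quota resets? The boundary hours are just the two ends of the window he observed, and the queue size and throughput are assumed for illustration.)

from datetime import datetime, timedelta, timezone

def hours_until(boundary_hour_utc):
    # Hours from now until the next occurrence of the given UTC hour.
    now = datetime.now(timezone.utc)
    boundary = now.replace(hour=boundary_hour_utc, minute=0,
                           second=0, microsecond=0)
    if boundary <= now:
        boundary += timedelta(days=1)
    return (boundary - now).total_seconds() / 3600

cached_tasks = 50                 # tasks left in the local queue (assumed)
tasks_per_hour = 648 / 24         # throughput from juan's numbers above

dry_in = cached_tasks / tasks_per_hour
for h in (3, 13):                 # ends of the observed reset window
    print(f"boundary {h:02d}:00 UTC in {hours_until(h):.1f} h; "
          f"queue dry in {dry_in:.1f} h")

# If the queue runs dry before the boundary passes, a manual update
# right after it is the move archae86 suggests.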
[Update:]
One more Invalid on MAC due to OpenCL Bug. Total of 5 Invalids listed on Web Results.
I will continue monitoring and reporting.
Windows is continuing to do well with the new 1.16 FGRPB1G Units. Still completing tasks 2 at a time in 1 Hr 2 Min, and 1 Hr 3 Min respectively.
I've noticed some Arecibo BRP4G units NOT validating... "Validation Error" is coming up on the two in the web results list. Looking at those units, they've gone out several more times to more computers, and ALL are getting "Validation Error". One is marked as "WU Cancelled"; the other is not yet marked as such. I still have MANY pending BRP4G units that may now end up with the same "Validation Error" from the server. If all of these were cancelled, the server(s) SHOULD have contacted our machines to stop crunching them. Instead, my machine (and others) crunched through a plethora of these units.
TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees
Hi all,
I need some help here - I'm getting wildly different runtimes on two computers with identical R9 280Xs. One gets runtimes of about 975 s where the other gets about 420 s. LuxMark shows similar behaviour (4300 vs 12700). Both machines are running exclusively FGRP tasks (one at a time for now, until I have this sorted out). Both are idle apart from E@H and have ample CPU and RAM to support the GPUs (the slower one has 2x L5630, 16 threads; the faster one 2x X5670, so 24 threads; both have 24 GB RAM). Both run OS X El Capitan. I have tried swapping the GPUs to rule out a dodgy card and got identical results, with the slow machine remaining the slower one. BIOS settings are identical.
The only difference I can see is that one has its GPU in a PCIe x8 slot (a server board without any x16 slots) whereas the other has the GPU in a proper x16 slot. My understanding so far, however, was that the reduced bandwidth of the x8 slot shouldn't affect OpenCL performance, at least not to this extent.
Does anyone have ideas about what I could try to at least reduce the difference in computation performance?
Many thanks in advance,
Kailee.
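(One way to test the x8-vs-x16 theory directly is to time a large host-to-device copy on both machines. A minimal sketch with pyopencl, assumed to be installed on both hosts; the bandwidth figures in the closing comment are rough rules of thumb, not measurements from these machines.)

import time
import numpy as np
import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices(cl.device_type.GPU)[0]
ctx = cl.Context([device])
queue = cl.CommandQueue(ctx)

host_data = np.random.rand(8 * 1024 * 1024)   # 64 MiB of float64
buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host_data.nbytes)

cl.enqueue_copy(queue, buf, host_data).wait()  # warm-up transfer
t0 = time.perf_counter()
for _ in range(10):
    cl.enqueue_copy(queue, buf, host_data).wait()
elapsed = time.perf_counter() - t0

gib_per_s = 10 * host_data.nbytes / elapsed / 2**30
print(f"{device.name}: ~{gib_per_s:.2f} GiB/s host->device")

# As a rough guide, PCIe 2.0 x8 manages about 3 GiB/s in practice and
# x16 about 6 GiB/s. If the slower machine reports a much lower figure,
# the x8 slot is a plausible culprit; if both report similar numbers,
# look elsewhere (clocks, driver, thermals).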
Since E@H is very bandwidth-hungry, I guess it's caused by the different number of PCIe lanes. But I can't say more about it since I'm no expert.
Proud member of SETI.Germany