Gamma-ray pulsar binary search #1 on GPUs

Arif Mert Kapicioglu
Joined: 16 Jul 09
Posts: 7
Credit: 823300983
RAC: 0

Hello, recently some of my WUs have been marked as invalid. Checking the validators' computers, they're all Radeons. Couldn't the cause be a bug in the NVIDIA application?

I've clocked the GPUs back to factory settings (though they're still factory overclocked). The OS is Ubuntu 16.04 LTS and the driver is 375.26. Regards.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117628099569
RAC: 35210296

Arif Mert Kapicioglu wrote:
Couldn't the cause be a bug in the NVIDIA application?

Not likely. If there were such a bug, wouldn't all NVIDIA GPUs be returning invalid results?

Your computers are hidden so none of us can look at your returned results to see if there might be any hints as to other possible reasons.

 

Cheers,
Gary.

Arif Mert Kapicioglu
Joined: 16 Jul 09
Posts: 7
Credit: 823300983
RAC: 0

Sorry, I've changed the necessary privacy setting. Now it should be visible. Considering my high successful/invalid ratio, I'm just trying to nail the problem. Regards.

Trotador
Joined: 2 May 13
Posts: 58
Credit: 2122643213
RAC: 0

Bernd Machenschalk wrote:
Trotador wrote:
It seems that the WU duration has increased since yesterday. Is that correct?

Not systematically, i.e. not in the "size" of the workunits, which is the part under our control.

However, the duration of the last part of the computation is data-dependent. If your GPU isn't capable of double precision computation, this part is done on the CPU and will have a noticeable contribution to the overall runtime.

I've noted an increase of around 35% on my HD7950 (crunching 3 units simultaneously) and around 15% on my GTX 1080 (crunching 4 at a time). Nothing has changed on my side, rebooting did not restore the previous times, and the HD7950's double precision capability is very good, even compared to the GTX 1080's.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117628099569
RAC: 35210296

Arif Mert Kapicioglu wrote:
Sorry, I've changed the necessary privacy setting. Now it should be visible. Considering my high successful/invalid ratio, I'm just trying to nail the problem. Regards.

I had a look at your host with the GTX 1080 GPU. Your original message didn't make it clear how low the invalid rate really is: 4 tasks (about 15 mins crunch time) in the last 6 days. That's an extremely low rate and is most likely due to some implementation differences between the NVIDIA and AMD versions of OpenCL. I didn't delve into the individual quorums to check your statement about "they're all Radeons" as that seems to be a likely scenario :-).

At this very early stage of testing these new apps, the validation process on the results returned may be overly sensitive to minor differences.  The Devs will be monitoring this closely (as they always have in the past) to optimise the validator so that 'good' results with minor and unimportant differences are not needlessly rejected.

It's a similar story with the GTX 980 - 6 invalid results over the same time period.  Apart from one aborted task, you have no 'error' results so your computers are in good shape.  You should continue to monitor your results and raise a query if you suddenly see larger numbers (for example several percent) of your results being marked as invalid.
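The "raise a query at several percent" rule of thumb above can be sketched as a small check. This is only an illustration: the 3% threshold and the total task count in the example are assumptions, not figures from the project or from the hosts discussed, and both helper names are hypothetical.

```python
def invalid_rate(invalid: int, total: int) -> float:
    """Fraction of returned tasks that were marked invalid."""
    if total <= 0:
        raise ValueError("total must be a positive task count")
    return invalid / total


def needs_attention(invalid: int, total: int, threshold: float = 0.03) -> bool:
    """True when the invalid rate reaches the threshold.

    The default of 0.03 is an assumed reading of "several percent".
    """
    return invalid_rate(invalid, total) >= threshold


# 4 invalid results, as reported for the GTX 1080 host; the total of
# 2000 returned tasks is a made-up placeholder, not a real figure.
print(needs_attention(4, 2000))
```

Substitute the real totals from the host's task list on the website before drawing any conclusion.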

 

Cheers,
Gary.

juan BFP
Joined: 18 Nov 11
Posts: 839
Credit: 421443712
RAC: 0

Could somebody please check these numbers and explain to me what is happening?

One WU crunches in about 800 secs, running 3 at a time on each 1070 GPU (2 on this host).

So that's 6 WUs crunched every 800 secs, or 648 WUs (!) in a day.

With 693 credits each, that gives about 440K per day. Almost double what I got with BRP4G.

But somebody decides that is not enough, and the host starts to show this message and gets no new work:

17/12/2016 15:17:55 | Einstein@Home | (reached daily quota of 768 tasks)

Is my mind playing tricks on me?
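For reference, the arithmetic in this post can be reproduced with a short sketch. It assumes a constant 800-second runtime and uninterrupted crunching; the function name is hypothetical, and the figures are the ones quoted in the post.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tasks_per_day(runtime_s: float, concurrent_per_gpu: int, gpus: int) -> float:
    """Tasks finished per day when every GPU runs a fixed number of tasks at once."""
    batches_per_day = SECONDS_PER_DAY / runtime_s
    return batches_per_day * concurrent_per_gpu * gpus


daily_tasks = tasks_per_day(runtime_s=800, concurrent_per_gpu=3, gpus=2)
daily_credit = daily_tasks * 693   # credits per task, from the post
daily_quota = 768                  # from the client message

print(f"tasks per day:  {daily_tasks:.0f}")
print(f"credit per day: {daily_credit:.0f}")
print(f"quota headroom: {daily_quota - daily_tasks:.0f}")
```

On these numbers, steady-state crunching alone stays below the 768-task quota, so the quota message would not follow from throughput by itself.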

 

 


archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7223054931
RAC: 969191

juan BFP wrote:
(reached daily quota of 768 tasks)

There is a daily maximum task download limit enforced by the project. It varies a bit by machine (I have a quad-core host with a 1060 and a 1070 which has been restricted by a limit of 640).

If you reach the point of running out of work, try a manual update, in case the "new day" has arrived. The deferral automatically imposed on my system was not properly matched to the actual server "new day" boundary. It might not be for yours either. I don't know exactly when the boundary was, but it was not at midnight UTC; in my case it was somewhere between 3 and 13 UTC.

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

[Update:]

One more Invalid on MAC due to OpenCL Bug.  Total of 5 Invalids listed on Web Results.

I will continue monitoring and reporting.
Windows is continuing to do well with the new 1.16 FGRPB1G Units.  Still completing tasks 2 at a time in 1 Hr 2 Min, and 1 Hr 3 Min respectively.

I've noticed some Arecibo BRP4G Units NOT Validating...  "Validation Error" is coming up on the two in the Web Results List.  When looking at the Units, they've gone out several more times to more computers; and ALL are getting "Validation Error".  ONE is marked as "WU Cancelled", the other is NOT yet marked as such.  I still have MANY Pending BRP4G Units that now may end up with the same "Validation Error" from the Server.  If ALL of these were cancelled, the Server(s) SHOULD have made contact with our machines to Stop Crunching these Units if they're ALL cancelled.  Instead, my machine, (and others), crunched through a plethora of these Units.

 

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Kailee71
Joined: 22 Nov 16
Posts: 35
Credit: 42623563
RAC: 0

Hi all,

I need some help here. I'm getting wildly different runtimes on two computers with identical R9 280Xs. One is getting runtimes of about 975s where the other gets about 420s. Luxmark is showing similar behaviour (4300 vs 12700). Both machines are running FGRP tasks exclusively (one at a time for now, until I have sorted this out). Both machines are idle apart from E@H and have ample CPU and RAM to support the GPUs (the slower one has 2x L5630, so 16 threads; the faster one has 2x X5670, so 24 threads; both have 24 GB RAM). Both are running OS X El Capitan. I have tried swapping the GPUs to exclude the possibility of a dodgy GPU, and got identical results, with the slow machine remaining the slower one. BIOS settings are identical.

The only difference I can see is that one has its GPU in a PCIe x8 slot (a server board without any x16 slots) where the other has the GPU in a proper x16 slot. My understanding so far, however, was that the reduced bandwidth of the x8 slot shouldn't affect OpenCL performance, at least not to this extent.
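As a rough sanity check on the figures above, the observed slowdown can be compared with the difference in lane count. The sketch assumes both slots are the same PCIe generation, so that x16 offers twice the theoretical bandwidth of x8; the thread does not confirm this.

```python
# Figures quoted in the post above.
slow_runtime_s, fast_runtime_s = 975, 420
slow_luxmark, fast_luxmark = 4300, 12700

runtime_ratio = slow_runtime_s / fast_runtime_s   # how much longer the slow host takes
luxmark_ratio = fast_luxmark / slow_luxmark       # how much higher the fast host scores

# Assumption: same PCIe generation in both machines, so the link
# bandwidth ratio equals the lane-count ratio.
lane_ratio = 16 / 8

print(f"runtime ratio: {runtime_ratio:.2f}x")
print(f"Luxmark ratio: {luxmark_ratio:.2f}x")
print(f"lane ratio:    {lane_ratio:.2f}x")
```

Both observed ratios exceed the 2x lane ratio. Under the stated assumption, even a workload limited entirely by link bandwidth would slow down by at most 2x, so the slot width alone would not account for the whole gap.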

Does anyone have any ideas what I could try to at least reduce the difference between computation performance?

Many thanks in advance,

 

Kailee.

Defender
Joined: 17 Jul 12
Posts: 19
Credit: 316332683
RAC: 84351

Since E@H is very bandwidth-hungry, I guess it's caused by the different number of PCIe lanes. But I can't say more about it since I'm no expert.

Proud member of SETI.Germany
