Gamma-ray pulsar binary search #1 on GPUs

Arif Mert Kapicioglu

Joined: 16 Jul 09

Posts: 7

Credit: 823300983

RAC: 0

17 Dec 2016 10:23:27 UTC

Message 152972

Hello, recently some of my WUs have been marked as invalid. Checking the validators' computers, they're all Radeons. Couldn't the cause be a bug in the NVIDIA application?

I've clocked the GPUs to factory settings (though they're still factory OC'ed). The OS is Ubuntu 16.04 LTS with driver 375.26. Regards.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117497987036

RAC: 35477312

17 Dec 2016 10:43:39 UTC

Message 152975 in response to message 152972

Arif Mert Kapicioglu wrote:

Couldn't the cause be a bug in the NVIDIA application?

Not likely. If there were such a bug, wouldn't all NVIDIA GPUs be returning invalid results?

Your computers are hidden so none of us can look at your returned results to see if there might be any hints as to other possible reasons.

Cheers,
Gary.

Arif Mert Kapicioglu

Joined: 16 Jul 09

Posts: 7

Credit: 823300983

RAC: 0

17 Dec 2016 11:53:26 UTC

Message 152978

Sorry, I've changed the necessary privacy setting. Now it should be visible. Considering my high successful/invalid ratio, I'm just trying to nail the problem. Regards.

Trotador

Joined: 2 May 13

Posts: 58

Credit: 2122643213

RAC: 0

17 Dec 2016 15:33:18 UTC

Message 152992 in response to message 152967

Bernd Machenschalk wrote:

Trotador wrote:
It seems that the wus duration has increased since yesterday, is it correct?

Not systematically, i.e. in the "size" of the workunits under control by us.

However, the duration of the last part of the computation is data-dependent. If your GPU isn't capable of double precision computation, this part is done on the CPU and will have a noticeable contribution to the overall runtime.

I've noted around a 35 % increase on my HD7950 (crunching 3 simultaneous units) and around 15 % on my GTX1080 (crunching 4 at a time). No changes on my side, rebooting did not restore the previous times, and the HD7950's double precision capability is very good, even compared to the GTX 1080's.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117497987036

RAC: 35477312

17 Dec 2016 20:08:49 UTC

Message 153003 in response to message 152978

Arif Mert Kapicioglu wrote:

Sorry, I've changed the necessary privacy setting. Now it should be visible. Considering my high successful/invalid ratio, I'm just trying to nail the problem. Regards.

I had a look at your host with the GTX 1080 GPU. Your original message didn't really make it clear how low the invalid rate really is - 4 tasks (about 15 mins crunch time) in the last 6 days. That's an extremely low rate and is most likely due to some implementation differences between the NVIDIA and AMD versions of OpenCL. I didn't delve into the individual quorums to check your statement about "they're all Radeons" as that seems to be a likely scenario :-).

At this very early stage of testing these new apps, the validation process on the results returned may be overly sensitive to minor differences. The Devs will be monitoring this closely (as they always have in the past) to optimise the validator so that 'good' results with minor and unimportant differences are not needlessly rejected.

It's a similar story with the GTX 980 - 6 invalid results over the same time period. Apart from one aborted task, you have no 'error' results so your computers are in good shape. You should continue to monitor your results and raise a query if you suddenly see larger numbers (for example several percent) of your results being marked as invalid.

Cheers,
Gary.

juan BFP

Joined: 18 Nov 11

Posts: 839

Credit: 421443712

RAC: 0

17 Dec 2016 20:47:11 UTC

Message 153004

Could somebody please check these numbers and explain to me what is happening?

1 WU crunches in about 800 secs, running 3 at a time on each 1070 GPU (2 on this host).

So it's 6 WUs crunched every 800 secs, or 648 WUs in a day!

With 693 credits each, that gives about 440K per day, almost double what BRP4G gives.

But somebody decided that's not enough, and the host started to show this message and gets no new work:

17/12/2016 15:17:55 | Einstein@Home | (reached daily quota of 768 tasks)

Did my mind bug out?
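
The arithmetic in this post can be checked with a short sketch. The input figures (800 s per task, 3 tasks per GPU, 2 GPUs, 693 credits per task, a quota of 768) come from the post itself; the function and variable names are only illustrative, and the sketch assumes the host crunches around the clock at a constant rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def tasks_per_day(runtime_s, tasks_per_gpu, gpus):
    """Tasks completed per day when every GPU runs `tasks_per_gpu`
    tasks concurrently and each task takes `runtime_s` seconds."""
    concurrent_tasks = tasks_per_gpu * gpus
    return concurrent_tasks * SECONDS_PER_DAY / runtime_s


# Figures quoted in the post.
daily_tasks = tasks_per_day(runtime_s=800, tasks_per_gpu=3, gpus=2)
daily_credit = daily_tasks * 693
daily_quota = 768  # from the "reached daily quota of 768 tasks" message

print(f"tasks per day:   {daily_tasks:.0f}")
print(f"credit per day:  {daily_credit:,.0f}")
print(f"quota headroom:  {daily_quota - daily_tasks:.0f} tasks")
```

Under these assumptions the steady-state rate works out to 648 tasks and roughly 449K credits per day, which is below the 768-task quota. The quota counts tasks downloaded rather than tasks completed, so a host that is also filling its work cache could plausibly reach it sooner; that reading is an assumption, not something confirmed in the thread.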

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7219624931

RAC: 975722

17 Dec 2016 22:14:00 UTC

Message 153005 in response to message 153004

juan BFP wrote:

(reached daily quota of 768 tasks)

There is a daily maximum task download limit enforced by the project. It varies a bit by machine (I have a quad-core host with a 1060 and a 1070 which has been restricted by a limit of 640).

If you reach the point of running out of work, try a manual update, in case the "new day" has arrived. On my system, the automatically imposed deferral was not properly matched to the actual server "new day" boundary. It might not be for yours either. I don't know exactly when the boundary was, but it was not at midnight UTC; it was somewhere between 3 and 13 UTC in my case.

TimeLord04

Joined: 8 Sep 06

Posts: 1442

Credit: 72378840

RAC: 0

17 Dec 2016 23:25:29 UTC

Message 153008

[Update:]

One more Invalid on MAC due to OpenCL Bug. Total of 5 Invalids listed on Web Results.

I will continue monitoring and reporting.
Windows is continuing to do well with the new 1.16 FGRPB1G Units. Still completing tasks 2 at a time in 1 Hr 2 Min, and 1 Hr 3 Min respectively.

I've noticed some Arecibo BRP4G Units NOT Validating... "Validation Error" is coming up on the two in the Web Results List. When looking at the Units, they've gone out several more times to more computers; and ALL are getting "Validation Error". ONE is marked as "WU Cancelled", the other is NOT yet marked as such. I still have MANY Pending BRP4G Units that now may end up with the same "Validation Error" from the Server. If ALL of these were cancelled, the Server(s) SHOULD have made contact with our machines to Stop Crunching these Units if they're ALL cancelled. Instead, my machine, (and others), crunched through a plethora of these Units.

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Kailee71

Joined: 22 Nov 16

Posts: 35

Credit: 42623563

RAC: 0

18 Dec 2016 9:50:55 UTC

Message 153017

Hi all,

I need some help here - I'm getting wildly different runtimes on two computers with identical R9 280Xs. One is getting runtimes of about 975 s where the other gets about 420 s. LuxMark is showing similar behaviour (4300 vs 12700). Both machines are running exclusively FGRP tasks (one at a time for now until I have sorted this out). Both machines are idle apart from E@H and have ample CPU and RAM to support the GPUs (the slower one has 2x L5630, so 16 threads, the faster one 2x X5670, so 24 threads; both have 24 GB RAM). Both are running OS X El Capitan. I have tried swapping the GPUs to exclude the possibility of a dodgy GPU, and got identical results, the slow machine remaining the slower one. BIOS settings are identical.

The only difference I can see is that one has its GPU in a PCIe x8 slot (a server board without any x16 slots) where the other has the GPU in a proper x16 slot. My understanding so far, however, was that the reduced bandwidth of the x8 slot shouldn't affect OpenCL performance, at least not to this extent.

Does anyone have any ideas what I could try to at least reduce the difference in computation performance?

Many thanks in advance,

Kailee.
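
To put the x8 versus x16 question in numbers, here is a minimal sketch of theoretical PCIe slot bandwidth next to the slowdowns quoted in the post. The per-lane figures follow from the published PCIe 2.0 (5 GT/s, 8b/10b encoding) and PCIe 3.0 (8 GT/s, 128b/130b encoding) signalling rates; which generation either board actually uses is not stated in the post, so both are shown. The function names are illustrative only.

```python
def lane_mb_per_s(gt_per_s, payload_bits, total_bits):
    """Usable one-direction throughput of a single PCIe lane in MB/s,
    after subtracting line-encoding overhead."""
    return gt_per_s * 1e9 * payload_bits / total_bits / 8 / 1e6


# Per-lane throughput for the two generations considered here.
PER_LANE = {
    "PCIe 2.0": lane_mb_per_s(5, 8, 10),      # 8b/10b encoding
    "PCIe 3.0": lane_mb_per_s(8, 128, 130),   # 128b/130b encoding
}


def slot_gb_per_s(generation, lanes):
    """Theoretical one-direction bandwidth of a slot in GB/s."""
    return PER_LANE[generation] * lanes / 1000


for gen in PER_LANE:
    print(f"{gen}: x8 = {slot_gb_per_s(gen, 8):.2f} GB/s, "
          f"x16 = {slot_gb_per_s(gen, 16):.2f} GB/s")

# Ratios taken from the figures in the post.
bandwidth_ratio = 16 / 8        # x16 vs x8, same generation
runtime_ratio = 975 / 420       # slow host vs fast host
luxmark_ratio = 12700 / 4300    # fast host vs slow host

print(f"bandwidth ratio: {bandwidth_ratio:.2f}")
print(f"runtime ratio:   {runtime_ratio:.2f}")
print(f"LuxMark ratio:   {luxmark_ratio:.2f}")
```

Halving the lane count halves the theoretical bandwidth, so in this simple model a fully transfer-bound workload would slow down by at most a factor of 2. Both observed ratios are larger than that, which suggests, under these assumptions, that lane count alone may not account for the whole gap.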

Defender

Joined: 17 Jul 12

Posts: 19

Credit: 315949091

RAC: 76255

18 Dec 2016 10:03:17 UTC

Message 153018

Since E@H is very bandwidth-hungry, I guess it's caused by the different number of PCIe lanes. But I can't say more about it since I'm no expert.

Proud member of SETI.Germany
