Latest data file for FGRPB1G GPU tasks

Keith Myers

Joined: 11 Feb 11

Posts: 4968

Credit: 18766695999

RAC: 7149527


12 Feb 2019 19:32:22 UTC

Message 169470


What amuses and bemuses me is why one host, which at one time had a 1060 6GB, 1070, 1080 and 1080 Ti, always reported itself as having four (4) GTX 1060 cards.

They all were CC 6.1 cards.

They all were running the same drivers (software version?)

The 1060 had 6GB. The 1070 and 1080 had 8GB and the 1080Ti had 11GB of memory.

The 1080 Ti clocked at the fastest speed compared to the others.

So why was the lowly 1060 used to define the system as highest capability of all the cards?

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7229168221

RAC: 1141529


12 Feb 2019 20:19:52 UTC

Message 169472 in response to message 169460


Bernd Machenschalk wrote:

Bernd Machenschalk wrote:
We'll try to figure out a way to prevent the scheduler from sending older tasks to such cards.

This should be in effect now. Such cards should get app versions with plan class "FGRPopenclTV-nvidia" and tasks from "old" WUs for these app versions should be rejected.

In case other users are interested, I can report how this looks currently on my three systems. The two systems which are Turing-free continue to receive tasks for which the tasks column in BoincTasks shows "1.20 Gamma-ray pulsar binary search #1 on GPUs (FGRPopencl1K-nvidia)".

On both categories of systems, newly received tasks are from the 1047L data file. New issue seems to have skipped 1046L for the moment, transitioning from 1045L half a day ago.

For my mixed Turing+Pascal system all new work shows in the tasks column as "1.20 Gamma-ray pulsar binary search #1 on GPUs (FGRPopenclTV-nvidia)".

BOINC downloaded a fresh executable, named hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopenclTV-nvidia.exe. This executable was used when I bumped a task into early execution. Unsurprisingly, the new executable is byte-identical to the previous hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe.

I can't vouch from personal observation for success in blocking non-Turing capable tasks from my Turing system, as those are currently in sporadic re-issue of previously issued work. I assume this will prove successful.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117763515378

RAC: 34784422


12 Feb 2019 22:28:31 UTC

Message 169476 in response to message 169470


Keith Myers wrote:

What amuses and bemuses me is why one host, which at one time had a 1060 6GB, 1070, 1080 and 1080 Ti, always reported itself as having four (4) GTX 1060 cards.

Might be better to claim 4 x 1060 than 4 x 1080Ti :-).

Keith Myers wrote:

So why was the lowly 1060 used to define the system as highest capability of all the cards?

It's quite easy for the comments in the code to claim one thing but for the code itself to do something different :-)

The section of code that Richard linked to shows the comparison of two GPU instances and that's about the maximum of my ability to figure out what is going on :-). Presumably, with 4 GPUs, the code should iterate 3 times and compare the 'winner' of the 'first' comparison against #3 and then #4 to arrive at the final answer for the 'best' GPU - or something along those lines.
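To make the "iterate 3 times" idea concrete, here is a minimal sketch of how a pairwise comparison could be folded over a list of cards to pick a single 'best' one. It is not BOINC's code: `pick_best`, `by_memory` and the tuple layout are hypothetical names made up for illustration, the comparison looks at memory only, and the "[N] name" summary just imitates the way hosts show up in the computer list.

```python
def pick_best(gpus, compare):
    """Fold a pairwise comparison over the list; return (best, comparisons made)."""
    best = gpus[0]
    comparisons = 0
    for candidate in gpus[1:]:
        comparisons += 1
        # Replace the current winner only if the candidate is strictly better.
        if compare(candidate, best) > 0:
            best = candidate
    return best, comparisons


def by_memory(a, b):
    """Hypothetical stand-in comparison: 1/-1/0 by memory size in GB only."""
    return (a[1] > b[1]) - (a[1] < b[1])


# (name, memory in GB) for the four cards mentioned in the thread.
cards = [("GTX 1060", 6), ("GTX 1070", 8), ("GTX 1080", 8), ("GTX 1080 Ti", 11)]

best, comparisons = pick_best(cards, by_memory)
summary = f"[{len(cards)}] {best[0]}"
print(summary, "after", comparisons, "comparisons")
```

With 4 cards the loop makes 3 comparisons. If the real code behaved like this sketch, a 1060 would not come out on top, so the three tests suggested below would still be needed to see where the actual behaviour diverges.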

If you still had all the hardware you mentioned, it would help anyone prepared to debug this if you could run 3 individual tests to see what the code detects as the 'winner' for the following combinations:

GTX 1060 with GTX 1070 (maybe this would get the 'right' answer).
GTX 1060 plus GTX 1070 plus GTX 1080 (maybe this would get it wrong and so reveal where to look for the bug).
All 4 cards - for which we already know the answer - probably worth checking if the current version of BOINC isn't the same as when you originally noticed the problem.

Obviously this is quite an imposition, even if you still have all the hardware. I'm just 'thinking out loud' about what might be helpful information. Also, others reading this who have the first two combinations of either 2 or 3 different GPUs from the same vendor might already be able to comment about their own personal experiences. I have some hosts with 2 GPUs but all of mine have identical GPUs.

PS: I've just remembered that cecht has been playing with an RX 570 and an RX 460 together. Checking his computer list shows the machine as having 2 x RX 570s, both under Windows and Linux. That suggests the bug shows up only when you go to 3 or more GPUs.

Cheers,
Gary.

Keith Myers

Joined: 11 Feb 11

Posts: 4968

Credit: 18766695999

RAC: 7149527


12 Feb 2019 23:50:24 UTC

Message 169481


Right now that host has a 1070 Ti, two 1080s and a 1080 Ti. Pretty sure it identifies as four (4) 1080s, but without the website available right now to check, I am not 100% positive that is the case.

Until Richard posted that snippet of code, I always believed that BOINC identified the GPUs in the system by bus ID, since I have observed the host being identified differently depending on which slots the cards were plugged into. It would take some time to work through all the variables in testing.

kb9skw

Joined: 25 Feb 05

Posts: 21

Credit: 376410512

RAC: 7261


13 Feb 2019 0:20:31 UTC

Message 169483


My dual GPU host with two RX 570s reports [2] AMD Radeon RX 570 Series (8192MB). The second RX 570 has only 4GB and is in the 8x slot. Does physically moving the cards around in the board make any difference in how the OS reports GPUs to BOINC?

Keith Myers

Joined: 11 Feb 11

Posts: 4968

Credit: 18766695999

RAC: 7149527


13 Feb 2019 0:37:27 UTC

Message 169485


I always thought so. But Richard's post of the code suggests that it does not.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250653586

RAC: 34371


13 Feb 2019 6:19:04 UTC

Message 169492 in response to message 169467


Richard Haselgrove wrote:

Good guess - you're right, plus there are extra factors considered lower down the priority order.
// return 1/-1/0 if device 1 is more/less/same capable than device 2.
// factors (decreasing priority):
// - compute capability
// - software version
// - memory
// - speed
from https://github.com/BOINC/boinc/blob/master/client/gpu_nvidia.cpp#L134
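For readers who want to see what that priority order would mean in practice, here is a minimal sketch that follows the comment literally. It is an assumption-laden illustration, not the code in gpu_nvidia.cpp: the `Gpu` class, its field names and the `compare` function are hypothetical, and the software version and speed figures are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    compute_capability: tuple  # (major, minor), e.g. (6, 1)
    software_version: int      # hypothetical field, e.g. a driver or runtime version
    memory_mb: int
    speed: float               # hypothetical relative speed figure


def compare(a: Gpu, b: Gpu) -> int:
    """Return 1/-1/0 if a is more/less/same capable than b.

    Factors in decreasing priority: compute capability, software version,
    memory, speed. Tuples compare element by element, so a later factor
    only matters when all earlier ones are equal.
    """
    key_a = (a.compute_capability, a.software_version, a.memory_mb, a.speed)
    key_b = (b.compute_capability, b.software_version, b.memory_mb, b.speed)
    return (key_a > key_b) - (key_a < key_b)


# Same compute capability and (placeholder) software version, different memory.
gtx1060 = Gpu("GTX 1060", (6, 1), 1, 6 * 1024, 1.0)
gtx1080ti = Gpu("GTX 1080 Ti", (6, 1), 1, 11 * 1024, 2.0)

print(compare(gtx1080ti, gtx1060))
```

Under these literal rules two CC 6.1 cards on the same driver are separated by memory, so the 11GB card would beat the 6GB one. That does not match what Keith saw, which fits the suspicion that the real code does something different from its comment.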

That's funny - how can the "software version" be different for two cards in the same system?

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0


13 Feb 2019 6:28:04 UTC

Message 169493


What's the difference between compute capability and speed? What is that 'speed'?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250653586

RAC: 34371


13 Feb 2019 6:41:29 UTC

Message 169494 in response to message 169493


Richie wrote:

What's the difference between compute capability and speed? What is that 'speed'?

These are orthogonal. Essentially, "compute capability" defines what operations a device is capable of (e.g. double precision floating point math); "speed" is how fast it can do them (IIRC clock rate * #multiprocessors).
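As a small worked example of that distinction, the sketch below takes the recalled formula at face value. The function name `relative_speed` and the two devices are hypothetical, and the clock rates and multiprocessor counts are made-up illustrative numbers, not the specifications of any real card.

```python
def relative_speed(clock_mhz, multiprocessors):
    """Speed figure as recalled above: clock rate times number of multiprocessors."""
    return clock_mhz * multiprocessors


# Two hypothetical devices with the same compute capability but different speed.
device_a = {"compute_capability": (6, 1), "clock_mhz": 1500, "multiprocessors": 10}
device_b = {"compute_capability": (6, 1), "clock_mhz": 1400, "multiprocessors": 28}

speed_a = relative_speed(device_a["clock_mhz"], device_a["multiprocessors"])
speed_b = relative_speed(device_b["clock_mhz"], device_b["multiprocessors"])

print(speed_a, speed_b)
```

Both devices support the same set of operations, yet the second gets a much larger speed figure despite its lower clock, because it has more multiprocessors.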

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2960682680

RAC: 706707


13 Feb 2019 8:00:44 UTC

Message 169496 in response to message 169492


Bernd Machenschalk wrote:

That's funny - how can the "software version" be different for two cards in the same system?

Firmware? No, I don't really think so either. He might have been thinking about drivers at the time.

I'll walk through the code sometime, but not today.
