Latest data file for FGRPB1G GPU tasks

Keith Myers
Joined: 11 Feb 11
Posts: 5049
Credit: 19060199943
RAC: 6474223

It amuses and bemuses me that one host, which at one time had a 1060 6GB, a 1070, a 1080 and a 1080 Ti, always reported itself as having four (4) GTX 1060 cards.

They all were CC 6.1 cards.

They all were running the same drivers (software version?).

The 1060 had 6GB, the 1070 and 1080 had 8GB, and the 1080 Ti had 11GB of memory.

The 1080 Ti ran at the highest clock speed of the four.

So why was the lowly 1060 picked as the most capable of all the cards and used to describe the system?

archae86
Joined: 6 Dec 05
Posts: 3162
Credit: 7319878354
RAC: 2310777

Bernd Machenschalk wrote:
Bernd Machenschalk wrote:
We'll try to figure out a way to prevent the scheduler from sending older tasks to such cards.

This should be in effect now. Such cards should get app versions with plan class "FGRPopenclTV-nvidia" and tasks from "old" WUs for these app versions should be rejected.

In case other users are interested, I can report how this looks currently on my three systems.  The two systems which are Turing-free continue to receive tasks for which the tasks column in BoincTasks shows "1.20 Gamma-ray pulsar binary search #1 on GPUs (FGRPopencl1K-nvidia)".

On both categories of systems, newly received tasks are from the 1047L data file.  New issue seems to have skipped 1046L for the moment, transitioning from 1045L half a day ago.

For my mixed Turing+Pascal system, all new work shows in the tasks column as "1.20 Gamma-ray pulsar binary search #1 on GPUs (FGRPopenclTV-nvidia)".

BOINC downloaded a fresh executable, named hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopenclTV-nvidia.exe.  This executable was used when I bumped a task into early execution.  Unsurprisingly, the new executable is byte-identical to the previous hsgamma_FGRPB1G_1.20_windows_x86_64__FGRPopencl1K-nvidia.exe.

I can't vouch from personal observation for success in keeping non-Turing-capable tasks away from my Turing system, as those tasks currently appear only as sporadic re-issues of previously issued work.  I assume this will prove successful.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118764166549
RAC: 21280873

Keith Myers wrote:
It amuses and bemuses me that one host, which at one time had a 1060 6GB, a 1070, a 1080 and a 1080 Ti, always reported itself as having four (4) GTX 1060 cards.

Might be better to claim 4 x 1060 than 4 x 1080Ti :-).

Keith Myers wrote:
So why was the lowly 1060 picked as the most capable of all the cards and used to describe the system?

It's quite easy for the comments in the code to claim one thing but for the code itself to do something different :-)

The section of code that Richard linked to shows the comparison of two GPU instances and that's about the maximum of my ability to figure out what is going on :-).  Presumably, with 4 GPUs, the code should iterate 3 times and compare the 'winner' of the 'first' comparison against #3 and then #4 to arrive at the final answer for the 'best' GPU - or something along those lines.
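For what it's worth, the iteration described above could be sketched roughly as follows. This is only a minimal illustration that assumes the priority order stated in the comment of the code Richard linked to (compute capability, software version, memory, speed). The struct, the function names, the driver number and the speed values are all made up for the example and are not BOINC's actual code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one detected GPU; this is not BOINC's real struct.
struct Gpu {
    std::string name;
    int cc_major;        // compute capability, major part
    int cc_minor;        // compute capability, minor part
    int driver_version;  // "software version" (made-up number below)
    int mem_gb;          // memory in GB
    double speed;        // relative speed figure (placeholder below)
};

// Three-way compare: 1 if a > b, -1 if a < b, 0 if equal.
template <typename T>
int three_way(const T& a, const T& b) {
    return (a > b) - (a < b);
}

// Return 1/-1/0 if g1 is more/less/same capable than g2.
// Factors are checked in decreasing priority; a later factor only
// matters when all earlier ones tie.
int compare_gpus(const Gpu& g1, const Gpu& g2) {
    if (int c = three_way(g1.cc_major, g2.cc_major)) return c;
    if (int c = three_way(g1.cc_minor, g2.cc_minor)) return c;
    if (int c = three_way(g1.driver_version, g2.driver_version)) return c;
    if (int c = three_way(g1.mem_gb, g2.mem_gb)) return c;
    return three_way(g1.speed, g2.speed);
}

// Keep a running 'winner' and compare it against each remaining card:
// N cards need N-1 comparisons. Assumes gpus is not empty.
std::size_t best_gpu(const std::vector<Gpu>& gpus) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < gpus.size(); ++i) {
        if (compare_gpus(gpus[i], gpus[best]) > 0) best = i;
    }
    return best;
}

// The four cards from Keith's host: all CC 6.1, same driver.
// Driver number and speed are placeholders; memory decides before
// speed is ever consulted.
std::vector<Gpu> example_host() {
    return {
        {"GTX 1060",    6, 1, 1, 6,  1.0},
        {"GTX 1070",    6, 1, 1, 8,  1.0},
        {"GTX 1080",    6, 1, 1, 8,  1.0},
        {"GTX 1080 Ti", 6, 1, 1, 11, 1.0},
    };
}
```

With the priority order as documented, the four cards tie on compute capability and driver, so memory decides and `best_gpu(example_host())` picks the 1080 Ti, which is not what the host actually reported.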

If you still had all the hardware you mentioned, it would help anyone prepared to debug this if you could run 3 individual tests to see what the code detects as the 'winner' for the following combinations:

  1. GTX 1060 with GTX 1070  (maybe this would get the 'right' answer).
  2. GTX 1060 plus GTX 1070 plus GTX 1080 (maybe this would get it wrong and so reveal where to look for the bug).
  3. All 4 cards - for which we already know the answer - probably worth checking if the current version of BOINC isn't the same as when you originally noticed the problem.

Obviously this is quite an imposition, even if you still have all the hardware.  I'm just 'thinking out loud' about what might be helpful information.  Also, others reading this who have the first two combinations of either 2 or 3 different GPUs from the same vendor might already be able to comment about their own personal experiences.  I have some hosts with 2 GPUs but all of mine have identical GPUs.

PS: I've just remembered that cecht has been playing with an RX 570 and an RX 460 together. Checking his computer list shows the machine as having 2 x RX 570s, both under Windows and Linux. That suggests the bug shows up only when you go to 3 or more GPUs.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 5049
Credit: 19060199943
RAC: 6474223

Right now that host has a 1070 Ti, two 1080s and a 1080 Ti.  Pretty sure it identifies as four (4) 1080s, but without the website available right now to check, I am not 100% positive that is the case.

Until Richard posted that snippet of code, I always believed that BOINC identified the GPUs in the system by bus ID, since I have observed the host being identified differently depending on which slots the cards were plugged into.  It would take some time to work through all the variables in testing.

kb9skw
Joined: 25 Feb 05
Posts: 21
Credit: 377775486
RAC: 24463

My dual GPU host with two RX 570s reports [2] AMD Radeon RX 570 Series (8192MB). The second RX 570 has only 4GB and is in the 8x slot. Does physically moving the cards around on the board make any difference in how the OS reports GPUs to BOINC?

Keith Myers
Joined: 11 Feb 11
Posts: 5049
Credit: 19060199943
RAC: 6474223

I always thought so.  But Richard's post of the code suggests that it does not.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4334
Credit: 252317233
RAC: 34409

Richard Haselgrove wrote:

Good guess - you're right, plus there are extra factors considered lower down the priority order.

// return 1/-1/0 if device 1 is more/less/same capable than device 2.
// factors (decreasing priority):
// - compute capability
// - software version
// - memory
// - speed

from https://github.com/BOINC/boinc/blob/master/client/gpu_nvidia.cpp#L134

That's funny - how can the "software version" be different for two cards in the same system?

BM

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

What's the difference between compute capability and speed? What is that 'speed'?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4334
Credit: 252317233
RAC: 34409

Richie wrote:
What's the difference between compute capability and speed? What is that 'speed'?

These are orthogonal. Essentially, "compute capability" defines what operations a device is capable of (e.g. double precision floating point math), while "speed" is how fast it can do them (IIRC clock rate * #multiprocessors).
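As a rough illustration of that distinction, here is a sketch that takes the recalled formula at face value. The struct, its field names and the function names are invented for the example, the clock value is a placeholder, and BOINC's real calculation may well include other factors.

```cpp
// Hypothetical device description; field names are made up for this example.
struct GpuProps {
    int cc_major;         // compute capability, major part
    int cc_minor;         // compute capability, minor part
    int clock_rate_khz;   // clock rate in kHz
    int multiprocessors;  // number of multiprocessors
};

// Relative speed figure as recalled above: clock rate * #multiprocessors.
double relative_speed(const GpuProps& p) {
    return static_cast<double>(p.clock_rate_khz) * p.multiprocessors;
}

// Compute capability describes what a device can do, not how fast it is,
// so two devices can match here and still differ in speed.
bool same_capability(const GpuProps& a, const GpuProps& b) {
    return a.cc_major == b.cc_major && a.cc_minor == b.cc_minor;
}
```

For example, two CC 6.1 cards at the same clock, one with 10 multiprocessors and one with 28, satisfy `same_capability` but the second gets a `relative_speed` 2.8 times higher.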

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2994242657
RAC: 710173

Bernd Machenschalk wrote:
That's funny - how can the "software version" be different for two cards in the same system?

Firmware? No, I don't really think so either. He might have been thinking about drivers at the time.

I'll walk through the code sometime, but not today.
