Any way to identify the gpu a task ran on?

Keith Myers

Joined: 11 Feb 11

Posts: 5035

Credit: 18994643344

RAC: 6740823

30 Nov 2017 4:35:14 UTC

Topic 211400

I have multiple gpu systems. But the stderr.txt output of a validated task doesn't show any identifier of which gpu it ran on. Is there any logging option for the FGRPopencl1K-nvidia application to indicate which gpu a task is run on?

Keith Myers

Joined: 11 Feb 11

Posts: 5035

Credit: 18994643344

RAC: 6740823

5 Dec 2017 18:56:43 UTC

Message 163241

Surprised that no one has answered my question or at least responded.

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

5 Dec 2017 21:25:14 UTC

Message 163243

I don't know how to check this if the Boinc client has been stopped at some point after a task was uploaded, but if you run Boinc continuously then I believe the device IDs of your GPUs should still be the same as they were when the task was uploaded.

The coproc_debug option for the event log shows the device ID that is currently running a task. It will produce lines something like

[coproc] NVIDIA instance 0 : confirming ... -xxxxx-... instance for ... -and task name is here-

[coproc] NVIDIA instance 1 : confirming ... -xxxxx-... instance for ... -and task name is here-

I've got a host with 2 x GTX 960 and it looks like 'instance' is the same as 'device' while tasks are running.
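If the event log has been saved to a text file while coproc_debug (a log flag set in cc_config.xml) was on, lines like the ones above can be picked out with a few lines of Python. This is only a minimal sketch: the pattern assumes nothing beyond the "[coproc] NVIDIA instance N" prefix quoted above, and the rest of each line is returned untouched because its exact layout isn't shown here. The function name is made up for illustration.

```python
import re

# Assumed line shape, taken from the example lines above:
#   "[coproc] NVIDIA instance <n> : ..."
# Everything after the instance number is kept verbatim.
COPROC_RE = re.compile(r"\[coproc\]\s+NVIDIA instance\s+(\d+)\s*:?\s*(.*)")


def coproc_instances(lines):
    """Yield (instance_number, rest_of_line) for each matching event log line."""
    for line in lines:
        match = COPROC_RE.search(line)
        if match:
            yield int(match.group(1)), match.group(2).strip()


sample = [
    "[coproc] NVIDIA instance 0 : confirming ... -xxxxx-... instance for ... -and task name is here-",
    "[coproc] NVIDIA instance 1 : confirming ... -xxxxx-... instance for ... -and task name is here-",
    "some unrelated event log line",
]

for instance, rest in coproc_instances(sample):
    print(instance, rest)
```

Searching the second value for a task name would then tell you which instance number that task was given.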

Then at the same time coproc_info.xml (in Boinc program folder) has info about GPU IDs. There's pci_info : bus id and nvidia_opencl : device_num. If you have identical cards then it might be difficult to know which is which, but somewhere in Windows there should be additional info available about what exactly is installed in pci_bus number X and so on. Perhaps those pieces could be combined successfully.
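As a rough illustration of pulling those two values out of coproc_info.xml, here is a sketch using Python's standard XML parser. The sample document below is an invented, heavily trimmed stand-in: only the element names mentioned above (bus_id under pci_info, device_num under nvidia_opencl) are relied on, and the wrapper elements, card names and values are assumptions, so check them against your own file.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for coproc_info.xml. The wrapper elements and all
# values are invented; only pci_info/bus_id and nvidia_opencl/device_num
# come from the description above.
SAMPLE = """
<coprocs>
  <coproc_cuda>
    <name>GPU A</name>
    <pci_info><bus_id>1</bus_id></pci_info>
  </coproc_cuda>
  <nvidia_opencl>
    <name>GPU A</name>
    <device_num>0</device_num>
  </nvidia_opencl>
</coprocs>
"""


def collect(root, parent_tag, child_tag):
    """Return the text of every <child_tag> found anywhere under a <parent_tag>."""
    return [
        child.text.strip()
        for parent in root.iter(parent_tag)
        for child in parent.iter(child_tag)
        if child.text
    ]


root = ET.fromstring(SAMPLE)
bus_ids = collect(root, "pci_info", "bus_id")
device_nums = collect(root, "nvidia_opencl", "device_num")
print("PCI bus ids:", bus_ids)
print("OpenCL device numbers:", device_nums)
```

For a real file, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`.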

So if a completed task gets validated while Boinc is still running that same "session", then there might be a chance to identify which card crunched that task.

Keith Myers

Joined: 11 Feb 11

Posts: 5035

Credit: 18994643344

RAC: 6740823

5 Dec 2017 23:44:26 UTC

Message 163244

Thanks for the reply. I was feeling rather left out from the lack of response. I have identical cards in my systems, so the lack of any identifier in the stderr.txt output of a finished task is rather a showstopper in trying to identify which card a task ran on. I guess I'm rather used to the stderr.txt output of SETI, with Device #1 or Device #2 or Device #3 printed out, and I know how BOINC enumerates the cards from the startup of the Event Log.

It will be rather tedious to scan through the Event Log after the fact, look at the coproc_debug output and try to figure out which card ran what task. And I don't run E@H all that much with my resource share. The Event Log is long gone since I last ran E@H during the SETI outage last week.

I just wondered if there was some missing or hidden parameter that could be passed to the application, like a -verbose tag, that would print out the gpu enumeration.

Harri Liljeroos

Joined: 10 Dec 05

Posts: 4490

Credit: 3284385003

RAC: 2028615

6 Dec 2017 10:01:07 UTC

Message 163247

Well, BoincTasks does show which device a task is running on and in the history tab you can see it for finished tasks also.

Joseph Stateson

Joined: 7 May 07

Posts: 174

Credit: 3098824441

RAC: 862680

6 Dec 2017 16:33:59 UTC

Message 163255

Yes, it would be nice if more information was available in the stderr output. Some systems have different cards and I can't tell which card caused the problem.

Some errors are easy, such as "netbios time limit exceeded", which I suspect means the card is too slow or whatever.

Other errors, such as "printer is out of paper", are really off the wall and no help at all.

Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The printer is out of paper.
 (0x1c) - exit code 28 (0x1c)
</message>
<stderr_txt>
f photon pairs

Other projects (yes, the grass is always greener, it seems) show the temperature of each GPU, which was useful, and I had to add water cooling to one of my gpus to solve that problem.

Note that I am not asking for changes in code to do this.  Heaven forbid that I be sent the source and authorized to add these features.

Keith Myers

Joined: 11 Feb 11

Posts: 5035

Credit: 18994643344

RAC: 6740823

6 Dec 2017 21:14:51 UTC

Message 163262 in response to message 163247

Harri Liljeroos wrote:

Well, BoincTasks does show which device a task is running on and in the history tab you can see it for finished tasks also.

Will have to make use of that feature in BoincTasks. Didn't think of it till your mention.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118567127587

RAC: 21397801

8 Dec 2017 1:37:39 UTC

Message 163278 in response to message 163255

BeemerBiker wrote:

Yes, it would be nice if more information was available in the stderr output. Some systems have different cards and I can't tell which card caused the problem.

The following comments are based on a (possibly faulty) recollection of what was explained (probably by Bernd) some years ago. If you wanted to search through his posts, you could probably find what he actually said. I think I've basically got the gist of it though.

GPU tasks, while crunching, do produce a fairly large stderr output that accumulates locally. At the start of this output, it does indeed tell you exactly which GPU was being used. In the processing of current tasks, there are something like 1255 binary points to be analysed and the following output is recorded for each one of these binary points. This example is the second last binary point (1254/1255) for one of my tasks.

% Binary point 1254/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-15 f1dot_start: -1e-13 f1dot_band: 1e-13
% Filling array of photon pairs

Notice that there are 31 'nf1dots' to be processed for each binary point. As each one is done, a '.' is output to the log. So multiply all this output by 1255 and you can see the log file will be quite large. I believe all of this does accumulate locally.

There is a limit to what BOINC will upload so not all of what accumulates is uploaded. My recollection is the limit is ~64KB. These files exceed that by quite a margin. It used to be that the first 64KB was uploaded and the balance was lost. Apparently the interesting bits are at the end so this was changed to allow the last 64KB to be uploaded with the initial information being lost. If you look at what makes it to the website by clicking on the task ID link, you should see what I'm talking about. So, unfortunately, the startup messages which identify the GPU are not going to be returned for tasks that 'go the distance' and accumulate the full stderr output.
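To put rough numbers on this, the sketch below rebuilds a log from the per-binary-point block quoted above (plus one '.' per nf1dot) and then keeps only the last 64KB, as I've described for the upload. The header line is a made-up placeholder, the exact placement of the dots is a guess, and the 64KB limit is only my recollection, so treat the byte counts as an estimate rather than a measurement.

```python
# One per-binary-point block as quoted above, followed by one '.' per nf1dot.
# Putting the dots on a line of their own is a guess at the real layout.
POINT_BLOCK = (
    "% Binary point {i}/1255\n"
    "% Starting semicoherent search over f0 and f1.\n"
    "% nf1dots: 31 df1dot: 3.344368011e-15 f1dot_start: -1e-13 f1dot_band: 1e-13\n"
    "% Filling array of photon pairs\n"
    + "." * 31 + "\n"
)

HEADER = "GPU-HEADER-PLACEHOLDER\n"  # hypothetical stand-in for the startup lines
LIMIT = 64 * 1024                     # the ~64KB figure recalled above


def build_log(points=1255):
    """Build the full local log: header first, then one block per binary point."""
    return HEADER + "".join(POINT_BLOCK.format(i=i) for i in range(1, points + 1))


def keep_tail(text, limit=LIMIT):
    """Keep only the last `limit` bytes of the log."""
    return text.encode()[-limit:].decode(errors="ignore")


log = build_log()
uploaded = keep_tail(log)
print(len(log.encode()), "bytes accumulated locally")
print(len(uploaded.encode()), "bytes kept for upload")
print("header survives upload:", HEADER.strip() in uploaded)
```

With these assumptions the local log comes out several times larger than the limit, so the placeholder header falls outside the part that is kept.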

So, while a task is processing and the header part is available locally, it would be possible to grab the GPU details. There would be no reason why a user script couldn't be triggered say 30 secs after task start to extract the header information accumulating in the slot directory and so link a task name to a GPU device number. I'm not suggesting you do this if you have a more convenient way (like using BoincTasks history for example) to get that information. I'm just noting that it's possible to get the data automatically through a script if you really wanted to.
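A user script along those lines might look something like the sketch below. It walks the slot directories and returns the first few lines of each stderr.txt it finds. Two things here are assumptions on my part: that each slot also holds an init_data.xml with a result_name element giving the task name, and that the first 20 lines are enough to cover the startup messages. What the GPU line actually looks like isn't shown in this thread, so the script just hands back the raw lines. It also doesn't do the "30 secs after task start" triggering; you would run it from a scheduler or a loop.

```python
import re
import sys
from pathlib import Path


def slot_headers(slots_dir, max_lines=20):
    """Map each slot's task name (or slot folder name) to the head of its stderr.txt."""
    found = {}
    for slot in sorted(Path(slots_dir).iterdir()):
        stderr = slot / "stderr.txt"
        if not stderr.is_file():
            continue
        # Read at most max_lines lines from the start of the file.
        with stderr.open(errors="replace") as fh:
            head = [line.rstrip("\n") for _, line in zip(range(max_lines), fh)]
        # Assumption: init_data.xml in the slot carries a <result_name> element.
        name = slot.name
        init = slot / "init_data.xml"
        if init.is_file():
            match = re.search(
                r"<result_name>(.*?)</result_name>",
                init.read_text(errors="replace"),
            )
            if match:
                name = match.group(1)
        found[name] = head
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slot_headers.py <path to the BOINC slots directory>
    for task, lines in slot_headers(sys.argv[1]).items():
        print(task)
        for line in lines:
            print("   ", line)
```

Appending the output to a file on each run would leave you with a lasting task-name-to-header record even after the event log and the slot contents are gone.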

Cheers,
Gary.

Keith Myers

Joined: 11 Feb 11

Posts: 5035

Credit: 18994643344

RAC: 6740823

8 Dec 2017 3:36:34 UTC

Message 163279

Hey Gary, thanks for the detailed explanation. Makes sense now. I'm going to have to look at a slot once it's occupied by an Einstein task to see what the beginning of the stderr.txt file normally looks like. The BoincTasks solution will be the more common method as it is simpler.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5877

Credit: 118567127587

RAC: 21397801

8 Dec 2017 6:34:30 UTC

Message 163282

You're most welcome! I'm glad it was of some use in explaining things.

Cheers,
Gary.
