Any way to identify the gpu a task ran on?

Keith Myers
Joined: 11 Feb 11
Posts: 208
Credit: 140150915
RAC: 283855
Topic 211400

I have multi-GPU systems, but the stderr.txt output of a validated task doesn't show any identifier for the GPU it ran on.  Is there a logging option for the FGRPopencl1K-nvidia application that indicates which GPU a task ran on?

Keith Myers
Joined: 11 Feb 11
Posts: 208
Credit: 140150915
RAC: 283855

Surprised that no one has answered my question or at least responded.

Richie
Joined: 7 Mar 14
Posts: 203
Credit: 811479202
RAC: 1523377

I don't know how to check this if the BOINC client has been stopped at some point after a task was uploaded, but if you run BOINC continuously then I believe the device IDs of your GPUs should still be the same as they were when the task was uploaded.

The coproc_debug option for the Event Log reports the device ID that is currently running a task. It produces lines something like

[coproc] NVIDIA instance 0 : confirming ... -xxxxx-...  instance for ... -and task name is here-

[coproc] NVIDIA instance 1 : confirming ... -xxxxx-...  instance for ... -and task name is here-

I've got a host with 2 x GTX 960 and it looks like 'instance' is the same as 'device' while tasks are running.
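If you save the Event Log to a file while coproc_debug is on, something like the following could pull a task-to-instance table out of it. This is only a rough sketch: the regular expression assumes the line shape shown above (vendor, "instance N", a colon, and the task name as the last word after "for"), and the sample lines, the "1.0" value and the task names are made up for illustration. The real log text may differ, so adjust the pattern to whatever your client actually prints.

```python
import re

# Assumed line shape, modelled on the example lines above; the real
# Event Log wording may differ, so treat this pattern as a starting point.
COPROC_LINE = re.compile(
    r"\[coproc\]\s+(?P<vendor>\S+)\s+instance\s+(?P<instance>\d+)\s*:"
    r".*\bfor\s+(?P<task>\S+)\s*$"
)


def map_tasks_to_instances(log_lines):
    """Return {task_name: (vendor, instance_number)} for matching lines."""
    mapping = {}
    for line in log_lines:
        match = COPROC_LINE.search(line)
        if match:
            mapping[match["task"]] = (match["vendor"], int(match["instance"]))
    return mapping


# Hypothetical sample lines (values and task names are invented).
sample = [
    "[coproc] NVIDIA instance 0 : confirming 1.0 instance for task_A_0",
    "[coproc] NVIDIA instance 1 : confirming 1.0 instance for task_B_1",
    "some unrelated Event Log line",
]
result = map_tasks_to_instances(sample)
print(result)
```

If a task is reported more than once, the last line seen wins, which is probably what you want for a task that was restarted on a different card.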

At the same time, coproc_info.xml (in the BOINC program folder) has info about GPU IDs. There's pci_info : bus id and nvidia_opencl : device_num. If you have identical cards then it might be difficult to know which is which, but somewhere in Windows there should be additional info about exactly what is installed in PCI bus number X and so on. Perhaps those pieces could be combined successfully.

So if a completed task gets validated while BOINC is still running that same "session", then there might be a chance to identify which card crunched that task.
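As a rough illustration, the two pieces of information could be listed with a few lines of Python. Treat this as a sketch under assumptions: the element names pci_info and nvidia_opencl and the child device_num come from the file as described above, but the exact child name bus_id, the name element, and the nesting in the embedded sample are my guesses at the layout, and the sample values are invented. Check against your own coproc_info.xml before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for coproc_info.xml.
# The real file has many more fields and may be laid out differently.
SAMPLE = """<coprocs>
  <coproc_cuda>
    <name>GPU A</name>
    <pci_info><bus_id>1</bus_id></pci_info>
  </coproc_cuda>
  <nvidia_opencl>
    <name>GPU A</name>
    <device_num>0</device_num>
  </nvidia_opencl>
</coprocs>"""


def summarize_gpus(xml_text):
    """Collect PCI bus IDs and OpenCL device numbers found in the XML."""
    root = ET.fromstring(xml_text)
    bus_ids = [
        int(node.findtext("bus_id"))
        for node in root.iter("pci_info")
        if node.findtext("bus_id") is not None
    ]
    opencl_devices = [
        (int(node.findtext("device_num")), node.findtext("name"))
        for node in root.iter("nvidia_opencl")
        if node.findtext("device_num") is not None
    ]
    return {"pci_bus_ids": bus_ids, "opencl_devices": opencl_devices}


summary = summarize_gpus(SAMPLE)
print(summary)
```

Note that the sketch only lists both sets of values. It does not claim which bus ID belongs to which device number; that pairing still has to be worked out by hand.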

Keith Myers
Joined: 11 Feb 11
Posts: 208
Credit: 140150915
RAC: 283855

Thanks for the reply.  I was feeling rather left out from the lack of response.  I have identical cards in my systems, so the lack of any identifier in the stderr.txt output of a finished task is rather a showstopper in trying to identify which card a task ran on.  I guess I'm used to the stderr.txt output of SETI, with Device #1 or Device #2 or Device #3 printed out, and I know how BOINC enumerates the cards from the startup section of the Event Log.

It will be rather tedious to scan through the Event Log after the fact, look at the coproc_debug output, and try to figure out which card ran which task.  And I don't run E@H all that much with my resource share.  The Event Log is long gone, since I last ran E@H during the SETI outage last week.

I just wondered if there was some missing or hidden parameter that could be passed to the application, like a -verbose flag, that would print out the GPU enumeration.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 59
Credit: 137051589
RAC: 498896

Well, BoincTasks does show which device a task is running on and in the history tab you can see it for finished tasks also.

BeemerBiker
Joined: 7 May 07
Posts: 23
Credit: 202912569
RAC: 154121

Yes, it would be nice if more information was available in the stderr output.  Some systems have different cards and I can't tell which card caused the problem.

Some errors are easy, such as "netbios time limit exceeded", which I suspect means the card is too slow or whatever.

Other errors, such as "printer is out of paper", are really off the wall and no help at all.

Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The printer is out of paper.
 (0x1c) - exit code 28 (0x1c)
</message>
<stderr_txt>
f photon pairs
Other projects (yes, the grass is always greener, it seems) show the temperature of each GPU, which was useful; I had to add water cooling to one of my GPUs to solve that problem.

Note that I am not asking for changes in code to do this.  Heaven forbid that I be sent the source and authorized to add these features.

Keith Myers
Joined: 11 Feb 11
Posts: 208
Credit: 140150915
RAC: 283855

Harri Liljeroos wrote:
Well, BoincTasks does show which device a task is running on and in the history tab you can see it for finished tasks also.

I will have to make use of that feature in BoincTasks.  I didn't think of it until you mentioned it.

Gary Roberts
Joined: 9 Feb 05
Posts: 4191
Credit: 10485360042
RAC: 24332272

BeemerBiker wrote:
Yes, it would be nice if more information was available in the stderr output.  Some systems have different cards and I can't tell which card caused the problem.

The following comments are based on a (possibly faulty) recollection of what was explained (probably by Bernd) some years ago.  If you wanted to search through his posts, you could probably find what he actually said.  I think I've basically got the gist of it though.

GPU tasks, while crunching, do produce a fairly large stderr output that accumulates locally.  At the start of this output, it does indeed tell you exactly which GPU was being used. In the processing of current tasks, there are something like 1255 binary points to be analysed and the following output is recorded for each one of these binary points.  This example is the second last binary point (1254/1255) for one of my tasks.

% Binary point 1254/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-15 f1dot_start: -1e-13 f1dot_band: 1e-13
% Filling array of photon pairs

Notice that there are 31 'nf1dots' to be processed for each binary point.  As each one is done, a '.' is output to the log.  So multiply all this output by 1255 and you can see the log file will be quite large.  I believe all of this does accumulate locally.

There is a limit to what BOINC will upload so not all of what accumulates is uploaded.  My recollection is the limit is ~64KB.  These files exceed that by quite a margin.  It used to be that the first 64KB was uploaded and the balance was lost.  Apparently the interesting bits are at the end so this was changed to allow the last 64KB to be uploaded with the initial information being lost.  If you look at what makes it to the website by clicking on the task ID link, you should see what I'm talking about.  So, unfortunately, the startup messages which identify the GPU are not going to be returned for tasks that 'go the distance' and accumulate the full stderr output.
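To put rough numbers on that, here is a small sketch that builds a stand-in log from the per-point output quoted above and then keeps only the last 64KB. The header line is a made-up placeholder (the real startup text is not reproduced here), and I am assuming the 31 dots land on a single line per binary point and that the limit is exactly 64 * 1024 bytes, as per my recollection above.

```python
# Per-binary-point output, copied from the example above, plus one '.' per
# nf1dot. Putting the dots on one line of their own is an assumption.
PER_POINT = (
    "% Binary point 1254/1255\n"
    "% Starting semicoherent search over f0 and f1.\n"
    "% nf1dots: 31 df1dot: 3.344368011e-15 f1dot_start: -1e-13 f1dot_band: 1e-13\n"
    "% Filling array of photon pairs\n"
    + "." * 31 + "\n"
)

# Hypothetical placeholder for the startup messages that identify the GPU.
HEADER = "GPU HEADER (hypothetical placeholder)\n"

LIMIT = 64 * 1024      # assumed upload limit in bytes
BINARY_POINTS = 1255   # approximate count mentioned above

full_log = HEADER + PER_POINT * BINARY_POINTS
uploaded = full_log[-LIMIT:]   # keep only the tail, dropping the start

print("full log size:", len(full_log))
print("uploaded size:", len(uploaded))
print("header survives upload:", HEADER in uploaded)
```

Even with this minimal per-point output the full log comes out several times larger than the limit, so the header at the very start cannot be part of the tail that is kept.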

So, while a task is processing and the header part is available locally, it would be possible to grab the GPU details.  There would be no reason why a user script couldn't be triggered say 30 secs after task start to extract the header information accumulating in the slot directory and so link a task name to a GPU device number.  I'm not suggesting you do this if you have a more convenient way (like using BoincTasks history for example) to get that information.  I'm just noting that it's possible to get the data automatically through a script if you really wanted to.
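A very rough sketch of what such a script might look like is below. It just walks the slot directories and grabs the first few lines of each stderr.txt. The function name, the max_lines value and the demo file contents are all made up for illustration, and since I haven't reproduced the real header format here, the script returns the raw lines rather than trying to parse a device number out of them. Scheduling it shortly after task start, and linking a slot back to a task name, are left out.

```python
import tempfile
from pathlib import Path


def read_slot_headers(slots_dir, max_lines=20):
    """Return {slot_name: first max_lines lines of that slot's stderr.txt}."""
    headers = {}
    for slot in sorted(Path(slots_dir).iterdir()):
        stderr_file = slot / "stderr.txt"
        if not stderr_file.is_file():
            continue  # empty slot, or the task has not written anything yet
        lines = []
        with stderr_file.open("r", errors="replace") as handle:
            for _ in range(max_lines):
                line = handle.readline()
                if not line:
                    break
                lines.append(line.rstrip("\n"))
        headers[slot.name] = lines
    return headers


# Demo against a throwaway directory with invented contents.
with tempfile.TemporaryDirectory() as tmp:
    slot0 = Path(tmp) / "0"
    slot0.mkdir()
    (slot0 / "stderr.txt").write_text("hypothetical header line\nsecond line\n")
    (Path(tmp) / "1").mkdir()  # a slot with no stderr.txt yet
    demo = read_slot_headers(tmp, max_lines=1)

print(demo)
```

In real use you would point read_slot_headers at the slots folder inside your BOINC data directory and save the result somewhere, since the header is gone from the uploaded copy once the task finishes.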

 

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 208
Credit: 140150915
RAC: 283855

Hey Gary, thanks for the detailed explanation.  Makes sense now.  I'm going to have to look at a slot while it is occupied by an Einstein task to see what the beginning of the stderr.txt file normally looks like.  The BoincTasks solution will be the more common method as it is simpler.

Gary Roberts
Joined: 9 Feb 05
Posts: 4191
Credit: 10485360042
RAC: 24332272

You're most welcome!  I'm glad it was of some use in explaining things.

 

Cheers,
Gary.
