Diversity in FGRP GPU tasks

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055
Topic 224935

Looking closely at FGRP tasks on an NV GPU, I see big differences in completion times and in GPU load.

For example, the one currently running has ~99% GPU load, 13% memory controller load and 0% bus load.

Yesterday, by contrast, I saw a task with ~60-74% GPU load, ~8% memory controller load and 30+% bus load.

It seems there is some diversity in FGRP task parameters. Where should I look? What defines their "type"?

 

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7052584931
RAC: 1625629

That is curious, Raistmer.  While the current GW GPU tasks certainly vary considerably in total work content, and thus elapsed time, I've believed the FGRP tasks to be remarkably similar each to the next over weeks of time, with the occasional sudden shift with a new batch.  The most recent such shift came with tasks with names starting LATeah300, a few weeks ago.

If you have seen big variations in completion times, could it be that your system was simultaneously running competing work of some kind, and that the competition was not matched across your comparison?

Regarding variation in GPU and memory controller utilization, and bus usage, there is a strong variation systematically within each individual task.  Startup of the task, the long main portion of the task, and the final phase differ substantially.  The final phase is easily defined pretty accurately as that period within which the reported completion stalls at something very slightly below 90%.  The first phase is not so simply detected, nor so clean a break, but is readily observed by real-time monitoring of GPU utilization, power consumption, or pretty much any other activity indicator.  In all cases, the GPU usage is high and very consistent in the long main portion, and far less to the point of almost being negligible during startup and final phases.

So I wonder for these activity measures whether perhaps you relied on snapshot instantaneous readings, rather than averages over the full task completion?
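
If you want averages rather than snapshots, a quick way is to sample the card for the whole run. A minimal Python sketch - purely illustrative, assuming nvidia-smi is on the PATH and supports these queries for your card; it is not part of any Einstein tool:

# Sample GPU and memory-controller utilization once per second via nvidia-smi
# and keep a running average over the whole task, instead of a single snapshot.
import subprocess, time

samples = []
try:
    while True:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,utilization.memory",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True).stdout
        gpu, mem = (int(x) for x in out.splitlines()[0].split(","))
        samples.append((gpu, mem))
        avg_gpu = sum(s[0] for s in samples) / len(samples)
        avg_mem = sum(s[1] for s in samples) / len(samples)
        print(f"now {gpu:3d}%/{mem:3d}%   avg {avg_gpu:5.1f}%/{avg_mem:5.1f}%")
        time.sleep(1)
except KeyboardInterrupt:
    pass

Left running for a full task, the startup, main, and final phases I described should also show up in the trace.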

I admit my experience in the last couple of years is exclusively running these tasks on AMD, not Nvidia, but still I doubt that somehow the Nvidia variants differ so much when the AMD ones differ so very little.

DF1DX
Joined: 14 Aug 10
Posts: 102
Credit: 2953566925
RAC: 1858249

The new series, starting with LATeah3003*, computes slightly faster on my host (Radeon VII) than the previous ones.

~05:45 versus 06:20 minutes.

IMHO a different method of calculation.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055

Well, that's the possibility I want to rule out - whether it's a difference in host state or a difference between tasks.

Here are results from that host (screenshots of the task list). I list the smallest and longest times.

Full list here: https://einsteinathome.org/host/12862097/tasks/0/40?page=3&sort=asc&order=CPU%20time

 

Quite a big difference. 

I looked at the GPU load of the same GPU task in the middle of its execution with 1 CPU core idle and then with all 4 CPU cores busy - no big difference. The GPU app has higher priority and took a full CPU core in both cases.

And I saw the finish stage - it indeed differs considerably, so now I can distinguish it from the middle stage.
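
For anyone who wants to repeat the check, here is a rough Python sketch of what I measured - it assumes the psutil package is installed and that the FGRP support process name contains "hsgamma" (adjust the substring to whatever Task Manager shows for the app on your host):

# Rough check of how much of a CPU core the FGRP GPU support process gets.
# 100% from cpu_percent() corresponds to one fully busy core.
import psutil

procs = [p for p in psutil.process_iter(["name"])
         if p.info["name"] and "hsgamma" in p.info["name"].lower()]

for _ in range(30):                      # ~30 one-second samples
    for p in procs:
        try:
            print(p.info["name"], p.cpu_percent(interval=1.0), "%")
        except psutil.NoSuchProcess:
            pass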

 

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7052584931
RAC: 1625629

Raistmer* wrote:
I looked at the GPU load of the same GPU task in the middle of its execution with 1 CPU core idle and then with all 4 CPU cores busy - no big difference. The GPU app has higher priority and took a full CPU core in both cases.

The build path used for Nvidia GPU applications here at Einstein running under OpenCL (not CUDA) uses a coordination mechanism that relies on the CPU support task running a continuous polling loop.  So it will always use 100% of a core unless the OS takes the slot away from it.  Even though, overwhelmingly often, a given trip around the polling loop is performing no useful work, any delay in honoring a request for CPU services, once it has been posted, increases the elapsed time for the task.  So competing work of any kind (BOINC or not) can matter.
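
To make the distinction concrete, here is a generic Python illustration of busy-polling versus a blocking wait - it is not the Einstein app code, just a toy showing why a polling support loop pins a full core while a blocking wait does not:

# Toy comparison of the two waiting strategies a GPU support task could use.
# busy_poll() burns a full CPU core while "waiting"; blocking_wait() sleeps.
import threading, time

def fake_gpu_kernel(done):
    time.sleep(2.0)            # stands in for the GPU doing 2 s of work
    done.set()

def busy_poll(done):
    spins = 0
    while not done.is_set():   # continuous polling loop, ~100% of a core
        spins += 1
    return spins

def blocking_wait(done):
    done.wait()                # sleeps until signalled, negligible CPU use
    return 0

for waiter in (busy_poll, blocking_wait):
    done = threading.Event()
    threading.Thread(target=fake_gpu_kernel, args=(done,)).start()
    t0 = time.perf_counter()
    spins = waiter(done)
    print(f"{waiter.__name__}: {time.perf_counter() - t0:.2f} s, {spins} polling iterations")

The polling version answers a posted request essentially immediately, which is the latency advantage - but only as long as the OS keeps giving it the core, which is exactly why competing CPU work matters.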

You show almost a 4:1 difference between the low and the high end of elapsed times for what I would expect to be tasks extremely similar in actual work content.  Perhaps someone else here with Nvidia hardware and the same general OS as yours could comment on how your observations compare to theirs.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055

Yeah, I would like some additional expertise on this matter. This particular host sometimes has very strong issues with sound while GPU crunching. Sometimes not. Currently not - and 99% GPU utilization at the moment.

How all this is connected I don't know for now; I need more observations (most of the time these days BOINC just crunches and I just snooze it if it interferes, without looking into the details).

Tom M
Joined: 2 Feb 06
Posts: 5633
Credit: 7717296431
RAC: 2052878

I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10.  Don't know if it will be helpful or not.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055

I would say it's data set (that is, task) dependent rather than OS/host/device dependent:

 

 

Here one can see the switch from a "problem" FGRP task to an MW (MilkyWay) one.

Without a host reboot or anything else changed, the bus interface load disappeared, GPU usage went back to good, and the sound distortions (I specifically run WaveAmp to monitor them) disappeared too.

So, something in the FGRP task causes very intensive bus communication that slows down the whole system...

The GPU is an NV 460 SE. Does anyone else see anything similar?

EDIT:

 

And back to FGRP.

After a short period of "problems", good operation again.

Does something like Lunatics' test pack exist for E@H FGRP tasks, to test them offline? I would like to repeat exactly the same task...

 

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1431858422
RAC: 604224

You are asking a good question.

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055

Having the ability to do offline tests would be a good thing. For now I did some online ones.

Here is a picture of the switch from running 2 FGRP tasks simultaneously to just one.

Very similar to my issues with some of the FGRP tasks...

So now I think the effects I observe come from GPU memory swapping.

Hence the big bus load, the sluggish OS as a whole, and the decreased GPU load.

Sometimes an FGRP task doesn't fit in the available GPU RAM on my host (it's a 1 GB model, and from time to time GPU-Z shows 940 MB in use). Then the driver starts to swap GPU memory to system RAM and back.

Good to know how strongly this influences GPU performance: a 4x difference between the best and worst times, while the FGRP tasks' "compute load" is approximately the same.
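
If anyone wants to watch for this without GPU-Z, here is a minimal Python sketch - assuming nvidia-smi on the card/driver in question still reports memory.used and memory.total (on older cards it may not):

# Log VRAM usage once per second; sustained readings near the total suggest
# the driver has started paging GPU memory over the bus to system RAM.
import subprocess, time

while True:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    used, total = (int(x) for x in out.splitlines()[0].split(","))
    flag = "  <-- nearly full" if used > 0.9 * total else ""
    print(f"{time.strftime('%H:%M:%S')}  {used} / {total} MiB{flag}")
    time.sleep(1)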

 

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 179875154
RAC: 113055

Tom M wrote:

I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10.  Don't know if it will be helpful or not.

Tom M

And your GPUs, with their much bigger RAM, don't experience this issue (see below).

There are differences between different batches of FGRP work, though, that need to be accounted for.

 

EDIT:

Hm... and your host's results are not so easy to interpret...

Longer result:

18:54:27 (7980): [normal]: % CPU usage: 1.000000, GPU usage: 0.250000

Shorter result:

12:37:07 (12916): [normal]: % CPU usage: 0.500000, GPU usage: 0.333000

So, perhaps 2 different devices and a different number of instances per device as well...

 

Your best time in the log:

LATeah3002L01_772.0_0_0.0_20132424_2 524130402 17 Feb 2021 10:30:08 UTC 17 Feb 2021 19:12:13 UTC Completed and validated 84 97 3,465

Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64

 

12:01:43 (2120): [normal]: % CPU usage: 0.500000, GPU usage: 0.333000

Using OpenCL device "gfx1010" by: Advanced Micro Devices, Inc.
Max allocation limit: 2764046336
Global mem size: 4278190080

 

Worst time:

LATeah3002L03_884.0_0_0.0_6691026_1 526891118 23 Feb 2021 15:32:29 UTC 23 Feb 2021 16:58:45 UTC Completed and validated 2,871 2,866 3,465

Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64

 

 

09:32:48 (6008): [normal]: % CPU usage: 0.500000, GPU usage: 0.333000

Using OpenCL device "GeForce GTX 1060 3GB" by: NVIDIA Corporation
Max allocation limit: 805306368
Global mem size: 3221225472

 

Not as uniform as it looks at first glance...
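
If someone wants to compare more than a couple of results, the relevant lines can be pulled out of the saved stderr texts. A rough Python sketch, assuming the stderr outputs have been saved to files (the line formats are the ones quoted above):

# Extract device name, CPU/GPU usage shares and memory limits from saved
# FGRP stderr files so many tasks can be compared side by side.
import re, sys

patterns = {
    "usage":  re.compile(r"% CPU usage: ([\d.]+), GPU usage: ([\d.]+)"),
    "device": re.compile(r'Using OpenCL device "([^"]+)"'),
    "alloc":  re.compile(r"Max allocation limit: (\d+)"),
    "global": re.compile(r"Global mem size: (\d+)"),
}

for path in sys.argv[1:]:                 # e.g. python parse_stderr.py task1.txt task2.txt
    text = open(path, encoding="utf-8", errors="replace").read()
    fields = {key: (m.groups() if (m := pat.search(text)) else None)
              for key, pat in patterns.items()}
    print(path, fields)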

 

I would recommend looking closely with GPU-Z at how the GTX 1060 3GB behaves when running 3 FGRP tasks per card...

 
