Looking closely at FGRP tasks on an NV GPU, I see big differences in completion times and in GPU load.
For example, the currently running task has ~99% GPU load, 13% memory controller load and 0% bus load,
while yesterday I saw a task with ~60-74% GPU load, ~8% memory controller load and 30+% bus load.
It seems there is some diversity in FGRP task parameters. Where should I look? What defines their "type"?
That is curious, Raistmer. While the current GW GPU tasks certainly vary considerably in total work content, and thus elapsed time, I've believed the FGRP tasks to be remarkably similar each to the next over weeks of time, with the occasional sudden shift with a new batch. The most recent such shift came with tasks with names starting LATeah300, a few weeks ago.
If you have seen big variations in completion times, could it be that your system was simultaneously running competing work of some kind, and that the competition was not matched across your comparison?
Regarding variation in GPU and memory controller utilization, and bus usage, there is strong, systematic variation within each individual task. Startup of the task, the long main portion of the task, and the final phase differ substantially. The final phase is easily defined pretty accurately as the period within which the reported completion stalls at something very slightly below 90%. The first phase is not so simply detected, nor so clean a break, but is readily observed by real-time monitoring of GPU utilization, power consumption, or pretty much any other activity indicator. In all cases, the GPU usage is high and very consistent in the long main portion, and far lower, to the point of being almost negligible, during the startup and final phases.
So I wonder for these activity measures whether perhaps you relied on snapshot instantaneous readings, rather than averages over the full task completion?
I admit my experience in the last couple of years is exclusively running these tasks on AMD, not Nvidia, but still I doubt that somehow the Nvidia variants differ so much when the AMD ones differ so very little.
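If you want to compare averages rather than snapshots, a rough Python sketch along these lines could log utilization over a whole task and keep a running average. It assumes nvidia-smi is on the PATH and only looks at the first GPU; the one-second sample interval is arbitrary, and on an AMD card you would need a different query tool:

```python
# Rough sketch: poll nvidia-smi once a second and keep running averages,
# so a whole-task average can be compared against instantaneous snapshots.
# Assumes nvidia-smi is on the PATH; reads only the first GPU; Ctrl+C stops it.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"]

samples = []
try:
    while True:
        out = subprocess.check_output(QUERY, text=True).strip().splitlines()[0]
        gpu, mem = (int(x) for x in out.split(", "))
        samples.append((gpu, mem))
        avg_gpu = sum(s[0] for s in samples) / len(samples)
        avg_mem = sum(s[1] for s in samples) / len(samples)
        print(f"now GPU {gpu:3d}%  MC {mem:3d}%   "
              f"avg GPU {avg_gpu:5.1f}%  avg MC {avg_mem:5.1f}%")
        time.sleep(1)
except KeyboardInterrupt:
    pass
```

Started just before a task begins and stopped when it reports, the final averages should smooth out the quiet startup and finish phases and make two tasks directly comparable.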
The new series, starting with LATeah3003*, are calculated slightly faster on my host (Radeon VII) than the previous ones:
~05:45 versus ~06:20 minutes.
IMHO it uses a different method of calculation.
Well, that's the possibility I want to rule out - whether it's a difference in host state or a difference between the tasks themselves.
Here are the results from that host; I list the smallest and longest times.
Full list here: https://einsteinathome.org/host/12862097/tasks/0/40?page=3&sort=asc&order=CPU%20time
Quite a big difference.
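To check whether the spread follows the data file rather than the host state, one could also pull the elapsed times out of the local BOINC job log and group them by the LATeah prefix in the task name. A rough sketch - the log file path and the `nm <name> et <elapsed>` field layout are my assumptions about a standard BOINC install, so adjust as needed:

```python
# Rough sketch: group elapsed times from a BOINC job log by the LATeah
# data-file prefix embedded in the task name, to see whether the spread
# follows the data set rather than the host.  The path and the
# "... nm <name> et <elapsed> ..." field layout are assumptions about a
# standard BOINC install on Windows; adjust for your machine.
import re
from collections import defaultdict

LOG = r"C:\ProgramData\BOINC\job_log_einstein.phys.uwm.edu.txt"  # assumed path

groups = defaultdict(list)
with open(LOG) as f:
    for line in f:
        m = re.search(r"nm (\S+) et ([\d.]+)", line)
        if not m:
            continue
        name, elapsed = m.group(1), float(m.group(2))
        prefix = name.split("_")[0]          # e.g. "LATeah3003L00" (assumed naming)
        if prefix.startswith("LATeah"):
            groups[prefix].append(elapsed)

for prefix, times in sorted(groups.items()):
    print(f"{prefix}: n={len(times)}  min={min(times):7.0f}s  "
          f"max={max(times):7.0f}s  avg={sum(times)/len(times):7.0f}s")
```

If the big spread shows up within a single data file, the host state is the suspect; if it lines up with the file prefixes, it's the tasks themselves.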
I looked at the GPU load for the same GPU task while it was in the middle of execution, once with 1 CPU core idle and once with all 4 CPU cores busy - not a big difference. The GPU app has higher priority and took a full CPU core in both cases.
And I watched the finish stage - it does differ quite a bit, so now I can distinguish it from the middle stage.
Raistmer* wrote: I looked at the GPU load for the same GPU task while it was in the middle of execution, once with 1 CPU core idle and once with all 4 CPU cores busy - not a big difference. The GPU app has higher priority and took a full CPU core in both cases.
The build path used for NVidia GPU applications here at Einstein running under OpenCL (not CUDA) uses a coordination mechanism that relies on the CPU support task running a continuous polling loop. So it will always use 100% of a core unless the OS takes the slot away from it. Even though, overwhelmingly often, a given trip around the polling loop performs no useful work, any delay in honoring a request for CPU services, once it has been posted, increases the elapsed time for the task. So competing work of any kind (BOINC or not) can matter.
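To make the cost of that design concrete, here is a toy Python sketch - emphatically not Einstein's actual code, just the general shape - contrasting a busy-poll wait with a blocking wait:

```python
# Toy illustration (not Einstein's actual code): why a busy-poll support
# thread pins a CPU core even when almost every trip around the loop finds
# no work, while a blocking wait costs essentially nothing until signalled.
import threading
import time

work_ready = threading.Event()

def busy_poll_worker():
    # Spins continuously checking for work: ~100% of one core,
    # but reacts to a posted request with minimal latency.
    while not work_ready.is_set():
        pass                      # poll again immediately
    print("busy-poll worker saw the request")

def blocking_worker():
    # Sleeps inside the OS until signalled: near-zero CPU use,
    # but waking it up adds scheduling latency to each request.
    work_ready.wait()
    print("blocking worker saw the request")

threading.Thread(target=busy_poll_worker, daemon=True).start()
threading.Thread(target=blocking_worker, daemon=True).start()
time.sleep(2)                     # watch this process's CPU use meanwhile
work_ready.set()
time.sleep(0.5)
```

The polling style buys low latency per GPU request at the price of a whole core, which is exactly why anything else competing for that core stretches the task's elapsed time.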
You show almost a 4:1 difference between the low and the high end of elapsed times for what I would expect to be tasks extremely similar in actual work content. Perhaps someone else here with Nvidia hardware and the same general OS as yours could comment on how your observations compare to theirs.
Yeah, I would like some additional expertise in this matter. This particular host sometimes has very strong issues with sound while GPU crunching, and sometimes not. Currently not - and GPU utilization is at 99% right now.
How all this is connected I don't know yet; I need more observations (most of the time these days BOINC just crunches, and I just snooze it if it interferes, without looking into the details).
I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10. Don't know if it will be helpful or not.
Tom M
I would say it's data-set (that is, task) dependent rather than OS/host/device dependent:
Here one can see the switch from a "problem" FGRP task to a MW (MilkyWay) one.
Without a host reboot or anything else changed, the bus interface load disappeared, GPU usage went back to good, and the sound distortions (I specifically run WaveAmp to monitor them) disappeared too.
So, something in the FGRP task causes very intensive bus communication that slows down the whole system...
The GPU is an NV 460 SE. Does anyone else see something similar?
EDIT:
And back to FGRP.
After a short period of "problems", good operation again.
Does something like Lunatics' test pack exist for E@H FGRP tasks, to test them offline? I would like to repeat exactly the same task...
You are asking a good question.
Having the ability to do offline tests would be a good thing. For now I did some online ones.
Here is a picture of switching from 2 simultaneous FGRP tasks to just one.
Very similar to my issues with some of the FGRP tasks...
So, now I think that the effects I observe come from GPU memory swapping.
Hence the big bus load, the sluggish OS as a whole, and the decreased GPU load.
Sometimes an FGRP task doesn't fit in the available GPU RAM on my host (it's a 1GB model, and from time to time GPU-Z shows 940MB in use), and the driver starts to swap GPU memory to system RAM and back.
It is good to know how strongly this influences GPU performance: a 4x difference between the best and worst times, provided the FGRP task "compute load" is approximately the same.
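One way to test this hypothesis directly would be to log memory use and PCIe traffic side by side and see whether they spike together. A rough Python sketch, assuming the pynvml package and a driver that exposes PCIe throughput counters (very old cards may not report them):

```python
# Rough sketch: log GPU memory use next to PCIe traffic to test the
# "driver is swapping GPU RAM over the bus" hypothesis - if used memory
# sits near the card's limit whenever PCIe traffic spikes, that supports it.
# Assumes the pynvml package (pip install nvidia-ml-py) and a driver that
# exposes PCIe throughput counters; very old cards may not report them.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

try:
    while True:
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        rx = pynvml.nvmlDeviceGetPcieThroughput(
            handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)   # KB/s
        tx = pynvml.nvmlDeviceGetPcieThroughput(
            handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)   # KB/s
        print(f"mem {mem.used / 2**20:5.0f}/{mem.total / 2**20:.0f} MiB  "
              f"GPU {util.gpu:3d}%  PCIe rx {rx / 1024:6.1f} MB/s  "
              f"tx {tx / 1024:6.1f} MB/s")
        time.sleep(1)
except KeyboardInterrupt:
    pynvml.nvmlShutdown()
```

If the slowdowns only appear while used memory is pinned near the 1GB total and the PCIe counters jump, that would confirm the swapping explanation rather than anything inside the task's math.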
Tom M wrote: I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10.
And your GPUs, with much bigger RAM, don't experience this issue (see below). There are differences between different bunches of FGRP work, though, that need to be accounted for.
EDIT:
Hm... and your host's results are not so easy to interpret...
Longer result:
Shorter result:
So, perhaps 2 different devices and a different number of instances per device also play a role...
Your best time in log:
Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64
Worst time:
Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64
Not as uniform as it seems at first glance...
I would recommend looking closely with GPU-Z at how the gtx 1060 3gb behaves when running 3 FGRP tasks per card...