Looking closely at FGRP tasks on an NV GPU, I see big differences in completion times and in GPU load.
For example, the currently running one has ~99% GPU load, 13% memory controller load and 0% bus load.
Yesterday, though, I saw a task with ~60-74% GPU load, ~8% memory controller load and ~30+% bus load.
It seems there is some diversity in FGRP task parameters. Where should I look? What defines their "type"?
That is curious, Raistmer. While the current GW GPU tasks certainly vary considerably in total work content, and thus elapsed time, I've believed the FGRP tasks to be remarkably similar each to the next over weeks of time, with the occasional sudden shift with a new batch. The most recent such shift came with tasks with names starting LATeah300, a few weeks ago.
If you have seen big variations in completion times, could it be that your system was simultaneously running competing work of some kind, and that the competition was not matched across your comparison?
Regarding variation in GPU and memory controller utilization, and bus usage, there is a strong variation systematically within each individual task. Startup of the task, the long main portion of the task, and the final phase differ substantially. The final phase is easily defined pretty accurately as that period within which the reported completion stalls at something very slightly below 90%. The first phase is not so simply detected, nor so clean a break, but is readily observed by real-time monitoring of GPU utilization, power consumption, or pretty much any other activity indicator. In all cases, the GPU usage is high and very consistent in the long main portion, and far less to the point of almost being negligible during startup and final phases.
So I wonder for these activity measures whether perhaps you relied on snapshot instantaneous readings, rather than averages over the full task completion?
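In case it helps: here is a minimal sketch of averaging GPU and memory-controller utilization over a run instead of taking a single snapshot. It assumes the NVML library that ships with the Nvidia driver (link with -lnvidia-ml); the device index and sampling duration are just illustrative.

    // Average GPU and memory-controller utilization over a run instead of
    // relying on one instantaneous reading. Sketch only; assumes NVML.
    #include <nvml.h>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) return 1;

        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);      // device index 0 is illustrative

        unsigned long long gpuSum = 0, memSum = 0, samples = 0;
        for (int i = 0; i < 600; ++i) {           // ~10 minutes at 1 sample/second
            nvmlUtilization_t util;
            if (nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS) {
                gpuSum += util.gpu;               // % of time the GPU was busy
                memSum += util.memory;            // % of time the memory controller was busy
                ++samples;
            }
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }

        if (samples)
            std::printf("avg GPU %llu%%, avg memctl %llu%%\n",
                        gpuSum / samples, memSum / samples);
        nvmlShutdown();
        return 0;
    }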
I admit my experience in the last couple of years is exclusively running these tasks on AMD, not Nvidia, but still I doubt that somehow the Nvidia variants differ so much when the AMD ones differ so very little.
The new series, starting with LATeah3003*, is computed slightly faster on my host (Radeon VII) than the previous ones:
~05:45 versus ~06:20 minutes.
IMHO it uses a different method of calculation.
Well, that's the possibility I want to rule out - whether it's a difference in host state or a difference between the tasks.
Here are results from that host; I list the shortest and longest times.
Full list here: https://einsteinathome.org/host/12862097/tasks/0/40?page=3&sort=asc&order=CPU%20time
Quite a big difference.
I looked at the GPU load on the same GPU task while it was in the middle of execution, once with 1 CPU core idle and once with all 4 CPU cores busy - no big difference. The GPU app has higher priority and took a full CPU core in both cases.
And I saw the finish stage - it does differ quite a bit, so now I can distinguish between it and the middle stage.
The build path used for Nvidia GPU applications here at Einstein, running under OpenCL (not CUDA), uses a coordination mechanism that relies on the CPU support task running a continuous polling loop. So it will always use 100% of a core unless the OS takes the slot away from it. Even though, overwhelmingly often, a given trip around the polling loop is performing no useful work, any delay in honoring a request for CPU services, once it has been posted, increases the elapsed time for the task. So competing work of any kind (BOINC or not) can matter.
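To illustrate why that support task pins a core, here is a hypothetical sketch of a busy-poll coordination loop - not the actual Einstein code; request_pending, service_request and the other names are made up for illustration:

    // Hypothetical busy-poll support loop: the CPU thread spins checking for
    // GPU-side service requests, so it occupies ~100% of a core even though
    // most iterations do no useful work. Any scheduling delay of this thread
    // delays servicing a request and stretches the task's elapsed time.
    #include <atomic>

    std::atomic<bool> request_pending{false};   // set from the GPU-side completion path
    std::atomic<bool> task_finished{false};

    void service_request() { /* placeholder for the real CPU-side work */ }

    void support_task_poll_loop() {
        while (!task_finished.load(std::memory_order_acquire)) {
            if (request_pending.exchange(false, std::memory_order_acq_rel)) {
                service_request();              // rare: a request actually arrived
            }
            // No sleep/yield here: that keeps request latency minimal,
            // but it is also why the loop consumes a full CPU core.
        }
    }

    int main() { /* the real app would run the GPU work and this loop in parallel */ }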
You show almost a 4:1 difference between the low and the high end of elapsed times for what I would expect to be tasks extremely similar in actual work content. Perhaps someone else here with Nvidia hardware and the same general OS as yours could comment on how your observations compare to theirs.
Yeah, I would like some additional expertise on this matter. This particular host sometimes has very strong issues with sound while GPU crunching. Sometimes not. Currently not - and 99% GPU utilization at the moment.
How all this is connected I don't know yet; I need more observations (most of the time these days BOINC just crunches, and I just snooze it if it interferes, without looking at the details).
I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10. Don't know if it will be helpful or not.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor)
I would say it's data-set (that is, task) dependent rather than OS/host/device dependent:
Here one can see the switch from a "problem" FGRP task to a MW (MilkyWay) one,
without a host reboot or anything else changed. The bus interface load disappeared, GPU usage went back to normal, and the sound distortions (I run WaveAmp specifically to monitor them) disappeared too.
So, something in the FGRP task causes very intensive bus communication that slows down the whole system...
The GPU is an NV 460 SE. Does anyone else see something similar?
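If anyone wants to put a number on that bus traffic rather than eyeballing the GPU-Z "Bus Interface Load" gauge, here is a rough sketch using NVML's PCIe throughput counters. It assumes the NVML library from the Nvidia driver; note the query is not supported on some older GPUs, so it may well fail on a Fermi card like the 460 SE.

    // Sample PCIe TX/RX throughput (reported in KB/s) once per second while a
    // task runs, to quantify "bus interface load". Sketch only; assumes NVML
    // (link with -lnvidia-ml); may return NOT_SUPPORTED on older GPUs.
    #include <nvml.h>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    int main() {
        if (nvmlInit() != NVML_SUCCESS) return 1;
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(0, &dev);      // device index 0 is illustrative

        for (int i = 0; i < 60; ++i) {            // one sample per second for a minute
            unsigned int tx = 0, rx = 0;
            nvmlReturn_t a = nvmlDeviceGetPcieThroughput(dev, NVML_PCIE_UTIL_TX_BYTES, &tx);
            nvmlReturn_t b = nvmlDeviceGetPcieThroughput(dev, NVML_PCIE_UTIL_RX_BYTES, &rx);
            if (a == NVML_SUCCESS && b == NVML_SUCCESS)
                std::printf("PCIe TX %u KB/s, RX %u KB/s\n", tx, rx);
            else
                std::printf("PCIe throughput query not supported on this GPU\n");
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
        nvmlShutdown();
        return 0;
    }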
EDIT:
And back to FGRP.
After a short period of "problems", good operation again.
Does something like Lunatics' test pack exist for E@H FGRP tasks, to test them offline? I would like to repeat exactly the same task...
You are asking a good question.
Having the ability to do offline tests would be a good thing. For now I have done some online ones.
Here is the picture of switching from 2 FGRP tasks running simultaneously to just one.
Very similar to my issues with some of the FGRP tasks...
So, now I think the effects I observe come from GPU memory swapping.
Hence the big bus load, the sluggish OS as a whole, and the decreased GPU load.
Sometimes an FGRP task doesn't fit in the available GPU RAM on my host (it's a 1 GB model, and from time to time GPU-Z shows 940 MB in use), and the driver starts to swap GPU memory to system RAM and back.
Good to know how strongly this influences GPU performance: a 4x difference between best and worst times, with the FGRP tasks' "compute load" being approximately the same.
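For anyone who wants to check how much room a card actually offers before the driver has to start swapping, here is a rough sketch that lists each OpenCL GPU's total global memory and maximum single allocation, to compare against the ~940 MB GPU-Z reports for the task (the OS and driver also reserve part of the VRAM, so real headroom is smaller). It assumes an OpenCL SDK is installed (link with -lOpenCL).

    // List each GPU's total global memory and max single-allocation size.
    // Sketch only; compare the numbers against what GPU-Z shows a task using.
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    int main() {
        cl_uint nplat = 0;
        clGetPlatformIDs(0, nullptr, &nplat);
        if (nplat == 0) return 0;
        std::vector<cl_platform_id> plats(nplat);
        clGetPlatformIDs(nplat, plats.data(), nullptr);

        for (cl_platform_id p : plats) {
            cl_uint ndev = 0;
            if (clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, 0, nullptr, &ndev) != CL_SUCCESS)
                continue;
            std::vector<cl_device_id> devs(ndev);
            clGetDeviceIDs(p, CL_DEVICE_TYPE_GPU, ndev, devs.data(), nullptr);

            for (cl_device_id d : devs) {
                char name[256] = {0};
                cl_ulong total = 0, maxAlloc = 0;
                clGetDeviceInfo(d, CL_DEVICE_NAME, sizeof(name), name, nullptr);
                clGetDeviceInfo(d, CL_DEVICE_GLOBAL_MEM_SIZE, sizeof(total), &total, nullptr);
                clGetDeviceInfo(d, CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(maxAlloc), &maxAlloc, nullptr);
                std::printf("%s: %llu MB total, %llu MB max single allocation\n", name,
                            (unsigned long long)(total / (1024 * 1024)),
                            (unsigned long long)(maxAlloc / (1024 * 1024)));
            }
        }
        return 0;
    }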
Tom M wrote: I am running a gtx 1060 3gb and a gtx 1660 Super on this system with Windows 10.
And your GPUs, with much bigger RAM, don't experience this issue (but see below). There are differences between different batches of FGRP work, though, that need to be accounted for.
EDIT:
Hm... your host's results are not so easy to interpret after all...
Longer result:
Shorter result:
So, perhaps the 2 different devices, and the different number of instances, also play a role...
Your best time in log:
Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64
Worst time:
Gamma-ray pulsar binary search #1 on GPUs v1.22 () windows_x86_64
Not as uniform as it seemed at first glance...
I would recommend looking closely with GPU-Z at how the gtx 1060 3gb behaves when running 3 FGRP tasks per card...