Hi,
After a long break I got back to crunching thanks to the Android client, which re-sparked my interest.
While I was at it, I saw that GPU clients were available as well, so I threw some work at my new HD7970.
Although it also runs overnight, its predicted runtime is now something like 4 days (!), which tells me this is either a true monster workunit (unlikely) or a defective one?
It's been several years since I used GPU clients on the very first BOINC GPU projects (those were very early developments prone to anomalies), hence I'm asking whether that's a workunit I should rather abort (??)
Expected runtimes for Radeon HD7970
My 7970 completes BRP4G units in 40-50 minutes, and that's with 3 running at the same time and my iGPU slowing down the 7970 as well. There's no way it should take you any longer unless the workunit is defective or your 7970 is being slowed down somehow.
Hm, these are more like the figures I expected...
Guess I'd rather abort that workunit then and see how the others clock in...
-- edit --
I think I found a plausible reason for the slow performance.
It seems the GPU task never pushes the GPU/memory clocks into high-performance mode.
I can see it chugging along at more or less standby clock rates, even during phases where it's loaded with a GPU task (?)
I had a quick look... could it be only crunching CPU tasks, as I could not see any PAS or BRP WUs?
It's running but seems I found the problem.
The executable einsteinbinary_BRP4G_1.39_windows_x86_64__BRP4G-opencl-ati does not get its assigned CPU time.
I changed avg_ncpus in client_info.xml from 0.5 to 0.9, yet it seemed to make no difference.
Only when I change it to 1.0 (effectively blocking one CPU core, thus reducing computation from 4 CPU + 1 GPU to 3 CPU cores plus 1 GPU with 1 dedicated CPU core) does performance really take off as expected (?)
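For reference, the setting being edited here lives in the <app_version> block of the client's state file. A minimal excerpt might look like this (element names follow BOINC's client state format; the version and value shown are illustrative assumptions):

```xml
<app_version>
    <app_name>einsteinbinary_BRP4G</app_name>
    <version_num>139</version_num>
    <plan_class>BRP4G-opencl-ati</plan_class>
    <!-- fraction of a CPU core the client budgets for this GPU app -->
    <avg_ncpus>1.000000</avg_ncpus>
    <!-- further elements omitted -->
</app_version>
```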
BOINC version is the latest (7.0.64 64bit).
Re-installing the video driver package made no difference either. It seems the distribution of CPU time slots doesn't work very well on my system (??)
Anyway, at least the by far most performant part of the PC is now running at maximum performance. I may have to edit client_info.xml to reserve one additional CPU, in order to fully use my 4th CPU core (which now exclusively ships data to the GPU for processing, leaving 12-15% unused potential).
Yes, you need to free up a CPU core for the GPU. Normally each workunit occupies 0.5 CPU cores; BOINC rounds that down to 0. Another way to optimize your crunching efficiency is to run multiple workunits in parallel.
When you run N workunits in parallel on the GPU, BOINC frees up N times 0.5 CPU cores. So 2 workunits in parallel free 1 CPU core, 4 in parallel free 2 CPU cores, and so on.
No editing of client_info.xml is required: for running multiple WUs in parallel, see the parameter "GPU utilization factor of BRP apps" in your Einstein@Home preferences. Setting it to 1/N changes the number of parallel running workunits to N.
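The arithmetic above can be sketched in a few lines (a hedged illustration of the rounding behavior described in this thread, not BOINC's actual scheduler code; the function names are mine):

```python
import math

# Assumption: BOINC sums the fractional CPU reservations (avg_ncpus)
# of all running GPU tasks and rounds the total down to whole cores.
def reserved_cpu_cores(parallel_tasks, avg_ncpus=0.5):
    return math.floor(parallel_tasks * avg_ncpus)

# A GPU utilization factor of 1/N lets N tasks share the GPU.
def tasks_for_factor(utilization_factor):
    return round(1 / utilization_factor)

for factor in (1.0, 0.5, 0.25):
    n = tasks_for_factor(factor)
    print(f"factor {factor}: {n} GPU task(s), "
          f"{reserved_cpu_cores(n)} CPU core(s) freed for them")
```

With avg_ncpus = 0.5 this reproduces the numbers above: 2 parallel tasks free 1 core, 4 free 2, while a single task rounds down to 0 freed cores.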
Ah, copy that, I hadn't checked out the Einstein Preferences page yet.
I've undone my changes to client_info.xml and set the mentioned factor to 0.5 in order to run 2 GPU tasks in parallel. Let's see if that helps :)
-- edit --
Although there are several GPU workunits in the queue, and I manually updated the project to fetch the new preferences, I'm again seeing 4 CPU tasks blocking everything and only 1 GPU task chugging along at the initial minimum speed that brought me here :(
Feels almost like home, the early days when only client_info.xml tweaks made BOINC work the way it was needed and GPU crunching was highly experimental...
Patience. The change will only take effect after the client has communicated with the server (reporting finished tasks / receiving new ones).
I manually did that twice (I assume that still works), all to no avail.
I've checked the .xml file manually, and the factor 0.5 is stored there. The only problem: it has no effect (?)
For now, the only way to get good utilization of my GPU is to set its avg_ncpus from 0.5 to 1.0, effectively blocking one CPU core for other tasks. Even then, it only computes one task at a time, despite the changed preference factor, which (to my understanding) should now run two instances.
In short, things don't function as expected, leaving the GPU almost idle by default without manual intervention.
I remember CPU time sharing working a lot more smoothly in earlier BOINC versions (back then, only the manual insertion of optimized CPU clients and GPU clients required .xml editing; I used the first optimized SIMD CPU clients and GPU clients for very early alpha/beta projects on SETI).
Weird, maybe I should let BOINC run dry and re-install it.
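As an aside, BOINC clients from roughly 7.0.40 onward also support a per-project app_config.xml in the project directory, which overrides these values locally without editing client_info.xml or waiting for server-side preferences. A hedged sketch (the app name is inferred from the executable mentioned above):

```xml
<app_config>
    <app>
        <name>einsteinbinary_BRP4G</name>
        <gpu_versions>
            <!-- 0.5 GPUs per task: two tasks share one GPU -->
            <gpu_usage>0.5</gpu_usage>
            <!-- half a CPU core budgeted per GPU task -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```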
Ha, for whatever reason it indeed picked up the factor option overnight.
So I guess for now this is as good as it gets: 3 CPU + 2 GPU clients working.
Thanks for the factoring tip :)
As Logforme said, the factor only gets applied when new GPU work is downloaded, so you probably got some new tasks overnight.
The problem with low GPU utilization when running OpenCL programs has been discussed before and seems to be a driver issue; the advice is to free a CPU core, or maybe two, depending on how many tasks run on the GPU. The performance boost from the GPU should outweigh the loss from freeing one core.