So is it better to abort all WUs with app versions prior to 1.05 on AMD hosts?
I just noticed that my two rigs have been almost idle on 1.02 for the last few days (they downloaded a few dozen 1.02 WUs, and each takes about 10 hours to crunch with near-zero GPU load). I aborted them yesterday.
Now I have a bunch of new WUs with mixed versions: 1.03, 1.04, 1.05.
You may certainly abort tasks that are assigned to app versions 1.02 and 1.03. These will still do most of the computation on the CPU, and thus take rather long.
There is a problem with app version 1.04 on what appears to be most AMD drivers, and it apparently isn't fully circumvented in 1.05. These problems occur right at the beginning of the computation, so if your host exhibits this problem, the tasks will error out immediately and no computing time is wasted.
A version 1.06 is underway that should help with the AMD problem.
BM
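Bernd Machenschalk wrote:
... These problems occur right at the beginning of the computation, so if your host exhibits this problem, the tasks will error out immediately and no computing time is wasted.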
There are reports (this one, and the one that follows) about tasks that run successfully to completion but then end up as validate errors.
Is that behaviour something entirely different?
Cheers,
Gary.
https://einsteinathome.org/de/task/871561138
1.06: Still immediate validate error. Only one WU, no other tasks.
Einstein@Home 1.01 Continuous Gravitational Wave search O2 All-Sky h1_0518.90_O2C02Cl1In0__O2AS20-500_519.05Hz_548_0
This is the flavor my box (HP XW4600 Core 2 Duo @ 3 GHz) is currently crunching on. When I checked it this morning, the elapsed time was 10 hrs and 17 mins, with an estimate of 4 hrs and 20 mins left to go. I'll post the initial result (that is, running without error) once the task is completed sometime later today.
Matt
DF1DX
Thanks for reporting. This looks more like an issue in the validator than in the app. I'll fix that and re-validate the WUs.
BM
I'm getting validate errors on all finished WUs so far. 1.04 (GW-opencl-nvidia) on Linux. Host: https://einsteinathome.org/host/12649987
Edit: They're now validated.
Got some WUs from the latest v1.06 app.
Now it finally looks like a real GPU app. Computation time dropped about 6-7x, from ~10 hours to ~1.6 hours on an AMD RX 570.
GPU load is still relatively low, though: in the 30-40% range, with GPU temperature about 15-20 degrees lower than when crunching FGRPB WUs on the same GPU. Meanwhile one CPU core is fully loaded (mostly by amdocl64.dll).
Running a few WUs per GPU in parallel may fix it. I will test that later, after some of the WUs computed solo have validated...
P.S.
And ver. 1.05 was just awful: it behaved much worse than all previous versions. It showed some GPU load (10-30%), but the real processing speed was VERY LOW. The projected runtime on ver. 1.05 is about ONE WEEK: only ~10% was done in the first 15-17 hours. That is even much slower than the pure CPU version of the same app (about 18-20 hours on the same computer).
So I aborted them all too.
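The one-week figure above follows from a linear extrapolation of the reported progress. A minimal sketch, assuming progress is roughly linear in time (the function name is purely illustrative and not part of any BOINC or Einstein@Home tool):

```python
def projected_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total runtime from elapsed time and fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from the post: ~10% done after 15-17 hours on ver. 1.05.
for elapsed in (15.0, 17.0):
    total = projected_total_hours(elapsed, 0.10)
    print(f"{elapsed:.0f} h elapsed -> ~{total:.0f} h total (~{total / 24:.1f} days)")
```

Both ends of the range land between six and seven days, against the 18-20 hours quoted for the pure CPU version.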
Hi Matt,
Please realise that the current discussion is about the new GPU-based application that is being introduced into the O2 All-Sky search. The 1.01 version is the old CPU version, so your results will probably take a lot longer and not be directly comparable (time-wise) to the results of the new 1.06 app using a modern GPU.
Cheers,
Gary.
Well, there may be a bug in the new versions 1.05 and 1.06 that causes a HUGE slowdown on older GPUs (or maybe older drivers). With ver. 1.06 I see the same behavior as described in the previous post.
But only on one of the computers, this one: https://einsteinathome.org/host/9354864
It has older GPUs: 2 x AMD HD 7870 (GCN 1.0, OpenCL 1.2)
It works OK on the other one: https://einsteinathome.org/host/12204611
which has an AMD RX 570 (GCN 1.3, OpenCL 2.0)
Probably 1.05 will work OK too, but I do not have any of them here to test.
It takes about 1.5 hours to crunch on the RX 570 and about a week on the HD 7870, while the hardware speed difference is only about 50-70% for this pair of GPUs.
I also noticed a minor bug in the progress display: it shows fast progress in the first hour or so (up to ~50% done), then reverts to ~0% and proceeds extremely slowly after that. But in stderr.txt I can see that progress was extremely slow from the very beginning (sky dot counter).
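To put the HD 7870 numbers in perspective, the observed runtime ratio can be compared with the hardware speed difference quoted in the post. A rough sketch, assuming "about a week" means 7 x 24 hours and reading "50-70%" as the RX 570 being 1.5 to 1.7 times faster (both are interpretations of the post, not measured values):

```python
HOURS_PER_WEEK = 7 * 24  # assumption: "about a week" taken as exactly 7 days


def slowdown_ratio(slow_hours: float, fast_hours: float) -> float:
    """Return how many times longer the slow host takes than the fast one."""
    return slow_hours / fast_hours


# Runtimes from the post: ~1 week on the HD 7870, ~1.5 hours on the RX 570.
observed = slowdown_ratio(HOURS_PER_WEEK, 1.5)

# Assumed reading of "50-70%" hardware speed difference.
expected_low, expected_high = 1.5, 1.7

print(f"observed slowdown: ~{observed:.0f}x")
print(f"expected from hardware alone: {expected_low}x to {expected_high}x")
print(f"unexplained factor: ~{observed / expected_high:.0f}x to ~{observed / expected_low:.0f}x")
```

Under these assumptions the observed ratio is on the order of 100x, far more than the hardware difference alone would account for, which is consistent with a software or driver problem on the older cards.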