Many work units/tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application (Windows x86_64) are behaving strangely.
Their “remaining time (estimated)” should be about 15 minutes.
After 1 hour, the task's progress reaches 100% but the task doesn't finish.
The progress then drops to 0% and the “remaining time (estimated)” starts increasing rapidly.
Yesterday I had to abort 70 work units/tasks because the “remaining time (estimated)” was 155 days and the deadline was too close to have any hope of finishing them.
Today I downloaded new work units/tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application and the behavior is the same.
In 2 days, I'll decide whether to abort these WUs.
I think there may be an issue with AMD GPUs on the Windows x86_64 platform. The work units/tasks I had to abort were processed successfully by NVIDIA GPUs.
Executable: einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe
My computer ID: 12796743
CPU type: Authentic AMD FX(tm)-6350 Six-Core Processor [Family 21 Model 2 Stepping 0]
Number of processors: 6
Coprocessors: AMD Radeon R7 200 Series (2048MB)
Operating system: Microsoft Windows 10 Professional x64 Edition, (10.00.18363.00)
BOINC client version: 7.14.2
Thank you
Paolo
Copyright © 2024 Einstein@Home. All rights reserved.
This is an addendum.
A work unit/task of mine has just exited after 1 hour.
Outcome: Computation error
Client state: Compute error
Exit status:197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
Error occured on Sunday, January 12, 2020 at 20:19:46.
C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location c7930192 in module C:\Windows\System32\KERNELBASE.dll.
Call stack:
C7930192 C:\Windows\System32\KERNELBASE.dll:C7930192 DebugBreak
00634F7A C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00634F7A
00635A3C C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00635A3C
C8E27BD4 C:\Windows\System32\KERNEL32.DLL:C8E27BD4 BaseThreadInitThunk
C9DCCED1 C:\Windows\SYSTEM32\ntdll.dll:C9DCCED1 RtlUserThreadStart
I think exit status 197 is linked to the issue where the “remaining time (estimated)” increases abnormally.
Thank you
Paolo
PaoloNasca
What is the exact model of your GPU?
GPU: Sapphire Radeon R7 240
R7 240 has GCN 1.0 architecture. Based on user experiences so far, all AMD cards with GCN 1.0 architecture have been observed to be incompatible with this current 'Gravitational Wave search O2 Multi-Directional GPU' app. That's why that card isn't making any real progress and a task will end up crashing one way or another.
https://en.wikipedia.org/wiki/Radeon_Rx_200_series
Unfortunately, other volunteers with GCN 1st gen GPUs are seeing pretty much the same behaviour. It may well be the same for GCN 2nd gen, I simply don't know for sure.
Your card (R7 200 series) would be GCN 1st gen if it were any of these: R7 240, 250, 250E, 250X or 265. Only the 260 or 260X would be 2nd gen. Over the last few months, several volunteers (myself included) have reported these extremely slow crunch speeds, inevitably ending with TIME_LIMIT_EXCEEDED errors on both Windows and Linux. The progress reaching an apparently high % done and then resetting to 0% happens simply because that 'progress' was only 'simulated progress', which got reset to 'real progress' once the very first checkpoint was actually written.
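To make the 'simulated progress' explanation concrete, here is a toy Python model. It assumes, hypothetically, that before the first checkpoint the displayed progress is extrapolated from elapsed time against the initial runtime estimate, and that remaining time is then extrapolated linearly as elapsed × (1 − f) / f, where f is the real fraction done. The 0.03% real fraction is a made-up number; this is a sketch of the effect, not the BOINC client's or the app's actual code.

```python
# Toy model only: the rules below are assumptions made for illustration,
# not the BOINC client's or the Einstein@Home app's actual code.

SECONDS_PER_DAY = 86_400


def displayed_progress(elapsed_s, estimated_s, real_fraction):
    """Progress shown to the user.

    Assumption: until the app writes its first checkpoint and reports a
    real fraction done (real_fraction is None), progress is simulated
    from elapsed time versus the initial runtime estimate.
    """
    if real_fraction is None:
        return min(elapsed_s / estimated_s, 1.0)
    return real_fraction


def remaining_time_s(elapsed_s, fraction_done):
    """Extrapolate remaining time, assuming progress stays linear in time."""
    if fraction_done <= 0:
        return float("inf")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


estimate = 15 * 60   # initial 15-minute estimate from the first post
elapsed = 60 * 60    # the task has run for one hour

# Before the first checkpoint the simulated progress is pinned at 100%.
print(displayed_progress(elapsed, estimate, None))

# After the first checkpoint: a hypothetical real fraction of 0.03%.
real = 0.0003
print(displayed_progress(elapsed, estimate, real))
print(f"{remaining_time_s(elapsed, real) / SECONDS_PER_DAY:.0f} days left")
```

With these invented numbers the estimate jumps to roughly 139 days, the same order of magnitude as the 155 days reported in the first post; the slower the real progress, the larger the estimate grows.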
There has been no comment from the Devs. I assume they don't use that particular type of GPU. Your only option at the moment is to opt out of GW GPU tasks (abort any that you have - they will never finish) and use the gamma-ray pulsar search instead. It would be interesting to know if your card was a 260 or 260X as that would confirm that GCN 2nd gen is affected by this problem as well. GCN 4th gen certainly isn't and I suspect GCN 3rd gen may not be either.
EDIT: I hadn't seen either of the two previous messages before posting my response. It did take me a bit of time to compose my response :-).
Cheers,
Gary.
Thank you, Richie and Gary Roberts.
Your explanations were very clear.
See you soon.
I can pile on with what I see on various GCN1 and TeraScale GPUs across Windows and Linux.
In Windows, I see what the user above sees. Tasks stop progressing at a low percentage, and the remaining time is in the range of weeks.
On Linux, the task initially consumes 100% of a real CPU core and drives up the system load. Eventually this leaves one CPU core at 100% sys% load, with the Einstein task also at 100% user% CPU.
I forget what led me to think this, but somewhere along the way I saw something that led me to suspect a hung bus transfer, which is plausible given these old GPUs and drivers. For now, I will opt out of GW GPU tasks as suggested above, but if desired, I can re-enable these tasks for troubleshooting.
EXIT_TIME_LIMIT_EXCEEDED on O2MDF with Apple
Hi. I'm seeing 12 recent errors of this kind, so I'm temporarily disabling this application:
It's also interesting that previous such WUs completed successfully within about an hour (granting 1000 credits), while FGRPB1G produces 3465 credits within 2271 seconds and also uses much less CPU. I'm contributing for the science, not the points, but does that mean the latter is much more efficient and desirable on my architecture than the former?
On a different note, is it normal that I have 41 pending WUs starting from the 27th of January (all of them O2MDF, most of them V2 of those)?
I'm running on a Radeon Pro 560x 4GB, total credit 431955 almost exclusively from GPU.
e2bc
Gravitational Wave work on GPUs needs a full CPU core to support it for efficient running, so make sure your settings don't overload your CPU with work. Try reducing "Use at most: XX% of the processors" to free up cores/threads for GPU support.
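Picking the percentage takes a little arithmetic, because it has to map to a whole number of CPUs. The Python sketch below assumes the client truncates total_threads × pct / 100 to an integer (never going below one CPU); that rounding rule is an assumption here, and the helper names are hypothetical.

```python
import math


def pct_to_free_threads(total_threads, threads_to_free):
    """Smallest whole 'Use at most X% of the processors' value that keeps
    `threads_to_free` logical CPUs out of CPU-task use.

    Assumption: the client turns the percentage into a CPU count by
    truncating total_threads * pct / 100, with a floor of one CPU.
    """
    wanted = total_threads - threads_to_free
    if wanted < 1:
        raise ValueError("leave at least one CPU for CPU tasks")
    # Smallest integer pct for which total_threads * pct / 100 >= wanted.
    return math.ceil(100 * wanted / total_threads)


def cpus_used(total_threads, pct):
    """CPU count produced by the assumed truncating rule."""
    return max(1, int(total_threads * pct / 100))


# Example: a 6-thread CPU (such as the FX-6350 from the first post),
# freeing one thread to feed a GW GPU task.
pct = pct_to_free_threads(6, 1)
print(pct, cpus_used(6, pct))   # 84 5
print(cpus_used(6, 83))         # 4: truncating 4.98 frees two threads
```

Under that assumption, 84% is the value to enter on a 6-thread machine. 83% would look right (5/6 is about 83.3%) but would truncate to 4 CPUs and idle one more thread than intended.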
It's more a matter of app development and maturity. The FGRP GPU app has been around for quite a while and has had a chance to be optimized during that time. This means it's more efficient, so more work can be done in a given time, hence more credit. The Gravitational Wave GPU app, on the other hand, is quite new and there hasn't been time to optimize it yet. Its speed probably also depends on the data and the type of analysis to be performed, i.e. how well that analysis can be parallelized.
It's quite normal to have pending tasks. The Gravitational Wave search uses something called locality scheduling, which tries to reduce the amount of data each participant needs to download by sending tasks that can use data files already on the host, or that need a minimum of extra downloads. This sometimes means only a few participants get tasks for a given data file set, which increases the time before the second task gets sent out and processed by a wingman. At the time of writing I have 100 tasks pending validation. Just be patient: the second task will get sent and processed, and the WU validated.
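A toy Python illustration of why locality scheduling can delay the wingman. It assumes, for illustration only, that a workunit's second task must go to a different host than the first and is only sent straight away to hosts already holding its data file. The host names and data file names are hypothetical, and this is not the Einstein@Home scheduler.

```python
# Toy illustration of why locality scheduling can delay the wingman.
# Host names, data-file names and the "different host" rule are
# hypothetical; this is not the Einstein@Home scheduler.


def ready_wingmen(data_file, first_host, host_files):
    """Hosts that could take a workunit's second task with no new download.

    The second task must go to a different host than the first one, and
    under locality scheduling it is preferably sent to a host that already
    holds the workunit's data file.
    """
    return sorted(
        host
        for host, files in host_files.items()
        if host != first_host and data_file in files
    )


host_files = {
    "host_a": {"h1_0500.00", "h1_0501.00"},
    "host_b": {"h1_0500.00"},
    "host_c": {"h1_0600.00"},
}

# A widely held data file: a wingman is available straight away.
print(ready_wingmen("h1_0500.00", "host_a", host_files))   # ['host_b']

# A data file only host_c holds: nobody can take the second task without
# downloading new data, so it stays unsent (and the first task pending)
# until the scheduler picks a host to fetch that data set.
print(ready_wingmen("h1_0600.00", "host_c", host_files))   # []
```

The empty list in the second case is the situation described above: the first task sits in "pending" until some other host is given that data set.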
I suggest you set resources to 0.0 for Einstein. If too many files error out, you will get banned for 24 hours.
Been there, done that.