Issue with GW-opencl-ati on Windows x86_64: remaining time (estimated) increases abnormally

PaoloNasca

Joined: 11 Jul 19

Posts: 4

Credit: 15539784

RAC: 4641

12 Jan 2020 19:37:36 UTC

Topic 220421


Many work units/tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application (Windows x86_64) are behaving strangely.

Their “remaining time (estimated)” should be 15 min.
After 1 hour, the task progress reaches 100%, but the task doesn’t stop.
The progress then goes back to 0% and the “remaining time (estimated)” begins to increase quickly.

Yesterday I had to abort 70 work units/tasks because the “remaining time (estimated)” was 155 days and the deadline was too close to hope to finish them.

Today I downloaded new work units/tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application and the behavior is the same.
In 2 days, I’m going to decide whether to abort these WUs.

I think there may be an issue with AMD GPUs on the Windows x86_64 platform. The work units/tasks I had to abort were processed successfully by NVIDIA GPUs.

Executable: einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe
My computer ID: 12796743
CPU type: Authentic AMD FX(tm)-6350 Six-Core Processor [Family 21 Model 2 Stepping 0]
Number of processors: 6
Coprocessors: AMD Radeon R7 200 Series (2048MB)
Operating system: Microsoft Windows 10 Professional x64 Edition, (10.00.18363.00)
BOINC client version: 7.14.2

Thank you
Paolo

PaoloNasca


12 Jan 2020 19:58:52 UTC

Message 175176


This is an addendum.

A work unit/task of mine has just exited after 1 hour.
Outcome: Computation error
Client state: Compute error
Exit status:197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

Error occured on Sunday, January 12, 2020 at 20:19:46.
C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location c7930192 in module C:\Windows\System32\KERNELBASE.dll.

Call stack:
C7930192 C:\Windows\System32\KERNELBASE.dll:C7930192 DebugBreak
00634F7A C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00634F7A
00635A3C C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00635A3C
C8E27BD4 C:\Windows\System32\KERNEL32.DLL:C8E27BD4 BaseThreadInitThunk
C9DCCED1 C:\Windows\SYSTEM32\ntdll.dll:C9DCCED1 RtlUserThreadStart

I think exit status 197 is linked to the issue of the “remaining time (estimated)” increasing abnormally.

Thank you
Paolo

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0


12 Jan 2020 20:38:08 UTC

Message 175177


PaoloNasca wrote:

Coprocessors: AMD Radeon R7 200 Series (2048MB)

What is the exact model of your GPU ?

PaoloNasca


12 Jan 2020 22:34:49 UTC

Message 175181


GPU: Sapphire Radeon R7 240

Richie


12 Jan 2020 22:45:47 UTC

Message 175183 in response to message 175181


PaoloNasca wrote:

GPU: Sapphire Radeon R7 240

R7 240 has GCN 1.0 architecture. Based on user experiences so far, all AMD cards with GCN 1.0 architecture have been observed to be incompatible with the current 'Gravitational Wave search O2 Multi-Directional GPU' app. That's why that card isn't making any real progress, and a task will end up crashing one way or another.

https://en.wikipedia.org/wiki/Radeon_Rx_200_series

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119552824818

RAC: 24792851


12 Jan 2020 22:53:00 UTC

Message 175184


PaoloNasca wrote:

Many work units/tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application (Windows x86_64) are behaving strangely.

Unfortunately, other volunteers with GCN 1st gen GPUs are seeing pretty much the same behaviour. It may well be the same for GCN 2nd gen, I simply don't know for sure.

Your card (R7 200 series) would be GCN 1st gen if it were any of these: R7 240, 250, 250E, 250X or 265. Only the 260 or 260X would be 2nd gen. Over the last few months, several volunteers (myself included) have reported these extremely slow crunch speeds, inevitably ending with TIME_LIMIT_EXCEEDED error messages, on both Windows and Linux.

The behaviour of reaching an apparently high % done and then resetting to 0% is simply because that 'progress' was only 'simulated progress', which got reset to 'real progress' once the very first checkpoint was actually written.
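A rough illustration of why the estimate balloons once real progress replaces the simulated figure: if remaining time is extrapolated linearly from elapsed time and the reported fraction done, a tiny real fraction after an hour of work yields an estimate of weeks or months. This is a simplified model (an assumption, not BOINC's exact estimator, which also blends in the static runtime estimate), and the fractions below are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation of remaining run time (simplified model, assumed).

    If `fraction_done` of the work took `elapsed_s` seconds, the remaining
    (1 - fraction_done) is assumed to take proportionally as long.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Hypothetical real progress values reported after one hour of running.
for frac in (0.25, 0.01, 0.001, 0.0003):
    days = remaining_seconds(3600, frac) / SECONDS_PER_DAY
    print(f"{frac:.4f} done after 1 h -> about {days:.1f} days remaining")
```

Under this model, a healthy card reports a growing fraction and the estimate shrinks; on an affected card the fraction stays near zero, so the estimate keeps climbing with elapsed time.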

There has been no comment from the Devs. I assume they don't use that particular type of GPU. Your only option at the moment is to opt out of GW GPU tasks (abort any that you have - they will never finish) and use the gamma-ray pulsar search instead. It would be interesting to know if your card was a 260 or 260X as that would confirm that GCN 2nd gen is affected by this problem as well. GCN 4th gen certainly isn't and I suspect GCN 3rd gen may not be either.
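Since BOINC reports this card only as “R7 200 Series” (which is why the exact model had to be asked for), a small lookup can encode the model list from the post above. This is a sketch: the table contains only the models named there, and the regular expression assumes the model appears as “R7” followed by a three-digit number with an optional letter suffix.

```python
import re
from typing import Optional

# GCN generation per R7 200-series model, as listed in the post above.
R7_200_GCN_GEN = {
    "R7 240": 1,
    "R7 250": 1,
    "R7 250E": 1,
    "R7 250X": 1,
    "R7 265": 1,
    "R7 260": 2,
    "R7 260X": 2,
}

# Assumed naming pattern: "R7", then a 3-digit model number and an optional letter.
_MODEL_RE = re.compile(r"\bR7\s*(\d{3}[A-Z]?)\b")


def gcn_generation(gpu_name: str) -> Optional[int]:
    """Return the GCN generation for an R7 200-series name, or None if not listed."""
    match = _MODEL_RE.search(gpu_name.upper())
    if match is None:
        return None
    return R7_200_GCN_GEN.get(f"R7 {match.group(1)}")


print(gcn_generation("Sapphire Radeon R7 240"))             # 1
print(gcn_generation("AMD Radeon R7 200 Series (2048MB)"))  # None: series name only
```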

EDIT: I hadn't seen either of the two previous messages before posting my response. It did take me a bit of time to compose my response :-).

Cheers,
Gary.

PaoloNasca


12 Jan 2020 23:08:48 UTC

Message 175185


Thank you Richie and Gary Roberts.

You have been very clear.

See you soon.

El Guapo

Joined: 14 Apr 17

Posts: 1

Credit: 458796453

RAC: 0


15 Jan 2020 14:06:25 UTC

Message 175221


I can add to this with what I see on various GCN1 and TeraScale GPUs across Windows and Linux.

In Windows, I see what the user above sees. Tasks will stop progressing at a low percentage, and remaining time will be in the weeks range.

On Linux, the task initially consumes 100% of a real CPU core and drives the system load up. Eventually this leads to one of the CPU cores sitting at 100% system (sys%) load, along with the Einstein task at 100% user (user%) CPU.

I forget exactly why, but somewhere along the way I saw something that led me to suspect a hung bus transfer, which would be plausible given these old GPUs and drivers. For now, I will opt out of GW GPU tasks as suggested above, but if desired, I will re-enable these tasks for troubleshooting.

OpenStreetMapper

Joined: 3 Jan 20

Posts: 4

Credit: 2387475

RAC: 0


10 Feb 2020 19:37:32 UTC

Message 175590


EXIT_TIME_LIMIT_EXCEEDED on O2MDF with Apple

Hi. I'm seeing 12 recent errors like this, so I'm temporarily disabling this application:

"exceeded elapsed time limit 14631.12 (5760000.00G/393.68G)"

Gravitational Wave search O2 Multi-Directional GPU v2.02 (GW-opencl-ati)
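The numbers in that error line can be checked by hand. Assuming the client computes the limit as the task's floating-point operation bound divided by the device's estimated speed (both printed in units of G), the division reproduces the 14631 s shown, up to the rounding of 393.68G in the log. The same sketch formats the exit status 197 (0x000000C5) from the earlier crash report. The constant and function names are illustrative, not BOINC's own.

```python
# Values copied from the error message; the log prints both rounded to 2 decimals.
FPOPS_BOUND_G = 5_760_000.00  # assumed: task's floating-point operation bound, in GFLOP
DEVICE_SPEED_G = 393.68       # assumed: estimated device speed, in GFLOPS

EXIT_TIME_LIMIT_EXCEEDED = 197  # exit status from the crash report earlier in the thread


def elapsed_time_limit(fpops_bound_g: float, speed_gflops: float) -> float:
    """Seconds a task may run before being aborted (assumed: bound / speed)."""
    return fpops_bound_g / speed_gflops


limit_s = elapsed_time_limit(FPOPS_BOUND_G, DEVICE_SPEED_G)
print(f"limit = {limit_s:.2f} s, about {limit_s / 3600:.1f} h")
print(f"exit status {EXIT_TIME_LIMIT_EXCEEDED} = 0x{EXIT_TIME_LIMIT_EXCEEDED:08X}")
```

The result is about 14631.17 s; the small difference from 14631.12 comes from the rounded speed in the log. Compared with the roughly one-hour runtimes of earlier successful tasks, the limit allows about four times the normal duration.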

It's also interesting that previous such WUs completed successfully within about an hour (granting 1000 credits), while FGRPB1G produces 3465 credits within 2271 seconds and also uses much less CPU. I'm also contributing for the science and not for the points, but does that mean that the latter is much more efficient and desirable on my architecture than the former?

On a different note, is it normal that I have 41 pending WUs starting from the 27th of January (all of them O2MDF, most of them V2 of those)?

I'm running on a Radeon Pro 560x 4GB, total credit 431955 almost exclusively from GPU.

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0


10 Feb 2020 21:18:00 UTC

Message 175592 in response to message 175590


e2bc wrote:

EXIT_TIME_LIMIT_EXCEEDED on O2MDF with Apple

Hi. I'm seeing 12 recent errors like this, so I'm temporarily disabling this application:

"exceeded elapsed time limit 14631.12 (5760000.00G/393.68G)"

Gravitational Wave search O2 Multi-Directional GPU v2.02 (GW-opencl-ati)

Gravitational Wave work on GPUs needs a full CPU core to support it for efficient running, so make sure your settings don't overload your CPU with work. Try reducing "Use at most: XX% of the processors" to free up cores/threads for GPU support.
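To pick a value for that setting, the sketch below computes the smallest whole percentage that still leaves a given number of threads for CPU work. It assumes (this is not confirmed in this thread) that the client multiplies the thread count by the percentage and truncates the result to an integer; under that assumption, rounding the percentage down can silently free one thread more than intended. The function names are hypothetical; `max_ncpus_pct` borrows the name of BOINC's underlying preference.

```python
def max_ncpus_pct(threads: int, reserved: int) -> int:
    """Smallest whole percentage leaving `threads - reserved` threads for CPU tasks.

    Assumes the client truncates threads * pct / 100 to an integer.
    """
    if not 0 <= reserved < threads:
        raise ValueError("need 0 <= reserved < threads")
    usable = threads - reserved
    return -(-usable * 100 // threads)  # integer ceiling division


def usable_threads(threads: int, pct: int) -> int:
    """Threads left for CPU tasks under the truncation assumption above."""
    return threads * pct // 100


# Example: a 6-thread CPU (like the FX-6350 earlier in the thread), one GPU task.
pct = max_ncpus_pct(6, 1)
print(pct, usable_threads(6, pct), usable_threads(6, pct - 1))
```

Here 83% would leave only 4 threads for CPU tasks instead of 5, which is why the sketch rounds the percentage up.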

Quote:

It's also interesting that previous such WUs completed successfully within about an hour (granting 1000 credits), while FGRPB1G produces 3465 credits within 2271 seconds and also uses much less CPU. I'm also contributing for the science and not for the points, but does that mean that the latter is much more efficient and desirable on my architecture than the former?

It's more a matter of app development and maturity. The FGRP GPU app has been around for quite a while and has had the chance to be optimized during this time. This means that it's more efficient, so more work can be done in a given time, hence more credit. On the other hand, the Gravitational Wave GPU app is quite new and there hasn't been time to optimize it yet. Performance probably also depends on the data and the type of analysis to be performed, and on how well that analysis can be parallelized.

Quote:

On a different note, is it normal that I have 41 pending WUs starting from the 27th of January (all of them O2MDF, most of them V2 of those)?

It's quite normal to have pending tasks. The Gravitational Wave search uses something called locality scheduling to reduce the amount of data one needs to download, by trying to send tasks that can make use of already downloaded data files or that need a minimum of extra downloads. This sometimes leads to a situation where only a few participants get tasks for a given data file set, which increases the time before the second task gets sent out and processed by a wingman. At the time of writing, I have 100 tasks pending validation. Just be patient: the second task will get sent and processed, and the WU validated.
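As a toy illustration of the idea (not Einstein@Home's actual scheduler, whose rules are more involved), the sketch below sends a task to whichever eligible host would need the fewest additional downloads. Host names and data file names are hypothetical. The `exclude` argument assumes the two replicas of a workunit must go to different hosts; when few hosts already hold the files, the second replica ends up on a host that needs downloads, or waits.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cached_files: set[str] = field(default_factory=set)  # data files already downloaded


def extra_downloads(host: Host, needed: set[str]) -> int:
    """Number of data files this host would still have to download."""
    return len(needed - host.cached_files)


def pick_host(hosts: list[Host], needed: set[str],
              exclude: frozenset[str] = frozenset()) -> Host:
    """Toy locality rule: choose the eligible host needing the fewest extra downloads."""
    candidates = [h for h in hosts if h.name not in exclude]
    if not candidates:
        raise ValueError("no eligible host")
    return min(candidates, key=lambda h: extra_downloads(h, needed))


# Hypothetical hosts and data files.
hosts = [
    Host("A", {"h1_0400", "l1_0400"}),
    Host("B", {"h1_0400"}),
    Host("C"),
]
needed = {"h1_0400", "l1_0400"}

first = pick_host(hosts, needed)                                # A has everything
second = pick_host(hosts, needed, exclude=frozenset({first.name}))  # replica goes elsewhere
print(first.name, second.name, extra_downloads(second, needed))  # A B 1
```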

Joseph Stateson

Joined: 7 May 07

Posts: 174

Credit: 3136058102

RAC: 912478


10 Feb 2020 21:44:57 UTC

Message 175594


I suggest you set resources to 0.0 for Einstein. If too many tasks error out, you will get banned for 24 hours.

Been there, done that.


Forums › Problems and Bug Reports
