Issue with GW-opencl-ati on Windows x86_64: estimated remaining time increases abnormally

PaoloNasca
Joined: 11 Jul 19
Posts: 4
Credit: 15343960
RAC: 5272
Topic 220421

Many tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application (Windows x86_64) are behaving strangely.

Their “remaining time (estimated)” should be about 15 min.
After 1 hour, the task's progress reaches 100% but the task doesn’t stop.
The progress then drops back to 0% and the “remaining time (estimated)” starts to increase quickly.

Yesterday I had to abort 70 tasks because the “remaining time (estimated)” was 155 days and the deadline was too close for them to have any chance of finishing.

Today I downloaded new tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application, and the behavior is the same.
In 2 days, I will decide whether to abort these WUs as well.

I think there may be an issue with AMD GPUs on the Windows x86_64 platform. The tasks I had to abort were processed successfully by NVIDIA GPUs.

Executable: einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe
My computer ID: 12796743
CPU type: Authentic AMD FX(tm)-6350 Six-Core Processor [Family 21 Model 2 Stepping 0]
Number of processors: 6
Coprocessors: AMD Radeon R7 200 Series (2048MB)
Operating system: Microsoft Windows 10 Professional x64 Edition, (10.00.18363.00)
BOINC client version: 7.14.2

Thank you
Paolo

PaoloNasca
Joined: 11 Jul 19
Posts: 4
Credit: 15343960
RAC: 5272

This is an addendum.

A work unit/task of mine has just exited after 1 hour.
Outcome: Computation error
Client state: Compute error
Exit status:197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

Error occured on Sunday, January 12, 2020 at 20:19:46.
C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location c7930192 in module C:\Windows\System32\KERNELBASE.dll.

Call stack:
C7930192 C:\Windows\System32\KERNELBASE.dll:C7930192 DebugBreak
00634F7A C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00634F7A
00635A3C C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\einstein_O2MDF_2.02_windows_x86_64__GW-opencl-ati.exe:00635A3C
C8E27BD4 C:\Windows\System32\KERNEL32.DLL:C8E27BD4 BaseThreadInitThunk
C9DCCED1 C:\Windows\SYSTEM32\ntdll.dll:C9DCCED1 RtlUserThreadStart

I think exit status 197 is linked to the abnormally increasing "remaining time (estimated)".

Thank you
Paolo

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

PaoloNasca wrote:
Coprocessors: AMD Radeon R7 200 Series (2048MB)

What is the exact model of your GPU?

PaoloNasca
Joined: 11 Jul 19
Posts: 4
Credit: 15343960
RAC: 5272

GPU: Sapphire Radeon R7 240

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

PaoloNasca wrote:
GPU: Sapphire Radeon R7 240

The R7 240 has the GCN 1.0 architecture. Based on user experiences so far, all AMD cards with the GCN 1.0 architecture have been observed to be incompatible with the current 'Gravitational Wave search O2 Multi-Directional GPU' app. That's why that card isn't making any real progress and a task will end up crashing one way or another.

https://en.wikipedia.org/wiki/Radeon_Rx_200_series

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5877
Credit: 118567207585
RAC: 21120439

PaoloNasca wrote:
Many tasks from the “Gravitational Wave search O2 Multi-Directional GPU 2.02 (GW-opencl-ati)” application (Windows x86_64) are behaving strangely.

Unfortunately, other volunteers with GCN 1st gen GPUs are seeing pretty much the same behaviour.  It may well be the same for GCN 2nd gen, I simply don't know for sure.

Your card (R7 200 series) would be GCN 1st gen if it were any of these: -> R7 240, 250, 250E, 250X or 265.  Only 260 or 260X would be 2nd gen.  Over the last few months, several volunteers (myself included) have reported these extremely slow crunch speeds, inevitably ending with TIME_LIMIT_EXCEEDED error messages for both Windows and Linux.  The behaviour of reaching an apparent high % done and resetting to 0% is simply because that 'progress' was only 'simulated progress' that got reset to 'real progress' once the very first checkpoint was actually written.
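
For anyone who wants to check a card quickly, the model list in the paragraph above can be written as a small lookup. This is only a sketch: the table is transcribed from this post (it is not an official AMD list), and the names `GCN_GEN_BY_R7_MODEL` and `gcn_generation` are made up for illustration.

```python
from typing import Optional

# Generation table transcribed from the forum post above (R7 200 series only).
# It is not an official AMD list; treat it as an assumption.
GCN_GEN_BY_R7_MODEL = {
    "240": 1, "250": 1, "250E": 1, "250X": 1, "265": 1,
    "260": 2, "260X": 2,
}


def gcn_generation(model: str) -> Optional[int]:
    """Return the GCN generation for a name such as 'Sapphire Radeon R7 240'.

    Only the last word of the name (the model number) is looked up, so the
    caller is assumed to pass an R7 200-series card. Unknown models give None.
    """
    tokens = model.split()
    if not tokens:
        return None
    return GCN_GEN_BY_R7_MODEL.get(tokens[-1].upper())


print(gcn_generation("Sapphire Radeon R7 240"))
```

A result of 1 means the card falls in the group reported here as not finishing GW GPU tasks; 2 means the 260/260X group, for which the behaviour is not yet confirmed.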

There has been no comment from the Devs.  I assume they don't use that particular type of GPU.  Your only option at the moment is to opt out of GW GPU tasks (abort any that you have - they will never finish) and use the gamma-ray pulsar search instead.  It would be interesting to know if your card was a 260 or 260X as that would confirm that GCN 2nd gen is affected by this problem as well.  GCN 4th gen certainly isn't and I suspect GCN 3rd gen may not be either.

EDIT: I hadn't seen either of the two previous messages before posting my response.  It did take me a bit of time to compose my response :-).

Cheers,
Gary.

PaoloNasca
Joined: 11 Jul 19
Posts: 4
Credit: 15343960
RAC: 5272

Thank you Richie and Gary Roberts.

Your explanations were very clear.

See you soon.

El Guapo
Joined: 14 Apr 17
Posts: 1
Credit: 458796453
RAC: 0

I can pile on with what I see with various GCN1 and TeraScale GPUs across Windows and Linux.

On Windows, I see what the user above sees. Tasks stop progressing at a low percentage, and the remaining time is in the range of weeks.

On Linux, the task initially consumes 100% of a physical CPU core and drives the system load up. Eventually this leads to one CPU core sitting at 100% system time while the Einstein task itself shows 100% user CPU.

I forget what led me to think this, but somewhere along the way I saw something that made me suspect a hung bus transfer, which is plausible given these old GPUs and drivers. For now, I will opt out of GW GPU tasks as suggested above, but if desired, I will reinstate them for troubleshooting.

OpenStreetMapper
Joined: 3 Jan 20
Posts: 4
Credit: 2387475
RAC: 0

EXIT_TIME_LIMIT_EXCEEDED on O2MDF with Apple

 

Hi. I'm seeing 12 recent errors of this, hence temporarily disabling this application:

"exceeded elapsed time limit 14631.12 (5760000.00G/393.68G)"

Gravitational Wave search O2 Multi-Directional GPU v2.02 (GW-opencl-ati)

 

It's also interesting that previous such WUs completed successfully within about an hour (granting 1000 credits), while FGRPB1G produces 3465 credits within 2271 seconds and also uses much less CPU. I'm also contributing for the science and not for the points, but does that mean that the latter is much more efficient and desirable on my architecture than the former?

On a different note, is it normal that I have 41 pending WUs starting from the 27th of January (all of them O2MDF, most of them V2 of those)? 

I'm running on a Radeon Pro 560x 4GB, total credit 431955 almost exclusively from GPU.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

e2bc wrote:

EXIT_TIME_LIMIT_EXCEEDED on O2MDF with Apple

Hi. I'm seeing 12 recent errors of this, hence temporarily disabling this application:

"exceeded elapsed time limit 14631.12 (5760000.00G/393.68G)"

Gravitational Wave search O2 Multi-Directional GPU v2.02 (GW-opencl-ati)

Gravitational Wave work on GPUs needs a full CPU core to support it for efficient running, so make sure your settings don't overload your CPU with work. Try reducing "Use at most: XX% of the processors" to free up cores/threads for GPU support.
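
As a side note on the quoted error, the figures in the message are consistent with a limit obtained by dividing a work bound by an estimated speed. The sketch below assumes that reading, i.e. that "(5760000.00G/393.68G)" means a bound in GFLOP divided by an estimated speed in GFLOPS; the function name `elapsed_time_limit` is made up for illustration.

```python
def elapsed_time_limit(work_bound_gflop: float, est_speed_gflops: float) -> float:
    """Return the elapsed-time limit in seconds.

    Assumption: the limit is simply the work bound (in GFLOP) divided by the
    estimated speed of the app version on this host (in GFLOPS), which is how
    the two numbers in the error message are read here.
    """
    return work_bound_gflop / est_speed_gflops


# Figures copied from the quoted error message.
limit_s = elapsed_time_limit(5_760_000.00, 393.68)
print(f"limit: {limit_s:.0f} s, i.e. about {limit_s / 3600:.1f} hours")
```

The result agrees with the reported 14631.12 s to within rounding of the displayed speed, so on this host a task is stopped after roughly four hours of elapsed time. Anything that slows the task down, such as a CPU with no free core to feed the GPU, makes it more likely to hit that limit.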

Quote:
It's also interesting that previous such WUs completed successfully within about an hour (granting 1000 credits), while FGRPB1G produces 3465 credits within 2271 seconds and also uses much less CPU. I'm also contributing for the science and not for the points, but does that mean that the latter is much more efficient and desirable on my architecture than the former?

It's more a matter of app development and maturity. The FGRP GPU app has been around for quite a while and has had a chance to be optimized during that time. This means that it's more efficient, so more work can be done in a given time, hence more credit. The Gravitational Wave GPU app, on the other hand, is quite new and there hasn't been time to optimize it yet. It probably also depends on the data and the type of analysis to be performed, that is, on how well the analysis can be parallelized.
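
To put rough numbers on the comparison in the quote, here is a small sketch of the credit rate for each app. It takes "about an hour" as exactly 3600 seconds, which is an assumption, and the function name `credits_per_hour` is made up for illustration.

```python
def credits_per_hour(credits: float, seconds: float) -> float:
    """Convert credit granted for one task into a credits-per-hour rate."""
    return credits / seconds * 3600.0


# GW task: 1000 credits in "about an hour" (taken here as 3600 s, an assumption).
gw_rate = credits_per_hour(1000, 3600)
# FGRPB1G task: 3465 credits in 2271 s, as reported in the quote.
fgrp_rate = credits_per_hour(3465, 2271)

print(f"GW:      {gw_rate:.0f} credits/hour")
print(f"FGRPB1G: {fgrp_rate:.0f} credits/hour")
print(f"ratio:   {fgrp_rate / gw_rate:.1f}x")
```

On those figures the FGRP app earns credit roughly five to six times faster on that card. That says something about how well each app currently uses the hardware, not about which search is scientifically more valuable.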

Quote:
On a different note, is it normal that I have 41 pending WUs starting from the 27th of January (all of them O2MDF, most of them V2 of those)?

It's quite normal to have pending tasks. The Gravitational Wave search uses something called locality scheduling to reduce the amount of data one needs to download, by trying to send tasks that can make use of already downloaded data files or that need a minimum of extra downloads. This sometimes leads to a situation where only a few participants get tasks for a given set of data files, which increases the time before the second task gets sent out and processed by a wingman. At the moment of writing this I have 100 tasks pending validation. Just be patient and the second task will get sent and processed and the WU validated.

Joseph Stateson
Joined: 7 May 07
Posts: 174
Credit: 3098931099
RAC: 867279

Suggest you set the resource share to 0.0 for Einstein. If too many tasks error out, you will get banned for 24 hours.

Been there, done that.
