Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Michael Goetz

Joined: 11 Feb 05

Posts: 21

Credit: 3067690

RAC: 0

Bernd Machenschalk wrote: We

5 Jan 2023 11:43:11 UTC

Message 206164 in response to message 206162

Bernd Machenschalk wrote:

We analyzed the problem and think we fixed it. We issued Windows app version 1.03, currently for Beta test.

It looks to be fixed. Tasks are getting completed and validated now. Thanks!

Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4350

Credit: 253896808

RAC: 35358

Richard Haselgrove

5 Jan 2023 16:24:00 UTC

Message 206171 in response to message 206163

Richard Haselgrove wrote:

Host 1001562 is now running these, and has returned the first task - successful, but not yet validated.

This is one of the machines which started throwing errors when it reached 0414.20: the completed task is from that group, and many previous replications have failed. So the signs are good.

Yep - 414.20 Hz is exactly where the numerical overflow happens. The problem is basically that C(99) specifies only minimum widths for data types such as "long". When compiling for 64-bit systems, Linux and macOS silently use 64-bit "long"s, while Windows still uses the minimum specified width here (32 bits). I'm not entirely sure whether this is in the compiler or the runtime math library, but anyway - using a "long long" here (or more precisely: llround() instead of lround()) fixed it. Was pretty hard to track down, though.

archae86

Joined: 6 Dec 05

Posts: 3165

Credit: 7406501687

RAC: 1930192

Of my three machines, one had

5 Jan 2023 16:36:31 UTC

Message 206173

Of my three machines, one had a 100% fast error rate on the work it was issued in early and mid-December. So these were not the 412+ units that troubled others.

https://einsteinathome.org/host/12260865

Happily, this machine has now processed and returned nine WUs today of 477.8 frequency, of which six have already validated. So it appears that the "fixed" application has addressed whatever problem this machine was having with the work. Or possibly the problem it had before was specific to another frequency range.

My other two machines have each processed and returned several tasks newly sent to them. In these two cases the tasks were generally _9 or _10 reissues of 414.n or 420.n tasks which had already failed on several other machines--so just completing them is good news. No validations from these yet, as they appear to be awaiting quorum partner returns.

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 80

Credit: 389867168

RAC: 145622

Cool, i've enabled the app

5 Jan 2023 17:15:19 UTC

Message 206174

Cool, I've enabled the app again; we'll see how it goes.

Thank you for the quick fix, team!

Jesse Viviano

Joined: 8 Jun 05

Posts: 33

Credit: 133045917

RAC: 0

Bernd Machenschalk

6 Jan 2023 7:26:40 UTC

Message 206208 in response to message 206171

Bernd Machenschalk wrote:

Richard Haselgrove wrote:

Host 1001562 is now running these, and has returned the first task - successful, but not yet validated.

This is one of the machines which started throwing errors when it reached 0414.20: the completed task is from that group, and many previous replications have failed. So the signs are good.

Yep - 414.20 Hz is exactly where the numerical overflow happens. The problem is basically that C(99) specifies only minimum widths for data types such as "long". When compiling for 64-bit systems, Linux and macOS silently use 64-bit "long"s, while Windows still uses the minimum specified width here (32 bits). I'm not entirely sure whether this is in the compiler or the runtime math library, but anyway - using a "long long" here (or more precisely: llround() instead of lround()) fixed it. Was pretty hard to track down, though.

C99 and later have the header files inttypes.h and stdint.h, which give you consistent cross-platform integer types, as described in https://en.wikipedia.org/wiki/C_data_types#inttypes.h and https://stackoverflow.com/questions/7597025/difference-between-stdint-h-and-inttypes-h . The C++ counterparts to inttypes.h and stdint.h are cinttypes and cstdint respectively.
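As a small sketch of what those headers provide (the helper name format_i64 is made up for this example): int64_t from stdint.h is exactly 64 bits on every platform that defines it, and the PRId64 macro from inttypes.h expands to the matching printf conversion, so the same source prints correctly whether "long" is 32 or 64 bits.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a fixed-width 64-bit integer portably.
   Returns the number of characters written, as snprintf() does. */
static int format_i64(char *buf, size_t n, int64_t v) {
    /* PRId64 is "ld", "lld" or similar, depending on the platform. */
    return snprintf(buf, n, "%" PRId64, v);
}
```

Calling format_i64(buf, sizeof buf, INT64_C(3000000000)) writes "3000000000" into buf without any platform-specific format string.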

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4350

Credit: 253896808

RAC: 35358

I know. LALSuite also has its

6 Jan 2023 10:01:00 UTC

Message 206210 in response to message 206208

I know. LALSuite also has its own deterministic size types that we use. However, the 'long' here is not in our code. It is in the definition of standard math functions like lround(). These return a 'long', whatever width that is on the current platform. If 'long' is 32-bit and you use lround() on a (double precision) value that is too large, you get a numerical overflow. This throws a "floating point exception" (FPE), which you have seen on NVIDIA. Or it leads to absurd values in the following computations, which you saw as "input domain error" on AMD, because apparently the AMD OpenCL driver disables FPEs (which is not a good thing IMO).

Conan

Joined: 19 Jun 05

Posts: 172

Credit: 8877826

RAC: 7820

With O3MDF now fixed

10 Jan 2023 6:25:25 UTC

Message 206396

With O3MDF now fixed (hopefully), is it possible to get O3MD1 CPU work units going again? There are none in the queue and there haven't been for a while.

I would like to run some more of them please.

Conan

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 80

Credit: 389867168

RAC: 145622

Got a few tasks in error

10 Jan 2023 20:38:46 UTC

Message 206419

Got a few tasks in error still:

https://einsteinathome.org/host/12769171/tasks/6/0

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4159

Credit: 50340161542

RAC: 41862673

[AF>EDLS]zOU wrote: Got a

10 Jan 2023 21:22:46 UTC

Message 206421 in response to message 206419

[AF>EDLS]zOU wrote:

Got a few tasks in error still:

https://einsteinathome.org/host/12769171/tasks/6/0

How many tasks are you trying to run at once?

You're getting:

CL_MEM_OBJECT_ALLOCATION_FAILURE

which means you’re running out of GPU memory.

Werinbert

Joined: 31 Dec 12

Posts: 20

Credit: 100156387

RAC: 0

I was getting GPU tasks just

11 Jan 2023 4:34:47 UTC

Message 206429

I was getting GPU tasks just fine up until early this morning, but now I am not getting any tasks at all. Every update I attempt shows "no tasks sent" in the BOINC log, but with no mention of tasks not being available. I haven't changed any settings, so is there something else that changed to prevent me from getting tasks?
