Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Michael Goetz
Joined: 11 Feb 05
Posts: 21
Credit: 3067690
RAC: 7

Bernd Machenschalk wrote:

We analyzed the problem and think we fixed it. We issued Windows app version 1.03, currently for Beta test.

It looks to be fixed.  Tasks are getting completed and validated now.  Thanks!

Want to find one of the largest known primes? Try PrimeGrid. Or help cure disease at WCG.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250570845
RAC: 34530

Richard Haselgrove wrote:

Host 1001562 is now running these, and has returned the first task - successful, but not yet validated.

This is one of the machines which started throwing errors when it reached 0414.20: the completed task is from that group, and many previous replications have failed. So the signs are good.

Yep - 414.20 Hz is exactly where the numerical overflow happens. The problem is basically that C(99) specifies only minimal widths for datatypes such as "long". When compiling for 64-bit systems, Linux and macOS silently use 64-bit "long"s, while Windows still uses the minimal specified width here (32 bits). I'm not entirely sure whether this is in the compiler or the runtime math library, but anyway - using a "long long" here (or more precisely: llround() instead of lround()) fixed it. Was pretty hard to track down, though.

BM

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225614931
RAC: 1052139

Of my three machines, one had a 100% fast-error rate on the work it was issued in early and mid-December.  So these were not the 412+ units that troubled others.

https://einsteinathome.org/host/12260865

Happily, this machine has now processed and returned nine WUs at frequency 477.8 today, of which six have already validated.  So it appears that the "fixed" application has addressed whatever problem this machine was having with the work.  Or possibly the problem it had before was specific to another frequency range.

My other two machines have each processed and returned several tasks newly sent to them.  In these two cases the tasks were generally _9 or _10 reissues of 414.n or 420.n tasks which had already failed on several other machines, so just completing them is good news.  No validations from these yet, as they appear to be awaiting quorum partner returns.

[AF>EDLS]zOU
Joined: 5 May 15
Posts: 65
Credit: 384235373
RAC: 0

Cool, I've enabled the app again; we'll see how it goes.

Thank you for the quick fix, team!

Jesse Viviano
Joined: 8 Jun 05
Posts: 33
Credit: 133045917
RAC: 0

Bernd Machenschalk wrote:

Richard Haselgrove wrote:

Host 1001562 is now running these, and has returned the first task - successful, but not yet validated.

This is one of the machines which started throwing errors when it reached 0414.20: the completed task is from that group, and many previous replications have failed. So the signs are good.

Yep - 414.20 Hz is exactly where the numerical overflow happens. The problem is basically that C(99) specifies only minimal widths for datatypes such as "long". When compiling for 64-bit systems, Linux and macOS silently use 64-bit "long"s, while Windows still uses the minimal specified width here (32 bits). I'm not entirely sure whether this is in the compiler or the runtime math library, but anyway - using a "long long" here (or more precisely: llround() instead of lround()) fixed it. Was pretty hard to track down, though.

C99 and later have the header files inttypes.h and stdint.h, which give you consistent cross-platform integer types, as described in https://en.wikipedia.org/wiki/C_data_types#inttypes.h and https://stackoverflow.com/questions/7597025/difference-between-stdint-h-and-inttypes-h . The C++ counterparts to inttypes.h and stdint.h are cinttypes and cstdint, respectively.
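
As an illustration only, here is a small sketch of how those headers are typically used. The functions total_samples() and print_total() and their parameters are hypothetical and have nothing to do with the Einstein@Home code; the point is that int32_t and int64_t have the same width on every platform that provides them, and that PRId64 from inttypes.h supplies the matching printf format.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* int64_t, where it exists, is exactly 64 bits on every platform. */
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t must be 64 bits");

/* Hypothetical example: the product can exceed the 32-bit range, so one
   operand is widened to 64 bits before multiplying. */
int64_t total_samples(int32_t rate_hz, int32_t seconds)
{
    return (int64_t)rate_hz * seconds;
}

/* PRId64 expands to the correct printf conversion for int64_t, whether
   that type is 'long' or 'long long' underneath. */
void print_total(int32_t rate_hz, int32_t seconds)
{
    printf("%" PRId64 " samples\n", total_samples(rate_hz, seconds));
}
```

Without the cast, the multiplication would be carried out in 32-bit arithmetic on any platform where int is 32 bits wide, and would overflow for the same inputs.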

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250570845
RAC: 34530

I know. LALSuite also has its own deterministic size types that we use. However, the 'long' here is not in our code. It is in the definition of standard math functions like lround(). These return a 'long', whatever that is on the current platform. If 'long' is 32 bits wide and you use lround() on a (double-precision) value that is too large, you get a numerical overflow. This throws a "floating point exception" (FPE), which you have seen on NVidia. Or it leads to absurd values in subsequent calculations, which you saw as "input domain error" on AMD, because apparently the AMD OpenCL driver disables FPEs (which is not a good thing IMO).

BM

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 8333821
RAC: 9948

With O3MDF now fixed (hopefully), is it possible to get O3MD1 CPU work units going again? There are none in the queue, and there haven't been for a while.

I would like to run some more of them, please.

Conan

[AF>EDLS]zOU
Joined: 5 May 15
Posts: 65
Credit: 384235373
RAC: 0

Got a few tasks in error still:

https://einsteinathome.org/host/12769171/tasks/6/0

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47011582642
RAC: 64955002

[AF>EDLS]zOU wrote:

Got a few tasks in error still:

https://einsteinathome.org/host/12769171/tasks/6/0

How many tasks are you trying to run at once?

You're getting:

CL_MEM_OBJECT_ALLOCATION_FAILURE

which means you're running out of GPU memory.


Werinbert
Joined: 31 Dec 12
Posts: 20
Credit: 100156387
RAC: 0

I was getting GPU tasks just fine up until early this morning; now I am not getting any tasks at all. All updates that I attempt show "no tasks sent" in the BOINC log, but no mention of tasks not being available. I haven't changed any settings, so is there something else that changed to prevent me from getting tasks?
