Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 240

Credit: 10552535586

RAC: 25524524

20 Dec 2022 19:27:25 UTC

Message 205466 in response to message 205464

Styx N Stones wrote:

I know... try it out and see for myself.

You nailed it!

That being said, I would say watch your VRAM and CUDA core usage. I run 3 WUs at a time (on the Nvidia systems) and some people run 2. I do not run a 3060, but I know people here do have them; they can speak better to what they can optimally run. But play around with it and see what works best for your system.
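For anyone who wants a rough sanity check before experimenting, here is a minimal sketch of the arithmetic. It assumes the preference is expressed as a per-task GPU fraction (so 0.5 would mean 2 tasks and 0.33 would mean 3), and the function names, the VRAM-per-task figures, and the 512 MB reserve are all made-up placeholders, not measured values for this app. Check your own VRAM usage with your monitoring tool of choice.

```python
import math


def tasks_per_gpu(utilization_factor: float) -> int:
    """Tasks that fit on one GPU, assuming each task claims this GPU fraction."""
    if not 0 < utilization_factor <= 1:
        raise ValueError("utilization factor must be in (0, 1]")
    # Small epsilon guards against floating-point results like 2.9999999.
    return math.floor(1 / utilization_factor + 1e-9)


def fits_in_vram(n_tasks: int, vram_per_task_mb: int,
                 total_vram_mb: int, reserve_mb: int = 512) -> bool:
    """True if n_tasks fit in VRAM with some headroom left for the desktop."""
    return n_tasks * vram_per_task_mb + reserve_mb <= total_vram_mb


if __name__ == "__main__":
    n = tasks_per_gpu(0.33)
    # 2000 MB per task and 8192 MB total are hypothetical numbers.
    print(n, fits_in_vram(n, vram_per_task_mb=2000, total_vram_mb=8192))
```

If the second value comes out False for your card, drop to fewer concurrent tasks rather than letting the GPU spill over its memory.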

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 984

Credit: 25171376

RAC: 35

21 Dec 2022 9:45:08 UTC

Message 205508

Yep, that's exactly why we introduced that preference setting in the first place. It's meant to give you the freedom to tune things to your personal rig. There are way too many factors at play to design a GPU app that adjusts itself dynamically to the underlying hardware and still delivers the best performance, given the vast ecosystem we have to support.

Cheers

Einstein@Home Project

Drago75

Joined: 19 Sep 20

Posts: 15

Credit: 22187191

RAC: 47067

23 Dec 2022 9:56:19 UTC

Message 205582

The new MDGW O3 units run well on my hosts except on this one: R9-3900X, Win 10, 32 GB RAM, Asus 3070-Ti OC (but throttled down) with the latest Nvidia Studio driver. All units errored out after 10 seconds with "unknown error". Those WUs also came back faulty from all wingmen, who had a variety of hardware installed. Could it be that there is an incompatibility with the Studio driver? Here is an example: Workunit 692551239

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956229792

RAC: 716027

24 Dec 2022 10:36:07 UTC

Message 205644

I have one machine (computer 1001564) which has failed over 400 tasks of the new GW run, but it has also validated over 500.

Every failure I've investigated has happened in the first 8 - 10 seconds of the run, and is of the type "Float Invalid Operation", discussed earlier in this thread. Machine is an Intel i5, and the errors are happening on the NVidia GTX 1660 Super GPU under Windows 7 with driver 472.12. I have two other identical machines, with the same hardware and software, which are not showing errors. The failing machine is running Gamma-ray pulsar binary tasks just fine on the same GPU.

Many of the tasks I've failed have a high replication count: they fail on other users' machines as well. My conclusion has to be that there are one or more faulty GW datasets out there, and because of the adaptive replication scheduling used here, once you've got a bad batch - you're stuck with it. I got so many errors yesterday that I was given a 24-hour timeout for 'quota exceeded'. I reset the project, in the hope of getting a different dataset, but the errors have continued today (I've opted out of GW on that particular machine for the time being).

In the New Year, can we think about catching this type of systemic error earlier - and nipping it in the bud?

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 65

Credit: 384235373

RAC: 0

24 Dec 2022 11:03:10 UTC

Message 205646

This was answered one page back. ;-)

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 65

Credit: 384235373

RAC: 0

24 Dec 2022 11:03:32 UTC

Message 205648 in response to message 205427

Oliver Behnke wrote:

Hi Zou,

We are aware of an issue that can affect the Windows GPU app right now. We'll look into it ASAP but it'll take until the first week of January, unfortunately (see above). We'll update this thread as soon as we think we've resolved the issue. Until then it's of course perfectly fine to opt out of the app for the time being.

Sorry for the hassle. Sometimes these bugs only manifest themselves when launching the apps full-scale, despite all the beta testing we do.

Best,
Oliver

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2956229792

RAC: 716027

24 Dec 2022 11:15:57 UTC

Message 205650 in response to message 205648

[AF>EDLS]zOU wrote:

Oliver Behnke wrote:

We are aware of an issue that can affect the Windows GPU app right now...

Yes, I saw and was aware of that. My reason for posting was to suggest that this is a data error, rather than an app error - before Oliver starts searching for a needle in the wrong haystack.

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46748092642

RAC: 64158732

24 Dec 2022 15:43:36 UTC

Message 205655 in response to message 205650

All of the wingmen seem to be Windows hosts though. I looked through about 30 of your errors and couldn’t find any that had Linux wingmen. I did see some Apple/Darwin hosts in there, but I didn’t check whether the error is the same or not.

And many of the Linux hosts I’ve looked at have been running all tasks to completion without error.

It would be odd if they were sending bad tasks only to Windows hosts. Can you find any of your errors where a wingman was on Linux with the same error (i.e., not a lack of VRAM or something)?

Since it only seems to be affecting Windows and maybe Apple/Mac, and not Linux, that would point to an application problem.
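To make that reasoning concrete, here is a rough sketch of the check, assuming you have already collected (platform, outcome) pairs for the wingmen by hand from the workunit pages. The function names, the 0.5 threshold, and the sample data are hypothetical and only illustrate the logic; this is not how the project itself diagnoses errors.

```python
from collections import Counter


def failure_rate_by_os(results):
    """results: iterable of (os_name, outcome) pairs, outcome is 'error' or 'ok'."""
    totals, errors = Counter(), Counter()
    for os_name, outcome in results:
        totals[os_name] += 1
        if outcome == "error":
            errors[os_name] += 1
    return {os_name: errors[os_name] / totals[os_name] for os_name in totals}


def likely_cause(rates, threshold=0.5):
    """Guess whether failures look data-related or application-related."""
    failing = {os_name for os_name, rate in rates.items() if rate >= threshold}
    if not failing:
        return "no systematic failure"
    if len(rates) < 2:
        # Only one platform seen, so data and app problems cannot be told apart.
        return "inconclusive"
    if failing == set(rates):
        return "data"         # every platform fails on the same tasks
    return "application"      # only some platforms fail


if __name__ == "__main__":
    # Hypothetical sample: Windows wingmen fail, Linux wingmen succeed.
    sample = [("windows", "error"), ("windows", "error"),
              ("linux", "ok"), ("linux", "ok")]
    print(likely_cause(failure_rate_by_os(sample)))
```

With only Windows wingmen in the sample the answer is "inconclusive", which is why finding a Linux wingman with the same error matters.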

Greg_BE

Joined: 15 Aug 08

Posts: 90

Credit: 106145625

RAC: 23456

25 Dec 2022 11:07:03 UTC

Message 205656

What is causing Float Invalid Operation at location 00ac178b?

It looks like I cannot process these tasks for some reason.

I have 4 pages of errors now.

Oh I see now...ok...well suspending the runs until January then.

[AF>EDLS]zOU

Joined: 5 May 15

Posts: 65

Credit: 384235373

RAC: 0

24 Dec 2022 16:23:31 UTC

Message 205657 in response to message 205655

Thank you
