The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250431026

RAC: 35057

9 Sep 2019 5:54:00 UTC

Message 173223

We have two "file upload handlers" running (for different file sizes). Occasionally these hang, and we still don't know why. We are still investigating, but with rather low priority. For the time being we are restarting them automatically every 6 hours. During each restart the file upload handlers are offline for 5-10 minutes, so we don't want to do this too often.
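
As a rough back-of-the-envelope check on that trade-off, the daily downtime from scheduled restarts can be worked out as below. The function name is made up for this example, and it assumes every restart costs the full outage time.

```python
def daily_downtime_minutes(restart_interval_hours, outage_minutes):
    """Minutes per day the upload handlers are offline due to scheduled restarts.

    Hypothetical helper for illustration only; not part of any project tooling.
    """
    if restart_interval_hours <= 0:
        raise ValueError("restart interval must be positive")
    restarts_per_day = 24 / restart_interval_hours
    return restarts_per_day * outage_minutes


# Restart every 6 hours, each restart taking 5 to 10 minutes.
low = daily_downtime_minutes(6, 5)
high = daily_downtime_minutes(6, 10)
print(f"Offline {low:.0f}-{high:.0f} minutes per day")
```

With a 6-hour interval that is 4 restarts a day, so 20-40 minutes offline per day. Restarting every hour instead would cost 2-4 hours per day.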

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1588402402

RAC: 761323

9 Sep 2019 15:46:51 UTC

Message 173232

My invalids continue on a GTX1060. Is the object of this exercise to develop the app or to do more GW science?

Mad_Max

Joined: 2 Jan 10

Posts: 154

Credit: 2211904743

RAC: 321820

12 Sep 2019 1:07:17 UTC

Message 173307 in response to message 172896

Gary Roberts wrote:

Well done!! and thanks very much for the detailed explanation. I'm sure you're absolutely right!!

Everything you point out makes perfect sense. It's always very satisfying to find out why these strange things happen so I'm very grateful to you for your persistence in tracking down the cause. I'll certainly be interested in anything you work out about the use of <fraction_done_exact/> if you do go ahead and test that option.

I did this test. And yes: GPU GW tasks that run normally (on newer GPUs) show the same "progress reset" behavior seen on older GPUs that don't work properly.
The only difference is that the progress resets to zero from 3-7%, depending on GPU and CPU speed, while on abnormally slow older GCN GPUs the "simulated progress" can reach 50-70% before it resets to 0% when the app makes its first progress report to BOINC.

But it turned out that the <fraction_done_exact/> option in app_config does not change this. BOINC still uses and shows "simulated" progress, including the "progress resets", if it does not have actual progress info from the app.

It looks like this option only affects the estimate of a task's remaining run time. Without it, BOINC uses a general estimate based on the "size of the task", the hardware benchmark and the project DCF, minus the time already elapsed (roughly Size / HW_speed * DCF - elapsed time).

With this option on, it uses a simple strict formula for running tasks: (elapsed time) / (fraction done), ignoring DCF, benchmarks and "task size". (It still uses them for other tasks waiting in the work queue, though.)

But it does not affect the progress bar: it always shows "simulated progress" if the app does not report actual progress for any reason.
A slightly weird decision by the BOINC programmers...
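
The two estimates described above can be sketched in a few lines of Python. This is only an illustration of the formulas as stated in this post, not BOINC's actual client code, and the function and parameter names are made up for the example. It also assumes that (elapsed time) / (fraction done) is the projected total run time, so the remaining time is that value minus the elapsed time.

```python
def general_remaining(task_size_flops, host_flops, dcf, elapsed):
    """Remaining time without <fraction_done_exact/>, as described above:
    Size / HW_speed * DCF - elapsed time (never below zero).
    All names here are hypothetical."""
    return max(task_size_flops / host_flops * dcf - elapsed, 0.0)


def exact_total(elapsed, fraction_done):
    """Projected total run time with <fraction_done_exact/>:
    (elapsed time) / (fraction done)."""
    if fraction_done <= 0:
        raise ValueError("no progress reported yet, cannot extrapolate")
    return elapsed / fraction_done


def exact_remaining(elapsed, fraction_done):
    """Assumption: remaining time is the projected total minus elapsed."""
    return exact_total(elapsed, fraction_done) - elapsed


# Example: 300 s elapsed, the app reports 25% done.
print(general_remaining(1.44e14, 1e11, 1.25, 300))  # benchmark/DCF based
print(exact_total(300, 0.25), exact_remaining(300, 0.25))
```

For a task that is 25% done after 300 s, the exact method projects 1200 s in total and 900 s remaining, regardless of DCF or benchmarks. Neither function says anything about the progress bar, which matches the observation that the option leaves the "simulated progress" untouched.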

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1588402402

RAC: 761323

13 Sep 2019 18:07:21 UTC

Message 173326

It looks as though the horrid rate of invalid results on my GTX1060 running 1X has been ameliorated. I'll let it run that way for a day or so and if it remains good I'll try 2X.

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18713164664

RAC: 6376599

13 Sep 2019 22:31:51 UTC

Message 173328

When the last of my GW gpu tasks got into high priority mode due to deadlines, and tasks started running on two cards simultaneously, my crunch times stretched out enormously. Where I would normally run in 1400-1600 seconds, task times stretched out to as high as 20,000 seconds.

Zalster

Joined: 26 Nov 13

Posts: 3117

Credit: 4050672230

RAC: 0

13 Sep 2019 22:35:14 UTC

Message 173329 in response to message 173328

Keith Myers wrote:

When the last of my GW gpu tasks got into high priority mode due to deadlines, and tasks started running on two cards simultaneously, my crunch times stretched out enormously. Where I would normally run in 1400-1600 seconds, task times stretched out to as high as 20,000 seconds.

That is weird. I have a dual machine doing nothing but GW GPU tasks, and the tasks run normally. Maybe the high priority mode has something to do with it?

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

14 Sep 2019 13:11:29 UTC

Message 173337

I just got my first valid on an RX 570 (Win7 64-bit) after about a dozen invalids. It was validated against a Linux machine.

https://einsteinathome.org/workunit/418963472

Is it a fluke or is Bernd tweaking the validator?

Matt White

Joined: 9 Jul 19

Posts: 120

Credit: 280798376

RAC: 0

14 Sep 2019 14:29:02 UTC

Message 173338 in response to message 173337

Jim1348 wrote:

I just got my first valid on an RX 570 (Win7 64-bit) after about a dozen invalids. It was validated against a Linux machine.

https://einsteinathome.org/workunit/418963472

Is it a fluke or is Bernd tweaking the validator?

Since this is a beta task, I would speculate that quite a bit of tweaking is going on in the background.

Clear skies,

Matt

Matt White

Joined: 9 Jul 19

Posts: 120

Credit: 280798376

RAC: 0

14 Sep 2019 14:31:03 UTC

Message 173339

However, a significant change in the application (most likely any change) would be indicated by a revision number change.

Clear skies,

Matt

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

15 Sep 2019 10:20:34 UTC

Message 173354 in response to message 173337

Jim1348 wrote:

I just got my first valid on an RX 570 (Win7 64-bit) after about a dozen invalids. It was validated against a Linux machine.

I now have five validated, and no more invalids. That is encouraging. But the real point of interest is that they have been validated against three Linux machines, a FirePro D500 running under Darwin, and a Titan X running under Windows 10.

If it can do that, it can do anything. I think Bernd has nailed it, at least for this card. I hope it holds true for the others too.
