I consider pulsars to be second prize.
But they are a very worthy second prize. The more you think about them, and about the next stage (a black hole) if the star that went 'supernova' happened to be a bit bigger to start with, the more you must surely be impressed by these objects.
Knowing where they are, how massive they are, how fast they're spinning, what forms of radiation they are emitting, and so on, must make attaining the first prize a much more achievable outcome.
What does it feel like to be part of a brand new scientific discipline - the observation of the universe in gravitational waves - something that was in fantasy land just a few short years ago?
Think about the many recent scientific advances: the exoplanet discoveries kicked off by the Kepler mission, the confirmation of gravitational waves through both BH-BH and NS-NS mergers, the first image of a supermassive BH, and so on. These sorts of events (and many others) have made this a most exciting time to be alive, to be following scientific progress and to be helping with it. Does it really matter what particular part you're helping with? You're a winner if you're helping at all!
Cheers,
Gary.
robl wrote: Rather than sit idle I have selected "pulsar binary search #1".
That's exactly what I did with mine. I was in the process of gathering performance data for some different hardware types and different task multiplicities, to work out what type of system is best for processing these tasks. I'm comparing quite modern hardware with three particular CPU types - Intel 2C/4T, Intel 4C/4T and AMD 6C/12T - all with an RX 570 GPU. I'm comparing multiplicities from 1x right through to 4x, and at least 5x on the 6C/12T, a Ryzen 2600.
The machine that was affected was the 4C/4T one (an Intel i3-9100F). I had built three of these, so I'll just convert a second one and start gathering data again.
The biggest problem is something that Bernd mentioned - the WU generator isn't doing a good job of 'slicing and dicing' tasks into equal work content. The changes in crunch time are quite large when they happen, and I don't see any way of selecting tasks of 'equal size' across all three host types. I'm basically at the point of gathering many hundreds (possibly thousands) of tasks to try to ensure a believable average crunch time. I suspect the run might end pretty soon, so if the WU generator can be 'fixed' for the next iteration, I might be better off waiting for that.
Cheers,
Gary.
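As an aside, the averaging problem described above can be pictured with a few lines of Python. This is only an illustration - the numbers and the script are made up, not the actual data or tooling used on these hosts - but it shows why a mean crunch time needs a sample size large enough to give a narrow confidence interval when individual task work content varies so much.

# Minimal sketch with made-up numbers: report an average crunch time per
# host/multiplicity together with a rough 95% confidence interval, so that
# configurations can be compared despite large task-to-task variation.
import math
from statistics import mean, stdev

def summarize(label, run_times_seconds):
    # Print the sample size, mean crunch time and an approximate 95% CI, in hours.
    n = len(run_times_seconds)
    avg = mean(run_times_seconds)
    half_width = 1.96 * stdev(run_times_seconds) / math.sqrt(n) if n > 1 else float("nan")
    print(f"{label}: n={n}, mean={avg / 3600:.2f} h, +/- {half_width / 3600:.2f} h")

# Hypothetical run times (seconds) for one host at two task multiplicities.
summarize("i3-9100F + RX 570 @ 1x", [5400, 5600, 7200, 5500, 5800])
summarize("i3-9100F + RX 570 @ 2x", [9800, 10100, 13500, 9900, 10200])

The wider the spread in work content, the more tasks are needed before that interval becomes narrow enough to trust a comparison between configurations.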
robl wrote: Rather than sit idle I have selected "pulsar binary search #1".
Nice to see you found another option for getting the host working again!
Didn't think of that obvious one. Doh!
It's terribly difficult to get rid of tasks once they have been assigned to a computer. If the application versions or the preferences change, there may be no app version left that can complete the tasks the scheduler tries to "resend"; even so, it usually tries again with every work request.
I have now changed the scheduler so that it will "expire" a task when there is no app version to process it. I've yet to see a case where this change actually takes effect, though.
BM
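For readers unfamiliar with the scheduler internals, the decision being described can be pictured roughly as follows. This is a hypothetical Python sketch, not the actual BOINC server code, and every name in it is invented for illustration.

# Illustrative sketch only - not the actual BOINC scheduler code. It mimics the
# behaviour described above: when a "lost" task would be resent to a host but no
# app version on that host can process it any more, mark the task as expired
# ("Timed out - no response") instead of retrying it on every work request.
from dataclasses import dataclass

@dataclass
class LostTask:
    name: str          # made-up task name
    app: str           # application the task belongs to
    plan_class: str    # e.g. a GPU plan class
    state: str = "in_progress"

def has_usable_app_version(host_app_versions, task):
    # host_app_versions: set of (app, plan_class) pairs the host can still run
    return (task.app, task.plan_class) in host_app_versions

def handle_lost_tasks(lost_tasks, host_app_versions):
    resend, expired = [], []
    for task in lost_tasks:
        if has_usable_app_version(host_app_versions, task):
            resend.append(task)        # normal "resend lost tasks" path
        else:
            task.state = "timed_out"   # new behaviour: expire instead of retrying forever
            expired.append(task)
    return resend, expired

# Hypothetical example: the host's venue changed, so it no longer has the GW GPU app.
host_versions = {("gamma_ray_pulsar_gpu", "opencl-ati")}
lost = [LostTask("gw_task_654.70_seq_19", "gw_multi_directional_gpu", "opencl-ati")]
to_resend, now_expired = handle_lost_tasks(lost, host_versions)

The real change is of course inside the project's scheduler and its database, but the decision point is the one Bernd describes: no usable app version means the resend attempt is abandoned and the task is expired.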
Bernd Machenschalk wrote: It's terribly difficult to get rid of tasks once they have been assigned to a computer.
Bernd,
I just noticed that on the PC in question I now have "Gravitational Wave search O2 Multi-Directional GPU" work units. I would think that your change to the scheduler might have resolved the problem.
EDIT: I have also noticed that the original 54 jobs are now showing: "Timed out - no response". FYI - not complaining.
Holmis wrote: Nice to see you found another option for getting the host working again! Didn't think of that obvious one. Doh!
Sometimes I amaze myself!!!
Bernd Machenschalk wrote: I have now changed the scheduler so that it will "expire" a task when there is no app version to process it.
If this works without any adverse effects I think it's an excellent solution!
Thank you for trying to fix this without turning the "resend lost tasks" feature off.
Bernd Machenschalk wrote: I have now changed the scheduler so that it will "expire" a task when there is no app version to process it. I've yet to see a case where this change actually takes effect, though.
Thanks for that. In the tasks list for my host that had the problem, I now see the 9 lost tasks showing up with a status of "Timed out - no response" and listed as a group under the 'Errors' column. I don't care about that, but up to that point, the host had an unblemished record - 499 total with 9 in progress, 471 valid, 19 pending, 0 invalid. So my unblemished record is now 'polluted' with 9 errors :-).
As a result of your quick fix, I've now changed the 'location' (aka venue) of the host so that it would stop requesting FGRPB1G and start requesting O2MDF once again. It now has a fresh bunch of tasks in progress. The very first batch were resends - at first glance I thought I was getting my 'lost' ones back again :-). They turned out to be different 'sequence' numbers for slightly different frequency bins, e.g. 654.35 instead of 654.70, but close enough that I already had the needed large data files. Obviously, locality scheduling is working nicely as intended. Once again, thanks very much for finding a solution.
Cheers,
Gary.
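The locality-scheduling behaviour observed above can be pictured with a small, purely illustrative sketch. The band width and file names below are invented and are not the real Einstein@Home data layout; the sketch only shows the idea that tasks for nearby frequency bins reuse the same large data files.

# Purely illustrative - not the real Einstein@Home implementation or file naming.
# Idea: tasks for nearby frequency bins need the same large data files, so a
# host that already holds those files is a good target for such tasks.
def data_files_for(freq_hz, band_width_hz=1.0):
    # Map a task frequency to the (invented) data-file set covering its band.
    base = int(freq_hz // band_width_hz) * band_width_hz
    return {f"h1_{base:07.2f}.dat", f"l1_{base:07.2f}.dat"}

def host_already_has_files(host_files, task_freq_hz):
    # True if the host holds every large file the task needs (no big downloads).
    return data_files_for(task_freq_hz) <= host_files

host_files = data_files_for(654.70)               # fetched for the earlier 654.70 Hz work
print(host_already_has_files(host_files, 654.35)) # True: same band, files are reused
print(host_already_has_files(host_files, 812.10)) # False: different band, new downloads

That reuse is why the resent tasks for nearby frequencies could start straight away without any large downloads.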
Bernd Machenschalk wrote: ... I've yet to see a case where this change actually takes effect, though.
Is it possible that the change in the handling of the O2MDF resends also affects the FGRPB1G search? As a result of looking into the problem mentioned in this thread, I've noticed a very large number of what were probably lost FGRPB1G tasks (inadvertently created by the OP of the thread) that are now listed as 'Timed out - no response' errors. That might be a rather dramatic case where the change has taken effect :-).
Cheers,
Gary.