FGRPB1G work shortage

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,042,731
RAC: 33,582
Topic 228360

Currently we're having trouble fulfilling the high demand of FGRPB1G "work". There are indications that it might get better by the end of October, but if you're limiting yourself to FGRPB1G right now, please consider running BRP7 instead, at least for a while.

BM

astro-marwil
Joined: 28 May 05
Posts: 527
Credit: 599,966,543
RAC: 1,100,361

Hallo Bernd!

Many thanks for the hint.

Best regards and happy crunching

Martin

JohnDK
Joined: 25 Jun 10
Posts: 115
Credit: 2,473,020,478
RAC: 2,247,025

Bernd Machenschalk wrote:

Currently we're having trouble fulfilling the high demand of FGRPB1G "work". There are indications that it might get better by the end of October, but if you're limiting yourself to FGRPB1G right now, please consider running BRP7 instead, at least for a while.

Any new indications?

Tom M
Joined: 2 Feb 06
Posts: 6,280
Credit: 9,004,588,487
RAC: 12,181,828

Bernd Machenschalk wrote:

Currently we're having trouble fulfilling the high demand of FGRPB1G "work". There are indications that it might get better by the end of October, but if you're limiting yourself to FGRPB1G right now, please consider running BRP7 instead, at least for a while.

Has it gotten better on your end?

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4,214
Credit: 3,124,533,390
RAC: 1,841,010

The graph on the front page about granted credit is showing a decline. I assume that is because people are shifting away from FGRPB1G to other types of tasks, which are not as generous with credit. This graph is of course a slow indicator due to the validation process.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,042,731
RAC: 33,582

Sorry, no good news yet for FGRPB1G.

I know that work is being done on new data, as well as on the somewhat unstable pre-processing code/pipeline. But I don't know how much longer it will take until everything is working again.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,870
Credit: 116,080,482,246
RAC: 35,939,056

Bernd Machenschalk wrote:
.... the somewhat unstable pre-processing code/pipeline.

Hi Bernd,
That description made me wonder if a problem I've seen a couple of times recently might be related in some way.

I have a lot of hosts crunching FGRPB1G.  Most have no peripherals attached - just power and network.  Many have uptimes in the hundreds of days.  I use various scripts to monitor and control them.  In particular, one script visits all hosts every hour and produces quite a detailed log which allows me to track any unusual events or problems that might otherwise go unnoticed for quite a while.

Recently, there have been a number of detections of an issue that seems likely to be caused by something server-side.  It has just happened on two different machines on consecutive days.  It results in spurious 24hr back-offs as shown in these snips from the event logs.  The times are local (UTC+10).

04-Nov-2022 00:44:59 [Einstein@Home] Sending scheduler request: To fetch work.
04-Nov-2022 00:44:59 [Einstein@Home] Requesting new tasks for AMD/ATI GPU
04-Nov-2022 00:45:02 [Einstein@Home] Scheduler request completed: got 0 new tasks
04-Nov-2022 00:45:02 [Einstein@Home] platform 'x86_64-pc-linux-gnu' not found
04-Nov-2022 00:45:02 [Einstein@Home] Project requested delay of 86400 seconds

and

05-Nov-2022 02:50:00 [Einstein@Home] Sending scheduler request: To fetch work.
05-Nov-2022 02:50:00 [Einstein@Home] Reporting 1 completed tasks
05-Nov-2022 02:50:00 [Einstein@Home] Requesting new tasks for AMD/ATI GPU
05-Nov-2022 02:50:03 [Einstein@Home] Scheduler request completed: got 0 new tasks
05-Nov-2022 02:50:03 [Einstein@Home] platform 'x86_64-pc-linux-gnu' not found
05-Nov-2022 02:50:03 [Einstein@Home] Project requested delay of 86400 seconds

The events happened in the middle of the night so I didn't get to see the warnings until the script itself had corrected the issue.  The "got 0 new tasks" is fine - it happens regularly - but why should there be a "platform not found" plus a 24hr backoff?
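For anyone who wants to watch for the same pattern, here is a minimal sketch of a log check, assuming event-log lines in the format shown in the snips above. It is not the actual monitoring script described in this post; the function name `find_spurious_backoffs` and the `min_delay` parameter are made up for illustration.

```python
import re

# Matches lines such as:
# 04-Nov-2022 00:45:02 [Einstein@Home] Project requested delay of 86400 seconds
DELAY_RE = re.compile(
    r"^(?P<stamp>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]+)\] Project requested delay of (?P<secs>\d+) seconds"
)
# Matches the "platform '...' not found" message anywhere in a line.
PLATFORM_RE = re.compile(r"platform '(?P<platform>[^']+)' not found")


def find_spurious_backoffs(lines, min_delay=86400):
    """Return (timestamp, platform, delay_seconds) for every long delay
    request that directly follows a "platform ... not found" message.

    `min_delay` is a hypothetical threshold; 86400 s is the 24hr back-off
    seen in the logs above.
    """
    hits = []
    last_platform = None
    for raw in lines:
        line = raw.strip()
        platform_match = PLATFORM_RE.search(line)
        if platform_match:
            # Remember the platform and look at the next line.
            last_platform = platform_match.group("platform")
            continue
        delay_match = DELAY_RE.match(line)
        if delay_match and last_platform is not None:
            secs = int(delay_match.group("secs"))
            if secs >= min_delay:
                hits.append((delay_match.group("stamp"), last_platform, secs))
        # Any other line breaks the "directly follows" relationship.
        last_platform = None
    return hits
```

Feeding it the lines of an event log returns one tuple per suspicious back-off, which a monitoring script could then write to its own log or act on.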

The time between scheduler contacts (last RPC) is monitored by the script.  If that time becomes excessive and there is no detected issue with the host itself, boinccmd is used to force a contact with the project.  This happened in both cases and the backlog of completed work that had piled up was returned.  One had ~50 tasks.
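The recovery step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the real script: the project URL, the six-hour threshold, and all function names are hypothetical, and how the time of the last RPC is obtained is left to the caller. The only BOINC-specific piece is the `boinccmd --project <URL> update` operation, which asks the client to contact the project.

```python
import subprocess
import time

# Assumed values for illustration only; adjust to your own setup.
PROJECT_URL = "https://einsteinathome.org/"  # assumed project URL
MAX_GAP_SECONDS = 6 * 3600                   # hypothetical "excessive" gap


def needs_forced_contact(last_rpc_epoch, now=None, max_gap=MAX_GAP_SECONDS):
    """True if the time since the last scheduler contact exceeds max_gap.

    `last_rpc_epoch` is the last RPC time as seconds since the epoch;
    obtaining it from the client is outside the scope of this sketch.
    """
    now = time.time() if now is None else now
    return (now - last_rpc_epoch) > max_gap


def update_command(project_url=PROJECT_URL):
    """Build the boinccmd invocation that forces a project contact."""
    return ["boinccmd", "--project", project_url, "update"]


def force_contact(project_url=PROJECT_URL):
    """Run boinccmd; raises CalledProcessError if the command fails."""
    subprocess.run(update_command(project_url), check=True)
```

A monitoring loop would call `needs_forced_contact(...)` for each host and, only if the host itself looks healthy, follow up with `force_contact()`.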

In my case, I'm not unduly bothered by these events.  If it's a bug, it would be good to fix it though :-).  Since I haven't seen other reports about this, maybe it's something specific to my setup.

Cheers,
Gary.

Keith Myers
Joined: 11 Feb 11
Posts: 4,923
Credit: 18,477,900,924
RAC: 5,874,828

You are not the only person to see this exact symptom of a server induced 24 hour backoff.  Many of my team have suffered similar backoffs.

Easy to fix with a boinccmd project connection as you said.

mikey
Joined: 22 Jan 05
Posts: 12,564
Credit: 1,838,909,891
RAC: 21,057

Keith Myers wrote:

You are not the only person to see this exact symptom of a server induced 24 hour backoff.  Many of my team have suffered similar backoffs.

Easy to fix with a boinccmd project connection as you said.

MilkyWay has/had a similar problem for me as well, and others are complaining about it in the forums over there, but they have a brand-new admin who's still trying to get up to speed with swapping to a new server, etc.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,914
Credit: 44,162,789,309
RAC: 63,954,687

I believe the random 24hr back-off is not related to the work availability issue, since it has been happening since well before the issues with work availability.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 249,042,731
RAC: 33,582

Thanks for reporting this. This is a sporadic error that resulted from an inconsistency that occurred during the OS upgrade (Oct 18) and apparently had gone unnoticed until now. Fixed, should not occur again.

BM
