Currently we're having trouble meeting the high demand for FGRPB1G "work". There are indications that it might get better by the end of October, but if you're limiting yourself to FGRPB1G right now, please consider running BRP7 instead, at least for a while.
BM
Copyright © 2024 Einstein@Home. All rights reserved.
Hallo Bernd!
Many thanks for the hint.
Best regards and happy crunching
Martin
Bernd Machenschalk
Any new indications?
Bernd Machenschalk
Has it gotten better on your end?
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor)
The graph on the front page showing granted credit is in decline. I assume that is because people are shifting away from FGRPB1G to other types of tasks, which are not as generous with credit. This graph is of course a slow indicator due to the validation process.
Sorry, no good news yet for FGRPB1G.
I know that work is being done on new data, as well as on the somewhat unstable pre-processing code/pipeline. But I don't know how much longer it will take until everything is working again.
BM
Bernd Machenschalk wrote:....
Hi Bernd,
That description made me wonder if a problem I've seen a couple of times recently might be related in some way.
I have a lot of hosts crunching FGRPB1G. Most have no peripherals attached - just power and network. Many have uptimes in the hundreds of days. I use various scripts to monitor and control them. In particular, one script visits all hosts every hour and produces quite a detailed log which allows me to track any unusual events or problems that might otherwise go unnoticed for quite a while.
Recently, there have been a number of detections of an issue that seems likely to be caused by something server-side. It has just happened on two different machines on consecutive days. It results in spurious 24hr back-offs as shown in these snips from the event logs. The times are local (UTC+10).
and
The events happened in the middle of the night so I didn't get to see the warnings until the script itself had corrected the issue. The "got 0 new tasks" is fine - it happens regularly - but why should there be a "platform not found" plus a 24hr backoff?
The time between scheduler contacts (last RPC) is monitored by the script. If that time becomes excessive and there is no detected issue with the host itself, boinccmd is used to force a contact with the project. This happened in both cases and the backlog of completed work that had piled up was returned. One had ~50 tasks.
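A minimal sketch of that kind of check, not the actual script: it assumes the time of the last scheduler RPC has already been obtained for the host, and the six-hour threshold, the function names and the project URL are all hypothetical. The only real command used is `boinccmd --project <URL> update`, which asks the client to contact the project.

```python
import subprocess
from datetime import datetime, timedelta

# Hypothetical threshold: how long a host may go without a scheduler
# contact before we step in. Pick whatever suits your own work cache.
MAX_RPC_GAP = timedelta(hours=6)


def needs_forced_update(last_rpc, now, max_gap=MAX_RPC_GAP):
    """Return True if the last scheduler contact is older than max_gap."""
    return now - last_rpc > max_gap


def update_command(project_url):
    """Build the boinccmd call that forces a scheduler contact."""
    return ["boinccmd", "--project", project_url, "update"]


def force_update(project_url):
    """Run boinccmd on the local host (boinccmd must be on the PATH)."""
    subprocess.run(update_command(project_url), check=True)
```

A monitoring loop would call `needs_forced_update(last_rpc, datetime.now())` for each host and, if it returns True and nothing is wrong with the host itself, call `force_update("https://einsteinathome.org/")`. The URL here is an assumption; use the one your client reports for the project.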
In my case, I'm not unduly bothered by these events. If it's a bug, it would be good to fix it though :-). Since I haven't seen other reports about this, maybe it's something specific to my setup.
Cheers,
Gary.
You are not the only person to see this exact symptom of a server induced 24 hour backoff. Many of my team have suffered similar backoffs.
Easy to fix with a boinccmd project connection as you said.
Keith Myers wrote:You are
MilkyWay has/had a similar problem for me as well, and others are complaining about it in the forums over there, but they have a brand new admin who's still trying to get up to speed with swapping to a new server etc.
I believe the random 24hr back-off is not related to the work availability issue, since it was happening well before the issues with work availability started.
Thanks for reporting this. This is a sporadic error that resulted from an inconsistency that occurred during the OS upgrade (Oct 18) and apparently went unnoticed so far. Fixed, should not occur again.
BM