server: einstein4 is down

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454484346
RAC: 8463
Topic 224432

and with it go the work generators.  I have not been able to get any GRPB#1 WUs in several hours.  

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3681
Credit: 33842902622
RAC: 36889451

switch over to gravitational wave tasks. the water's nice over here. everything still running for O2MD/F

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454484346
RAC: 8463

Ian&Steve C. wrote:

switch over to gravitational wave tasks. the water's nice over here. everything still running for O2MD/F

Done.  

Tom M
Joined: 2 Feb 06
Posts: 5589
Credit: 7675729313
RAC: 1852300

Is there any prediction about when it will likely be back up?

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109407721236
RAC: 35292218

Tom M wrote:
Is there any prediction about when it will likely be back up?

The thread title suggested that Einstein4 was down.  However, it's been up the whole time.  My guess is that the automatic process of switching task generation to a new data file has some sort of problem.  This has happened before and it's normally rectified quite quickly.  Perhaps it's something more serious this time.

Earlier on today, I requested that a quick note be put in Tech News to inform as to when we can expect new work to be generated.  Until there is some sort of response from the staff, there's no way of knowing.

My hosts doing FGRPB1G have about an hour of work left each.  They were suspended at that point awaiting the availability of new tasks.  They'll be released from suspension after the dust settles on what is likely to be several hours of feeding frenzy when new work eventually shows up.

Cheers,
Gary.

Tom M
Joined: 2 Feb 06
Posts: 5589
Credit: 7675729313
RAC: 1852300

Ian&Steve C. wrote:

switch over to gravitational wave tasks. the water's nice over here. everything still running for O2MD/F

That's for sure.  Once I toggled the "run non-preferred tasks/applications" setting, my zero-resource backup project stopped getting tasks because GW was/is munching steadily.

Tom M


archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024604931
RAC: 1810700

One warning to people considering turning on the option to allow non-preferred applications:

Consider setting your fetch queue preference to something extremely low first, then make sure your machine has had a project update cycle, before then activating the permission to download GW.

Otherwise, you'll get a gulp of GW tasks sized to fulfill your previous queue request, with the run time of GW tasks being wildly wrongly estimated using the DCF observed on your GRP tasks.  To compound the misery, GW tasks I got today still had the 7-day deadline, so your chances of going into panic mode as soon as your first GW task finishes are greatly elevated.

I failed to follow my own advice, and got 171 GW tasks on my first request.  The DCF in effect caused BOINC to estimate elapsed time for these on my system at just over 3 minutes.  The truth will be rather a lot more.
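To put rough numbers on that, here is a small sketch of the arithmetic. The 171 tasks, the roughly 3-minute estimate and the 7-day deadline come from the post above; the true run time per task and the number of tasks run at once are purely hypothetical placeholders, so substitute whatever your own host actually shows.

```python
def estimated_queue_hours(task_count: int, est_minutes_per_task: float) -> float:
    """Total run time the client believes it has fetched, in hours."""
    return task_count * est_minutes_per_task / 60.0


def actual_queue_days(task_count: int, true_hours_per_task: float,
                      concurrent_tasks: int = 1) -> float:
    """Wall-clock days needed to clear the queue at the real run time."""
    return task_count * true_hours_per_task / concurrent_tasks / 24.0


TASKS_FETCHED = 171          # from the post
ESTIMATED_MINUTES = 3.0      # "just over 3 minutes", rounded down
DEADLINE_DAYS = 7            # from the post

# ASSUMPTION: illustrative value only, not a measured GW run time.
HYPOTHETICAL_TRUE_HOURS = 1.0

believed_hours = estimated_queue_hours(TASKS_FETCHED, ESTIMATED_MINUTES)
needed_days = actual_queue_days(TASKS_FETCHED, HYPOTHETICAL_TRUE_HOURS)

print(f"Client thinks it fetched about {believed_hours:.2f} hours of work")
print(f"At {HYPOTHETICAL_TRUE_HOURS} h per task it would need {needed_days:.2f} days")
print("Deadline at risk" if needed_days > DEADLINE_DAYS else "Fits within deadline")
```

With those placeholder values, a queue the client believes is well under half a day of work would not fit inside the deadline, which is why shrinking the queue setting first is the safer order of operations.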

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

In lieu of other work, I decided to allow my AMD GPU to download GW tasks. The NVIDIA GPU is doing work for Milkyway@Home, my backup project. I may leave this configuration in place. My NVIDIA card doesn't have much horsepower, takes quite a while to crunch stuff over here and, because of its lack of memory, is no good for GW tasks. Both my GPUs reside in the same box, and I don't allow CPU tasks on that machine. My server handles CPU work and is mostly doing GW tasks, with the occasional Gamma Ray tossed in.

Clear skies,
Matt

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454484346
RAC: 8463

Gary Roberts wrote:

Tom M wrote:
Is there any prediction about when it will likely be back up?

The thread title suggested that Einstein4 was down.  However, it's been up the whole time.  My guess is that the automatic process of switching task generation to a new data file has some sort of problem.  This has happened before and it's normally rectified quite quickly.  Perhaps it's something more serious this time.

Earlier on today, I requested that a quick note be put in Tech News to inform as to when we can expect new work to be generated.  Until there is some sort of response from the staff, there's no way of knowing.

My hosts doing FGRPB1G have about an hour of work left each.  They were suspended at that point awaiting the availability of new tasks.  They'll be released from suspension after the dust settles on what is likely to be several hours of feeding frenzy when new work eventually shows up.

Maybe I should have been more clear.  I just had a look, and all work generators on einstein4 continue to be in a "not running" state.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109407721236
RAC: 35292218

robl wrote:
... all work generators on einstein4 continue to be in a "not running state"

Yes, that's usually the case.  Work generation is an 'on demand' thing and if data from which to generate tasks is available, and the available task supply drops below a threshold, the generation process runs for just long enough to 'top up' the number of tasks to an upper limit.  Normally that happens rather quickly so when a new status page snapshot is done (every 10 mins or so) the generators are 'caught' in the 'not running' state, most of the time.  Occasionally, one of the generators will actually be caught 'running' :-).
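As a rough illustration of that 'top up' behaviour, here is a minimal sketch. It is not the project's actual generator code; the function name and the threshold values are made up for the example.

```python
def tasks_to_generate(ready_tasks: int, low_water: int, high_water: int,
                      data_available: bool) -> int:
    """Return how many tasks one generator pass would create.

    ASSUMPTION: a simple low/high threshold scheme, as described in the
    post. The real server logic may differ in detail.
    """
    if not data_available:
        # No data to build tasks from: the generator has nothing to do,
        # so it would simply keep showing as 'not running'.
        return 0
    if ready_tasks >= low_water:
        # Supply is still above the threshold: stay idle.
        return 0
    # Supply dropped below the threshold: top up to the upper limit.
    return high_water - ready_tasks


# Hypothetical thresholds, chosen only for illustration.
print(tasks_to_generate(ready_tasks=50, low_water=100, high_water=500,
                        data_available=True))    # tops up
print(tasks_to_generate(ready_tasks=50, low_water=100, high_water=500,
                        data_available=False))   # no data, stays idle
```

The second call shows why a missing data file and a healthy, well-stocked generator can look the same on the status page: in both cases the pass generates nothing and exits.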

There will be something unusual if the status shows as 'disabled'.  That points to some sort of issue.

I've looked at the status page quite a few times over the last couple of days :-).  Occasionally I've seen other generators show as 'running' but not the FGRPB1G one.  That's why I'm guessing there's no data available (or something else is preventing the generator from starting) so it always shows as 'not running'.

I've tried to alert the staff and got no response.  Looks like they've all gone on extended holiday :-).

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024604931
RAC: 1810700

I caught one of my hosts downloading a GRP CPU task just now, and imagined perhaps things had gotten better.

Not really.

A review of my tasks ready to run shows that the last time I got freshly made tasks instead of resends (that is to say _0 or _1 at the end of the task ID, not _2, _3, _4...) was 8:29 a.m. on January 5 (MDT).

Since then I've gotten many dozens of resends, intermittently, and sometimes in non-tiny batches, but zero freshly baked.

So I score the outage of fresh GRP GPU work generation at about 61 hours and rising.
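For anyone wanting to run the same check on their own task list, here is a small sketch of the suffix test. It assumes, as described above, that the first two copies of a work unit end in _0 and _1 and that anything higher is a resend. The task names in the example are invented placeholders, not real Einstein@Home names.

```python
import re

# Trailing "_<number>" at the end of a task name.
_COPY_SUFFIX = re.compile(r"_(\d+)$")


def copy_index(task_name: str) -> int:
    """Return the numeric suffix at the end of a task name."""
    match = _COPY_SUFFIX.search(task_name)
    if match is None:
        raise ValueError(f"no numeric suffix in task name: {task_name!r}")
    return int(match.group(1))


def is_resend(task_name: str) -> bool:
    """True if the suffix is _2 or higher (ASSUMES two initial copies)."""
    return copy_index(task_name) >= 2


# Hypothetical names, for illustration only.
for name in ["example_workunit_0", "example_workunit_1", "example_workunit_3"]:
    print(name, "resend" if is_resend(name) else "fresh")
```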

I assume it will get worse, as the resends will dry up.

My first host to switch over to GW has reminded me that I need to observe and make adjustments.

 
