No work available for FGRPB1G

Keith Myers
Joined: 11 Feb 11
Posts: 5029
Credit: 18977655305
RAC: 6475590
Topic 216269

Has anybody heard anything about the lack of FGRPB1G GPU tasks?

 

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

I just picked up three at 22:34 UTC for my Ubuntu machine.

But maybe that was the last of them?

Keith Myers
Joined: 11 Feb 11
Posts: 5029
Credit: 18977655305
RAC: 6475590

I haven't received any today.  Crunchers are cold from the lack of work from Seti, Einstein and MilkyWay.  They got a little bit of work from GPUGrid but that didn't last long.

I knew there would be trouble when MilkyWay announced that they would be starting regular Tuesday maintenance schedules along with Seti.  I have always depended on Einstein to carry the load when my other projects dropped out.  Now Einstein is not dependable either.  So what's the use of having backup projects when your backup projects don't have work either?  Bah humbug!

 

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118563590968
RAC: 25900290

Jim1348 wrote:
.... maybe that was the last of them?

They were probably resends: extra copies of tasks that failed or missed their deadline.  You can easily distinguish resends from the initial 'primary' tasks.  They will have an _2 (or higher) extension in their name, as opposed to _0 or _1 for the original copies.
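A minimal sketch of that naming rule in Python, assuming the copy number is the final underscore-separated numeric field of the task name.  The function names and the sample task names are made up for illustration; only the _0/_1 versus _2-or-higher rule comes from the post above.

```python
import re

# Assumption: the copy number is the trailing "_<digits>" of the task name.
_COPY_SUFFIX = re.compile(r"_(\d+)$")


def copy_index(task_name: str) -> int:
    """Return the trailing copy number of a task name (hypothetical helper)."""
    match = _COPY_SUFFIX.search(task_name)
    if match is None:
        raise ValueError(f"no numeric suffix in task name: {task_name!r}")
    return int(match.group(1))


def is_resend(task_name: str) -> bool:
    """True for _2 or higher (a resend), False for the original _0 and _1 copies."""
    return copy_index(task_name) >= 2
```

For example, `is_resend("example_workunit_2")` is true while `is_resend("example_workunit_1")` is false; a name without a numeric suffix raises `ValueError` rather than being guessed at.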

There are currently almost 0.25M tasks in progress and at a rough guess, maybe 10-20% of those will fail in some way or not be returned.  So even if the workunit generator isn't kicked into life at today's start-of-business in Hannover, there will be quite a few more resends to be picked up if you happen to be asking at just the 'right' time :-).

Only problem is the 'right' time could be anywhere between now and a few weeks into the future :-).

We just have to hope that last night wasn't 'Wild Party Night' in Hannover and that people get to work on time and sufficiently awake to notice :-).

I'm always a bit bemused when one of these 'once in a blue moon' events happens.  Other projects have problems/outages more regularly, but if it happens here, this project has suddenly become unreliable?? :-).

 

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251889932
RAC: 33845

There has been a problem in the pre-processing pipeline that we weren't able to fix before the buffers ran dry. We're frantically working on it.

Details: the central file system of our compute cluster Atlas is currently unstable, and apparently neither we nor the vendor's support has been able to determine the root cause and ultimately fix it. Currently it's working, and if it doesn't tip over in the next 12h we should be able to push at least one or two FGRPB1G datasets through. Typing this with fingers crossed...

BM

Millenium
Joined: 8 Oct 14
Posts: 21
Credit: 33102476
RAC: 0

Keith Myers wrote:
Now Einstein is not dependable either.  So what's the use of having backup projects when your backup projects don't have work either?  Bah humbug!

A bit exaggerated in my opinion :-P  Luckily Einstein is a solid project! I just dislike the website graphics, but that is just a personal aesthetic preference!

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118563590968
RAC: 25900290

We seem to have some work again!! :-).

 

Cheers,
Gary.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1616223146
RAC: 762381

Yep, I didn't run dry, and life is good in Einsteinland.

Millenium
Joined: 8 Oct 14
Posts: 21
Credit: 33102476
RAC: 0

Confirming, received some WUs

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5876
Credit: 118563590968
RAC: 25900290

Betreger wrote:
Yep, I didn't run dry ....

All of mine ran dry.

I keep a fairly small work cache on each machine and I have a script that gets all machines to top up to an extra 0.5 days' worth for the overnight period.  As Murphy's Law would always predict, the well ran dry just in time to make sure that nobody was going to get a drink last evening :-).  I hung around for an extra hour or two after Bernd said he was on to it with fingers crossed, but as we all know, crossed fingers can't defeat Murphy :-).

All of mine are pretty much back to normal now.  There are two particular problems for me if hosts run dry temporarily.  The first is that after trying to get work and being refused, hosts will go into increasingly longer backoffs which means that there can be quite a delay in asking for a drink when the well fills up again.  I've solved that one by having an option in one of my scripts that requests the local boinccmd to force an 'update' on each host.  Whatever stage of the backoff a host happens to be in, this gets canceled by the 'update' and the host can take an immediate drink.
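For anyone wanting to try the same trick, here is a rough sketch of forcing an 'update' on several hosts through boinccmd from Python.  This is not the script described above, which isn't shown in this thread.  The host names, the password handling and the helper names are hypothetical, the project URL is an assumption that should be checked against the URL your client is actually attached with, and remote use assumes each client has been configured to accept remote GUI RPC connections.

```python
import subprocess
from typing import Dict, Iterable, List, Optional

# Assumption: replace with the exact project URL your clients are attached to.
PROJECT_URL = "https://einsteinathome.org/"


def build_update_command(host: str,
                         project_url: str = PROJECT_URL,
                         password: Optional[str] = None) -> List[str]:
    """Build the boinccmd argument list that asks one host to contact the project."""
    cmd = ["boinccmd", "--host", host]
    if password:
        cmd += ["--passwd", password]
    cmd += ["--project", project_url, "update"]
    return cmd


def force_update(hosts: Iterable[str],
                 project_url: str = PROJECT_URL,
                 password: Optional[str] = None) -> Dict[str, bool]:
    """Run the update command for each host; map host name to success."""
    results = {}
    for host in hosts:
        cmd = build_update_command(host, project_url, password)
        try:
            done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            results[host] = done.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # boinccmd not installed, host unreachable, or the call hung.
            results[host] = False
    return results
```

Calling `force_update(["host-a", "host-b"], password="...")` with your own host names would then return something like a host-to-success mapping that a wrapper script can log or retry on.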

The other problem is that when tasks finally arrive (with all my GPUs running at least 2 concurrently) concurrent tasks all start together and tend to stay that way.  My experience suggests that you don't get the lowest crunch time if tasks start and finish at the same time so I try to make sure that start times are staggered so this inefficiency is avoided as much as possible.  I've spent the last few hours sorting that out manually.  I'm now designing a new module that hopefully will be able to do the job automatically.
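As a purely illustrative sketch of the staggering idea (not the module mentioned above, whose design isn't described here), one simple approach is to spread the start times evenly across one estimated task run time.  The function name and the even-spacing rule are assumptions; how the delays are actually applied on a host (for example by briefly suspending a task) is left out.

```python
from typing import List


def stagger_offsets(tasks_per_gpu: int, est_runtime_s: float) -> List[float]:
    """Start delays in seconds, evenly spaced across one estimated run time.

    Hypothetical helper: with 2 tasks per GPU the second task starts half a
    run time after the first, with 3 tasks a third apart, and so on.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    if est_runtime_s < 0:
        raise ValueError("est_runtime_s must not be negative")
    return [i * est_runtime_s / tasks_per_gpu for i in range(tasks_per_gpu)]
```

With two concurrent tasks and an estimated 1200 s run time, `stagger_offsets(2, 1200.0)` gives delays of 0 s and 600 s, so one task is roughly mid-run whenever the other starts or finishes.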

 

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3161
Credit: 7290071694
RAC: 2114482

Gary Roberts wrote:

The other problem is that when tasks finally arrive (with all my GPUs running at least 2 concurrently) concurrent tasks all start together and tend to stay that way.  My experience suggests that you don't get the lowest crunch time if tasks start and finish at the same time so I try to make sure that start times are staggered so this inefficiency is avoided as much as possible.  I've spent the last few hours, sorting that out manually.  I'm now designing a new module that hopefully will be able to do the job automatically.

 

Gary, while I also try to adjust matters from time to time so that two jobs sharing a GPU are offset in time, I suspect this is less useful with the current flavor of WU data file than it has been at other times.  At least on my Nvidia/Windows 10 platforms, the time spent in the "past the end" portion has shrunk to the vanishing point.  I think the main virtue of offset times was allowing one of the two jobs to be running in the main portion while the other was "past the end".  As that condition can't occupy any appreciable time right now, I suspect taking the trouble to induce an offset is unusually unfruitful at the moment.

None of which says anything about what a next batch of WU files might be like.
