FGRP4 App version 1.14 vs 1.15, was: There's no CPU work available

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 98590451
RAC: 27066
Topic 198255

Server status shows no work available for FGRP4 or S6Bucket*, and the FGRP4 work generator is not running. It also looks like BRP work is not being served to CPUs. It has clearly been like this for a while, since I'm out of tasks.

Hopefully people are already aware of this and are too busy scrambling to read posts, but I thought I'd make sure.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965909163
RAC: 30657119

I sent a PM to Bernd and HB a few hours ago.

Cheers,
Gary.

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 98590451
RAC: 27066

Thanks! Looks like it's been fixed.

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 946
Credit: 25167626
RAC: 23

Sorry for the short work production outage. We had to deploy a bug fix release (science-related) for FGRP that also required that we let the task pool run dry.

Cheers,
Oliver

 

Einstein@Home Project

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 98590451
RAC: 27066

Should I also infer that if my wingman times out on a v1.14 task, I'll eventually lose a validation battle against the v1.15 resends?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109965909163
RAC: 30657119

That's a good point. I don't know the answer.

From what Oliver implied (a science-related bug fix), the new app may well get a different 'answer' from the previous app. That wouldn't necessarily create a validation issue if resends for tasks crunched by the old app could be forced to use the old app rather than the new one.

I decided to see if I could find a quorum demonstrating what actually happens. Quite quickly, I found this quorum which appears to show that there might just be a problem looming here.

Firstly, note that the oldest two tasks were completed by the 1.14 app and didn't agree with each other. The normal process of issuing the third task was followed and this task, crunched by 1.15, was returned very quickly. So we now know that 1.14 resends will be crunched by 1.15 and not 1.14.

The really worrying bit is that the third task is also 'inconclusive' - it doesn't agree with either of the others. I think there would be a fairly low chance of this happening randomly like this if 1.15 results were supposed to be fully compatible with 1.14 results. It just looks suspicious. It will look even more suspicious if the 4th task (1.15) ends up agreeing with the 3rd task and both the 1.14 tasks miss out.
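As a rough illustration of the scenario above, here is a minimal Python sketch of quorum-style validation. It is not the project's actual validator: the `Result` type, the `agree` tolerance check, the `find_canonical` helper, and all the numeric values are hypothetical and chosen only to mirror the case where two 1.14 results disagree and a 1.15 resend matches neither.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Result:
    task: str          # hypothetical task label
    app_version: str   # app version that crunched the task
    value: float       # stand-in for the real science output


def agree(a: Result, b: Result, tol: float = 1e-6) -> bool:
    """Assumed comparison rule: two results agree if their values match within tol."""
    return abs(a.value - b.value) <= tol


def find_canonical(results, min_quorum=2):
    """Return the largest group of mutually agreeing results, or [] if inconclusive."""
    for size in range(len(results), min_quorum - 1, -1):
        for group in combinations(results, size):
            if all(agree(a, b) for a, b in combinations(group, 2)):
                return list(group)
    return []


# Two 1.14 results that disagree with each other, plus one 1.15 resend.
r1 = Result("t1", "1.14", 1.000)
r2 = Result("t2", "1.14", 1.010)
r3 = Result("t3", "1.15", 1.200)
three = [r1, r2, r3]

# A second 1.15 resend that matches the first 1.15 result.
r4 = Result("t4", "1.15", 1.200)
four = three + [r4]

print(find_canonical(three))                    # no agreeing pair yet
print([r.task for r in find_canonical(four)])   # the two 1.15 tasks form the quorum
```

In this toy setup the first three results stay inconclusive, and once a second 1.15 result arrives the two 1.15 tasks agree while both 1.14 tasks miss out, which is the outcome that would look suspicious in a real quorum.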

Cheers,
Gary.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

I got an invalid from a resend against 2 wingmen running 1.14: https://einsteinathome.org/workunit/227919231

All my other 1.15 tasks are validated against 1.15 and came out valid.

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 98590451
RAC: 27066

It looks like I've got a few mixed resends, and I have yet to see 1.14 validate against 1.15:
https://einsteinathome.org/workunit/227895662
https://einsteinathome.org/workunit/227991400

(it looks like I'm going to lose another 6 or so soon unless my absentee wingmen get their acts together very quickly)

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 946
Credit: 25167626
RAC: 23

Hi guys,

Your assumption is correct. 1.15 will most likely produce different results than 1.14, which means cross-validation would fail in those cases. That's why we waited at least until the then-current set of workunits had been fully produced and sent. The alternative would have been to wait until all in-flight FGRP tasks had been returned, but that would have meant that most volunteers would run out of CPU tasks, as FGRP is the only CPU-based app we currently have data to crunch for. The way we chose seemed like the best compromise. Tough decision, but I hope you understand.

Thanks,
Oliver

 

Einstein@Home Project

Darren Peets
Joined: 19 Nov 09
Posts: 37
Credit: 98590451
RAC: 27066

I'd figured the alternative would be to send BRP CPU tasks for a couple weeks, but I suspect that the GPUs are so much better at those that that would be a much more serious waste of computing time than these orphaned 1.14 tasks.

Jasper
Joined: 14 Feb 12
Posts: 63
Credit: 4032891
RAC: 0

Same for me: http://einsteinathome.org/workunit/227888120

It makes sense really. After all, if the results were supposed to be exactly the same, why bring a bug fix? I guess, but don't know for sure, that in my example it depends on who comes first now: the initial wingman who failed to reach the deadline, or the new wingman who got yet another WU this morning.

A pity resources are (need to be?) wasted like that though. I had the same feeling with the very short deadlines set at some point (with GW IIRC), which in the end caused numerous redundant resends - no invalids though, AFAIK. It also made me set Einstein to NNT for a few days, in favour of another project.
