FGRP4 App version 1.14 vs 1.15, was: There's no CPU work available

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

29 Sep 2015 23:52:04 UTC

Topic 198255

Server status shows no work available on FGRP4 or S6Bucket*, and that the FGRP4 work generator is not running. It looks like BRP work is not being served for CPUs. It's clearly been like this for a while: I'm out of tasks.

Hopefully people are already aware of this and are too busy scrambling to read posts, but I thought I'd make sure.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119603591176

RAC: 24856759

30 Sep 2015 0:55:24 UTC

Message 134244

I sent a PM to Bernd and HB a few hours ago.

Cheers,
Gary.

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

30 Sep 2015 7:16:39 UTC

Message 134245 in response to message 134244

Thanks! Looks like it's been fixed.

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 988

Credit: 25171438

RAC: 0

1 Oct 2015 8:05:36 UTC

Message 134246

Sorry for the short work production outage. We had to deploy a bug fix release (science-related) for FGRP that also required that we let the task pool run dry.

Cheers,
Oliver

Einstein@Home Project

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

1 Oct 2015 23:48:36 UTC

Message 134247 in response to message 134246

Should I also infer that if my wingman times out on a v1.14 task, I'll eventually lose a validation battle against the v1.15 resends?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119603591176

RAC: 24856759

2 Oct 2015 4:59:49 UTC

Message 134248 in response to message 134247

That's a good point. I don't know the answer.

From what Oliver implied (a science-related bug fix), the new app may well get a different 'answer' from the previous app. That wouldn't necessarily create a validation issue if resends for tasks crunched by the old app could also be forced to use the old app rather than the new one.

I decided to see if I could find a quorum demonstrating what actually happens. Quite quickly, I found this quorum which appears to show that there might just be a problem looming here.

Firstly, note that the oldest two tasks were completed by the 1.14 app and didn't agree with each other. The normal process of issuing the third task was followed and this task, crunched by 1.15, was returned very quickly. So we now know that 1.14 resends will be crunched by 1.15 and not 1.14.

The really worrying bit is that the third task is also 'inconclusive' - it doesn't agree with either of the others. I think there would be a fairly low chance of this happening randomly like this if 1.15 results were supposed to be fully compatible with 1.14 results. It just looks suspicious. It will look even more suspicious if the 4th task (1.15) ends up agreeing with the 3rd task and both the 1.14 tasks miss out.

Cheers,
Gary.
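
The scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual validator: the `Result` class, the `agree` tolerance check, the quorum of 2, and all numeric values are hypothetical stand-ins chosen to mirror the workunit Gary describes (two disagreeing 1.14 results, then 1.15 resends).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    task_id: int
    app_version: str
    value: float  # hypothetical stand-in for the science output


def agree(a, b, tol=1e-6):
    # Assumption: two results "agree" if their outputs match within a tolerance.
    return abs(a.value - b.value) <= tol


def validate(results, quorum=2):
    # Return (ids of agreeing results, status). A result counts toward its own quorum.
    for candidate in results:
        matching = [r for r in results if agree(candidate, r)]
        if len(matching) >= quorum:
            return {r.task_id for r in matching}, "validated"
    return set(), "inconclusive"


# Made-up values: the two 1.14 results disagree with each other,
# and the 1.15 results differ from both because of the bug fix.
first_three = [
    Result(1, "1.14", 1.000),
    Result(2, "1.14", 1.010),
    Result(3, "1.15", 2.000),
]
all_four = first_three + [Result(4, "1.15", 2.000)]

valid_ids, status = validate(all_four)
invalid_ids = {r.task_id for r in all_four} - valid_ids
```

With only the first three results no pair agrees, so the workunit stays inconclusive. Once a fourth result from 1.15 arrives and matches the third, those two form the quorum and both 1.14 results are left out.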

Logforme

Joined: 13 Aug 10

Posts: 332

Credit: 1714373961

RAC: 0

2 Oct 2015 5:29:38 UTC

Message 134249 in response to message 134248

I got an invalid from a resend against 2 wingmen running 1.14: https://einsteinathome.org/workunit/227919231

All my other 1.15 tasks were validated against 1.15 and came out valid.

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

2 Oct 2015 6:42:25 UTC

Message 134250 in response to message 134249

It looks like I've got a few mixed resends, and I have yet to see 1.14 validate against 1.15:
https://einsteinathome.org/workunit/227895662
https://einsteinathome.org/workunit/227991400

(it looks like I'm going to lose another 6 or so soon unless my absentee wingmen get their acts together very quickly)

Oliver Behnke

Moderator

Administrator

Joined: 4 Sep 07

Posts: 988

Credit: 25171438

RAC: 0

2 Oct 2015 6:56:42 UTC

Message 134251

Hi guys,

Your assumption is correct. 1.15 will most likely produce different results than 1.14, which means cross-validation would fail in those cases. That's why we waited at least until the then-current set of workunits had been fully produced and sent. The alternative would have been to wait until all in-flight FGRP tasks had been returned, but that would have meant that most volunteers would run out of CPU tasks, as FGRP is the only CPU-based app we currently have data to crunch for. The way we chose seemed like the best compromise. Tough decision, but I hope you understand.

Thanks,
Oliver

Einstein@Home Project

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

2 Oct 2015 7:58:07 UTC

Message 134252 in response to message 134251

I'd figured the alternative would be to send BRP CPU tasks for a couple weeks, but I suspect that the GPUs are so much better at those that that would be a much more serious waste of computing time than these orphaned 1.14 tasks.

Jasper

Joined: 14 Feb 12

Posts: 63

Credit: 4032891

RAC: 0

2 Oct 2015 8:29:37 UTC

Message 134253 in response to message 134251

Same for me: http://einsteinathome.org/workunit/227888120

It makes sense really. After all, if the results were supposed to be exactly the same, why bring a bug fix? I guess, but don't know for sure, that in my example it depends on who comes first now: the initial wingman who failed to reach the deadline, or the new wingman who got yet another WU this morning.

A pity resources are (need to be?) wasted like that though. I had the same feeling with the very short deadlines set at some point (with GW IIRC), which in the end caused numerous redundant resends - no invalids though, AFAIK. It also made me set Einstein to NNT for a few days, in favour of another project.
