Multi-Directed Continuous GW production work begins

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 101,357,494
RAC: 4,691

We just found an error in the scientific parameters used in the search. This cannot be repaired after the fact, so we need to cancel the O1MD1 run right away. I have already paused sending out tasks, and we have a plan for sending an abort signal to all of your hosts once they connect to our server. Unfortunately, this has to wait until tomorrow.

Edit: we are now sending the abort signal automatically.

In the meantime, anyone who reads this message may abort all O1MD1 work currently on their hosts. The rerun with fixed parameters will start later this week.

I'm very sorry for the wasted computing cycles so far, especially because I'm the one who introduced the error. I'll keep you posted about the rerun, which will start within the next few days.

Trotador
Joined: 2 May 13
Posts: 58
Credit: 1,466,234,132
RAC: 2,960

Christian Beer wrote:

We just found an error in the scientific parameters used in the search. This cannot be repaired after the fact, so we need to cancel the O1MD1 run right away. I have already paused sending out tasks, and we have a plan for sending an abort signal to all of your hosts once they connect to our server. Unfortunately, this has to wait until tomorrow.

In the meantime, anyone who reads this message may abort all O1MD1 work currently on their hosts. The rerun with fixed parameters will start later this week.

I'm very sorry for the wasted computing cycles so far, especially because I'm the one who introduced the error. I'll keep you posted about the rerun, which will start within the next few days.

Ok, cancelled.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,551
Credit: 79,054,050,123
RAC: 65,189,007

Christian Beer wrote:
We just found an error in the scientific parameters used in the search.

Thanks for the heads up. I feel your pain at the moment!!

Does the fix involve the large data files or the app (or both), or perhaps something else? In particular, I'm interested to know whether all existing large data files and apps will need to be replaced.

Cheers,
Gary.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

Christian Beer wrote:
I'll keep you posted about the rerun, which will start within the next few days.

Thanks, Christian, for posting the details and spotting the problem. I'd just call it a successful SPARST stage 1 (science parameter and abort run sequence testing).

Stu D.
Joined: 25 Aug 15
Posts: 25
Credit: 641,617,323
RAC: 10,770

Hi,

So per Christian Beer we should abort both O1MD1CV and O1MD1G tasks, correct?

Thanks,

Stu

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 101,357,494
RAC: 4,691

I just configured the server to send out the abort signal for all O1MD1CV and O1MD1G tasks. You don't need to abort them manually from now on. The command should also abort tasks that are already running. If you have a really old client, you may need to abort manually.

The fix is related to the parameters we use when we create the tasks. The data files and the applications are fine and will be reused. They should not get deleted on your hosts.

Conan
Joined: 19 Jun 05
Posts: 162
Credit: 5,945,411
RAC: 329

I believe I only lost the credit for about 4 work units that were completed and returned, but the validating work units were then cancelled, so no validation is possible for them.

On the bright side, at least the hours were counted on WUProp@Home and contribute to my next badge there.

(Edit: well, it looks like a few more of those work units were around, and I have now lost around a dozen work units that were either completed and can't be validated or cancelled whilst they were running, so no points there either and a lot of hours of work lost.)

Work flowing again and some validated work units have now been processed.

Conan

Betreger
Joined: 25 Feb 05
Posts: 968
Credit: 756,501,637
RAC: 286,844

I had 5 cancelled, about 30 hours wasted. Oh well, I shall crunch on.

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 101,357,494
RAC: 4,691

Work is flowing again.

Stu D.
Joined: 25 Aug 15
Posts: 25
Credit: 641,617,323
RAC: 10,770

Thank you Christian Beer,

I'll be looking for these work units from Cassiopeia A on my CPUs!

[Attached image: cassiopeia a.jpg]
