Validation Errors in ABP Workunits

A recent modification to ABP workunit generation (bundling 10 dispersion measures into a single workunit) has caused all these new workunits to fail with validation errors. We have fixed the mistake, and have been sending out corrected ABP work since 14:20 UTC, October 2nd. In order to use and give credits for the work which (incorrectly) generated validation errors, the automatic validation of incoming results has been temporarily halted. Validation will be done manually, once per day, for (roughly) the next two weeks.

Comments

astro-marwil
Joined: 28 May 05
Posts: 532
Credit: 651746543
RAC: 1106929

Oh, that's the reason why my disk space usage increased by 50%. Is it advisable to delete these files as a precaution? Up to now I haven't crunched any of them, but there are a dozen or so in stock.
Kind regards
Martin

Wedge009
Joined: 5 Mar 05
Posts: 122
Credit: 17424659375
RAC: 7048951

Thank you for the information. I appreciate the efforts to recompense volunteers for 'lost' credit.

Soli Deo Gloria

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 319485300
RAC: 427463

The credit will be awarded, it's a matter of when. As indicated elsewhere it was a server side problem - the validator at one ( worldwide ) location didn't have the correct files to use - so once the overall project work flows have been correctly re-established to our satisfaction the credit will be sorted. The devs are trying to achieve this without affected users having to do anything especial at their end.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99868 in response to message 99867

Quote:

The credit will be awarded, it's a matter of when. As indicated elsewhere it was a server side problem - the validator at one ( worldwide ) location didn't have the correct files to use - so once the overall project work flows have been correctly re-established to our satisfaction the credit will be sorted. The devs are trying to achieve this without affected users having to do anything especial at their end.

Cheers, Mike.

Thanks Mike!!

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 319485300
RAC: 427463

Message 99869 in response to message 99868

Quote:
Quote:

The credit will be awarded, it's a matter of when. As indicated elsewhere it was a server side problem - the validator at one ( worldwide ) location didn't have the correct files to use - so once the overall project work flows have been correctly re-established to our satisfaction the credit will be sorted. The devs are trying to achieve this without affected users having to do anything especial at their end.

Cheers, Mike.

Thanks Mike!!


The lads are down at the SQL Syntax Tarpit as we speak ......

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

BarryAZ
Joined: 8 May 05
Posts: 190
Credit: 325179522
RAC: 12185

Message 99870 in response to message 99869

I see that the scheduler is offline and has been for several hours -- I think that means that, until it is returned to service, you won't have to manually validate newly completed work, since it can't be reported.

Is there a timeline for returning things to normal? If so, could you share it?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Message 99871 in response to message 99870

Our old database server is clearly at its limits these days. One reason for increasing the "size" of the ABP workunits was to have fewer tasks to keep track of, in order to relieve the stress on the DB.

An effect of the slow DB was that the DB queries needed to correct the error mentioned in the news took quite some time, and the scheduler couldn't access the DB for about 8h yesterday.

Things should be back to normal now. The scheduler has been running continuously since 11AM UTC last night, ABP work is being generated and sent out again, and the ABP work sent out since Sunday should be validated as usual.

ABP tasks that were created with a larger bundle size before the fix were or will be uploaded to a server where the validator can't access the files. Therefore validation of these tasks has been postponed until we have manually transferred the result files from one server to the other. This may take some time, but don't worry, it will happen eventually. You may already notice that tasks that showed a "validate error" now show "waiting for validation" again (and, if you are really curious, odd values for "minimum quorum" and "initial replication" in the WU display).
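
As a rough illustration of the hold-back described above (a sketch only, not the project's actual validator code; the `Workunit` type, the function names and the file names are all hypothetical), a workunit is released for validation only once every one of its result files is present on the server the validator can read:

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Workunit:
    # Hypothetical minimal record: a name plus the result files it expects.
    name: str
    result_files: List[str]


def ready_for_validation(wu: Workunit, files_on_validator: Set[str]) -> bool:
    """True only if the workunit has results and all of them are reachable."""
    return bool(wu.result_files) and all(
        f in files_on_validator for f in wu.result_files
    )


def split_backlog(
    workunits: List[Workunit], files_on_validator: Set[str]
) -> Tuple[List[Workunit], List[Workunit]]:
    """Partition workunits into (validate_now, on_hold)."""
    validate_now: List[Workunit] = []
    on_hold: List[Workunit] = []
    for wu in workunits:
        if ready_for_validation(wu, files_on_validator):
            validate_now.append(wu)
        else:
            # Validating now would only produce another validate error.
            on_hold.append(wu)
    return validate_now, on_hold
```

Re-running `split_backlog` after each manual file transfer would move more workunits from the "on hold" list to the "validate now" list.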

BM

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Message 99872 in response to message 99871

Update: The result files for >65000 (of ~70000) affected workunits have been transferred and re-validation issued. You should find the results validated and credited within hours.

BM

BM

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99873 in response to message 99872

Quote:

Update: The result files for >65000 (of ~70000) affected workunits have been transferred and re-validation issued. You should find the results validated and credited within hours.

BM

Thank you, thank you VERY much!!

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Message 99874 in response to message 99873

Turns out I was a bit too optimistic about this re-validation and issued too many workunits. Noticing this error, I stopped the ABP validators again yesterday evening to fix it. There are only <40000 WUs that can already be re-validated, and that should be done by the end of today.

BM

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Message 99875 in response to message 99874

Update: The ABP2 validators are crunching through the backlogs, still 45000 workunits to go. When they are done, workunits like this one (minimum quorum = 2) should have been validated and tasks credited. For workunits like this one (minimum quorum = 21) files are still missing on the right server. Validating them now would result in validate errors again, so validation is further postponed.

BM

BM

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99876 in response to message 99875

Quote:

Update: The ABP2 validators are crunching through the backlogs, still 45000 workunits to go. When they are done, workunits like this one (minimum quorum = 2) should have been validated and tasks credited. For workunits like this one (minimum quorum = 21) files are still missing on the right server. Validating them now would result in validate errors again, so validation is further postponed.

BM

So are you going to cancel all those that can't be validated so we users don't waste time trying to crunch them?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Message 99877 in response to message 99876

Quote:
So are you going to cancel all those that can't be validated so we users don't waste time trying to crunch them?

The result files that drop in on the wrong server will be manually moved to the correct server and the workunits will be validated there. Additional tasks will be generated, sent and crunched only if found necessary after this validation. There shouldn't be a need to cancel workunits. Actually, canceling would mean that the tasks that have already been crunched would be worthless and not be credited; I'd rather avoid this.

BM

BM

tolafoph
Joined: 14 Sep 07
Posts: 122
Credit: 74659937
RAC: 0

If everyone gets their credits, the actual impact on the project isn't that great. It may be pointless to crunch some WUs a 3rd or 4th time, but if it's the easiest way to avoid other problems like credits not being granted, it's the right thing to do. The whole pulsar search will take only a few days longer.

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99879 in response to message 99877

Quote:
Quote:
So are you going to cancel all those that can't be validated so we users don't waste time trying to crunch them?

The result files that drop in on the wrong server will be manually moved to the correct server and the workunits will be validated there. Additional tasks will be generated, sent and crunched only if found necessary after this validation. There shouldn't be a need to cancel workunits. Actually, canceling would mean that the tasks that have already been crunched would be worthless and not be credited; I'd rather avoid this.

BM

I think that is a wonderful plan!!! I guess that is why you make the big bucks and are the Admin!! ;-))

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6591
Credit: 319485300
RAC: 427463

Message 99880 in response to message 99878

Quote:
If everyone gets their credits, the actual impact on the project isn't that great. It may be pointless to crunch some WUs a 3rd or 4th time, but if it's the easiest way to avoid other problems like credits not being granted, it's the right thing to do. The whole pulsar search will take only a few days longer.


It's possible that a few units have-been/will-be granted credit when it is not known that it was precisely deserved. But hey! That's better than the converse. Like a line call on a tennis ball - one can get the benefit of the doubt simply by being on the court. Could be worse ..... :-) :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

Update: Another batch of result files that have dropped in so far was transferred and 8376 more workunits were issued for validation, leaving 23166 workunits on hold.

We'll issue another transfer & validation session around next weekend when the deadline of the tasks that were issued with the wrong upload URL is reached. What drops in on the wrong server after that date will not be credited, but would have timed out anyway.

BM

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250661459
RAC: 34531

(final) Update: I transferred the result files that dropped in until this morning and will re-issue validation of all workunits that were put on hold so far. What has not come back yet of the 'wrong' tasks is past the deadline and will time out anyway.

BM

BM

Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0

Message 99883 in response to message 99882

Hey... the credits don't matter.
I, for my part, am crunching out of interest in the project, and not for credits. OK, it's nice to see your account growing, but if some WUs go uncredited... so what?

Tom Richardson
Joined: 11 Aug 08
Posts: 9
Credit: 44019
RAC: 0

Message 99884 in response to message 99883

Quote:
Hey... the credits don't matter.
I, for my part, am crunching out of interest in the project, and not for credits. OK, it's nice to see your account growing, but if some WUs go uncredited... so what?

Hear, hear, I could not agree more with those comments!

Tom.

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99885 in response to message 99883

Quote:
Hey... the credits don't matter.
I, for my part, am crunching out of interest in the project, and not for credits. OK, it's nice to see your account growing, but if some WUs go uncredited... so what?

But don't the validated, and credited, units signify that the unit was crunched successfully and therefore a help to the project? And a unit that does not get credits, because of some reason, would therefore not be a help?

Rechenkuenstler
Joined: 22 Aug 10
Posts: 138
Credit: 102567115
RAC: 0

Message 99886 in response to message 99885

You are right. But I trust our admins to get all the work accomplished. And they'll do their best to do it correctly. OK, if there is any reason why they should have to resend a WU, then another person gets the credits and you have an invalid WU...

OK. The work is done, and I have some missing credits. But whoever crunched it successfully is part of the community. And if I look at my account, more than 99% is credited correctly. It's not worth discussing.

mikey
Joined: 22 Jan 05
Posts: 12702
Credit: 1839106224
RAC: 3612

Message 99887 in response to message 99886

Quote:

You are right. But I trust our admins to get all the work accomplished. And they'll do their best to do it correctly. OK, if there is any reason why they should have to resend a WU, then another person gets the credits and you have an invalid WU...

OK. The work is done, and I have some missing credits. But whoever crunched it successfully is part of the community. And if I look at my account, more than 99% is credited correctly. It's not worth discussing.

I agree and should have said all that, I am glad you did as I would not have said it as well as you did!

FORREST
Joined: 14 Aug 10
Posts: 1
Credit: 20831980
RAC: 0

Are the validations still being done manually? My computers are still running at the same rate as reflected by the number of credits earned in August and September, but in the past few weeks, my credits have barely moved.
FORREST