Large number of WUs staying in pending credit

etrecords
Joined: 16 Mar 08
Posts: 2
Credit: 628711
RAC: 0
Topic 193759

For the last few hours I haven't received any new points; all results stay pending. I looked at some of the work units and I see that there are two results waiting for validation for each of these WUs. Is there something wrong with the validation?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117410640959
RAC: 35570605

If you check the server status page, you can see on the far right-hand side that an unusually large number of results are waiting for validation. As it is flagged in red, I assume it is unusual and that the Admins are aware of it, or at least that a warning has been issued to one of them. He might still be engaged in (or sleeping off the effects of) a wild Friday night party, so maybe there won't be any immediate action :-).

The validator process itself is showing as "running", so obviously something unusual with the database is preventing the validation queue from draining. The queue is currently at 15,740, and the status page is updated every twenty minutes. It will be interesting to see the number after the next page update. The current page is timestamped 6:20AM UTC.

Things like this can happen from time to time so nothing to worry about (yet) :-).

EDIT: The update came after 25 mins, at 6:45AM UTC, and shows the queue is now at 16,264. It doesn't look like much validation is going on at the moment.
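The two queue readings quoted in this post (15,740 at 6:20AM UTC and 16,264 at 6:45AM UTC) are enough to estimate how fast the backlog was growing. A minimal sketch of that arithmetic, assuming the queue changed roughly linearly between the two status-page updates; the function name is hypothetical and the calendar date is arbitrary, since the post gives only times of day:

```python
from datetime import datetime

def net_growth_per_hour(count_a, time_a, count_b, time_b):
    """Net change in the 'waiting for validation' count per hour
    between two status-page readings (hypothetical helper)."""
    hours = (time_b - time_a).total_seconds() / 3600
    return (count_b - count_a) / hours

# Readings from the post above; the date is an arbitrary placeholder,
# only the 25-minute difference between the two times matters.
t1 = datetime(2008, 1, 1, 6, 20)
t2 = datetime(2008, 1, 1, 6, 45)
rate = net_growth_per_hour(15_740, t1, 16_264, t2)
print(f"{rate:.1f} results/hour")  # prints: 1257.6 results/hour
```

That is roughly 21 extra results per minute. It is a net figure only: it does not say how many results, if any, were actually validated in that window.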

Cheers,
Gary.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 718816229
RAC: 1040698

Apparently the validator took a time-out but was sent back to work again and is catching up on the backlog. All should be back to normal pretty soon.

CU
Bikeman

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 33126164
RAC: 1610

Message 82698 in response to message 82697

Quote:

Apparently the validator took itself a time-out but was sent back to work again, catching up on the backlog. All should be back to normal pretty soon.

CU
Bikeman

I think the problem may be back. The number "waiting for validation" was declining for a while (from around 20,000 to around 16,000), but it's going back up again (18,079 right now).

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

The really odd part is that this behaves exactly like the problem that completely brought down Cosmology@Home and nearly ground SETI@Home to a halt within the last few days.

I've never been a conspiracy nut, but it sure is something that makes you go "Hmmmmmm".

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

Message 82700 in response to message 82699

Quote:

The really odd part is that this acts like exactly the same problem that completely brought down Cosmology@Home and nearly ground SETI@Home to a halt as well within the last few days.

I've never been one to be a conspiracy nut, but it's sure something that makes you go "Hmmmmmm".

I had already been thinking the same thing about the issues at Cosmology and SETI, but now it's happening here too? A server-side code change that was installed recently???

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

Now we're up to 32,581 waiting for validation.

The number is growing steadily, and based on what I've seen at other projects, it's only a matter of time until the database is dealing with so many WUs that can't be purged that it goes into a complete meltdown.

I sincerely hope I'm just crying wolf here, but I hope there's some way for Bernd or someone else who can deal with the issue to become aware of it and try to correct it before it gets worse. :\

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117410640959
RAC: 35570605

Message 82702 in response to message 82701

Quote:
... I hope there's some way for Bernd or someone else who can deal with the issue to be aware and try to correct it before it gets worse. :\

David Hammer restarted the validator yesterday after being alerted by Bikeman. He has been alerted again that the problem has returned. I've also drawn the "conspiracy theory" to his attention so he can at least be warned that this might not be an isolated, random glitch.

Cheers,
Gary.

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

Message 82703 in response to message 82702

Quote:
Quote:
... I hope there's some way for Bernd or someone else who can deal with the issue to be aware and try to correct it before it gets worse. :\

David Hammer restarted the validator yesterday after being alerted by Bikeman. He has been alerted again that the problem has returned. I've also drawn the "conspiracy theory" to his attention so he can at least be warned that this might not be an isolated, random glitch.

Thanks much Gary.

I know Bikeman reported that restarting the validator seemed to fix things here at E@H, but I haven't been able to follow my own WUs closely enough to know whether any are being validated or whether it's at a complete standstill. I know the admin at Cosmology@Home reported (if I'm remembering correctly) a large amount of disk I/O that he had no explanation for before things went totally south. The folks at SETI have only described it as "database problems", but as at E@H and Cosmo, the problem showed up as a large number of WUs waiting for validation. They seem to have fixed it, though I haven't seen an explanation of how yet. I don't believe it was as simple as just restarting the validator, so there's always a chance that these are three unrelated problems that just happened to produce similar symptoms.

Good luck E@H gurus! :)

roadrunner_gs
Joined: 7 Mar 06
Posts: 94
Credit: 3369656
RAC: 0

Why are there so-called overreplications (roughly "too-many-fold deliveries" ^^)?

> http://einsteinathome.org/workunit/41168115
> http://einsteinathome.org/workunit/41165172
> http://einsteinathome.org/workunit/41162167
> http://einsteinathome.org/workunit/41157579

Two results have been reported, and one more was replicated without need.
That one would not get any credit, although the work is done and could be used.
Just curious...

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 718816229
RAC: 1040698

Strange indeed.
As to the validation problems mentioned above for other BOINC projects, are there summaries available on the Web that describe what went wrong?

TIA

Bikeman
