What to do about the current Seti Server Overload

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,591
Credit: 85,328,769,762
RAC: 67,183,191

Message 20754 in response to message 20753

Quote:

I have 1 WU queued for downloading. I stopped requesting new work, suspended Seti, and wanted to cancel the WU that is waiting to download. But don't I have to connect to Seti for that request? In messages it says:
request reschedule cpus: result op
and
request reschedule cpus: project op

But in Work the WU is still trying to be downloaded, is this normal?

Yes, this is normal. If you leave that WU that is trying to download, at some point on one of the automatic retries you will probably get lucky and actually complete the download. However, if it were my decision, I'd be thinking that this current Seti outage has the potential to be a long, drawn-out affair and I wouldn't want the hassle of trying to watch what happens to that stuck download.

Personally, I would go to your "Transfers" tab, select that download, and then abort it. Gets rid of a potentially messy problem. The decision of course is yours.

Cheers,
Gary.

Skyflash
Joined: 7 Dec 05
Posts: 2
Credit: 413,805
RAC: 0

Ah, thanx, that's what I wanted. If I unsuspend (sorry, English is not my first language) when S@H is normal again, will it automatically restart downloading this unit?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,591
Credit: 85,328,769,762
RAC: 67,183,191

Message 20756 in response to message 20755

Quote:
Ah, thanx, that's what I wanted. If I unsuspend (sorry, English is not my first language) when S@H is normal again, will it automatically restart downloading this unit?

Yes. That would be the best plan to follow. Hopefully Seti will get their problems sorted out before too much longer :). When they do, just "Resume" the Seti project.

EDIT: To clarify a few points about stuck Seti transfers. Some of these were prompted by Jord's helpful comments.
1. If you "Suspend" a project, the word on the button will change to "Resume".
2. While a project is suspended it will not initiate new transfers.
3. Existing transfers will still be retried from time to time by BOINC and occasionally may complete.
4. If an upload happens to complete you should "Update" the project to get it reported. Do not abort a stuck upload.
5. If a stuck download completes, the new work will sit in your work tab until you "Resume" Seti.
6. You can freely abort stuck downloads (while stuck) if they annoy you.
7. If you do abort a stuck download, you will never see that particular WU again, but that doesn't matter.
8. If you don't abort, resuming will allow the stuck download to complete if it can. When you get this WU, its deadline will have been partly used.

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,779,100
RAC: 0

Um Gary, by Suspending a

Um Gary, by Suspending a project you only pause the science application's computations.

You will not suspend the uploads/downloads waiting, as that is in BOINC Manager's hands now. You can suspend network activity, thereby making sure BOINC cannot connect to the outside world, and thereby pausing uploads & downloads.

But the normal Project & Work Unit Suspend mode only pauses work on the actual work units on the computer.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,591
Credit: 85,328,769,762
RAC: 67,183,191

Message 20758 in response to message 20757

Quote:

Um Gary, by Suspending a project you only pause the science application's computations.

You will not suspend the uploads/downloads waiting, as that is in BOINC Manager's hands now. You can suspend network activity, thereby making sure BOINC cannot connect to the outside world, and thereby pausing uploads & downloads.

But the normal Project & Work Unit Suspend mode only pauses work on the actual work units on the computer.

Jord,

Yes, I understand all this. Who cares about the uploads/downloads that are currently stuck? That's not the point. They are going nowhere anyway and if one occasionally happens to fluke a connection and complete, so what?? The point is that CPUs are becoming idle because stuck downloads are causing the scheduler to think that Seti has work. Other projects have negative LTDs (long-term debts) and are therefore being prevented from getting new work. Setting "No new work" stops Seti from starting more stuck downloads but doesn't allow EAH (Einstein@Home) to get work.

The only workaround, without resetting LTDs manually, seems to be to "Suspend" Seti. I've now watched this allow two of my idle boxes to immediately get work from EAH. There appears to be a growing number of people who are going to be affected by this as the Seti server overload drags on. Suspending Seti is the easiest short term fix for these people. All people have to do, say about once a day, is check the front page of Seti for any announcements that things are on the improve. They can then "Resume" Seti (and turn on New work fetch) to see if normal work flow will start. If it doesn't and they get another stuck download that persists, they will eventually have to "Suspend" again and wait it out a bit more. My main concern is that as Dec 15 approaches, things may very well get worse.

Suspending network activity is not an option as the CPUs will still remain idle. People just want to crunch something :).
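
The situation described above can be sketched with a toy model. This is not BOINC's actual scheduler code; the `Project` class, the `pick_project_for_work` function and the decision rule are all hypothetical, chosen only to reproduce the behaviour reported in this thread (stuck downloads on the project that is "owed" time leave the CPU idle, and suspending that project lets the other one fetch work).

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    ltd: float                 # long-term debt; negative means the project is "ahead"
    stuck_downloads: int = 0   # transfers that are queued but not completing
    suspended: bool = False
    no_new_work: bool = False


def pick_project_for_work(projects):
    """Toy rule (an assumption, not BOINC's real logic):
    return the project a work request would go to, or None if the CPU stays idle."""
    # Suspended projects are ignored entirely.
    active = [p for p in projects if not p.suspended]
    if not active:
        return None
    # The active project with the highest LTD is the one "owed" CPU time.
    owed = max(active, key=lambda p: p.ltd)
    # Pending downloads make it look as though work is already on its way,
    # and "No new work" blocks a fresh request, so nobody gets asked.
    if owed.stuck_downloads > 0 or owed.no_new_work:
        return None
    return owed


seti = Project("SETI@home", ltd=5000.0, stuck_downloads=1)
eah = Project("Einstein@Home", ltd=-5000.0)

print(pick_project_for_work([seti, eah]))   # idle: Seti is owed time but is stuck

seti.no_new_work = True
print(pick_project_for_work([seti, eah]))   # still idle

seti.suspended = True
print(pick_project_for_work([seti, eah]).name)  # Einstein@Home can now fetch work
```

In this sketch only the "Suspend" step changes the outcome, which matches what was observed on the two idle boxes mentioned above.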

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,779,100
RAC: 0

Message 20759 in response to message 20758

Quote:
The point is that CPUs are becoming idle because stuck downloads are causing the scheduler to think that Seti has work.


I had 3 uploading, 2 downloading units and an Einstein unit went down onto my drive in between them all. No need to suspend any project.

I may have imagined it.
I do know I was at 5.2.13 at that time.
I am now at 5.3.2, but if you want me to, I'll go back to 5.2.13 to test it again.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,591
Credit: 85,328,769,762
RAC: 67,183,191

Message 20760 in response to message 20759

Quote:
Quote:
The point is that CPUs are becoming idle because stuck downloads are causing the scheduler to think that Seti has work.

I had 3 uploading, 2 downloading units and an Einstein unit went down onto my drive in between them all. No need to suspend any project.

Ahhhh... but did EAH have a negative LTD at the time?? I've seen exactly this too at an earlier stage, before Seti got too far behind in its resource share by not being able to get new work. As the saga drags on and EAH gets ahead (negative LTD), the CPU becomes idle. It's happened to me twice, version 5.2.5.

Quote:
I may have imagined it.
I do know I was at 5.2.13 at that time.
I am now at 5.3.2, but if you want me to, I'll go back to 5.2.13 to test it again.

I don't think you imagine too much :). Actually we need to hear from JM7 in case there were any scheduler changes that came in between versions around 5.2.5 to 5.2.8 and the current recommended 5.2.13. It might be fixed in 5.2.13?? I'll go check if there are reports of this behaviour (idle cpu) for people on 5.2.13. No need for you to change. I've got a few on 5.2.13 that I can play with, thanks.

Thanks very much for your feedback. Very helpful!!

Cheers,
Gary.

Cruncher
Joined: 6 Dec 05
Posts: 3
Credit: 692
RAC: 0

Nope, I'm using 5.2.13, and the problem does occur.
One thing we could check, though, would be whether the CPU sits idle only for one scheduler interval (the 60-min-default). I had some strange behaviour concerning LHC, where the server wouldn't be checked for work, even when updating manually (see here). But yesterday or so, I noticed that the LHC server *is* contacted every other scheduler interval or so, even though the corresponding LTD is still negative. SETI was *not* suspended and did some work at that time, however.

Steven Gray
Joined: 25 Nov 05
Posts: 1
Credit: 369,279
RAC: 0

Out of curiosity, why don't the SETI folk suspend all new work downloads and focus on servicing uploads of completed work? With no new work being sent out, the glut would have to start clearing sooner or later.

Michael Roycraft
Joined: 10 Mar 05
Posts: 846
Credit: 157,718
RAC: 0

Message 20763 in response to message 20762

Quote:
Out of curiosity, why don't the SETI folk suspend all new work downloads and focus on servicing uploads of completed work? With no new work being sent out, the glut would have to start clearing sooner or later.

Owner,

Good idea, but with one problem - they would have all the SetiClassic and other SetiBoinc newbies screaming "Where's my work?", or more pathetic whining about "Boinc doesn't work - it sux". Bad enough, all the new "experts" with 2 days experience on Boinc claiming it's a piece of....trash. When someone who is used to riding a bicycle gets behind the wheel of a Cadillac does it mean that the car is trash just because he has to start it with a key, learn to adjust the seat and run the Air Conditioning?

(edit) Come on, now. Ya gotta luv that analogy :-)

microcraft
"The arc of history is long, but it bends toward justice" - MLK
