> I've been in the business continuity racket for years. Stuff like UPSes is
> not a priority with folks like SETI because there is no financial pain
> involved in being down. Maybe this last experience will be painful enough to
> motivate thinking in a different way. But no matter how much of a pain in the
> rear it might be, if there are no $$$ involved in being down, it's difficult
> for some to make the business case to invest in uptime.
Some of my associates think I'm insane, but from time to time I go around and yank the UPS plugs out of the wall.
Because, ultimately, that is the only way to tell if a UPS works -- and if you are afraid to do that, you need new UPSes.
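Between the plug-pull tests, a script can at least watch for the UPS dropping to battery. Here is a minimal sketch in Python, assuming the open-source Network UPS Tools (NUT) daemon is running and the UPS is configured under the name "myups" (the name is my assumption):

# Minimal sketch: poll the UPS through NUT's "upsc" tool and complain when
# it switches to battery. Assumes Network UPS Tools is installed and the
# UPS is configured under the name "myups" (my assumption, not a standard).
import subprocess
import time

def ups_status(ups="myups@localhost"):
    # "upsc <ups> ups.status" prints e.g. "OL" (online) or "OB" (on battery).
    result = subprocess.run(["upsc", ups, "ups.status"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

while True:
    if "OB" in ups_status():
        print("UPS is on battery -- mains power lost!")
    time.sleep(30)  # poll every 30 seconds

It tells you the UPS thinks it is working; only pulling the plug tells you the batteries actually hold.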
The problem with a project like SETI@home is that much of the hardware is partially or fully donated rather than bought outright. And it seems to be much easier to get a new server as a donation than a UPS.
The second problem is that there are UPSes in the server room, but because of the migration they have to run two projects at the moment, and not all of the hardware can be placed in the server room. I am sure that on their notoriously small budget it is not possible to buy extra UPS capacity just for the transition period, until everything is back to one project in one server room.
All in all, this is typical of such a migration scenario. I work in the IT business and have done a lot of these projects in the past. You always live with the risk that you cannot fully protect all systems at all times until every piece of hardware is in its final place.
One more problem - and the report sounds like this was the problem - is that it is not enough to have a UPS: you must configure a safe automatic shutdown of all applications and the operating system before the UPS runs out, but not trigger it for every small outage. With all the different systems, some of them new, unfamiliar, and having issues, it is difficult to get this working and then test it carefully.
I am sure the system administrators of the SETI@home project are not having a nice time at the moment. As others have posted in this thread, migrating a 24/7 project with 500,000 active users to new hardware and software on an extremely limited budget is a very special situation. I believe I am really good at my job, but I wouldn't dare claim I could do it better.
Hopefully they will have a little more luck in the future. :)
Greetings from Bremen/Germany
Jens Seidler (TheBigJens)
> My current UPS is about dead. I went to Best Buy (German translation:
> Wonderful Adult/Child Electronics Playland/Store, hehehe).
>
> The UPSes ran from $40-$100. So, SETI can't afford $500 for all new UPSes?
I'm not going to say that this isn't a good idea, but....
We're not talking typical desktop machines, but servers with multiple disks and multiple processors, and I'm not sure these would be suitable -- and I'm not talking output power, I'm talking battery capacity.
If you read the announcement, the folks in Berkeley had UPSes, but they didn't run long enough for everything to be gracefully shut down. I'm second-guessing what happened, but it's probably a combination of a UPS that's a little small, and old batteries. New batteries might have been enough.
> We're not talking typical desktop machines, but servers with multiple disks
> and multiple processors, and I'm not sure these would be suitable -- and I'm
> not talking output power, I'm talking battery capacity.
To run my 6-10 computers I use a 3000 VA UPS that will hold them up for about 15 minutes ... it cost about $1,800 ... and it is not actually a UPS, it is an SPS (standby power supply), which is even cheaper.
But your main point - that this protection is not a $100 purchase - is correct ...
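As a rough sanity check of runtime figures like that, here is a back-of-the-envelope estimate in Python. Every number in it is an assumed illustrative value, not the spec of my unit:

# Back-of-the-envelope UPS runtime estimate. Every number here is an
# assumed illustrative value, not the spec of any particular unit.
battery_voltage = 96.0     # e.g. eight 12 V blocks in series (assumption)
battery_capacity_ah = 7.2  # amp-hours per string (assumption)
inverter_efficiency = 0.85 # DC-to-AC conversion losses (assumption)
high_rate_derating = 0.75  # lead-acid capacity shrinks at high discharge
                           # rates (Peukert effect); rough factor (assumption)
load_watts = 1800.0        # 6-10 machines at roughly 200-300 W each (assumption)

usable_wh = (battery_voltage * battery_capacity_ah
             * inverter_efficiency * high_rate_derating)
runtime_minutes = usable_wh / load_watts * 60
print("Usable energy: %.0f Wh" % usable_wh)
print("Estimated runtime: %.0f minutes" % runtime_minutes)
# Prints roughly 15 minutes -- consistent with the 3000 VA unit above.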
It might interest some of you to see the museum pieces that SETI runs on - if you haven't already. ;)
Piccies
Be lucky,
Neil
>
> > If BOINC was made smart enough to not download WUs from projects that
> > wouldn't get CPU time in the next couple of days, we could set it up to
> > crunch WUs serially instead of in parallel like today. Then deadlines
> > wouldn't be a problem, even if we set the resource share to project
> > A=10000, B=1.
>
> Actually, it might be. The server knows how long it takes you to return work
> units based on past history, and it certainly seems that it is adjusting how
> much work it offers based on how fast stuff comes back.
>
> ... and if you are crunching more projects, keeping "days between connections"
> low seems like a good thing.
The problem is that you will always have at least one WU from each project you are attached to on your computer. It is the client that requests work, based on your "days between connections" setting; the servers only know about their own project. What I basically want to do is crunch one WU from start to finish. Yes, I know I can do this now, but the problem is that the WU from the other project was already downloaded before I started crunching this one.
My "days between connections" is set to 0.02 so I can return work before the deadline. Because of the way BOINC downloads work at the moment, I can't participate in multiple projects and still meet the deadlines of all of them.
When you're really interested in a subject, there is no way to avoid it. You have to read the Manual.
> To run my 6-10 computers I use a 3000 VA UPS that will hold them up for about
> 15 minutes ... it cost about $1,800 ... and it is not a UPS either, it is an
> SPS which is even cheaper.
On roughly the same size server farm I run a pair of 2200VA units. I managed to acquire them at a very good price because the batteries were dead.
For one of my UPSes, APC wants something like $600 for new battery cartridges, and all they are is standard AGM-type cells attached with double-sided sticky tape. I think I paid about $100 per UPS to replace them.
The automatic transfer switch shifts the whole load from one UPS to the other as needed.
> One more problem - and the report sounds like this was the problem - is that
> it is not enough to have a UPS: you must configure a safe automatic shutdown
> of all applications and the operating system before the UPS runs out, but not
> trigger it for every small outage. With all the different systems, some of
> them new, unfamiliar, and having issues, it is difficult to get this working
> and then test it carefully.
The way I read this was that they had the UPSes, had the shutdown software, but the batteries just didn't last long enough for things to come down gracefully.
Yep, having the equipment is only half of the problem. The other half is having documented operational procedures for regular, periodic testing of the system. I had a large customer (Broadwing) that tested almost everything - but not the transfer switch. Well, there was a power outage, the transfer switch failed, of course, and the huge data center sucked the UPS dry in 10 minutes. Lots of unhappy people. If the switch had at least failed in testing, they would have been ready for it, with everyone on standby.
You have to test EVERYTHING, REGULARLY, and have documentation on how to do it and what to do if it fails. Putting together an ROI in these cases is very easy.
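For example, the ROI argument can be as simple as this toy calculation (every figure invented for illustration):

# Toy ROI calculation for regular failover testing; every figure is invented.
# It assumes, for simplicity, that testing catches the failure in time.
outage_probability = 0.5        # chance of a utility outage in a given year
failure_rate_untested = 0.30    # chance an untested transfer switch fails
downtime_cost_per_hour = 50000  # revenue, penalties, staff time (dollars)
expected_downtime_hours = 8.0
annual_testing_cost = 20000     # staff time plus a maintenance window (dollars)

expected_loss = (outage_probability * failure_rate_untested
                 * downtime_cost_per_hour * expected_downtime_hours)
print("Expected annual loss if untested: $%.0f" % expected_loss)   # $60,000
print("Annual testing cost:              $%.0f" % annual_testing_cost)
print("Net expected benefit of testing:  $%.0f"
      % (expected_loss - annual_testing_cost))                     # $40,000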
> The way I read this was that they had the UPSes, had the shutdown software,
> but the batteries just didn't last long enough for things to come down
> gracefully.
Yes, that's what I was talking about. You have to configure a safe shutdown (possibly synchronized between servers) that runs without user action, then test how long it takes to shut down all operations, test how long the UPS holds the power, decide how large the safety buffer should be, and then decide how long to wait before starting the shutdown. Start your shutdown too early and every small outage of a few minutes may stop the project for much longer: the shutdown itself takes time, and it may be difficult to bring this combined network of integrated services back up automatically. That can mean waiting for the operators to begin work in the morning.
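In code, that decision boils down to something like the following sketch; the thresholds are placeholders you have to measure for your own systems:

# Sketch of the "when do we start the shutdown?" decision. The thresholds
# are placeholders -- you have to measure your own shutdown time and your
# own UPS runtime, then choose your own safety buffer.
import time

MEASURED_SHUTDOWN_SEC = 300  # how long a clean, synchronized shutdown takes
UPS_RUNTIME_SEC = 900        # measured battery runtime under full load
SAFETY_BUFFER_SEC = 120      # margin for aging batteries and bad luck

# Ride out short outages: wait this long on battery before committing,
# so an outage of a minute or two never takes the whole project down.
GRACE_SEC = UPS_RUNTIME_SEC - MEASURED_SHUTDOWN_SEC - SAFETY_BUFFER_SEC

def watch(on_battery, start_shutdown):
    # on_battery and start_shutdown are callables the caller supplies,
    # e.g. a UPS status query and a site-wide shutdown script.
    outage_started = None
    while True:
        if on_battery():
            if outage_started is None:
                outage_started = time.monotonic()
            elif time.monotonic() - outage_started >= GRACE_SEC:
                start_shutdown()  # point of no return
                return
        else:
            outage_started = None  # power came back: reset the timer
        time.sleep(5)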
And when all this is done well, you still have to check and test it again after every change. The batteries get older and don't last as long as before; one server gets a new hard disk, another more RAM and a new fan. The probability that your plan fails when it meets reality is not small. ;)
Many words, short meaning. ;) Protecting a server system like the one currently running SETI@home is not a matter of 100 or 1,000 dollars. You have to be lucky, too. ;)
Greetings from Bremen/Germany
Jens Seidler (TheBigJens)