uploading servers down again!?

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1593945661
RAC: 770399

They fixed it, whew.

CElliott
Joined: 9 Feb 05
Posts: 28
Credit: 1007216395
RAC: 888410

Why does it make sense to forbid downloads of new work to computers having a backlog of uploads? The problem is always fixed eventually, and in the meantime the user has to look elsewhere for work. Certainly, if that isn't loyalty busting, nothing is. In addition, the user can only upload a batch of N results at a time, so there is little risk of hurting the server by monopolizing it with work, as in the bad old days, although admittedly N presently has no effective upper limit.

Has anyone in E@H admin ever said publicly why it was necessary (last spring) to change the FGRP client so that GPU tasks required a full CPU instead of just about 20% of a CPU? If so, can you tell me where that statement is located?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250672854
RAC: 34831

Mumak wrote:
Since it seems that most of such failures happen over weekend, wouldn't it be a good idea to implement some remote monitoring of the systems?

We do have some pretty extensive monitoring in place for our systems. The problem here was that, due to a transient problem with the filesystem, a few processes apparently locked up: they continued to run, but didn't do anything, not even write to the logs. It's pretty hard to monitor that.

In addition to that, a weekend where everyone on the E@H team is offline is rather exceptional.
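For illustration only, and not a description of the monitoring E@H actually runs: one common way to catch a process that is still alive but has stopped writing to its log is to check how long ago the log file was last modified. A minimal sketch in Python follows; the function name `log_is_stale` and the `max_age_seconds` threshold are hypothetical and chosen just for this example.

```python
import os
import time


def log_is_stale(log_path, max_age_seconds, now=None):
    """Return True if log_path was not modified within max_age_seconds.

    Hypothetical helper: a process that is running but has stopped
    logging shows up as a log file whose modification time stops moving.
    """
    if now is None:
        now = time.time()
    try:
        last_write = os.path.getmtime(log_path)
    except OSError:
        # A missing or unreadable log is treated as stale as well.
        return True
    return (now - last_write) > max_age_seconds
```

A check like this has obvious limits: a healthy process that is simply idle would also be flagged, and if the filesystem itself is the thing that hangs, the check may block too, which is part of why this kind of failure is hard to monitor.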

BM
