ATLAS issues

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0
Topic 194098

All of the tasks I have been assigned over the past few days have had the wingman be an ATLAS node. I've noticed that both systems have problems, one with the "no heartbeat from core client" issue, the other with a checkpointing problem.

host 1596531 - "no heartbeat"

host 1596730 - checkpointing

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245277384
RAC: 11889

ATLAS issues

We have had some problems on ATLAS since the failure of the cooling unit on Sunday. Currently Einstein@home is not running on ATLAS and will be brought back up slowly over the next few days.

BM

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: We had some problems on

Message 89256 in response to message 89255

Quote:

We have had some problems on ATLAS since the failure of the cooling unit on Sunday. Currently Einstein@home is not running on ATLAS and will be brought back up slowly over the next few days.

BM

Thanks Bernd... Merry Christmas...

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

How're the older merlin/morgaine clusters doing now?

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: How're the older

Message 89258 in response to message 89257

Quote:
How're the older merlin/morgaine clusters doing now?

Steffen Grunewald

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

RE: RE: How're the older

Message 89259 in response to message 89258

Quote:
Quote:
How're the older merlin/morgaine clusters doing now?

Steffen Grunewald

That's not really what I was asking about. Last spring there was an article linked that said the older array was starting to lose blades on a semi-regular basis. I was wondering how many blades were still left and whether the failures had spread to the second cluster yet.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282700
RAC: 0

RE: RE: RE: How're the

Message 89260 in response to message 89259

Quote:
Quote:
Quote:
How're the older merlin/morgaine clusters doing now?

Steffen Grunewald

That's not really what I was asking about. Last spring there was an article linked that said the older array was starting to lose blades on a semi-regular basis. I was wondering how many blades were still left and whether the failures had spread to the second cluster yet.

Hmmm... Dunno that, obviously, but it looks like that account is doing fine, whatever that means... :-)

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691082925
RAC: 261316

I think I can remember the story about dying servers, and I think it affected the older Merlin cluster (> 5 years old), which is made from Dual Athlon MP systems (K7 architecture) in desktop cases. The newer Morgane Cluster is made from K8 Opterons in 19" cases IIRC and should not yet show that many signs of aging.

CU
Bikeman

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Well, I'm just glad ATLAS got unbusy enough to come back and help clear out the template frequency my poor little T2400 had been trying to plow through virtually alone since the middle of October! :-D

It finally moved up 0.05 MHz with today's task download. ;-)

Of course, ATLAS probably is one of my wingmen (didn't look to see for sure, and you might not get many others after he pops into the picture) in this template as well. Most likely, I'll get left holding the bag when something more interesting/urgent comes along for him to do. :-)

Alinator

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245277384
RAC: 11889

RE: I think I can remember

Message 89263 in response to message 89261

Quote:

I think I can remember the story about dying servers, and I think it affected the older Merlin cluster (> 5 years old), which is made from Dual Athlon MP systems (K7 architecture) in desktop cases. The newer Morgane Cluster is made from K8 Opterons in 19" cases IIRC and should not yet show that many signs of aging.

CU
Bikeman


After the last power outage at AEI Potsdam a few weeks ago, Merlin was not reactivated. It is actually dead now. I think it had fewer than 50 of its original 180 nodes still running.

Morgane is running well; about half a dozen (of 615) nodes are down with hardware failures, that's all.

[edit]Looks like most of the failed nodes have been repaired, only one seems to be down. Learn more about the AEI clusters at gw.aei.mpg.de.[/edit]

BM

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 109

Thanks Bernd.
