This sort of thing should be posted to the front page news section. That way also gets the word out via RSS.
RE: This sort of thing
ROFL! Oh, yeah. Right. So the people who cannot reach us can tell us that? :-)
Cheers, Mike.
(edit) For the rest of us: those who hold the validators for editing the web content are currently incommunicado ... but the next time my car runs out of petrol I'll be sure to drive it to the next town to fill up.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Now, as to the original issue
Now, as to the original issue: it seems E@H may be a victim of its own success. Again, alas. Posters may recall analogous problems in the past when there's been a change in workflow patterns due to new work unit types etc. Thresholds get reached, bandwidths peak ... that sort of thing. AFAIK a key problem is maintaining logical coherence of activities across separated hardware. Naturally, in a perfect world with infinite funds, plenty of staff and an accurate crystal ball, these scenarios would be escaped or never entered. :-)
In any case, please bear with us. Most likely, temporizing measures will be put in place and then followed by more lasting ones. Right now there's a lot of back-end discussion on a wide range of alternatives. Your patience is very much appreciated, but I guess now might be the time (and I can't think of a better occasion) to switch to a backup BOINC project of your choice in the meantime, if that suits your mindset.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
As for the network outage in
As for the network outage in Hannover: A couple of network switches suddenly blew fuses; the reason is being investigated. Probably a power malfunction.
Anyway, the switches are back to normal operation; the server issue is still being worked on.
BM
Ok, we should be back on
Ok, we should be back on track now. We identified the cause and fixed it. Data are flowing again. We'll monitor the situation and ramp up BRP/FGRP work unit distribution over the next hours/days...
Thanks for your patience!
Oliver
Einstein@Home Project
RE: Ok, we should be back
Would you mind telling us what it turned out to be, in case the experience might be useful for other BOINC projects?
RE: Would you mind telling
Sure! A few months ago we noticed that Apache could no longer handle the BRP/FGRP download requests and switched to lighttpd, which turned out to be more suitable for our specific setup, data type and access pattern. The load increased even further, and we seem to have crossed a crucial threshold last week such that lighttpd wasn't up to the task anymore either. Various filesystem/network/daemon tests revealed that the web server was in fact the bottleneck, so we have now moved to nginx, the very efficient web server that powers Facebook, WordPress, SourceForge and GitHub, for instance (the third, almost second, most popular web server).
Best,
Oliver
Einstein@Home Project
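For other project admins curious what such a setup might look like, here is a minimal, purely illustrative nginx sketch for serving large static data files to many concurrent download clients. The hostname, paths and tuning values are hypothetical assumptions for the example, not Einstein@Home's actual configuration.

```nginx
# Illustrative nginx.conf fragment for a high-volume static download host.
# Hostname, root path and numbers are made up for this sketch.
worker_processes auto;           # one worker per CPU core

events {
    worker_connections 4096;     # allow many simultaneous downloads per worker
}

http {
    sendfile   on;               # kernel-level file transfer, no userspace copy
    tcp_nopush on;               # coalesce response headers with file data

    server {
        listen      80;
        server_name download.example.org;

        location /download/ {
            root /data/project;  # e.g. serves /data/project/download/...
            autoindex off;       # don't expose directory listings
        }
    }
}
```

The point of such a configuration is that nginx's event-driven workers and `sendfile` keep per-connection overhead very low, which is exactly the access pattern (thousands of clients fetching static work-unit files) described above.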
Many thanks. Although BRP4 is
Many thanks. Although BRP4 is probably the highest-download-traffic sub-project I know of, there are others with heavy flows, so that could well be useful advice/experience for other admins.
RE: RE: This sort of
I don't understand your point. Everyone could get to the web site (and RSS) just fine. It was only the upload/download of tasks that wasn't working. It would have been good to announce the issue, so that crunchers would know to redirect their machines to other projects for the duration. And it helps head off all the posts from people asking "what's up?".
Reno, NV Team: SETI.USA
Is it just me or have we run
Is it just me or have we run out of BRP4 work today?
I just checked the server status page, which has "Tasks to send" at 0.
Not sure if that was planned...
RE: Is it just me or have
I don't think it was planned, but I wonder why nobody asked about this until now. ;)