This sort of thing should be posted to the front page news section. That way also gets the word out via RSS.
RE: This sort of thing
ROFL! Oh, yeah. Right. So the people who cannot reach us can tell us that? :-)
Cheers, Mike.
(edit) For the rest of us: those who hold the validators for editing the web content are currently incommunicado ... but the next time my car runs out of petrol I'll be sure to drive it to the next town to fill up.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Now, as to the original issue
Now, as to the original issue: it seems E@H may be a victim of its own success. Again, alas. Posters may recall analogous problems in the past when there's been a change in workflow patterns due to new work unit types etc. Thresholds get reached, bandwidths peak ... that sort of thing. AFAIK a key problem is maintaining logical coherence of activities across separated hardware. Naturally, in a perfect world with infinite funds, plenty of staff and an accurate crystal ball, these scenarios would be escaped or never entered. :-)
In any case, please bear with us. Most likely, temporizing measures will be put in place and then followed by more lasting ones. Right now there's a lot of back-end discussion on a wide range of alternatives. Your patience is very much appreciated, but I guess now might be the time (and I can't think of a better occasion) to switch to a backup BOINC project of your choice in the meantime, if that suits your mindset.
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
As for the network outage in
As for the network outage in Hannover: A couple of network switches suddenly blew fuses; the reason is being investigated. Probably a power malfunction.
Anyway, the switches are back to normal operation; the server issue is still being worked on.
BM
Ok, we should be back on
Ok, we should be back on track now. We identified the cause and fixed it. Data are flowing again. We'll monitor the situation and ramp up BRP/FGRP work unit distribution over the next hours/days...
Thanks for your patience!
Oliver
Einstein@Home Project
RE: Ok, we should be back
Would you mind telling us what it turned out to be, in case the experience might be useful for other BOINC projects?
RE: Would you mind telling
Sure! A few months ago we noticed that Apache could no longer handle the BRP/FGRP download requests and switched to lighttpd, which turned out to be more suitable for our specific setup, data type and access pattern. The load increased even further, and we seem to have crossed a crucial threshold last week such that lighttpd wasn't up to the task anymore either. Various filesystem/network/daemon tests revealed that the web server was in fact the bottleneck, so we have now moved to nginx, the very efficient web server that powers Facebook, WordPress, SourceForge and GitHub, for instance (the third, almost second, most popular web server).
Best,
Oliver
Einstein@Home Project
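For other project admins curious what such a setup might look like, here is a minimal, purely illustrative nginx sketch for serving large static data files to many concurrent download clients. The hostname, paths and tuning values are hypothetical assumptions for the example, not Einstein@Home's actual configuration.

```nginx
# Illustrative nginx.conf fragment for a high-volume static download host.
# Hostname, root path and numbers are made up for this sketch.
worker_processes auto;           # one worker per CPU core

events {
    worker_connections 4096;     # allow many simultaneous downloads per worker
}

http {
    sendfile   on;               # kernel-level file transfer, no userspace copy
    tcp_nopush on;               # coalesce response headers with file data

    server {
        listen      80;
        server_name download.example.org;

        location /download/ {
            root /data/project;  # e.g. serves /data/project/download/...
            autoindex off;       # don't expose directory listings
        }
    }
}
```

The point of such a configuration is that nginx's event-driven workers and `sendfile` keep per-connection overhead very low, which is exactly the access pattern (thousands of clients fetching static work-unit files) described above.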
Many thanks. Although BRP4 is
Many thanks. Although BRP4 is probably the highest-download-traffic sub-project I know of, there are others with heavy flows, so that could well be useful advice/experience for other admins.
RE: RE: This sort of
I don't understand your point. Everyone could get to the web site (and RSS) just fine. It was only the upload/download of tasks that wasn't working. It would have been good to announce the issue, so that crunchers would know to redirect their machines to other projects for the duration. And it helps head off all the posts from people asking "what's up?".
Reno, NV Team: SETI.USA
Is it just me or have we run
Is it just me or have we run out of BRP4 work today?
I just checked the server status page, which has "Tasks to send" at 0.
Not sure if that was planned...
RE: Is it just me or have
I don't think it was planned, but I wonder why nobody asked about this until now. ;)