scheduler got bird flu?

Twosheds
Joined: 18 Jan 05
Posts: 1405
Credit: 3548147
RAC: 0

Message 17767 in response to message 17766

Just checked my results... I have this listed as sent to my machine.

9632601 2282969 7 Oct 2005 11:40:45 UTC

I haven't received it. It looks like the problem is still there.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117345611343
RAC: 35848825

OK, I see it. At least it's only one ghost and your download limit is still 4/day.

Well, I bit the bullet and lowered myself down the cesspit to see what was down there near 1601 :). Man, it was pretty bad!! The stench was unbelievable :).

I was actually surprised to find quite a few expired results going back to 09 Sep still there. They were ghosts to you, but a surprisingly large number of other users had trouble too, so it took a while to reach a quorum. In the very few cases I looked at, the final successful result came in close to a month after the result was first issued. It's not a sign of anything sinister - just interesting.

The bad news is that you have already had a bunch expire today, and there are quite a few more that will expire later today and in the days ahead. You are going to drop to a limit of 1/day shortly and probably stay there for the next two weeks. You will get your one result each day, and if it is a real work unit then your box will crunch it. If it is a ghost then your other projects will get some bonus time.

Unfortunately, version 4.19 does not have an "abort this result" button. One of your alternatives is to stop BOINC and uninstall 4.19. Then install a more recent version (at least 4.45) where the ability to better handle ghosts has been included. Personally I'd probably go for 4.72. Other more daring souls might suggest 5.1.6. The choice is up to you. The main reason I suggest 4.72 is that it is more a known quantity that shouldn't do you any damage.

Once the new installation completes, I would set "no new work" for EAH immediately, so that you don't get a flood of ghosts straight away. I don't think that can happen with your 4/day limit, but I don't know. This is absolutely new territory for me. With "no new work" set, you would at least have plenty of time to assess calmly what to do. For my part, I'd want to abort everything dated in Sept at least. That will get rid of the immediate, ongoing "daily expiry" threat. If you search back through the message boards, I have a feeling Bruce did document what would happen once the server started to "sync up" your results list to what it thought you should have. If I remember correctly, the server is smart enough not to sync up anything where the quorum has already been formed, so in your case a lot of the problem may just go away by itself.

Sorry to ramble on so much - that's just me. The decision on what to do, however, has to be yours. Someone may chime in with a better solution, so it might pay to wait a bit and see if anyone else responds.

Good luck!!

Cheers,
Gary.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 17769 in response to message 17767

Quote:

Just checked my results... I have this listed as sent to my machine.

9632601 2282969 7 Oct 2005 11:40:45 UTC

I haven't received it. It looks like the problem is still there.

Keith, my advice would normally be "Upgrade to 4.45. That will cure the ghost problem." However, you will then get sent 1600 workunits! This may break BOINC at your end; I'm not sure. If you are willing to risk this chaos, then go ahead and upgrade. If everything works, you'll still have to abort most of these 1600 workunits by hand.

If things don't work, and horrible problems result, I can make the work time out prematurely on the server end. You'd then have to detach from the project, [possibly reinstall BOINC], reattach to the project, and merge hosts to sort out the mess.

Cheers,
Bruce

Director, Einstein@Home

Twosheds
Joined: 18 Jan 05
Posts: 1405
Credit: 3548147
RAC: 0

Message 17770 in response to message 17769

Hmmm..decisions...thanks again for the input, much appreciated. I'll let you know how things transpire.

Twosheds
Joined: 18 Jan 05
Posts: 1405
Credit: 3548147
RAC: 0

Message 17771 in response to message 17770

Uninstalled 4.19...installed 4.45.

This is what I'm getting after trying to connect to LHC and Einstein.

2005-10-07 16:15:28 [http://einstein.phys.uwm.edu/] Master file fetch failed
BOINC couldn't get main page for http://einstein.phys.uwm.edu/.
Please check the URL and try again.
2005-10-07 16:15:28 [http://einstein.phys.uwm.edu/] Resetting project
2005-10-07 16:15:28 [---] request_reschedule_cpus: exit_tasks
2005-10-07 16:15:28 [http://einstein.phys.uwm.edu/] Detaching from project
2005-10-07 16:17:46 [http://lhcathome.cern.ch/] Master file fetch failed
BOINC couldn't get main page for http://lhcathome.cern.ch/.
Please check the URL and try again.
2005-10-07 16:17:46 [http://lhcathome.cern.ch/] Resetting project
2005-10-07 16:17:46 [---] request_reschedule_cpus: exit_tasks
2005-10-07 16:17:46 [http://lhcathome.cern.ch/] Detaching from project

Checking back, I had ghost WU's on other projects while I ran 4.19.

I can see I'm going to have some fun...

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

Does it have all the permissions to go through the firewall?
Also wondering if it may have to do with the Service Pack 1 on XP.

I doubt you'll get 1600+ units sent as soon as you manage to attach correctly to EAH. Isn't it the 5.x version that tries to upload your missing units? (So never upgrade to 5.2, Keith. ;))

Twosheds
Joined: 18 Jan 05
Posts: 1405
Credit: 3548147
RAC: 0

Message 17773 in response to message 17772

The firewall settings allow BOINC to connect... but now, after trying to connect to SETI:

2005-10-07 16:29:31 [http://setiathome.berkeley.edu/] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
2005-10-07 16:29:31 [http://setiathome.berkeley.edu/] No schedulers responded
2005-10-07 16:29:32 [http://setiathome.berkeley.edu/] Deferring communication with project for 5 minutes and 46 seconds
2005-10-07 16:35:22 [http://setiathome.berkeley.edu/] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
2005-10-07 16:35:22 [http://setiathome.berkeley.edu/] No schedulers responded
2005-10-07 16:35:23 [http://setiathome.berkeley.edu/] Deferring communication with project for 34 minutes and 28 seconds
2005-10-07 17:09:52 [http://setiathome.berkeley.edu/] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
2005-10-07 17:09:52 [http://setiathome.berkeley.edu/] No schedulers responded
2005-10-07 17:09:53 [http://setiathome.berkeley.edu/] Deferring communication with project for 1 hours, 8 minutes, and 9 seconds
2005-10-07 17:25:20 [http://setiathome.berkeley.edu/] Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
2005-10-07 17:25:20 [http://setiathome.berkeley.edu/] No schedulers responded
2005-10-07 17:25:21 [http://setiathome.berkeley.edu/] Deferring communication with project for 58 seconds

...and getting the above, I thought I'd better check my results there.

I find I have 10 new ghost WUs on my SETI account.

Boy, I must have upset someone in my previous life...
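
For anyone trying to read retry delays like the ones in the log above, here is a minimal Python sketch that converts each "Deferring communication" line to seconds. It assumes only the message wording shown in that log; the function name `deferral_seconds` is made up for illustration and is not part of BOINC.

```python
import re

# Unit names as they appear in the quoted log lines, mapped to seconds.
_UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}

# Text that precedes the duration in the quoted log lines.
_MARKER = "Deferring communication with project for "


def deferral_seconds(line):
    """Return the deferral length in seconds, or None if the line has no deferral."""
    if _MARKER not in line:
        return None
    tail = line.split(_MARKER, 1)[1]
    total = 0
    # Matches "5 minutes", "1 hours", "58 seconds" and so on.
    for amount, unit in re.findall(r"(\d+) (hour|minute|second)s?", tail):
        total += int(amount) * _UNIT_SECONDS[unit]
    return total


if __name__ == "__main__":
    sample = (
        "2005-10-07 17:09:53 [http://setiathome.berkeley.edu/] "
        "Deferring communication with project for 1 hours, 8 minutes, and 9 seconds"
    )
    print(deferral_seconds(sample))  # 4089
```

Run over the four deferral lines above, this gives 346, 2068, 4089 and 58 seconds.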

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

All of your problems sound like network and internet connection problems. So the first place I'd check is with the local hardware and software. LAN card, modem, firewall software, firewall hardware, cables etc.

Didn't you always have problems with this (or a) machine, or was that Larry?
Check in the XP firewall if the correct ports are available to both boinc.exe and boincmgr.exe: 80, 1043 and 31416.

Twosheds
Joined: 18 Jan 05
Posts: 1405
Credit: 3548147
RAC: 0

Message 17775 in response to message 17774

My Windows firewall is not activated; I'm using the McAfee firewall.

The current settings are....to quote.

"Allow this program to communicate if the data direction is outbound and protocol is TCP/IP and remote port is 80"

and

"Allow this program to communicate if the data direction is outbound and protocol is TCP/IP and remote port is 1039, 1041, 1044, 1046, 1219 or 1221. Or if local port is 1043"
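
One way to compare those two rules against the ports mentioned earlier (80, 1043 and 31416) is to write them down as a tiny Python model. This is only a sketch of one reading of the quoted rule text, not of how McAfee actually evaluates rules; the names `ALLOWED_REMOTE_PORTS`, `ALLOWED_LOCAL_PORTS` and `outbound_allowed` are made up for illustration.

```python
# Ports copied from the two firewall rules quoted above.
ALLOWED_REMOTE_PORTS = {80, 1039, 1041, 1044, 1046, 1219, 1221}
ALLOWED_LOCAL_PORTS = {1043}


def outbound_allowed(remote_port, local_port=None):
    """True if an outbound TCP connection matches either quoted rule.

    Assumption: "Or if local port is 1043" is read as a separate
    alternative, independent of the remote port.
    """
    return remote_port in ALLOWED_REMOTE_PORTS or local_port in ALLOWED_LOCAL_PORTS


if __name__ == "__main__":
    for port in (80, 1043, 31416):
        print(port, outbound_allowed(port))
```

Under that reading, remote port 80 matches the first rule, while 31416 does not appear in either rule.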

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 167

I asked JM7 about the ports once; here's what I got as an answer:

Quote:

TCP port 80 outbound from BOINC.exe is the only port that should be needed to communicate with the project servers.

TCP ports 1043 and 31416 inbound for BOINC.exe and outbound for BOINCMgr.exe are used for remote control of the daemon (boinc.exe) by the manager (BOINCMgr.exe). It should not be necessary to allow these ports to be accessed from the internet.

Most OSes can use 127.0.0.1 without opening a port on the host, but some OSes will not allow the distinction between the loopback (127.0.0.1) address and a general internet address, and the machine must allow these ports to be open to the LAN.
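
A quick way to test the ports described above from the affected machine is a plain TCP connect. The following is a minimal Python sketch, not a BOINC tool: `can_connect` is a made-up helper name, the host names are the ones from the log lines earlier in the thread, and the loopback checks only mean something while boinc.exe is running.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


if __name__ == "__main__":
    # Outbound port 80 to the project servers named in the logs.
    for host in ("einstein.phys.uwm.edu", "lhcathome.cern.ch", "setiathome.berkeley.edu"):
        print(host, 80, can_connect(host, 80))
    # Loopback control ports between the manager and the daemon.
    for port in (1043, 31416):
        print("127.0.0.1", port, can_connect("127.0.0.1", port))
```

A False on port 80 for every project would point at the local network or firewall rather than at any one project's server, though it cannot say which of the two is at fault.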
