Serious BUG: Phantom WUs NOT on user client machines but on result pages

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 6392 in response to message 6391

> Should we let you know about lost work even if it isn't a case of it being
> continually lost or will those WUs be taken care of some other way?

If the work is not being continually lost, but was only lost until the host got registered, then please ignore it. The lost work will be resent as soon as it times out (after one week).

On the other hand, if you have a host which is not behind a firewall and which is continuously losing work, I'd like to know. This should be very useful for debugging BOINC.

Bruce

Director, Einstein@Home

hoarfrost
Joined: 9 Feb 05
Posts: 207
Credit: 94984156
RAC: 120594

What should I do to delete a "phantom WU" from my results list and to remove myself from the list of participants processing this unit?

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 6394 in response to message 6393

> What should I do to delete a "phantom WU" from my results list and to remove
> myself from the list of participants processing this unit?

You can't remove these from your result list. But eventually they will time out (after a week) and get issued to some other user. When three successful results have been returned and validated, the whole WU and all the phantom results will get purged.

Bruce

Director, Einstein@Home
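
As a rough illustration of the rules described in the post above (a result times out one week after it is sent, and a WU is purged once three results have been returned and validated), here is a minimal Python sketch. The names `DEADLINE`, `QUORUM`, `deadline_for`, `is_timed_out` and `can_purge` are made up for this example; this is not BOINC server code, only the arithmetic as stated in the thread.

```python
from datetime import datetime, timedelta

# Assumptions taken from the post above, not from BOINC source code:
DEADLINE = timedelta(days=7)  # "time out (after a week)"
QUORUM = 3                    # "three successful results ... validated"


def deadline_for(sent: datetime) -> datetime:
    """Return the report deadline for a result sent at `sent`."""
    return sent + DEADLINE


def is_timed_out(sent: datetime, now: datetime) -> bool:
    """True once the deadline has passed, so the work can be reissued."""
    return now >= deadline_for(sent)


def can_purge(validated_results: int) -> bool:
    """True once enough validated results exist to purge the whole WU."""
    return validated_results >= QUORUM


if __name__ == "__main__":
    sent = datetime(2005, 2, 25, 11, 23, 6)
    print(deadline_for(sent))
    print(is_timed_out(sent, datetime(2005, 3, 2, 12, 0, 0)))
```

Under these assumptions a phantom result cannot be cleared early; it simply blocks one slot of the WU until its deadline passes.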

AnRM
Joined: 9 Feb 05
Posts: 213
Credit: 4346941
RAC: 0

Message 6395 in response to message 6392

> > Should we let you know about lost work even if it isn't a case of it being
> > continually lost or will those WUs be taken care of some other way?
>
> If the work is not being continually lost, but was only lost until the host
> got registered, then please ignore it. The lost work will be resent as soon
> as it times out (after one week).
>
> On the other hand, if you have a host which is not behind a firewall and which
> is continuously losing work, I'd like to know. This should be very useful for
> debugging BOINC.
>
> Bruce

We have 4 machines that are networked together and have been having little problems with the project. We are using the 4.19/4.79 combo on all machines, but we seem to have your problem on only one machine: 12208. We were assigned 4 WUs (#s 375946, 375945, 375939, 375938) on 22 Feb 05. The first two were received and processed OK. The last two (the first sent) did not arrive but are displayed on the 'Results for computer' web page and have timed out.
A similar pattern holds for WUs sent on 24 Feb 05 (#s 87185, 387110, 386345, 383293), i.e. two were processed OK and two apparently didn't arrive. The last WU downloads on 26 and 27 Feb and 1 Mar seem OK. This machine is behind a Zone Alarm firewall, but I don't think it is causing the problem, as it appears to work correctly any time we have monitored communications with the servers. Hope this info helps. You are doing a great job keeping up with all these problems, and your personal involvement is really outstanding compared to the World Community Grid, BOINC/SETI, etc. The smooth start-up compared to other projects is a tribute to your efforts. Thanks.

Michael Berger
Joined: 22 Jan 05
Posts: 36
Credit: 37252
RAC: 0

Message 6396 in response to message 6390

Bruce,

The issue is still repeating itself for the hosts listed below. None are behind a firewall, and they're networked at the same location. What networking details do you need?

7672 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.40GHz Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

1397642 395805 25 Feb 2005 11:23:06 UTC 4 Mar 2005 11:23:06 UTC In Progress Unknown New --- --- ---
1393899 394912 25 Feb 2005 11:23:00 UTC 4 Mar 2005 11:23:00 UTC In Progress Unknown New --- --- ---
1203915 351169 18 Feb 2005 7:28:45 UTC 25 Feb 2005 7:28:45 UTC Over No reply New 0.00 --- ---

7660 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.80GHz Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00)

1440862 405896 27 Feb 2005 14:19:17 UTC 6 Mar 2005 14:19:17 UTC In Progress Unknown New --- --- ---
1440850 405893 27 Feb 2005 14:19:17 UTC 6 Mar 2005 14:19:17 UTC In Progress Unknown New --- --- ---
1178852 347092 15 Feb 2005 21:57:28 UTC 22 Feb 2005 21:57:28 UTC Over No reply New 0.00 --- ---
1177546 346808 16 Feb 2005 4:09:06 UTC 23 Feb 2005 4:09:06 UTC Over No reply New 0.00 --- ---

7668 AuthenticAMD AMD Athlon(tm) Processor Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

1254561 362688 20 Feb 2005 21:52:55 UTC 27 Feb 2005 21:52:55 UTC In Progress Unknown New --- --- ---

This is not a complete list as some WU's have already been deleted.

Thanks in advance,

Michael

There's a fine line between fishing and standing on the shore looking like an idiot -- Steven Wright
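
For anyone who wants to check rows like the ones pasted in the post above, a small Python sketch follows. It assumes each row is laid out as result ID, WU ID, sent time, report deadline, then free-form status text; that layout is inferred from the pasted rows, since the column headers are not shown in the thread. The function names `parse_row` and `never_reported` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Assumed column layout (inferred from the pasted rows, not documented here):
# result ID, WU ID, sent time, report deadline, remaining status text.
ROW = re.compile(
    r"^(?P<result_id>\d+)\s+(?P<wu_id>\d+)\s+"
    r"(?P<sent>\d{1,2} \w{3} \d{4} \d{1,2}:\d{2}:\d{2}) UTC\s+"
    r"(?P<deadline>\d{1,2} \w{3} \d{4} \d{1,2}:\d{2}:\d{2}) UTC\s+"
    r"(?P<status>.+)$"
)
# %b matches English month abbreviations in the default C/English locale.
FMT = "%d %b %Y %H:%M:%S"


def parse_row(line):
    """Parse one pasted result row; return None if it does not match."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return {
        "result_id": int(m.group("result_id")),
        "wu_id": int(m.group("wu_id")),
        "sent": datetime.strptime(m.group("sent"), FMT),
        "deadline": datetime.strptime(m.group("deadline"), FMT),
        "status": m.group("status"),
    }


def never_reported(row):
    """True for rows still in progress or closed with no reply."""
    return row["status"].startswith(("In Progress", "Over No reply"))


if __name__ == "__main__":
    line = ("1203915 351169 18 Feb 2005 7:28:45 UTC "
            "25 Feb 2005 7:28:45 UTC Over No reply New 0.00 --- ---")
    row = parse_row(line)
    print(row["deadline"] - row["sent"], never_reported(row))
```

A row flagged by `never_reported` is only a candidate: telling a phantom from work that is genuinely still running needs a comparison against what the client actually holds.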

AnRM
Joined: 9 Feb 05
Posts: 213
Credit: 4346941
RAC: 0

Message 6397 in response to message 6395

> > > Should we let you know about lost work even if it isn't a case of it being
> > > continually lost or will those WUs be taken care of some other way?
> >
> > If the work is not being continually lost, but was only lost until the host
> > got registered, then please ignore it. The lost work will be resent as soon
> > as it times out (after one week).
> >
> > On the other hand, if you have a host which is not behind a firewall and
> > which is continuously losing work, I'd like to know. This should be very
> > useful for debugging BOINC.
> >
> > Bruce
>
> We have 4 machines that are networked together and have been having little
> problems with the project. We are using the 4.19/4.79 combo on all machines,
> but we seem to have your problem on only one machine: 12208. We were assigned
> 4 WUs (#s 375946, 375945, 375939, 375938) on 22 Feb 05. The first two were
> received and processed OK. The last two (the first sent) did not arrive but
> are displayed on the 'Results for computer' web page and have timed out.
> A similar pattern holds for WUs sent on 24 Feb 05 (#s 87185, 387110, 386345,
> 383293), i.e. two were processed OK and two apparently didn't arrive. The last
> WU downloads on 26 and 27 Feb and 1 Mar seem OK. This machine is behind a
> Zone Alarm firewall, but I don't think it is causing the problem, as it
> appears to work correctly any time we have monitored communications with the
> servers. Hope this info helps. You are doing a great job keeping up with all
> these problems, and your personal involvement is really outstanding compared
> to the World Community Grid, BOINC/SETI, etc. The smooth start-up compared to
> other projects is a tribute to your efforts. Thanks.

Just to clarify the English in the first sentence: "little problems" should really read "NO problems".

hoarfrost
Joined: 9 Feb 05
Posts: 207
Credit: 94984156
RAC: 120594

Message 6398 in response to message 6394

> > What should I do to delete a "phantom WU" from my results list and to remove
> > myself from the list of participants processing this unit?
>
> You can't remove these from your result list. But eventually they will time
> out (after a week) and get issued to some other user. When three successful
> results have been returned and validated, the whole WU and all the phantom
> results will get purged.
>
> Bruce

OK. But of the four hosts in the list of hosts processing this WU, mine will be the second to miss the deadline. That is bad.

And this WU spoils my results list. :D

hoarfrost
Joined: 9 Feb 05
Posts: 207
Credit: 94984156
RAC: 120594

I don't know whether this is right, but I think the "phantom WU" bug is a direct consequence of the "exited with zero status but no 'finished' file" bug. Today I had a second case of "exited with zero status", and in this case (as in the first) two work units were added to my results list. One of them is a "phantom".

In both cases the two WUs were sent about 2 minutes apart:

First case:
414417 28 Feb 2005 7:51:22 UTC 1 Mar 2005 2:29:08 UTC Over Success Done
362420 28 Feb 2005 7:50:08 UTC 7 Mar 2005 7:50:08 UTC In Progress Unknown New

Second (today) case:
431544 2 Mar 2005 10:51:33 UTC 9 Mar 2005 10:51:33 UTC In Progress Unknown New
431535 2 Mar 2005 10:49:50 UTC 9 Mar 2005 10:49:50 UTC In Progress Unknown New

As in the first case, only the last WU of the pair is actually on my computer.

================================================================
stdout (if needed):

2005-03-02 07:12:53 [---] Starting BOINC client version 4.19 for windows_intelx86
2005-03-02 07:12:53 [Einstein@Home] Project prefs: using your defaults
2005-03-02 07:12:53 [Einstein@Home] Host ID is 43575
2005-03-02 07:12:53 [---] No general preferences found - using BOINC defaults
2005-03-02 07:12:55 [Einstein@Home] Resuming computation for result H1_0954.9__0955.1_0.1_T04_Test02_1 using einstein version 4.79
2005-03-02 08:04:29 [---] Suspending network activity - user request
2005-03-02 13:41:23 [Einstein@Home] Computation for result H1_0954.9__0955.1_0.1_T04_Test02 finished
2005-03-02 13:48:49 [---] Resuming network activity
2005-03-02 13:48:49 [---] Insufficient work; requesting more
2005-03-02 13:48:49 [Einstein@Home] Requesting 16201 seconds of work
2005-03-02 13:48:49 [Einstein@Home] Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
2005-03-02 13:48:50 [Einstein@Home] Started upload of H1_0954.9__0955.1_0.1_T04_Test02_1_0
2005-03-02 13:49:48 [Einstein@Home] Finished upload of H1_0954.9__0955.1_0.1_T04_Test02_1_0
2005-03-02 13:49:48 [Einstein@Home] Throughput 1310 bytes/sec
2005-03-02 13:50:17 [---] Insufficient work; requesting more
2005-03-02 13:50:17 [Einstein@Home] Requesting 16201 seconds of work
2005-03-02 13:50:17 [Einstein@Home] Sending request to scheduler: http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi
2005-03-02 13:50:45 [Einstein@Home] Scheduler RPC to http://einstein.phys.uwm.edu/EinsteinAtHome_cgi/cgi succeeded
2005-03-02 13:50:45 [Einstein@Home] Project prefs: using your defaults
2005-03-02 13:50:45 [Einstein@Home] Starting result H1_0954.9__0955.4_0.1_T04_Test02_0 using einstein version 4.79
================================================================

In the first case my host is the fourth in the list of hosts processing this unit, and the second that has not returned a result! And the deadline is 7 Mar 2005 7:50:08 UTC! The two users who already processed this WU some days ago must wait for this deadline, then for the WU to be sent to another host, and then, if that copy is not a "phantom", for it to be processed! And if the next copy of this work unit is also a "phantom"...

In the second case the deadline is 9 Mar 2005 10:51:33 UTC, and my host is the first (and only) one in the hosts list.

Please make a tool that resends such a WU to the user, or that deletes the WU from the results list and the host from the hosts list.

I THINK THIS IS VERY IMPORTANT FOR THE IMAGE OF THE PROJECT.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245202351
RAC: 13514

This is definitely not related.

BM

Michael Berger
Joined: 22 Jan 05
Posts: 36
Credit: 37252
RAC: 0

Message 6401 in response to message 6396

^TOP^

Bruce,

The issue is still repeating itself for the hosts listed below. None are behind a firewall, and they're networked at the same location. What networking details do you need?

7672 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.40GHz Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

1397642 395805 25 Feb 2005 11:23:06 UTC 4 Mar 2005 11:23:06 UTC In Progress Unknown New --- --- ---
1393899 394912 25 Feb 2005 11:23:00 UTC 4 Mar 2005 11:23:00 UTC In Progress Unknown New --- --- ---
1203915 351169 18 Feb 2005 7:28:45 UTC 25 Feb 2005 7:28:45 UTC Over No reply New 0.00 --- ---

7660 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.80GHz Microsoft Windows XP Home Edition, Service Pack 2, (05.01.2600.00)

1440862 405896 27 Feb 2005 14:19:17 UTC 6 Mar 2005 14:19:17 UTC In Progress Unknown New --- --- ---
1440850 405893 27 Feb 2005 14:19:17 UTC 6 Mar 2005 14:19:17 UTC In Progress Unknown New --- --- ---
1178852 347092 15 Feb 2005 21:57:28 UTC 22 Feb 2005 21:57:28 UTC Over No reply New 0.00 --- ---
1177546 346808 16 Feb 2005 4:09:06 UTC 23 Feb 2005 4:09:06 UTC Over No reply New 0.00 --- ---

7668 AuthenticAMD AMD Athlon(tm) Processor Microsoft Windows XP Professional Edition, Service Pack 2, (05.01.2600.00)

1254561 362688 20 Feb 2005 21:52:55 UTC 27 Feb 2005 21:52:55 UTC In Progress Unknown New --- --- ---

This is not a complete list as some WU's have already been deleted.

Thanks in advance,

Michael

There's a fine line between fishing and standing on the shore looking like an idiot -- Steven Wright
