I haven't had any resent to me (I think), but from looking at the posts here and the change notes, I think there is a check missing. If a project reset caused the lost workunits, they should not be resent. This should probably apply to merged hosts as well.
I think this needs to be added because in either of those cases there was a problem that may even have been caused by the workunit that is being resent. If so, that workunit will most likely cause the same problem again, and we get into a cycle of resetting and resending.
I see the point, but I'm not sure about this. After all, a user can always ABORT a problematic workunit to get rid of it.
Even if (s)he doesn't abort it (not everyone babysits their BOINC installations), it will eventually pass the deadline if and only if the deadline is not reset after the WU is resent.
Metod ...
Agreed.
Director, Einstein@Home
I just got a pile of these on one of my hosts. However, the deadline is set to tomorrow. I'm not sure how 48-70 hours of work is supposed to get done in 36 hours or so. Shouldn't the deadlines be reset on any of these resent units such that the host has a chance of catching up?
I suggest that you abort the workunits which can't be finished in time. Then do 'update project' to report the aborted WUs to the server. This way, new WUs can be issued and your computer won't spend a long time doing work that's overdue.
Any idea how this work got lost?
Cheers,
Bruce
Director, Einstein@Home
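The advice above can be sketched as a quick feasibility check. This is only an illustration, not BOINC client code: it assumes workunits run one at a time in deadline order, and the names `Task` and `tasks_to_abort`, as well as the hour figures, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    est_hours: float       # estimated remaining run time (assumed known)
    deadline_hours: float  # hours from now until the report deadline


def tasks_to_abort(tasks):
    """Return the names of tasks that cannot finish before their deadline.

    Assumes tasks run one at a time, earliest deadline first. A task that
    cannot make its deadline is skipped (aborted), so it does not delay
    the tasks queued behind it.
    """
    elapsed = 0.0
    abort = []
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        if elapsed + task.est_hours > task.deadline_hours:
            abort.append(task.name)
        else:
            elapsed += task.est_hours
    return abort


# Hypothetical queue: six 8-hour workunits (48 hours of work), all due in 36 hours.
queue = [Task(f"wu_{c}", est_hours=8.0, deadline_hours=36.0) for c in "abcdef"]
print(tasks_to_abort(queue))
```

With these made-up numbers, the first four workunits fit inside the 36-hour window and the last two are the ones to abort and report.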
I went through, as another poster had suggested, and aborted the ones that had already been granted credit, figuring the remaining ones would be useful, at least.
I now have this on 2 of my 20 hosts, with those 2 having 8-10 WUs each. All deadlines are less than 48 hours away.
As for how they got lost, I was going to ask about that. One of the affected hosts is a new PC I got a week ago. It's only been attached to the project for a week, and I don't see how it could have this many ghosts associated with it. Is it possible that the new code is seeing WUs from another host?
Alternatively, is it possibly marking WUs as ghosts that are really in the machine's Work Unit Data File, but just hadn't been assigned to the machine as actual WUs yet?
By the way, this new machine has CC 4.45, and that's the only client it's ever had.
I also had a bunch of work get lost and resent to one of my hosts. What probably happened to me was that my ADSL account had exceeded its quota for the month, after which international bandwidth drops to below 1 KB/s. The client probably managed to contact the server and request new work, but was unable to transfer the WUs (2 days' worth). That's what I suspect, anyway.
I was glad to see them resent though, I hate failing anything ;)
PS If that is what happened, wouldn't it be good to have the client return an acknowledgement of receipt before the work is marked as 'In Progress'?
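The acknowledgement idea in the PS can be sketched as a small state machine. This is a hypothetical illustration of the proposal, not how the BOINC scheduler tracks results; the state names and the `Result` class are invented for the example.

```python
from enum import Enum


class State(Enum):
    UNSENT = "unsent"            # waiting in the queue
    ASSIGNED = "assigned"        # scheduler reply issued, receipt not yet confirmed
    IN_PROGRESS = "in_progress"  # host confirmed it has the work


class Result:
    def __init__(self, name):
        self.name = name
        self.state = State.UNSENT

    def assign(self):
        """Scheduler hands the result to a host."""
        if self.state is not State.UNSENT:
            raise ValueError(f"{self.name} is already {self.state.value}")
        self.state = State.ASSIGNED

    def acknowledge(self):
        """Host reports that the download completed."""
        if self.state is not State.ASSIGNED:
            raise ValueError(f"{self.name} was not awaiting an acknowledgement")
        self.state = State.IN_PROGRESS

    def ack_timeout(self):
        """No acknowledgement arrived in time: put the result back in the queue."""
        if self.state is State.ASSIGNED:
            self.state = State.UNSENT


r = Result("wu_1")
r.assign()
r.ack_timeout()   # the transfer never completed, so the result is free again
print(r.state)
```

Under this scheme, a result only counts as 'In Progress' once the host has confirmed receipt, so a failed transfer would not leave a ghost behind.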
On a slightly different topic: is it possible that the download server has slight problems from time to time?
Just today I installed BOINC on another cruncher and attached it to the E@H project. It downloaded all the needed files fine except for the science app (exe and pdb). Because of that, it trashed two WUs. The next try resulted in two more WUs being assigned and the exe downloading fine, but the pdb download failed, trashing another two WUs. The pdb file transferred fine just a moment later, but by that time the host had used up its daily quota (4, as it is a new host), leaving it without E@H work until tomorrow.
Metod ...
I just looked over my second host that got these units, and realized it had 8 WUs all due in 7 hours. I ended up aborting all but the currently executing unit, since none of the others will finish on time.
If you're going to resend these ghost units to their original hosts, there needs to be either a much longer deadline or a throttle on how many get sent. Otherwise, you're just causing more missed deadlines and aborted units.
How about just marking the ghost units as aborted/comp error/not returned/etc., and then throwing them back in the queue for the next user to pick up in the normal course of business? I know other projects resubmit units that for whatever reason never got a quorum. Shouldn't this be handled the same way?
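A rough sketch of the throttle suggested above, assuming the server knows roughly how long one workunit takes on the host. The function name `split_resend` and the 6-hour estimate are hypothetical; this is not the project's actual scheduler logic.

```python
def split_resend(lost, hours_left, est_hours_each):
    """Split lost results into (resend, requeue).

    Resend only as many results as the host could finish, one after
    another, before the unchanged deadline. The rest go back to the
    general queue for other hosts.
    """
    if est_hours_each <= 0:
        raise ValueError("est_hours_each must be positive")
    fit = max(int(hours_left // est_hours_each), 0)
    return lost[:fit], lost[fit:]


# The case from the post: 8 ghost WUs, all due in 7 hours.
# Assumes (hypothetically) that one WU takes about 6 hours on this host.
lost = [f"wu_{i}" for i in range(8)]
resend, requeue = split_resend(lost, hours_left=7.0, est_hours_each=6.0)
print(len(resend), "resent,", len(requeue), "requeued")
```

With those numbers only one unit is resent and the other seven are requeued, which matches what the manual abort achieved.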
Then, reading through this thread, one suggestion comes to my mind. If a result's workunit already has a quorum and has been validated, is there any point in resending that result? Wouldn't it be better to automatically mark those results with an error, so they can be removed from the database faster?
Then you're really interested in a subject, there is no way to avoid it. You have to read the Manual.
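The quorum suggestion in the last post could look something like the filter below. It is a sketch under stated assumptions: the `LostResult` record and its `workunit_validated` flag are invented for illustration and do not correspond to real BOINC database fields.

```python
from dataclasses import dataclass


@dataclass
class LostResult:
    name: str
    workunit_validated: bool  # assumed flag: quorum reached and validated


def plan_lost_results(lost):
    """Return (resend, mark_error) lists of result names.

    Results whose workunit is already validated are marked as errors
    instead of being resent, since more computation adds nothing.
    """
    resend, mark_error = [], []
    for result in lost:
        if result.workunit_validated:
            mark_error.append(result.name)
        else:
            resend.append(result.name)
    return resend, mark_error


ghosts = [
    LostResult("wu_1_0", workunit_validated=True),
    LostResult("wu_2_1", workunit_validated=False),
]
print(plan_lost_results(ghosts))
```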