"Unsent" WUs
15 Sep 2008 12:06:00 UTC
Topic 193926
Since Saturday, all the work-units I return have stayed "Pending" because only one instance was sent out, so the minimum quorum cannot be reached.
The same problem shows up with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not redistributed, e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a
"Unsent" WUs
This happens when you have a fast computer and race ahead of those who have been sent the same data packs as you.
I believe that when a task has to be resent, the resends are added to the end of the queue of tasks waiting to be sent.
RE: Since Saturday all
I'm no expert on this, but from the links you've provided it looks like another WU has been created but not yet allocated to another computer. I suspect this is "locality scheduling" at work: it only wants to give the WU to a machine that already has the prerequisite files if possible (after a while it gives up and just gives it to the first person to request a WU). I'm not sure how long it takes before giving up; maybe someone more knowledgeable can say.
Cheers
MarkJ
BOINC blog
It's just the way the scheduler works.
The scheduler has to wait until a host with the correct data files requests more work before those "unsent" tasks actually get sent. If the wait becomes too long, the scheduler will eventually decide to send the set of large data files to more hosts requesting work so that the "unsents" can be disposed of. In a worst-case scenario this could take in the vicinity of 7 to 10 days.
The norm is usually a lot less than this but it's all just the luck of the draw. The best thing is to not be concerned about it as it will be properly sorted in the end.
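To make that behaviour concrete, here is a minimal sketch of the decision as described above. It is not the actual BOINC server code: the names `UnsentTask`, `pick_task` and `GIVE_UP_AFTER_DAYS` are made up for illustration, and the 7-day threshold is only an assumption based on the rough figures mentioned in this thread.

```python
from dataclasses import dataclass

# Hypothetical threshold: this thread only gives a rough worst case of 7 to 10 days.
GIVE_UP_AFTER_DAYS = 7.0


@dataclass
class UnsentTask:
    name: str
    data_file: str       # large data file the task needs
    days_waiting: float  # how long the task has been "unsent"


def pick_task(host_files, unsent, give_up_after_days=GIVE_UP_AFTER_DAYS):
    """Return (task, needs_download) for a host requesting work, or (None, False)."""
    # First preference: a task whose data file the host already holds.
    for task in unsent:
        if task.data_file in host_files:
            return task, False
    # Fallback: the longest-waiting task, but only once it has waited too long.
    # The host then has to download the large data file as well.
    overdue = [t for t in unsent if t.days_waiting >= give_up_after_days]
    if overdue:
        return max(overdue, key=lambda t: t.days_waiting), True
    return None, False
```

In this sketch a host without the matching files gets nothing until some task crosses the threshold, which is why a task can sit "unsent" for days.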
Cheers,
Gary.
RE: I believe that when a
No. If a host requests work for a particular set of data files and there are resends available, my experience is that you will get the resends first, before any new (i.e. _0 or _1) tasks. I often have machines requesting multiple tasks and I tend to see _2 (or higher) tasks coming before new tasks.
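As a rough illustration of that observation only, not of how the server really orders its queue: if the number after the last underscore in a task name is the instance number, the "resends first" order looks like the sketch below. The function names and the short task names in the test data are hypothetical.

```python
def instance_number(task_name):
    # Assumes the task name ends in "_<n>", e.g. "..._0" or "..._2".
    return int(task_name.rsplit("_", 1)[1])


def is_resend(task_name):
    # Per the observation above: _0 and _1 are new tasks, _2 or higher are resends.
    return instance_number(task_name) >= 2


def order_for_host(task_names):
    # sorted() is stable, so tasks keep their relative order within each group;
    # resends (key False) sort ahead of new tasks (key True).
    return sorted(task_names, key=lambda name: not is_resend(name))
```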
Cheers,
Gary.
RE: RE: I believe that
You are probably right that the server will always try to find work for the data sets that the host already has, even if that means taking them from a long way down the queue. But I still think that BOINC server code adds the re-send units to the end of the queue. On Seti Beta I have a pending task that requires a re-send, where the first deadline was 20 Jul 2008 21:34:14 UTC.
Martin, for what it's worth, I've seen the same thing occur (units "unsent" while they wait for another computer to have the same data to work with). In my anecdotal experience, though, I've never seen it go beyond 7 days without the server finding another computer to send the data to.
I have no idea if that (7 days) has anything to do with any of how the "locality scheduling" code works or if I've just never seen it go beyond a week, but I wouldn't start to worry unless they remain unsent for longer.
[edit]Oddly enough though... after looking at my own hosts it's only my quad Xeon that also has a bunch of pending results with "unsent" mates. Coincidence? Perhaps...[/edit]
I'm running into the same thing, sort of. My host has 16 of its last 18 units pending. It's gotten so bad that I dropped one of the cores and put it on another project until some of this is resolved. I understand that sometimes it takes a while for a wingman to catch up, but this is the first time I've ever seen it this bad. Hope it gets straightened out soon.
RE: .... It's gotten so bad
I really don't understand why you think there is a "problem" when really it's just the way the locality scheduling system works. Eventually the scheduler will give this particular dataset to additional hosts and things will rapidly catch up. If you leave your cores the way they were, you'll get a nice credit boost when that happens.
Since there is nothing to "straighten out", your "hope" is bound to be dashed :-).
However, the scheduler will restore equilibrium in its own good time and with no intervention needed.
Cheers,
Gary.
RE: I have no idea if that
For those who don't know how it works, Locality Scheduling.
RE: Since there is nothing
Is this something that was introduced with the new application? I don't recall, in the last 3 years, having to wait an extended period for the WU to be sent out to a wingman, or a 5 to 7 day run of work not receiving credit within a reasonable period of time!
I know all about faster computers finishing first and having to wait for the slower ones, people joining the project, grabbing a WU or so and never showing up again, etc. But when you are used to seeing a consistent amount of credit showing up every day and then suddenly it's 1/3 or 1/4 of the normal amount, you start looking for answers. It was bad enough that credits were cut in half with the changeover, but now there's an added delay?
[edit]If this had started after Ike blew through, I could understand that a lot of work out there would be delayed until those 1 or 2 million people get power back, and that it was just my bad luck to get matched with some of them. But since this started before that, and only within the last week or so for me, I was looking for answers as to why there's such a delay.
[/edit]