"Unsent" WUs

Martin P.
Joined: 17 Feb 05
Posts: 162
Credit: 40156217
RAC: 0
Topic 193926

Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a

Winterknight
Joined: 4 Jun 05
Posts: 1481
Credit: 388619055
RAC: 512487

"Unsent" WUs

Quote:
Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.


This happens when you have a fast computer and race ahead of the other hosts that have been sent the same data packs as you.

Quote:

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a


I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 139002861
RAC: 0

RE: Since Saturday all

Quote:

Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a

I'm no expert on this, but from looking at the links you've provided, another WU has been created but not yet allocated to another computer. I suspect this is the "locality scheduling" at work: it only wants to give the WU to a machine that already has the prerequisite files if possible (after a while it gives up and just gives it to the first host to request a WU). I am not sure how long it takes before giving up; maybe some more knowledgeable person can say.

Cheers
MarkJ

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5875
Credit: 118471461551
RAC: 25984963

It's just the way the scheduler works.

The scheduler has to wait until a host with the correct data files requests more work before those "unsent" tasks actually get sent. If the wait becomes too long the scheduler will eventually decide to send the set of large data files to more hosts requesting work so that the "unsents" can be disposed of. In a worst case scenario this could take in the vicinity of 7 - 10 days.

The norm is usually a lot less than this but it's all just the luck of the draw. The best thing is to not be concerned about it as it will be properly sorted in the end.
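The behaviour described above can be sketched roughly as follows. This is only an illustration, not the actual BOINC scheduler code: the `Task` and `Host` classes, the `pick_task` function, the data file names and the `MAX_WAIT_DAYS` threshold are all assumptions made for the example. Real hosts that ask for work would normally be given something else in the meantime, which the sketch leaves out.

```python
from dataclasses import dataclass, field

# Hypothetical threshold. The thread only says the worst case is
# "in the vicinity of 7 - 10 days"; the real value is not stated.
MAX_WAIT_DAYS = 7


@dataclass
class Task:
    name: str
    data_file: str      # large data file the task needs (illustrative name)
    days_unsent: float  # how long the task has been waiting to be sent


@dataclass
class Host:
    name: str
    data_files: set = field(default_factory=set)  # files already on the host


def pick_task(host, unsent):
    """Return an unsent task for `host`, or None if none is suitable yet."""
    # 1. Prefer a task whose data file the host already holds.
    for task in unsent:
        if task.data_file in host.data_files:
            return task
    # 2. If a task has waited too long, give up on locality: the host
    #    gets the task and has to download the data file as well.
    overdue = [t for t in unsent if t.days_unsent >= MAX_WAIT_DAYS]
    if overdue:
        task = max(overdue, key=lambda t: t.days_unsent)
        host.data_files.add(task.data_file)
        return task
    # 3. Otherwise the task stays "unsent" for now.
    return None
```

In this sketch a task stays "unsent" until either a host that already holds its data file asks for work, or the wait passes the threshold and the file is handed to some other host.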

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5875
Credit: 118471461551
RAC: 25984963

RE: I believe that when a

Message 85519 in response to message 85516

Quote:
I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

No, if a host requests work for a particular set of data files and if there are resends available, my experience is that you will get the resends first before any new (ie _0 or _1) tasks. I often have machines requesting multiple tasks and I tend to see _2 (or higher) tasks coming before new tasks.
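As a rough illustration of that observation (not the real scheduler logic), the ordering could look like the sketch below. It assumes that a task name ends in `_N`, where N is the instance number, and that anything numbered 2 or higher is a resend. The function names are made up for the example.

```python
def instance_number(task_name):
    """Return the trailing instance number, e.g. 2 for '..._S5R4a_2'."""
    return int(task_name.rsplit("_", 1)[1])


def is_resend(task_name):
    # Assumption: _0 and _1 are the initial copies, _2 and above are resends.
    return instance_number(task_name) >= 2


def order_for_host(task_names):
    """Put resends ahead of new tasks, keeping the original order otherwise."""
    # sorted() is stable, and False sorts before True, so resends come first.
    return sorted(task_names, key=lambda name: not is_resend(name))
```

A host asking for several tasks would then see the `_2` (or higher) tasks at the front of the list, which matches what is reported above.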

Cheers,
Gary.

Winterknight
Joined: 4 Jun 05
Posts: 1481
Credit: 388619055
RAC: 512487

RE: RE: I believe that

Message 85520 in response to message 85519

Quote:
Quote:
I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

No, if a host requests work for a particular set of data files and if there are resends available, my experience is that you will get the resends first before any new (ie _0 or _1) tasks. I often have machines requesting multiple tasks and I tend to see _2 (or higher) tasks coming before new tasks.


You are probably right that the server will always try to find work for the data sets that the host already has, even if that means taking them from a long way down the queue. But I still think that BOINC server code adds the re-send units to the end of the queue. On Seti Beta I have a pending task that requires a re-send, where the first deadline was 20 Jul 2008 21:34:14 UTC.

Thunder
Joined: 18 Jan 05
Posts: 138
Credit: 46754541
RAC: 0

Martin, for what it's worth, I've seen the same thing occur (units "unsent" while they wait for another computer to have the same data to work with), but at least my anecdotal observations have been that I've never seen it go beyond 7 days without the server finding another computer to send data to.

I have no idea if that (7 days) has anything to do with any of how the "locality scheduling" code works or if I've just never seen it go beyond a week, but I wouldn't start to worry unless they remain unsent for longer.

[edit]Oddly enough though... after looking at my own hosts it's only my quad Xeon that also has a bunch of pending results with "unsent" mates. Coincidence? Perhaps...[/edit]

Arion
Joined: 20 Mar 05
Posts: 147
Credit: 1626747
RAC: 0

I'm running into the same thing, sort of. My host has 16 out of the last 18 units pending. It's gotten so bad that I dropped one of the cores and put it on another project until some of this is resolved. I understand that sometimes it takes a while for a wingman to catch up, but this is the first time I've ever seen it this bad. Hope it gets straightened out soon.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5875
Credit: 118471461551
RAC: 25984963

RE: .... It's gotten so bad

Message 85523 in response to message 85522

Quote:
.... It's gotten so bad that I dropped one of the cores and put it on another project until some of this is resolved....

I really don't understand why you think there is a "problem" when really it's just the way the locality scheduling system works. Eventually the scheduler will give this particular dataset to additional hosts and things will rapidly catch up. If you leave your cores the way they were, you'll get a nice credit boost when that happens.

Quote:
... Hope it gets straightened out soon.

Since there is nothing to "straighten out", your "hope" is bound to be dashed :-).

However, the scheduler will restore equilibrium in its own good time and with no intervention needed.

Cheers,
Gary.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5893653
RAC: 3

RE: I have no idea if that

Message 85524 in response to message 85521

Quote:
I have no idea if that (7 days) has anything to do with any of how the "locality scheduling" code works or if I've just never seen it go beyond a week, but I wouldn't start to worry unless they remain unsent for longer.


For those who don't know how it works, see Locality Scheduling.

Arion
Joined: 20 Mar 05
Posts: 147
Credit: 1626747
RAC: 0

RE: Since there is nothing

Message 85525 in response to message 85523

Quote:

Since there is nothing to "straighten out", your "hope" is bound to be dashed :-).

However, the scheduler will restore equilibrium in its own good time and with no intervention needed.

Is this something that was introduced with the new application? In the last 3 years I don't recall either having to wait an extended period of time for a WU to be sent out to a wingman, or a 5 to 7 day run of work not receiving credit within a reasonable period of time!

I know all about faster computers finishing first and having to wait for the slower ones, people joining the project, grabbing a WU or so and never showing back up, etc. But when you are used to seeing a consistent amount of credit showing up every day and then suddenly it's 1/3 or 1/4 of the normal amount, you start looking for answers. It was bad enough that credits were cut in half with the changeover, but now there's an added delay?

[edit]If this had started after Ike blew through I could understand there could be a lot of work out there that is going to be delayed until that 1 or 2 million people get power back and it was just my bad luck to get matched with some of them. Seeing how this started prior to that and only within the last week or so for me I was looking for answers to why there's such a delay.
[/edit]
