"Unsent" WUs

Martin P.

Joined: 17 Feb 05

Posts: 162

Credit: 40156217

RAC: 0

15 Sep 2008 12:06:00 UTC

Topic 193926

Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a

Winterknight

Joined: 4 Jun 05

Posts: 1515

Credit: 404918927

RAC: 515487

"Unsent" WUs

15 Sep 2008 12:51:23 UTC

Message 85516

Quote:

Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.

This happens when you have a fast computer and race ahead of the hosts that have been sent the same data packs as you.

Quote:

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a

I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

MarkJ

Joined: 28 Feb 08

Posts: 437

Credit: 139002861

RAC: 0

RE: Since Saturday all

15 Sep 2008 12:53:20 UTC

Message 85517

Quote:

Since Saturday all work-units that I return stay "Pending" because only one instance was sent out and therefore no minimum quorum can be reached.

The same problem shows with a few work-units from mid-August. On these my wingman missed the deadline but the work-unit was not re-distributed: e.g. name: h1_0605.20_S5R4__290_S5R4a, name: h1_0605.20_S5R4__291_S5R4a, name: h1_0605.20_S5R4__292_S5R4a

I'm no expert on this, but from looking at the links you've provided, another WU has been created but not yet allocated to another computer. I suspect this is "locality scheduling" at work: it only wants to give the task to a machine that already has the prerequisite files, if possible (after a while it gives up and just gives it to the first host to request a WU). How long it takes before giving up I'm not sure; maybe someone more knowledgeable can say.

Cheers
MarkJ

BOINC blog

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5885

Credit: 119113304331

RAC: 24198169

It's just the way the

15 Sep 2008 12:54:22 UTC

Message 85518

It's just the way the scheduler works.

The scheduler has to wait until a host with the correct data files requests more work before those "unsent" tasks actually get sent. If the wait becomes too long the scheduler will eventually decide to send the set of large data files to more hosts requesting work so that the "unsents" can be disposed of. In a worst case scenario this could take in the vicinity of 7 - 10 days.

The norm is usually a lot less than this, but it's all just the luck of the draw. The best thing is not to worry about it, as it will be properly sorted out in the end.

Cheers,
Gary.
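
The wait-then-fallback behaviour described above can be modelled in a few lines. The following is a toy Python sketch, not the BOINC scheduler source: the `Task`/`Host` classes, `pick_task`, and the `max_wait` threshold of 7 days are all hypothetical, and the data-file name `h1_0605.20` is only illustrative. It assumes a task is handed to any requesting host once it has been unsent for longer than `max_wait`; the real scheduler's give-up rule isn't spelled out in this thread.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    dataset: str      # the large data file this task needs (hypothetical name)
    unsent_at: float  # time, in days, when the task became "unsent"

@dataclass
class Host:
    name: str
    files: set = field(default_factory=set)  # data files already on the host

def pick_task(host, unsent, now, max_wait=7.0):
    """Toy locality scheduler: hypothetical rules, not BOINC's actual code."""
    # Prefer a task whose data file the host already has (no big download).
    for task in unsent:
        if task.dataset in host.files:
            return task
    # Otherwise give up on locality only for a task that has waited too long;
    # the host then has to download that task's large data file.
    for task in unsent:
        if now - task.unsent_at >= max_wait:
            host.files.add(task.dataset)
            return task
    return None  # nothing suitable: the task stays "unsent"

resend = Task("h1_0605.20_S5R4__290_S5R4a_2", "h1_0605.20", unsent_at=0.0)
other = Host("other", files={"h1_0700.00"})
print(pick_task(other, [resend], now=3.0))       # None: still waiting for a matching host
print(pick_task(other, [resend], now=8.0).name)  # handed out after max_wait has passed
```

The point of the model is that a pending result isn't lost: the wingman task simply waits until either a host with the right files asks for work or the (assumed) timeout lets the data be sent to a new host.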

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5885

Credit: 119113304331

RAC: 24198169

RE: I believe that when a

15 Sep 2008 13:03:12 UTC

Message 85519 in response to message 85516

Quote:

I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

No, if a host requests work for a particular set of data files and there are resends available, my experience is that you will get the resends before any new (i.e. _0 or _1) tasks. I often have machines requesting multiple tasks, and I tend to see _2 (or higher) tasks coming before new tasks.

Cheers,
Gary.
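
The ordering Gary reports can be written as a simple rule: among the unsent tasks for data files the host already holds, resends (replica suffix _2 or higher) go ahead of new _0/_1 tasks. This Python sketch models only that observed behaviour, not BOINC's implementation. It assumes task names end in `_<replica number>`, as the _0/_1/_2 suffixes mentioned here suggest; `replica_number`, `order_for_host`, and the task names other than the _290 one are hypothetical.

```python
def replica_number(task_name):
    """Return the trailing _N replica index, e.g. '..._S5R4a_2' -> 2."""
    return int(task_name.rsplit("_", 1)[1])

def order_for_host(host_files, unsent):
    """Order unsent (task_name, dataset) pairs for one host (hypothetical rule):
    keep only tasks whose data the host has, resends (replica >= 2) first."""
    local = [(name, ds) for name, ds in unsent if ds in host_files]
    # sorted() is stable, so tasks keep their queue order within each group;
    # the key is False for resends, so they sort ahead of _0/_1 tasks.
    return sorted(local, key=lambda t: replica_number(t[0]) < 2)

queue = [
    ("h1_0605.20_S5R4__300_S5R4a_0", "h1_0605.20"),
    ("h1_0605.20_S5R4__301_S5R4a_1", "h1_0605.20"),
    ("h1_0605.20_S5R4__290_S5R4a_2", "h1_0605.20"),  # resend, last in the queue
    ("h1_0700.00_S5R4__12_S5R4a_2", "h1_0700.00"),   # host lacks this data file
]
for name, _ in order_for_host({"h1_0605.20"}, queue):
    print(name)
# h1_0605.20_S5R4__290_S5R4a_2
# h1_0605.20_S5R4__300_S5R4a_0
# h1_0605.20_S5R4__301_S5R4a_1
```

Under this model a resend can sit at the end of the queue and still be sent first, but only to a host that already has its data file.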

Winterknight

Joined: 4 Jun 05

Posts: 1515

Credit: 404918927

RAC: 515487

RE: RE: I believe that

15 Sep 2008 14:52:35 UTC

Message 85520 in response to message 85519

Quote:

Quote:
I believe that when a task has to be resent, the re-sends are added to the end of the queue of tasks waiting to be sent.

No, if a host requests work for a particular set of data files and there are resends available, my experience is that you will get the resends before any new (i.e. _0 or _1) tasks. I often have machines requesting multiple tasks, and I tend to see _2 (or higher) tasks coming before new tasks.

You are probably right that the server will always try to find work for the data sets that the host already has, even if that means taking them from a long way down the queue. But I still think that BOINC server code adds the re-send units to the end of the queue. On Seti Beta I have a pending task that requires a re-send, where the first deadline was 20 Jul 2008 21:34:14 UTC.

Thunder

Joined: 18 Jan 05

Posts: 138

Credit: 46754541

RAC: 0

Martin, for what it's worth,

15 Sep 2008 16:11:13 UTC

Message 85521

Martin, for what it's worth, I've seen the same thing happen (units "unsent" while they wait for another computer to have the same data to work with), but in my anecdotal experience it has never gone beyond 7 days without the server finding another computer to send the data to.

I have no idea whether that (7 days) has anything to do with how the "locality scheduling" code works or whether I've just never seen it go beyond a week, but I wouldn't start to worry unless they remain unsent for longer.

[edit]Oddly enough though... after looking at my own hosts it's only my quad Xeon that also has a bunch of pending results with "unsent" mates. Coincidence? Perhaps...[/edit]

Arion

Joined: 20 Mar 05

Posts: 147

Credit: 1626747

RAC: 0

I'm running into the same

16 Sep 2008 3:14:49 UTC

Message 85522

I'm running into the same thing, sort of. My host has 16 of its last 18 units pending. It's gotten so bad that I dropped one of the cores and put it on another project until some of this is resolved. I understand that it sometimes takes a while for a wingman to catch up, but this is the first time I've ever seen it this bad. Hope it gets straightened out soon.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5885

Credit: 119113304331

RAC: 24198169

RE: .... It's gotten so bad

16 Sep 2008 7:45:54 UTC

Message 85523 in response to message 85522

Quote:

.... It's gotten so bad that I dropped one of the cores and put it on another project until some of this is resolved....

I really don't understand why you think there is a "problem" when really it's just the way the locality scheduling system works. Eventually the scheduler will give this particular dataset to additional hosts and things will rapidly catch up. If you leave your cores the way they were, you'll get a nice credit boost when that happens.

Quote:

... Hope it gets straightened out soon.

Since there is nothing to "straighten out", your "hope" is bound to be dashed :-).

However, the scheduler will restore equilibrium in its own good time and with no intervention needed.

Cheers,
Gary.

Jord

Joined: 26 Jan 05

Posts: 2952

Credit: 5893653

RAC: 0

RE: I have no idea if that

16 Sep 2008 8:33:33 UTC

Message 85524 in response to message 85521

Quote:

I have no idea whether that (7 days) has anything to do with how the "locality scheduling" code works or whether I've just never seen it go beyond a week, but I wouldn't start to worry unless they remain unsent for longer.

For those who don't know how it works, see Locality Scheduling.

Arion

Joined: 20 Mar 05

Posts: 147

Credit: 1626747

RAC: 0

RE: Since there is nothing

16 Sep 2008 9:52:44 UTC

Message 85525 in response to message 85523

Quote:

Since there is nothing to "straighten out", your "hope" is bound to be dashed :-).

However, the scheduler will restore equilibrium in its own good time and with no intervention needed.

Is this something that's been introduced with the new application? I don't recall, in the last 3 years, ever having to wait this long either for a WU to be sent out to a wingman or for a 5 to 7 day run of work to receive credit within a reasonable time!

I know all about faster computers finishing first and having to wait for the slower ones, people joining the project, grabbing a WU or so and never showing up again, etc. But when you are used to seeing a consistent amount of credit show up every day and then suddenly it's 1/3 or 1/4 of the normal amount, you start looking for answers. It was bad enough that credits were cut in half with the changeover, but now there's an added delay?

[edit]If this had started after Ike blew through, I could understand that a lot of work might be delayed until the 1 or 2 million people affected get power back, and that it was just my bad luck to be matched with some of them. Seeing as this started before that, and only within the last week or so for me, I was looking for answers as to why there's such a delay.
[/edit]

Forums › Cruncher's Corner