Pending Tasks

jon b.
Joined: 30 Jun 09
Posts: 7
Credit: 423140599
RAC: 911381
Topic 196401

When I looked at the pending tasks for computer 5429322, I noticed that the work unit details show the same thing for all of the tasks (with the exception of the Work unit ID and time created). Some of the tasks have been pending for one week. Thanks.

archae86
Joined: 6 Dec 05
Posts: 3163
Credit: 7354721687
RAC: 2252141

Pending Tasks

What question do you wish to ask?

I'll hazard a guess that your abundant experience on other projects has left you unprepared for how much less random your quorum partners are, in the short and medium term, on the Einstein work that uses locality scheduling. This scheme cuts communication traffic (and host disk space usage) by sending a host new tasks that are such close relatives of tasks it already has that the huge data files don't need to be sent again. (Some like to describe this as "carving" a new task out of the existing data files.)
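
To make that idea concrete, here's a little toy sketch in Python (purely illustrative - the data structures and function names are made up, not the actual Einstein@Home scheduler code):

# Toy model of locality scheduling. Each GW data set is identified by a
# frequency; the scheduler prefers a requesting host that already holds
# the large files for that frequency, so nothing big needs to be sent.
host_files = {
    "host_A": {"0369.80"},
    "host_B": {"0412.55"},
}

def pick_host(task_frequency, requesting_hosts):
    for host in requesting_hosts:
        if task_frequency in host_files.get(host, set()):
            return host, "no large download needed"
    # No suitably endowed host is asking - fall back and ship the files.
    return requesting_hosts[0], "large data files must be downloaded"

print(pick_host("0369.80", ["host_B", "host_A"]))
# ('host_A', 'no large download needed')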

This means you are far more likely on gravity wave work here than elsewhere to have the same quorum partners (a.k.a. wingmen) turn up repeatedly, especially in the short term. So if a quorum partner with whom you share many work units has a machine problem and returns a lot of bad work, or quits the project, or goes on extended vacation, or aborts current work for any of many reasons, you'll get a cluster of delayed validations. Because the system will wait a while for someone with the right files to request work after the first partner errors out, the resend needed to get you a new wingman can also be delayed by much more than is usual elsewhere. (I've seen days, though not weeks, but I've not looked very hard.)
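
And the validation side of it, again just a rough sketch (not real BOINC server code) of why one failed wingman delays a whole cluster of your results:

MIN_QUORUM = 2  # a GW workunit typically validates once two results agree

def workunit_state(results):
    # results: outcomes of the copies sent out for one workunit
    if results.count("ok") >= MIN_QUORUM:
        return "validated - credit granted"
    if "error" in results:
        return "resend created - waiting for a host with the right files"
    return "pending - waiting for the existing wingman"

print(workunit_state(["ok", "ok"]))       # validated - credit granted
print(workunit_state(["ok", "error"]))    # resend created - waiting ...
print(workunit_state(["ok", "pending"]))  # pending - waiting ...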

It works out in the end, and I, personally, don't see nearly so long a tail of validation delay here as at SETI.

If none of that got anywhere near answering your actual question, please try an alternate expression of it.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119166294008
RAC: 24615239

RE: When I looked at the

Quote:
When I looked at the pending tasks for computer 5429322, I noticed that the work unit details show the same thing for all of the tasks ....


Yes, and there are good reasons for that if you consider how locality scheduling works. Archae86 has described this correctly, but I thought I'd add some more detail to fully explain why your first batch of 12 tasks looks so identical when you drill down into each work unit ID and see pretty much the same pattern of 4 quorum members in each and every case. You would think this would be extremely rare but, with good reason, it's actually quite common. I've seen this sort of thing a few times before.

To understand what is going on, it's helpful to view your list of tasks using the task names view. You can then see that every single GW task in the list is for the frequency 0369.80. Your computer only joined the project on June 6th, and locality scheduling has worked perfectly in giving you a series of tasks for the same set of large data files, which has eliminated the need for further large data file downloads. Your most recent GW task has a 'sequence number' of __1248. This number will eventually count down to zero, so there are lots more tasks to come for this data set. __1248 is actually a task you aborted - probably out of frustration with all the previous pending results - but if you click on the WUID you can see two things. Firstly, your 'wingman' completed the task later the same day you aborted it, so if you had allowed it to crunch you would most likely have got very rapid credit. Secondly, a replacement task remains unsent at the time of writing, so your wingman is now waiting.
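
If you're curious what those parts of the name mean, here's a rough illustration in Python (the exact naming scheme varies between search runs, so the example name below is made up, not one of your actual tasks):

# Hypothetical GW task name in the general style described above:
# frequency near the front, sequence number and issue suffix at the end.
task_name = "h1_0369.80_S6GC1__example_1248_2"

parts = task_name.split("_")
frequency = parts[1]         # "0369.80" - identifies the large data file set
sequence  = int(parts[-2])   # 1248      - counts down towards zero over time
issue     = int(parts[-1])   # 2         - which copy of the workunit this is

print(frequency, sequence, issue)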

There's no criticism intended in that observation - you are quite free to abort whatever tasks you wish. However, because the scheduler tends to take a while to select a host that already has the correct large data files, your wingman will be waiting a little longer for a new host to be found, and then for the task to actually be crunched and returned. So when locality scheduling is in play (all GW tasks), the 'kindest' action is to crunch what you receive, if possible.

When your host first requested work (June 7th), it must have asked for quite a lot, because it was assigned 12 tasks in one hit. The scheduler loves large initial work requests because they're a golden opportunity to 'unload' a whole bunch of other people's 'rejects'. If the scheduler has been accumulating related tasks that need to be resent (for whatever reason), it will eventually give up waiting for a suitably endowed host and hand them out to a new host asking for a lot of work. I have seen this happen to my new hosts over the years (multiple resend tasks for multiple data frequencies if the request is large), so much so that I always set the work cache size to a low value before allowing a new host to make its first request - just in case. I don't have a problem with resends as such; the risk is getting several unrelated frequency sets requiring a huge number of large data file downloads. I believe the devs have fixed this in more recent times, but old habits die hard :-).
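
For anyone wanting to do the same, the work cache is controlled by two values in BOINC's computing preferences; in a local global_prefs_override.xml they look like this (the numbers are just example values for a small cache):

<global_preferences>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>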

So your initial work request resulted in 12 resend tasks, all for the same frequency. You can tell these are resends rather than primary tasks because the suffix on the name is not _0 or _1. The primary tasks were actually issued back on the 17th and 24th of May. In each case, both primary tasks exceeded the deadline (on one host the tasks were actually aborted well after the deadline) and so had to be resent to two new hosts - yours and a fourth host. Unfortunately, there are quite a few new hosts that download a bunch of work and then never return it. The delays in reissuing tasks can leave quite a few pendings sitting in your tasks list for some time. The good news is that your new wingman is steadily completing these (and other) tasks, so if that continues you shouldn't have much longer to wait.
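
Following that suffix rule, a quick way of spotting them (again just an illustration, using the same made-up name as above):

# The trailing _N on a task name is the issue number: _0 and _1 are the
# primary copies sent out first, anything higher is a resend of a copy
# that failed or missed its deadline.
def is_resend(task_name):
    return int(task_name.rsplit("_", 1)[1]) >= 2

print(is_resend("h1_0369.80_S6GC1__example_1248_0"))  # False - primary
print(is_resend("h1_0369.80_S6GC1__example_1248_3"))  # True  - resend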

Cheers,
Gary.
