Help!! I want to abort tasks but E@H keeps resending them to me

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119316422941
RAC: 25613008
Topic 195768

I just reported 8 resent tasks manually with NNT selected by clicking the update button. The tasks reported fine, but it's downloading another 12 resent tasks even with NNT selected. Here's the log. The downloading goes on for miles as it fetches the necessary files. (We have an unlimited data cap.)

23/04/2011 3:19:10 p.m.	Einstein@Home	Sending scheduler request: Requested by user.
23/04/2011 3:19:10 p.m.	Einstein@Home	Reporting 8 completed tasks, not requesting new tasks
23/04/2011 3:19:21 p.m.	Einstein@Home	Scheduler request completed
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.40_S5R4__595_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1481.25_S5R4__552_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1490.95_S5R4__640_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.65_S5R4__577_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.65_S5R4__576_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1464.20_S5R4__461_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1441.25_S5R4__316_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__559_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__558_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1479.10_S5R4__633_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1447.90_S5R4__134_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__557_S5GC1HFa_0
23/04/2011 3:19:23 p.m.	Einstein@Home	Started download of skygrid_1480Hz_S5GC1.dat
23/04/2011 3:19:23 p.m.	Einstein@Home	Started download of h1_1478.40_S5R4
23/04/2011 3:20:08 p.m.	Einstein@Home	Finished download of skygrid_1480Hz_S5GC1.dat
23/04/2011 3:20:08 p.m.	Einstein@Home	Started download of h1_1478.40_S5R7
23/04/2011 3:20:14 p.m.	Einstein@Home	Finished download of h1_1478.40_S5R4
23/04/2011 3:20:14 p.m.	Einstein@Home	Started download of l1_1478.40_S5R4
23/04/2011 3:21:02 p.m.	Einstein@Home	Finished download of l1_1478.40_S5R4
23/04/2011 3:21:02 p.m.	Einstein@Home	Started download of l1_1478.40_S5R7
23/04/2011 3:21:21 p.m.	Einstein@Home	Finished download of h1_1478.40_S5R7
23/04/2011 3:21:21 p.m.	Einstein@Home	Started download of h1_1478.45_S5R4
23/04/2011 3:22:00 p.m.	Einstein@Home	Finished download of l1_1478.40_S5R7
23/04/2011 3:22:00 p.m.	Einstein@Home	Started download of h1_1478.45_S5R7
23/04/2011 3:22:13 p.m.	Einstein@Home	Finished download of h1_1478.45_S5R4
23/04/2011 3:22:13 p.m.	Einstein@Home	Started download of l1_1478.45_S5R4
23/04/2011 3:22:45 p.m.	Einstein@Home	Finished download of h1_1478.45_S5R7
23/04/2011 3:22:45 p.m.	Einstein@Home	Started download of l1_1478.45_S5R7
23/04/2011 3:22:51 p.m.	Einstein@Home	Finished download of l1_1478.45_S5R4
23/04/2011 3:22:51 p.m.	Einstein@Home	Started download of h1_1478.50_S5R4
23/04/2011 3:23:19 p.m.	Einstein@Home	Finished download of l1_1478.45_S5R7
23/04/2011 3:23:19 p.m.	Einstein@Home	Started download of h1_1478.50_S5R7


Is this meant to happen?

I'm using Win 7 64 Ultimate I7 980X boinc 6.10.58

Cheers,
Gary.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Help!! I want to abort tasks but E@H keeps resending them to me

Yes, resending lost tasks seems to be independent of NNT.

Gruß,
Gundolf

Computers aren't everything in life. (Just kidding)

Speedy
Joined: 11 Aug 05
Posts: 41
Credit: 24976846
RAC: 52476

RE: Yes, resending lost

Quote:

Yes, resending lost tasks seems to be independent of NNT.

Gruß,
Gundolf


Thanks. So does this mean that if I let BOINC automatically report tasks, I will get more of its own accord? Even if I don't want any more?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4349
Credit: 253194368
RAC: 40817

RE: Thanks so does this

Quote:
Thanks. So does this mean that if I let BOINC automatically report tasks, I will get more of its own accord? Even if I don't want any more?

No, to the contrary. You will get sent the tasks again that were not (yet) reported. If you don't want to run Einstein@home anymore, set "no new tasks", then abort the tasks you already got and update the project to report these. You may reset the project after that if you want.

BM


Speedy
Joined: 11 Aug 05
Posts: 41
Credit: 24976846
RAC: 52476

RE: RE: Thanks so does

Quote:
Quote:
Thanks. So does this mean that if I let BOINC automatically report tasks, I will get more of its own accord? Even if I don't want any more?

No, to the contrary. You will get sent the tasks again that were not (yet) reported. If you don't want to run Einstein@home anymore, set "no new tasks", then abort the tasks you already got and update the project to report these. You may reset the project after that if you want.

BM


Thanks I've aborted all tasks & detached E@H from BAM for the time being. I now have 19 tasks showing in progress sorry these will have to time themselves out unless admin (BM) can speed this along?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119316422941
RAC: 25613008

RE: I just reported 8

Quote:
I just reported 8 resent tasks manually with NNT selected by clicking the update button.


OK, because you are reporting your experiences in this particular thread, I'm assuming you are interested in understanding what is going on in order to assist in some way with the cleanup of the remaining tasks in the old run, particularly any resends. I'll try to explain it all as best I can. Please note that these comments and explanations are NOT designed for people who don't want to micromanage and just want a 'set and forget' experience. If you're in that category, please ignore what follows. Please don't take ANY action if what follows is not perfectly clear to you. BOINC will handle it automatically, but rather inefficiently, if left alone.

Your log snippet starts with the reporting of completed tasks but it doesn't say what sort of tasks were reported. I very much doubt you are reporting resent tasks but you could be reporting resends or primary tasks - you can't tell without looking at the actual task names.

Quote:
23/04/2011 3:19:10 p.m.	Einstein@Home	Sending scheduler request: Requested by user.
23/04/2011 3:19:10 p.m.	Einstein@Home	Reporting 8 completed tasks, not requesting new tasks
23/04/2011 3:19:21 p.m.	Einstein@Home	Scheduler request completed


I think you may be confusing what the scheduler is calling a "lost task" that is being "resent" with what I call a "resend" task. When tasks are first issued, I call them "primary" tasks. They always have an _0 or _1 suffix. If a primary task fails for whatever reason, the scheduler (when it becomes aware of this failure) will issue a further copy of the task. I call this extra copy a "resend" and you can always distinguish these from primary tasks because they have suffixes like _2, _3, _4, etc, as many as are required to eventually complete the quorum - with a max limit of 20 tasks.
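As an illustration only, the suffix rule described above can be written as a few lines of Python. The helper name `classify_task` is made up for this sketch and is not part of BOINC; the second task name in the example is also invented, just to show a resend suffix.

```python
import re

# Hypothetical helper, not a BOINC API: it only mirrors the naming rule
# described in the post (_0/_1 = primary, _2 and above = resend).
_SUFFIX = re.compile(r"_(\d+)$")


def classify_task(name: str) -> str:
    """Return 'primary' for an _0 or _1 suffix, 'resend' for _2 and above."""
    match = _SUFFIX.search(name)
    if match is None:
        raise ValueError(f"no numeric suffix in task name: {name!r}")
    return "primary" if int(match.group(1)) <= 1 else "resend"


# A task name taken from the log in this thread.
print(classify_task("h1_1478.40_S5R4__595_S5GC1HFa_1"))  # primary
# An invented name with a _2 suffix, for illustration.
print(classify_task("h1_1478.40_S5R4__595_S5GC1HFa_2"))  # resend
```

Note that a "Resent lost task" message from the server says nothing about which category the task is in; only the suffix does.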

The snippet above simply shows that your client reported 8 tasks and wasn't asking for any new ones - after all you did have NNT set. You would have to check the suffix on each of those tasks to see if any were resends and it really doesn't matter because it's not connected with what comes next.

Quote:
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.40_S5R4__595_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1481.25_S5R4__552_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1490.95_S5R4__640_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.65_S5R4__577_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1478.65_S5R4__576_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1464.20_S5R4__461_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1441.25_S5R4__316_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__559_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__558_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1479.10_S5R4__633_S5GC1HFa_1
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1447.90_S5R4__134_S5GC1HFa_0
23/04/2011 3:19:21 p.m.	Einstein@Home	Message from server: Resent lost task h1_1484.80_S5R4__557_S5GC1HFa_0


This snippet tells you there was a discrepancy between the scheduler's idea of things and your client's idea of things. There are (at least) 12 tasks (and they are all primary tasks - check for yourself) that the scheduler thinks you should have and that your client doesn't have. As you could imagine, it's extremely important that both sides of the client/server relationship should have identical views of the world. Your client has sent a list of the tasks it has and the server thinks that the client should have more. So the scheduler (irrespective of your NNT setting) will force the client to take these extra tasks (the "lost" tasks) so that each side will agree. It will do this 12 at a time, so it could well be that you will be sent more on the next exchange between client and scheduler - if there are any more lost tasks.
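The reconciliation described above amounts to a set difference, capped at one batch per contact. The following is a rough sketch of that idea, not the actual scheduler code: the function name `lost_tasks` is hypothetical, and the batch size of 12 is simply what the log in this thread shows.

```python
# Batch size as observed in the log above; an assumption, not a value
# read from the server source.
RESEND_BATCH = 12


def lost_tasks(server_assigned, client_reported, batch=RESEND_BATCH):
    """Tasks the server has on record for this host that the client did not list.

    There is deliberately no NNT parameter: as explained in the post,
    resending lost tasks does not depend on that setting.
    """
    have = set(client_reported)
    missing = [task for task in server_assigned if task not in have]
    return missing[:batch]


# Names taken from the log; which ones the client actually held is made up.
server_side = [
    "h1_1478.40_S5R4__595_S5GC1HFa_1",
    "h1_1481.25_S5R4__552_S5GC1HFa_1",
    "h1_1490.95_S5R4__640_S5GC1HFa_0",
]
client_side = ["h1_1490.95_S5R4__640_S5GC1HFa_0"]
print(lost_tasks(server_side, client_side))
```

If more than one batch is missing, the remainder would show up on later contacts, which matches the "12 at a time" behaviour.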

So how do tasks become lost? The most common reason is simply bad luck. If your client happens to ask for work at a time when the server is under heavy load, it may not receive a reply within a timeout interval. The client stops waiting and reports (on the messages tab of BOINC Manager) a problem talking to the server. It will usually retry after a further interval, but this could easily have been prevented if NNT had been set in the interim. In the meantime, the server would have eventually got around to answering your request and would have issued tasks and recorded that fact in the database. Your client is no longer listening so it wouldn't have received the tasks. So the discrepancy now exists. This really isn't much of an issue because the server will notice and make good the discrepancy on the very next contact and your client can't refuse the lost tasks even if NNT is set. So agreement is going to be restored eventually.

Quote:
23/04/2011 3:19:23 p.m.	Einstein@Home	Started download of skygrid_1480Hz_S5GC1.dat
23/04/2011 3:19:23 p.m.	Einstein@Home	Started download of h1_1478.40_S5R4
23/04/2011 3:20:08 p.m.	Einstein@Home	Finished download of skygrid_1480Hz_S5GC1.dat
23/04/2011 3:20:08 p.m.	Einstein@Home	Started download of h1_1478.40_S5R7
23/04/2011 3:20:14 p.m.	Einstein@Home	Finished download of h1_1478.40_S5R4
23/04/2011 3:20:14 p.m.	Einstein@Home	Started download of l1_1478.40_S5R4
23/04/2011 3:21:02 p.m.	Einstein@Home	Finished download of l1_1478.40_S5R4
23/04/2011 3:21:02 p.m.	Einstein@Home	Started download of l1_1478.40_S5R7
23/04/2011 3:21:21 p.m.	Einstein@Home	Finished download of h1_1478.40_S5R7
23/04/2011 3:21:21 p.m.	Einstein@Home	Started download of h1_1478.45_S5R4
23/04/2011 3:22:00 p.m.	Einstein@Home	Finished download of l1_1478.40_S5R7
23/04/2011 3:22:00 p.m.	Einstein@Home	Started download of h1_1478.45_S5R7
23/04/2011 3:22:13 p.m.	Einstein@Home	Finished download of h1_1478.45_S5R4
23/04/2011 3:22:13 p.m.	Einstein@Home	Started download of l1_1478.45_S5R4
23/04/2011 3:22:45 p.m.	Einstein@Home	Finished download of h1_1478.45_S5R7
23/04/2011 3:22:45 p.m.	Einstein@Home	Started download of l1_1478.45_S5R7
23/04/2011 3:22:51 p.m.	Einstein@Home	Finished download of l1_1478.45_S5R4
23/04/2011 3:22:51 p.m.	Einstein@Home	Started download of h1_1478.50_S5R4
23/04/2011 3:23:19 p.m.	Einstein@Home	Finished download of l1_1478.45_S5R7
23/04/2011 3:23:19 p.m.	Einstein@Home	Started download of h1_1478.50_S5R7


There are a number of things in the above snippet that are important to understand. Firstly, these are not tasks that are being downloaded. They are data files, namely skygrid files and lots of LIGO data files - in fact all the data needed to support the 12 lost tasks that the scheduler had just resent to you. Secondly, there would have been about 2GB of data which would have taken quite a long time to download - and it was for just 12 tasks! Is it any wonder I'm on a crusade to improve this? Take a good look at all the different and unrelated frequencies being sent to you. In other words, check the frequency value included as part of the task name of each lost task being resent. The scheduler is not very smart in the way it assigns work - particularly from now on with so little work left in the current run.

Earlier on in this thread I discussed these sorts of issues with Oliver Bock, who (in time) will be able to fix the problems. This was a couple of weeks ago and there were more tasks available then. What happened to you is exactly what I described to Oliver in this message that I posted on April 7. Here's the critical bit.

Quote:

Imagine a client has LIGO data for just a small number of frequency bands - let's say 1430.20Hz to 1430.40Hz - that's just 5 bands. It has blocks for all of these only. A request for a single task will most likely result in a 1430.20Hz task being issued (assume availability is OK) and all the extra LIGO files from 1430.45Hz and above (it would actually be 28 files covering 1430.45Hz to 1430.75Hz) will be downloaded and all the blocks for these will be added to the state file. No problem with any of this.

For behaviour comparison purposes, assume the work request was much larger (say 20 tasks - a full day was added to the cache) with everything else the same. I do this quite a lot. From my experience, around 2 tasks per frequency band will be issued for the 5 available frequency bands - for simplicity let's assume it is exactly 2 so that 10 of the 20 requested will be for frequencies from 1430.20Hz to 1430.40Hz. It's what happens for the further 10 tasks needed to fill the request that's the interesting bit. Before explaining that I should also mention that there will be quite a few more blocks added to the state file this time - 44 blocks covering the range from 1430.45Hz to 1430.95Hz.

The remaining 10 tasks will be supplied after jumping to a completely different frequency. Even though 48 LIGO files will need to be downloaded for a single frequency band, the scheduler will not take advantage of that and will not choose more tasks from immediately adjacent frequency bands. Quite often, the new frequency band contains a single task (the scheduler loves to use the opportunity to get rid of a single resend). I can remember one example where the extra 10 tasks were going to be supplied from 7 completely new frequency sets - around 350 new LIGO files in total.

Note in the last paragraph above, I commented that 10 tasks would be supplied from about 7 different frequency sets. In your case you got about 12 tasks from about 9 different frequency sets. This is the sort of inefficiency that I'm concerned about and (hopefully) Oliver will vastly improve.
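To check the spread for yourself, the frequency can be pulled out of each resent task name from the log. This is a small sketch that assumes the frequency is always the second underscore-separated field, which holds for every name in this thread. It counts distinct frequency values, which is only an upper bound on the number of frequency sets, since neighbouring frequencies can share data files.

```python
# The 12 "Resent lost task" names from the log in this thread.
LOST = [
    "h1_1478.40_S5R4__595_S5GC1HFa_1",
    "h1_1481.25_S5R4__552_S5GC1HFa_1",
    "h1_1490.95_S5R4__640_S5GC1HFa_0",
    "h1_1478.65_S5R4__577_S5GC1HFa_0",
    "h1_1478.65_S5R4__576_S5GC1HFa_0",
    "h1_1464.20_S5R4__461_S5GC1HFa_0",
    "h1_1441.25_S5R4__316_S5GC1HFa_0",
    "h1_1484.80_S5R4__559_S5GC1HFa_1",
    "h1_1484.80_S5R4__558_S5GC1HFa_1",
    "h1_1479.10_S5R4__633_S5GC1HFa_1",
    "h1_1447.90_S5R4__134_S5GC1HFa_0",
    "h1_1484.80_S5R4__557_S5GC1HFa_0",
]


def frequency(task_name: str) -> str:
    """Return the frequency field (in Hz, as text) from a task name."""
    # Assumption: the second underscore-separated field is the frequency.
    return task_name.split("_")[1]


distinct = sorted({frequency(name) for name in LOST}, key=float)
print(len(LOST), "tasks across", len(distinct), "frequencies")
# prints: 12 tasks across 9 frequencies
print(distinct)
```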

Any participant can make a big improvement by their own actions. To prevent this unnecessary jumping to multiple frequency sets, just make sure you don't ask for a big number of tasks in one single hit if you don't have the necessary blocks already in your state file to support it. Always make it your business to know how many frequency bands you have available above the particular frequency that your most recent task was for. If you know you don't have many, just follow the next procedure.

It's a bit painful to do, but if you want 20 tasks, start by asking for just 1 (e.g. work out what cache size you need to get just a single extra task). In the worst case there may be no tasks for your current frequency set and you will get a frequency jump to a new set along with either 48 or 52 LIGO files. Once you have them all, you could then ask for at least 10 more tasks without risking another frequency jump. The scheduler can now see all the 12 or 13 frequency bands you have and will supply tasks for these bands without doing a complete frequency jump. You will download 4 extra LIGO files for each 0.05Hz frequency shift to the next band but this is far less than what happens with a jump to a different frequency set. When you get those 10 tasks (and the extra LIGO data as described) you can keep asking for 10 (or even more) at a time and keep getting more tasks from the same frequency bands with perhaps the occasional server delete request and the 4 extra LIGO files.

The rule to apply is, "Don't ask for more tasks than what could be supplied for the frequency bands for which you already have blocks in your state file." You can estimate this by allowing say 2 tasks per frequency band. So if you've just downloaded a single task, you would have at least 12 current frequency bands in your state file and you could estimate that quite a lot of extra tasks could be asked for next time.

If you're keen enough you can also remove the tags from the state file to get more tasks from the bands that have been marked prematurely for deletion.
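The file counts quoted in this post all follow from two figures given above: 4 LIGO files per 0.05Hz band, and roughly 2 tasks per band. The sketch below reproduces that arithmetic. The function names are made up for illustration, and the 2 tasks per band figure is only a rule of thumb, not a scheduler guarantee.

```python
FILES_PER_BAND = 4    # "4 extra LIGO files for each 0.05Hz frequency shift"
TASKS_PER_BAND = 2    # rough estimate from the post, not a fixed value
BAND_STEP_HZ = 0.05


def bands_between(low_hz: float, high_hz: float) -> int:
    """Number of 0.05Hz bands from low_hz to high_hz, both ends included."""
    return round((high_hz - low_hz) / BAND_STEP_HZ) + 1


def ligo_files(bands: int) -> int:
    """LIGO data files needed to cover the given number of bands."""
    return bands * FILES_PER_BAND


def safe_request(bands: int) -> int:
    """Rough number of tasks that could be supplied without a frequency jump."""
    return bands * TASKS_PER_BAND


print(ligo_files(bands_between(1430.45, 1430.75)))  # 28
print(ligo_files(bands_between(1430.45, 1430.95)))  # 44
print(ligo_files(12), ligo_files(13))               # 48 52
print(safe_request(12))                             # 24
```

So with the 12 bands that arrive with a single task, the estimate comes out at roughly two dozen tasks, which is why asking for 10 at a time afterwards should be comfortably within what those bands can supply.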

Quote:
Is this meant to happen?


Yes, everything in that log snippet is compatible with the way I know that the scheduler works.

Cheers,
Gary.

Speedy
Joined: 11 Aug 05
Posts: 41
Credit: 24976846
RAC: 52476

Thanks Gary. All tasks

Thanks Gary. All tasks returned on the 23rd were the 8 tasks I was referring to in my earlier post. I still can't understand how you can tell E@H to work in a radio band, i.e. 1430Hz to 1440Hz. If you are going to explain, you'll need to put it step by step so I can understand. As I said in an earlier post, I've detached from E@H. How long do you think the resends will be around for?

Thanks for taking time to explain.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

RE: Thanks I've aborted all

Quote:
Thanks I've aborted all tasks & detached E@H from BAM for the time being. I now have 19 tasks showing in progress sorry these will have to time themselves out unless admin (BM) can speed this along?


Did you report the aborted tasks before detaching? If not, they'll just be resent to you when you re-attach to Einstein@home.

If you want to speed this along yourself, re-attach to Einstein, set NNT and accept resent lost tasks. Then abort and report them. Don't forget to abort the download of the data files.

Rinse and repeat until your Tasks (and Transfers) tab show no more Einstein tasks and your online task list has no more "in progress" tasks. ;-)

Gruß,
Gundolf

Computers aren't everything in life. (Just kidding)

Speedy
Joined: 11 Aug 05
Posts: 41
Credit: 24976846
RAC: 52476

RE: RE: Thanks I've

Quote:
Quote:
Thanks I've aborted all tasks & detached E@H from BAM for the time being. I now have 19 tasks showing in progress sorry these will have to time themselves out unless admin (BM) can speed this along?

Did you report the aborted tasks before detaching? If not, they'll just be resent to you when you re-attach to Einstein@home.

If you want to speed this along yourself, re-attach to Einstein, set NNT and accept resent lost tasks. Then abort and report them. Don't forget to abort the download of the data files.

Rinse and repeat until your Tasks (and Transfers) tab show no more Einstein tasks and your online task list has no more "in progress" tasks. ;-)

Gruß,
Gundolf


Thanks I've aborted all my tasks & detached. Thanks again for the help

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2830707
RAC: 2876

RE: RE: RE: Thanks I've

Quote:
Quote:
Quote:
Thanks I've aborted all tasks & detached E@H from BAM for the time being. I now have 19 tasks showing in progress sorry these will have to time themselves out unless admin (BM) can speed this along?

Did you report the aborted tasks before detaching? If not, they'll just be resent to you when you re-attach to Einstein@home.

If you want to speed this along yourself, re-attach to Einstein, set NNT and accept resent lost tasks. Then abort and report them. Don't forget to abort the download of the data files.

Rinse and repeat until your Tasks (and Transfers) tab show no more Einstein tasks and your online task list has no more "in progress" tasks. ;-)

Gruß,
Gundolf


Thanks I've aborted all my tasks & detached. Thanks again for the help

But did you report them Before you detached?

Claggy

Speedy
Joined: 11 Aug 05
Posts: 41
Credit: 24976846
RAC: 52476

RE: RE: RE: RE: Thank

Quote:
Quote:
Quote:
Quote:
Thanks I've aborted all tasks & detached E@H from BAM for the time being. I now have 19 tasks showing in progress sorry these will have to time themselves out unless admin (BM) can speed this along?

Did you report the aborted tasks before detaching? If not, they'll just be resent to you when you re-attach to Einstein@home.

If you want to speed this along yourself, re-attach to Einstein, set NNT and accept resent lost tasks. Then abort and report them. Don't forget to abort the download of the data files.

Rinse and repeat until your Tasks (and Transfers) tab show no more Einstein tasks and your online task list has no more "in progress" tasks. ;-)

Gruß,
Gundolf


Thanks I've aborted all my tasks & detached. Thanks again for the help

But did you report them Before you detached?

Claggy


Yes, they show as "Aborted by user" in my account. Sorry for any misunderstanding.
