Task time estimates and sent tasks

grn
grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0
Topic 197845

I am wondering why the time estimates given for tasks are so badly out, particularly if the time estimate is used to define the number of tasks downloaded and the deadline. On one system I have two tasks currently running with an estimated time of over 22 hours. One of these reports it is 86% complete with a remaining time of just over 3 hours, but this value barely changes: the run time increases by minutes while the remaining time reduces by a few seconds at best. All of these tasks have taken well over the estimate, resulting in each task taking over a day to complete. I currently have 20 of these tasks still to run, so they will take, at best, 10 days to complete, but the deadline set for them is 8th December - clearly quite a few will not complete by the deadline.

On another system I have 8 binary pulsar searches running, estimated at almost 16 hours, and they are taking over 20 hours to complete. I have 31 of these still to run with a deadline of 3rd December, so again many of these will not complete before the deadline.

On the same system I have a Perseus Arm Survey task running on the GPU, estimated at over 22 hours, with a deadline of 9th December. These are estimated at just under 6 hours, and the one currently running is 91% complete but has a run time currently over 14 hours, with 20 minutes still estimated to run. I have 27 of these tasks still to run and, if they all follow the same pattern, many of them will not run before the deadline.

Any suggestions as to what I should do?

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7228958230
RAC: 1134786

Task time estimates and sent tasks

Quote:
Any suggestions as to what I should do?


For people who run their systems constantly, or at least a reproducible fraction of the day, and who run just one kind of work, the estimates converge to reality reasonably quickly.

But they don't start out right for anyone. So newcomers, especially, are well advised to set their work request to a small fraction of a day. That way you won't miss deadlines, be troubled by High Priority mode, and such.

The system does not converge so well when multiple task types are involved, especially when they run on the same resource.

So, my suggestions as to what you, personally, should do.

1. If you have a local preference set on a host (which if present takes precedence over website requested preferences), adjust the minimum work buffer and maximum additional work buffer parameters to well under a day. Try 0.2, 0.1 for a start (a sketch of the file these settings live in follows this list).

2. If you do not have a local preference set then go to your user account page on Einstein, click the Computing Preferences link, and adjust the "Computer is connected to the Internet about every" number to 0.2 days, and the "Maintain enough work for an additional" to 0.1 days. You should do this separately for all four locations (aka venues) unless you are sure one is not in use in your fleet.

3. I suggest you restrict the work types Einstein sends to your hosts to just one CPU task type and one GPU task type. To do this, go to your Einstein user account page, select the Einstein preferences link, and for each location you are using de-select the other three application types, leaving only Perseus Arm Survey and Gamma-ray pulsar search #4 enabled.
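
If you use local preferences, BOINC Manager writes those numbers to global_prefs_override.xml in the BOINC data directory. A minimal sketch of that file with just the two work-buffer elements is shown below - the element names are the ones used by recent BOINC clients, but check your own copy before editing:

<global_preferences>
   <work_buf_min_days>0.2</work_buf_min_days>
   <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>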

It will take time for the over-fetching you have induced to clear, but when it does these settings will keep you supplied with work, and estimates will converge with reality fairly quickly.

Later, you may find reason to enable additional application types or to request a deeper queue of work, but here at Einstein I think you'll find these settings to work well until one of the enabled applications finishes the available data.

grn
grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0

Thanks for the reply. My

Thanks for the reply. My systems run all day and, since I joined the Einstein@Home project, they have been running its tasks to the exclusion of the other projects I subscribe to. The only preference I have set for this project is to limit new work to a maximum of two days, but this is clearly ignored by BOINC Manager. I have tried suspending tasks to allow other projects in but, fortunately, SETI@Home has run out of work, and the other two projects I subscribe to have long-running tasks with a long deadline (February next year), so the number of tasks they have sent can easily be completed by the deadline.

I have been sent tasks only for the Perseus Arm Survey and the Binary Pulsar search and this has been running constantly on my systems since subscribing to the project (about 2 weeks ago), so I think estimates should have converged by now. In fact the initial batches I got of the Perseus Arm Survey finished in about half the estimated time but this latest batch is taking almost 3 times the estimated time; the first one is still running, is 95% complete with almost 15 hours elapsed.

I have my preferences set to swap between tasks every 60 minutes but see no evidence of this happening. The only way I have found to let tasks from other projects run is to suspend this project (I have already disallowed new work as, if I allow more work, I get sent new tasks daily). It seems to me that BOINC Manager isn't handling the workload correctly.

That said, it did abort all the outstanding tasks for the 3rd and one of the running tasks has just completed and a long running task from another project has started. That task seems to be increasing its run time and decreasing its remaining time at about the same rate, which is almost the same behaviour I have seen with the other projects I'm subscribed to. Only this project seems to underestimate the time, and remaining time decreases substantially more slowly than run time increases. I have frequently seen the remaining time increase in line with the percentage completion increasing and, of course, the run time increasing.

archae86
archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7228958230
RAC: 1134786

BoincMgr, of course, is not

BoincMgr, of course, is not an Einstein piece of software.

If you find it is not observing your preferences, the simplest explanation would be that, despite your intentions, you have not communicated them properly to it.

One simple way for this to happen would be if you are making your preference wishes known using web site input, but have already blanked out that source of control by specifying preferences directly on your PC, using the Tools | Computing preferences... entry.

Another simple way for this to happen is if you specify on the web site preferences for a location (aka venue) different than the one your host actually currently occupies.

It seems unlikely to be productive for me to discuss your other concerns until the quite basic matter of applying preferences is resolved.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117759262069
RAC: 34780510

Hi George, welcome to

Hi George, welcome to Einstein@Home.

I agree with archae86 when he says that you may not have various preferences set correctly for this project. I'll also comment on a couple of other points that you make.

Quote:
... The only preference I have set for this project is to limit new work to a maximum of two days, but this is clearly ignored by BOINC Manager.


Is your "two days" the total of the two setting values that control work fetch? Are you using website preferences or have you set them locally within BOINC Manager? What values are you using for each setting? I can assure you that whatever they really are, they aren't being ignored.

Quote:
I have been sent tasks only for the Perseus Arm Survey and the Binary Pulsar search and this has been running constantly on my systems since subscribing to the project (about 2 weeks ago), so I think estimates should have converged by now.


There are three separate pulsar searches currently going on. There is BRP5 (Binary Radio Pulsar search #5, aka Perseus Arm Survey) which is a GPU only search. It's using data from the Parkes Radio Telescope in Australia. There is BRP4 (Binary Radio Pulsar search #4) which uses data from the Arecibo Radio Telescope. The "G" version of this search is for the more powerful external GPUs like yours. You don't seem to have any of those tasks. The third search is FGRP4 (Fermi Gamma Ray Pulsar search #4) and it is using data gathered by the Fermi space telescope. This search is currently a 'CPU only' search. Radio pulsar searches and the gamma ray pulsar search are looking for quite different signals.

You appear to be running BRP5 on your GPU and FGRP4 on your CPU cores. Because E@H uses highly customised server code based on 'old' BOINC server code, it still uses a single (per project) DCF (Duration Correction Factor) for correcting run time estimates. This means that when you are running different searches within the one project, the estimates will never really converge. They will tend to bounce around a bit if the different searches have different 'accuracies' built into their tasks, as they tend to do. There are a lot of different factors that make it impossible to remove all the oscillations.
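
To see why a single shared DCF never settles, here is a rough sketch in Python - a deliberately simplified model, not the client's actual algorithm, and the runtimes are only illustrative - of one DCF being pulled in opposite directions by a GPU search that beats its raw estimate and a CPU search that exceeds it:

# Simplified model of a single per-project Duration Correction Factor
# shared by two app types (illustrative numbers, not real task data).
def update_dcf(dcf, actual, raw_estimate):
    ratio = actual / raw_estimate
    if ratio > dcf:
        return ratio                      # overruns pull the DCF up quickly
    return dcf + 0.1 * (ratio - dcf)      # underruns pull it down only slowly

dcf = 1.0
apps = {"GPU search": (6.0, 2.7), "CPU search": (10.0, 15.0)}  # (raw estimate, actual) in hours

for cycle in range(3):
    for name, (raw_est, actual) in apps.items():
        shown = raw_est * dcf             # the estimate the user sees
        dcf = update_dcf(dcf, actual, raw_est)
        print(f"{name}: shown {shown:.1f} h, took {actual:.1f} h, DCF now {dcf:.2f}")

Each completed CPU task drags the shared DCF up and each completed GPU task nudges it back down, so the displayed estimates for both searches keep moving instead of converging.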

Despite all this, it is possible to craft your preferences in such a way as to give you a 'set and forget' operation. The key is to keep your cache settings relatively small. Personally, I keep a 3 day supply. I use a 'low water mark' of 3.0 days and an 'extra days' setting of 0.01.

Quote:
In fact the initial batches I got of the Perseus Arm Survey finished in about half the estimated time but this latest batch is taking almost 3 times the estimated time; the first one is still running, is 95% complete with almost 15 hours elapsed.


I really don't understand what you are saying here. Perseus Arm Survey (BRP5) is GPU only so on your machine with the GTX670 I can see 76 of these of which 49 are completed and only 27 are 'in progress'. For 48 out of the 49 completed tasks, the run time has been between 9500 and 10000 seconds. Are you perhaps referring to FGRP4 (Gamma Ray Pulsar) tasks?

Quote:
I have my preferences set to swap between tasks every 60 minutes but see no evidence of this happening.


This setting simply provides the interval after which BOINC will check to see if a task from a different project is more eligible to run than the current one. If you have managed to get into high priority (panic) mode, BOINC will not change to a different task until the panic is over.

Quote:
The only way I have found to let tasks from other projects run is to suspend this project (I have already disallowed new work as, if I allow more work, I get sent new tasks daily). It seems to me that BOINC Manager isn't handling the workload correctly.


It seems to me that you have inappropriate cache settings (and, perhaps, inappropriate resource shares for the current situation). You must have, if BOINC keeps requesting more work. What resource shares have you assigned to each different project? Do you realise that if your highest share project can't supply work, BOINC will tend to grab the whole lot from a low resource share project? This can create havoc (and panic mode) later on. If you are going to add a project to make up for a project that can't supply, you must reduce your cache size to a nice low value beforehand simply to avoid potential mayhem.

Quote:
That said, it did abort all the outstanding tasks for the 3rd and one of the running tasks has just completed and a long running task from another project has started.


At first, I couldn't work out what "3rd" referred to - 3rd time?, 3rd host?, 3rd project? - but I couldn't find any aborted BRP5 tasks on any host and then I noticed a whole bunch of aborted FGRP4 tasks on your machine with the GTX670. So now I realise you must have been referring to the deadline date of 3rd December.

At the moment you have 129 FGRP4 tasks on that machine. No doubt there were more (going back to 19th Nov) which have since been deleted from the online database. Of the 129, 75 have been returned and a further 31 were aborted. There are 23 in progress, of which 4 have exceeded the deadline. They must have already been started at the time the others were aborted.

Once again, the crunch times for the 75 completed tasks seem to be reasonably consistent. The vast majority are in the range of 50000 to 55000 secs with the odd couple of tasks more towards 60000. These times seem perfectly normal for that machine (4 cores, 8 threads). I see no evidence of CPU tasks running for excessive lengths of time.

My strongest advice to you is to ignore estimates and simply concentrate on the fact that FGRP4 tasks are each going to take around 15 hours to crunch on average on that machine. The 23 'in progress' tasks you currently have could be crunched in about 340 hours total - about 14 days using just 1 of your 8 threads. If you could stand to use 2 threads, you should have no problem with deadlines any more.

Quote:
Only this project seems to underestimate the time, and remaining time decreases substantially more slowly than run time increases. I have frequently seen the remaining time increase in line with the percentage completion increasing and, of course, the run time increasing.


You just need to trust that FGRP4 tasks are going to take around 15 hours on your rig, no matter what the estimate says. If the estimate says 5 hours at the start, of course it will increase as BOINC gradually works out that it's really going to take a lot longer than the estimate. Why does BOINC start off with such a low estimate? Probably because a number of BRP5 tasks were completed at less than their respective estimates so with a single DCF affecting all searches, the estimates of all FGRP4 tasks will also have been reduced.

I hope some of the above is understandable to you. If not, it won't matter much as long as you set something like 1.0 days / 0.1 days for your two cache settings and share your resources sensibly between all your projects. Don't give one project 90% and share the balance between the rest. BOINC is likely to get in a real mess with that scenario :-).

Cheers,
Gary.

grn
grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0

Gary, Many thanks for the

Gary,

Many thanks for the information but, despite what you say, I have had many tasks that have consistently overrun their estimated time. On my system with the GPU, I have a backlog of tasks for the GPU estimated at over 22 hours run time - there are 27 of these (the 27 marked as in progress) - so, using the estimated run time, that is around 25 days to complete, and the deadline for these is 15/12, only 12 days away. If E@H is recognising my local setting of two days of work, the number of tasks I have been sent does not reflect that setting. I have also set this project to disallow more tasks, as I would get a bunch of new tasks on an almost daily basis.

I am subscribed to a number of projects and they all seem to behave according to my settings. All the projects have the same local and site preferences with equal resource sharing rights - 4 projects with 25% resource share each - but E@H appears to hog the machine. Although I have seen another project start to get a share, I still see little evidence of any task relinquishing its processor with my defined 60 minute run period setting. It may be happening; I just never see it, even if I am working on the system for over an hour at a time.

I am not concerned about the overrun; the whole point of the post was to determine what is the basis that a project uses to determine the amount of work to send and the deadline to set. I agree with your 15 hour figure - it is in line with what I have seen but the in progress tasks all have an estimated time of 10 hours 20 minutes which, if they complete in that time, means they would complete within the 12 days, if no other project tasks ran. However, they appear to take around 15 hours to run as evidenced by the returned results but the current task (these are the Perseus Arm GPU tasks) has been running for just over 14 hours with over 21 hours estimated as remaining. At just over 40% complete, it looks like this will significantly exceed the 15 hour typical completion time. The initial set of tasks I had for the Perseus Arm Survey were of the order of 6 hours estimated and must have taken around half that time to complete. Every time I looked at progress, these tasks would clock up 1 second in run time and reduce time remaining by 2 seconds.

I do not understand what the 0.1 cache setting suggestion you make refers to. I have 2 days work set and clearly E@H sends much more than this amount of work. I can see no other setting that I think the 0.1 applies to.

All I am trying to achieve is a balanced load across my projects, but I honestly do not see E@H taking any notice of any of the settings I have tried. I subscribed to this project as it is one of the areas I am interested in donating computer time to, but may have to reconsider if I cannot get it to share the resource with my other projects. I would be quite happy to deselect all but a single GPU task type, as SETI@Home is the only other project I have that provides GPU tasks, but would prefer to allow the two task types I get at present to continue, if I could control the amount of work that is sent.

I've not been able to achieve this and have tried a variety of different settings without success. I am just trying to get a balanced load across my projects without having to continually monitor progress and make manual interventions to avoid the resource starvation the other projects seem to suffer.

Claggy
Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2699403
RAC: 0

RE: If E@H is recognising

Quote:
If E@H is recognising my local setting of two days of work, the number of tasks I have been sent does not reflect that setting.


According to your last scheduler contact you have the minimum buffer set to 10 days (864000 seconds divided by 60 makes 14400 minutes, divided by 60 makes 240 hours, divided by 24 makes 10 days):

Quote:
2014-12-03 20:40:56.4758 [PID=31461] Request: [USER#xxxxx] [HOST#11687059] [IP xxx.xxx.xxx.126] client 7.4.27
2014-12-03 20:40:56.4772 [PID=31461] [handle] [HOST#11687059] [RESULT#469942670] [WU#205397281] got result (DB: server_state=4 outcome=0 client_state=0 validate_state=0 delete_state=0)
2014-12-03 20:40:56.4772 [PID=31461] [handle] cpu time 46027.850000 credit/sec 0.007256, claimed credit 333.976165
2014-12-03 20:40:56.4777 [PID=31461] [handle] [RESULT#469942670] [WU#205397281]: setting outcome SUCCESS
2014-12-03 20:40:56.5813 [PID=31461] [send] effective_ncpus 8 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2014-12-03 20:40:56.5813 [PID=31461] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2014-12-03 20:40:56.5813 [PID=31461] [send] Not using matchmaker scheduling; Not using EDF sim
2014-12-03 20:40:56.5813 [PID=31461] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2014-12-03 20:40:56.5813 [PID=31461] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2014-12-03 20:40:56.5814 [PID=31461] [send] work_req_seconds: 0.00 secs
2014-12-03 20:40:56.5814 [PID=31461] [send] available disk 194.83 GB, work_buf_min 864000
2014-12-03 20:40:56.5814 [PID=31461] [send] active_frac 0.999971 on_frac 0.996191 DCF 2.315605
2014-12-03 20:40:56.6069 [PID=31461] Sending reply to [HOST#11687059]: 0 results, delay req 60.00
2014-12-03 20:40:56.6079 [PID=31461] Scheduler ran 0.135 seconds


The other use of the minimum buffer is to tell BOINC how many days it may be offline, so in this case BOINC will try to cache 10 days of work (probably 10+2 days) and try to get the work done 10 days early - all on a project with maximum deadlines of only 14 days, which basically means you have four days to do ten days of work.
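
As a quick sanity check on that conversion, using only the numbers quoted above (plain Python):

# work_buf_min from the scheduler log, converted to days
work_buf_min_seconds = 864000
seconds_per_day = 24 * 60 * 60                       # 86400
buffer_days = work_buf_min_seconds / seconds_per_day
print(buffer_days)                                   # 10.0
print(14 - buffer_days)                              # 4 days of slack against a 14-day deadline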

Another thing to note: BOINC 7.4.27 no longer shows tasks as running in 'High priority'. DA was concerned that new users would confuse that with thread priority and get worried, so now tasks running in high priority just have a status of 'running'; no one managed to convince him to use an alternative word:

http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=bca7c006deae20cc31be20fc37396bb4c0cfafc2

Quote:

Manager: omit ", high priority" from task status

This makes it sound like BOINC is running the job at high OS priority.

Claggy

grn
grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0

Claggy, Thanks for

Claggy,

Thanks for clarifying this figure. I have to say it is very misleading:

Quote:

2014-12-03 20:40:56.5814 [PID=31461] [send] available disk 194.83 GB, work_buf_min 864000

It quotes a disk size; "work_buf_min" in the same line suggested to me that this is the size of a minimum memory buffer to hold data being downloaded or processed, not that the figure represents a time. So how does this relate to the setting in computing preferences/network usage?

Quote:
Maintain enough work for an additional 2 days

I thought this setting controlled the amount of work that would be sent. I will update my settings accordingly.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117759262069
RAC: 34780510

RE: ... I have had many

Quote:
... I have had many tasks that have consistently overrun their estimated time. On my system with the GPU, I have a backlog of tasks for the GPU estimated at over 22 hours run time ...


Would you care to point to a single example of a GPU task that has run longer than 22 hours? I can't find any! I see the vast bulk of GPU tasks taking around 2.7 hours each with an odd one or two taking a lot longer - perhaps 15 hours. This is possibly just an artifact due to the GPU not being properly serviced by the CPU for some odd reason. You can be pretty sure that your GPU tasks will take only 2.7 hours or so, on average, no matter what the estimate claims.

Quote:
... there are 27 of these (the 27 marked as in progress) - so, using the estimated run time, that is around 25 days to complete, and the deadline for these is 15/12, only 12 days away.


Are you really talking about GPU tasks? At 2.7 hours real crunch time, the whole 27 you have left will take less than 3 days - nothing like the 25 days you mention. Until everything settles, you need to ignore the estimates when they differ so widely from reality.

Quote:
If E@H is recognising my local setting of two days of work, the number of tasks I have been sent does not reflect that setting. I have also set this project to disallow more tasks, as I would get a bunch of new tasks on an almost daily basis.


As I previously suggested, and as Claggy has subsequently proved, you have too big a work cache. You appear to have a 10 + 2 = 12 day cache setting. There are two settings you need to examine on your computing preferences page. Because of the highly customised old server code being used at this project, the first one says, "Computer is connected to the Internet about every x.xx days", when it probably should say something like, "What is the minimum amount of work to be maintained". The second one says, "Maintain enough work for an additional x.xx days". The first setting acts as a 'low water' mark and the second as the extra on top to create a 'high water' mark. What two values do you have?

Quote:
... equal resource sharing rights - 4 projects with 25% resource share each ...


Sounds perfect.

Quote:
... but E@H appears to hog the machine ...


No project can 'hog the machine', against your wishes. Think about it for a moment. Did the project force you to set a 12 day cache? The BOINC client tries to comply with your choice of settings. The BOINC client requests work. The project just passively responds to the request as best it can. If you apparently get too much work, BOINC (through panic mode) forces the Einstein app to run. The project has no control over this.

Quote:
... see little evidence of any task relinquishing its processor with my defined 60 minute run period setting.


Do I need to explain this all again?? That setting is NOT a command to change to another task. It's simply the period after which BOINC will make a decision as to whether or not some other task should be started (or re-started from suspension) in place of the current one. If BOINC doesn't see a need to change, why are you worried about it? Now that we can assume that your machine is in panic mode (whether or not BOINC is showing that fact - thanks Claggy, I didn't know about that change - aarrrggghhhh!!!!), BOINC will keep the supposedly 'at risk' tasks running without change until the panic is over.

Quote:
... the whole point of the post was to determine what is the basis that a project uses to determine the amount of work to send ...


The project simply sends what it is asked for. You need to convince your BOINC client to ask for an appropriate amount.

Quote:
... and the deadline to set.


Except for beta test runs, all BRP5, BRP4G, FGRP4, and Grav Wave (GW) tasks have a standard 14 day deadline.

Quote:
I agree with your 15 hour figure - it is in line with what I have seen but the in progress tasks all have an estimated time of 10 hours 20 minutes which, if they complete in that time, means they would complete within the 12 days, if no other project tasks ran. However, they appear to take around 15 hours to run as evidenced by the returned results but the current task (these are the Perseus Arm GPU tasks) has been running for just over 14 hours with over 21 hours estimated as remaining. At just over 40% complete, it looks like this will significantly exceed the 15 hour typical completion time. The initial set of tasks I had for the Perseus Arm Survey were of the order of 6 hours estimated and must have taken around half that time to complete. Every time I looked at progress, these tasks would clock up 1 second in run time and reduce time remaining by 2 seconds.


You appear to be confusing the FGRP4 CPU tasks (which take around 15 hours) with BRP5 (Binary radio pulsar - Perseus Arm Survey) GPU tasks which seem to average around 2.7 hours. If you go back and re-read my previous message you can see I was talking about the FGRP4 CPU tasks taking 15 hours.

For the BRP5 GPU tasks, there are just a couple of examples of much longer run times than the normal 2.7 hours. This may well be a sign that you need to free up a CPU core for GPU support duties. To achieve that you could try setting the percentage of available processors to use to 87.5% rather than the full 100% in your computing preferences.
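
If you would rather edit the file than use the preferences page, the same knob appears (on recent BOINC clients, as far as I know) as the max_ncpus_pct element in global_prefs_override.xml; 87.5% of 8 logical CPUs leaves one free for the GPU:

<global_preferences>
   <max_ncpus_pct>87.5</max_ncpus_pct>
</global_preferences>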

Quote:
I do not understand what the 0.1 cache setting suggestion you make refers to. I have 2 days work set and clearly E@H sends much more than this amount of work. I can see no other setting that I think the 0.1 applies to.


Set 2.0 for the "Connect to internet ..." setting. Set 0.1 for the "Additional days ..." setting. Every time your work on hand drops below 2.0 days, BOINC will top up to 2.1 days.
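
As a toy illustration of how those two numbers act as low and high water marks - this is just a sketch of the idea, not the client's actual scheduler code:

# Sketch of cache top-up behaviour with the example settings above
work_buf_min_days = 2.0          # "connected to the Internet about every" (low-water mark)
work_buf_additional_days = 0.1   # "maintain enough work for an additional" (top-up amount)

def days_of_work_to_request(days_on_hand):
    if days_on_hand >= work_buf_min_days:
        return 0.0               # still above the low-water mark: ask for nothing
    return (work_buf_min_days + work_buf_additional_days) - days_on_hand

print(days_of_work_to_request(2.3))            # 0.0 - no request
print(round(days_of_work_to_request(1.8), 2))  # 0.3 - top back up to 2.1 days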

Cheers,
Gary.

grn
grn
Joined: 6 Nov 14
Posts: 17
Credit: 30339583
RAC: 0

This thread is becoming

This thread is becoming pointless as it seems we are talking at cross-purposes. Claggy did set me straight on the settings that restrict the work items sent. I had misread the settings and, as stated in my previous post, have changed these.

Quote:
Are you really talking about GPU tasks? At 2.7 hours real crunch time, the whole 27 you have left will take less than 3 days - nothing like the 25 days you mention. Until everything settles, you need to ignore the estimates when they differ so widely from reality.

Yes I am. The initial set of tasks I got for GPU tasks (Perseus Arm Survey) had an estimated time of just under 6 hours, I believe, and did run in about 2.5. The current set of tasks, for which there are 27 "in progress", have an estimated run time of 15 hours 49 minutes. I don't have any data as yet as to how long they actually take, but it is significantly more than 2.7 hours. The one that is currently running has been running for almost 6.5 hours with a remaining estimate of 5.5 hours and is 65% complete. Since I have no data on which to base a more accurate estimate of the total time, I'm using the initial estimate to calculate it.

I'll explain in simple terms: there is only one GPU, so the tasks have to be run serially. There are 27 of them, so, based on the estimated run time, that is 27*15:49:31 = 1460771 seconds. Now I'll convert back: 1460771/(3600*24) = 16.9 days, but the deadline for these is the 15th of this month, which is less than 16 days away.

From what you say and I quote:

Quote:
Except for beta test runs, all BRP5, BRP4G, FGRP4, and Grav Wave (GW) tasks have a standard 14 day deadline.

So it is clear to me that I was sent more work than could be completed by the deadline, regardless of any incorrect settings on the amount of work I allow. So my statement that settings are ignored may be incorrect, but the project should adjust the deadline based on the amount of work I have configured it to allow.

Also, you go on at great length that BOINC Manager does not switch tasks, yet the setting states:

Quote:
Switch between tasks every 60 minutes
Recommended: 60 minutes

This suggests to the uninitiated that it does switch tasks. If it doesn't, I see no point in having the setting at all. BOINC Manager should just fix it at the recommended value and not offer it as an option. I was questioning this as I had a bunch of binary pulsar tasks and a bunch of SETI@Home tasks. The latter had a shorter deadline than the binary pulsar tasks, yet BOINC Manager in its wisdom continually selected the Einstein tasks to the exclusion of the SETI tasks. Both sets of tasks could be and were completed by their respective deadlines, but only by me suspending the Einstein project, allowing the SETI tasks to complete and then resuming the Einstein project.

There is clearly a deficiency in the Einstein project (and maybe other projects) if deadlines are fixed without any regard to the total time the work sent will take. Either BOINC Manager does not communicate information on the workload from other projects, or that information is not taken into account by each project in determining the amount of work sent for a given deadline.

If users are given the ability to say they want 10 days of work sent, then the only bearing this should have on the amount of work sent is how many tasks can be completed within the deadline period, given the outstanding resource requirements from other projects. Without that, subscribing to multiple projects means receiving workloads with deadlines that are impossible to meet.

Although I admit my misunderstanding of the workload settings, I do not see this solving the basic issue: the lack of consideration of the total workload across all projects can result in oversubscription of the compute resource and failure to meet deadlines. Judging from the work sent just for GPU tasks, Einstein clearly does not take into account the time the tasks it is sending will take against the fixed 14 day deadline, never mind the resource requirements of the other projects that are running. This is NOT solved by restricting the tasks to 2 days' worth of work if this is consuming 100% of the compute resource and all other projects do exactly the same.

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117759262069
RAC: 34780510

RE: This thread is becoming

Quote:
This thread is becoming pointless ...


George,

With the greatest of respect, I must say that I entirely agree, but probably for rather different reasons than the ones you have in mind.

It actually takes quite a long time to compose answers like the preceding ones and obviously I'm rather stupid to do so. I suffer from the delusion that if I attempt to explain the full details about what is likely to be happening to cause the issue, the recipient of the advice might actually try to digest it. I'm genuinely sorry if my delusion has caused you angst. I'll stop bothering you very shortly.

It's good that Claggy was smart enough to get the server log excerpt that finally proved to you just how much work your BOINC client had been asking for. However, if you want to see where credit should go, you should go back to the start of this thread and look at the very first response you received from archae86, who told you specifically:-

Quote:
1. If you have a local preference set on a host (which if present takes precedence over website requested preferences), adjust the minimum work buffer and maximum additional work buffer parameters to well under a day. Try 0.2, 0.1 for a start.


Local preferences are the ones you set for just the current host using BOINC Manager. If you are actually using website preferences which would apply to all hosts, he said:-

Quote:
2. If you do not have a local preference set then go to your user account page on Einstein, click the Computing Preferences link, and adjust the "Computer is connected to the Internet about every" number to 0.2 days, and the "Maintain enough work for an additional" to 0.1 days. You should do this separately for all four locations (aka venues) unless you are sure one is not in use in your fleet.


Archae86 had nailed the cause of your woes right there. The thing I don't understand is why you didn't read and act on the very clear advice given. If you didn't understand, why didn't you ask for clarification?

It really doesn't matter any more since you've got your work cache settings under control now.

At the risk of causing you further angst, I'm going to point out that you still have a further problem that you really need to address. I've pointed it out previously but I'm going to make one last effort to help. Referring specifically to BRP5 GPU tasks, your GTX670 has proved that it can do these in 2.7 hours. Go check all the validated tasks on the website for that host. Notice that the vast bulk of these took around 2.7 hours. Notice also that there were two tasks only that deviated from this and took around 15.5 hours. This is the problem that only you can fix.

You've been telling me that some tasks take longer than 2.7 hours and I've tried several times to explain why this might be happening and what you could try to correct it. The fact remains that your GPU can do these tasks in 2.7 hours and you need to make sure you stop the 'long running' behaviour.

There are a number of things that could be causing this - it could even be a driver issue. However the first thing to try is to make sure there is a CPU core available to support the GPU. It seems likely that the excessive time is being caused by the fact that all 8 virtual cores in your quad core host appear to be tied up crunching CPU tasks full bore.

As the GPU is local to this machine, you could change local preferences using BOINC Manager as a quick way of testing. Make sure you are in BOINC Manager Advanced View and select Tools -> Preferences and then select the Processors tab. Down towards the bottom of the page you will see a setting for the % of processors to use. It will be on 100%. Change it to 87.5% and click OK. If you then go to the event log you should see an entry showing the number of processors changing from 8 to 7. If you look at your tasks tab, you should see that there is a CPU task that was running that is now 'waiting to run'.

If you have a long running GPU task you can easily tell if it is now going faster. Everything updates on a per-second basis. If a task takes 2.7 hours to complete, that represents 0.01% progress per second. Long running behaviour seems to be about 15.5 hours. That represents just under 0.002% progress per second. So you just need to ask yourself the question, "Is the % done value for the running GPU task on the tasks tab under the Progress column jumping by 0.010% per second or 0.002% per second?" If it is still running at the slow rate, I would stop BOINC, reboot the machine and then restart BOINC. If it still continues to run slow, the issue is not lack of CPU support and we have to try looking elsewhere.
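
Those percent-per-second figures are easy to verify with a couple of lines of Python, using just the 2.7 and 15.5 hour durations mentioned above:

# Progress rate (% per second) implied by a given total task duration
for hours in (2.7, 15.5):
    pct_per_second = 100.0 / (hours * 3600)
    print(f"{hours:4.1f} h task -> {pct_per_second:.3f}% per second")
# prints roughly 0.010 for the 2.7 hour case and 0.002 for the 15.5 hour case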

The next time you see a slow running GPU task, why don't you give the above procedure a try and report back on what you are seeing? I hope you do realise that your GPU is capable of 2.7 hour completion times and you just need to work out why sometimes it's getting stuck in the slow lane.

Cheers,
Gary.
