Just added Einstein, but only getting CPU tasks?

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5888

Credit: 119969425175

RAC: 26458167

RE: I am continuing to

4 Jan 2014 0:04:48 UTC

Message 114377 in response to message 114376


Quote:

I am continuing to crunch with my old machine for a limited time. I abort anything that I don't expect to finish before the deadline at the earliest opportunity, usually about 1 or 2 days before the deadline, based on the time remaining listed. I will probably abort all but 24 hours per core 1 day before switch-off.

That's all fine. You should use local preferences on that machine to set a quite small cache size so there isn't much to abort when you do finally decide to turn it off.

Quote:

... The resource shares have remained at CPDN 35% (but nothing available at present), Cosmology 10%, Einstein 5%, LHC 10%, MilkyWay 10%, Rosetta 7.5%, Seti 10% & WCG 12.5%.

Your 8 core machine (if running 24/7) has 8x24=192 CPU hours per day. The E@H 'share' is therefore 192/20=9.6 hours per day. A S6CasA task is taking about 8.3 hours so BOINC will be trying to limit E@H to just over 1 task per day.
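For anyone wanting to redo that arithmetic with their own numbers, here is a minimal sketch. The share percentages, core count and task time are the ones quoted above; the function and variable names are just illustrative, and real BOINC scheduling is more involved than a straight proportion.

```python
# Resource shares (percent) exactly as listed in the quoted post.
SHARES = {
    "CPDN": 35, "Cosmology": 10, "Einstein": 5, "LHC": 10,
    "MilkyWay": 10, "Rosetta": 7.5, "Seti": 10, "WCG": 12.5,
}

def daily_cpu_hours(project, shares, cores, hours_per_day=24.0):
    """CPU hours per day a project is entitled to under its resource share."""
    return cores * hours_per_day * shares[project] / sum(shares.values())

# 8 cores running 24/7, as assumed in the post.
einstein_hours = daily_cpu_hours("Einstein", SHARES, cores=8)

# About 8.3 hours per S6CasA task, as stated in the post.
tasks_per_day = einstein_hours / 8.3

print(f"E@H share: {einstein_hours:.1f} h/day, about {tasks_per_day:.2f} tasks/day")
```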

Quote:

The earliest Einstein units are not due for another 9+ days (8+ days for the first Rosetta). They should be finished in time, provided that the system estimate is not too far out.

You have over 60 E@H tasks on board. BOINC will soon need to go into 'Panic Mode', thereby ignoring your resource share wishes in the short term in order to attempt to meet the deadline. Sure, you could intervene manually and suspend other projects or abort excess tasks in order to sort this out, but it would be better for you to prevent the problem in the first place.
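A rough sketch of why the deadline can't be met at the resource share rate. It uses 60 tasks as a lower bound ("over 60"), the 8.3 hour task time and the 9.6 hours per day share worked out above. The 14 day deadline is an assumption taken from the deadline mentioned for resent tasks, and the variable names are illustrative only.

```python
tasks_on_board = 60          # "over 60" in the post, so this is a lower bound
task_hours = 8.3             # approximate S6CasA run time from the post
share_hours_per_day = 9.6    # 5% of 192 CPU hours per day
deadline_days = 14           # assumption: same deadline as the resent tasks

# Days needed to clear the backlog if E@H only ever gets its 5% share.
days_needed = tasks_on_board * task_hours / share_hours_per_day

print(f"Needs about {days_needed:.0f} days at the share rate, "
      f"deadline is {deadline_days} days")
```

With those inputs the backlog needs several times longer than the deadline allows, which is why BOINC has to override the resource shares.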

Quote:

I am currently crunching the last of my WCG units, from before I released the other projects, which are due in 48 hours. The system estimate would give a completion 10 minutes late, but I am finding that the units are being finished faster than the official estimate, so I don't expect any problems. I presume that the earlier date is why they are currently being given preference. When I restarted the other projects, they were given preference until the deadline for WCG units kicked in.

What's to stop BOINC from loading up with more WCG once this lot are gone? BOINC will always try to fill your cache. If a major project can't supply, it will fill up using whatever minor project can supply. This is why you have 60+ E@H tasks on board. Your life would be a lot easier if you stopped BOINC from over-fetching in the first place.

Quote:

I hadn't noticed any advice of a need to adjust the cache settings but have now changed them to 5/1 instead of 0/10. I will see how that pans out.

It's a BOINC thing. You would have to be following BOINC's boards to see any sort of official announcement. I remember reading (rather heated at times) user discussion about it in various places. Unfortunately, that's the way it is with BOINC. Quite substantial changes can occur with little forewarning. There are a few dedicated volunteers who try to keep people informed, but usually 'after the event'.

When you had a setting of 0/10, BOINC V6 would try to keep your cache at 10 days at all times. BOINC V7 would initially fill to 10 days and then not request any further work until the cache had completely drained to zero. With your new setting of 5/1, BOINC V7 will fill up to 6 days and then only fill up again (to 6 days) once the cache level drops below 5 days. If, by chance at the crucial time, your major project(s) can't supply, BOINC could grab that full amount of work from a minor project such as E@H. That 1 day of work would take 20 days to complete at the 5% resource share rate.
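The V7 behaviour described above can be illustrated with a toy simulation. This is only a sketch of the fill-then-drain pattern as I've described it, not BOINC's actual work fetch code: the function name, the quarter-day time step and the "at or below the minimum" trigger are all simplifying assumptions.

```python
def fetch_sizes(min_days, extra_days, steps=200, drain=0.25):
    """Sizes (in days of work) of each fetch for a 'min/extra' cache setting.

    Toy model: when the cache is at or below min_days, top it up to
    min_days + extra_days; otherwise just let it drain.
    """
    level = 0.0
    sizes = []
    for _ in range(steps):
        target = min_days + extra_days
        if level <= min_days and target > level:
            sizes.append(target - level)
            level = target
        level = max(0.0, level - drain)  # work done during this time step
    return sizes

gulps = fetch_sizes(0, 10)   # old 0/10 setting
medium = fetch_sizes(5, 1)   # new 5/1 setting
sips = fetch_sizes(6, 0)     # 6/0 setting

# Time to finish one fetch if it all lands on a 5% resource share project.
days_at_5_percent = 1.0 / 0.05

print("0/10 refills:", set(gulps))
print("5/1 refills after the first:", set(medium[1:]))
print("6/0 refills after the first:", set(sips[1:]))
print("1 day of work at a 5% share takes about", round(days_at_5_percent), "days")
```

In this model the 0/10 setting always asks for 10 days at once, 5/1 asks for 1 day at a time, and 6/0 asks only for what was used since the last step.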

So, what's the solution? Firstly, I reckon it helps to be 'topping up' in small sips rather than big gulps. 6/0 would actually be better than 5/1. But the bigger problem for an 8 project mix is to have a large cache in the first place. The 'optimum' cache setting is the one where BOINC can manage things on its own without undue levels of manual intervention. You can only find that by experiment and, sadly, it's hard to achieve if a number of projects are 'unreliable'.

You should start out with a 1/0 setting until the current backlog is worked through. Once things stabilise, increase to 1.5/0 and watch what happens. It could be that just one project gets the bulk of the extra work or it could be spread around. Wait a day and then increase to 2/0. Chances are that different projects will supply the extra work this time. You can rinse and repeat but (for 8 projects) I wouldn't go too much higher until I had a fair bit of evidence that BOINC was handling things OK. You might get up to 3-4 days but I doubt you would get much higher without some projects seriously over-fetching. Don't some of your projects have a 7 day deadline? I've found that a good 'rule-of-thumb' is to limit your cache setting to no more than half the shortest deadline (and less than that for more projects). A lot depends on the reliability of projects to supply when asked.
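The rule of thumb and the step-by-step increase can be written down as a small sketch. The deadlines in the example are assumptions for illustration (14 days for E@H, plus a placeholder project with a 7 day deadline), and the helper names are made up. The half-day step follows the 1/0, 1.5/0, 2/0 sequence suggested above.

```python
def max_cache_days(deadlines_days):
    """Rule of thumb: no more than half the shortest project deadline."""
    return min(deadlines_days.values()) / 2

def ramp(start, cap, step=0.5):
    """Cache settings to try in turn, from start up to and including cap."""
    values = []
    value = start
    while value <= cap:
        values.append(value)
        value += step
    return values

# Hypothetical deadlines in days; substitute your own projects' values.
deadlines = {"Einstein": 14, "project with 7 day deadline": 7}

cap = max_cache_days(deadlines)
schedule = ramp(1.0, cap)

print("Upper limit:", cap, "days")
print("Settings to try in turn:", schedule)
```

With 8 projects I'd treat that upper limit as a ceiling rather than a target, and only move to the next step once BOINC has been seen to cope with the current one.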

Quote:

From message 12848:
Quote:
I currently have work from Cosmology, Rosetta & WCG as well as Einstein. I am also subscribed to CPDN, LHC, MilkyWay & Seti. I suspect that it might be better to crunch what I have got and bring my BOINC Manager list to zero. Could I then be supplied with a new version of whatever file has been corrupted to overwrite what I do have, or do the missing jobs need to be deleted from the websites?

The 2 records need to be reconciled, so have you any suggestions, please?

As I said earlier in my second reply, I was wrong in my first reply: there wasn't a problem with corruption in your state file. As it turns out, it looks like the whole episode was a server-side issue of some sort, since the scheduler now seems to have decided to resend all the lost tasks.

2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.55_S6Directed__S6CasAf40a_693.1Hz_317_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0660.90_S6Directed__S6CasAf40a_661.75Hz_32_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.45_S6Directed__S6CasAf40a_693Hz_310_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.45_S6Directed__S6CasAf40a_693Hz_309_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.45_S6Directed__S6CasAf40a_693Hz_308_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.50_S6Directed__S6CasAf40a_693Hz_376_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.50_S6Directed__S6CasAf40a_693Hz_375_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.55_S6Directed__S6CasAf40a_693.1Hz_316_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.55_S6Directed__S6CasAf40a_693.1Hz_315_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.60_S6Directed__S6CasAf40a_693.15Hz_328_3
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.60_S6Directed__S6CasAf40a_693.15Hz_327_3
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.60_S6Directed__S6CasAf40a_693.15Hz_326_3
2014-01-02 04:58:51.2753 [PID=24997]    Sending reply to [HOST#7591557]: 12 results, delay req 60.00

That was a batch of 12 resends. The scheduler does send in batches of 12 per scheduler request so if you still don't have the full complement, just 'update' the E@H project one more time and you will get a further batch. You should be able to see the record of the resending of the above 12 in your event log if you browse back through it. The actual tasks themselves should now be visible in BOINC Manager. One nice thing about it will be that those resent tasks will have a new 14 day deadline which will ease the pressure a bit :-).
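If you want to check how many tasks were resent without counting lines by eye, a short script can do it. This is a sketch that assumes the log text has been saved or pasted in the format shown above; the regular expressions and function name are my own, and the sample below is abbreviated to three of the resend lines plus the reply line.

```python
import re

# Abbreviated sample copied from the scheduler log excerpt above.
LOG = """\
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.55_S6Directed__S6CasAf40a_693.1Hz_317_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0660.90_S6Directed__S6CasAf40a_661.75Hz_32_2
2014-01-02 04:58:51.2753 [PID=24997] [debug]   [HOST#7591557] MSG(high) Resent lost task h1_0692.45_S6Directed__S6CasAf40a_693Hz_310_2
2014-01-02 04:58:51.2753 [PID=24997]    Sending reply to [HOST#7591557]: 12 results, delay req 60.00
"""

RESENT = re.compile(r"Resent lost task (\S+)")
REPLY = re.compile(r"Sending reply to \[HOST#(\d+)\]: (\d+) results")

def summarise(log_text):
    """Return (names of resent tasks, result count from the reply line or None)."""
    names = RESENT.findall(log_text)
    reply = REPLY.search(log_text)
    reported = int(reply.group(2)) if reply else None
    return names, reported

names, reported = summarise(LOG)
print(len(names), "resend lines in this sample; scheduler reported", reported, "results")
```

Run against the full excerpt, the number of resend lines should match the count in the "Sending reply" line.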

Cheers,
Gary.

Mike.Gibson

Joined: 17 Dec 07

Posts: 21

Credit: 4474586

RAC: 176

Thanks a lot, Gary. That was

4 Jan 2014 0:44:41 UTC

Message 114378 in response to message 114377


Thanks a lot, Gary. That was all clear and understandable.

I have set all projects on both machines to No New Tasks. I will let the system work through the current work, manually intervening, if necessary, to make sure that tasks with the earliest deadlines are crunched in time. The system should take care of that, but units have a habit of taking longer, or shorter, than forecast. Any that would not make the deadline will be aborted earlier, as I did with some of the WCG units about 11 hours ago because the latest batch was taking 20% longer than the official forecast.

Once they are almost clear, I will follow your suggestion of setting a low cache, initially, and allowing it to creep up. Because there are 8 projects, I will start at 2+1, which should mean 384 hours on my 3770 plus increments of 192 hours, to allow a spread of downloads between projects.

I will leave the GPU issue until things have stabilised CPU-wise.

Many thanks

Mike
