CPU tasks.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

Matt White wrote:

Gary Roberts wrote:

In this message earlier in this very thread, I showed the big effect on crunch time that a change in the data file can make. I showed actual times for tasks using different data files that were crunched on a single machine with no hardware changes.

Once again, I've had a look at that same machine and can see tasks for different current data files having a similar ratio of 3 to 1 in the crunch time. Tasks that have LATeah1043F or LATeah1044F in their name crunch in around 5.5 hours. Tasks with 1042F or 1045F or 1038F take around 16.5 hours.

At any rate, I believe Gary is correct: in order to compare crunch differences, one must use files from the same family. In other words, compare the results of a 1045F without wisdom to a 1045F using wisdom.

I'm going to run this test myself. I will post my findings later. :)

Clear skies,
Matt
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110037647341
RAC: 22393229

Matt White wrote:
... I suspect the difference between Gary's numbers and mine on the 1045F is caused by the processors used.

Yes, exactly.

In the message pointed to by the first link in my post, I did describe the machine as an old quad core iMac owned by my daughter.  It's actually around 2011 vintage.  iMacs are slow anyway and old iMacs are very slow so I wasn't trying to claim any sort of performance crown :-).  I was trying to point out the 3:1 ratio in overall times due to different data files.  There's something wrong if a more modern machine can't well and truly beat those numbers.  Independent of the age of the machine, there should still be a similar 3:1 ratio with different data.

Tom needs to stop blaming his wisdom file or his CPU for that big change.  I'd be surprised if a wisdom file could give more than something like a 10% - 15% improvement, if that.  Years ago, when the project first made the app for creating wisdom files available, I was running lots of CPU tasks and I spent time creating these files for a range of different architectures.  If I remember correctly, the average improvement I saw was around 3-5% or something like that.

Cheers,
Gary.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

Gary Roberts wrote:

Matt White wrote:
... I suspect the difference between Gary's numbers and mine on the 1045F is caused by the processors used.

Years ago, when the project first made the app for creating wisdom files available, I was running lots of CPU tasks and I spent time creating these files for a range of different architectures.  If I remember correctly, the average improvement I saw was around 3-5% or something like that.

I had about a dozen 1045F tasks which came in around 12 hours and 2 minutes for an average, and I'm using those as a baseline. A 1045F task with wisdom just finished in 10 hours and 32 minutes. Before I call it, I'd like to see the data from a few more tasks. By tomorrow evening I should have enough data to draw a conclusion on how much time wisdom files save on my setup. I would imagine others' mileage may vary. :)

Clear skies,
Matt
Tom M
Joined: 2 Feb 06
Posts: 5662
Credit: 7742701202
RAC: 2458157

Gary Roberts wrote:

Tom M wrote:

The Wisdom file trick strikes again.

I had some Gamma Ray #5 ...

What you are seeing is absolutely nothing to do with wisdom files.  Changes of this magnitude are also not being caused by a change in CPU.  It's just primarily from changes in the data file being analyzed.

Gary,

I have read the posts you have made.  I am not ignoring them.  I have not been talking about them either, which may be why you think I was ignoring them.

I appreciate your thoroughness and postings on variations in CPU processing time.  And I clearly overstated things with the "The Wisdom file trick strikes again" phrase.

I am reasonably confident the Wisdom file can account for maybe a 10% variation in my baselines.  And I didn't take that into account when I posted what you are reacting to.

It is clear that the differences in data have a massive effect on processing time.  I have NOT been tracking the specific tasks.

For some reason my 3950X box with AMD-specific RAM has been processing more than twice as fast as my 2700/3900X box.  Since prices are pretty reasonable, I am going to drop that variable out of the mix by matching RAM in both boxes (they already have the same MB).

That leaves the variables of task data, OS, OS compilers, and CPU IPC/architecture differences between the 3950X and the 2700/3900X in play.

I have been assuming a "random" selection of tasks.  What you are telling me is the tasks are distributed in a much more "clumpy" way.  Which would skew my observed results :(

Your analysis of the tasks makes it clear that if I wanted to get a more specific estimate I would need to match specific task series id #'s for both machines.
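One way to do that matching is to group run times by the data file ID embedded in each task name before comparing machines. The following is only a rough sketch: the task names and hours in `sample` are hypothetical placeholders, and the only thing taken from this thread is that task names contain strings such as LATeah1043F.

```python
import re
from collections import defaultdict
from statistics import mean

# Data file IDs as they appear in this thread, e.g. "LATeah1043F" -> "1043F".
DATA_FILE = re.compile(r"LATeah(\d+F)")

def mean_hours_by_data_file(tasks):
    """Group (task_name, run_hours) pairs by data file ID and average each group."""
    groups = defaultdict(list)
    for name, hours in tasks:
        match = DATA_FILE.search(name)
        if match:  # skip tasks whose name has no recognisable data file ID
            groups[match.group(1)].append(hours)
    return {file_id: mean(times) for file_id, times in groups.items()}

# Hypothetical task names and run times, for illustration only.
sample = [
    ("LATeah1043F_example_a", 5.0),
    ("LATeah1043F_example_b", 6.0),
    ("LATeah1045F_example_a", 16.0),
    ("LATeah1045F_example_b", 17.0),
]

for file_id, hours in sorted(mean_hours_by_data_file(sample).items()):
    print(f"{file_id}: {hours:.1f} h average")
```

Comparing the per-file averages from two machines (or before and after a hardware change) keeps the 3:1 data file effect from being mistaken for a hardware effect.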

I apologize for giving you the impression I was ignoring your posts.

Tom M

P.S. Once I have the faster RAM and later the CPU in place, I will follow the protocol (bump the cache size) to get some comparable "before/after" task results so I can "see" differences.

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Tom M
Joined: 2 Feb 06
Posts: 5662
Credit: 7742701202
RAC: 2458157

Matt wrote:
I would imagine others' mileage may vary. :)

That is for sure :)

Your preliminary numbers would tend to support roughly a 10% improvement from using the Wisdom file "tune up".

I look forward to your next results.

Tom M

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

So, it looks like Einstein is handing out 1041F and 1046F now. However, I was able to get numbers on four more 1045F tasks, which range from 10 hours and 48 minutes to 10 hours and 50 minutes. Based on this small sample, wisdom saves about an hour and ten minutes of crunch time for the same power consumption figures. Rounding to the nearest minute, and averaging, I'm seeing about a 10 percent reduction in crunch time. I would prefer to have a larger sample, but these numbers don't seem unreasonable. Again, others might see different numbers based on their configurations and CPUs.
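For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted in this thread: the 12 hours 2 minutes baseline and the 10 hours 48 minutes to 10 hours 50 minutes range with wisdom. The function names are just illustrative.

```python
def to_minutes(hours, minutes):
    """Convert an hours-and-minutes run time to minutes."""
    return hours * 60 + minutes

def reduction_pct(baseline_min, tuned_min):
    """Percentage reduction in run time relative to the baseline."""
    return 100.0 * (baseline_min - tuned_min) / baseline_min

baseline = to_minutes(12, 2)   # average 1045F task without wisdom
fastest = to_minutes(10, 48)   # quickest of the four 1045F tasks with wisdom
slowest = to_minutes(10, 50)   # slowest of the four 1045F tasks with wisdom

for label, tuned in (("slowest", slowest), ("fastest", fastest)):
    saved = baseline - tuned
    print(f"{label}: saved {saved} min, {reduction_pct(baseline, tuned):.1f}% reduction")
```

Both ends of the range land at a saving of a little over 70 minutes, which is consistent with the roughly 10 percent figure.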

Clear skies,
Matt
Tom M
Joined: 2 Feb 06
Posts: 5662
Credit: 7742701202
RAC: 2458157

I just got: "....Fri 24 Jul 2020 01:38:27 PM CDT | Einstein@Home | (reached daily quota of 1408 tasks)...."

On this box.

I still have cpu tasks.  It is beginning to seem like I have to turn off the cpu tasks to keep the gpu tasks coming?

Or override the # of cpu threads available? How?  I don't see a parameter in the cc_config.xml that seems to apply.

Tom M

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110037647341
RAC: 22393229

Tom M wrote:

I just got: "....Fri 24 Jul 2020 01:38:27 PM CDT | Einstein@Home | (reached daily quota of 1408 tasks)...."

On this box.

The tasks list for that box (at the time I looked) had 8147 entries of which 6110 show as 'error' - that's over 300 pages of errors.  I looked at a few of the most recent pages and all I saw was wall-to-wall aborts.  Aborting a task is regarded as an error.  Your daily allowance is reduced according to the number of 'errors'.  It gets increased again by returning completed tasks.

With so many recent 'errors' it's a wonder the server will send you any tasks at all.

Tom M wrote:
I still have cpu tasks.  It is beginning to seem like I have to turn off the cpu tasks to keep the gpu tasks coming?

No, you don't have to do that.  If you allow the CPU tasks to crunch and be returned without any more aborts, your full allowance will be restored and you will start getting GPU tasks.  The best way to prevent excess CPU tasks is to set a small cache size and then you can stop aborting.

Your current problem is quite likely due to excessive cache size for the 'current conditions', coupled with your propensity to abort CPU tasks as a 'solution'.  Aborting tasks while the cache size is high might temporarily solve the issue but it's bound to return again at some point unless you lower your cache size.  This will be particularly nasty if the client doesn't 'know' the true crunch time for the new tasks it is requesting.  Below is a simple calculation to show what might have happened to create your current 'problem'.  This is just conjecture because you don't give any details - even basic ones like what size work cache are you setting and are you fiddling that up and down?

As you should now be aware, there are GRP CPU tasks that have crunch times in a 3:1 ratio, depending on which particular data file is in play.  Let's assume a 5 day cache size and you are crunching the 'fast' tasks.  You have lots of tasks (5 days worth) but no problem because the deadline is 14 days.

A change in data file happens and you start getting 'slow' tasks.  Your BOINC client cannot know about this for another 5 days when the first of the 'slow' tasks reaches the top of the queue and gets crunched.  The outcome of the first slow task being crunched will be a three times longer estimate for all tasks on board and this will send the client into a panic since the new three times longer estimate will now represent 15 days worth of work.  The client will think this even if you only got a couple of slow tasks and then started getting fast tasks again.  One slow task (even a resend) is enough to send the estimate through the roof.  You need a large number of fast tasks to bring that estimate back down again.

Another factor adding to this is that GPU tasks crunch faster than the estimate.  With a single duration correction factor (DCF), GPU tasks will cause the estimates for CPU tasks to be lowered and your client will request more CPU work.

You can easily counteract all of this by keeping your cache size low.  If you set 1 day maximum and let BOINC control things, you will never have excess CPU tasks and you will never need to abort any of them.
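The arithmetic in the scenario above can be sketched as follows. This is a deliberately simplified model, not how the BOINC client actually schedules work: it assumes every task on hand gets re-estimated by the same 3:1 factor at once, and it uses the 14 day deadline and the cache sizes mentioned in this post.

```python
DEADLINE_DAYS = 14  # deadline quoted above for these tasks

def days_of_work_after_reestimate(cache_days, slowdown_ratio):
    """Days of work the client believes it holds once every task on hand
    is re-estimated as slowdown_ratio times longer (simplified model)."""
    return cache_days * slowdown_ratio

for cache_days in (5, 1):
    backlog = days_of_work_after_reestimate(cache_days, 3)
    verdict = "exceeds" if backlog > DEADLINE_DAYS else "fits within"
    print(f"{cache_days} day cache -> {backlog} days of work, "
          f"which {verdict} the {DEADLINE_DAYS} day deadline")
```

With a 5 day cache the re-estimated backlog overshoots the deadline, while a 1 day cache leaves plenty of margin.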

Tom M wrote:
Or override the # of cpu threads available? How?  I don't see a parameter in the cc_config.xml that seems to apply.

You can set a compute preference (either on the website or locally in BOINC Manager) that says, "use at most 'blank'% of the processors" and set 'blank' to be whatever percentage you want.  There are other more complicated ways by using app_config.xml files (not cc_config.xml).  You have 32 threads.  If you wanted BOINC to use half of those, just set 50%.

Changing that preference will not solve the current problem.  If you set 50%, BOINC would only download half the number of CPU tasks and with half the threads able to crunch them, what can possibly change?  Absolutely nothing!  BOINC will still get into the same panic if fast crunching tasks change to slow and, many days later, BOINC suddenly becomes aware of the change.  If you had a 1 day cache setting, the tasks on hand could only ever grow to 3 days worth, due to the 3:1 crunch time ratio and BOINC would easily cope with that.

You just need to set a suitable cache size for your system and the conditions under which you are running it.

Cheers,
Gary.

Tom M
Joined: 2 Feb 06
Posts: 5662
Credit: 7742701202
RAC: 2458157

Gary Roberts wrote:
You just need to set a suitable cache size for your system and the conditions under which you are running it.

I have been running cache sizes from 0.1 to 0.25 the entire time I have been getting massive amounts of cpu tasks.

I still have gotten "over subscribed" with cpu tasks.

The back off has stopped.  I am processing gpu tasks again.

And as you and others noted, the processing time for cpu tasks has been varying consistently by file name series, e.g. the shorty tasks have been sharing the same file name groupings.

Thank you for your patience Gary.

Tom M

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110037647341
RAC: 22393229

Tom M wrote:
I have been running cache sizes from 0.1 to 0.25 the entire time ...

I'm sorry, but you can't have been.

Your tasks list on the website shows CPU tasks take between about 3 and 9 hours or more on your machine - the 3:1 ratio, no doubt, for different data files.  Let's assume BOINC thinks that every task will take 3 hours.  A 0.25 day cache for 32 cores is just 64 tasks for tasks taking 3 hrs.  It's even less if a task takes longer.  Currently you have around 230 CPU tasks, so how do you explain that?  And that's not counting the more than 6000 error tasks.  Your client must have asked for those somehow.

So what is BOINC currently estimating that a CPU task will take??  To justify 230 tasks for a 0.25 day cache, BOINC would have to be thinking that a task will take around 50 mins.  It would be even less if the client knew it wasn't using the full 32 cores.  You have 2 GPUs so surely the client knows it can't use the full 32 cores??
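Those two figures can be reproduced with a few lines. This is only a back-of-the-envelope sketch that assumes the cache is filled purely by dividing core-hours by the estimated task time; the real client's work fetch takes more factors into account, and the function names here are made up for illustration.

```python
def tasks_to_fill_cache(cache_days, cores, est_hours_per_task):
    """Number of tasks needed to keep `cores` busy for `cache_days`."""
    return cache_days * 24 * cores / est_hours_per_task

def implied_estimate_minutes(cache_days, cores, tasks_on_hand):
    """Per-task estimate (minutes) that would justify `tasks_on_hand` tasks."""
    return cache_days * 24 * 60 * cores / tasks_on_hand

print(tasks_to_fill_cache(0.25, 32, 3))                    # tasks for a 0.25 day cache at 3 h each
print(round(implied_estimate_minutes(0.25, 32, 230), 1))   # minutes per task implied by 230 tasks
```

The first call gives 64 tasks and the second gives an estimate of about 50 minutes per task, matching the numbers above.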

The project doesn't send work unless the client asks for it.  To get the 6000 odd tasks that show as errors (many of which you downloaded and aborted), either you had a large cache size or the client had an incredibly low estimate of how long a CPU task would take to crunch.  You need to pay more attention to what is going on, since nobody else can see the sort of information that you alone see.

Cheers,
Gary.
