different cache values for different projects

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0
Topic 188747

hi,

several people have complained about the 7-day deadline on E@H, compared to the longer deadlines on some other projects.

I agree with the responses that point out that we are free to join E@H or not. If you don't like the deadline you can go elsewhere.

But there is another issue too. With different deadlines there is a need for different cache sizes for different projects.

This need is increased by having different policies/bugs on what the connect interval means (E@H systematically gets 30% to 50% more work than other projects with the same settings).

It is great to have a user-friendly, set it once, value for how frequently you want to connect to the internet. It is a useful default to have the projects use that to guess cache sizes, but I wish we could tweak the guesses for E@H independently of those for other projects.

I guess that this argument might also apply to some other user preferences as well; it just seems so much more necessary for cache sizes in the current circumstances.

~~gravywavy

John McLeod VII
Moderator
Joined: 10 Nov 04
Posts: 547
Credit: 632,255
RAC: 0

> hi,
>
> several people have complained about the 7-day deadline on E@H, compared to
> the longer deadlines on some other projects.
>
> I agree with the responses that point out that we are free to join E@H or not.
> If you don't like the deadline you can go elsewhere.
>
> But there is another issue too. With different deadlines there is a need for
> different cache sizes for different projects.
>
> This need is increased by having different policies/bugs on what the connect
> interval means (E@H systematically gets 30% to 50% more work than other
> projects with the same settings).
>
> It is great to have a user-friendly, set it once, value for how frequently
> you want to connect to the internet. It is a useful default to have the
> projects use that to guess cache sizes, but I wish we could tweak the guesses
> for E@H independently of those for other projects.
>
> I guess that this argument might also apply to some other user preferences as
> well, it just seems so much more necessary for cache sizes in the current
> circumstances.
>
First, at the moment, the amount of work delivered is based on the resource fraction for that project (it is sent to the server that does the calculation).

However, to get the CPU and download schedulers to work with all of the clients, these are probably going to be rewritten.

There is only one work queue, but each project should only download work such that it will get done by the deadline regardless of the max queue. If you are attached to several projects and have a reliable internet connection, there should be no need to keep a queue. I have mine set to 0.1 day, and on all machines that are attached to 3 or more projects, I have never run out of work to download from some project, even though specific projects have been offline for months at a time. On the other hand, if you have an intermittent (usually dial-up) connection, it would be wise to keep enough work on hand to get through between connections. If this conflicts with a project's deadlines, you may be better advised not to run that project on that machine.

Ben Christy
Joined: 6 Mar 05
Posts: 40
Credit: 20,891
RAC: 0

This brings to mind the question "How big a cache?" I originally set it to 10 GB (the same amount I gave to Windows for virtual memory), but BOINC says it's only using 32 MB. I can spare the memory for now and the foreseeable future, but I reluctantly reduced it to 1 GB.

Will it ever be used up?

==========================================
a Chicago user who likes to be useful

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,206
Credit: 43,310,390,224
RAC: 44,777,561

Gravywavy said with great eloquence:-

> With different deadlines there is a need for different cache sizes for different projects.

I agree with all your points absolutely.

I'd like to add a few comments of my own. The "connect to network" interval is a valiant attempt to automate the process of managing work for multiple projects that hasn't quite got it right yet. It probably never will because the "one size fits all" principle is just hopelessly wrong in this type of situation where there are just too many variables.

It seems to me that we need to keep the "automated" procedure using the default interval of 0.1 days (which should be prevented from being set higher than about 1.0 initially) just for newbies. This would only apply until a couple of successful results have been returned because some of the problems simply stem from inexperience.

When the user is deemed to have enough experience the system should allow that value to be set like it currently can be, although I question the need for any project to have that value go as high as 10.0. However, on top of the "automated" system there should be a "greyed-out" preference that can be selected once a total score of say 2000-5000 or so has been reached. That preference would then be available and it would allow the user (if so desired) to set a per machine and per project queue length.

For example, let's say I have a box that is capable of 4 E@H WUs/day or 8 S@H WUs/day. Let's say I choose 50/50 sharing but, because I fear that S@H might be more unreliable, I set my E@H queue to 3 and my S@H queue to 16. In effect I'm making a judgement that I'm happy with one day's supply for E@H but I want 4 days' supply for S@H. Wouldn't this make many of the "tweakers" amongst us very happy indeed? Of course, if you didn't want that level of complexity, you would simply say no to that preference and stay with the current "one size fits all".
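The arithmetic behind a per-project queue length can be sketched in a few lines of Python. This is only an illustration: `days_of_supply` and its parameters are hypothetical names, not part of BOINC, and it assumes that a 50/50 resource share simply halves each project's throughput on the box. Under that assumption the 16-WU S@H queue works out to 4 days, and the 3-WU E@H queue to roughly a day and a half rather than exactly one day.

```python
def days_of_supply(queued_wus, wus_per_day_full_cpu, resource_share):
    """Estimate how many days a per-project queue would last.

    Hypothetical helper (not a BOINC API). Assumes the project gets
    `resource_share` (0..1) of the CPU, so its effective throughput is
    the full-CPU rate scaled by that share.
    """
    if not 0 < resource_share <= 1:
        raise ValueError("resource_share must be in (0, 1]")
    effective_rate = wus_per_day_full_cpu * resource_share
    return queued_wus / effective_rate


# The example box: 4 E@H WUs/day or 8 S@H WUs/day, shared 50/50.
eah_days = days_of_supply(queued_wus=3, wus_per_day_full_cpu=4, resource_share=0.5)
sah_days = days_of_supply(queued_wus=16, wus_per_day_full_cpu=8, resource_share=0.5)
print(f"E@H queue lasts about {eah_days} days, S@H queue about {sah_days} days")
```

Setting the queue length as a count of WUs, as proposed, means the days of supply shift whenever the resource share changes; a preference expressed in days would not.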

The other type of user that needs a system like this is the person who really only wants to support one project but doesn't want to have downtime if that project is down (and we all know what that project is, don't we :)). They get screwed because they need enough cache for the unreliable project, which then gives them far too much cache for the backup project. E@H would get a large number of "part-time" crunchers if you could set a queue length of 1 for E@H and a queue length of 20 for S@H. Every so often (like once every six days :)) an E@H unit would get completed and a new one downloaded by a large number of people, and then, as they discovered how good the E@H project really was, they would start allocating less to S@H and more to E@H.

I know this is all a Boinc issue but the opportunity to pontificate came up here so my thoughts are here. I'd be interested in what others think. I'm not privy to the musings of the developers and realistically, I guess this has all been gone over previously, many times. However can it really be that impossible to give an experienced user the ability to say that "I want X WUs for project A and Y WUs for project B, for this particular CPU", etc???

Cheers,
Gary.

gravywavy
Joined: 22 Jan 05
Posts: 392
Credit: 68,962
RAC: 0

Message 9469 in response to message 9468

hi Gary,

thanks for the compliments!

> The "connect to network" interval
> is a valiant attempt to automate the process of managing work for multiple
> projects that hasn't quite got it right yet. It probably never will because
> the "one size fits all" principle is just hopelessly wrong in this type of
> situation where there are just too many variables.

Thinking of how many variables there are makes me wonder about how many variables there are even within a single project. For a one-parameter setting the BOINC approach is about as good as it can get on a single project, so I am not making a complaint here. I do think that to add a second variable, even within a single project, could add a useful amount of extra control without adding significantly to the code.

I'd wish for a setting for the minimum amount of work queued locally (the "float") that could be varied independently of the time between connections. At present both values are taken to be the same, so that (confusingly) when you first start, if you ask to connect every 1.3 days, the client asks for 2.6 days' worth of work (i.e. enough to make sure that in 1.3 days you will still have 1.3 days' work left).

I can imagine users wanting to set a higher or lower float than the connect interval. On a horrendously unreliable connection (like mine) I'd like a float of 2.75 days, but never to hold more than 4 days' data for fear of timeout. Another user might want to cut down connections and go for a huge difference between max and min (say min 0.5 and max 6.0, meaning they'd typically connect every 5.5 days).

The prefs could ask for min and max length of queue, in days. IMHO this would be more meaningful to the typical user than the current value. The default for the max would be 2*min. Typical connection intervals would be max-min days. The current defaults would be implemented by min=0.1 and max=0.2.

Alternatively, the prefs could ask for the min and for the interval between connections, and calculate the max as min+interval, with the current defaults implemented by min=0.1 and interval=0.1. This might offer an easier migration from the current values. Everyone would be given a min equal to their current interval, and unless they adjusted the settings they would notice no difference.

The software would now work by asking to top up the queue to the max figure every time it fell below the min. This change to the code is probably trivial, as the current algorithm seems to be to top up to 2*interval every time it falls below 1*interval.
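The top-up rule described above can be sketched roughly as follows, in Python. This is an illustration of the proposal only, not the actual BOINC client code: `QueuePrefs`, `from_interval` and `work_to_request` are hypothetical names, and the queue is measured simply in days of work.

```python
from dataclasses import dataclass


@dataclass
class QueuePrefs:
    """Hypothetical preferences: lower and upper queue bounds, in days of work."""
    min_days: float
    max_days: float

    @classmethod
    def from_interval(cls, min_days, interval_days):
        """Alternative form of the prefs: max is derived as min + interval."""
        return cls(min_days, min_days + interval_days)


def work_to_request(queued_days, prefs):
    """Return how many days of work to ask for.

    Nothing is requested while the queue is at or above the minimum;
    once it falls below the minimum, it is topped up to the maximum.
    """
    if queued_days < prefs.min_days:
        return prefs.max_days - queued_days
    return 0.0


# The behaviour as described in the post: min == interval, so max == 2 * interval.
current = QueuePrefs.from_interval(min_days=1.3, interval_days=1.3)
print(work_to_request(0.0, current))  # empty queue on first start: asks for max_days

# The "few connections" user: min 0.5, max 6.0.
wide = QueuePrefs(min_days=0.5, max_days=6.0)
print(work_to_request(0.4, wide))
```

With `min_days` equal to the interval, the sketch reduces to the behaviour attributed to the current client, which is why the migration would be invisible to anyone who never touches the new setting.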

> It seems to me that we need to keep the "automated" procedure using the
> default interval of 0.1 days (which should be prevented from being set higher
> than about 1.0 initially) just for newbies.

agree totally; not just because of newbies, but because there are users who never adjust anything once they have done the bare minimum to get it working at all. The defaults have to work for them too.

~~gravywavy

Darren
Joined: 18 Jan 05
Posts: 94
Credit: 69,632
RAC: 0

Message 9470 in response to message 9469

> The prefs could ask for min and max length of queue, in days. IMHO this
> would be more meaningful to the typical user than the current value. The
> default for the max would be 2*min. Typical connection intervals would be
> max-min days. The current defaults would be implemented by min=0.1
> and max=0.2.

For a very long time in the early days, this is how it was done. Instead of the "connect every..." type setting it has now, the setting was 2-part to cache "no more than xx days of work" and another setting to define the "no less than xx days of work" variable. For reasons I never fully understood, this caused a huge number of users major problems.
