Bernd, Bruce, Gary, Mike, a question for you!

Mac-Nic
Joined: 25 Feb 05
Posts: 71
Credit: 547461
RAC: 0

RE: Any other way with

Message 40777 in response to message 40772

Quote:
Any other way with official 5.4.9?

Try Process Explorer from Sysinternals to set the affinity.
WWW.sysinternals.com

Best regards

"The FUTURE is only a PARTICLE away from the PRESENT and the PAST."

J Langley
Joined: 30 Dec 05
Posts: 50
Credit: 58338
RAC: 0

RE: I really don't

Message 40778 in response to message 40767

Quote:

I really don't understand why people get upset with not seeing huge numbers of results flash past.

Well, if you don't run your computer 24x7, and/or you run multiple projects, it's harder for slower computers to meet the deadlines for the long S5 WUs.

I understand that the load on the servers would rise if the WUs were shorter, and I see the need to avoid the 'curse of 32' again for fast dedicated crunchers, but I'd be happier if these long WUs were half their current size.

robert stone
Joined: 19 Mar 05
Posts: 4
Credit: 16971
RAC: 0

I see a problem with resource

I see a problem with resource sharing between projects. I have several Rosetta WUs that will push the Einstein WU (I only have 1, with 35 hours projected to finish) past the deadline unless I manually suspend Einstein, let Rosetta finish, and hope there is time for Einstein to finish, or flip it around and let Einstein finish first.

I suppose I can go tweak all my preferences on different projects, again.

These large WUs are going to create havoc for people running multiple projects.

Just my opinion

"Don't take life to serious .....after all none of us will make it out alive."

Joachim Schmidt
Joined: 19 Feb 05
Posts: 35
Credit: 391050
RAC: 0

then just set your "Connect

Then just set your "Connect to network about every" to 0.1 days, so that you have a small cache and won't have problems with deadlines.

greets

Odysseus
Joined: 17 Dec 05
Posts: 372
Credit: 20945996
RAC: 3897

RE: I see a problem with

Message 40781 in response to message 40779

Quote:
I see a problem with resource sharing between projects. I have several Rosetta WUs that will push the Einstein WU (I only have 1, with 35 hours projected to finish) past the deadline unless I manually suspend Einstein, let Rosetta finish, and hope there is time for Einstein to finish, or flip it around and let Einstein finish first.


Your BOINC Manager should be able to deal with all that, without intervention: when it notices that a WU is at risk of missing the deadline, given the computer's up-time and the project's resource share, it will preëmpt (or refuse) other work to make sure the deadline is met. This may put it into ‘panic mode’ for a while, but once it's become accustomed to the larger WU sizes it’ll avoid overfilling its cache.
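To make the idea concrete, here is a rough sketch of that kind of deadline check. It is not BOINC's actual scheduler code, only an illustration: the `WorkUnit` class, the `at_risk` function and the 14-day (336-hour) deadline are all made up for the example; only the 35 hours comes from the post quoted above.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    name: str
    cpu_hours_left: float     # estimated CPU time still needed
    hours_to_deadline: float  # wall-clock hours until the deadline


def at_risk(wu: WorkUnit, uptime_fraction: float, resource_share: float) -> bool:
    """Return True if the WU would miss its deadline at its normal share.

    uptime_fraction: fraction of wall-clock time the computer is crunching.
    resource_share:  this project's fraction of the CPU (0.5 for a 50/50 split).
    """
    # CPU hours the project can expect before the deadline without preempting anything
    available = wu.hours_to_deadline * uptime_fraction * resource_share
    return wu.cpu_hours_left > available


# Hypothetical numbers: a 35-hour WU with an assumed 14-day deadline.
wu = WorkUnit("einstein_long", cpu_hours_left=35.0, hours_to_deadline=14 * 24)
print(at_risk(wu, uptime_fraction=0.25, resource_share=0.5))  # 42 h available
print(at_risk(wu, uptime_fraction=0.20, resource_share=0.5))  # 33.6 h available
```

With those assumed numbers the WU is safe on a machine that crunches a quarter of the time, and at risk on one that crunches a fifth of the time, which is when the client would have to give it priority over other work.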

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119069164639
RAC: 24252443

RE: RE: ...I manage to

Message 40782 in response to message 40772

Quote:
Quote:
...I manage to keep 1 project on each virtual CPU most of the time....

Gary, how do you obtain this configuration? Which Manager do you use? AFAIK only TruXoft Manager can set the cpu affinity... Is this your case? Don't tell me that you manually suspend every n-1 wu in the cache... Any other way with official 5.4.9?

I use the official manager. What I do (with BOINC stopped) is edit client_state.xml to set both long term debts (LTD) to zero, the short term debt (STD) for EAH to +20 (seconds), and the STD for Seti to -20. STD controls which project will run if a decision has to be made. The STDs have to balance to zero.

When BOINC is restarted, both threads will be EAH (+20) but after a very short time (probably 10 seconds per thread) the internally calculated values for STD will have reversed sign. At this point if I create a reason for the reschedule_cpus routine to be called, one of the EAH threads will be preempted and a Seti thread will take over. Once this has been established, the machine seems quite happy to continue through multiple results for days at a time without losing the desired goal of 1 project per CPU. Also, I have the preference for "Leave tasks in memory when preempted" set, and this might be important to getting it to work. I haven't tried without this setting.

The key seems to be to get request_reschedule_cpus called at just the right time. I achieve this by picking the newest result in my cache and suspending and reenabling just that one result. This forces the rescheduling request without interfering with any of the actually running tasks. Most of the time the desired goal is achieved immediately but sometimes if I mistime it, I can catch it "on the way back" as the STDs are now heading in the other direction.
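For anyone who would rather script the client_state.xml edit than do it by hand, a minimal sketch follows. It assumes each `<project>` element carries `<master_url>`, `<short_term_debt>` and `<long_term_debt>` children; check those tag names against your own file first. The `set_debts` helper and the two example.org URLs are placeholders, not real project URLs. As with the manual edit, only do this with BOINC stopped, and keep a backup copy of the file.

```python
import xml.etree.ElementTree as ET

# Placeholder URLs for illustration; use the master_url values from your own file.
EAH_URL = "http://example.org/einstein/"
SETI_URL = "http://example.org/seti/"

SAMPLE = """<client_state>
  <project>
    <master_url>http://example.org/einstein/</master_url>
    <short_term_debt>-512.300000</short_term_debt>
    <long_term_debt>1024.000000</long_term_debt>
  </project>
  <project>
    <master_url>http://example.org/seti/</master_url>
    <short_term_debt>512.300000</short_term_debt>
    <long_term_debt>-1024.000000</long_term_debt>
  </project>
</client_state>"""


def set_debts(xml_text: str, short_term: dict) -> str:
    """Zero the LTD and set the STD for each project listed in short_term.

    short_term maps a project's master_url to its new STD in seconds.
    The STDs must balance to zero, as described in the post.
    """
    if abs(sum(short_term.values())) > 1e-9:
        raise ValueError("short term debts must balance to zero")
    root = ET.fromstring(xml_text)
    for project in root.iter("project"):
        url = project.findtext("master_url", default="").strip()
        if url not in short_term:
            continue  # leave projects we were not asked about untouched
        new_values = (("long_term_debt", 0.0), ("short_term_debt", short_term[url]))
        for tag, value in new_values:
            node = project.find(tag)
            if node is None:  # assumed tag missing: create it
                node = ET.SubElement(project, tag)
            node.text = f"{value:.6f}"
    return ET.tostring(root, encoding="unicode")


print(set_debts(SAMPLE, {EAH_URL: 20.0, SETI_URL: -20.0}))
```

The sketch works on a string so it can be tried safely; pointing it at the real file would mean reading client_state.xml, calling `set_debts`, and writing the result back. It only covers the editing step, so the suspend/resume trick to trigger the reschedule still has to be done by hand.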

Someone mentioned using Process Explorer. The problem with this (I think) is that you would have to keep reapplying the affinity every time a process exited and was respawned. With my way the affinity seems to survive quite happily through many consecutive results. The machine is right at my desk so I just check it once in the morning and once in the evening (if I even remember). At the moment (I've just checked it now) there is still one project per CPU and this has been going on for several days without interference from me. Of course it would be a bit tedious if the machine isn't running 24/7 :).

Quote:
Thanks for all the other answers!

You're welcome!!

Cheers,
Gary.

Nuadormrac
Joined: 9 Feb 05
Posts: 76
Credit: 229259947
RAC: 0

RE: RE: Luca, if you set

Message 40783 in response to message 40774

Quote:
Quote:
Luca,
if you set your resource share to the same value for both projects (50/50) they will both run on one CPU each.
I had a server with 3 CPUs and had E@H at 66% and another project at 33%. So I had 2 WUs for E@H and 1 WU of the other project running at the same time.
Udo

I'm not so lucky, Udo :-(
I've got 3 Einstein long WUs and 14 Rosetta WUs. The resource share is 100 for both. The short and long debts are the same for both projects... they are even... BUT... Rosy is running both tasks and Einstein is preempted...

The only thing that worked for me in the past was TruXoft tx36, but now I'm using official 5.4.9... and nothing to do!!

Thanks!

As a temporary fix, you could always set Rosetta to no new work and decrease its time setting in preferences... In this way, they'll clear out sooner, and it will get through these long E@H units. Afterwards, setting it back to normal, BOINC should have learned to predict the time to completion better, and not pull quite so much E@H work on the next connect...

Oh, and on the last Rosetta, reset the time setting back to your desired setting, so it estimates that value for the next download :D
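For reference, the resource-share arithmetic Udo describes in the quote above can be sketched like this. It is only the long-run average, not what the scheduler does at any given moment (which, as Luca found, also depends on the debts); `cpus_per_project` is a made-up helper name.

```python
def cpus_per_project(shares: dict, n_cpus: int) -> dict:
    """Average number of CPUs each project gets, given its resource share."""
    total = sum(shares.values())
    return {name: n_cpus * share / total for name, share in shares.items()}


# Udo's server: 3 CPUs, E@H at 66% and another project at 33%
print(cpus_per_project({"E@H": 66, "other": 33}, n_cpus=3))

# Luca's case: 2 CPUs, resource share 100 for both projects
print(cpus_per_project({"Einstein": 100, "Rosetta": 100}, n_cpus=2))
```

The first call works out to 2 CPUs for E@H and 1 for the other project; the second to 1 CPU each.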

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4343
Credit: 252801358
RAC: 41215

Back to the original

Back to the original question:

Roughly speaking, anything you see in your cache (number of tasks, movement etc.) would be the same in the database, multiplied by the number of users (or actually CPUs). If you cut the current WUs in half, you have twice the number of results the database needs to keep track of. The database size is still our limiting factor; we're currently running a server with 24GB main memory and it's already tight.
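As a back-of-the-envelope sketch of that scaling (the CPU count and tasks-per-CPU figures below are invented for illustration, not the project's real numbers):

```python
def results_in_database(active_cpus: int, tasks_per_cpu: int, split_factor: int = 1) -> int:
    """Rough count of results the database has to track.

    split_factor: how many pieces each current WU is cut into (2 = half-size WUs),
    assuming each host still caches the same amount of work in hours.
    """
    return active_cpus * tasks_per_cpu * split_factor


# Hypothetical figures, for illustration only
current = results_in_database(active_cpus=100_000, tasks_per_cpu=3)
halved = results_in_database(active_cpus=100_000, tasks_per_cpu=3, split_factor=2)
print(current, halved, halved / current)
```

Whatever the real figures are, halving the WU size doubles the number of rows to keep track of.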

BM

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6592
Credit: 331812263
RAC: 314689

RE: we're currently running

Message 40785 in response to message 40784

Quote:
we're currently running a server with 24GB main memory


Pedal to the metal! Wowie! I want one ...... :-)
Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

Ivan Minkov
Joined: 1 Mar 05
Posts: 3
Credit: 228829
RAC: 0

RE: Which would have the

Message 40786 in response to message 40767

Quote:
Which would have the most value to you, (a) a 1 kg gold bar, or (b) 100 x 10 g gold mini-bars?

Consider this scenario:

You start counting your 1000 bucks:

1, 2, 3 ..... 55, 56, *telephone rings*, blah, blah, erm where was I, 55 !, 55, 56, 57, .... 88, .... *insert some other crap here*, 85, 86 ... DAMN I got that wrong, start over !

Which would you prefer: getting a BSOD on a 1-hour WU, or getting it on an 18-hour WU ...
