Workunit flooding - how to stop?

Magiceye04
Joined: 18 Feb 06
Posts: 28
Credit: 780607988
RAC: 868168
Topic 225934

Hello,

On one of my PCs, Einstein is flooding me with hundreds of work units.

It has an RX460 GPU, which needs about 50 minutes per WU; only GPU work is allowed.

https://einsteinathome.org/de/host/12523728

The BOINC Manager shows a realistic expected run time (50 minutes).

The buffer is set to 0.01 days + 0.

Every 60 seconds the BOINC Manager requests new work units and gets exactly 1 new WU each time.

I now have more than 500, and the count keeps climbing.

The duration correction factor in the client_state.xml was about 2.08... - I have lowered this to 0.99.

Why is Einstein asking for more and more and more work?

And why does the server deliver more and more?

How can I stop this? (Not by pressing "no new work": I want a realistic buffer of 2 or 3 WUs.)

Best Regards

MagicEye

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3609
Credit: 2901608986
RAC: 1033393

Do you have an app_config.xml file for Einstein in which you limit the number of concurrent work units with max_concurrent? That would cause this: a bug in the BOINC client makes it request new tasks over and over again.

Magiceye04
Joined: 18 Feb 06
Posts: 28
Credit: 780607988
RAC: 868168

Yes, that was the reason.

Thank you!

I have used this to crunch one Einstein and one WCG-OPN WU in parallel.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3609
Credit: 2901608986
RAC: 1033393

If you use only your GPU for Einstein, you can use the project_max_concurrent tag in app_config.xml to limit the number of tasks. Note that it goes in a different place in that file: https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration
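
As a rough sketch of that placement: project_max_concurrent sits directly under the app_config root element rather than inside an app block. The value 1 below is only an assumed example, not a setting taken from this thread.

```xml
<app_config>
    <!-- Per-application <app> blocks, if you have any, would go here. -->

    <!-- Project-level limit: a direct child of <app_config>, not nested in <app>.
         The value 1 is an assumed example. -->
    <project_max_concurrent>1</project_max_concurrent>
</app_config>
```

After saving the file, the client normally has to re-read its config files (or be restarted) before the change takes effect.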

Eugene Stemple
Joined: 9 Feb 11
Posts: 58
Credit: 262843410
RAC: 600390

@magiceye

An optional fix is to revert to boinc-manager 7.14. That version was running fine for me with the <project_max_concurrent> line in app_config. I am running a mix of CPU and GPU tasks, which is known to be badly managed with regard to cache size, but with limits of 1 + 0.1 it was not overcommitting the CPU, at least not past the typical 14-day deadlines.

I recently upgraded the Linux system to Debian 11.0.0 and thought I might as well upgrade to the 7.16 boinc packages. BAD IDEA... I had drained the cache before the transition and (foolishly) resumed E@H late at night, expecting it to refill the cache and resume normal operation. The next morning I had 1000 tasks downloaded! It had hit the 512 workunit limit before midnight and then got another 512 the "next day." Looking at the event log, it was fetching 3 or 4 work units every (60-second) cycle with total disregard for the cache limit. The thought occurred to me that the new 7.16 had no run-time history to base its estimates on; however, after letting it run for two days I tried enabling new tasks and it immediately downloaded 4 more CPU and 20 GPU tasks. (Those for the GPU were expected, as all GPU tasks in the cache had been completed.) I've switched back to boinc 7.14, and now when I do a work fetch to get more GPU work, it does NOT fetch any CPU work. Alas, I'll have to abort a big bunch of CPU work, as there's no way it will get done before the deadline.

OT: the 7.16 boinc-manager is missing the "shut down connected client" control option. Not a deal breaker, just inconvenient when you want to close BOINC gracefully for a system upgrade or such.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3609
Credit: 2901608986
RAC: 1033393

The problem is with the <max_concurrent> tag; <project_max_concurrent> should work OK.
