Looking for a way to limit the number of work units

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117880901471
RAC: 34718670

Ian&Steve C. wrote:

... what I want to happen is this: When SETI tasks are present, do NOT get any more Einstein work. If Einstein work is present from when SETI was down but then comes back up, at least finish the Einstein work that I already have before continuing to process the new SETI work (then stop requesting new work)

how do I do this?

With just the standard BOINC controls, I don't think you can.  I think there are two possible ways to achieve what you want: the tedious manual method -OR- writing some sort of intelligence into a script that does automatically what you would otherwise do manually.

I do that now for an entirely different purpose.  I have a lot of hosts which could all (potentially) be downloading the same sort of data files on a per-host basis.  A script controls this and pre-deploys the data before allowing a host to request extra work; that way, a given data file is only ever downloaded once.

For your situation, here is an outline of what might work.  First, find a reliable way of automatically polling seti (every x minutes - say 30-60) to know when it's up and has available work.  With that information, and with the work cache of your host set to something like 0.05 days, perform the following steps.

While true (i.e. run this forever - perhaps until some externally set stop point is encountered):

Check seti for availability.

If seti is available:

  • Set NNT (No New Tasks) for Einstein
  • Set the work cache size to what you require (perhaps in steps rather than one big hit)
  • Monitor until downloads stop (the cache is full)
  • Return the work cache to 0.05 days
  • Unset NNT for Einstein

Else, if seti is unavailable:

  • Set the work cache size to what Einstein should top up to if you had no seti work at all (the default 0.05 days)
  • Return the cache size to 0.05 if you used some higher value

End of decisions about work fetch.
Sleep for the balance of the chosen time interval (checking for an end signal).
Done.

The script would always be running (sleeping for the balance of each interval) and would wake up every x minutes to poll seti.  By using a 0.05-day work cache, Einstein never holds more than a very small number of tasks.  As soon as you detect that seti has work available (the maximum delay is whatever you set for x), you can fill up as required with seti only - NNT guarantees this.
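
The polling check is the part I can't vouch for.  One untested guess would be to scrape the project's server status page - note that the URL and the "Running" marker it greps for are pure assumptions about the page layout, so experiment:

    # Hypothetical availability check: succeeds (exit 0) if the status page
    # suggests the scheduler is up. URL and grep pattern are assumptions.
    seti_is_up () {
        curl -s --max-time 20 "https://setiathome.berkeley.edu/show_server_status.php" \
            | grep -q "Running"
    }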

Of course, there is no guarantee that the few remaining Einstein tasks will be done first.  Chances are you might need them for the next seti outage anyway, so why not keep them :-).  BOINC will take care of them if they get anywhere near the deadline, so don't worry about those few left.

All of the actions to change the work cache size and to set or unset work fetch are easily achieved using boinccmd to interact with the client.  I notice you run Linux, so doing the above with a bash script should be relatively straightforward.  I've never needed to poll a project to see if it's up and has available work, so treat the check above as a starting point and experiment a bit with that part.
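
Putting the outline together, here's a rough, untested bash sketch.  It assumes the seti_is_up check above, that boinccmd can reach your client, and it guesses at the Einstein master URL and the override-file path - verify all of those for your own install.  Also note that rewriting global_prefs_override.xml clobbers any other overrides you keep in it.

    #!/bin/bash
    # Rough sketch only - untested. Paste the seti_is_up function here.
    EINSTEIN_URL="https://einsteinathome.org/"   # assumed master URL - check yours
    OVERRIDE="/var/lib/boinc-client/global_prefs_override.xml"   # assumed path
    POLL_SECS=1800                               # x = 30 minutes

    set_cache_days () {
        # Write a minimal prefs override, then tell the client to reread it.
        # NB: this replaces anything else in the override file.
        {
            echo '<global_preferences>'
            echo "  <work_buf_min_days>$1</work_buf_min_days>"
            echo '  <work_buf_additional_days>0</work_buf_additional_days>'
            echo '</global_preferences>'
        } > "$OVERRIDE"
        boinccmd --read_global_prefs_override
    }

    while true; do
        if seti_is_up; then
            boinccmd --project "$EINSTEIN_URL" nomorework     # set NNT
            set_cache_days 1.0     # or step up gradually to what you require
            # ... monitor here until downloads stop (cache is full) ...
            set_cache_days 0.05
            boinccmd --project "$EINSTEIN_URL" allowmorework  # unset NNT
        else
            set_cache_days 0.05    # Einstein only tops up the minimum
        fi
        sleep "$POLL_SECS"         # roughly the balance of the interval
    done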

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2962952453
RAC: 697711

Ian&Steve C. wrote:

Bumping this back up. I seem to be having the same problem.

what I want to happen is this: When SETI tasks are present, do NOT get any more Einstein work. If Einstein work is present from when SETI was down but then comes back up, at least finish the Einstein work that I already have before continuing to process the new SETI work (then stop requesting new work)

It might be worth doing some long-term monitoring via the BOINC Event Log on an affected computer (with <sched_op_debug> active). What are the events surrounding an Einstein work fetch? Might it happen if a SETI update request fails ('couldn't connect to server')? Or if the connection succeeds, but no work is available? 
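
For reference, a minimal sketch of enabling that flag, assuming the standard cc_config.xml in the BOINC data directory:

    <!-- cc_config.xml, in the BOINC data directory -->
    <cc_config>
      <log_flags>
        <sched_op_debug>1</sched_op_debug>
      </log_flags>
    </cc_config>

After saving, boinccmd --read_cc_config makes the running client pick it up without a restart.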

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3979
Credit: 47392202642
RAC: 64884310

Gary Roberts wrote:
Of course, there is no guarantee that the few remaining Einstein tasks will be done first.  Chances are you might need them for the next seti outage anyway, so why not keep them :-).  BOINC will take care of them if they get anywhere near the deadline, so don't worry about those few left.

You're probably right about keeping the leftover WUs, though I do try to stash as many SETI WUs as possible for the planned outages. Lately the outages have been quite long, and even a cache of 10,000 WUs doesn't last more than 12 hours on my fastest system (10x RTX 2070s with a very fast CUDA app). Ideally I would have enough WUs banked to cover the weekly Tuesday downtime.

I'll look into making a script in the future. You're correct that it shouldn't be too difficult with boinccmd.

I also realized that I may be getting more WUs than normal due to the GPU spoofing in my custom BOINC client: it reports that I have 64 GPUs so that I'm sent more WUs to cache (SETI currently limits a system to 150 WUs per GPU, and it used to be only 100 per GPU). Rather than flip-flopping between different BOINC clients, I'll just deal with it.


Nick Name
Joined: 29 Dec 09
Posts: 5
Credit: 260287152
RAC: 20198

I can confirm this is not a project problem.  I've been using Einstein as a backup project, i.e. Resource Share = 0, on one of my hosts for a long time.  It works exactly as expected.  The specific setup is GPUGrid = 100, PrimeGrid = 0, Einstein = 0.  PrimeGrid and Einstein alternate fairly reliably.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3979
Credit: 47392202642
RAC: 64884310

Nick Name wrote:
I can confirm this is not a project problem.  I've been using Einstein as a backup project, i.e. Resource Share = 0, on one of my hosts for a long time.  It works exactly as expected.  The specific setup is GPUGrid = 100, PrimeGrid = 0, Einstein = 0.  PrimeGrid and Einstein alternate fairly reliably.

Out of curiosity: when you run out of GPUGrid work, how many Einstein WUs does your system download? Does it download only one and then get another when it's done, or does it download a handful?


Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2962952453
RAC: 697711

Ian&Steve C. wrote:
Nick Name wrote:
I can confirm this is not a project problem.  I've been using Einstein as a backup project, i.e. Resource Share = 0, on one of my hosts for a long time.  It works exactly as expected.  The specific setup is GPUGrid = 100, PrimeGrid = 0, Einstein = 0.  PrimeGrid and Einstein alternate fairly reliably.

Out of curiosity: when you run out of GPUGrid work, how many Einstein WUs does your system download? Does it download only one and then get another when it's done, or does it download a handful?

It should download one task for each idle resource. If you are running a spoofed client (I spoof 16 GPUs), every GPU counts as 'idle', so I would get 16 tasks.

I have seen problems where the **client** repeatedly requests the same work, without taking account of the work already downloaded following the previous successful request. That's why I say it's important to look at the client Event Log, to distinguish between a project problem (sending more than expected) and a client problem (asking too often).

I don't, personally, use the 'Resource Share 0' setting: I normally run Einstein tasks on a different resource (iGPU), so I have a normal cache setting for those. I'd need RS 0 for NVidia, RS 100 for Intel - but we can't do that.
