At the moment I have two systems that typically run out of gamma-ray GPU tasks to process during the "regular" weekend upload outage.
How much more download buffer should I set, and how high a fake CPU count, to keep enough GPU tasks available to crunch while I am waiting for the upload logjam to clear?
Currently, both are set to "store at least 0.1 days of work" and "store up to an additional 0.1 days".
These systems process a GR task about every 16 minutes. The first system has 1 GPU and a 32-thread CPU; the second has 4 GPUs and a 16-thread CPU.
Any guidance?
I know we have a hard limit on the total number of GPU tasks unless we fake a higher number of CPUs. So what are some ideas on what I should set everything to?
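For reference, the "fake CPU" trick I mean is the <ncpus> option in cc_config.xml in the BOINC data directory. A minimal sketch, with 32 as a purely illustrative value (re-read via Options -> Read config files, or restart the client):

```xml
<!-- cc_config.xml: <ncpus> overrides the detected CPU count; -1 means use the detected number -->
<cc_config>
  <options>
    <ncpus>32</ncpus>
  </options>
</cc_config>
```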
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
The advice to run super-low cache settings is apt when a person is operating in a way that gives rise to big fluctuations, for example:
1. running more than one task type (say GW GPU plus GR GPU)
2. changing system configuration (say going from 1X to 2X, or adding or subtracting cards)
GR here, in particular, has very stable execution times, so one can configure a cache setting to give, for example, a couple of days of work and expect not to have wild excursions.
So when (if) you stabilize those systems as to configuration and commit them to running GR GPU only, I think you might reasonably creep up your cache settings until you are really getting about 2 days in stock. This may or may not occur at a setting of 2 days, so adjust and observe.
An obstacle for super-productive systems, which may hit you but does not limit most, is a hard-wired limit that stops your BOINC from requesting additional work once the tasks already in stock exceed 1000. If that is less than 2 days of work for you, you'll see a bit of odd behavior as the stock rises above and drops back below that threshold.
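As a rough sanity check against that 1000-task ceiling, here's a back-of-the-envelope sketch (the ~16-minute task time comes from your post; treating it as a per-GPU rate is an assumption on my part):

```python
# Back-of-the-envelope: tasks held in stock for a given cache setting,
# versus the hard-wired 1000-task ceiling described above.

TASK_MINUTES = 16    # ~16 min per completed GR task (from Tom M's post)
HARD_LIMIT = 1000    # BOINC stops requesting work above this many tasks in stock

def tasks_in_stock(cache_days, n_gpus):
    """Tasks a cache of cache_days holds, assuming one completed task
    every TASK_MINUTES per GPU (an assumption, not a measurement)."""
    tasks_per_day_per_gpu = 24 * 60 / TASK_MINUTES   # ~90 tasks/day/GPU
    return cache_days * tasks_per_day_per_gpu * n_gpus

for gpus in (1, 4):
    stock = tasks_in_stock(2.0, gpus)
    verdict = "under" if stock < HARD_LIMIT else "over"
    print(f"{gpus} GPU(s), 2-day cache: ~{stock:.0f} tasks ({verdict} the cap)")
```

On those assumptions a 2-day cache (~180 and ~720 tasks) stays comfortably below the ceiling on both of your machines.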
You may not need to fake extra CPU cores at all. Doing that is a way to get more tasks per day, but the allocations when applied to machines like yours are probably plenty generous.
Now, people with more than one VII card and few real cores do have a good reason to push up their daily quota.
My solution:
As soon as the Sunday uploading stops, I have time to go for an extended and excellent walk in the snow-covered nature at the back of the house, up the hills, past fields and forests!
Especially since I don't see any GPUs or tasks in that area.
Great for my nerves, and my unimportant and trivial PCs are happy to be able to RELAX.
Have a nice week!
The resource-dependent daily task quota allows 32 per available CPU and 256 per available GPU.
Falsified higher CPU counts do increase quota up to some maximum, which I don't currently have handy but think may be 64 CPUs.
Restricting the number of CPUs BOINC is allowed to assume it can use, via the "Use at most" setting in Computing preferences, cuts the number proportionally. I don't know how the quota responds to limitation by other available means.
Since each GPU adds 256 tasks/day to the quota, only GPU/task-type combinations for which the GPU on average completes more than one task every 338 seconds (86,400 seconds per day ÷ 256 tasks ≈ 338 s) need help to sustain daily nutrition. 570/580 cards don't get close enough to be any problem.
But VII cards do, and so do some of the other highly capable types. Still, many people running those cards run high core-count CPUs, so the 32/CPU term gets them enough work anyway.
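Putting those figures together as a sketch (32/CPU, 256/GPU, and the proportional "Use at most" cut are as described above; the 64-CPU cap is the uncertain figure I mentioned):

```python
# Daily quota as described above: 32 tasks/day per allowed CPU
# plus 256 tasks/day per GPU.

CPU_QUOTA = 32    # tasks/day per CPU (from this post)
GPU_QUOTA = 256   # tasks/day per GPU (from this post)
CPU_CAP = 64      # possible maximum CPU count counted toward quota (uncertain)

def daily_quota(n_cpus, n_gpus, use_at_most_pct=100.0):
    """The 'Use at most N% of the CPUs' preference cuts the CPU term
    proportionally, as described above."""
    allowed_cpus = min(n_cpus, CPU_CAP) * use_at_most_pct / 100.0
    return int(allowed_cpus * CPU_QUOTA + n_gpus * GPU_QUOTA)

print(86400 / GPU_QUOTA)                   # 337.5 s: break-even task time per GPU
print(daily_quota(n_cpus=32, n_gpus=1))    # Tom M's first system:  1280 tasks/day
print(daily_quota(n_cpus=16, n_gpus=4))    # Tom M's second system: 1536 tasks/day
```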
When the GR upload problems start, just flip over to GW, which doesn't have the problem.
_________________________________________________________________________
Basically, to me the question rather is: why are no steps being taken server-side to avoid this upload-jam every weekend?
You're looking at it backwards from the admins' point of view. The problem is that uploads plus weekly server maintenance (IIRC a backup is the big thing adding extra IO) overload the servers. Pausing uploads during the maintenance window is the free way to address the problem. The other option involves spending a lot of money (probably upwards of €10k, and it wouldn't surprise me if it were several times that) for a bigger server. It's possible that they'll size a new server big enough for the load at the next planned hardware refresh; shelling out that kind of money for an unplanned upgrade when there's a workaround isn't going to happen in 99% of cases.
It looks like we are able to send the finished work in again.
Superb. TY.
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
It did some on each machine and then got jammed up into "backoff" land again.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
archae86 wrote: "Since each GPU adds 256 tasks/day to the quota..."
338 / 60 = 5.633 minutes.
My 5700s run north of 6 minutes on a single task per GPU.
And even though one of the machines is crunching significant GW CPU tasks, those are leftovers. Right now both machines are on GR-only GPU profiles.
It looks like I can safely switch to 2 days plus 0.25 additional days. I have done so. Now all I have to do is wait for the backlog to clear and then see what next weekend brings.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!