Latest data file for FGRPB1G GPU tasks

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7228964896

RAC: 1134335

18 Dec 2018 14:50:07 UTC

Message 168333 in response to message 168313

One element of the old pattern does seem to be showing itself. The field after the data file in the task names for currently issuing tasks has moved up at an accelerating rate.

Tasks for which that value was 20 issued over more than a six hour period, but a big gulp my primary host took about an hour ago (when the 1000 runnable tasks limit stopped forbidding my host from making a request) got a mix of 60, 84, 108, and 124. If the old pattern forecasts behavior this time, this will soon get to values over 200, and then the rate of rise will progressively slow.

If another element of the old pattern is in effect, then even the 124 units will have appreciably longer run times than the 20 units we established the 1/4 relationship on, but not nearly enough to get us back to times comparable to the 2002L tasks.

I was unaware of the 1000 runnable tasks limit until I ran into it late yesterday. If I understand the behavior I've seen and a snippet of code I found with a Google search, this limit stops BOINC on my host from asking for any new work if the current runnable task count is over 1000. However, once the runnable task count is 999, this particular limitation has no effect, so an individual request can ask for a lot of work and bounce the count on the host well over 1000 if the project servers happen to be able to grant a large number of tasks to a single request at that moment (mine got well over 100 from a single request this morning). This limit is, I think, entirely separate from daily fetch quotas and such.
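The behavior described above can be sketched in a few lines of Python. This is only an illustration of the description in this post, not the actual BOINC client code: the names `RUNNABLE_LIMIT`, `client_may_request` and `runnable_after_request` are hypothetical, and the strict "over 1000" comparison is an assumption taken from the wording above.

```python
RUNNABLE_LIMIT = 1000  # threshold described in the post (hypothetical constant name)


def client_may_request(runnable_tasks: int, limit: int = RUNNABLE_LIMIT) -> bool:
    """Client-side gate: no work request is made while the count is over the limit."""
    # Assumption: "over 1000" means strictly greater than the limit.
    return runnable_tasks <= limit


def runnable_after_request(runnable_tasks: int, tasks_granted: int,
                           limit: int = RUNNABLE_LIMIT) -> int:
    """Runnable count after one scheduler contact.

    The gate only decides whether a request is sent. Once it is sent,
    nothing in this sketch caps how many tasks the server may grant.
    """
    if not client_may_request(runnable_tasks, limit):
        return runnable_tasks
    return runnable_tasks + tasks_granted


print(runnable_after_request(999, 120))   # request allowed, count jumps past the limit
print(runnable_after_request(1001, 120))  # request blocked, count unchanged
```

The point of the sketch is that a gate on asking is not a cap on holding: a host just under the threshold can still end up well over it after a single generous reply.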

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2960599355

RAC: 706254

18 Dec 2018 16:00:21 UTC

Message 168335 in response to message 168333

archae86 wrote:

I was unaware of the 1000 runnable tasks limit ...

That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News. Under current BOINC Central code, it doesn't stop the client asking for new work, but it stops the server sending it. So, if the limit is 1,000, and you ask for more work when you have 1,000 tasks on board, you get nothing back except a little note in the Event Log saying "This computer has reached a limit on tasks in progress". If you have 999 tasks on board, you get at most one new task, and no note.
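As a rough illustration of the server-side behavior described in the paragraph above, here is a minimal Python sketch. It is not the BOINC scheduler code: the function name `server_grant` and its parameters are hypothetical, and only the Event Log message text is taken from the description.

```python
from typing import Optional, Tuple

LIMIT_NOTE = "This computer has reached a limit on tasks in progress"


def server_grant(tasks_on_board: int, tasks_requested: int,
                 limit: int = 1000) -> Tuple[int, Optional[str]]:
    """Return (tasks sent, Event Log note) for one work request.

    The client is free to ask; the server only sends what fits under the limit.
    """
    room = max(0, limit - tasks_on_board)
    if room == 0:
        # Nothing is sent, and the client gets the note.
        return 0, LIMIT_NOTE
    # Some room is left: send at most that many tasks, with no note.
    return min(tasks_requested, room), None


print(server_grant(1000, 50))  # nothing sent, note returned
print(server_grant(999, 50))   # at most one task, no note
```

Unlike a gate on the client's request, a cap applied by the server keeps the host from ever exceeding the limit, however much work it asks for.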

As we know, we run older server code here, so we may not have all those tools at our disposal. The new code, for example, is capable of keeping separate limits for each of CPUs and GPUs, and can even set separate limits by app. That would be a much better solution for the 2003L problem than a daily limit which could be used up part-way through the 24 hours, and leave resources idle until time was up.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7228964896

RAC: 1134335

18 Dec 2018 16:11:15 UTC

Message 168336 in response to message 168335

Richard Haselgrove wrote:

archae86 wrote:
I was unaware of the 1000 runnable tasks limit ...
That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News. Under current BOINC Central code, it doesn't stop the client asking for new work, but it stops the server sending it. So, if the limit is 1,000, and you ask for more work when you have 1,000 tasks on board, you get nothing back except a little note in the Event Log saying "This computer has reached a limit on tasks in progress". If you have 999 tasks on board, you get at most one new task, and no note.

Richard, possibly this is something different, as it is clearly a limitation on "asking for new work".

The behavior and the messages I've seen in this case correspond to this bit of documentation:

https://gitlab.aei.uni-hannover.de/einsteinathome/boinc/commit/4d47e2f170ae638a0121c4a31cc4a9f54a75848a

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2960599355

RAC: 706254

18 Dec 2018 16:25:01 UTC

Message 168338 in response to message 168336

I was beginning to worry about that, as soon as I'd written it! That's the problem with modern software, there are so many ways of achieving the same thing.

The problem with the one you found is that it doesn't distinguish between CPU and GPU tasks, so somebody could end up with 1,000+ FGRPB1G tasks, and find it hard to fetch CPU tasks. That could be less than half a day of cache for a multi-GPU machine running 2003L jobs.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117759568734

RAC: 34783514

18 Dec 2018 20:42:41 UTC

Message 168344 in response to message 168335

Richard Haselgrove wrote:

archae86 wrote:
I was unaware of the 1000 runnable tasks limit ...
That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News.

Richard, thanks for contributing to the discussion. I'm just rejoining after some zzzz.

I, too, have not seen the runnable tasks limit previously. Not surprising, as I tend to avoid (like the plague) large work caches. I have read your message in Technical News. The post from Bernd you referred to was from December 2016 - I'm not sure if you realised that. I imagine you did, but I just wanted to make sure :-).

The problem file LATeah2003L.dat was first issued back then. It's now appearing again (for whatever reason) and that seems to be very unusual. I'm now wondering if (when first deployed) it might have been providing 'CPU sized tasks' so that the new GPU app could be validated against tasks also being crunched by a CPU core. If so, and if the same 'sized' tasks are being generated again, that would explain why current tasks are running so fast.

All this is quite independent of your information about the BOINC mechanisms for daily quotas and maximum runnable limits. I'm certainly glad you've focused attention on those parameters.

Cheers,
Gary.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7228964896

RAC: 1134335

19 Dec 2018 15:02:53 UTC

Message 168345 in response to message 168333

Current issue has reached the 292 frequency point. I promoted selected tasks to run ahead of turn to see if there was evidence of increasing elapsed time with increasing frequency. There is some.

The host in question is my wife's daily use PC, runs Einstein GRP work 1X on a GTX 1050, and when she is not playing solitaire has pretty reproducible elapsed times.

I'll make a pretty table later, but the message so far has just three main elements:

elapsed time for 2002L tasks: 29:40

elapsed time for 2003L tasks of frequency 20, 28, 36 and 44: 7:14

elapsed times for 2003L tasks of frequency 68, 156: 8:25.

[edited a day later to add these entries]

elapsed times for 2003L tasks of frequency 204, 244, 284, 292: 8:26

elapsed times for 2003L tasks of frequency 364: 8:32

Tentatively it seems to be a stepwise climb, but not at a very rapid rate.
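For readers who want the figures above as ratios, here is a small Python sketch that converts the quoted mm:ss elapsed times to seconds and compares each 2003L group with the 2002L baseline. The variable and function names are illustrative only; the times are the ones listed in this post.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'mm:ss' elapsed time to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


baseline = to_seconds("29:40")  # 2002L tasks

# 2003L elapsed times, keyed by the task-name frequencies reported above
elapsed_2003L = {
    "20, 28, 36, 44": "7:14",
    "68, 156": "8:25",
    "204, 244, 284, 292": "8:26",
    "364": "8:32",
}

ratios = {freqs: to_seconds(t) / baseline for freqs, t in elapsed_2003L.items()}

for freqs, ratio in ratios.items():
    print(f"frequency {freqs}: {ratio:.3f} of the 2002L elapsed time")
```

On these numbers the lowest-frequency tasks come in just under a quarter of the 2002L time, and the later groups rise to a little under 0.29.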

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 2960599355

RAC: 706254

18 Dec 2018 22:52:24 UTC

Message 168346

I'm running some 292 already - in round figures, up from 7:30 this morning to 8:50 now. I can pull a more exact list from the job log in the morning.

But I've got a new BOINC problem to report, too:

18/12/2018 22:44:30 | Einstein@Home | update requested by user
18/12/2018 22:44:31 | Einstein@Home | [work_fetch] REC 349206.724 prio -2.064 can request work
18/12/2018 22:44:31 | | [work_fetch] --- state for NVIDIA GPU ---
18/12/2018 22:44:31 | | [work_fetch] shortfall 76704.20 nidle 0.00 saturated 4842.19 busy 0.00
18/12/2018 22:44:31 | Einstein@Home | Sending scheduler request: Requested by user.
18/12/2018 22:44:31 | Einstein@Home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: ; Intel GPU: job cache full)
18/12/2018 22:44:31 | Einstein@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
18/12/2018 22:45:34 | | [work_fetch] Request work fetch: Backoff ended for Einstein@Home
18/12/2018 22:45:35 | Einstein@Home | [work_fetch] REC 349200.733 prio -2.064 can request work
18/12/2018 22:45:35 | | [work_fetch] --- state for NVIDIA GPU ---
18/12/2018 22:45:35 | | [work_fetch] shortfall 76923.93 nidle 0.00 saturated 4803.51 busy 0.00

Big NVIDIA shortfall, Einstein can fetch work, but doesn't - with no reason stated.
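The mismatch in the log above can be picked out mechanically. The following Python sketch uses three of the Event Log lines quoted above and compares the reported NVIDIA shortfall with the amount of work actually requested. The regular expressions are written only against the lines shown here and are an assumption about their format, not a general BOINC log parser.

```python
import re

# Three lines copied from the Event Log excerpt above
LOG = """\
18/12/2018 22:44:31 | | [work_fetch] shortfall 76704.20 nidle 0.00 saturated 4842.19 busy 0.00
18/12/2018 22:44:31 | Einstein@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
18/12/2018 22:45:35 | | [work_fetch] shortfall 76923.93 nidle 0.00 saturated 4803.51 busy 0.00
"""

# Seconds of work the client thinks it is short of
shortfalls = [float(m) for m in re.findall(r"shortfall (\d+\.\d+)", LOG)]

# Seconds of work actually asked for in the scheduler request
requests = [float(m) for m in re.findall(r"work request: (\d+\.\d+) seconds", LOG)]

# True when a shortfall is reported but every request asked for nothing
mismatch = any(s > 0 for s in shortfalls) and all(r == 0 for r in requests)

print(shortfalls, requests, mismatch)
```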

Holmis

Joined: 4 Jan 05

Posts: 1118

Credit: 1055935564

RAC: 0

18 Dec 2018 23:43:42 UTC

Message 168348

On my RX Vega 56 I've observed completion times of about 2:50 min for 44.0 tasks and 3:50 min for 260.0 tasks. All that running 3 tasks at once and no CPU tasks. I had to up my ncpus count, or I'd have run into the daily quota and run dry halfway through the day. So for the time being my i7 3770K (4C 8T) is showing as a 96 threaded beast! That got me a daily quota of 1280 tasks.
The previous version of tasks was in the range of 11 - 16 min running under the same conditions.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117759568734

RAC: 34783514

19 Dec 2018 0:49:36 UTC

Message 168349 in response to message 168346

Richard Haselgrove wrote:

Big NVIDIA shortfall, Einstein can fetch work, but doesn't - with no reason stated.

Now that a 'new day' has ticked over, did that cause any sort of change?

Cheers,
Gary.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117759568734

RAC: 34783514

19 Dec 2018 1:32:00 UTC

Message 168350 in response to message 168348

Holmis wrote:

... my i7 3770K (4C 8T) is showing as a 96 threaded beast! That got me a daily quota of 1280 tasks.

What a beast indeed!

This might allow us to confirm how the daily quota is currently implemented. I'm guessing it might still be just 128 for the GPU since that's what Bernd mentioned 2 years ago. I believe the 'allocation' is the sum total of both CPU tasks and GPU tasks and since you aren't running CPU tasks it's all available for the GPU.

The possibilities that seem to work for GPU allocation plus CPU allocation are:-

      32 *  4  (128)  plus  96 * 12  (1152)  =  1280
      32 * 10  (320)  plus  96 * 10  ( 960)  =  1280
      32 * 16  (512)  plus  96 *  8  ( 768)  =  1280

I don't know which is actually correct, but the first seems a bit skimpy for a modern fast GPU in a dual core host, whilst the last (8 per thread) may be a little restrictive for a fast CPU-only host. My guess is it's the first in the list. What a nice benefit from having 96 threads :-). I wonder if the 1,000 runnable tasks limit is going to get you at some point :-).

EDIT: I found this comment where there seems to be enough information to know the current formula. A 12 thread host with a single GPU is currently allocated 480 as a daily quota whilst a 2nd host with the same number of threads but 2 GPUs has 768. The difference of 288 must be the allocation for a single GPU. If each CPU thread is allocated 16, the two stated quotas work out correctly. So 12 cores plus a GPU gives 480 whilst 12 cores plus 2 GPUs gives 768 - as per the details given.

However, that doesn't give you a quota of 1280 since 96*16+288=1824. I'm really confused now :-). Are you sure your quota is only 1280??
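The arithmetic in this post is easy to check with a few lines of Python. This is only a sketch of the inferred formula (16 tasks per CPU thread plus 288 per GPU), not a statement of how the project server actually computes quotas. The function name `daily_quota` and the constant names are hypothetical.

```python
PER_THREAD = 16   # inferred allocation per CPU thread
PER_GPU = 288     # inferred allocation per GPU (768 - 480)


def daily_quota(threads: int, gpus: int,
                per_thread: int = PER_THREAD, per_gpu: int = PER_GPU) -> int:
    """Daily task quota under the inferred 'threads * 16 + gpus * 288' formula."""
    return threads * per_thread + gpus * per_gpu


print(daily_quota(12, 1))  # 12-thread host, one GPU
print(daily_quota(12, 2))  # 12-thread host, two GPUs
print(daily_quota(96, 1))  # 96 reported threads, one GPU

# The three earlier candidates, as (GPU multiplier, per-thread allocation),
# each summing to 1280 with a base of 32 for the GPU and 96 threads
candidates = [(4, 12), (10, 10), (16, 8)]
totals = [32 * gpu_mult + 96 * per_thread for gpu_mult, per_thread in candidates]
print(totals)
```

The formula reproduces the 480 and 768 figures from the two 12-thread hosts, while giving 1824 rather than 1280 for a host reporting 96 threads and one GPU, which is the discrepancy noted above.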

Cheers,
Gary.
