Latest data file for FGRPB1G GPU tasks

archae86
Joined: 6 Dec 05
Posts: 3163
Credit: 7325621687
RAC: 2326473

One element of the old pattern does seem to be showing itself. In the names of currently issuing tasks, the field after the data file name has moved up at an accelerating rate.

Tasks for which that value was 20 were issued over more than a six-hour period, but a big gulp my primary host took about an hour ago (when the 1000 runnable tasks limit stopped blocking my host from making a request) got a mix of 60, 84, 108, and 124. If the old pattern forecasts behavior this time, the value will soon get over 200, and then the rate of rise will progressively slow.

If another element of the old pattern is in effect, then even the 124 units will have appreciably longer run times than the 20 units we established the 1/4 relationship on, though not nearly long enough to get us back to times comparable to those of the 2002L tasks.

I was unaware of the 1000 runnable tasks limit until I ran into it late yesterday. If I understand the behavior I've seen and a snippet of code I found with a Google search, this limit stops BOINC on my host from asking for any new work if the current runnable task count is over 1000. However, once the runnable task count is 999, this particular limitation has no effect, so an individual request can ask for a lot of work and push the count on the host well over 1000 if the project servers happen to be able to grant a large number of tasks to a single request at that moment (mine got well over 100 from a single request this morning). This limit is, I think, entirely separate from daily fetch quotas and such.
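To make the behavior I'm describing concrete, here is a rough model of it in Python. It is only a sketch of my understanding, not the BOINC client code: the names `RUNNABLE_LIMIT`, `may_request_work` and `count_after_request` are made up for illustration, and treating exactly 1000 as still allowed is an assumption on my part.

```python
# Illustrative model only; NOT taken from the BOINC source.
# All names here are hypothetical.
RUNNABLE_LIMIT = 1000  # the limit described in the post


def may_request_work(runnable_tasks: int) -> bool:
    """Client-side gate: above the limit, no work request is made at all.

    Assumption: a count of exactly 1000 is still allowed ("over 1000" blocks).
    """
    return runnable_tasks <= RUNNABLE_LIMIT


def count_after_request(runnable_tasks: int, tasks_granted: int) -> int:
    """Runnable count after one scheduler contact.

    The gate looks only at the count before the request, so a single
    large grant can push the host well past the limit.
    """
    if not may_request_work(runnable_tasks):
        return runnable_tasks  # request suppressed, nothing changes
    return runnable_tasks + tasks_granted


print(count_after_request(999, 150))   # request allowed, count overshoots
print(count_after_request(1001, 150))  # request suppressed
```

The point of the model is that the limit caps when a request may be made, not how many tasks a single request may bring back.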

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2996022479
RAC: 715783

archae86 wrote:
I was unaware of the 1000 runnable tasks limit ...

That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News. Under current BOINC Central code, it doesn't stop the client asking for new work, but it stops the server sending it. So, if the limit is 1,000, and you ask for more work when you have 1,000 tasks on board, you get nothing back except a little note in the Event Log saying "This computer has reached a limit on tasks in progress". If you have 999 tasks on board, you get at most one new task, and no note.
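As a rough illustration of the server-side behavior described above, here is a small Python model. It is a sketch under my own assumptions, not the BOINC scheduler code: the function name `tasks_to_send` and its parameters are invented for the example.

```python
from typing import Tuple


def tasks_to_send(in_progress: int, requested: int, limit: int = 1000) -> Tuple[int, bool]:
    """Hypothetical model of a server-side 'tasks in progress' cap.

    Returns (number of tasks sent, whether the 'reached a limit' note is shown).
    The client may always ask; the server only sends up to the remaining room.
    """
    room = max(limit - in_progress, 0)
    sent = min(requested, room)
    note_shown = room == 0  # note appears only when nothing can be sent
    return sent, note_shown


print(tasks_to_send(1000, 50))  # at the limit: nothing sent, note shown
print(tasks_to_send(999, 50))   # one slot left: one task, no note
```

Unlike a client-side gate, a cap of this kind can never be overshot by a single large request.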

As we know, we run older server code here, so we may not have all those tools at our disposal. The new code, for example, is capable of keeping separate limits for each of CPUs and GPUs, and can even set separate limits by app. That would be a much better solution for the 2003L problem than a daily limit which could be used up part-way through the 24 hours, and leave resources idle until time was up.

archae86

Richard Haselgrove wrote:
archae86 wrote:
I was unaware of the 1000 runnable tasks limit ...
That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News. Under current BOINC Central code, it doesn't stop the client asking for new work, but it stops the server sending it. So, if the limit is 1,000, and you ask for more work when you have 1,000 tasks on board, you get nothing back except a little note in the Event Log saying "This computer has reached a limit on tasks in progress". If you have 999 tasks on board, you get at most one new task, and no note.

Richard, possibly this is something different, as it is clearly a limitation on "asking for new work".

The behavior and the messages I've seen in this case correspond to this bit of documentation:

https://gitlab.aei.uni-hannover.de/einsteinathome/boinc/commit/4d47e2f170ae638a0121c4a31cc4a9f54a75848a

Richard Haselgrove

I was beginning to worry about that, as soon as I'd written it! That's the problem with modern software: there are so many ways of achieving the same thing.

The problem with the one you found is that it doesn't distinguish between CPU and GPU tasks, so somebody could end up with 1,000+ FGRPB1G tasks, and find it hard to fetch CPU tasks. That could be less than half a day of cache for a multi-GPU machine running 2003L jobs.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5878
Credit: 118828306149
RAC: 22393671

Richard Haselgrove wrote:
archae86 wrote:
I was unaware of the 1000 runnable tasks limit ...
That sounds like the 'maximum tasks in progress' limit I wrote about in Gary's "quota" thread, and also in Technical News.

Richard, thanks for contributing to the discussion.  I'm just rejoining after some zzzz.

I, too, have not seen the runnable tasks limit previously. Not surprising, as I tend to avoid (like the plague) large work caches. I have read your message in Technical News. The post from Bernd you referred to was from December 2016 - I'm not sure if you realised that. I imagine you did, but I just wanted to make sure :-).

The problem file LATeah2003L.dat was first issued back then.  It's now appearing again (for whatever reason) and that seems to be very unusual.  I'm now wondering if (when first deployed) it might have been providing 'CPU sized tasks' so that the new GPU app could be validated against tasks also being crunched by a CPU core.  If so, and if the same 'sized' tasks are being generated again, that would explain why current tasks are running so fast.

All this is quite independent of your information about the BOINC mechanisms for daily quotas and maximum runnable limits.  I'm certainly glad you've focused attention on those parameters.

 

Cheers,
Gary.

archae86

Current issue has reached the 292 frequency point. I promoted selected tasks to run ahead of turn to see if there was evidence of increasing elapsed time with increasing frequency. There is some.

The host in question is my wife's daily-use PC. It runs Einstein GRP work 1X on a GTX 1050 and, when she is not playing solitaire, has pretty reproducible elapsed times.

I'll make a pretty table later, but the message so far has just three main elements:

elapsed time for 2002L tasks: 29:40

elapsed time for 2003L tasks of frequency 20, 28, 36 and 44: 7:14

elapsed times for 2003L tasks of frequency 68, 156: 8:25

[edited a day later to add these entries]

elapsed times for 2003L tasks of frequency 204, 244, 284, 292: 8:26

elapsed times for 2003L tasks of frequency 364: 8:32

Tentatively it seems to be a stepwise climb, but not at a very rapid rate.
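For anyone who wants the ratios rather than the raw times, here is a small Python sketch that converts the elapsed times listed above to seconds and expresses each 2003L group as a fraction of the 2002L time. The helper name `to_seconds` and the dictionary layout are just my choices for the example.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'M:SS' elapsed time to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Elapsed times from the post (GTX 1050, running 1X).
baseline_2002L = to_seconds("29:40")
elapsed_2003L = {
    "20, 28, 36, 44": "7:14",
    "68, 156": "8:25",
    "204, 244, 284, 292": "8:26",
    "364": "8:32",
}

# Fraction of the 2002L elapsed time for each frequency group.
ratios = {freq: to_seconds(t) / baseline_2002L for freq, t in elapsed_2003L.items()}

for freq, ratio in ratios.items():
    print(f"frequency {freq}: {ratio:.3f} of the 2002L elapsed time")
```

The lowest-frequency group comes out a little under one quarter of the 2002L time, and the later groups somewhat above it.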

Richard Haselgrove

I'm running some 292 already - in round figures, up from 7:30 this morning to 8:50 now. I can pull a more exact list from the job log in the morning.

But I've got a new BOINC problem to report, too:

18/12/2018 22:44:30 | Einstein@Home | update requested by user
18/12/2018 22:44:31 | Einstein@Home | [work_fetch] REC 349206.724 prio -2.064 can request work
18/12/2018 22:44:31 | | [work_fetch] --- state for NVIDIA GPU ---
18/12/2018 22:44:31 | | [work_fetch] shortfall 76704.20 nidle 0.00 saturated 4842.19 busy 0.00
18/12/2018 22:44:31 | Einstein@Home | Sending scheduler request: Requested by user.
18/12/2018 22:44:31 | Einstein@Home | Not requesting tasks: don't need (CPU: job cache full; NVIDIA GPU: ; Intel GPU: job cache full)
18/12/2018 22:44:31 | Einstein@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
18/12/2018 22:45:34 | | [work_fetch] Request work fetch: Backoff ended for Einstein@Home
18/12/2018 22:45:35 | Einstein@Home | [work_fetch] REC 349200.733 prio -2.064 can request work
18/12/2018 22:45:35 | | [work_fetch] --- state for NVIDIA GPU ---
18/12/2018 22:45:35 | | [work_fetch] shortfall 76923.93 nidle 0.00 saturated 4803.51 busy 0.00

Big NVIDIA shortfall, Einstein can fetch work, but doesn't - with no reason stated.
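For spotting this condition in a longer Event Log, something like the following Python sketch could be used. It assumes the line layout shown in the excerpt above and nothing more; the regular expressions and variable names are my own, not part of BOINC.

```python
import re

# Lines copied from the excerpt above.
LOG = """\
18/12/2018 22:44:31 | | [work_fetch] shortfall 76704.20 nidle 0.00 saturated 4842.19 busy 0.00
18/12/2018 22:44:31 | Einstein@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
18/12/2018 22:45:35 | | [work_fetch] shortfall 76923.93 nidle 0.00 saturated 4803.51 busy 0.00
"""

# Patterns assume the exact wording seen in this excerpt.
SHORTFALL = re.compile(
    r"\[work_fetch\] shortfall (?P<shortfall>[\d.]+) nidle [\d.]+ "
    r"saturated (?P<saturated>[\d.]+)"
)
REQUEST = re.compile(
    r"\[sched_op\] NVIDIA GPU work request: (?P<seconds>[\d.]+) seconds"
)

shortfalls = [float(m["shortfall"]) for m in SHORTFALL.finditer(LOG)]
requests = [float(m["seconds"]) for m in REQUEST.finditer(LOG)]

# The oddity reported here: a positive shortfall, yet zero seconds requested.
mismatch = any(s > 0 for s in shortfalls) and all(r == 0 for r in requests)
print(shortfalls, requests, mismatch)
```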

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

On my RX Vega 56 I've observed completion times of about 2:50 min for 44.0 tasks and 3:50 min for 260.0 tasks, all while running 3 tasks at once and no CPU tasks. I had to up my ncpus count or I'd run into quota limits and run dry halfway through the day. So for the time being my i7 3770K (4C 8T) is showing as a 96 threaded beast! Cool. That got me a daily quota of 1280 tasks.
The previous version of tasks took 11 to 16 min under the same conditions.

Gary Roberts

Richard Haselgrove wrote:
Big NVIDIA shortfall, Einstein can fetch work, but doesn't - with no reason stated.

Now that a 'new day' has ticked over, did that cause any sort of change?

 

Cheers,
Gary.

Gary Roberts

Holmis wrote:
... my i7 3770K (4C 8T) is showing as a 96 threaded beast! Cool. That got me a daily quota of 1280 tasks.

What a beast indeed!

This might allow us to confirm how the daily quota is currently implemented.  I'm guessing it might still be just 128 for the GPU since that's what Bernd mentioned 2 years ago.  I believe the 'allocation' is the sum total of both CPU tasks and GPU tasks and since you aren't running CPU tasks it's all available for the GPU.

The possibilities that seem to work for GPU allocation plus CPU allocation are:-

      32 *  4  (128)  plus  96 * 12  (1152)  =  1280
      32 * 10  (320)  plus  96 * 10  ( 960)  =  1280
      32 * 16  (512)  plus  96 *  8  ( 768)  =  1280

I don't know which is actually correct, but the first seems a bit skimpy for a modern fast GPU in a dual core host whilst the last (8 per thread) may be a little restrictive for a fast CPU only host. My guess is that it's the first in the list. What a nice benefit from having 96 threads :-). I wonder if the 1,000 runnable tasks limit is going to get you at some point :-).

EDIT: I found this comment where there seems to be enough information to know the current formula. A 12 thread host with a single GPU is currently allocated 480 as a daily quota whilst a 2nd host with the same number of threads but 2 GPUs has 768. The difference of 288 must be the allocation for a single GPU. If each CPU thread is allocated 16, the two stated quotas work out correctly: 12 threads plus a GPU gives 12*16+288=480 whilst 12 threads plus 2 GPUs gives 12*16+2*288=768, as per the details given.

However, that doesn't give you a quota of 1280 since 96*16+288=1824.  I'm really confused now :-).  Are you sure your quota is only 1280??
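Here is the same arithmetic as a short Python sketch, in case anyone wants to plug in their own numbers. It assumes the quota is simply linear in threads and GPUs, which is my inference from the two reported hosts and not something confirmed by the project; the names `per_gpu`, `per_thread` and `daily_quota` are invented for the example.

```python
# Two hosts reported in the linked comment: (threads, gpus, daily quota).
host_one_gpu = (12, 1, 480)
host_two_gpu = (12, 2, 768)

# Same thread count, so the quota difference is the per-GPU allocation.
per_gpu = (host_two_gpu[2] - host_one_gpu[2]) // (host_two_gpu[1] - host_one_gpu[1])

# What remains of the first host's quota is shared across its threads.
per_thread = (host_one_gpu[2] - per_gpu * host_one_gpu[1]) // host_one_gpu[0]


def daily_quota(threads: int, gpus: int) -> int:
    """Assumed linear formula: threads * per_thread + gpus * per_gpu."""
    return threads * per_thread + gpus * per_gpu


print(per_gpu, per_thread)   # inferred allocations
print(daily_quota(12, 1))    # reproduces the first reported host
print(daily_quota(12, 2))    # reproduces the second reported host
print(daily_quota(96, 1))    # prediction for a 96 thread host with one GPU
```

Under that assumption the 96 thread host would be predicted to get 1824, not the 1280 reported, which is the discrepancy I can't explain.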

 

Cheers,
Gary.
