Big NVIDIA shortfall: Einstein can fetch work, but doesn't - with no reason stated.
Now that a 'new day' has ticked over, did that cause any sort of change?
Gary Roberts wrote: However, ...
That's what the event log said. I don't have time to do the math right now, but remember that I have 2 GPUs - and I think the Intel iGPU fully counts as well.
My GTX 960 had reached its daily quota limit (X amount... I don't remember exactly what it was at that point). I had been running 1x so far. I hit 'update' and no more tasks were sent for that host.
Then I made changes to app_config.xml ... GPU max tasks 1 --> 2 and GPU usage 1 --> 0.5 (a sketch of such a file is shown below). I hit 'update' and the host immediately got over 100 more tasks. This happened when the total number of running tasks was nowhere near the limit. The "daily" day boundary wasn't crossed at that time either.
That made me wonder why that happened, and whether running 1x or 2x .. 3x could also have some kind of effect on the daily quota. It's most likely something else, but at that moment it felt strongly as though the effect had come from that change. Sadly, I don't have any hard evidence to prove it.
I tried the same maneuver with another host that was running 1x and had encountered the quota limit, but the change from 1x to 2x didn't have any effect on the quota limit on that host.
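For reference, a change like "GPU max tasks 1 --> 2, GPU usage 1 --> 0.5" would normally go through the max_concurrent and gpu_usage fields of app_config.xml. A minimal sketch of such a file follows; the app name shown (the gamma-ray pulsar GPU search) and the cpu_usage value are assumptions, so check client_state.xml for the exact names on your own host:

<app_config>
  <app>
    <!-- assumed app name; confirm it in client_state.xml -->
    <name>hsgamma_FGRPB1G</name>
    <!-- "GPU max tasks" 1 -> 2: run at most two of these tasks at once -->
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <!-- "GPU usage" 1 -> 0.5: each task claims half a GPU, so two tasks share one card -->
      <gpu_usage>0.5</gpu_usage>
      <!-- assumed CPU reservation per GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

After saving the file, 'Options -> Read config files' in BOINC Manager (or a client restart) makes the change take effect.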
Gary Roberts wrote: Richard, ...
My client did initiate a request shortly afterwards (22:59:59), but I don't think it was related to midnight in any part of the world. It was a perfectly normal and reasonable transaction:
18-Dec-2018 22:59:59 [Einstein@Home] [sched_op] NVIDIA GPU work request: 18378.82 seconds; 0.00 devices
18-Dec-2018 23:00:01 [Einstein@Home] Scheduler request completed: got 12 new tasks
18-Dec-2018 23:00:01 [Einstein@Home] [sched_op] estimated total NVIDIA GPU task duration: 19292 seconds
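In other words, roughly 18400 s of GPU work was requested and the 12 tasks supplied were estimated at 19292 s in total, i.e. about 1600 s each - a close match to the request.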
Further testing today has produced:
19/12/2018 15:16:50 | Einstein@Home | (reached daily quota of 896 tasks)
19/12/2018 15:16:50 | Einstein@Home | Project requested delay of 33066 seconds
- that's the first and only time the word 'quota' has appeared in the message log in 3 months. The machine has four CPU cores, two NVidia GPUs, and one Intel GPU, for those solving the simultaneous equations. I had to bump the work request from the normal 0.5 days to 4 days to get that; 2 days was still below the max, and the highest I went yesterday was 1 day, so I'm pretty certain the failure even to request work yesterday was a client issue, not related to Einstein or any other server setting.
For future reference, note that the server deferral is for 09:11:06, or until 00:27:56 tomorrow. That's midnight UTC, plus a random fiddle-factor so we don't all hit the server at once.
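Two bits of arithmetic behind those figures, hedged because the per-device allowances aren't actually stated in the log: if the 896-task quota is a per-device allowance summed over processors, then with four CPU cores and three GPUs one consistent solution of those 'simultaneous equations' is 32 tasks per CPU core and 256 per GPU, since 4 x 32 + 3 x 256 = 128 + 768 = 896 (the split is an assumption; only the 896 total and the device counts come from the log). The deferral itself checks out exactly: 33066 s = 9 h 11 min 6 s, and 15:16:50 + 9:11:06 = 00:27:56 the following day.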
A few comments on this subject.
What a boon for my RAC.
My daily task limit will cause me to run out of work.
I'm not going to micro-manage my cache, so my backup project on my Einstein box will get some love.
A big mess will happen when we go back to "normal" length tasks.
On NVidia GPUs, the elapsed time is almost 10 times less than it was before.
Does it mean that this area of the sky just has nothing to discover?
Sid wrote: On NVidia GPUs, the elapsed time is almost 10 times less than it was before.
Most people are seeing a 'speedup' factor of around 4 - 5, not 10. Are you sure about that?
GPU tasks are supposed to have 5 times the 'work content' of CPU tasks. The cause of the current fast running behaviour isn't known for sure but it most certainly isn't anything to do with "area of the sky". The device that collects the gamma-ray photons is called LAT - Large Area Telescope - which presumably means that a large area is being sampled (and averaged).
Cheers,
Gary.
Betreger wrote: My daily task limit will cause me to run out of work.
That's not really correct :-). The "cause" is that you won't change an available preference setting :-).
Also, it's not a case of "micro-manage my cache". Your work cache stays the same. The one pref change (hardly micro-management) just allows your set cache to be maintained for longer. Of course, that's entirely up to you.
Yes, your "big mess" comment is quite correct. This is going to revert to the former behaviour at some point. The fast-running tasks being processed now will, through the DCF mechanism, severely distort what happens when the first tasks of the batch that follows 2003L are crunched: they will be estimated very low, and overfetch is likely to have happened before the estimates recover. It would be wise to make sure the work cache setting is nice and low at that point, since DCF can't actually return to normal until the first task of the new batch reaches the top of the queue.
Cheers,
Gary.
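A rough numerical illustration of the DCF effect Gary describes, with all of the numbers invented purely for the example: suppose a normal-length GPU task really takes about 1000 s and the estimates were calibrated so DCF sat near 1.0. If the current fast tasks finish in roughly 200 s, DCF gets driven down towards 0.2, and the first normal-length tasks of the next batch are then estimated at about 200 s each instead of 1000 s. To fill a 0.5-day (43200 s) GPU cache the client would fetch on the order of 216 such tasks where about 43 would have been appropriate - roughly a five-fold overfetch - and the estimates only start correcting once tasks from the new batch are actually crunched.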
Gary Roberts wrote: ...
DCF is a permanent garbage fire due to CPU/GPU mismatch. Daily download limits should constrain the size of the resulting abort fest to something that doesn't break the project. If it turns out I'm wrong about the latter, it might finally give the admins the impetus needed to do something to unfubar it.
DanNeely wrote: DCF is a permanent garbage fire due to CPU/GPU mismatch.
DCF is certainly inappropriate for a mixed-platform project like Einstein, but it is NOT an abort fest.
Einstein is, in general, a very reliable supplier of work: when you need it, the work is there. So there is no need for extended cache sizes (unless you're absent from internet connections for days at a time?)
In my most complicated setup, I generally run a 0.5 day cache (bumped to 1 day on Tuesdays so SETI doesn't run dry). It fetches what it needs, when it needs, and all work is returned on time for both the Einstein apps it's running.
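For anyone wanting to pin their cache down the same way, that corresponds to the 'Store at least N days of work' / 'Store up to an additional N days of work' computing preferences. As a local override it can also be set in global_prefs_override.xml along these lines - the 0.01 'additional' value is just an assumed small top-up, not a figure quoted above:

<global_preferences>
  <!-- "Store at least ... days of work": the 0.5-day cache mentioned above -->
  <work_buf_min_days>0.5</work_buf_min_days>
  <!-- "Store up to an additional ... days of work": assumed small value -->
  <work_buf_additional_days>0.01</work_buf_additional_days>
</global_preferences>

The client picks this up via 'Options -> Read local prefs file' in BOINC Manager, or on restart.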
R9 270X, running 2x
* The difference between 2003L_340 and 2003L_420 seems to be about 7 sec (total time very close to 8 minutes)
Not much... but step by step it's getting... somewhere.
* I still have some 2003L_36, and they complete in slightly under 6 minutes