Latest data file for FGRPB1G GPU tasks

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Gary Roberts wrote:
However, that doesn't give you a quota of 1280 since 96*16+288=1824.  I'm really confused now :-).  Are you sure your quota is only 1280??


That's what the event log said. I don't have time to do the math right now, but remember that I have 2 GPUs, since I think the Intel iGPU fully counts.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

My GTX 960 had reached its daily quota limit (X amount... I don't remember exactly what it was at that point). I had been running 1x so far. I hit 'update' and no more tasks were sent for that host.

Then I made changes to app_config.xml ... GPU max tasks 1 --> 2 and GPU usage 1 --> .5. I hit 'update' and the host immediately got over 100 more tasks. This happened when the total number of running tasks was nowhere near the limit. The "daily" day border wasn't crossed at that time either.

That made me wonder why that happened and whether running 1x or 2x .. 3x could also have some kind of effect on the daily quota. It's most likely something else, but at that moment it felt very strongly as if the effect had come from that change. Sadly I don't really have any hard evidence to prove that :-P

I tried the same maneuver with another host that was running 1x and had hit the quota limit, but the change from 1x to 2x didn't have any effect on the quota limit on that host.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2738135217
RAC: 1427759

Gary Roberts wrote:
Richard Haselgrove wrote:
Big NVIDIA shortfall, Einstein can fetch work, but doesn't - with no reason stated.

Now that a 'new day' has ticked over, did that cause any sort of change?

My client did initiate a request shortly afterwards (22:59:59), but I don't think it was related to midnight in any part of the world. It was a perfectly normal and reasonable transaction:

18-Dec-2018 22:59:59 [Einstein@Home] [sched_op] NVIDIA GPU work request: 18378.82 seconds; 0.00 devices
18-Dec-2018 23:00:01 [Einstein@Home] Scheduler request completed: got 12 new tasks
18-Dec-2018 23:00:01 [Einstein@Home] [sched_op] estimated total NVIDIA GPU task duration: 19292 seconds
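
As a rough sanity check on the log above, the three figures fit together if one assumes every task in the reply carries the same runtime estimate (an assumption, the log doesn't say so). The variable names here are made up for illustration.

```python
import math

# Figures copied from the event log above.
requested_seconds = 18378.82      # NVIDIA GPU work request
tasks_received = 12               # "got 12 new tasks"
estimated_total_seconds = 19292   # estimated total NVIDIA GPU task duration

# Assumption: every task in the reply has the same runtime estimate.
estimate_per_task = estimated_total_seconds / tasks_received

# Smallest whole number of such tasks whose estimates cover the request.
tasks_needed = math.ceil(requested_seconds / estimate_per_task)

print(f"estimate per task: {estimate_per_task:.1f} s")
print(f"tasks needed to cover the request: {tasks_needed}")
```

Under that assumption, 11 tasks would fall just short of the request and 12 would cover it, which matches what the scheduler sent.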

Further testing today has produced

19/12/2018 15:16:50 | Einstein@Home | (reached daily quota of 896 tasks)
19/12/2018 15:16:50 | Einstein@Home | Project requested delay of 33066 seconds

- that's the first and only time the word 'quota' has appeared in the message log in 3 months. Machine has four CPU cores, two NVidia GPUs, and one Intel GPU, for those solving the simultaneous equations - I had to bump the work request from the normal 0.5 days to 4 days to get that. 2 days was still below the max, and the highest I went yesterday was 1 day, so I'm pretty certain the failure even to request work yesterday was a client issue, not related to Einstein or any other server setting.

For future reference, note that the server deferral is for 09:11:06, or until 00:27:56 tomorrow. That's midnight UTC, plus a random fiddle-factor so we don't all hit the server at once.
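
For anyone wanting to check the deferral arithmetic, here is a small sketch using only the two log lines above. It treats the log timestamp as a plain local time without any timezone handling, which is an assumption.

```python
from datetime import datetime, timedelta

# Values copied from the log lines above (timestamp is the client's local time).
logged_at = datetime(2018, 12, 19, 15, 16, 50)
delay = timedelta(seconds=33066)

# str() of a timedelta shorter than one day renders as H:MM:SS.
print("deferral length:", delay)

resume_at = logged_at + delay
print("next contact allowed:", resume_at.strftime("%d/%m/%Y %H:%M:%S"))
```

The result lands 27 minutes 56 seconds after midnight on the 20th, consistent with "midnight plus a random fiddle-factor".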

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1413521760
RAC: 740806

A few comments on this subject.

What a boon for my RAC.

My daily task limit will cause me to run out of work.

I'm not going to micro manage my cache, so my backup project on my Einstein box will get some love.

A big mess will happen when we go back to "normal" length tasks.

Sid
Joined: 17 Oct 10
Posts: 160
Credit: 918163110
RAC: 281607

On NVidia GPUs the elapsed time is almost 10 times less than it was before.

Does it mean that this area of the sky just has nothing to discover?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5840
Credit: 109038473765
RAC: 33862529

Sid wrote:

On NVidia GPUs the elapsed time is almost 10 times less than it was before.

Does it mean that this area of the sky just has nothing to discover?

Most people are seeing a 'speedup' factor of around 4 - 5, not 10.  Are you sure about that?

GPU tasks are supposed to have 5 times the 'work content' of CPU tasks.  The cause of the current fast-running behaviour isn't known for sure, but it most certainly has nothing to do with "area of the sky".  The device that collects the gamma-ray photons is called the LAT (Large Area Telescope), which presumably means that a large area is being sampled (and averaged).

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5840
Credit: 109038473765
RAC: 33862529

Betreger wrote:
My daily task limit will cause me to run out of work.

That's not really correct :-).  The "cause" is that you won't change an available preference setting :-).

Also, it's not a case of "micro manage my cache".  Your work cache stays the same.  The one pref change (hardly micro-management) just allows your set cache to be maintained for longer.  Of course, that's entirely up to you.

Yes, your "big mess" comment is quite correct.  This is going to revert to the former behaviour at some point.  Through the DCF (duration correction factor) mechanism, the fast-running tasks already processed will severely disrupt what happens when the first tasks of the batch that follows 2003L are crunched.  Those tasks will be estimated very low, so overfetch is likely to have happened by then.  It would be wise to make sure the work cache setting is nice and low at that point, since the DCF can't actually return to normal until the first task of the new batch reaches the top of the queue.

Cheers,
Gary.
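
To illustrate the overfetch risk described above, here is a deliberately simplified sketch. It is not BOINC's actual code: it assumes a single device running one task at a time, a DCF that started at 1.0 and has settled near the inverse of the speedup, and invented numbers for the task runtime and cache size. The function names are hypothetical.

```python
def estimated_runtime(base_estimate_s: float, dcf: float) -> float:
    """Client-side runtime estimate: the project's estimate scaled by the DCF."""
    return base_estimate_s * dcf


def tasks_to_fill_cache(cache_days: float, estimate_s: float) -> int:
    """Whole tasks that fit in the cache, for one device running one task at a time."""
    return int(cache_days * 86400 // estimate_s)


# Hypothetical numbers, chosen only for illustration.
base_estimate_s = 1900.0   # assumed real runtime of a "normal" task
speedup = 5.0              # fast tasks finish about 5x quicker (the 4 - 5 quoted earlier)
cache_days = 0.5

# After a long run of fast tasks, the DCF has drifted down to about 1/speedup.
dcf_after_fast_batch = 1.0 / speedup

# The first normal-length tasks are therefore estimated far too short...
underestimate_s = estimated_runtime(base_estimate_s, dcf_after_fast_batch)

# ...so the client requests many more of them than the cache should hold.
fetched = tasks_to_fill_cache(cache_days, underestimate_s)
real_work_days = fetched * base_estimate_s / 86400

print(f"estimated runtime per task: {underestimate_s:.0f} s")
print(f"tasks fetched for a {cache_days} day cache: {fetched}")
print(f"real work on hand: {real_work_days:.2f} days")
```

With these made-up numbers the client ends up holding roughly five times the work its cache setting asks for, which is why a low cache setting at the changeover limits the damage.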

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 4253

Gary Roberts wrote:
Betreger wrote:
My daily task limit will cause me to run out of work.

That's not really correct :-).  The "cause" is that you won't change an available preference setting :-).

Also, it's not a case of "micro manage my cache".  Your work cache stays the same.  The one pref change (hardly micro-management) just allows your set cache to be maintained for longer.  Of course, that's entirely up to you.

Yes, your "big mess" comment is quite correct.  This is going to revert to the former behaviour at some point.  Through the DCF (duration correction factor) mechanism, the fast-running tasks already processed will severely disrupt what happens when the first tasks of the batch that follows 2003L are crunched.  Those tasks will be estimated very low, so overfetch is likely to have happened by then.  It would be wise to make sure the work cache setting is nice and low at that point, since the DCF can't actually return to normal until the first task of the new batch reaches the top of the queue.

DCF is a permanent garbage fire due to CPU/GPU mismatch.  Daily download limits should constrain the size of the resulting abort fest to something that doesn't break the project.  If it turns out I'm wrong about the latter, it might finally give the admins the impetus needed  to do something to unfubar it.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2738135217
RAC: 1427759

DanNeely wrote:
DCF is a permanent garbage fire due to CPU/GPU mismatch.  Daily download limits should constrain the size of the resulting abort fest to something that doesn't break the project.  If it turns out I'm wrong about the latter, it might finally give the admins the impetus needed  to do something to unfubar it.

DCF is certainly inappropriate for a mixed-platform project like Einstein, but it is NOT an abort fest.

Einstein is, in general, a very reliable supplier of work: when you need it, the work is there. So there is no need for extended cache sizes (unless you're absent from internet connections for days at a time?)

In my most complicated setup, I generally run a 0.5 day cache (bumped to 1 day on Tuesdays so SETI doesn't run dry). It fetches what it needs, when it needs, and all work is returned on time for both the Einstein apps it's running.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

R9 270X, running 2x

* 2003L_340 and 2003L_420 difference seems to be about 7 sec (total time very close to 8 minutes)

Not much... but step by step it's getting... somewhere.

* I still have some 2003L_36; they complete in slightly under 6 minutes
