Big NVIDIA shortfall: Einstein can fetch work, but doesn't - with no reason stated.
Now that a 'new day' has ticked over, did that cause any sort of change?
Gary Roberts wrote: However, ...
That's what the event log said. I don't have time to do the math right now, but remember that I have 2 GPUs - and I think the Intel iGPU fully counts as well.
My GTX 960 had reached its daily quota limit (X amount... I don't remember exactly what it was at that point). I had been running 1x so far. I hit 'update' and no more tasks were sent for that host.
Then I made changes to app_config.xml ... GPU max tasks 1 --> 2 and GPU usage 1 --> 0.5 (a sketch of such a file is shown below). I hit 'update' and the host immediately got over 100 more tasks. This happened when the total number of running tasks was nowhere near the limit. The "daily" day boundary wasn't crossed at that time either.
That made me wonder why that happened, and whether running 1x or 2x .. 3x could also have some kind of effect on the daily quota. It's most likely something else, but at that moment it felt strongly as though the effect had come from that change. Sadly, I don't have any hard evidence to prove it.
I tried the same maneuver with another host that was running 1x and had encountered the quota limit, but the change from 1x to 2x didn't have any effect on the quota limit on that host.
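For reference, a change like "GPU max tasks 1 --> 2, GPU usage 1 --> 0.5" would normally go through the max_concurrent and gpu_usage fields of app_config.xml. A minimal sketch of such a file follows; the app name shown (the gamma-ray pulsar GPU search) and the cpu_usage value are assumptions, so check client_state.xml for the exact names on your own host:

<app_config>
  <app>
    <!-- assumed app name; confirm it in client_state.xml -->
    <name>hsgamma_FGRPB1G</name>
    <!-- "GPU max tasks" 1 -> 2: run at most two of these tasks at once -->
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <!-- "GPU usage" 1 -> 0.5: each task claims half a GPU, so two tasks share one card -->
      <gpu_usage>0.5</gpu_usage>
      <!-- assumed CPU reservation per GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

After saving the file, 'Options -> Read config files' in BOINC Manager (or a client restart) makes the change take effect.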
Gary Roberts wrote: Richard, ...
My client did initiate a request shortly afterwards (22:59:59), but I don't think it was related to midnight in any part of the world. It was a perfectly normal and reasonable transaction:
18-Dec-2018 22:59:59 [Einstein@Home] [sched_op] NVIDIA GPU work request: 18378.82 seconds; 0.00 devices
18-Dec-2018 23:00:01 [Einstein@Home] Scheduler request completed: got 12 new tasks
18-Dec-2018 23:00:01 [Einstein@Home] [sched_op] estimated total NVIDIA GPU task duration: 19292 seconds
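In other words, roughly 18400 s of GPU work was requested and the 12 tasks supplied were estimated at 19292 s in total, i.e. about 1600 s each - a close match to the request.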
Further testing today has produced:
19/12/2018 15:16:50 | Einstein@Home | (reached daily quota of 896 tasks)
19/12/2018 15:16:50 | Einstein@Home | Project requested delay of 33066 seconds
- that's the first and only time the word 'quota' has appeared in the message log in 3 months. The machine has four CPU cores, two NVidia GPUs, and one Intel GPU, for those solving the simultaneous equations. I had to bump the work request from the normal 0.5 days to 4 days to get that; 2 days was still below the max, and the highest I went yesterday was 1 day, so I'm pretty certain the failure even to request work yesterday was a client issue, not related to Einstein or any other server setting.
For future reference, note that the server deferral is for 09:11:06, or until 00:27:56 tomorrow. That's midnight UTC, plus a random fiddle-factor so we don't all hit the server at once.
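Two bits of arithmetic behind those figures, hedged because the per-device allowances aren't actually stated in the log: if the 896-task quota is a per-device allowance summed over processors, then with four CPU cores and three GPUs one consistent solution of those 'simultaneous equations' is 32 tasks per CPU core and 256 per GPU, since 4 x 32 + 3 x 256 = 128 + 768 = 896 (the split is an assumption; only the 896 total and the device counts come from the log). The deferral itself checks out exactly: 33066 s = 9 h 11 min 6 s, and 15:16:50 + 9:11:06 = 00:27:56 the following day.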
A few comments on this subject.
What a boon for my RAC.
My daily task limit will cause me to run out of work.
I'm not going to micro-manage my cache, so my backup project on my Einstein box will get some love.
A big mess will happen when we go back to "normal" length tasks.
On NVidia GPUs, the elapsed time is almost 10 times less than it was before.
Does it mean that this area of the sky just has nothing to discover?
Sid wrote: On NVidia GPUs, the elapsed time is almost 10 times less than it was before.
Most people are seeing a 'speedup' factor of around 4 - 5, not 10. Are you sure about that?
GPU tasks are supposed to have 5 times the 'work content' of CPU tasks. The cause of the current fast running behaviour isn't known for sure but it most certainly isn't anything to do with "area of the sky". The device that collects the gamma-ray photons is called LAT - Large Area Telescope - which presumably means that a large area is being sampled (and averaged).
Cheers,
Gary.
Betreger wrote: My daily task limit will cause me to run out of work.
That's not really correct :-). The "cause" is that you won't change an available preference setting :-).
Also, it's not a case of "micro-manage my cache". Your work cache stays the same. The one pref change (hardly micro-management) just allows your set cache to be maintained for longer. Of course, that's entirely up to you.
Yes, your "big mess" comment is quite correct. This is going to revert to the former behaviour at some point. The fast-running tasks being processed now will, through the DCF mechanism, severely distort what happens when the first tasks of the batch that follows 2003L are crunched: they will be estimated very low, and overfetch is likely to have happened before the estimates recover. It would be wise to make sure the work cache setting is nice and low at that point, since DCF can't actually return to normal until the first task of the new batch reaches the top of the queue.
Cheers,
Gary.
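A rough numerical illustration of the DCF effect Gary describes, with all of the numbers invented purely for the example: suppose a normal-length GPU task really takes about 1000 s and the estimates were calibrated so DCF sat near 1.0. If the current fast tasks finish in roughly 200 s, DCF gets driven down towards 0.2, and the first normal-length tasks of the next batch are then estimated at about 200 s each instead of 1000 s. To fill a 0.5-day (43200 s) GPU cache the client would fetch on the order of 216 such tasks where about 43 would have been appropriate - roughly a five-fold overfetch - and the estimates only start correcting once tasks from the new batch are actually crunched.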
Gary Roberts wrote: ...
DCF is a permanent garbage fire due to CPU/GPU mismatch. Daily download limits should constrain the size of the resulting abort fest to something that doesn't break the project. If it turns out I'm wrong about the latter, it might finally give the admins the impetus needed to do something to unfubar it.
DanNeely wrote: DCF is a permanent garbage fire due to CPU/GPU mismatch.
DCF is certainly inappropriate for a mixed-platform project like Einstein, but it is NOT an abort fest.
Einstein is, in general, a very reliable supplier of work: when you need it, the work is there. So there is no need for extended cache sizes (unless you're absent from internet connections for days at a time?)
In my most complicated setup, I generally run a 0.5 day cache (bumped to 1 day on Tuesdays so SETI doesn't run dry). It fetches what it needs, when it needs, and all work is returned on time for both the Einstein apps it's running.
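For anyone wanting to pin their cache down the same way, that corresponds to the 'Store at least N days of work' / 'Store up to an additional N days of work' computing preferences. As a local override it can also be set in global_prefs_override.xml along these lines - the 0.01 'additional' value is just an assumed small top-up, not a figure quoted above:

<global_preferences>
  <!-- "Store at least ... days of work": the 0.5-day cache mentioned above -->
  <work_buf_min_days>0.5</work_buf_min_days>
  <!-- "Store up to an additional ... days of work": assumed small value -->
  <work_buf_additional_days>0.01</work_buf_additional_days>
</global_preferences>

The client picks this up via 'Options -> Read local prefs file' in BOINC Manager, or on restart.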
R9 270X, running 2x
* The difference between 2003L_340 and 2003L_420 seems to be about 7 sec (total time very close to 8 minutes)
Not much... but step by step it's getting... somewhere.
* I still have some 2003L_36, and they complete in slightly under 6 minutes