CPU work
I sent a PM to Bernd and HB a couple of hours ago.
Cheers,
Gary.
Well, if it does not get fixed soon, this host will be crunching SETI on its CPU, something I really don't want to do.
If doing Einstein tasks is that important, shouldn't you have a somewhat bigger work cache setting?
When using a combination of FGRP4 on the CPUs and multiple BRP6 on the GPU, the runtimes of the CPU tasks are overestimated and those of the BRP6 tasks are underestimated, so the duration correction factor (DCF) constantly swings back and forth. Because of this overestimation of FGRP4, I always have less work on hand than the cache setting specifies. If I want CPU tasks to last at least 2 days, I always set 3 days or more.
Cheers,
Gary.
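For anyone who wants to put rough numbers on the cache advice above, here is a minimal sketch in Python. It assumes the client fills the cache according to estimated runtime, so that overestimated tasks leave you with less real work than the setting suggests. The function names are made up for illustration, and the ratio of 1.5 is only an assumption read off the "2 days wanted, set 3 days" example; the real ratio on any given host will differ and will drift as the DCF moves.

```python
def cache_setting_for(target_days: float, overestimate_ratio: float) -> float:
    """Cache setting (in days) needed so the queued CPU work really lasts target_days.

    overestimate_ratio is estimated runtime divided by actual runtime for the
    CPU tasks (hypothetical parameter; > 1 means the tasks are overestimated).
    """
    if target_days <= 0 or overestimate_ratio <= 0:
        raise ValueError("target_days and overestimate_ratio must be positive")
    return target_days * overestimate_ratio


def actual_days_of_work(cache_setting_days: float, overestimate_ratio: float) -> float:
    """Real days of CPU work on hand for a given cache setting."""
    if cache_setting_days <= 0 or overestimate_ratio <= 0:
        raise ValueError("cache_setting_days and overestimate_ratio must be positive")
    return cache_setting_days / overestimate_ratio


# Assumed ratio of 1.5, taken only from the "want 2 days, set 3 days" example.
print(cache_setting_for(2.0, 1.5))    # setting needed for 2 real days of work
print(actual_days_of_work(3.0, 1.5))  # real days of work with a 3-day setting
```

If the overestimation on your host is worse than that assumed ratio, the same arithmetic says to set the cache correspondingly higher, which fits the "3 days or more" in the post.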
Now that the flow of FGRP4 work has resumed and been going for a few hours, I'm surprised that nobody seems to be asking any questions about why there was an 'outage' in the first place.
I've looked at a couple of my machines and there is now a new, seemingly unannounced (if it was announced, I certainly missed it) app version: v1.15. These hosts were previously running v1.14. The last v1.14 tasks received were for the data file LATeah1078E.dat. The first new tasks seem to be for a new data file, LATeah1080E.dat. I only looked at a couple of hosts and I haven't immediately seen any tasks for LATeah1079E.dat, so I don't know if there were only a few or perhaps none at all.
My guess is that the change of version needed to happen with the change in data file. The old data ran out after close of business yesterday so the new app couldn't be used and tasks for the new data file couldn't be generated until start of business today.
Or something like that :-).
I might 'promote' a couple of tasks to make sure there aren't going to be any 'new version' surprises :-).
Cheers,
Gary.
Good spot that.
I just checked my tasks. There were some for 1079E, but clearly not many (yet?) - see my In progress tasks.
The new tasks have started running OK but have not yet completed; that is about 6 hours away for my CPU cows.
I noticed the server status shows an increase in the number of available tasks (and it continues to rise), but the generators remain "Not Running".
I expect we'll get a note after the fixing is done, while the CPU herd graze the FGRP4 haystack, looking for the needles.