The host configuration doesn't change on my host.
Or please explain how this could happen.
Maybe I'm not aware of that.
sfv
I can see that the new tasks run 75% longer than the old ones ON AVERAGE. It seems that high power GPUs (like the ones we tested on) are affected less than older or integrated ones.
The validator needs some more adjustments; that will have to wait until tomorrow's office hours (CET).
BM
Curious to know where that 75% figure comes from. Is it across the entire project or what the development team is using?
Looking forward to seeing if validation rates are good as well. That will help inform us on where to commit limited resources. Thanks.
Soli Deo Gloria
So close to reaching 10k hours with the app. I'm not sure if the tasks I have left on one PC will get me there.
My tasks went from:
h1_1683.60_O3aC01Cl1In0__O3ASHF1d_1684.00Hz_625_1
to
h1_0162.80_O3aLC01Cl1In0__O3ASBu_163.00Hz_51642_0
and almost tripled in run time. Some are at 160, 167, and 198 Hz.
Wedge009 wrote: Curious to know where that 75% figure comes from. Is it across the entire project or what the development team is using?
The run times of all (successful) new tasks on a host are averaged, and then the run times of all old tasks on that host (in the DB) are averaged. Then the ratio avg(new)/avg(old) is taken per host, and finally this ratio is averaged over all hosts of the project that successfully completed both old and new tasks. The resulting overall ratio is 1.75, which means +75%.
BM
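For reference, a minimal sketch of the per-host averaging described above (the host IDs and runtimes are made-up placeholders, not real project data, and this is not the project's actual DB query):

    # Sketch of the per-host averaging: avg(new)/avg(old) per host, then averaged over hosts.
    # runtimes_old / runtimes_new map host id -> runtimes (s) of successful tasks; values are placeholders.
    from statistics import mean

    runtimes_old = {101: [1700, 1850, 1800], 102: [5200, 5400]}
    runtimes_new = {101: [3900, 4000], 102: [7800, 8100, 7900]}

    ratios = []
    for host in runtimes_old.keys() & runtimes_new.keys():  # hosts that completed both old and new tasks
        ratios.append(mean(runtimes_new[host]) / mean(runtimes_old[host]))

    overall = mean(ratios)  # e.g. 1.75 means +75%
    print(f"average slowdown: {overall:.2f}x ({(overall - 1) * 100:+.0f}%)")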
Bernd Machenschalk wrote: [...]
Did the host with "high power GPUs" that you were using run a single task per GPU? What was the ratio for that single host alone? I previously ran O3AS with the higher-frequency batch with more than one task per GPU to improve GPU utilization.
comparing the average "tasks completed per day" is probably a more useful metric that will account for any changes to host configuration with old and new tasks.
on one of my hosts, my old runtimes were like ~1800s, and new runtimes are ~3900s. on its face that looks like only a 2x slowdown, but the missing context is that I was running 5 tasks at a time to get that 1800s runtime, and only 3 tasks at a time to come to the 3900s runtime. these are two very optimized configurations that get the most out of each kind of task.
86400 / 1800 = 48, and 48 × 5 = 240 tasks per day
86400 / 3900 ≈ 22.15, and 22.15 × 3 ≈ 66.5 tasks per day
that puts the new tasks about 3.6x less productive than the old tasks.
and with the credit adjustment from increasing the estimated flops (which means credit should now be 20,000), that's a reduction of about 1.8x in effective PPD (credit per day).
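A quick sketch of that throughput and credit arithmetic (the 10,000-credit figure for the old tasks is an assumption implied by the stated ~1.8x PPD ratio, not a value confirmed in this thread):

    # Tasks/day and credit/day from the runtimes and concurrency quoted above.
    def tasks_per_day(runtime_s, concurrent_tasks):
        return 86400 / runtime_s * concurrent_tasks

    old_tpd = tasks_per_day(1800, 5)  # ~240 tasks/day
    new_tpd = tasks_per_day(3900, 3)  # ~66.5 tasks/day
    print(f"productivity ratio: {old_tpd / new_tpd:.1f}x")  # ~3.6x

    old_ppd = old_tpd * 10000  # assumed credit per old task
    new_ppd = new_tpd * 20000  # credit per new task, per the post above
    print(f"effective PPD ratio: {old_ppd / new_ppd:.1f}x")  # ~1.8x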
Ian&Steve C. wrote: I was running 5 tasks at a time to get that 1800s runtime, and only 3 tasks at a time to come to the 3900s runtime.
I am also seeing that 3, maybe 4, simultaneous tasks are the "limit" even on high-end GPUs. The cores/memory bus stay fully saturated 95% of the time running 3x. Running 4x will keep the cores/bus fully saturated, but we have yet to determine which is better (it might be 3x). This general behavior was seen across 4 hosts (RTX A4500, RTX A6000, and both 4090 systems).
General observation: it seems that the number of concurrent work units will be cut in half on most of our systems. Or, that is where we will start and then further optimize. I would suggest others "start" optimization by cutting concurrency in half and then adjusting from there. This suggestion would only apply if your system was already optimized.
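For anyone re-tuning from that starting point, here is a rough helper for comparing candidate concurrency settings; the runtimes in the table are placeholders to be replaced with your own measurements:

    # Compare throughput for a few concurrency settings measured on your own host.
    # The runtimes below are placeholders, not measurements reported in this thread.
    measured = {1: 1500.0, 2: 2600.0, 3: 3900.0, 4: 5300.0}  # tasks per GPU -> avg runtime (s)

    for n, runtime in sorted(measured.items()):
        throughput = 86400.0 / runtime * n  # tasks per day at this setting
        print(f"{n} per GPU: {throughput:.0f} tasks/day")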
Ian&Steve C. wrote: comparing the average "tasks completed per day" is probably a more useful metric that will account for any changes to host configuration with old and new tasks.
Yes, this is the right way to compute it, since it cancels out concurrency configurations.
I have a script recording all finished tasks. On my 7950X+4070Ti host, I was finishing ~200-220 tasks per day with two tasks splayed to run concurrently. With the new tasks having very little idle GPU time, I configured it to run one per GPU and it finished 55 tasks yesterday. I tried running two tasks per GPU with the new tasks too, but that pretty much just doubled runtime.
Because of the significant reduction in CPU-only periods and increase in GPU compute, the slowdown ratio depends a lot on how under-powered one's CPU is relative to GPU. The weaker the CPU relatively, the less slowdown one would notice from the new tasks.
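For anyone without such a script, a rough sketch of a tasks-per-day counter follows. It assumes the standard per-project BOINC job_log line format (epoch completion time first, task name after the "nm" field); the log path and the O3AS name filter are assumptions to adjust for your own BOINC data directory and project:

    # Count completed Einstein@Home tasks per day from BOINC's per-project job log.
    from collections import Counter
    from datetime import datetime

    LOG = "/var/lib/boinc-client/job_log_einstein.phys.uwm.edu.txt"  # placeholder path

    per_day = Counter()
    with open(LOG) as f:
        for line in f:
            fields = line.split()
            if not fields or "nm" not in fields:
                continue
            name = fields[fields.index("nm") + 1]
            if "O3AS" not in name:  # keep only the O3AS GPU tasks discussed here
                continue
            finished = datetime.fromtimestamp(int(fields[0]))
            per_day[finished.date()] += 1

    for day in sorted(per_day):
        print(day, per_day[day])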
When I was running O3AS on Radeon VII, I was running 6 tasks per GPU to keep the average GPU utilization in the high 90% range most of the time. At least now the low-frequency tasks will let me free up some CPU cores. Also, did anyone notice that the CPU time for the stat recalculation appears to have dropped by about half (at least for those with fast CPUs)?