Hi Keith!
It is time to get some sleep now. I'll be back.
Petri
petri33 wrote: B.t.w., have you tried running O3AS on your RTX 2080 (vanilla or Ti), or on your other system similar to mine, to get a look at the run times for reference? "One task only, just one. -- Sir? -- One ping only, just one. It is time."
Petri, I have run through several tasks. See the task results here: https://einsteinathome.org/host/12830750/tasks/0/55 (just ignore the aborted ones)
Running environment/setup:
all tasks were "h1_0298.40_O3aM1In1__O3ASE1_298.50Hz_xxxx_x"
CPU loaded with Universe@home on other threads, total CPU utilization ~92-95%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1200 MB GPU memory used

Task Progress:
~0-15%: 110-115% CPU thread utilization, ~75% GPU utilization
~15-99%: 102-105% CPU thread utilization, ~85% GPU utilization
99-100%: 99-100% CPU thread utilization, ~1-2% GPU utilization

8 tasks, time to reach 99% / 100% (m:ss):
5:07 / 7:07
5:08 / 7:09
5:00 / 7:00
5:02 / 7:02
5:04 / 7:05
5:07 / 7:02
5:00 / 6:59
5:07 / 7:07
So in my case, it always took about 2 minutes for that final 99-100% push. I'm sure CPU speed plays a big role in the time for this final "recalculation" to complete.
Thank you Ian. You sure do have a fast CPU.
Petri
petri33 wrote: Where can I ...
You can find Bernd's explanation in this message.
In particular, read the paragraph starting, "Regarding the "cleanup at 99%": ...".
Cheers,
Gary.
The new is often just the well-forgotten old :)))
A mutex approach was used in the SETI AstroPulse ATI Brook+ apps at the beginning of the GPU era, if I recall correctly, before BOINC was able to support GPUs at all. It was not only an optimization but also a compatibility measure: because BOINC didn't differentiate between CPU and GPU apps, the setup was to run several "CPU" instances and have them communicate with each other to schedule work in a BOINC-independent way. One "global" mutex balanced the number of CPU versus Brook+ GPU apps, while a second one guarded the GPU parts of the Brook+ AstroPulse code (I couldn't find a good enough FFT library that didn't force the data back through system memory, which killed all the performance gain from the "Brooked" part, so keeping the GPU fully loaded required a few instances in flight).
EDIT: More precisely, there was a so-called "team" version of AstroPulse, distributed via the anonymous platform, of course.
BOINC only knew about its CPU part. When the BOINC scheduler launched the CPU part of the team app, it checked the global mutex for the current number of GPU instances in flight. If the target was already reached, it continued as a CPU-optimised app. If not, it launched the Brook+ version and waited for it to finish. From BOINC's point of view there was just a CPU app associated with each particular WU.
The same approach could be used today to drive devices unknown to BOINC, like FPGAs (actually, modern BOINC versions have a mechanism to describe such unknown coprocessors in the XML configuration, but I can't say how well it works - never tested it).
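For anyone curious what that "team app" trick looks like in code, here is a minimal sketch, assuming a POSIX system and using a named counting semaphore in place of the original global mutex/counter pair. It is not the actual AstroPulse implementation; every name, the slot count, and the fallback behaviour are made up for illustration.

    // Sketch only: a cross-process counting semaphore plays the role of the
    // "global mutex" that capped the number of GPU instances in flight.
    // The semaphore name and the limit of 2 slots are made-up examples.
    #include <fcntl.h>      // O_CREAT
    #include <semaphore.h>  // sem_open, sem_trywait, sem_post, sem_close
    #include <cstdio>

    static const unsigned kMaxGpuInstances = 2;           // assumed target count
    static const char*    kSlotName = "/demo_gpu_slots";  // hypothetical name

    static void run_gpu_version() { std::puts("running GPU (Brook+-style) path"); }
    static void run_cpu_version() { std::puts("running CPU-optimised path"); }

    int main() {
        // Named semaphore shared by every instance of the app on this host;
        // the first instance creates it with kMaxGpuInstances free GPU slots.
        sem_t* slots = sem_open(kSlotName, O_CREAT, 0644, kMaxGpuInstances);
        if (slots == SEM_FAILED) {      // no semaphore -> just be a CPU app
            run_cpu_version();
            return 0;
        }
        if (sem_trywait(slots) == 0) {  // grabbed a slot: act as the GPU build
            run_gpu_version();
            sem_post(slots);            // release the slot when finished
        } else {                        // all slots busy: continue as the CPU app,
            run_cpu_version();          // which is all BOINC ever knew about
        }
        sem_close(slots);
        return 0;
    }

The modern-day equivalent alluded to above is, as far as I know, the <coproc> option in the BOINC client's cc_config.xml, which lets you declare a coprocessor the client doesn't detect on its own so that anonymous-platform apps can be scheduled against it.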
lol, that 5950X is a beast. My 3700X needed roughly 6:30-7:00 min for the CPU-intensive toplist recalculation, which essentially makes the O3 GW tasks much less rewarding than either the O2-run WUs or the FGRPB1G tasks.
Unless you were running the CPU at 100% load (overcommitting with too many tasks), I wouldn't expect the 3700X to be more than ~20% slower clock-for-clock, since this is a single-threaded process.
I took a peek at your tasks and noticed your wingmen were usually doing the final toplist in 2-3 mins too, even on arguably weaker CPUs like an i3-9100. You really need to leave spare CPU resources for the GW tasks; they are CPU bound. The faster your GPU is, the faster your CPU needs to be, and you should never run at 100% CPU - it'll just slow everything down. I also recommend running an app_config.xml file to force 1 CPU - 1 GPU so that BOINC properly accounts for free resources and doesn't try to run too many tasks.
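For reference, a minimal app_config.xml along those lines looks something like the sketch below. The app name is an assumption; check the <name> entries in client_state.xml (or the project's Applications page) for the exact string used by the O3AS GPU app.

    <app_config>
        <app>
            <name>einstein_O3AS</name>   <!-- assumed app name: verify locally -->
            <gpu_versions>
                <gpu_usage>1.0</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>

Place it in the Einstein@Home project directory and reload it via "Options -> Read config files" in BOINC Manager; with both <gpu_usage> and <cpu_usage> at 1.0, BOINC budgets one full CPU core per GPU task instead of whatever smaller fraction the project defaults to.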
Thanks Ian&Steve for taking a look! This surprised me as well, but somehow all of my E@H tasks have shown some weirdness lately. They threw 15+ errors recently on my system while I hadn't changed any settings and everything had been running smoothly before. I'll likely revisit this tomorrow morning after running a few tasks overnight. I do run an app_config at 1 CPU & 1 GPU and always leave one thread free for system overhead, so that cannot be the issue. It might, however, be that at the time of processing Windows spawned some stupid system processes that temporarily led to CPU overcommitment... I'll keep a closer eye on it the next time I run these tasks.
Thanks again for your advice!
Ran through some more O3ASE tasks on a different platform.
See the task results here: https://einsteinathome.org/host/12803486/tasks/0/55
Running environment/setup:
all tasks were "h1_0398.80_O3aM1In1__O3ASE1_399.00Hz_xxxx_x"
CPU loaded with Universe@home on other threads, total CPU utilization ~95%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1200 MB GPU memory used

Task Progress:
~0-15%: 115-118% CPU thread utilization, ~70% GPU utilization
~15-99%: 105-108% CPU thread utilization, ~80% GPU utilization
99-100%: 99-100% CPU thread utilization, ~0% GPU utilization

8 tasks, time to reach 99% / 100% (m:ss):
5:34 / 9:31
5:24 / 9:19
5:28 / 9:24
5:30 / 9:32
5:35 / 9:31
5:27 / 9:23
5:41 / 9:35
5:30 / 9:27
So in this case, it always took around 4 minutes for that final 99-100% push. Slower CPU/memory here, but I think the type of task (higher frequency, and DF?) plays some role too. I'll have to see what the wingmen do.
Just as a point of comparison, I too have some O3ASE tasks on a 3950X platform.
Running environment/setup:
CPU: Ryzen 9 3950X at 4,299 MHz (OC)
MEM: DDR4 3200 MHz CL14 non-ECC DIMM
GPU: (2x) RTX 2070 Super @ ~1980 MHz and ~2025 MHz; no power limit (present peak draw ~200 W)
Windows 10 version 1909 (OS Build 18363.1379)
Tasks were "h1_0399.80_O3aM1In1__O3ASE1_399.80Hz_xxxx_x" and "h1_0399.80_O3aM1In1__O3ASE1_399.40Hz_xxxx_x"
CPU loaded with Universe@home and Milkyway@home on other threads, total CPU utilization 100%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1,120 MB to ~1,520 MB GPU memory used
~1,750 MHz memory clock (both GPUs)

My present progress for O3ASE tasks on my 3950X (computer 12851564):
Too few to make a real observation. The CPU time required to complete the 5 valid tasks ranges from ~895 secs (14:54 min:sec) up to ~918 secs (15:18 min:sec).
I have no Gamma-ray Pulsar tasks currently in progress. https://einsteinathome.org/host/12851564/tasks/0/0
Proud member of the Old Farts Association