Hi Keith!
It is time to get some sleep now. I'll be back.
Petri
petri33 wrote: B.t.w., have you tried running O3AS on your RTX 2080 (vanilla or Ti), or on your other system similar to mine, to get a look at the run times for reference? "One task only, just one. -- Sir? -- One ping only, just one. It is time."
Petri, I have run through several tasks. See the task results here: https://einsteinathome.org/host/12830750/tasks/0/55 (just ignore the aborted ones)
Running environment/setup:
all tasks were "h1_0298.40_O3aM1In1__O3ASE1_298.50Hz_xxxx_x"
CPU loaded with Universe@home on other threads, total CPU utilization ~92-95%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1200 MB GPU memory used

Task Progress:
~0-15%: 110-115% CPU thread utilization, ~75% GPU utilization
~15-99%: 102-105% CPU thread utilization, ~85% GPU utilization
99-100%: 99-100% CPU thread utilization, ~1-2% GPU utilization

8 tasks, time to reach 99% / 100% (m:ss):
5:07 / 7:07
5:08 / 7:09
5:00 / 7:00
5:02 / 7:02
5:04 / 7:05
5:07 / 7:02
5:00 / 6:59
5:07 / 7:07
So in my case, it always took about 2 minutes for that final 99-100% push. I'm sure CPU speed plays a big role in the time for this final "recalculation" to complete.
Thank you Ian. You sure do have a fast CPU.
Petri
petri33 wrote: Where can I ...
You can find Bernd's explanation in this message.
In particular, read the paragraph starting, "Regarding the "cleanup at 99%": ...".
Cheers,
Gary.
The new is often just the well-forgotten old :)))
A mutex approach was used in the SETI AstroPulse ATI Brook+ apps at the beginning of the GPU era, if I recall correctly, before BOINC was able to support GPUs at all. It was not only an optimization but also a compatibility measure: because BOINC didn't differentiate between CPU and GPU apps, the setup was to run several "CPU" instances and have them communicate with each other to schedule work in a BOINC-independent way. One "global" mutex balanced the number of CPU versus Brook+ GPU apps, while a second one guarded the GPU parts of the Brook+ AstroPulse code (I couldn't find a good enough FFT library that didn't force the data back through system memory, which killed all the performance gain from the "Brooked" part, so keeping the GPU fully loaded required a few instances in flight).
EDIT: More precisely, there was a so-called "team" version of AstroPulse, distributed via the anonymous platform, of course.
BOINC only knew about its CPU part. When the BOINC scheduler launched the CPU part of the team app, it checked the global mutex for the current number of GPU instances in flight. If the target was already reached, it continued as a CPU-optimised app. If not, it launched the Brook+ version and waited for it to finish. From BOINC's point of view there was just a CPU app associated with each particular WU.
The same approach could be used today to drive devices unknown to BOINC, like FPGAs (actually, modern BOINC versions have a mechanism to describe such unknown coprocessors in the XML configuration, but I can't say how well it works - never tested it).
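For anyone curious what that "team app" trick looks like in code, here is a minimal sketch, assuming a POSIX system and using a named counting semaphore in place of the original global mutex/counter pair. It is not the actual AstroPulse implementation; every name, the slot count, and the fallback behaviour are made up for illustration.

    // Sketch only: a cross-process counting semaphore plays the role of the
    // "global mutex" that capped the number of GPU instances in flight.
    // The semaphore name and the limit of 2 slots are made-up examples.
    #include <fcntl.h>      // O_CREAT
    #include <semaphore.h>  // sem_open, sem_trywait, sem_post, sem_close
    #include <cstdio>

    static const unsigned kMaxGpuInstances = 2;           // assumed target count
    static const char*    kSlotName = "/demo_gpu_slots";  // hypothetical name

    static void run_gpu_version() { std::puts("running GPU (Brook+-style) path"); }
    static void run_cpu_version() { std::puts("running CPU-optimised path"); }

    int main() {
        // Named semaphore shared by every instance of the app on this host;
        // the first instance creates it with kMaxGpuInstances free GPU slots.
        sem_t* slots = sem_open(kSlotName, O_CREAT, 0644, kMaxGpuInstances);
        if (slots == SEM_FAILED) {      // no semaphore -> just be a CPU app
            run_cpu_version();
            return 0;
        }
        if (sem_trywait(slots) == 0) {  // grabbed a slot: act as the GPU build
            run_gpu_version();
            sem_post(slots);            // release the slot when finished
        } else {                        // all slots busy: continue as the CPU app,
            run_cpu_version();          // which is all BOINC ever knew about
        }
        sem_close(slots);
        return 0;
    }

The modern-day equivalent alluded to above is, as far as I know, the <coproc> option in the BOINC client's cc_config.xml, which lets you declare a coprocessor the client doesn't detect on its own so that anonymous-platform apps can be scheduled against it.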
lol, that 5950X is a beast. My 3700X needed roughly 6:30-7:00 min for the CPU-intensive toplist recalculation, which essentially makes the O3 GW tasks much less rewarding than either the O2-run WUs or the FGRPB1G tasks.
Unless you were running the CPU at 100% load (overcommitting with too many tasks), I wouldn't expect the 3700X to be more than ~20% slower clock-for-clock, since this is a single-threaded process.
I took a peek at your tasks and noticed your wingmen were usually doing the final toplist in 2-3 mins too, even on arguably weaker CPUs like an i3-9100. You really need to leave spare CPU resources for the GW tasks; they are CPU bound. The faster your GPU is, the faster your CPU needs to be, and you should never run at 100% CPU - it'll just slow everything down. I also recommend running an app_config.xml file to force 1 CPU - 1 GPU so that BOINC properly accounts for free resources and doesn't try to run too many tasks.
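For reference, a minimal app_config.xml along those lines looks something like the sketch below. The app name is an assumption; check the <name> entries in client_state.xml (or the project's Applications page) for the exact string used by the O3AS GPU app.

    <app_config>
        <app>
            <name>einstein_O3AS</name>   <!-- assumed app name: verify locally -->
            <gpu_versions>
                <gpu_usage>1.0</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>

Place it in the Einstein@Home project directory and reload it via "Options -> Read config files" in BOINC Manager; with both <gpu_usage> and <cpu_usage> at 1.0, BOINC budgets one full CPU core per GPU task instead of whatever smaller fraction the project defaults to.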
Thanks Ian&Steve for taking a look! This surprised me as well, but somehow all of my E@H tasks have shown some weirdness lately. They threw 15+ errors recently on my system while I hadn't changed any settings and everything had been running smoothly before. I'll likely revisit this tomorrow morning after running a few tasks overnight. I do run an app_config at 1 CPU & 1 GPU and always leave one thread free for system overhead, so that cannot be the issue. It might, however, be that at the time of processing Windows spawned some stupid system processes that temporarily led to CPU overcommitment... I'll keep a closer eye on it the next time I run these tasks.
Thanks again for your advice!
Ran through some more O3ASE tasks on a different platform.
See the task results here: https://einsteinathome.org/host/12803486/tasks/0/55
Running environment/setup:
all tasks were "h1_0398.80_O3aM1In1__O3ASE1_399.00Hz_xxxx_x"
CPU loaded with Universe@home on other threads, total CPU utilization ~95%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1200 MB GPU memory used

Task Progress:
~0-15%: 115-118% CPU thread utilization, ~70% GPU utilization
~15-99%: 105-108% CPU thread utilization, ~80% GPU utilization
99-100%: 99-100% CPU thread utilization, ~0% GPU utilization

8 tasks, time to reach 99% / 100% (m:ss):
5:34 / 9:31
5:24 / 9:19
5:28 / 9:24
5:30 / 9:32
5:35 / 9:31
5:27 / 9:23
5:41 / 9:35
5:30 / 9:27
So in this case, it always took around 4 minutes for that final 99-100% push. Slower CPU/memory here, but I think the type of task (higher frequency, and DF?) plays some role too. I'll have to see what the wingmen do.
Just as a point of comparison, I too have some O3ASE tasks on a 3950X platform.
Running environment/setup:
CPU: Ryzen 9 3950X at 4,299 MHz (OC)
MEM: DDR4 3200 MHz CL14 non-ECC DIMM
GPU: (2x) RTX 2070 Super @ ~1980 MHz and ~2025 MHz; no power limit (present peak draw ~200 W)
Windows 10 version 1909 (OS Build 18363.1379)
Tasks were "h1_0399.80_O3aM1In1__O3ASE1_399.80Hz_xxxx_x" and "h1_0399.80_O3aM1In1__O3ASE1_399.40Hz_xxxx_x"
CPU loaded with Universe@home and Milkyway@home on other threads, total CPU utilization 100%
app_config.xml forcing 1 CPU - 1 GPU, only 1 task per GPU

Observations:
~1,120 MB to ~1,520 MB GPU memory used
~1,750 MHz memory clock (both GPUs)

My present progress for O3ASE tasks on my 3950X (computer 12851564):
Too few to make a real observation. The CPU time required to complete the 5 valid tasks ranges from ~895 secs (14:54 min:sec) up to ~918 secs (15:18 min:sec).
I have no Gamma-ray Pulsar tasks currently in progress. https://einsteinathome.org/host/12851564/tasks/0/0
Proud member of the Old Farts Association