Bernd Machenschalk wrote: On our machines the runtimes of the old and new workunits are comparable. So far I don't have enough results for a statistical evaluation.
My observations thus far across several of my hosts (those that have been able to get any new O3 tasks, that is) with different architectures and power ratings suggest that the run-times for the new low-frequency tasks are 2-3 times those of the old high-frequency tasks. That in itself isn't a major problem, but the estimated run-time for the new tasks is a fraction of that of the old tasks. This causes a huge skew in work requests: I suspect that as we collectively make the transition to the new tasks, people's machines are requesting work on the basis of the estimated time for the old tasks and are therefore over-requesting, hence the dearth of available work for O3.
Edit: If run-times are similar between old and new tasks for CUDA-based applications, then there may need to be some optimisation of the OpenCL applications. I only have v1.07 available.
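To put the skew described above into rough numbers, here is a minimal sketch of the effect. This is not actual BOINC client code; the buffer size, the 20% estimate and the task lengths are assumed round values for illustration only.

    # Rough sketch of the work-request skew described above; not actual BOINC
    # client code. All numbers are assumed round values for illustration.

    buffer_s     = 12 * 3600          # work the client tries to keep on hand (12 h)
    old_task_s   = 30 * 60            # old high-frequency task: estimate ~= runtime
    new_estimate = 0.2 * old_task_s   # new-task estimate: a fraction of the old (assumed 20%)
    new_actual   = 3.0 * old_task_s   # new-task runtime: roughly 3x the old

    tasks_requested = buffer_s / new_estimate               # 120 tasks
    real_buffer_h   = tasks_requested * new_actual / 3600   # 180 h of actual work

    print(f"requested ~{tasks_requested:.0f} tasks to fill a 12 h buffer")
    print(f"they actually represent ~{real_buffer_h:.0f} h of work")

Under those assumptions a single host walks away with roughly fifteen times the work it can actually get through in the buffer period, which is consistent with the shortage other hosts are seeing.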
Soli Deo Gloria
Wedge009 wrote: Edit: If run-times are similar between old and new tasks for CUDA-based applications, then there may need to be some optimisation of the OpenCL applications. I only have v1.07 available.
They're not; it's about the same 2-3x runtime that you observed on 1.07 with OpenCL, though I haven't tested both the 1.08 and 1.14 apps extensively or checked further optimization yet.
Just need to wait for more tasks to be available to do more testing.
We also saw the 2x-3x slowdown for the brief period when a few of these tasks ran here (CUDA).
I didn't catch them on the Windows boxes (OpenCL).
If this app is going to be around for a while, I hope to see a Windows version of the CUDA application at some point! If not, OpenCL it is (3x slower than CUDA).
I haven't received any OAS work since Monday.
And me too. Why?
Wedge009 wrote: ...the estimated run-time for the new tasks is a fraction of that of the old tasks. This causes a huge skew in work requests: I suspect that as we collectively make the transition to the new tasks, people's machines are requesting work on the basis of the estimated time for the old tasks and are therefore over-requesting, hence the dearth of available work for O3.
I forgot to mention that compounding the problem is the short deadline for these new tasks. I suppose some tasks may become available for redistribution after they expire or are aborted.
Soli Deo Gloria
tish wrote: And me too. Why?
Most likely because there is a short supply of work available at the moment. No idea when supply will improve.
As a point of reference from a Windows 11 PC running the stock version of BOINC on a 4070 Super GPU + AMD 7800X3D CPU: the new lower-frequency tasks show about the same ~3x increase in run time you guys are experiencing. They're up from ~10-11 minutes to just under 30 minutes of run time with single-task processing.
If the new tasks still give 10,000 credit per task, the 3x increase makes them significantly "worse" to run from a RAC perspective for me in comparison to the BRP7 tasks, which complete in just under 6 minutes (~555.5 credit/min for BRP7 vs ~344.8 credit/min for O3 currently, and ~900-1000 credit/min previously). Does anyone know if there are any plans to adjust the credit given for the new gravitational wave tasks? Or maybe we are just being incentivized to run BRP7, haha.
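As a quick sanity check on those credit-per-minute figures, here is the arithmetic spelled out. The per-task credit values are simply the ones implied by the numbers above (about 3,333 for BRP7 and 10,000 for the gravitational-wave tasks), not official project figures.

    # Back-of-the-envelope credit-rate check using the numbers quoted above.
    # Per-task credit values are inferred from those figures, not taken from the project.

    def credit_per_min(credit_per_task, minutes_per_task):
        return credit_per_task / minutes_per_task

    brp7   = credit_per_min(3333, 6.0)     # ~555.5 credit/min
    o3_new = credit_per_min(10000, 29.0)   # ~344.8 credit/min
    o3_old = credit_per_min(10000, 10.5)   # ~952 credit/min (the "900-1000" range)

    print(f"BRP7:   {brp7:.1f} credit/min")
    print(f"O3 new: {o3_new:.1f} credit/min")
    print(f"O3 old: {o3_old:.1f} credit/min")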
Wedge009 wrote: I forgot to mention that compounding the problem is the short deadline for these new tasks. I suppose some tasks may become available for redistribution after they expire or are aborted.
There's a good reason for the small task pool and the short deadline: we're acting responsibly. Whenever we set up a new run, we run internal tests before a public release. However, given the vast heterogeneity of the volunteer systems out there, we are rolling out new runs in stages. The first set of ~800 workunits is used to see whether things work as expected with regard to resource requirements, runtime/credit, stability and correctness. This way we minimize the waste of precious compute cycles at your end caused by a potentially flawed release. Depending on the results, we might issue a larger set or go full throttle. In order to keep the staging phase as short as possible, we're setting the deadline to, say, 3 days only (instead of the usual 14). This way we get the required feedback quickly and can speed up the rollout for everyone.
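Purely to illustrate the staged rollout described above, here is a toy sketch of the decision after the pilot batch. The threshold, the data layout and the function are hypothetical and invented for the example; this is not Einstein@Home server code.

    # Hypothetical illustration of a staged-rollout decision; not actual
    # Einstein@Home server code. Only the ~800-workunit pilot batch and the
    # 3-day deadline come from the post above; everything else is invented.

    ERROR_THRESHOLD = 0.02     # assumed acceptable error/invalid rate

    def next_stage(results):
        """Decide how to continue once the pilot batch has been validated."""
        valid   = sum(1 for r in results if r == "valid")
        errored = sum(1 for r in results if r in ("error", "invalid"))
        if valid == 0:
            return "halt: no valid results yet"
        error_rate = errored / (valid + errored)
        if error_rate > ERROR_THRESHOLD:
            return "fix the app and re-run a small batch"
        return "issue a larger set (or go full throttle)"

    # Example: 780 valid and 8 errored results back from the ~800-workunit pilot
    print(next_stage(["valid"] * 780 + ["error"] * 8))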
Hope this helps,
Oliver
Einstein@Home Project
Oliver Behnke wrote: In order to keep the staging phase as short as possible, we're setting the deadline to, say, 3 days only (instead of the usual 14). This way we get the required feedback quickly and can speed up the rollout for everyone.
That makes perfect sense, and the deadline wouldn't be an issue if the estimated runtime wasn't completely wrong. It's around 20% of that of the old tasks, while the actual runtime is pretty much exactly 3x that of the old tasks (on my system at least); that's a huge discrepancy. So my BOINC client was happily asking for more and more of those "17-minute" tasks, and then it found out that in reality they take around four and a half hours. Now it's in panic mode, and I might need to abort some of the tasks.
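To put numbers on that discrepancy using the figures from this post (the one-day work cache below is an assumed setting, just for illustration):

    # Worked example with the figures from the post above; the 1-day work
    # cache is an assumed setting for illustration.

    estimated_min = 17            # per-task estimate shown by the client
    actual_min    = 4.5 * 60      # observed runtime per task (~4.5 hours)
    cache_days    = 1.0           # assumed work-buffer setting
    deadline_days = 3             # short deadline of the staged rollout

    discrepancy   = actual_min / estimated_min                 # ~16x
    tasks_fetched = cache_days * 24 * 60 / estimated_min       # ~85 tasks
    backlog_days  = tasks_fetched * actual_min / (24 * 60)     # ~16 days of real work

    print(f"estimate is off by ~{discrepancy:.0f}x")
    print(f"client fetched ~{tasks_fetched:.0f} tasks to fill a {cache_days:g}-day cache")
    print(f"that is ~{backlog_days:.0f} days of work against a {deadline_days}-day deadline")

With the queue holding far more work than can possibly finish within the 3-day deadline, the client drops into high-priority ("panic") mode, and some tasks will inevitably have to be aborted or time out.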