Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2974194662
RAC: 799444


Ta. I'm used to projects where you have to read the errors from the bottom up. Yes, 0.6/0.4 will do it - I'll go round the shrubbery again.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4330
Credit: 251353000
RAC: 37121


The workunit generator of O3MD1 (V2) (CPU) ran wild over the weekend and generated far too many WUs (1M) and tasks (2M). The run needs to be restarted, but probably not this year. GPU will continue. We also need to review our memory requirements: the G1 tasks were estimated to take 1.8 GB, but they really need >3 GB.
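For anyone planning around that footprint, a rough rule of thumb is free RAM divided by the per-task requirement. A minimal sketch of that arithmetic (the 3 GB figure is from the post above; the host RAM and headroom values are placeholders to adjust for your own machine):

# Rough concurrency limit for memory-hungry CPU tasks.
# The 3 GB per-task figure comes from the post above; host values are placeholders.
PER_TASK_GB = 3.0     # observed footprint of an O3MD1 G1 CPU task
HOST_RAM_GB = 16.0    # replace with your machine's RAM
RESERVED_GB = 2.0     # headroom for the OS and anything else running

max_tasks = int((HOST_RAM_GB - RESERVED_GB) // PER_TASK_GB)
print(f"At ~{PER_TASK_GB:g} GB each, run at most {max_tasks} such CPU tasks at once.")

The resulting number is what you might put into a max_concurrent entry in an app_config.xml, or use to decide how many cores to let BOINC use.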

BM

mikey
Joined: 22 Jan 05
Posts: 12761
Credit: 1846267354
RAC: 598723


Ian&Steve C. wrote:

Boca Raton Community HS wrote:

Each CPU task is requiring ~2 GB of RAM (!). I don't think I have ever seen tasks with such large memory requirements. Our systems are chewing away at them, but wow, very memory intensive.

I'm not sure about now, but I know at one time Rosetta@home was also using about 2 GB per task.

 

GPUGRID's Python tasks, which are hybrid CUDA/MT tasks, use ~10 GB of system RAM, ~3 GB of VRAM, and 32+ cores for each task, lol.

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task; they are taking a long time to run, though:

Run time (sec): 208,848.50 / CPU time (sec): 208,848.50 / Credit: 87,500.00

Python apps for GPU hosts v4.03 (cuda1131)

As for the O3 GPU tasks, I'm doing really well on those:

 

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

GWGeorge007
Joined: 8 Jan 18
Posts: 3109
Credit: 4995473485
RAC: 1145899


mikey wrote:

       

As for the O3 GPU tasks, I'm doing really well on those:

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

Hi Mikey,

What did you do to have such nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors.  Another member said that I may not have execution permissions enabled on my app.  I don't recall any app...  where would it be?
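If "execution permissions" means the execute bit on the downloaded application files, one way to check is a small script like the sketch below. This is only an illustration of that interpretation: the project path shown is the usual Linux BOINC data directory and is an assumption, and normally the BOINC client sets these bits itself.

import os, stat

# Usual Linux BOINC data directory; adjust to your install (this path is an assumption).
project_dir = "/var/lib/boinc-client/projects/einstein.phys.uwm.edu"

for name in sorted(os.listdir(project_dir)):
    path = os.path.join(project_dir, name)
    mode = os.stat(path).st_mode
    if os.path.isfile(path) and not (mode & stat.S_IXUSR):
        print("no execute bit:", name)
        # os.chmod(path, mode | stat.S_IXUSR)  # uncomment to set it (run as the boinc user)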

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 263
Credit: 10819575802
RAC: 13279174


GWGeorge007 wrote:

mikey wrote:

       

As for the O3 GPU tasks, I'm doing really well on those:

All (4558) In Progress (584) Pending (571) Valid (3232) and Error (170)

Hi Mikey,

What did you do to have such nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors.  Another member said that I may not have execution permissions enabled on my app.  I don't recall any app...  where would it be?

 

We did not really have too many issues with these tasks either (like Mikey). We were running three of the GPU tasks simultaneously. We ran them as hard as we could for about a week so we could send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)

 

GWGeorge007
Joined: 8 Jan 18
Posts: 3109
Credit: 4995473485
RAC: 1145899


Boca Raton Community HS wrote:

We did not really have too many issues with these tasks either (like Mikey). We were running three of the GPU tasks simultaneously. We ran them as hard as we could for about a week so we could send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)

Thanks for the response.  I know I'll have to wait at least a couple of weeks to try it again.

Did you have to set the permissions for execution in order to get the tasks completed?  I just set mine now.

George

Proud member of the Old Farts Association

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4028
Credit: 47821642092
RAC: 39651595


mikey wrote:

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task

When I said 10 GB, I was referring to "system RAM", i.e. CPU memory. I was also rounding up to give you some breathing room. My big system with 2x 3060, running 4 tasks each (8 tasks total), uses ~76 GB of system memory.

Not video memory or VRAM; they use about 3 GB of VRAM for each task.
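Working backwards from those figures as a sanity check (all numbers are taken from the post above, nothing measured independently):

# Back-of-envelope check on the figures quoted above.
total_system_gb = 76       # observed for 8 GPUGRID Python tasks
tasks = 8                  # 2x RTX 3060, 4 tasks per card
vram_per_task_gb = 3

print(f"~{total_system_gb / tasks:.1f} GB of system RAM per task")              # ~9.5 GB, matching the ~10 GB estimate
print(f"~{4 * vram_per_task_gb} GB of VRAM needed per card at 4 tasks per card")  # ~12 GB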


Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519573811
RAC: 13853


Keith Myers wrote:

Not when they impact all the other users who don't want to run beta applications.

The O3MD* work generators are running unthrottled and producing more than enough work for the few users who want to run beta tasks.

But they have overloaded the RTS buffers and everybody else that is running Gamma Ray and BRP4/7 work is getting no work when requested even though there is plenty of it in the Ready to Send categories.

The beta work is swamping the download servers and schedulers and preventing all the other work from being sent out.

I am down over a thousand tasks in my 3 card hosts from my set cache levels and continuing to fall without replenishment. I will be out of work in just 8 hours.

Seems like the server isn't too bright (although this is BOINC...).

You would think that if there is work of types A, B, C, D and E needing to be done, the scheduler would take an even amount of each: a separate queue for each, users take from whichever queue(s) they want, and when a queue runs low it takes more from the relevant generator.  It would be monumentally stupid to just allow one generator to fill the scheduler up with one type of task.  Milkyway, for example, has 10,000 tasks queued for Separation and 1,000 for N-body; one doesn't swamp out the other.  What's gone wrong here?
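What's being described is essentially a round-robin feeder: one queue per application, and a shared send buffer topped up by cycling across them so no single generator can monopolise it. A toy sketch of that idea in Python (this is not actual BOINC server code, just an illustration of the policy being suggested):

from collections import deque

def fill_buffer(queues, buffer_size=100):
    """Top up a fixed-size send buffer by cycling across per-app queues."""
    buffer = []
    while len(buffer) < buffer_size and any(queues.values()):
        for app, q in queues.items():
            if q and len(buffer) < buffer_size:
                buffer.append((app, q.popleft()))
    return buffer

# A million beta tasks next to modest queues for everything else.
queues = {"O3MD1": deque(range(1_000_000)),
          "FGRP":  deque(range(5_000)),
          "BRP4":  deque(range(5_000))}
batch = fill_buffer(queues)
print({app: sum(1 for a, _ in batch if a == app) for app in queues})  # roughly 34/33/33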

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519573811
RAC: 13853


Elphidieus wrote:

Beta Settings: Run Test Applications = Yes, as long as they are NATIVE ARM apps, neither Intel nor legacy apps...

Allow non-preferred apps = Already No...

 

Looks like I have to turn Beta Settings OFF then... sad...

 

 

Thanks archae86...

Something is up if you allow beta and it gives you a beta of an app you haven't selected.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Keith Myers
Joined: 11 Feb 11
Posts: 5002
Credit: 18872489334
RAC: 6151158


Milkyway had the same issue with N-body tasks swamping the download server buffers.  Nobody was getting any Separation work even though there was plenty in the RTS buffers.

The RTS category is not the same thing as the download buffer.  If projects are configured the way the SETI servers were, the download buffer holds 100 tasks.  That is all.  When you hit the scheduler with a work request, the scheduler fills it from that download buffer of exactly 100 tasks.

When it gets emptied, it refills from all of the Ready-to-Send sub-project caches.  If you hit the scheduler just after a fast host has emptied it, before your connection is serviced, the buffer is empty and you get the "no tasks to send" message.

When the Ready-to-Send cache of a single sub-project is 10X to 100X the size of the other sub-projects' caches, the download buffer will be swamped and filled entirely from that unthrottled, oversized cache, and there will not be a single task of any other type in that 100-task buffer.

So you get the same message from the scheduler: no work to send.  The end result is that the one sub-project, in our case the new O3MD* work, completely excluded all other sub-project work from being available.
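A toy model of that refill behaviour as described above (a proportional draw from the combined Ready-to-Send pool; not the actual scheduler code, and the cache sizes are illustrative) makes the outcome obvious:

import random

# Toy model of the refill described above: the 100-slot buffer is drawn from the
# combined Ready-to-Send pool, so each app's share tracks the size of its cache.
rts = {"O3MD1": 500_000, "FGRP": 5_000, "BRP4/7": 5_000}            # illustrative cache sizes
pool = [app for app, n in rts.items() for _ in range(n // 1_000)]   # scaled down 1000:1

buffer = random.sample(pool, 100)
print({app: buffer.count(app) for app in rts})  # O3MD1 grabs roughly 98 of the 100 slots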

 
