Multi-Directional Gravitational Wave Search on O3 data (O3MD1/F)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,923,652,861
RAC: 923,457

Ta. I'm used to projects where you have to read the errors from the bottom up. Yes, 0.6/0.4 will do it - I'll go round the shrubbery again.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,815,217
RAC: 33,259

The workunit generator of O3MD1 (V2) (CPU) ran wild over the weekend and generated far too many WUs (1M) and tasks (2M). The run needs to be restarted, but probably not this year. GPU will continue. We also need to review our memory requirements: the G1 tasks were estimated to take 1.8 GB, but they really need >3 GB.
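For context: in a stock BOINC server setup, that per-task estimate is declared in the workunit input template as rsc_memory_bound, in bytes, and the scheduler only sends a task to hosts with at least that much available RAM. A minimal sketch of raising the bound above 3 GB (illustrative value, not the actual Einstein@Home template):

    <workunit>
        <!-- estimated peak working set, in bytes; hosts with less
             available RAM will not be sent this task -->
        <rsc_memory_bound>3500000000</rsc_memory_bound>
    </workunit>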

BM

mikey
Joined: 22 Jan 05
Posts: 12,547
Credit: 1,838,665,704
RAC: 10,209

Ian&Steve C. wrote:

Boca Raton Community HS wrote:

Each CPU task is requiring ~2 GB of RAM (!). I don't think I have ever seen tasks with such large memory requirements. Our systems are chewing away at them, but wow, very memory-intensive.

I'm not sure about currently, but I know that at one time Rosetta@home was also using about 2 GB per task.

 

GPUGRID's Python tasks, which are hybrid CUDA/MT tasks, use ~10 GB of system RAM, ~3 GB of VRAM, and 32+ cores for each task, lol.

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task; they are taking a long time to run, though:

Run time 208,848.50 s, CPU time 208,848.50 s, credit 87,500.00

Python apps for GPU hosts v4.03 (cuda1131)

As for the O3 GPU tasks, I am doing really well on those:

 

All (4558), In Progress (584), Pending (571), Valid (3232), Error (170)

GWGeorge007
Joined: 8 Jan 18
Posts: 2,997
Credit: 4,926,044,438
RAC: 129,282

mikey wrote:


As for the O3 GPU tasks, I am doing really well on those:

All (4558), In Progress (584), Pending (571), Valid (3232), Error (170)

Hi Mikey,

What did you do to get nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors. Another member said that I may not have execute permissions enabled on my app. I don't recall any app... where would it be?

George

Proud member of the Old Farts Association

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 232
Credit: 9,504,808,920
RAC: 23,132,538

GWGeorge007 wrote:

mikey wrote:


As for the O3 GPU tasks, I am doing really well on those:

All (4558), In Progress (584), Pending (571), Valid (3232), Error (170)

Hi Mikey,

What did you do to get nice, successful processing of the O3 GPU tasks?

I don't have a single validation, and I have a bunch of errors. Another member said that I may not have execute permissions enabled on my app. I don't recall any app... where would it be?

 

Like Mikey, we did not really have too many issues with these tasks. We were running three of the GPU tasks simultaneously. We ran them as hard as we could for about a week, to send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)
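For anyone wanting to replicate the three-at-a-time setup above: the usual mechanism is an app_config.xml in the Einstein@Home project directory. A minimal sketch; the app name einstein_O3MD1 is an assumption here, so check client_state.xml for the actual name on your host:

    <app_config>
        <app>
            <name>einstein_O3MD1</name>      <!-- assumed name; verify in client_state.xml -->
            <gpu_versions>
                <gpu_usage>0.33</gpu_usage>  <!-- three tasks share each GPU -->
                <cpu_usage>1.0</cpu_usage>   <!-- reserve one CPU core per task -->
            </gpu_versions>
        </app>
    </app_config>

Reload it via Options -> "Read config files" in BOINC Manager, or restart the client.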

 

GWGeorge007
Joined: 8 Jan 18
Posts: 2,997
Credit: 4,926,044,438
RAC: 129,282

Boca Raton Community HS wrote:

Like Mikey, we did not really have too many issues with these tasks. We were running three of the GPU tasks simultaneously. We ran them as hard as we could for about a week, to send back a large enough sample set of completed tasks to be somewhat helpful (well, hopefully large enough).

All (6481)
In progress (4)
Pending (267)
Valid (6119)
Invalid (0)
Error (84)

Thanks for the response.  I know I'll have to wait at least a couple of weeks to try it again.

Did you have to set the permissions for execution in order to get the tasks completed?  I just set mine now.
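For anyone else chasing the same issue: on a Linux host, "setting permissions for execution" normally means marking the downloaded app files executable. A hedged one-liner, assuming the Debian-style BOINC data directory layout; the *O3* glob is a guess, so list the directory first and adjust:

    chmod +x /var/lib/boinc-client/projects/einstein.phys.uwm.edu/*O3*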

George

Proud member of the Old Farts Association

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3,912
Credit: 43,735,979,309
RAC: 63,197,745

mikey wrote:

I'm running those on a 12/24-core Ryzen that is running a different BOINC project on 23 of the CPU cores, plus an Nvidia 3060, and they are not taking 10 GB of RAM for each task

When I said 10 GB, I was referring to "system RAM", i.e. CPU memory. I was also rounding up to give you some breathing room. My big system with 2x 3060s, running 4 tasks each (8 tasks total), uses ~76 GB of system memory, i.e. ~9.5 GB per task.

Not video memory or VRAM: each task uses about 3 GB of VRAM.

_________________________________________________________________________

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 518,657,359
RAC: 40,426

Keith Myers wrote:

Not when they impact all the other users who don't want to run beta applications.

The O3MD* work generators are running unthrottled and producing more than enough work for the few users who want to run beta tasks.

But they have overloaded the RTS buffers, and everybody else running Gamma-ray and BRP4/7 work is getting nothing when requesting work, even though there is plenty of it in the Ready to Send categories.

The beta work is swamping the download servers and schedulers and preventing all the other work from being sent out.

I am down over a thousand tasks on my 3-card hosts from my set cache levels and continuing to fall without replenishment. I will be out of work in just 8 hours.

Seems like the server isn't too bright (although this is BOINC...).

You would think that if there is A, B, C, D and E needing to be done, the scheduler would take an even amount of each: a separate queue for each, users take from whichever queue(s) they want, and when a queue runs low it takes more from the relevant generator. It would be monumentally stupid to just allow one generator to fill the scheduler up with one type of task. Milkyway, for example, has 10,000 tasks queued for Separation and 1,000 for N-body; one doesn't swamp out the other. What's gone wrong here?
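What's being described is essentially round-robin refilling of the send buffer from per-app queues. This is not BOINC's actual feeder code, just a toy Python sketch of the idea, with illustrative app names and queue sizes:

    from collections import deque

    BUFFER_SIZE = 100  # shared "ready to send" buffer

    def refill(buffer, queues):
        """Draw one task from each app queue in turn until the buffer is full."""
        while len(buffer) < BUFFER_SIZE:
            progress = False
            for app, q in queues.items():
                if q and len(buffer) < BUFFER_SIZE:
                    buffer.append((app, q.popleft()))
                    progress = True
            if not progress:  # every queue is empty
                break

    queues = {
        "O3MD1": deque(range(1_000_000)),  # runaway generator
        "FGRPB1G": deque(range(10_000)),   # Gamma-ray
        "BRP4": deque(range(10_000)),
    }
    buffer = []
    refill(buffer, queues)
    print({app: sum(1 for a, _ in buffer if a == app) for app in queues})
    # -> {'O3MD1': 34, 'FGRPB1G': 33, 'BRP4': 33}: the oversized queue
    #    cannot crowd the others out of the buffer.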

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 518,657,359
RAC: 40,426

Elphidieus wrote:

Beta Settings: Run Test Applications = Yes, as long as they are NATIVE Arm apps, neither Intel nor legacy apps...

Allow non-preferred apps = Already No...

 

Looks like I have to turn Beta Settings OFF then... sad...


Thanks archae86...

Something is up if you allow beta and it gives you a beta of an app you haven't selected.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Keith Myers
Joined: 11 Feb 11
Posts: 4,895
Credit: 18,436,732,493
RAC: 5,700,752

Milkyway had the same issue with N-body tasks swamping the download server buffers.  Nobody was getting any Separation work even though there was plenty in the RTS buffers.

The RTS category is not the same thing as the download buffer. If projects are configured the way the Seti servers were, the download buffer holds 100 tasks. That is all. When you hit the scheduler with a work request, the scheduler fills it out of that download server buffer of exactly 100 tasks.

When it gets emptied, it refills from all the Ready to Send sub-project caches. If a fast host empties it right before your scheduler connection is serviced, the buffer is empty and you get the "no tasks to send" message.

When the Ready to Send cache of a single sub-project is 10x-100x the size of the other sub-projects' caches, the download buffer will be swamped and filled entirely from that unthrottled, oversized cache, and there will not be a single task of any other type in that 100-task buffer.

So you get the same message from the scheduler... no work to send. The end result is that the one sub-project, in our case the new O3MD* work, completely excluded all other sub-project work from being available.
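A toy model of that failure mode, assuming for illustration (not verified against the real feeder) that the 100-slot buffer refills in proportion to the Ready to Send cache sizes:

    import random

    # Ready-to-Send cache sizes; one sub-project is ~100x oversized
    rts_sizes = {"O3MD1": 1_000_000, "FGRPB1G": 10_000, "BRP4": 10_000}

    # fill the 100-slot buffer in proportion to cache size
    buffer = random.choices(
        list(rts_sizes.keys()),
        weights=list(rts_sizes.values()),
        k=100,
    )
    print({app: buffer.count(app) for app in rts_sizes})
    # Typical result: {'O3MD1': 98, 'FGRPB1G': 1, 'BRP4': 1} --
    # a host asking only for Gamma-ray or BRP4 work usually gets
    # "no tasks to send", exactly as described above.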

 
