I turned off sigs 15 years ago. Forum response times are now just slow, rather than completely woeful.
Even when people properly describe their hardware in a sig (and leave out all the other crap), I usually find that the host's 'details' page plus the information from the tasks list on the website gives a much quicker and easier route to problem resolution.
When reporting a problem, one of the most useful things to give is a direct link to the details page for the host in question. That page then has a direct link to the tasks. That will work even if hosts are 'hidden'.
This memory issue is weird. While investigating I suspended the GW search for now.
BM
In case anyone is wondering about Bernd's message and what it's referring to, it's certainly not about the comments that immediately preceded it.
As regular readers would know, there have been ongoing issues with some GPUs being unable to handle the memory requirements of some tasks in the current VelaJr series. Back in April, Bernd announced that changes had been made to the scheduler to prevent unsuitable GPUs from receiving tasks for which they had insufficient memory.
Recently, I noticed a bit of a spike in resend tasks on my test machine. There was a batch of tasks with consecutive issue numbers, all of which would need close to the maximum amount of memory to process. On checking, I found they had originally been assigned to a host with a 2GB GTX 650Ti GPU. That host had around 100 failed tasks interspersed with a small number of successful GRP tasks, which were probably what was partially restoring the daily limit and allowing the host to continue receiving more GW tasks.
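The daily-limit behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the actual BOINC server logic (the real rules, constants, and names differ); it only illustrates how occasional successes can keep a mostly-failing host supplied with work.

```python
# Simplified model of a BOINC-style per-host daily task quota.
# Assumption: failures shrink the allowance and successes restore it.
# The real BOINC server logic differs in detail.

MAX_QUOTA = 100  # hypothetical configured daily maximum
MIN_QUOTA = 1


class HostQuota:
    def __init__(self, quota=MAX_QUOTA):
        self.quota = quota

    def task_failed(self):
        # Each failure shrinks the host's allowance.
        self.quota = max(MIN_QUOTA, self.quota // 2)

    def task_succeeded(self):
        # Each success restores some allowance, so a host that fails
        # most GW tasks but completes the occasional GRP task keeps
        # receiving new work instead of being throttled to nothing.
        self.quota = min(MAX_QUOTA, self.quota * 2)


host = HostQuota()
for _ in range(6):
    host.task_failed()
print(host.quota)   # allowance collapses after repeated failures
host.task_succeeded()
print(host.quota)   # a single success partially restores the limit
```

Under this toy model a run of failures drives the quota to its floor, and even one success doubles it again, which matches the pattern of a host that keeps getting GW tasks despite failing most of them.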
I sent Bernd a PM, pointing him towards that host and asking if he might look into why the scheduler was sending high memory tasks to such a lowly GPU. I guess the above message is in response to the PM and not other messages in this thread.
Hopefully the reason the scheduler is not behaving as intended will be found quickly, and we can get back to processing these tasks soon.
Thanks and good luck to Bernd as he sorts this out!
Cheers,
Gary.
Turns out that there are occasions where our estimates of required memory are way off (i.e. they underestimate by almost a factor of two), so these tasks are sent to GPUs which certainly can't handle them. We'll fix the memory estimates, then continue the analysis.
BM
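A feasibility check of the kind Bernd describes can be sketched as below. This is an illustration, not Einstein@Home's scheduler code: the function name, the 2x correction factor, and the headroom fraction are all assumptions.

```python
# Sketch of a scheduler-side check before assigning a task to a GPU.
# Assumptions (not Einstein@Home code): the estimate is known to run
# low by up to 2x, and only ~90% of GPU memory is usable by the task.

def gpu_can_run(task_mem_estimate_mb: float,
                gpu_mem_mb: float,
                correction: float = 2.0,
                headroom: float = 0.9) -> bool:
    """Return True if the GPU should be able to hold the task.

    correction: factor applied to a known-low memory estimate
    headroom:   fraction of GPU memory actually usable for the task
    """
    return task_mem_estimate_mb * correction <= gpu_mem_mb * headroom


# A task estimated at 1200 MB that really needs ~2400 MB,
# offered to a 2 GB (2048 MB) card:
print(gpu_can_run(1200, 2048, correction=1.0))  # True: sent anyway
print(gpu_can_run(1200, 2048, correction=2.0))  # False: correctly refused
```

With the uncorrected estimate the 2 GB card looks fine and gets the task; applying the factor-of-two correction makes the same card fail the check, which is the behaviour the fixed estimates should produce.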
To be fair, I posted about this issue over two months ago, with specifics about exactly what is wrong and why it's happening; it's not newly discovered. See the second post on this page: https://einsteinathome.org/content/discussion-thread-continuous-gw-search-known-o2md1-now-o2mdf-gpus-only?page=26, and my reference to it on the previous page of this very thread, in my comment dated May 6th.
Links to individual comments no longer seem to work on this forum; most of the time such a link takes you to the wrong comment.
Gary Roberts wrote: I turned off sigs 15 years ago.
Lol, ok, will do :) (if I remember!)
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H.
Main rig - Ryzen 5 3600, 32GB DDR4 3200, RTX 3060Ti 8GB, Win10 64bit
2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win 7 64bit
We will probably generate the last workunits of the current "sub-run" ("O2MDFV2i") tomorrow. After that there will still be a few thousand workunits left to be finished, but for some time no new GW work will be generated until we finish preparing the next run. There should be plenty of work for the Gamma-Ray pulsar search in the meantime.
BM
Will the next run be both CPU and GPU, or just GPU only again?
That is still to be decided. If it's possible with reasonable effort, we'll run part of it on the CPUs.
BM
Can you give a shout when your estimates are good enough for me to retry my 2GB Nvidia?
That won't be until the project once again distributes GW work. Watch for announcements in the News forum.