Gravitational Wave search O2 Multi-Directional ("O2MD1")

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250585392
RAC: 34471

This memory issue is weird. While I investigate, I have suspended the GW search for now.

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117695609147
RAC: 35069014

In case anyone is wondering about Bernd's message and what it's referring to, it's certainly not about the comments that immediately preceded it.

As regular readers would know, there have been ongoing issues with some GPUs being unable to handle the memory requirements of some tasks in the current VelaJr series.  Back in April, Bernd announced that changes had been made to the scheduler to prevent unsuitable GPUs from receiving tasks for which they had insufficient memory.

Recently, I noticed a bit of a spike in resend tasks on my test machine.  There were a bunch of consecutive issue numbers all of which would need close to the maximum amount of memory to process.  On checking, I found they had originally been assigned to a host that had a 2GB GTX 650Ti GPU.  It had around 100 failed tasks interspersed with a small number of successful GRP tasks which were probably what was partially restoring the daily limit and allowing the host to continue receiving more GW tasks.

I sent Bernd a PM, pointing him towards that host and asking if he might look into why the scheduler was sending high memory tasks to such a lowly GPU.  I guess the above message is in response to the PM and not other messages in this thread.

Hopefully, the reason the scheduler is apparently not behaving as intended will be found quickly and we can get back to processing these tasks quite soon.

Thanks and good luck to Bernd as he sorts this out!

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250585392
RAC: 34471

Turns out that there are occasions where our estimates of required memory are way off (i.e. they underestimate by almost a factor of two), so these tasks are sent to GPUs which certainly can't handle them. We'll fix the memory estimates, then continue the analysis.

BM
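
For illustration only, here is a minimal sketch of how an underestimate of this size can slip past a memory check. It is not the Einstein@Home scheduler code: the names `Task`, `Gpu` and `fits`, the `safety_factor` parameter, and all the numbers are hypothetical, chosen only to mirror the "almost a factor of two" case on a 2GB card.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class Task:
    name: str
    est_gpu_mem_bytes: int  # server-side estimate (hypothetical field)


@dataclass
class Gpu:
    model: str
    total_mem_bytes: int


def fits(task: Task, gpu: Gpu, safety_factor: float = 1.0) -> bool:
    """Return True if the (scaled) memory estimate fits on the GPU."""
    return task.est_gpu_mem_bytes * safety_factor <= gpu.total_mem_bytes


# Hypothetical numbers: the estimate says 1100 MiB, the task really needs 2100 MiB.
task = Task(name="example_gw_task", est_gpu_mem_bytes=1100 * MIB)
actual_need_bytes = 2100 * MIB
gpu = Gpu(model="2GB card", total_mem_bytes=2048 * MIB)

sent_with_raw_estimate = fits(task, gpu)                 # check passes, task is sent
really_fits = actual_need_bytes <= gpu.total_mem_bytes   # but the task cannot run
sent_with_margin = fits(task, gpu, safety_factor=2.0)    # a 2x margin would block it
```

With the raw estimate the check passes even though the task cannot run on the card; scaling the estimate by the observed error would keep it away from the 2GB GPU. Correcting the estimates themselves, as described above, is the cleaner fix, since a blanket margin would also exclude GPUs from tasks they can actually handle.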

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3958
Credit: 47039532642
RAC: 65116705

tbf, I posted about this issue over 2 months ago with specifics about exactly what is wrong and why it's happening; it's not newly discovered. second post on this page: https://einsteinathome.org/content/discussion-thread-continuous-gw-search-known-o2md1-now-o2mdf-gpus-only?page=26, and you can see my reference to it on the previous page in this very thread with my comment dated May 6th.

the link to direct comments seems to not work on this forum anymore. a link to direct comments takes you to the wrong comment most of the time.

_________________________________________________________________________

[TA]Assimilator1
Joined: 22 Jan 05
Posts: 12
Credit: 189942260
RAC: 8408

Gary Roberts wrote:

I turned off sigs 15 years ago.  Forum response times are now just slow, rather than completely woeful.

Even when people properly describe their hardware in a sig (and leave out all the other crap), I usually find that the host's 'details' page plus the information from the tasks list on the website gives a much quicker and easier route to problem resolution.

When reporting a problem, one of the most useful things to give is a direct link to the details page for the host in question.  That page then has a direct link to the tasks.  That will work even if hosts are 'hidden'.

Lol, ok, will do :) (if I remember!)

Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RTX 3060Ti 8GB, Win10 64bit

2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win 7 64bit

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250585392
RAC: 34471

We will probably generate the last workunits of the current "sub-run" ("O2MDFV2i") tomorrow. After that there will be some thousand workunits still left to be finished, but for some time, no new GW work will be generated until we have finished preparing the next run. There should be plenty of work for the Gamma-Ray pulsar search in the meantime.

BM

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Bernd Machenschalk wrote:

We will probably generate the last workunits of the current "sub-run" ("O2MDFV2i") tomorrow. After that there will be some thousand workunits still left to be finished, but for some time, no new GW work will be generated until we have finished preparing the next run. There should be plenty of work for the Gamma-Ray pulsar search in the meantime.

Will the next run be both CPU and GPU, or just GPU only again?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250585392
RAC: 34471

DanNeely wrote:

Will the next run be both CPU and GPU, or just GPU only again?

That is still to be decided. If it is possible with reasonable effort, we'll run part of it on the CPUs.

BM

JohnMD
Joined: 11 May 12
Posts: 5
Credit: 26039195
RAC: 0

Bernd Machenschalk wrote:

Turns out that there are occasions where our estimates of required memory are way off (i.e. they underestimate by almost a factor of two), so these tasks are sent to GPUs which certainly can't handle them. We'll fix the memory estimates, then continue the analysis.

Can you give a shout when your estimates are good enough for me to retry my 2GB Nvidia GPU?

Keith Myers
Joined: 11 Feb 11
Posts: 4964
Credit: 18751809126
RAC: 7105815

That won't be until the project once again distributes GW work. Watch for announcements in the News forum.
