Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Bernd Machenschalk wrote:
The CPU part of the run ("O2MD1S3") is also running now.

Great! There's still a glitch on the flow though... Downloads are failing. Success rate 0/10. Permanent HTTP errors.

https://einsteinathome.org/task/1069674861

Tom M
Joined: 2 Feb 06
Posts: 5658
Credit: 7739053751
RAC: 2529644

Thank you for the update, Bernd.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245263696
RAC: 12623

Richie wrote:

Bernd Machenschalk wrote:
The CPU part of the run ("O2MD1S3") is also running now.

Great! There's still a glitch on the flow though... Downloads are failing. Success rate 0/10. Permanent HTTP errors.

https://einsteinathome.org/task/1069674861

Oh dear! Thanks for the note! Shouldn't take long to fix. Sending GW work is suspended until this is fixed.

BM

Tom M
Joined: 2 Feb 06
Posts: 5658
Credit: 7739053751
RAC: 2529644

Does anyone know which system currently has the highest performance on GW GPU tasks?

I am in the process of building a GW-only system and am wondering what kind of goal I should be aiming for.

Tom M

Tom M
Joined: 2 Feb 06
Posts: 5658
Credit: 7739053751
RAC: 2529644

I just got done bumping this system up to 6 GPUs with 2 threads (tasks) per GPU. I wanted to keep as many CPU threads as possible free for adding more GPU cards later, so I set the limit to 0.5 CPU per GPU thread.

The preliminary result was that each thread took twice as long as it had before, which means there was no gain from running two threads per GPU.
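The arithmetic behind "no gain" can be sketched in C: throughput per GPU is the number of concurrent tasks divided by the wall-clock time each task takes, so doubling both leaves it unchanged. The function name and the 600 s task time in the usage note are made up for illustration; they are not measured GW runtimes.

```c
#include <assert.h>

/* Tasks completed per hour on one GPU running `concurrent` tasks at once,
 * each taking `seconds_per_task` of wall-clock time (hypothetical helper). */
double tasks_per_hour(int concurrent, double seconds_per_task)
{
    assert(concurrent > 0 && seconds_per_task > 0.0);
    return concurrent * 3600.0 / seconds_per_task;
}
```

For example, tasks_per_hour(1, 600.0) and tasks_per_hour(2, 1200.0) both return 6.0: two tasks at half speed finish no more work than one task at full speed.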

I have since reset the CPU allocation to the original default (0.9 CPU per GPU thread). I thought I saw a massive speed-up in processing.

==edit===

It may be that I am now processing at the same speed as with a single thread per GPU.

====edit=deleted===

There is probably a hard limit on the number of GPU threads a single system can push on GW tasks: no more than one GPU task per CPU thread. That means you likely can't run 38 GPU threads at "full speed" on an 18-slot motherboard like the B360-F Pro, because the CPUs it takes are limited to 8 cores/16 threads (8c/16t).
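A minimal sketch of that cap, assuming (as the post does) that each GW GPU task needs one whole CPU thread to run at full speed. The function name is hypothetical; it simply takes the smaller of the requested GPU tasks and the available CPU threads.

```c
#include <assert.h>

/* Upper bound on GPU tasks that can run at full speed, assuming each GPU
 * task needs one dedicated CPU thread (rule of thumb, not a BOINC setting). */
int max_full_speed_gpu_tasks(int gpus, int tasks_per_gpu, int cpu_threads)
{
    assert(gpus >= 0 && tasks_per_gpu >= 0 && cpu_threads >= 0);
    int wanted = gpus * tasks_per_gpu;
    return wanted < cpu_threads ? wanted : cpu_threads;
}
```

With 18 GPUs at one task each on a 16-thread CPU, max_full_speed_gpu_tasks(18, 1, 16) returns 16, so two cards would be starved. A 6-GPU, 2-tasks-per-GPU setup on a hypothetical 16-thread CPU fits, returning 12.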

Tom M

Cherokee150
Joined: 13 May 11
Posts: 24
Credit: 812224014
RAC: 363485

Does anyone know why there are no new work units available for O2MDF?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34661366416
RAC: 28436388

As far as I know, they are working on the transition to O3ASE. O2MDF will go away.

Cherokee150
Joined: 13 May 11
Posts: 24
Credit: 812224014
RAC: 363485

Thank you.

I look forward to O3ASE.

I sincerely hope they program O3ASE to run on GPUs with only 2 GB of RAM.  I would like to be able to run them not only on my computer with an NVIDIA 1070, but also on my computer with an NVIDIA GeForce GTX 960.  There are still a lot of older GPUs out here with 2 GB of RAM that can process a lot of work units per day for Einstein. :-)

petri33
Joined: 4 Mar 20
Posts: 117
Credit: 3341045819
RAC: 0

Hi Bernd,

I looked in my .nv/ComputeCache and found some new OpenCL source code containing Xlal_BSGL code and SemiCoherent code:

<code>
/* ... this is just like it was before ... */
#ifndef PULSAR_MAX_DETECTORS
#define PULSAR_MAX_DETECTORS 2
#endif

/* ... and a couple of lines below is some new code, which has no effect
   because the macro is already defined by the guard above: */
#ifndef PULSAR_MAX_DETECTORS
#define PULSAR_MAX_DETECTORS 10
#endif
</code>

a) Is this the way it should be?

b) Another question: is there a known hard upper limit for the UINT4 numDetectors argument of CopyBSGLSetup()?
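To illustrate point a), here is a plain C11 reproduction of the two guards (a host-side sketch, not the actual kernel file). Because the first #ifndef already defines PULSAR_MAX_DETECTORS, the preprocessor skips the second block and the value stays 2. If the macro were passed as a build option instead (-DPULSAR_MAX_DETECTORS=<n>), both blocks would be skipped. The num_detectors_fits() helper is hypothetical: it only shows the kind of check one would expect if per-detector arrays are sized with this macro, and says nothing about what CopyBSGLSetup() actually does.

```c
/* First guard: PULSAR_MAX_DETECTORS is not yet defined, so it becomes 2. */
#ifndef PULSAR_MAX_DETECTORS
#define PULSAR_MAX_DETECTORS 2
#endif

/* Second guard: the macro already exists, so this block is skipped
 * and the value 10 never takes effect. */
#ifndef PULSAR_MAX_DETECTORS
#define PULSAR_MAX_DETECTORS 10
#endif

/* Compile-time confirmation that the first definition wins. */
_Static_assert(PULSAR_MAX_DETECTORS == 2, "first #ifndef guard wins");

/* Hypothetical bounds check: nonzero and no more detectors than the
 * arrays sized by PULSAR_MAX_DETECTORS can hold. */
int num_detectors_fits(unsigned int numDetectors)
{
    return numDetectors > 0u && numDetectors <= PULSAR_MAX_DETECTORS;
}
```

With this ordering, num_detectors_fits(2) is true and num_detectors_fits(3) is false. Swapping the two blocks, or removing the first one, would make the limit 10.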

petri33
Joined: 4 Mar 20
Posts: 117
Credit: 3341045819
RAC: 0

2021-05-03 10:54:25.9746 (205377) [normal]: Recalc FstatMethod used: 'DemodSSE'

O3ASE tasks spend about three minutes recalculating statistics on the CPU after the main analysis on the GPU has finished:

2021-05-03 10:57:57.5918 (205377) [normal]: Finished main analysis.
2021-05-03 10:57:57.5921 (205377) [normal]: Recalculating statistics for the final toplist...
2021-05-03 11:00:48.1217 (205377) [normal]: Finished recalculating toplist statistics.

O2MD did the same step much faster (about 50 seconds):

2021-03-18 22:42:18.3670 (214468) [normal]: Finished main analysis.
2021-03-18 22:42:18.3672 (214468) [normal]: Recalculating statistics for the final toplist...
2021-03-18 22:43:11.3893 (214468) [normal]: Finished recalculating toplist statistics.
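The durations can be checked directly from the log timestamps. This small C sketch parses the HH:MM:SS.ffff part of each line, assuming both stamps fall on the same day; the helper names are hypothetical. It gives about 170.5 s for the O3ASE recalculation and about 53.0 s for O2MD.

```c
#include <assert.h>
#include <stdio.h>

/* Convert "HH:MM:SS.ffff" to seconds since midnight; -1.0 if it doesn't parse. */
double clock_to_seconds(const char *hms)
{
    int h, m;
    double s;
    if (sscanf(hms, "%d:%d:%lf", &h, &m, &s) != 3)
        return -1.0;
    return h * 3600.0 + m * 60.0 + s;
}

/* Seconds between two timestamps taken on the same day. */
double elapsed_seconds(const char *start, const char *end)
{
    double a = clock_to_seconds(start);
    double b = clock_to_seconds(end);
    assert(a >= 0.0 && b >= a); /* same-day assumption: no midnight wrap */
    return b - a;
}
```

elapsed_seconds("10:57:57.5918", "11:00:48.1217") is about 170.53 s, and elapsed_seconds("22:42:18.3670", "22:43:11.3893") is about 53.02 s, so the O3ASE post-step takes roughly 3.2 times as long.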

Please do not take this too seriously. I'm just wondering:

Is this recalculation just a check that the GPU got things right? It says "recalculating". Is it necessary at all?

Meanwhile the GPU sits idle for three minutes. That looks bad: 20 s initial verification + 200 s of work + 180 s idle.
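Put as a utilization figure: if the 20 s verification and the 180 s recalculation both leave the GPU idle (an assumption based on the description above), the GPU is busy for only 200 of 400 s, i.e. 50% of the time. A one-function sketch with a hypothetical name:

```c
#include <assert.h>

/* Fraction of a task's wall-clock time during which the GPU is busy,
 * assuming the CPU pre- and post-steps leave the GPU idle. */
double gpu_busy_fraction(double cpu_before_s, double gpu_s, double cpu_after_s)
{
    double total = cpu_before_s + gpu_s + cpu_after_s;
    assert(total > 0.0);
    return gpu_s / total;
}
```

gpu_busy_fraction(20.0, 200.0, 180.0) returns 0.5. With the same 20 s and 200 s but an O2MD-like post-step of about 50 s, it would be roughly 0.74.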

I know I could run two at a time, but NVIDIA GPUs slow down a lot when doing that. Oh, how I miss the SETI mutex implementation that let one task start its GPU work as soon as the other had finished its GPU part. The pre- and post-steps on the CPU overlapped nicely with the other task's GPU work.
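The idea of such a mutex can be sketched with POSIX threads: each task does its CPU pre-step, holds a GPU lock only for its GPU phase, and releases it before its CPU post-step, so another task can use the GPU in the meantime. This is a generic illustration, not the SETI@home code; the work is simulated with busy loops, and because real BOINC tasks run as separate processes, an actual implementation would need an inter-process lock (for example a named semaphore or a lock file) rather than a pthread mutex. All names here are hypothetical.

```c
#include <pthread.h>

static pthread_mutex_t gpu_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int gpu_active = 0;     /* tasks currently inside the GPU phase */
static int max_gpu_active = 0; /* peak value of gpu_active */

/* Stand-in for real work; volatile keeps the loop from being optimized away. */
static void busy_work(unsigned long n)
{
    for (volatile unsigned long i = 0; i < n; i++) { }
}

static void *task(void *arg)
{
    (void)arg;
    busy_work(1000000UL);            /* CPU pre-step (setup, verification) */

    pthread_mutex_lock(&gpu_lock);   /* only one task drives the GPU */
    pthread_mutex_lock(&stats_lock);
    if (++gpu_active > max_gpu_active)
        max_gpu_active = gpu_active;
    pthread_mutex_unlock(&stats_lock);

    busy_work(1000000UL);            /* simulated GPU phase */

    pthread_mutex_lock(&stats_lock);
    --gpu_active;
    pthread_mutex_unlock(&stats_lock);
    pthread_mutex_unlock(&gpu_lock); /* hand the GPU to the next task */

    busy_work(1000000UL);            /* CPU post-step (toplist recalculation) */
    return NULL;
}

/* Run n tasks concurrently; returns the peak number of tasks seen in the
 * GPU phase at once (1 if the lock works), or -1 on bad n or thread errors. */
int run_tasks(int n)
{
    pthread_t threads[16];
    int created = 0;
    if (n < 1 || n > 16)
        return -1;
    for (; created < n; created++)
        if (pthread_create(&threads[created], NULL, task, NULL) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    return created == n ? max_gpu_active : -1;
}
```

run_tasks(4) should return 1: the lock never lets two tasks into the GPU phase at the same time, while their CPU pre- and post-steps remain free to overlap with another task's GPU phase.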

 

Petri33
