Are GW work units getting longer?

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 11,802
Topic 225039

I may have something wrong with my machine, but my 2.09 Gravitational Wave search O2 Multi-Directional GPU (GW-opencl-ati) work units are now running over an hour on my RX 570 (Ubuntu 20.04.2), supported by a core of a Ryzen 3600, whereas yesterday they were running about 14 minutes.

I have not seen that much variation before.  Is it normal?  (All the frequencies are the same, at 681.xx Hz.)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,495
Credit: 66,049,864,124
RAC: 54,786,916

Jim1348 wrote:
I have not seen that much variation before.  Is it normal?

That much variation is not normal.

I checked your tasks list and saw a group of tasks with a DF of 0.65 (issue numbers from _680 down to _672).  I would regard the crunch times of around 900-1000s to be fairly 'normal' for your GPU and that DF, provided you are crunching tasks as singles with good CPU support.

Two of the group had that value whilst others ranged up to 5000s.  To me this looks suspiciously like lack of CPU support.  This group should all have much the same memory requirements so it shouldn't be lack of VRAM.  I have an 8GB RX 570 running x3 in around 1700s for that same DF.
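
When comparing a card running tasks as singles with one running several at once, the useful figure is elapsed time divided by the number of concurrent tasks. A minimal Python sketch of that arithmetic, using the figures quoted above (the function name is only for illustration, and the 950s single-task figure is an assumed midpoint of the 900-1000s range):

```python
def effective_seconds_per_task(elapsed_seconds: float, concurrent_tasks: int) -> float:
    """Average wall-clock seconds per task when several tasks share one GPU."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return elapsed_seconds / concurrent_tasks


# Figures from this thread: x3 in about 1700s, versus singles at 900-1000s.
x3_rate = effective_seconds_per_task(1700, 3)
single_rate = effective_seconds_per_task(950, 1)  # assumed midpoint of 900-1000s
print(f"x3: about {x3_rate:.0f}s per task; single: about {single_rate:.0f}s per task")
```

On those numbers, running x3 works out to roughly 567s per task, which is why a single task taking up to 5000s stands out so sharply.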

Your CPU is 6C/12T.  How many CPU tasks are you running?  Have you tried one or two less to see if that returns the GPU crunch time to more normal values?

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 11,802

Gary Roberts wrote:
Your CPU is 6C/12T.  How many CPU tasks are you running?  Have you tried one or two less to see if that returns the GPU crunch time to more normal values?

I use an app_config.xml to reserve one core for the GPU.  It has not changed since I started crunching with this card four days ago, and in fact I have used it for years.

I would include it here, but I can't seem to copy/paste to this forum.

So I am running only 11 CPU tasks.  But since around midnight UTC when the problem started, they have all been Rosetta.  Before that, they were LHC (CMS and native ATLAS).  So it appears that Rosetta does not get along with the GW here.  I had not seen that before.

Thanks for looking into it.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,495
Credit: 66,049,864,124
RAC: 54,786,916

Jim1348 wrote:
I use an app_config.xml to reserve one core for the GPU.  It has not changed since I started crunching with this card four days ago, and in fact I have used it for years.

That's fine, as long as you take into account the fact that you only have 6 physical cores.  You are sharing a core between a Rosetta task and the GPU support function.  The GW GPU app uses CPU support quite heavily and it seems quite sensitive to even small delays in getting the support immediately it's requested.

The other factor might be the PCIe bandwidth demands of 11 Rosetta tasks.  I seem to recall reading somewhere that others have seen slowdowns when a lot of these are running.  There might be a 'sweet spot' somewhere less than 11.
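
To make the core-sharing point concrete, here is a rough Python sketch of the thread budget on a 6C/12T CPU. It assumes one CPU support thread per GPU task, which is a simplification; the function name is hypothetical.

```python
def idle_threads(logical_threads: int, cpu_tasks: int, gpu_tasks: int = 1) -> int:
    """Hardware threads left over after CPU tasks and one support thread per GPU task."""
    return logical_threads - cpu_tasks - gpu_tasks


# 12 logical threads (6C/12T), one GPU task, varying numbers of CPU tasks.
for cpu_tasks in (11, 10, 9):
    print(cpu_tasks, "CPU tasks ->", idle_threads(12, cpu_tasks), "idle thread(s)")
```

With 11 CPU tasks there is no idle thread at all, so the GPU support thread always shares a physical core with a busy CPU task. Dropping to 10 or 9 leaves some slack for it.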

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 11,802

By reserving two cores and running only 10 Rosetta, I am getting GW times of under 12 minutes.  So that explains it.
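
The thread does not show how the two cores were reserved. One way BOINC allows this is the cpu_usage setting under gpu_versions in app_config.xml, which tells the client how many CPUs to budget for each GPU task. A hedged Python sketch that builds the contents of such a file follows; the app name einstein_O2MD1 and the values are assumptions for illustration, not the poster's actual configuration.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, gpu_usage: float, cpu_usage: float) -> str:
    """Return app_config.xml text that budgets cpu_usage CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# "einstein_O2MD1" is an assumed app name; check the real one in client_state.xml.
xml_text = build_app_config("einstein_O2MD1", gpu_usage=1.0, cpu_usage=2.0)
print(xml_text)
```

Note that cpu_usage only changes how many CPU tasks the client schedules alongside the GPU task; it does not change how much CPU the GPU app itself uses.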

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,495
Credit: 66,049,864,124
RAC: 54,786,916

Thanks for reporting back.  That's quite a startling result!

I suspect there may be others suffering from a similar 'penalty' if just a single thread is used for GPU support.

Cheers,
Gary.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 11,802

Gary Roberts wrote:
I suspect there may be others suffering from a similar 'penalty' if just a single thread is used for GPU support.

I am surprised too.  It depends a lot on what other projects you are running.  Rosetta just seems to be a hard case.  I guess you have to check whenever you are setting it up.  Thanks for your input; it put me on the right track.

DanNeely
Joined: 4 Sep 05
Posts: 1,358
Credit: 2,360,431,169
RAC: 3,123,966

Jim1348 wrote:

Gary Roberts wrote:
I suspect there may be others suffering from a similar 'penalty' if just a single thread is used for GPU support.

I am surprised too.  It depends a lot on what other projects you are running.  Rosetta just seems to be a hard case.  I guess you have to check whenever you are setting it up.  Thanks for your input; it put me on the right track.

It also depends on what else the system is doing.  My systems are all 4/8 cores; the ones running just E@H run 7 CPU tasks at a time.  My main system only runs 5 or 6 depending on what else I've got going on at the time.

Jim1348
Joined: 19 Jan 06
Posts: 449
Credit: 208,736,569
RAC: 11,802

It is not just GW either, though they take a lot of CPU time.  But on the new WCG/OPNG (beta) work units, the times on the RX 570 were longer than I saw on other comparable cards when I was running Rosetta.  So I think any OpenCL work should be checked for adequate CPU support.

Werinbert
Joined: 31 Dec 12
Posts: 8
Credit: 22,023,113
RAC: 38,499

MLC@home is another challenging project. Earlier today, on my CPU I was running a PrimeGrid WW (-mt4) and two MLC@home tasks, leaving two threads free for two GW GPU tasks. The WW task finished and my computer filled those 4 spaces with additional MLC tasks, making it 6 concurrent tasks. The two GW tasks immediately took a nose dive. Grabbing another WW task fixed the problem.

Bill
Joined: 2 Jun 17
Posts: 31
Credit: 102,255,650
RAC: 56,204

This may not be exactly related, but I'm having similar concerns about the GW CPU tasks.

I just upgraded to a 2700 (non-X), and the GW 2.09 CPU tasks are taking about a day to complete on Windows 10.  I'm running E@H on a really old laptop, and those tasks are taking about 7 hours...but they are using the GW 2.08 application.  The old laptop, when it was able to, ran all cores, but the memory requirements of these tasks have limited how many tasks run at once.  For the 2700, I have been experimenting with only using 14-15 threads for BOINC (two of those threads are supporting GPU tasks).

Something seems off, but I don't know what.  CPU is keeping very cool, and I've got a steady clock of 3.7 GHz.

Edit:  I forgot to check: I was running the 2.09 GWNew application on the old CPU of the desktop (it was a 2200G), and it was completing those tasks in four hours!  Okay...maybe I'm undervolting this CPU too much...I'm at 1.15625 V...more investigation needed...
