Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

archae86 wrote:

We have had a situation here at Einstein for some time in which modern AMD cards over-perform modern Nvidia cards on the Gamma-Ray Pulsar tasks, relative to what widely published competitive reviews would suggest about their capabilities.

Now we seem to have a developing situation in which modern Nvidia cards over-perform relative to modern AMD cards on the current (and a previous few) GW tasks.

If this persists, one might speculate that it would be inviting to equip a PC with one each of suitable AMD and Nvidia cards, directing the GW tasks to the green side and the GRP tasks to the red side.

Is this easy to do? Hard to do but possible, or out of the question?

And, yes, I suspect that for people with large fleets it probably would make much more sense to build dedicated boxes of flavor GW and flavor GRP.  But there are plenty of us with one to four boxes who might find such a setup interesting.

 

I have done just that with the two boxes I am running. My DL360/NVIDIA GT1030 is running all GW work. My XW4600/AMD RX560 is doing Gamma Ray GPU only.

I have 11 tasks (O2MD) completed but still waiting for validation.

Clear skies,
Matt
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057654931
RAC: 1600080

Zalster wrote:
It is possible to prevent certain work types from running on specific GPUs by use of the exclude option in a cc_config.xml, specifying the GPU type and the application.  I know I did it a long time ago for this project; I would need to brush up on it.

Relying on your tip, I came across this representation of the basic structure:

<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>NVIDIA|ATI|intel_gpu</type>]
   [<app>appname</app>]
</exclude_gpu>

That implies finding suitable values, in the right notation, for all of:

project_URL
device_num  (for each GPU of interest in the current box)
appname

But otherwise it looks pretty simple.  Assuming this mechanism is actually functional in current and recent versions of BOINC, it seems my speculated machine configuration is pretty feasible.
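
Sketching that out (the application names below are my guesses only, and both they and the exact project URL should be copied from client_state.xml or the project's applications page), a cc_config.xml for the mixed-card idea might look roughly like this:

<cc_config>
   <options>
      <!-- keep the Gamma-Ray Pulsar GPU app off the Nvidia card -->
      <exclude_gpu>
         <url>http://einstein.phys.uwm.edu/</url>
         <type>NVIDIA</type>
         <app>hsgamma_FGRPB1G</app>   <!-- placeholder GRP app name; verify before use -->
      </exclude_gpu>
      <!-- keep the GW GPU app off the AMD card -->
      <exclude_gpu>
         <url>http://einstein.phys.uwm.edu/</url>
         <type>ATI</type>
         <app>einstein_O2MD1</app>    <!-- placeholder GW app name; verify before use -->
      </exclude_gpu>
   </options>
</cc_config>

The client would then need a restart, or at least a re-read of the config files from the Manager, for the exclusions to take effect.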

Whether it is a good idea is another matter.  Probably as a public service I should try a few of the v2.01 GW tasks on my Radeon VII, which would be a more appropriate comparison to a 2080 Super than some others.

I did try the Radeon VII on the GW on offer about a month ago.  For that the Radeon VII performance was pathetic, but, I think, mostly because it sat around waiting for the CPU to do an excessive share of the work.  The last few GW executables seem able to keep my RX 570 pretty well occupied, so the extreme excess CPU requirement has gotten better.  Maybe a Radeon VII can do something.

Of course, now the problem is that the work units are coming in multiple effort-level sizes, and other than looking at the claimed flops work content in task properties, I don't know that we have yet learned how to tell them apart--which makes comparing host effectiveness rather a problem.
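
As an aside on reading those sizes out more conveniently: the figure shown in task properties appears to come from the <rsc_fpops_est> field of the matching <workunit> block in client_state.xml, so the sizes of all tasks on a host can be compared in one place.  An illustrative block (the name, app name, and value here are placeholders, not taken from a real task) looks roughly like:

<workunit>
   <name>h1_0123.45_O2C02Cl1In0__O2MD1Gn_G12345_123.45Hz_1</name>   <!-- placeholder task name -->
   <app_name>einstein_O2MD1</app_name>                              <!-- placeholder app name -->
   <rsc_fpops_est>144000000000000.000000</rsc_fpops_est>            <!-- 1.44e14 flops = 144,000 Gflops -->
   ...
</workunit>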

Thanks
Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

archae86 wrote:
Of course, now the problem is that the work units are coming in multiple effort-level sizes, and other than looking at the claimed flops work content in task properties, I don't know that we have yet learned how to tell them apart--which makes comparing host effectiveness rather a problem.

I wrote some time ago about run times appearing to depend on the frequencies, but I think what you just said is a much more important thing to take into account. Frequencies and run times don't have much correlation after all. I still think the very first tasks from a batch (with very low frequencies, 20 Hz or so) did run very fast, but that's about it. Tasks with higher frequencies can have plenty of variance, and it comes from the FLOPS: two tasks with exactly the same frequency can take, for example, 5 or 20 minutes to run.

cecht
Joined: 7 Mar 18
Posts: 1432
Credit: 2468175260
RAC: 752545

For the v2.01 app running 4x on RX 570s in a Linux host, I have 116 valid tasks with no invalids or errors.  Nearly all the valids gave 1000 credits, averaging ~6.1 minutes per task run time over the most recent runs today, but there are a handful with very short run times that gave 120 or 130 credits.

With two RX 570s running at 4x, CPU usage averages ~50% on my 4-thread Pentium G5600.

EDIT: By CPU usage I mean the activity while running tasks, as reported by System Monitor. The proportional CPU task completion time reported by E@H results is close to 25%.
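
For anyone wanting to reproduce the 4x setting, the usual route is an app_config.xml in the project's directory.  A sketch follows, where the app name is a placeholder to be replaced by whatever client_state.xml lists for the GW GPU app:

<app_config>
   <app>
      <name>einstein_O2MD1</name>      <!-- placeholder; copy the real name from client_state.xml -->
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>   <!-- 0.25 GPU per task = 4 concurrent tasks per GPU -->
         <cpu_usage>0.5</cpu_usage>    <!-- CPU budgeted per task; adjust to what the tasks actually use -->
      </gpu_versions>
   </app>
</app_config>

The client picks the file up after a restart or a re-read of the config files.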

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057654931
RAC: 1600080

I tried running a few V2.01 O2MD1Gn GW tasks on my Radeon VII Windows host.  I had some difficulty a couple of weeks ago, but for the last two days the system was stably running 1X GRP tasks with elapsed time tightly clustered around 3:35.

All seven of the 2.01 GW tasks I downloaded were stated in their properties to have 144,000 Gflops of work content.  I ran them at 1X, with elapsed times somewhat variable but averaging about 7:52.

At first glance, the current application seemed able to keep the Radeon VII reasonably busy, and indeed GPU-Z indicated GPU load in the low 80s except for the beginning/end gap.  However, the temperatures dropped drastically: the measurement GPU-Z names GPU Temperature averaged 56C, and the one named GPU Temperature (Hot Spot) averaged 63.1C.  Also, reported power consumption for the GPU card was down by more than a factor of two.  The reported GPU clock was rapidly time-varying and averaged only 720 MHz.

Whether the fault is in my sample of the card, the card design, the things which control it, the application, or the data, I can't say, but in fact my Radeon VII was loafing while running this work.  Another possibility is that my CPU is not fast enough, as BOINC reported 96% CPU.  Quite likely I'd get a considerable productivity gain from running at higher multiplicity, at least as far as 3X (my CPU has only four cores, non-HT).

By contrast, my RX 570 running this version of the application kept usefully busy at 1X, and at 3X kept GPU temperature and load up about as high as the same system running current GRP work at 2X.

As my system has been a bit fragile in recent weeks, I'm not interested in further GW exploration on the Radeon VII.  I hope other owners will take a look.

 

cecht
Joined: 7 Mar 18
Posts: 1432
Credit: 2468175260
RAC: 752545

After downloading 1056 v2.10 GW tasks and completing 1055 of them, one of my hosts stopped getting new v2.10 work yesterday (but is now keeping busy with FGRPB1G tasks).  My other, slower, host is still getting v2.10 downloads.

The last task that came in before the well ran dry is listed on my results page as "In progress", but I cannot find it in BOINC Manager or client_state.xml.  So, did I hit some quota for v2.10 beta runs or did that last "In progress" ghost task somehow gum up the works?

Ghost task info:

Task 889189433
Name: h1_0218.40_O2C02Cl1In0__O2MD1Gn_G34731_218.50Hz_16_1
Workunit ID: 422419094
Created: 12 Oct 2019 11:04:39 UTC
Sent: 12 Oct 2019 21:05:04 UTC
Report deadline: 26 Oct 2019 21:05:04 UTC
Received: 1 Jan 1970 0:00:00 UTC
Server state: In progress

Ideas are not fixed, nor should they be; we live in model-dependent reality.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454554471
RAC: 3561

cecht wrote:

After downloading 1056 v2.10 GW tasks and completing 1055 of them, one of my hosts stopped getting new v2.10 work yesterday (but is now keeping busy with FGRPB1G tasks).  My other, slower, host is still getting v2.10 downloads.

The last task that came in before the well ran dry is listed on my results page as "In progress", but I cannot find it in BOINC Manager or client_state.xml.  So, did I hit some quota for v2.10 beta runs or did that last "In progress" ghost task somehow gum up the works?

Ghost task info:

Task 889189433
Name: h1_0218.40_O2C02Cl1In0__O2MD1Gn_G34731_218.50Hz_16_1
Workunit ID: 422419094
Created: 12 Oct 2019 11:04:39 UTC
Sent: 12 Oct 2019 21:05:04 UTC
Report deadline: 26 Oct 2019 21:05:04 UTC
Received: 1 Jan 1970 0:00:00 UTC
Server state: In progress

My 3 PCs have received around 10 v2.01 tasks, all sent today, the 13th.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

There is an issue with the server. It's telling me I have 228 lost work units but won't resend them.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

Was the CPU version of 2.00 (GWnew) set as MT? I had only a single task running out of 8 threads, with 32 GB of RAM, and I was not out of memory. Stopping them allowed 8 other non-E@H tasks to run. GWold did not act this way.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

mmonnin wrote:
Was the CPU version of 2.00 (GWnew) set as MT? I had only a single task running out of 8 threads, with 32 GB of RAM, and I was not out of memory. Stopping them allowed 8 other non-E@H tasks to run. GWold did not act this way.

I have 12 instances of this job running on my DL360 with 36 GB of RAM, along with 3 v2.01 GPU tasks, all running without issue.
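
If you want to confirm on your end, the matching <app_version> block in client_state.xml should show how many CPUs each GWnew task is budgeted for via <avg_ncpus>.  If it really is budgeted as multi-threaded, an app_config.xml override along these lines (the app and plan-class names here are placeholders, to be copied from client_state.xml) tells BOINC to budget fewer threads per task, though whether the science app itself actually spawns fewer threads is another question:

<app_config>
   <app_version>
      <app_name>einstein_O2MD1</app_name>   <!-- placeholder; copy the real name from client_state.xml -->
      <plan_class>GWnew</plan_class>        <!-- placeholder plan class; copy from client_state.xml too -->
      <avg_ncpus>1</avg_ncpus>              <!-- budget one CPU thread per task -->
   </app_version>
</app_config>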

Clear skies,
Matt
