Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

archae86 wrote:

We have had a situation here at Einstein for some time in which modern AMD cards over-perform modern Nvidia cards on the Gamma-Ray Pulsar tasks, relative to what widely published competitive reviews would suggest about their capabilities.

Now we seem to have a developing situation in which modern Nvidia cards over-perform relative to modern AMD cards on the current (and a previous few) GW tasks.

If this persists, one might speculate that it would be inviting to equip a PC with one each of suitable AMD and Nvidia cards, directing the GW tasks to the green side and the GRP tasks to the red side.

Is this easy to do? Hard to do but possible, or out of the question?

And, yes, I suspect that for people with large fleets it probably would make much more sense to build dedicated boxes of flavor GW and flavor GRP.  But there are plenty of us with one to four boxes who might find such a setup interesting.

 

I have done just that with the two boxes I am running. My DL360/NVIDIA GT1030 is running all GW work. My XW4600/AMD RX560 is doing Gamma Ray GPU only.

I have 11 tasks (O2MD) completed but still waiting for validation.

Clear skies,
Matt
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057654931
RAC: 1600080

Zalster wrote:
It is possible to prevent certain work types from running on specific GPUs by use of the exclude option in a cc_config.xml, specifying the GPU type and the application.  I know I did it a long time ago for this project; I would need to brush up on it.

Relying on your tip, I came across this representation of the basic structure:

<exclude_gpu>
   <url>project_URL</url>
   [<device_num>N</device_num>]
   [<type>NVIDIA|ATI|intel_gpu</type>]
   [<app>appname</app>]
</exclude_gpu>

That implies finding suitable values, in the right notation, for all of:

project_URL
device_num  (for each GPU of interest in the current box)
appname

But otherwise it looks pretty simple.  Assuming this mechanism is actually functional in current and recent versions of BOINC, it seems my speculated machine configuration is pretty feasible.
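
Sketching that out (the application names below are my guesses only, and both they and the exact project URL should be copied from client_state.xml or the project's applications page), a cc_config.xml for the mixed-card idea might look roughly like this:

<cc_config>
   <options>
      <!-- keep the Gamma-Ray Pulsar GPU app off the Nvidia card -->
      <exclude_gpu>
         <url>http://einstein.phys.uwm.edu/</url>
         <type>NVIDIA</type>
         <app>hsgamma_FGRPB1G</app>   <!-- placeholder GRP app name; verify before use -->
      </exclude_gpu>
      <!-- keep the GW GPU app off the AMD card -->
      <exclude_gpu>
         <url>http://einstein.phys.uwm.edu/</url>
         <type>ATI</type>
         <app>einstein_O2MD1</app>    <!-- placeholder GW app name; verify before use -->
      </exclude_gpu>
   </options>
</cc_config>

The client would then need a restart, or at least a re-read of the config files from the Manager, for the exclusions to take effect.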

Whether it is a good idea is another matter.  Probably as a public service I should try a few of the v2.01 GW tasks on my Radeon VII, which would be a more appropriate comparison to a 2080 Super than some others.

I did try the Radeon VII on the GW on offer about a month ago.  For that the Radeon VII performance was pathetic, but, I think, mostly because it sat around waiting for the CPU to do an excessive share of the work.  The last few GW executables seem able to keep my RX 570 pretty well occupied, so the extreme excess CPU requirement has gotten better.  Maybe a Radeon VII can do something.

Of course, now the problem is that the work units are coming in multiple effort-level sizes, and other than looking at the claimed flops work content in task properties, I don't know that we have yet learned how to tell them apart--which makes comparing host effectiveness rather a problem.
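
As an aside on reading those sizes out more conveniently: the figure shown in task properties appears to come from the <rsc_fpops_est> field of the matching <workunit> block in client_state.xml, so the sizes of all tasks on a host can be compared in one place.  An illustrative block (the name, app name, and value here are placeholders, not taken from a real task) looks roughly like:

<workunit>
   <name>h1_0123.45_O2C02Cl1In0__O2MD1Gn_G12345_123.45Hz_1</name>   <!-- placeholder task name -->
   <app_name>einstein_O2MD1</app_name>                              <!-- placeholder app name -->
   <rsc_fpops_est>144000000000000.000000</rsc_fpops_est>            <!-- 1.44e14 flops = 144,000 Gflops -->
   ...
</workunit>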

Thanks
Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

archae86 wrote:
Of course, now the problem is that the work units are coming in multiple effort-level sizes, and other than looking at the claimed flops work content in task properties, I don't know that we have yet learned how to tell them apart--which makes comparing host effectiveness rather a problem.

I wrote some time ago about run times appearing to depend on the frequencies, but I think what you just said is a much more important thing to take into account. Frequencies and run times don't have much correlation after all. I still think the very first tasks from a batch (with very low frequencies, 20 Hz or so) did run very fast, but that's about it. Tasks with higher frequencies can have plenty of variance, and it comes from the FLOPS: two tasks with exactly the same frequency can take, for example, 5 or 20 minutes to run.

cecht
Joined: 7 Mar 18
Posts: 1432
Credit: 2468175260
RAC: 752545

For the v2.01 app running 4x on RX 570s in a Linux host, I have 116 valid tasks with no invalids or errors.  Nearly all the valids gave 1000 credits, averaging ~6.1 minutes per task run time over the most recent runs today, but there are a handful with very short run times that gave 120 or 130 credits.

With two RX 570s running at 4x, CPU usage averages ~50% on my 4-thread Pentium G5600.

EDIT: By CPU usage I mean the activity while running tasks, as reported by System Monitor. The proportional CPU task completion time reported by E@H results is close to 25%.
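
For anyone wanting to reproduce the 4x setting, the usual route is an app_config.xml in the project's directory.  A sketch follows, where the app name is a placeholder to be replaced by whatever client_state.xml lists for the GW GPU app:

<app_config>
   <app>
      <name>einstein_O2MD1</name>      <!-- placeholder; copy the real name from client_state.xml -->
      <gpu_versions>
         <gpu_usage>0.25</gpu_usage>   <!-- 0.25 GPU per task = 4 concurrent tasks per GPU -->
         <cpu_usage>0.5</cpu_usage>    <!-- CPU budgeted per task; adjust to what the tasks actually use -->
      </gpu_versions>
   </app>
</app_config>

The client picks the file up after a restart or a re-read of the config files.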

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7057654931
RAC: 1600080

I tried running a few V2.01 O2MD1Gn GW tasks on my Radeon VII Windows host.  I had some difficulty a couple of weeks ago, but for the last two days the system was stably running 1X GRP tasks with elapsed time tightly clustered around 3:35.

All seven of the 2.01 GW tasks I downloaded were stated in their properties to have 144,000 Gflops of work content.  I ran them at 1X, with elapsed times somewhat variable but averaging about 7:52.

At first glance, the current application seemed able to keep the Radeon VII reasonably busy, and indeed GPU-Z indicated GPU load in the low 80s except for the beginning/end gap.  However, the temperatures dropped drastically: the measurement GPU-Z names GPU Temperature averaged 56C, and the one named GPU Temperature (Hot Spot) averaged 63.1C.  Also, reported power consumption for the GPU card was down by more than a factor of two.  The reported GPU clock was rapidly time-varying and averaged only 720 MHz.

Whether the fault is in my sample of the card, the card design, the things which control it, the application, or the data, I can't say, but in fact my Radeon VII was loafing while running this work.  Another possibility is that my CPU is not fast enough, as BOINC reported 96% CPU.  Quite likely I'd get a considerable productivity gain from running at higher multiplicity, at least as far as 3X (my CPU has only four cores, non-HT).

By contrast, my RX 570 running this version of the application kept usefully busy at 1X, and at 3X kept GPU temperature and load up about as high as the same system running current GRP work at 2X.

As my system has been a bit fragile in recent weeks, I'm not interested in further GW exploration on the Radeon VII.  I hope other owners will take a look.

 

cecht
Joined: 7 Mar 18
Posts: 1432
Credit: 2468175260
RAC: 752545

After downloading 1056 v2.10 GW tasks and completing 1055 of them, one of my hosts stopped getting new v2.10 work yesterday (but is now keeping busy with FGRPB1G tasks).  My other, slower, host is still getting v2.10 downloads.

The last task that came in before the well ran dry is listed on my results page as "In progress", but I cannot find it in BOINC Manager or client_state.xml.  So, did I hit some quota for v2.10 beta runs or did that last "In progress" ghost task somehow gum up the works?

Ghost task info:

Task 889189433
Name: h1_0218.40_O2C02Cl1In0__O2MD1Gn_G34731_218.50Hz_16_1
Workunit ID: 422419094
Created: 12 Oct 2019 11:04:39 UTC
Sent: 12 Oct 2019 21:05:04 UTC
Report deadline: 26 Oct 2019 21:05:04 UTC
Received: 1 Jan 1970 0:00:00 UTC
Server state: In progress

Ideas are not fixed, nor should they be; we live in model-dependent reality.

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454554471
RAC: 3561

cecht wrote:

After downloading 1056 v2.10 GW tasks and completing 1055 of them, one of my hosts stopped getting new v2.10 work yesterday (but is now keeping busy with FGRPB1G tasks).  My other, slower, host is still getting v2.10 downloads.

The last task that came in before the well ran dry is listed on my results page as "In progress", but I cannot find it in BOINC Manager or client_state.xml.  So, did I hit some quota for v2.10 beta runs or did that last "In progress" ghost task somehow gum up the works?

Ghost task info:

Task 889189433
Name: h1_0218.40_O2C02Cl1In0__O2MD1Gn_G34731_218.50Hz_16_1
Workunit ID: 422419094
Created: 12 Oct 2019 11:04:39 UTC
Sent: 12 Oct 2019 21:05:04 UTC
Report deadline: 26 Oct 2019 21:05:04 UTC
Received: 1 Jan 1970 0:00:00 UTC
Server state: In progress

My 3 PCs have received around 10 v2.01 tasks, all sent today, the 13th.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

There is an issue with the server. It's telling me I have 228 lost work units but won't resend them.

mmonnin
Joined: 29 May 16
Posts: 291
Credit: 3232277016
RAC: 186736

Was the CPU version of 2.00 (GWnew) set as MT? I had only a single task running out of 8 threads, with 32 GB of RAM, and I was not out of memory. Stopping them allowed 8 other non-E@H tasks to run. GWold did not act this way.

Matt White
Joined: 9 Jul 19
Posts: 120
Credit: 280798376
RAC: 0

mmonnin wrote:
Was the CPU version of 2.00 (GWnew) set as MT? I had only a single task running out of 8 threads, with 32 GB of RAM, and I was not out of memory. Stopping them allowed 8 other non-E@H tasks to run. GWold did not act this way.

I have 12 instances of this job running on my DL360 with 36 GB of RAM, along with 3 v2.01 GPU tasks, all running without issue.
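
If you want to confirm on your end, the matching <app_version> block in client_state.xml should show how many CPUs each GWnew task is budgeted for via <avg_ncpus>.  If it really is budgeted as multi-threaded, an app_config.xml override along these lines (the app and plan-class names here are placeholders, to be copied from client_state.xml) tells BOINC to budget fewer threads per task, though whether the science app itself actually spawns fewer threads is another question:

<app_config>
   <app_version>
      <app_name>einstein_O2MD1</app_name>   <!-- placeholder; copy the real name from client_state.xml -->
      <plan_class>GWnew</plan_class>        <!-- placeholder plan class; copy from client_state.xml too -->
      <avg_ncpus>1</avg_ncpus>              <!-- budget one CPU thread per task -->
   </app_version>
</app_config>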

Clear skies,
Matt
