The O2-All Sky Gravitational Wave Search on GPUs - discussion thread.

Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 2,239

solling2 wrote:

On another issue: do you happen to know why the run time of your tasks is more than twice as long as the CPU time?

On the same two machines, my CPU (V1.01) tasks take well over 24 hours in some cases. The RX460 on the Linux box crunches a task in about 3 1/2 hours. The NVIDIA GT1030 on the DL360 server is coming in at a little over 5 hours.

I take it others are crunching the CPU tasks quite a bit faster.

Clear skies,
Matt
Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 2,239

Bernd Machenschalk wrote:

A bit of information that might help clear up some misconceptions about validation

- For the OSAS search we are using BOINC's "adaptive replication". This means that only one in ten results sent to "trustworthy" hosts is actually replicated for ultimate comparison (quorum of 2). In most cases, it is accepted as it is (quorum of 1). To prevent cheating, for "adaptive replication" applications, the BOINC web code only reveals the quorum for a workunit after validation.

Thanks for clearing that up, Bernd.

Bernd Machenschalk wrote:
- Due to an unexpected delay in updating the DB some 1.04 and 1.07 app versions erroneously weren't treated as "Beta test", which led to ~300 tasks being accepted without comparison (quorum of 1) or after (successful) comparison to another GPU app version. This wasn't intentional and shouldn't happen again.

I suspect that was the case with my AMD/LINUX tasks. Looking over the task results, I found all of the NVIDIA/Windows tasks to have a quorum of 2.
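To put rough numbers on the "one in ten" figure quoted above, here is a minimal sketch. It assumes each result from a trusted host is picked for replication independently with a fixed probability of 0.1. That is a simplification for illustration only, not a description of how the BOINC scheduler actually decides, and the function names are made up for this example.

```python
# Assumption: fixed, independent replication probability per result.
REPLICATION_PROBABILITY = 0.1  # "one in ten", from the quoted post


def expected_replicated(num_results, p=REPLICATION_PROBABILITY):
    """Expected number of results that end up with a quorum of 2."""
    return num_results * p


def prob_none_replicated(num_results, p=REPLICATION_PROBABILITY):
    """Chance that all num_results are accepted with a quorum of 1."""
    return (1.0 - p) ** num_results


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, expected_replicated(n), round(prob_none_replicated(n), 3))
```

Under that assumption, a host returning a few dozen results would be very unlikely to see only quorum-1 acceptances, so a batch of nothing but quorum-1 results points to something other than chance, such as the DB delay described above.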

Clear skies,
Matt
solling2
Joined: 20 Nov 14
Posts: 159
Credit: 471,023,751
RAC: 385

Matt White wrote:
solling2 wrote:

On another issue: do you happen to know why the run time of your tasks is more than twice as long as the CPU time?

On the same two machines, my CPU (V1.01) tasks take well over 24 hours in some cases. The RX460 on the Linux box crunches a task in about 3 1/2 hours. The NVIDIA GT1030 on the DL360 server is coming in at a little over 5 hours.

I take it others are crunching the CPU tasks quite a bit faster.

That all seems normal, given that ATI cards require less support from the CPU than Nvidia cards. What surprised me, when comparing with archae86's results, was that his GPU tasks did NOT show CPU times shorter than run times. Apparently his CPU is kept busier because of the second card.

Edit: Or is this a Windows matter, since I just noticed Jim1348's results?

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,368,755,428
RAC: 2,720,616

archae86 wrote:
I currently have 51 1.07 tasks pending, most from the first, and a few from a second Windows 10 570 box.  I'm afraid the portents don't look very favorable to me at the moment.  The second box, which has only a two-core CPU, is running 2X, so if somehow 4X was an issue in my reported invalid task maybe it will fare better.

My first box continues to have zero validations, one invalid finding, and now shows 56 pending.  The pending tasks include just four run at 2X, somewhat more run at 3X, and many run at 4X.  It has one invalid, run at 4X, losing to v1.01 CPU tasks run under Linux.

The second box has the same GPU and Windows 10 OS, but has only run 2X.   Having started much later, it has only 14 pending, but already has two validations.  In both cases the quorum partner ran a V1.01 Linux CPU application.

So I do have a success of v1.07 GPU Windows "ATI" paired with CPU Linux.

Possibly I have a hint that running 4X may be a problem.  So I have (again) downgraded my first box from running 4X to running 2X.  I think I'll run it that way for about a day, so that my pending list includes an abundance each of 2X and 4X, then turn that box back to running GRP while waiting for the sand to run through the hourglass to get some picture of the validation prospects on current GW work.

The second box will continue to run GW part-time at 2X while running down a cache of GRP, so I'll slowly build up that evidence pile as well.

It would be a pity if 4x turns out to give greatly higher invalid rates than 2X on this box as the productivity boost would be considerable if it worked (going from 32 tasks/day to 43).
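The throughput figures above can be unpacked with a little arithmetic. The sketch below assumes the GPU is kept busy around the clock at a constant number of concurrent tasks; the helper names are illustrative only.

```python
MINUTES_PER_DAY = 24 * 60


def minutes_per_task(tasks_per_day, concurrency):
    """Elapsed minutes per task implied by a daily throughput."""
    return concurrency * MINUTES_PER_DAY / tasks_per_day


def relative_gain(old_rate, new_rate):
    """Fractional throughput gain when going from old_rate to new_rate."""
    return new_rate / old_rate - 1.0


if __name__ == "__main__":
    print(minutes_per_task(32, 2))   # elapsed time per task at 2X
    print(minutes_per_task(43, 4))   # elapsed time per task at 4X
    print(relative_gain(32, 43))     # gain from 2X to 4X
```

With the numbers from the post, each task takes about 90 minutes of elapsed time at 2X and about 134 minutes at 4X, for a throughput gain of roughly 34%.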

 

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,368,755,428
RAC: 2,720,616

archae86 wrote:
Possibly I have a hint that running 4X may be a problem.  So I have (again) downgraded my first box from running 4X to running 2X. 

And with very little work time at 2X today since downshifting from 4X, my first box finally has its very first validation--and sure enough it was one of today's 2X tasks.  The successful quorum partner was a Windows PC running the V1.01 CPU task.

The available information certainly does not prove that 2X had a validation rate advantage over 4X even on my particular host, let alone any particular class of hosts, or all hosts, but I'll stay at 2X for at least a day to work on building up evidence. I already have about 35 pending tasks which ran entirely or mostly at 4X, so don't feel the need to add to that pile at the moment.
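The posts above do not say how the 2X/4X setting was applied. One mechanism the BOINC client offers is an app_config.xml file in the project directory, where gpu_usage is the fraction of a GPU each task claims (0.5 for 2X, 0.25 for 4X). The sketch below generates such a file; the app name is a placeholder that must be replaced with the project's real short app name, and the cpu_usage value is an assumption.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Return app_config.xml text that runs tasks_per_gpu tasks per GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks share one device.
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    # Assumption: budget one full CPU core per GPU task.
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_per_task)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "APP_NAME_PLACEHOLDER" is hypothetical, not the project's real app name.
    print(build_app_config("APP_NAME_PLACEHOLDER", tasks_per_gpu=2))
```

After saving the output as app_config.xml, the client has to re-read its config files (or be restarted) before the change takes effect.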

cecht
Joined: 7 Mar 18
Posts: 718
Credit: 792,391,706
RAC: 576,379

cecht wrote:
I ... see that application listed as "Continuous Gravitational Wave search O2 All-Sky v1.07 () windows_x86_64".  Shouldn't that be "Continuous Gravitational Wave search O2 All-Sky v1.07 (GW-opencl-ati) windows_x86_64"?

Never mind. Y'all are too kind for not pointing out how sleepy-headed I was this morning. I see now that "Continuous Gravitational Wave search O2 All-Sky v1.07 () ..." is the general format of the app name and the (GW-opencl-ati) specification is included in the application name on the Workunit pages.

EDIT: I have rewritten my original post here to try to regain some modicum of respect following an ill-informed comment, which in turn was an attempt to do the same following a prior post, and on and on.....

Ideas are not fixed, nor should they be; we live in model-dependent reality.

archae86
Joined: 6 Dec 05
Posts: 2,842
Credit: 3,368,755,428
RAC: 2,720,616

Regarding the possibility that for my (Windows 10, RX 570) hosts 2X is much better for validations than 4X, the interim update is that the hypothesis has strengthened greatly during the day.

The two-core CPU host which got started later has always run 2X, and at this writing has returned 22 tasks, of which 5 have validated, 17 remain pending, and zero have been found invalid so far.

The six-core CPU host which got started a couple of days earlier has run a lot at 4X, a little at 3X, and a moderate and steadily increasing amount at 2X.  So far it has returned 61 of these tasks, of which the 6 which have validated all ran at 2X, and the 4 so far found invalid all ran at 4X. 

The comparison populations are not properly matched nor randomized, so don't drag out heavy statistical tools to attempt a confidence number, but my professional gut estimate is that there is a real difference here, and it may be as bad as negligible success at 4X for the one host where I have tried it.
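For anyone wanting to keep a similar scorecard for their own hosts, a plain tally is enough; no statistical test is attempted here. The records below are just the decided counts reported above (pending tasks are left out because the post does not break them down by 2X/3X/4X), and the host labels are shorthand for this example.

```python
# (host, tasks per GPU, outcome, count) as reported in the post above.
REPORTED = [
    ("two-core", 2, "valid", 5),
    ("six-core", 2, "valid", 6),
    ("six-core", 4, "invalid", 4),
]


def tally(records):
    """Sum valid/invalid counts per tasks-per-GPU setting."""
    counts = {}
    for _host, multiplicity, outcome, n in records:
        bucket = counts.setdefault(multiplicity, {"valid": 0, "invalid": 0})
        bucket[outcome] += n
    return counts


if __name__ == "__main__":
    for multiplicity, bucket in sorted(tally(REPORTED).items()):
        print(f"{multiplicity}X: {bucket['valid']} valid, {bucket['invalid']} invalid")
```

So far that is 11 valid and 0 invalid at 2X against 0 valid and 4 invalid at 4X, with the caveat already given that the two populations are neither matched nor randomized.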

On the other hand, 4X, if it worked, is so very much more productive that I'd not suggest anyone completely ignore the possibility that it might work for them--but I do urge you to check things out before dedicating days of production in the hope it will work for you.

It is not much of a surprise that invalid results are a bit slow in being posted.  First both I and a quorum partner (not beta--so a CPU host) have to return tasks, then when they fail to get a good enough match, a new task has to go out to another CPU host, and get returned.  I currently expect a rising tide of invalid results to be reported for the next few days, as my first host has dozens of pending tasks I expect to be found invalid.

If I see new evidence that makes me doubt "4X bad, 2X better" for my host I'll post, but I'll stop posting incremental updates every few hours, as I think the purpose of publishing a warning is now served.

Matt White
Joined: 9 Jul 19
Posts: 114
Credit: 120,661,850
RAC: 2,239

h1_0533.55_O2C02Cl1In0__O2AS20-500_533.70Hz_9

This job, a Linux/AMD WU, came in as invalid today.

Clear skies,
Matt
cecht
Joined: 7 Mar 18
Posts: 718
Credit: 792,391,706
RAC: 576,379

While I was running two RX570s, each with 3x tasks on the v1.07 app, I saw individual task times of ~35 min, as reported earlier. Today, when I limited runs to just one GPU @ 3x tasks, individual task times decreased to <25 min. That's quite a time difference between running the app on two GPUs and on one (6 concurrent tasks vs. 3). CPU usage dropped from ~88% with 6 concurrent tasks to ~50% with 3 concurrent tasks. I'm wondering whether CPU utilization is the performance bottleneck or whether it's something else, like PCIe lanes? Any ideas?
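A quick way to see what those task times mean for output is to convert them to tasks per day. The sketch below uses the approximate figures from this post (35 and 25 minutes, the latter being an upper bound) and assumes the cards run continuously; the function name is illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(concurrent_tasks, minutes_per_task):
    """Daily throughput for a given concurrency and elapsed time per task."""
    return concurrent_tasks * MINUTES_PER_DAY / minutes_per_task


if __name__ == "__main__":
    two_gpus = tasks_per_day(6, 35)   # two cards, 3 tasks each
    one_gpu = tasks_per_day(3, 25)    # one card, 3 tasks
    print(f"two GPUs: {two_gpus:.0f}/day total, {two_gpus / 2:.0f}/day per card")
    print(f"one GPU:  {one_gpu:.0f}/day")
```

On these numbers the two-card setup still finishes more work in total (about 247 vs. 173 tasks per day), but each card delivers only about 123 per day, roughly 5/7 of what a single card manages alone. That is consistent with some shared resource being the limit.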

I'm limiting O2AS20-500 tasks to one GPU because I'm trying to get FGRPB1G tasks to run on the other GPU, but haven't gotten FGRPB1G tasks downloaded yet.  I'm curious whether there will be any performance interference between the two different apps. (I'm trying this configuration in a greedy attempt to revive some of my BOINC RAC, which has taken a hit since I began running only O2AS20-500 tasks on my RX570 host.)

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Rolf
Joined: 7 Aug 17
Posts: 27
Credit: 135,377,187
RAC: 0

I am seeing similar behavior: the GW GPU tasks need a lot of CPU, RAM, or PCI bandwidth in bursts, so to maximize throughput you can't have anything else running on the CPU, even if the average usage is only a few cores.
