Gamma-ray pulsar binary search #1 on GPUs

chester
Joined: 15 Jun 10
Posts: 15
Credit: 506261798
RAC: 0

Gary Roberts wrote:
chester_4 wrote:
You would be correct if I had a GPU installed.

Essentially, you do have a GPU installed.  You should do some reading about what AMD APUs actually are.  The advice you were given would maximise the output of the 'GPU' part of your processor.  In addition, you might be able to run some CPU type tasks to improve the total output of your machine, but at the expense of higher power consumption, more heat produced and possibly somewhat longer GPU crunch times.

You'll need to experiment to find the best balance suited to your needs.  That would be much easier if you know what an app_config.xml file is and how to tweak things with it.  It may well be that the best result for you is just to exclude the CPU version of the search.  I don't own any AMD APUs so I can't give advice based on experience.

Even if it is an APU, the computer details show that the BOINC client doesn't see any coprocessor, because it's a virtual machine with no special graphics driver installed. So the computer has no chance of running "real GPU" tasks on a GPU. AgentB's investigation explains what happened:

AgentB wrote:
chester_4 wrote:

Hi all,

My CPU takes 1.5 times longer to crunch the CPU version than the GPU version.

1. If the GPU version is more efficient on certain CPU models, why not send them the GPU version instead?

2. Can I somehow force the GPU version to be downloaded and run on the CPU?

Machine: AMD A10-6800K https://einsteinathome.org/host/12192829

CPU version: 28,251.13 sec https://einsteinathome.org/workunit/265968478

GPU version: 19,907.76 sec https://einsteinathome.org/workunit/265691764

I think there may be some misunderstandings here. We are looking at one of the rare work units (1 WU = two tasks) in which one task ran with a CPU application and the other with a GPU application.

So you are not running a GPU application on that host. Interestingly these dual app WUs have "Binary points=36" - normally they have "Binary points=175" which may mean they use more memory (longer arrays but fewer of them). I might check whether I crunched a few of these on my CPU and whether I saw a time reduction as well.

As Gary mentioned, you could run the GPU app on the APU. I think I saw someone with Kaveri running these apps, so it might be worth looking into. If clinfo is showing the GPU with 1GB of memory it should work. See some clues in this thread.

Edit: yes, I had run 72 such "GPU" tasks on this CPU host, and they were actually slightly slower, averaging 42K seconds compared with the normal 40K.

That means it was not a "real GPU" task, but an altered CPU task mislabeled as "Gamma-ray pulsar binary search #1 on GPUs v1.05 (FGRPSSE) x86_64-pc-linux-gnu", even though it is not "on GPUs". The increase in crunching speed comes from this alteration:

Quote:

So you are not running a GPU application on that host.   Interestingly these dual app WUs have "Binary points=36" - normally they have "Binary points=175" which may mean they use more memory (longer arrays but fewer of them).

The efficiency of these "fake GPU" tasks depends on the CPU/RAM used: in my configuration they give a 30% time reduction, in his a 5% time increase. So my questions reduce to one:

Can we have that "fake GPU" version available to download and crunch?
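
For anyone who wants to check the percentages quoted above, here is a minimal Python sketch. It uses only the run times given in this thread; the helper name `relative_change` is made up for illustration, and AgentB's figures are his rounded averages (40K and 42K seconds), so treat the result as approximate.

```python
# Run times in seconds, copied from the posts quoted above.
cpu_version_s = 28_251.13    # chester_4, CPU version
gpu_labelled_s = 19_907.76   # chester_4, task labelled "on GPUs" but run on the CPU


def relative_change(baseline: float, observed: float) -> float:
    """Fractional change of `observed` relative to `baseline` (negative = faster)."""
    return (observed - baseline) / baseline


chester_change = relative_change(cpu_version_s, gpu_labelled_s)
agentb_change = relative_change(40_000, 42_000)  # AgentB's rounded averages

print(f"chester_4: {chester_change:+.1%}")
print(f"AgentB:    {agentb_change:+.1%}")
```

On these numbers the relabelled task comes out close to the 30% reduction mentioned for one host and the 5% increase mentioned for the other.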

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

[Update:]

My Win XP Pro x64 with GTX-760 is FINALLY on FGRPB1G 1.17 Units!!!!!

The MAC is still finishing the 1.14 Units, but should catch up and start crunching the 1.17 Units in the next couple of hours.  I will monitor both systems and also continue monitoring the MAC for any more OpenCL Bug issues.  However, it seems that with the advent of the 1.14 Units on MAC the OpenCL Bug is gone.  I still only have 5 Invalids in my Results' Page; as previously mentioned, three 1.12 Units and two 1.13 Units.

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

[Observation on MAC:]

So far, on the MAC with two GTX-750TI SC cards crunching two Units at a time per card, the Beta 1.17 Units crunch for about 40 Min. and then, with 4+ Min. left Estimated, jump to 100% complete, Upload, and are then Ready To Report.  This is an improvement over the 1.14 Units where, typically, I had almost 30 Min. left Estimated when the Units jumped to 100% complete.

I have quite a few Beta 1.17 Units in queue before I get to the Units just recently modified by Christian.  In Win XP Pro x64, I have considerably fewer 1.17 Beta Units to go through.  Some time tomorrow morning, XP Pro and the GTX-760 card will be on the Full Release 1.17 Units.  Hopefully, sometime later tomorrow afternoon the MAC will follow suit, but I think it may be another day to day and a half before the MAC gets to the Full Release 1.17 Units.

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Mr P Hucker
Joined: 12 Aug 06
Posts: 838
Credit: 519346596
RAC: 14812

I was getting WUs saying 1 CPU + 1 AMD, which used 0.75 CPUs and kept the GPU at 100% only 1/10th of the time.  I was told in another thread that the problem was fixed, so I cancelled them and got more, and these used no CPU or GPU at all!  I'm running Windows 10 with an Intel 3570K CPU and AMD Radeon R9 290 graphics.

If this page takes an hour to load, reduce posts per page to 20 in your settings, then the tinpot 486 Einstein uses can handle it.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1593329004
RAC: 776022

Christian Beer wrote:

I fixed the credit issue. You should now get 5 times the credit you got before; sorry for that. I'm looking into the DCF/speedup issue now. I'm also running some benchmark tasks on a GTX750 Ti and comparing them to the BRP4G search. I'm trying to adjust credit so it can be compared to BRP4G. I already got stable runtimes of 1h for BRP4G (1x) on this system.

The overall problem is that the FGRPB1G search behaves a little differently from BRP4G, so we have to turn the knobs a little from time to time and see what the result is. If some of you could monitor your DCF and report any changes, that would be great.

I have seen no sign of that yet.

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341981082
RAC: 958

Something odd is going on with this machine: 12057966.  It's a MacBook Pro with an i7-3720QM and an NVIDIA GTX650M GPU.  It was running all of the FGBRP-GPU versions just fine, and at the speed one would expect relative to my other GPUs based on benchmark scores.

But for some reason I'm getting a few work units that take ~9,700 seconds for 1,365 credit, while most of the units I get take 32,000 seconds for the same credit!  By the way, that's about the same speed as the FGBRP-CPU work units not using the GPU at all (albeit for 693 credit, so it could be half the data)!  My system monitor is telling me the GPU is pegged out, but it begs the question of what it's doing.

Even if it is doing twice the work in the same amount of time, that's bad economics for the use of this GPU.

The work units on my Hackintosh (i7-5820K with NVIDIA GTX 960) seem kind of screwy too, but not to this level.  For a while I got 1.17 work units that took 1,060 seconds for 693 credit, then a streak of ones that took 3,700 seconds for 3,465 credit, and now work units that take the same time but for 1,365 credit and my GPU is "pulsing" instead of running continuously at its full potential.  All of these were v1.17 tasks and none Beta.

I'm not being shallow for worrying about my credit for this because this is the only way I can benchmark the performance/contribution of my system to optimize my contribution.  This Hackintosh was built explicitly and exclusively for Einstein@home.
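
As a rough illustration of the credit-based benchmarking described above, the following Python sketch converts the (credit, run time) pairs quoted in this post into credit per hour. The function name and the labels are made up for the example; the numbers are the approximate ones from the post, so the output is only as precise as those.

```python
def credit_per_hour(credit: float, run_time_s: float) -> float:
    """Credit earned per hour of run time for a single task."""
    return credit / run_time_s * 3600.0


# (label, credit, run time in seconds), as quoted in the post above.
tasks = [
    ("MacBook Pro GTX 650M, fast units", 1365, 9_700),
    ("MacBook Pro GTX 650M, slow units", 1365, 32_000),
    ("Hackintosh GTX 960, first batch", 693, 1_060),
    ("Hackintosh GTX 960, second batch", 3465, 3_700),
    ("Hackintosh GTX 960, current batch", 1365, 3_700),
]

rates = {label: credit_per_hour(credit, seconds) for label, credit, seconds in tasks}

for label, rate in rates.items():
    print(f"{label:36s} {rate:8.1f} credit/hour")
```

This ignores how many tasks run concurrently on each GPU, so it compares per-task rates rather than total machine throughput.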

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

[Update:]

Well, it FINALLY happened!!!  One of my Pending 1.14 Units just got marked Invalid.

The original Wingman was running a 1.16, and the resend was a 1.17; both on Windows x64 Platforms.  The 1.16 was crunched on an ATI GPU and the 1.17 was crunched on an NVIDIA GPU.

Workunit was 266087314.  Each of the Wingmen got 693 Credits, and I got 0!!!  I will continue monitoring; I still have many 1.14 Units Pending.

As to the 1.17 Units, I've had several validate on both of my machines.  Plenty of Pendings right now, too.  Again, I will continue monitoring and report.

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1178354505
RAC: 988881

Jonathan Jeckell wrote:
But for some reason I'm getting a few work units that take ~9,700 seconds for 1,365 credit, and I mostly get units that take 32,000 seconds for the same credit!  By the way, that's about the same speed as the FGBRP-CPU work

Same problem here: the GPU WUs take 29,000 seconds and give 1350 credit. As a result, my 8-year-old Core 2 Duo MacBook Pro running 2 CPU tasks now has a higher RAC than my 2012 MacBook Pro running only GPU tasks on a GT 650M.

On the Mac Pro with an NVIDIA GTX 780 the RAC is also dropping massively. Did you just reduce the credit given for the same amount of work, or is this a performance problem?

MarkHNC
Joined: 31 Aug 12
Posts: 37
Credit: 170965842
RAC: 0

Christian Beer wrote:
I'm trying to adjust credit so it can be compared to BRP4G.

As reported by others, work accomplished and points granted do not seem to be directly comparable to my experience with BRP4G.  I have two PC's that remain configured exactly as they were for BRP4G.  The only difference is that, on one of them (labeled "X2-7" below), the high GPU usage makes the Windows desktop, and specifically graphically-intensive applications, difficult to use, so I suspend GPU computing when I am actively using the desktop.  In an apples-to-apples comparison, this is an advantage, as it means that, for FGRP tasks, both machines are running without user interaction affecting run time.

Machines:
  "X2-7" is AMD Phenom II X4 965 (overclocked to 3.4GHz), 8GB RAM, GTX 650 SC 2GB, Win 7 Pro
  "W5" is Intel Xeon E5-2670v1 (2.60GHz), 32GB RAM, GTX 960 SSC 2GB, Win 10 Pro

X2-7 runs GPU tasks x1, W5 runs GPU tasks x2

Fortunately, I had a few validated BRP4 tasks remaining in my task lists, so I could do a comparison.

I understand the performance differences between the components, but the run time/points numbers speak for themselves (PPH = points per hour):

App           Mach  Min Hours  Max Hours  Avg Hours  Min PPH    Max PPH    Avg PPH
BRP4G         X2-7    0.84       1.02       0.90       982.29   1,188.94   1,122.47
BRP4G         W5      0.86       0.95       0.93     1,056.87   1,164.78   1,078.88
FGRPopenCL    X2-7    5.45       5.54       5.48       246.49     250.30     248.90
FGRPopenCL    W5      1.18       1.21       1.03     1,127.62   1,160.65   1,140.10

As you can see, BRP4G consistently ran 1x GPU tasks on X2-7 and 2x GPU tasks on W5 in approximately the same amount of time.  However, on X2-7 FGRP is taking roughly six times as long (5.48 vs. 0.90 hours on average) and so is granting only about 22% of the credit per hour.  In addition, since I used to be able to use the desktop without too much difficulty while GPU tasks were running, I am now accomplishing even less work and getting fewer points whenever the desktop is in use.
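
For reference, the ratios behind that comparison can be recomputed from the average columns of the table above. This is only a small Python sketch; the dictionary layout and the function name `fgrp_vs_brp4g` are made up for the example, and the values are copied from the table as posted.

```python
# (average hours per task, average points per hour), copied from the table above.
averages = {
    ("BRP4G", "X2-7"): (0.90, 1122.47),
    ("BRP4G", "W5"): (0.93, 1078.88),
    ("FGRPopenCL", "X2-7"): (5.48, 248.90),
    ("FGRPopenCL", "W5"): (1.03, 1140.10),
}


def fgrp_vs_brp4g(machine: str) -> tuple[float, float]:
    """Return (run-time ratio, PPH ratio) of FGRPopenCL relative to BRP4G."""
    brp_hours, brp_pph = averages[("BRP4G", machine)]
    fgrp_hours, fgrp_pph = averages[("FGRPopenCL", machine)]
    return fgrp_hours / brp_hours, fgrp_pph / brp_pph


ratios = {machine: fgrp_vs_brp4g(machine) for machine in ("X2-7", "W5")}

for machine, (time_ratio, pph_ratio) in ratios.items():
    print(f"{machine}: run time x{time_ratio:.2f}, points per hour {pph_ratio:.0%} of BRP4G")
```

Note that the W5 average for FGRPopenCL in the table (1.03 hours) is lower than its listed minimum (1.18 hours), so the W5 ratio should be read with some caution.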

Hope this input helps.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Jonathan Jeckell wrote:

The work units on my Hackintosh (i7-5820K with NVIDIA GTX 960) seem kind of screwy too, but not to this level.  For a while I got 1.17 work units that took 1,060 seconds for 693 credit, then a streak of ones that took 3,700 seconds for 3,465 credit, and now work units that take the same time but for 1,365 credit and my GPU is "pulsing" instead of running continuously at its full potential.  All of these were v1.17 tasks and none Beta.

I'm not being shallow for worrying about my credit for this because this is the only way I can benchmark the performance/contribution of my system to optimize my contribution.  This Hackintosh was built explicitly and exclusively for Einstein@home.

I use credit for optimization purposes too, and to make sure things are running correctly, so I am glad you pointed this out.  However, on my two GTX 960s running under Ubuntu 16.10, I don't see nearly that much variation.  Since 22 December they have been running about 2450 seconds for 1365 credits.  About once a day (running 24/7), I get one that runs 700 seconds for 693 credit, no big deal.

Also, on my GTX 750 Tis running under Win7 64-bit, I see a similar story, just twice as long.  They run at 4800 seconds for 1365 credit, except for the earlier ones on 22 December and before.  Those ran shorter for less credit, but I assume they were tweaking the system then.  I don't run any on the CPU, so I can't make any comparison.  It may be that the GPUs are not earning as much credit as they should relative to the CPU versions, but I expect they are doing a lot more work, or Bernd would have noticed by now.  I otherwise don't pay much attention to credits, so I will have to leave that comparison to others.
