Gamma-ray pulsar binary search #1 on GPUs

DanNeely
Joined: 4 Sep 05
Posts: 1364
Credit: 3562358667
RAC: 0

Holmis wrote:
Bernd Machenschalk wrote:
- There is a new "Beta Test" app version 1.18 with a few improvements Christophe developed over the holidays. It should be significantly faster than 1.17, in particular on GPUs that support double precision.

If this speedup holds for all FGRP-GPU tasks then I suspect some adjustments to the estimates will need to be made again, or my DCF will start swinging more wildly. My DCF right now is at 0.84 and the FGRP-GPU tasks, both 1.17 and 1.18, are estimated at about 54 minutes.

 

Even before this hit, the spread in the DCF targets from CPU and GPU tasks meant that maintaining a several-day queue on my Haswell + GTX980 and GTX1080 hosts occasionally required aborting blocks of >100 CPU tasks that had been downloaded when a large batch of GPU tasks smashed the DCF down crazy low and massively over-allocated my CPU.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1591145698
RAC: 767793

After running a few of these 1.18 tasks on my GTX660, it seems times are down to 2.5 hrs from a bit over 4 hrs running 2 at a time. The only thing that would make me happier would be a cuda 50 app so I could get my 2 CPU cores back.

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3523897981
RAC: 1550770

FP64 code consumes more power, so if you see the GPU clock falling, you might need to raise the GPU power limit.
Undervolting can also help in many cases, but that depends on the particular GPU design.
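On NVIDIA cards you can check and (with admin rights) raise the limit from the command line. A rough sketch - the 180 W value is only an example, and the card's own minimum/maximum limits still apply:

    nvidia-smi -q -d POWER          # show current power draw and the enforced/default/max power limits
    sudo nvidia-smi -i 0 -pl 180    # set the power limit of GPU 0 to 180 W (must stay within the card's allowed range)

AMD cards need different tools (vendor utilities or driver settings), so treat this as an NVIDIA-side sketch only.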

-----

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117657126073
RAC: 35184627

DanNeely wrote:
Even before this hit, the spread in the DCF targets from CPU and GPU tasks meant that maintaining a several-day queue on my Haswell + GTX980 and GTX1080 hosts occasionally required aborting blocks of >100 CPU tasks that had been downloaded when a large batch of GPU tasks smashed the DCF down crazy low and massively over-allocated my CPU.

It may not just be the DCF mismatch.  Here is an example of what can happen.  It's not meant to represent any of your hosts because I want readers to understand the general principle and then work out for themselves if any of their hosts are affected.

Imagine a quad core host with a GPU running 2 concurrent tasks.  Each of these will require a CPU core for support duties, leaving just 2 cores available for crunching CPU tasks.  If BOINC is set to use 100% of the CPU cores, it will fetch work for all 4 cores even if only 2 are being used for crunching.  So the problem is really compounded if you keep a multi-day cache.

As a simple example, let's say the cache setting is 4 days.  Let's also say that the fast GPU tasks have lowered the DCF to the point that CPU tasks are estimated at half the time they will actually take.  So, because of just the DCF behaviour, the 4 day cache has become 8 days of real work.  That's bad enough but shouldn't actually cause you to have to abort CPU tasks.  However on top of this, only 2 cores are doing the crunching so the tasks will actually take 16 days to crunch.
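Putting numbers on that example: BOINC fetches 4 days x 4 cores = 16 core-days of work at the estimated rate; the DCF underestimate doubles that to 32 core-days of real crunching; and with only 2 cores actually crunching, that's 32 / 2 = 16 days to clear the queue - which is how you end up having to abort tasks.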

Can this problem be avoided?  Yes it can by making sure BOINC only fetches work for 2 cores rather than 4.  You need an app_config.xml to achieve this.  You need to do two things :-

  1. Set BOINC prefs to use the appropriate number of cores - 50% in this example.
  2. Create an app_config.xml with <gpu_usage> of 0.5 and <cpu_usage> of less than 0.5, say 0.4 (see the sketch below).  This overrides the default value of 1 for <cpu_usage>.  The key is to make sure this value times the number of GPU tasks being crunched doesn't add up to a full CPU core or more.
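A minimal sketch of such a file, placed in the Einstein@Home folder under your BOINC data directory's projects directory.  I'm assuming the GPU app name is hsgamma_FGRPB1G - check the name your own client_state.xml reports for these tasks before relying on it:

    <app_config>
      <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
          <gpu_usage>0.5</gpu_usage>   <!-- 2 concurrent tasks per GPU -->
          <cpu_usage>0.4</cpu_usage>   <!-- 2 tasks x 0.4 = 0.8, less than one full core, so no extra CPU core gets set aside -->
        </gpu_versions>
      </app>
    </app_config>

After saving it, tell the client to re-read config files (Options -> Read config files in BOINC Manager) or just restart the client so the new values take effect.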

With these settings this example host would still be crunching 2 GPU tasks and 2 CPU tasks exactly as before.  The only difference would be that the CPU work fetch would be the proper amount rather than double the proper amount.  It would still take 8 days to crunch the tasks rather than 4 but the work queue wouldn't grow any larger.  Obviously the DCF mismatch needs to be addressed as well but at least the problem is minimised whilst waiting for that.

 

Cheers,
Gary.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1887
Credit: 1410824522
RAC: 1189721

Betreger wrote:
After running a few of these 1.18 tasks on my GTX660, it seems times are down to 2.5 hrs from a bit over 4 hrs running 2 at a time. The only thing that would make me happier would be a cuda 50 app so I could get my 2 CPU cores back.

(I agree about the cuda 50 app as you know)

Ok I decided to test the new v1.18 and just finished 8 tasks

This is on an SC'd 660Ti in a not-so-new Athlon II X4 630 with 12GB RAM... and Windows 10.

They took an average of 2 hrs 2 mins each, running 2 at a time (X2).

Watching them run, the GPU was at 100% and the CPU was at 100% (which made it slow just trying to do anything else).

The one thing I am wondering about (if it was mentioned I didn't see it): what is the difference between tasks that are both running Gamma-ray pulsar binary search #1 on GPUs v1.18 (FGRPopencl-Beta-nvidia) but, even running for basically the same amount of time, are granted different credits (3,465.00 and 1,365.00)?

The one with 3,465 credits I am looking at right now did actually run about 1000 seconds faster.

How do you know what the difference is between these tasks other than the credits?

https://einsteinathome.org/task/602928802

https://einsteinathome.org/task/602928116

I woke up for some reason (PCs talking to me in my sleep) at 2 am.

Good thing, since my 8-core did a Windows 10 update and rebooted, and it needs my PIN to start back up again (I have to get rid of that so it will just start back up on its own).

So I went to take a look at the 660Ti to see if it was close to finishing one. One had just finished and started a new task, and the other was almost at 2 hrs when it hit 89.xxx% and just froze up. I thought GREAT, I check on one and now it will crash on me... well, I checked everything as you may imagine while it sat there, and just as I brought up the task manager it jumped to 100% and started a new one. I checked what it said in the stats and that was one of the tasks that got 1,365 credits... no idea if that happens all the time or not.

OK, it's almost 3:30 am, so I will see if the Science Channel will put me back to sleep, and I will check how the 660Ti is doing later today Cool

[AF>EDLS]GuL
Joined: 15 Feb 06
Posts: 15
Credit: 227794659
RAC: 0

Gary Roberts wrote:

With these settings this example host would still be crunching 2 GPU tasks and 2 CPU tasks exactly as before.  The only difference would be that the CPU work fetch would be the proper amount rather than double the proper amount.

 

Very clear example, thanks.

Additionally, limiting the number of CPU cores used will stop VirtualBox multicore tasks from taking all 4 cores even when the GPU needs one of them, which would really hold the GPU back.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117657126073
RAC: 35184627

MAGIC Quantum Mechanic wrote:
... I checked what it said in the stats and that was one of the tasks that got 1,365 credits... no idea if that happens all the time or not.

I haven't checked but I'll bet the task that got 1,365 credits was a resend of a failed task from a quorum formed when the award was 1,365.  You can easily spot a resend task.  It will have an extension in the task name higher than _1.  The two primary copies have _0 and _1 extensions.  The credit is locked in at the time the original WU was generated, so if you get a resend of one of those it will be carrying the former credit value.

Also, all tasks get to 89.997% and then stop progressing continuously.  Crunching is over and the follow-up stage (sorting out the top candidates, I think) is happening.  When that is finished, progress will suddenly jump to 100% and the results will be uploaded.  This is not anything to worry about.

 

Cheers,
Gary.

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1887
Credit: 1410824522
RAC: 1189721

Gary Roberts wrote:

I haven't checked but I'll bet the task that got 1,365 credits was a resend of a failed task from a quorum formed when the award was 1,365.  You can easily spot a resend task.  It will have an extension in the task name higher than _1.  The two primary copies have _0 and _1 extensions.  The credit is locked in at the time the original WU was generated, so if you get a resend of one of those it will be carrying the former credit value.

Also, all tasks get to 89.997% and then stop progressing continuously.  Crunching is over and the follow-up stage (sorting out the top candidates, I think) is happening.  When that is finished, progress will suddenly jump to 100% and the results will be uploaded.  This is not anything to worry about.

Thanks Gary,

You are right about it doing that at 89.997%

I also checked those extensions, and the 3,465-credit tasks had _0, _1 and _2, while the ones with 1,365 had either _2 or _3.

So of the 15 tasks so far 8 are the 3,465 version and 7 are the 1,365 version 

I loaded those the first time I saw we switched to v1.18

Next I am going to try my 560Ti OC, but it is in my older 3-core Phenom - Win 10 - 8GB RAM.

If I make the drive to the post office in my rain storm I have four sticks of 8GB waiting for me!!

I still hope we can get back to GPU-only so I can run all 7 of mine, since this is year 13 for me here and I also run those VB tasks for CERN (I started there 3 months before that)... but no other projects, and they work great together with the pure GPU-only tasks.

Thanks again Gary and hope all is well.

-Samson

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117657126073
RAC: 35184627

MAGIC Quantum Mechanic wrote:

... So of the 15 tasks so far 8 are the 3,465 version and 7 are the 1,365 version 

I loaded those the first time I saw we switched to v1.18

If you have a resend of a task originally generated with 1,365 as the award (these will be tasks whose name starts with LATeah0010L...) then that's what you'll get.  Tasks whose name starts with LATeah0009L... are the latest ones so resends of those will most likely be getting 3,465.  I don't remember if any 0009L tasks were generated before the change to 3,465 but maybe there were.

MAGIC Quantum Mechanic wrote:
Next I am going to try my 560Ti OC, but it is in my older 3-core Phenom - Win 10 - 8GB RAM.

The speed of the CPU and having huge amounts of RAM don't seem to be all that critical.  I get pretty much the same GPU task elapsed times for a given GPU even with 2009-vintage Core 2 Quads with only 4GB RAM.  A lot of my more modern hosts only have 8GB.  However, make sure you use at least dual channel mode - 2 matched sticks.  Maybe you need more RAM with Win10 :-).

I have a dual core host (2x2GB RAM) with a 1GB 550Ti and I started it up again just before the switch to 1.18.  It's running 1 GPU task and 1 CPU task (default settings).  With 1.17 the GPU elapsed times were around 9,700 secs.  With 1.18 this has dropped to around 6,200 secs.  Your 560Ti should do OK as is - 2x4GB RAM I presume? :-).

MAGIC Quantum Mechanic wrote:
I still hope we can get back to the GPU only ...

You can do GPU only right now, can't you?  Just disable the CPU version of the search in your project prefs.  Sure, you need a core per GPU task for support but the remaining cores on each host would be able to be used elsewhere.

MAGIC Quantum Mechanic wrote:
Thanks again Gary and hope all is well.

You're welcome, thank you, and all is very well at the moment.  I'm even setting up a couple of machines with dual GPUs for the first time and discovering bugs in the automated procedure my distro uses for multi-card configuration.  That procedure looks like it's handling the driver selection properly, but only the first card gets the selected proprietary driver; the second card is left with a different (open source) driver, which causes the X server to crash and the whole machine to lock up during startup with a black screen, leaving the reset button as the only viable option.

Luckily I have a basic understanding of xorg.conf configuration, so I quickly noticed the wrong driver (and other things not quite right as well).  I've manually adjusted the config file and now have a dual 2GB HD7850 setup that's running very nicely.  It's a quad-core host with 2x4GB RAM running 4 concurrent GPU tasks and 2 x O1MD tasks on CPU cores.
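For anyone who hits the same thing, the relevant part of the fix was simply making sure each card has a Device section in xorg.conf pointing at the proprietary fglrx driver.  Roughly like this - the identifiers are made up and the BusID values are only examples, so check your own with lspci:

    Section "Device"
        Identifier  "HD7850_0"
        Driver      "fglrx"          # proprietary driver for the first card
        BusID       "PCI:1:0:0"      # example bus ID - confirm with lspci
    EndSection

    Section "Device"
        Identifier  "HD7850_1"
        Driver      "fglrx"          # second card gets the same driver
        BusID       "PCI:2:0:0"      # example bus ID - the x4 slot card in my case
    EndSection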

I'm quite pleased at how close the crunch times are for each GPU seeing as the second PCI-e slot is only x4.  I was expecting at least some sort of a slowdown but at most it seems like around 25 seconds only.  The x16 slot card takes around 2275 secs for a pair and the x4 slot card around 2300 secs (rough figures only but seems consistent).  I'll check properly when I get some time :-).

To fill in all my spare time at the moment, I think I'll go buy another dual-slot board and a couple more AMD cards.  Might be time to try out the new stuff that needs AMDGPU-PRO drivers rather than fglrx.  Should be fun getting that to work on a non-'buntu style of Linux :-).  I can always bother AgentB, who has an RX 480 running on Ubuntu, if I really get stuck.  I imagine he's the local expert on AMDGPU-PRO stuff.

 

Cheers,
Gary.

Kailee71
Joined: 22 Nov 16
Posts: 35
Credit: 42623563
RAC: 0

Bernd Machenschalk wrote:
1.19 Beta released (OSX only)

 

Hi Bernd,

Feeling courageous again after my machines settled back down to 1.17s for a couple of days, I wanted to try these 1.19s again. So I switched the Beta back on in prefs but I keep getting only 1.17s now. Has 1.19 been pulled again?

Thanks for your support!

Kai.
