Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

robl
Joined: 2 Jan 13
Posts: 1,633
Credit: 1,102,670,332
RAC: 692,292

Curious. On my Linux host with an AMD GPU I am seeing run time / CPU time of 1081/525, but on another member's Windows PC it's 1346/1061. The WUs are both "Gravitational Wave search O2 Multi-Directional v1.10 () x86_64-pc-linux-gnu". The Windows PC is utilizing an AMD Radeon (TM) R9 390 Series (8192MB) while I am running an AMD Radeon (TM) RX 480 Graphics (8097MB). I suppose the CPU time difference of 525 vs 1061 could be attributed to the GPUs.

Richie
Joined: 7 Mar 14
Posts: 579
Credit: 1,684,170,539
RAC: 63,040

I think it could be a 'Linux vs Windows' thing. I'm running 3x with an RX 580 on another host. Total run times may fluctuate somewhat (and occasionally there are some strange black sheep included), but the CPU time / run time factor seems to be very consistently 0.7 for that RX 580 + Windows. I checked my two R9 390 + Windows hosts and the same factor for them is consistently 0.8. None of those systems is currently set up for dual-boot; it would've been nice to find out what the CPU times would be under Linux. Hmm, I have a faint memory that something similar about the CPU time of a GPU application in Linux vs. Windows has been discussed earlier on this forum.

edit: I see you have a Ryzen CPU in that host. Perhaps different types of CPUs and systems may have an effect on the CPU times in general... with this new GW GPU app. I don't remember how it was with the previous GPU applications.
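
The CPU time / run time factor discussed above is easy to compute from a host's task stats. A minimal sketch, using the two run time / CPU time pairs quoted earlier in this thread as sample values:

```python
# CPU time / run time factor per host, using the (run_time_s, cpu_time_s)
# pairs quoted above in this thread as sample data.
tasks = {
    "RX 480 + Linux":   (1081, 525),
    "R9 390 + Windows": (1346, 1061),
}

for host, (run_s, cpu_s) in tasks.items():
    factor = cpu_s / run_s
    print(f"{host}: cpu/run factor = {factor:.2f}")
```

The Windows pair works out to about 0.79, which lines up with the ~0.8 observed for the R9 390 + Windows hosts.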

Zalster
Joined: 26 Nov 13
Posts: 3,059
Credit: 3,341,604,897
RAC: 0

Wow, these GPU O2MD1 tasks run really fast compared to their CPU counterparts. 190 seconds compared to 25K seconds.

 

Edit..  Also looks like the app got updated to 1.10 from 1.09

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,575,393,583
RAC: 44,343,833

In this message in the O2AS discussion thread, I posted data for crunch times for the V1.09 GPU app when running on 4 different CPU/GPU combinations for task multiplicities up to 4x.  The results showed significant improvement in output (ie reduction in secs/task) in all cases when using the higher multiplicities.  With the advent of the O2MD1 search using GPUs, I'm keen to get similar information for the new V1.10 app.  It's a different type of search (directed at specific targets rather than covering the whole sky) so performance is likely to be quite different.

I decided to use 2 hosts that were used in the previous tests, the Q6600/RX 460 and the i5-3470/RX 570 which were the 1st and 4th from the previous list.  Both hosts got work for frequencies around the 215Hz mark so well above the low end values that were reported by others earlier on.

I found these tasks were able to crunch very quickly.  There are already enough returned results to provide some information about expected crunch times so I'll give details here using the same format (columns, abbreviations, etc) as previously.  Each concurrent GPU task had access to the support of a full CPU core.


CPU / GPU (Cores / Threads / GHz)    Tsks   Multi     Pnd   Val   Inc   Inv   Err   Productivity values (secs/task)
=================================    ====   =======   ===   ===   ===   ===   ===   ===============================
Q6600 / RX 460 (4C / 4T / 2.4 GHz)     20   1,2,3      20     0     0     0     0   1300s,  975s,  712s
i5-3470/RX 570 (4C / 4T / 3.2 GHz)     28   1,2,3,4    28     0     0     0     0    586s,  380s,  330s,  312s


Only small numbers were crunched at the lowest multiplicities - just enough to get a basic value for the crunch time.  The bulk of the results were at 3x for the RX 460 and 4x for the RX 570.  The crunch times seemed to become rather more variable for the 570 at 4x so I didn't try 4x for the 460 or anything higher than 4x for the 570.  There was good consistency in the times for both hosts up to 3x.

The CPU time component for each task was surprisingly constant irrespective of multiplicity.  I guess that suggests a fairly constant amount of CPU work per task which shows as a relatively uniform time if there's always a full core available.  The slower CPU will use more time to provide that constant amount of work.  Here is a small table to show the typical values of elapsed time/CPU time for both hosts at the multiplicities used.


GPU Type    Multi    Elapsed    CPU    Tsks
========    =====    =======    ===    ====
RX 460        1x        1300    496       1
RX 460        2x        1950    509       4
RX 460        3x        2135    452      15
RX 570        1x         586    278       1
RX 570        2x         760    278       4
RX 570        3x         990    286       3
RX 570        4x        1246    310      20
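
The productivity values (secs/task) in the first table follow directly from these elapsed times: elapsed time divided by task multiplicity. A quick sketch using the figures above:

```python
# Per-task productivity (secs/task) = elapsed time / multiplicity,
# using the (gpu, multiplicity, elapsed_s) values from the table above.
runs = [
    ("RX 460", 1, 1300),
    ("RX 460", 2, 1950),
    ("RX 460", 3, 2135),
    ("RX 570", 1,  586),
    ("RX 570", 2,  760),
    ("RX 570", 3,  990),
    ("RX 570", 4, 1246),
]

for gpu, multi, elapsed in runs:
    print(f"{gpu} at {multi}x: {elapsed / multi:.0f} secs/task")
```

For example, the RX 460 at 3x gives 2135 / 3 ≈ 712 secs/task, matching the productivity column in the first table.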

All results so far are pending.  Since it may be a while before any validations are performed, I've switched the hosts back to FGRPB1G until it becomes clear that validation is OK.  I don't see much point in crunching more until we see how validation goes.

Cheers,
Gary.

Richie
Joined: 7 Mar 14
Posts: 579
Credit: 1,684,170,539
RAC: 63,040

Zalster wrote:
Wow, these GPU O2MD1 run really fast compared to their CPU counterparts.  190 seconds compared to 25K seconds.

Looks like that particular GPU can finish its duty cycle before the CPU gets its own workload done... run times are smaller than CPU times. That's computational speed metal!

cecht
Joined: 7 Mar 18
Posts: 709
Credit: 777,531,688
RAC: 346,271

For the v1.10 app on my two RX 570s, running at 3X, I have 10 valids, with 280 pending and no errors or invalids yet. That's looking hopeful!

Ideas are not fixed, nor should they be; we live in model-dependent reality.

robl
Joined: 2 Jan 13
Posts: 1,633
Credit: 1,102,670,332
RAC: 692,292

cecht wrote:
For the v1.10 app on my two RX570s, running at 3X, I have 10 valids, with 280 pending and no errors or invalids, yet. That's looking hopeful!

Yes, except for:  https://einsteinathome.org/goto/comment/173777

cecht
Joined: 7 Mar 18
Posts: 709
Credit: 777,531,688
RAC: 346,271

robl wrote:
cecht wrote:
For the v1.10 app on my two RX570s, running at 3X, I have 10 valids, with 280 pending and no errors or invalids, yet. That's looking hopeful!

Yes, except for:  https://einsteinathome.org/goto/comment/173777

Ahh, the joy of beta testing. :/

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,210
Credit: 43,575,393,583
RAC: 44,343,833

cecht wrote:
Ahh, the joy of beta testing. :/

After I did some initial testing yesterday, I got quite a few more tasks but then decided not to run them. I was tempted to think all would be well but, .... so back to FGRPB1G they went for the overnight run. As I survey the scene this morning, I'm sure glad I was cautious :-).

Unfortunately, a new app with more sensitivity sounds like longer crunch times ....  I guess that dramatic speed increase we were seeing may just be too good to be true :-).

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,938
Credit: 200,208,828
RAC: 43,704

Our internal tests showed a runtime increase of about 20% (both CPU and GPU). We thought this to be justified.

 

BM
