Gravitational Wave search GPU App version

Thanks to the excellent work of our French volunteer Christophe Choquet, we finally have a working OpenCL version of the Gravitational Wave search ("S6CasA") application. Thank you, Christophe!

This App version is currently considered 'Beta' and is being tested on Einstein@Home. To participate in the Beta test, you need to edit your Einstein@Home preferences and set "Run beta/test application versions?" to "yes".

It is currently available for Windows (32 Bit) and Linux (64 Bit) only, and you should have a card which supports double precision FP in hardware.

BM

Comments

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,699,403
RAC: 0

Quote:
It is currently available for Windows (32 Bit) and Linux (64 Bit) only, and you should have a card which supports double precision FP in hardware.


On my Win 7 x64 i5-3210M/GT650M/Intel_Graphics_HD4000 host I'm getting:

Quote:

2014-04-11 11:18:14.2272 [PID=29146] Request: [USER#xxxxx] [HOST#8941572] [IP xxx.xxx.xxx.80] client 7.2.42
2014-04-11 11:18:14.2881 [PID=29146] [send] effective_ncpus 3 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2014-04-11 11:18:14.2881 [PID=29146] [send] effective_ngpus 2 max_jobs_on_host_gpu 999999
2014-04-11 11:18:14.2881 [PID=29146] [send] Not using matchmaker scheduling; Not using EDF sim
2014-04-11 11:18:14.2882 [PID=29146] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2014-04-11 11:18:14.2882 [PID=29146] [send] CUDA: req 17926.81 sec, 0.50 instances; est delay 0.00
2014-04-11 11:18:14.2882 [PID=29146] [send] Intel GPU: req 0.00 sec, 0.00 instances; est delay 0.00
2014-04-11 11:18:14.2882 [PID=29146] [send] work_req_seconds: 0.00 secs

Snip

2014-04-11 11:18:14.2882 [PID=29146] [send] active_frac 0.935463 on_frac 0.963980 DCF 1.528061
2014-04-11 11:18:14.2897 [PID=29146] [send] [HOST#8941572] is reliable
2014-04-11 11:18:14.2897 [PID=29146] [send] set_trust: random choice for error rate 0.029197: yes
2014-04-11 11:18:14.2897 [PID=29146] [mixed] sending non-locality work first (0.5707)
2014-04-11 11:18:14.3054 [PID=29146] [mixed] sending locality work second
2014-04-11 11:18:14.5549 [PID=29146] [version] Checking plan class 'SSE2'
2014-04-11 11:18:14.5561 [PID=29146] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2014-04-11 11:18:14.5561 [PID=29146] [version] plan class ok
2014-04-11 11:18:14.5561 [PID=29146] [version] Don't need CPU jobs, skipping version 105 for einstein_S6CasA (SSE2)
2014-04-11 11:18:14.5561 [PID=29146] [version] Checking plan class 'GWopencl-ati-Beta'
2014-04-11 11:18:14.5561 [PID=29146] [version] beta test app versions not allowed in project prefs.
2014-04-11 11:18:14.5561 [PID=29146] [version] Checking plan class 'GWopencl-nvidia-Beta'
2014-04-11 11:18:14.5562 [PID=29146] [version] beta test app versions not allowed in project prefs.
2014-04-11 11:18:14.5562 [PID=29146] [version] Checking plan class 'SSE2-Beta'
2014-04-11 11:18:14.5562 [PID=29146] [version] beta test app versions not allowed in project prefs.
2014-04-11 11:18:14.5562 [PID=29146] [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#9 (windows_x86_64) min_version 0
2014-04-11 11:18:14.5562 [PID=29146] [version] no app version available: APP#24 (einstein_S6CasA) PLATFORM#2 (windows_intelx86) min_version 0
2014-04-11 11:18:14.5580 [PID=29146] [debug] [HOST#8941572] MSG(high) No work sent
2014-04-11 11:18:14.5580 [PID=29146] [debug] [HOST#8941572] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/8941/8941572
2014-04-11 11:18:14.5580 [PID=29146] Sending reply to [HOST#8941572]: 0 results, delay req 60.00
2014-04-11 11:18:14.5583 [PID=29146] Scheduler ran 0.338 seconds
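For anyone who wants to pull the per-plan-class verdicts out of a scheduler log like the one above, here is a minimal sketch. It assumes the log layout shown in this post (a "Checking plan class" line followed by the verdict on a later [version] line). The function name and the sample list are illustrative only and not part of BOINC.

```python
import re

# Matches the "[version] Checking plan class 'NAME'" lines seen in the log above.
PLAN_CLASS = re.compile(r"\[version\] Checking plan class '([^']+)'")
# Captures the message part of any other "[version] ..." line.
VERSION_MSG = re.compile(r"\[version\] (.*)$")


def plan_class_verdicts(lines):
    """Map each plan class to the first verdict line that follows it."""
    verdicts = {}
    current = None
    for line in lines:
        m = PLAN_CLASS.search(line)
        if m:
            current = m.group(1)
            continue
        m = VERSION_MSG.search(line)
        if m and current is not None and current not in verdicts:
            msg = m.group(1).strip()
            # Skip the line that only reports which spec file is being read.
            if msg.startswith("reading plan classes"):
                continue
            verdicts[current] = msg
    return verdicts


# A few lines copied from the log quoted above.
SAMPLE = [
    "2014-04-11 11:18:14.5549 [PID=29146] [version] Checking plan class 'SSE2'",
    "2014-04-11 11:18:14.5561 [PID=29146] [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'",
    "2014-04-11 11:18:14.5561 [PID=29146] [version] plan class ok",
    "2014-04-11 11:18:14.5561 [PID=29146] [version] Checking plan class 'GWopencl-ati-Beta'",
    "2014-04-11 11:18:14.5561 [PID=29146] [version] beta test app versions not allowed in project prefs.",
    "2014-04-11 11:18:14.5561 [PID=29146] [version] Checking plan class 'GWopencl-nvidia-Beta'",
    "2014-04-11 11:18:14.5562 [PID=29146] [version] beta test app versions not allowed in project prefs.",
]

verdicts = plan_class_verdicts(SAMPLE)
for plan_class, verdict in verdicts.items():
    print(f"{plan_class}: {verdict}")
```

Run against the full log, this makes it easy to see that every Beta plan class was rejected for the same reason.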

I've already got CPU beta CasA 1.06 work from this host. It is on the home venue, and "Run beta" is set to yes at that venue.
Generally, on Windows x64, projects supply 32-bit CUDA apps because there is no need for 64-bit addressing, and 64-bit CUDA apps run slower because of the 64-bit addressing.

Claggy

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Thanks for reporting.

Should work now.

BM

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

After editing my prefs and upping the cache a bit I managed to get a few S6 tasks assigned to my GTX660Ti. I then suspended the other GPU tasks in the queue to try one out while I'm here to check on things.
All tasks immediately got a computational error with the following in the stderr:
(unknown error) - exit code -1073741515 (0xc0000135)

Usually a sign of a missing file or .dll etc. The only file downloaded was:
einstein_S6CasA_1.06_windows_intelx86__GWopencl-nvidia-Beta.exe
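The two numbers in the error line are the same 32-bit value, once as signed decimal and once as unsigned hex. 0xC0000135 is the Windows NTSTATUS code STATUS_DLL_NOT_FOUND, which fits the missing-DLL suspicion. A minimal sketch of the conversion (the helper name is made up for illustration):

```python
def to_ntstatus(exit_code: int) -> str:
    """Return the unsigned 32-bit hex form of a signed Windows exit code."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(to_ntstatus(-1073741515))
```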

Next step will be to try a driver upgrade to make sure all files are present and accounted for.
Other GPU work (BRP4G and FGRP3) run OK.

Edit: Updated the Nvidia driver to 335.23 via clean install but that did not change things, still getting instant error with the above error message.
Testing on hold until further notice.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,926,414,666
RAC: 787,235

Quote:

After editing my prefs and upping the cache a bit I managed to get a few S6 tasks assigned to my GTX660Ti. I then suspended the other GPU tasks in the queue to try one out while I'm here to check on things.
All tasks immediately got a computational error with the following in the stderr:
(unknown error) - exit code -1073741515 (0xc0000135)

Usually a sign of a missing file or .dll etc. The only file downloaded was:
einstein_S6CasA_1.06_windows_intelx86__GWopencl-nvidia-Beta.exe

Next step will be to try a driver upgrade to make sure all files are present and accounted for.
Other GPU work (BRP4G and FGRP3) run OK.


That can sometimes be unravelled by using Dependency Walker.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Is the Windows version running successfully anywhere?

BM

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

Got this with dependency walker when opening "einstein_S6CasA_1.06_windows_intelx86__GWopencl-nvidia-Beta.exe":

The Swedish phrase "Det går inte att hitta filen" translates to "File not found".

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,926,414,666
RAC: 787,235

Quote:

Is the Windows version running successfully anywhere?

BM


I'll test too.

Freddykrug
Joined: 29 May 10
Posts: 2
Credit: 12,047,497
RAC: 0

Sorry, but why is this not done at Albert@home?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,926,414,666
RAC: 787,235

Got a similar but rather shorter list of missing files with the 32-bit version of dependency walker (bitness matters, with that tool).

Host is host 5744895 - 64-bit Windows 7 with NV GTX 670, driver 335.23 (about 4 weeks ago).

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Thanks. Looks like something went wrong with the build. While investigating, I'll disable the current Windows Beta App version.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,926,414,666
RAC: 787,235

Googling suggests the problem might be related to missing Microsoft Visual Studio runtime redistributable packages. Are you using either VS 2008 or VS 2010 - if so, which?

(tasks are erroring, as Holmis described, but I'll save some for testing later)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Quote:
Are you using either VS 2008 or VS 2010 - if so, which?

None - MinGW.

BM

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,143
Credit: 2,926,414,666
RAC: 787,235

OK, those API- exports are probably not relevant, then.

Maybe these are more significant, if you can recognise any of them?

[D? ] DCOMP.DLL

Import  Ordinal        Hint  Function                  Entry Point
------  -------------  ----  ------------------------  -----------
[OE ]   1017 (0x03F9)  N/A   N/A                       Not Bound
[CE ]   N/A            N/A   DCompositionCreateDevice  Not Bound

[D? ] GPSVC.DLL

Import  Ordinal  Hint  Function                               Entry Point
------  -------  ----  -------------------------------------  -----------
[CE ]   N/A      N/A   ProcessGroupPolicyCompletedExInternal  Not Bound
[CE ]   N/A      N/A   RsopAccessCheckByTypeInternal          Not Bound
[CE ]   N/A      N/A   RsopFileAccessCheckInternal            Not Bound
[CE ]   N/A      N/A   RsopSetPolicySettingStatusInternal     Not Bound
[CE ]   N/A      N/A   ProcessGroupPolicyCompletedInternal    Not Bound
[CE ]   N/A      N/A   RsopResetPolicySettingStatusInternal   Not Bound

[D? ] IESHIMS.DLL

Import  Ordinal  Hint  Function                              Entry Point
------  -------  ----  ------------------------------------  -----------
[CE ]   N/A      N/A   IEShims_Initialize                    Not Bound
[CE ]   N/A      N/A   IEShims_InDllMainContext              Not Bound
[CE ]   N/A      N/A   IEShims_GetOriginatingThreadId        Not Bound
[CE ]   N/A      N/A   IEShims_CreateWindowEx                Not Bound
[CE ]   N/A      N/A   IEShims_SetRedirectRegistryForThread  Not Bound

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

The problem seems to be that libstdc++-6.dll is not linked statically into the App. The version of the MinGW compiler that I used for the first time apparently requires an additional option for this (-static-libstdc++).

Will build a new App, however I doubt that I can publish it before Monday.

BM

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160,342,159
RAC: 0

just a heads-up - my work host recently downloaded 2 S6casA 1.06 (SSE2 Beta) tasks, and this host most certainly is not set to accept beta/test applications. also, it shows as SSE2, not OpenCL...are there now SSE2 Beta tasks that i haven't yet seen mentioned on the boards?

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

Quote:
just a heads-up - my work host recently downloaded 2 S6casA 1.06 (SSE2 Beta) tasks, and this host most certainly is not set to accept beta/test applications. also, it shows as SSE2, not OpenCL...are there now SSE2 Beta tasks that i haven't yet seen mentioned on the boards?


Yes, the CPU beta apps were announced in the Tech news section here.

sorcrosc
Joined: 3 May 13
Posts: 8
Credit: 16,039,340
RAC: 0

Quote:
Sorry, but why is this not done at Albert@home?

I'm also curious about this. Isn't albert@home the beta test platform for einstein@home?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Quote:
Sorry, but why is this not done at Albert@home?

Albert@Home was originally set up to test server side code, including changes that were necessary to support new applications (e.g. locality scheduling). Historically we were testing Beta App versions on Einstein as "anonymous platform packages" as long as we had only one search (i.e. application). With the introduction of a second search (BRP), maintaining app_info.xml files became a hassle, so we switched to testing new app versions on Albert. This, however, has its own drawbacks:

- Providing the right amount of "work" of the right type (that belongs to the App version that we want to test) is getting increasingly difficult. We e.g. need to maintain a separate line of code for each workunit generator.

- The computing power and thus throughput on Albert is not very high, so you need to repeatedly reconfigure the system to not waste any on applications that you don't want to test right now.

- When you issue a new series of application versions and generate work for it, validation compares results of these only to that of versions of the same series. A comparison to the result of an established version - which is what we really want - happens only in very rare, accidental cases.

- The variety of systems attached to Albert is not at all representative of what's running on Einstein.

- Due to the low throughput, especially when tasks are assigned with additional constraints (locality scheduling), feedback (i.e. validation) is very slow, which slows down development.

- App version testing is not independent of the server code. Occasionally the need to test application versions prevented us from testing server side changes on Albert, or at least required us to postpone these.

For us these were enough arguments to shift testing of application versions back to Einstein. Server code and new applications will still be tested on Albert, and I think for some time at least BRP app versions, too.

BM

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 569,653,661
RAC: 161,307

Quote:
and you should have a card which supports double precision FP in hardware.


Ouch - this could mean serious trouble! Does the entire calculation need DP or is it only for a few calculations? In the former case the app will work on all modern GPUs (except Intel), but will be very slow on all but a few high end chips.

You're surely aware of this, but most crunchers are not (as evidenced by the many people running Milkyway on hardware which is really not suitable for that task). Ideally any GPU slow in DP should rather stick with BRP calculations, as long as E@H has enough of those. Leave the DP stuff for the "big guns" and CPUs (ideally with SSE3/4 or AVX1/2).

But I don't know how to handle this properly and educate users. It would be a shame if they only found out about this after their RAC dropped to 1/2, 1/4 or even worse.

MrS

Scanning for our furry friends since Jan 2002

choks
Joined: 24 Feb 05
Posts: 16
Credit: 142,371,364
RAC: 152,902

Double precision is used to rebuild a reference grid with correct numerical accuracy. This is called in about 0.2% of the loops.

The other 99.8% of the loops are SP and run a heavily memory-intensive algorithm. It pushes the GPU to its memory I/O limits.
But the rebuild of the grid is about 1/5 of total execution time.

The performance comes from:
- The memory bandwidth. This is by far the limiting factor, and the time to execute a task scales with GPU memory throughput. Even a 4-year-old card with fast GDDR5 performs well.
- SP performance, if not memory bound. This is unlikely to happen, because the inner loop is 4 adds, one compare and 4 single-float loads. Loading a float4 takes ages compared to performing the maths.

DP is just required for accuracy but doesn't play a significant role in execution time.

I tried to reduce memory writes as much as I can to speed up the calculations. A data filtering algorithm is used to reduce memory writes and GPU->CPU transfers to the lowest possible values.

Christophe

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,882,199
RAC: 128,978

Quote:
just a heads-up - my work host recently downloaded 2 S6casA 1.06 (SSE2 Beta) tasks, and this host most certainly is not set to accept beta/test applications

This needs to be fixed.
[EDIT] Actually that was already fixed [EDIT]
For the moment the beta test app versions are all disabled, so no new work should be distributed for them.

I spotted on one of my Macs that a Mac Intel 32 bit beta app was actually hanging (no progress in two days). If volunteers spot this on their Macs: feel free to abort the task.

But we did get some valuable clues from the beta test so far, thanks to all who (voluntarily or by accident...sorry for that) participated in it. I've seen run times for some CasA units below 800 sec from an NVIDIA card which is great!! It also demonstrates that (as Christophe explained) the double precision requirement doesn't hurt performance significantly (it was a card that, like most if not all NVIDIA consumer cards, has a rather pathetic double precision performance compared to AMD cards of the same price & performance range).

Cheers
HB

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,305
Credit: 248,987,352
RAC: 34,098

Quote:
just a heads-up - my work host recently downloaded 2 S6casA 1.06 (SSE2 Beta) tasks, and this host most certainly is not set to accept beta/test applications.

There were a few problems in the server scheduler code related to Beta App versions, but these should have been fixed around noon (CEST) on Friday.

When precisely did your host get these tasks? And more importantly: does this still happen?

BM

Claggy
Joined: 29 Dec 06
Posts: 560
Credit: 2,699,403
RAC: 0

Quote:
And more importantly: does this still happen?


That shouldn't be possible at the moment:

Quote:
For the moment the beta test app versions are all disabled, so no new work should be distributed for them.

My i7-2600K has just picked up normal (CasA) v1.05 (SSE2) work even though Run beta is selected.

Claggy

Sunny129
Joined: 5 Dec 05
Posts: 162
Credit: 160,342,159
RAC: 0

Quote:
Quote:
just a heads-up - my work host recently downloaded 2 S6casA 1.06 (SSE2 Beta) tasks, and this host most certainly is not set to accept beta/test applications.

There were a few problems in the server scheduler code related to Beta App versions, but these should have been fixed around noon (CEST) on Friday.

When precisely did your host get these tasks? And more importantly: does this still happen?

BM


unfortunately i had to restart my machine yesterday, so my BOINC event log got wiped clean and i can no longer pinpoint exactly when those two SSE2 Beta tasks were downloaded. that said, i just scrolled all the way to the bottom of my work buffer, and while several S6casA SSE2 tasks have been downloaded since then, i'm happy to say that i've not gotten any more S6casA SSE2 Beta tasks...so whatever you did fixed the problem and prevented my host from receiving Beta tasks when it shouldn't.

thanks Bernd!

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

You can still access the event log messages in the file stdoutdae.txt, located in the BOINC data directory. If you want even older messages, check the file stdoutdae.old.
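If the files are large, a few lines of Python can pull out just the relevant messages. This is only a sketch: the function name is made up, and the data directory passed in is whatever your installation uses (the path in the usage note is an assumption, not something confirmed in this thread).

```python
from pathlib import Path


def find_log_lines(data_dir, needle):
    """Yield (file name, line) for log lines containing `needle`, older file first."""
    for name in ("stdoutdae.old", "stdoutdae.txt"):
        path = Path(data_dir) / name
        if not path.is_file():
            continue  # e.g. no .old file has been rotated yet
        with path.open(encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    yield name, line.rstrip("\n")
```

For example, `list(find_log_lines(r"C:\ProgramData\BOINC", "S6CasA"))` would list every logged message mentioning S6CasA, assuming that is where your data directory lives.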

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 569,653,661
RAC: 161,307

Thanks for the detailed explanation, Christophe! This sounds really good and well developed.. not that I would have expected anything else from Einstein ;)

Concerning the memory bandwidth requirements: it will be interesting to see how much the significantly larger L2 cache of Maxwell helps (assuming the other chips will share this property of GM107).

MrS

choks
Joined: 24 Feb 05
Posts: 16
Credit: 142,371,364
RAC: 152,902

Large L2 caches probably won't help that much in this case (unless the whole dataset can be cached).

The dataset used for the inner loop is 50-100 MB. In each iteration, frequency-shifted data are fetched almost linearly through all available GPU memory controllers.

For a given memory frequency, a wider bus is better. A bus width of 384 bits or more will do well.
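To put rough numbers on the bus width point, theoretical peak bandwidth is the bus width in bytes times the effective data rate. A minimal sketch follows; the 6.0 GT/s data rate is an illustrative assumption, not a figure taken from this thread, and real throughput will be lower than the theoretical peak.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gt_s: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (bytes per transfer x transfers per second)."""
    return bus_width_bits / 8 * data_rate_gt_s


# Same (assumed) memory data rate, different bus widths.
for bus in (128, 192, 256, 384):
    print(f"{bus}-bit bus: {peak_bandwidth_gb_s(bus, 6.0):.0f} GB/s")
```

At the same memory clock, a 384-bit card has twice the peak bandwidth of a 192-bit card, which is why bus width matters so much for a memory-bound kernel.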

-c

choks
Joined: 24 Feb 05
Posts: 16
Credit: 142,371,364
RAC: 152,902

Just adding one thing about the NVIDIA Maxwell architecture and its 2 MB L2 cache.

Currently, the GPU internal caches are rather small. So every tuning guide tells you to read memory in consecutive locations (no gaps).

But for the GW search, we reload the same dataset about 40-100 times, slightly shifted each time.
So I was thinking of changing the order of the internal loops: using random reads inside the L2 cache, with a computation window moving across the 100k sample dataset.

Once the HW is released, it would be interesting to see how a change to the code can improve performance.

After all, it works very well on CPU with big L2, so one could expect the same on GPU. Worth a try!
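The loop reordering idea can be illustrated in plain Python. This is only a toy sketch and not the actual OpenCL kernel: the summation stands in for the real per-sample work, and all names and sizes are made up. It shows that iterating window by window (so that one window's data could stay in cache while all shifts are applied) gives the same result as streaming the whole dataset once per shift.

```python
def shift_major(data, n_out, n_shifts):
    """Outer loop over shifts: every shift streams through the whole dataset."""
    acc = [0] * n_out
    for s in range(n_shifts):
        for i in range(n_out):
            acc[i] += data[i + s]
    return acc


def window_major(data, n_out, n_shifts, window):
    """Outer loop over windows: all shifts are applied while one window is 'hot'."""
    acc = [0] * n_out
    for start in range(0, n_out, window):
        stop = min(start + window, n_out)
        for s in range(n_shifts):
            for i in range(start, stop):
                acc[i] += data[i + s]
    return acc


n_out, n_shifts = 1000, 40
data = list(range(n_out + n_shifts))  # padded so that i + s stays in range
same = shift_major(data, n_out, n_shifts) == window_major(data, n_out, n_shifts, window=64)
print("results identical:", same)
```

Whether the window-major order is actually faster on a GPU depends on the cache size and access pattern, which is exactly what would need measuring on real hardware.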

-c

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 569,653,661
RAC: 161,307

That sounds like a good idea, if it works well! Especially if it was also at least as fast as the current version on regular GPUs.

BTW: the first Maxwells are available as GTX750/Ti. I'd expect the bigger chips to feature comparable compute to memory bandwidth balance, but don't know if and how the amount of L2 will be scaled.

MrS

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186,610,495
RAC: 0

I find that description by Christophe very interesting. It would be great to have a summary of basic characteristics for other applications as well, something roughly like this:

Application: S6CasA GPU
Required: DP capability, 1GB GPU RAM
Important: Memory bandwidth
Less important: SP performance, GPU interface speed
Negligible: DP performance

That could make the choice of applications or hardware much easier.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 503,804,968
RAC: 43,410

Quote:

Just adding one thing about the NVIDIA Maxwell architecture and its 2 MB L2 cache.

Currently, the GPU internal caches are rather small. So every tuning guide tells you to read memory in consecutive locations (no gaps).

But for the GW search, we reload the same dataset about 40-100 times, slightly shifted each time.
So I was thinking of changing the order of the internal loops: using random reads inside the L2 cache, with a computation window moving across the 100k sample dataset.

Once the HW is released, it would be interesting to see how a change to the code can improve performance.

After all, it works very well on CPU with big L2, so one could expect the same on GPU. Worth a try!

-c

Hi,
I have a GTX750ti but no system with win x86 installed.
Will there be a version for win 64 in the near future?

Alexander

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 569,653,661
RAC: 161,307

The 32 bit executable should run under 64 bit Win, unless paths or libraries get messed up. Or am I missing something here?

MrS

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,882,199
RAC: 128,978

We launched a new round of beta GPU (OpenCL) app versions for the Gravitational Wave search:

Version 1.07
Linux: 64 bit, NVIDIA, AMD/ATI
Windows 32 bit & 64 bit, NVIDIA, AMD/ATI

Hardware requirements are unchanged. You should only get these versions if "beta test" app versions are enabled in your web preferences.

For the moment we will focus on testing the OpenCL app versions. When those are stable, we will roll out updated beta-test versions of the CPU apps as well.

If you are unsure whether your card supports double precision, I recommend checking Wikipedia's list of GPU cards, e.g.

http://en.wikipedia.org/wiki/Radeon_HD_7000_Series

http://en.wikipedia.org/wiki/Radeon_HD_6000_Series

http://en.wikipedia.org/wiki/Radeon_HD_5000_Series

http://en.wikipedia.org/wiki/GeForce_200_Series

GeForce 400, 500, 600 and 700 cards should ALL support double precision (at least with current drivers).

Those are lists for desktop GPUs. Mobile GPUs and GPUs integrated into a CPU/APU have similar Wikipedia entries.

Many thanks for testing,
HBE

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3,435,840,121
RAC: 1,855,398

I'm running a test on a Tahiti XT now.
Looks weird, since the task shows running state and progress is incremented, but GPU usage is 0%, GPU clock in the lowest state. I've observed only very few and small peaks to a slightly higher state, but most of the time it looks like idle...

-----

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 503,804,968
RAC: 43,410

This host
http://einsteinathome.org/host/6801076/tasks
runs the GW OpenCL NVIDIA app on an NVIDIA GTX750ti.
The WU is not displayed in the normal tasks view; one needs to click on the GW (CasA) tasks to make the WU visible.
Two iGPU Arecibo WUs are also running, so if you want to get 'clean' results subtract 5-8% from the displayed crunching time.

Alexander

Edit
This is what gpu-z shows:
https://dl.dropboxusercontent.com/u/50246791/GW%20opencl%20nvidia%20gpu%20usage.PNG

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3,435,840,121
RAC: 1,855,398

OK, found the problem with low load - CPUs were saturated. After freeing some cores, the GPU usage is now ~62 %.

But the progress jumps up/down. Was already >50 % completed, then jumped down to 9, 14, 45, 53 %...

----

Reached 99%, then stayed a while there, but after completion ended up with error:

Quote:
Output file h1_0814.00_S6Directed__S6CasAf40a_814.8Hz_270_0_0 for task h1_0814.00_S6Directed__S6CasAf40a_814.8Hz_270_0 absent

Machine http://einsteinathome.org/account/tasks&offset=0&show_names=1&state=0&appid=24

---

Runtime on R9 280X: 685 secs

-----

Peciak
Joined: 16 Jun 09
Posts: 2
Credit: 74,133,035
RAC: 0

WIN 7 64, i7 377K ATI 7970 driver 14.3 -> Error while computing
http://einsteinathome.org/host/10952185/tasks&offset=0&show_names=1&state=5&appid=24

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 503,804,968
RAC: 43,410

First result shows a runtime of 1736.59 sec for GTX750ti

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

First completed unit is in: WUID #187876671, second copy is unsent so it might take a while to get this one verified...

My observations during crunching this task:

* Video RAM usage: ~400MB
* RAM usage: ~400MB
* Task set to use 0.5 CPU + 1 GPU; it actually used one full CPU core for the whole duration of the task.
* Almost 4 min. from the start of the task until the GPU kicked in. This causes newer BOINC versions to start increasing the percentage done until the app actually reports some progress, so the percentage first goes up and then resets to what the app reported. I assume the delay at the beginning is caused by some initial preparation that has to be done at startup. The second task took a shorter time, but I was composing this at the time so I don't know for sure how long it took until the GPU started working.
* GPU load is a zigzag pattern with lows of ~55% and highs of 99% load, with about 4-5 seconds between the shifts. When the load goes down, the memory controller load goes up from 1% to ~50%. Considering Christophe's earlier comments about the bus width, I believe that the 192-bit bus on my 660Ti is quite a bottleneck.
* When the GPU load hits 99% I experience quite severe screen lag, enough to not run this while using the computer for watching video. That's not a problem on this machine when I run the other searches x2 on the GPU.
* And finally the run time: it came in at a whopping 1,744.70s, compared to the 36000s that my CPU (i7 HT=On) manages when averaging the 13 latest valid results! Impressive!

Edit: Second task, WUID 187873674 completed in 1,545.44s when not running the iGPU but one core free.
Edit2: First run was also with one core free but one task running on the iGPU.
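For a quick sense of scale, the speedup implied by the run times reported above works out to roughly 20 to 23 times the CPU average. The short sketch below just does that division, using only the figures from this post.

```python
cpu_seconds = 36000.0  # CPU average quoted above
gpu_runs = {"first task": 1744.70, "second task": 1545.44}

for label, gpu_seconds in gpu_runs.items():
    speedup = cpu_seconds / gpu_seconds
    print(f"{label}: {speedup:.1f}x faster than the CPU average")
```

Keep in mind the GPU task also occupies a full CPU core while it runs, so the net gain per host is somewhat smaller than the raw ratio suggests.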

Senilix
Joined: 25 Mar 05
Posts: 1
Credit: 10,273,659
RAC: 0

My first wu successfully finished after 450 sec on a ATI r9 280x, GPU utilization showed 82%.
Now waiting for my wingman to see if it validates.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,882,199
RAC: 128,978

Thanks for the feedback so far.

Strange, we see a lot of errors in the phase AFTER the GPU computation is already finished, when results are to be written to the disk. We'll have to look a bit deeper to find the root cause.

HB

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 694,882,199
RAC: 128,978

Quote:
My first wu successfully finished after 450 sec on a ATI r9 280x, GPU utilization showed 82%.
Now waiting for my wingman to see if it validates.

Amazing!!!

The ca. 320 GB/s memory bandwidth of this card seems to help a lot.

HB

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80,551,758
RAC: 32

First attempt, WU 431061981 ended in error.

Radeon HD 7950 and AMD Phenom II x4 965

Win7 x64

GPU kicked in at 10% completion and stayed at 70%-75% load until 99% completion.

Total time 703s.

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 503,804,968
RAC: 43,410

Have now first results from HD5830
http://einsteinathome.org/host/10105883/tasks

1590 - 1640 sec.

Edit: first finished wu from AMD A10 7700K APU is in:

http://einsteinathome.org/host/10283382/tasks
3539 sec !!!

Edit:
HD7850 XT (1536 Shaders) failed after 99%

GT 640 (GK208) finished in 6546 sec.

Might it be worth trying on Intel GPUs?

Amauri
Joined: 12 Jul 11
Posts: 7
Credit: 40,549,315
RAC: 17,064

Linux 64-bit, GT 640 -> Error:
http://einsteinathome.org/task/430282776

Quote:

2014-04-14 17:33:31.7874 (6502) [normal]: Reading input data ...
Failed to open SFT '../../projects/einstein.phys.uwm.edu/l1_0978.45_S6Directed' for reading: Permission denied

Failed to open locator '../../projects/einstein.phys.uwm.edu/l1_0978.45_S6Directed : 344152'

I've checked this file's permissions, it's as expected (644)...
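Mode 644 on the file itself is only part of the picture: on Linux, opening a file also needs search (execute) permission on every directory along the path, for the user the process runs as. Whether that is the cause here is not established. The sketch below just checks each component. The function name is made up, and it only gives a meaningful answer when run as the same user as the BOINC client.

```python
import os
from pathlib import Path


def check_path_access(path):
    """Return (component, ok) pairs: search permission for each parent directory,
    read permission for the file itself."""
    target = Path(path).resolve()
    report = []
    # Walk from the filesystem root down to the file's directory.
    for parent in list(target.parents)[::-1]:
        report.append((str(parent), os.access(parent, os.X_OK)))
    report.append((str(target), os.access(target, os.R_OK)))
    return report
```

Calling it on the full path of the SFT file and looking for any `False` entry would show which directory or file is blocking access, if any.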

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80,551,758
RAC: 32

Four up, four down. All WUs have resulted in errors for me at the end of computation on my AMD machine.

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257,957,147
RAC: 0

No errors thus far on six completed on a WinXP machine, with each GPU having an E8400 core.

The first one on my GTX 650 Ti completed in 2,591 seconds (validated).
My GTX 660 is now averaging about 1900 seconds, with the GTX 650 Ti running BRP4G/BRP5s. A faster CPU would of course help this a little.

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80,551,758
RAC: 32

5 of 5 completed successfully on my other machine.

GTX 780Ti and i7-3770k

Win7 x64

Average time = 933s. Perhaps due to less DP capability of Nvidia vs AMD?

rbpeake
Joined: 18 Jan 05
Posts: 266
Credit: 1,098,844,464
RAC: 729,452

How often do these checkpoint? I had been running for over 30 minutes and exited BOINC, and the WU started over from the beginning.

Thanks.

boinc127
Joined: 17 Mar 11
Posts: 23
Credit: 4,007,475
RAC: 0

So far I've run through 8 workunits with the NVidia program, and all have processed correctly, within an hour or less, I'm just waiting for validation from wingmen.

N.B.
Also, before the SSE2-Beta app was pulled, I got 2 results validated and I'm working on the final 3 workunits now. But compared to the opencl program version, the CPU versions take forever!