Support for (integrated) Intel GPUs (Ivy Bridge and later)

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578346872
RAC: 197536

Could you quickly try this in a test-build of your app? Seems like only developers could help any further in this matter.

MrS

Scanning for our furry friends since Jan 2002

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181425482
RAC: 7347

@Oliver
And what AMD SDK version did you use to build intel_gpu version of your app?
Also, there are NV-related comments in your OpenCL code. So is there no OpenCL version for NV just because the CUDA build is faster, or are there other issues?

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181425482
RAC: 7347

Quote:

Could you quickly try this in a test-build of your app? Seems like only developers could help any further in this matter.

MrS


Yes, I will try to replicate all the differences eventually. Cause this 100% CPU issue deserves understanding IMO.

ExtraTerrestrial Apes
Joined: 10 Nov 04
Posts: 770
Credit: 578346872
RAC: 197536

Quote:
Cause this 100% CPU issue deserves understanding IMO.


Yeah.. you could probably improve power efficiency of the iGPU by >50% just by getting rid of the CPU usage! Assuming performance wouldn't suffer, of course.

MrS

Scanning for our furry friends since Jan 2002

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958279587
RAC: 714378

Quote:
Quote:
Cause this 100% CPU issue deserves understanding IMO.

Yeah.. you could probably improve power efficiency of the iGPU by >50% just by getting rid of the CPU usage! Assuming performance wouldn't suffer, of course.

MrS


That was my thinking too. If the CPU could be 'held in reserve but not doing much' (the way the current Einstein app seems to be working), rather than 'spinning like mad', I reckon I'd see a 10W (>10%) reduction in total system power draw. That's a significant drop.
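
To illustrate the difference between 'spinning like mad' and 'held in reserve', here is a minimal sketch. It is not taken from the Einstein@Home app: FakeKernel is a hypothetical stand-in for GPU work that finishes after a fixed time, and the two wait functions are illustrative names. A real OpenCL host would query an event instead of a clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for a GPU kernel: it counts as finished once a
// fixed amount of wall-clock time has passed since construction.
struct FakeKernel {
    Clock::time_point doneAt;
    explicit FakeKernel(std::chrono::milliseconds runtime)
        : doneAt(Clock::now() + runtime) {}
    bool finished() const { return Clock::now() >= doneAt; }
};

// Busy-wait: re-checks completion in a tight loop, so the waiting thread
// keeps one CPU core fully loaded. Returns the number of unsuccessful checks.
inline std::uint64_t waitSpinning(const FakeKernel& k) {
    std::uint64_t polls = 0;
    while (!k.finished()) {
        ++polls;
    }
    return polls;
}

// Sleep-poll: yields the CPU for `interval` between checks, trading a
// little completion latency for a mostly idle core.
inline std::uint64_t waitSleeping(const FakeKernel& k,
                                  std::chrono::milliseconds interval) {
    assert(interval.count() > 0);
    std::uint64_t polls = 0;
    while (!k.finished()) {
        std::this_thread::sleep_for(interval);
        ++polls;
    }
    return polls;
}
```

With a 50 ms fake kernel, the sleeping variant with a 5 ms interval checks at most ten times, while the spinning variant checks continuously for the whole 50 ms. Whether a sleeping wait costs throughput on the real app is exactly the open question in this thread.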

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

Quote:
@Oliver
And what AMD SDK version did you use to build intel_gpu version of your app?

2.6, for our Linux and Windows builds.

Quote:

Also, there are NV-related comments in your OpenCL code. So is there no OpenCL version for NV just because the CUDA build is faster, or are there other issues?

Faster builds? Where did you see that (line number)?

No, our app just won't validate correctly when run on NVIDIA GPUs. We intended/developed it to be able to, but never got round to getting it working correctly. I'd have loved to drop CUDA in favour of OpenCL, but as I said, OpenCL is a dead end concerning NVIDIA...

Oliver

Einstein@Home Project

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181425482
RAC: 7347

I was talking about these lines in the code (they imply you considered the OpenCL build suitable for NV too, hence my question about speed/validity. The answer is validity issues, OK):

// defined in OpenCL 1.1 (but Apple is still using 1.0)
#ifndef CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
#define CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV 0x4000
#define CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV 0x4001
#endif

// NVIDIA-specific
bool nvidia = false;
cl_uint nvCompCapMajor = 0;
cl_uint nvCompCapMinor = 0;

in ocl_utilities.cpp.

#define VENDOR_AMD 1
#define VENDOR_NVIDIA 2

in demod_binary_ocl.cpp

BTW, vendor INTEL not defined at all.
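
For illustration only (this is not code from the Einstein@Home sources), a vendor check that also covers Intel could look like the sketch below. VENDOR_UNKNOWN, VENDOR_INTEL and classifyVendor are hypothetical names. The Intel string is the one the app prints in its log ("Intel(R) Corporation"); the NVIDIA and AMD substrings are assumptions about what those drivers commonly report for CL_DEVICE_VENDOR.

```cpp
#include <string>

#define VENDOR_UNKNOWN 0  // hypothetical: not in the original source
#define VENDOR_AMD 1
#define VENDOR_NVIDIA 2
#define VENDOR_INTEL 3    // hypothetical: not in the original source

// Maps a CL_DEVICE_VENDOR-style string to one of the VENDOR_* constants.
// The substrings are assumptions about typical driver output, not values
// guaranteed by the OpenCL specification.
inline int classifyVendor(const std::string& vendorString) {
    if (vendorString.find("Intel") != std::string::npos) {
        return VENDOR_INTEL;
    }
    if (vendorString.find("NVIDIA") != std::string::npos) {
        return VENDOR_NVIDIA;
    }
    if (vendorString.find("Advanced Micro Devices") != std::string::npos ||
        vendorString.find("AMD") != std::string::npos) {
        return VENDOR_AMD;
    }
    return VENDOR_UNKNOWN;
}
```

In a real host program the input would come from clGetDeviceInfo with CL_DEVICE_VENDOR; it is kept as a plain string here so the sketch compiles without OpenCL headers.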

PS. I run BOINC 6 on my dev netbook, where most of the AstroPulse profiling is done, and your app requires BOINC 7 to be downloaded. So I installed CodeXL on another host, with an ATi HD6950 + BOINC 7, but have not had time to try profiling so far.
Will post results later (app downloaded with some task to do OK there).

Oliver Behnke
Moderator
Administrator
Joined: 4 Sep 07
Posts: 984
Credit: 25171438
RAC: 34

Quote:
you considered the OpenCL build suitable for NV too, hence my question about speed/validity. The answer is validity issues, OK


Yep.

Quote:


#define VENDOR_AMD 1
#define VENDOR_NVIDIA 2

in demod_binary_ocl.cpp

BTW, vendor INTEL not defined at all.

Yep, we don't have any Intel-specifics in the code so far (even VENDOR_AMD is unused right now)...

Quote:

Will post results later (app downloaded with some task to do OK there).


Great!

Einstein@Home Project

Raistmer*
Joined: 20 Feb 05
Posts: 208
Credit: 181425482
RAC: 7347

Quote:
[22:17:42][7988][INFO ] Seed for random number generator is 1147371725.
Activated exception handling...
22:19:04 (5132): Can't set up shared mem: -1. Will run in standalone mode.
22:19:04 (5132): called boinc_finish
Activated exception handling...
22:19:51 (5224): Can't set up shared mem: -1. Will run in standalone mode.
22:19:51 (5224): called boinc_finish


Can't run offline, the app exits after a few seconds...

Should I supply some command line options for it? I copied the real executable into the corresponding slot directory and am trying to run it from there (BOINC switched off).

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2958279587
RAC: 714378

Well, I tried much the same, and it starts off:

Activated exception handling...
19:54:35 (3092): Can't set up shared mem: -1. Will run in standalone mode.
[19:54:35][3092][INFO ] Starting data processing...
[19:54:35][3092][INFO ] Using OpenCL platform provided by: Intel(R) Corporation
[19:54:35][3092][INFO ] Using OpenCL device "Intel(R) HD Graphics 4600" by: Intel(R) Corporation
[19:54:35][3092][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[19:54:35][3092][INFO ] Header contents:
------> Original WAPP file: ./p2030.20130203.G203.76-01.67.N.b5s0g0.00000_DM77.70
------> Sample time in microseconds: 65.4762
------> Observation time in seconds: 274.62705
------> Time stamp (MJD): 56327.094780941181
------> Number of samples/record: 0
------> Center freq in MHz: 1214.289551
------> Channel band in MHz: 0.33605957
------> Number of channels/record: 960
------> Nifs: 1
------> RA (J2000): 62859.8098984
------> DEC (J2000): 72421.9974003
------> Galactic l: 0
------> Galactic b: 0
------> Name: G203.76-01.67.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 0
------> ZA at start: 0
------> AST at start: 0
------> LST at start: 0
------> Project ID: --
------> Observers: --
------> File size (bytes): 0
------> Data size (bytes): 0
------> Number of samples: 4194304
------> Trial dispersion measure: 77.7 cm^-3 pc
------> Scale factor: 0.000858516
[19:54:36][3092][INFO ] Seed for random number generator is 1174326477.
[19:54:38][3092][ERROR] Error in OpenCL context: Out of device memory.
[19:54:38][3092][ERROR] Error in OpenCL context: Out of device memory.
[19:54:38][3092][INFO ] Derived global search parameters:
------> f_A probability = 0.08
------> single bin prob(P_noise > P_thr) = 1.32531e-008
------> thr1 = 18.139
------> thr2 = 21.241
------> thr4 = 26.2686
------> thr8 = 34.6478
------> thr16 = 48.9581


(and is still running). Pretty much the same as the live result for task 406867971.

Recipe:
Copied all the symlinks out of the slot directory into a temporary folder for reference.
Made a test working folder with the full versions of the same files.
Looked in client_state for the description of the same task.
Concatenated a command line from the data there:

Quote:
C:\Test\Offline test files>einsteinbinary_BRP4_1.34_windows_x86_64__opencl-intel_gpu.exe -i p2030.20130203.G203.76-01.67.N.b5s0g0.00000_777.bin4 -t stochastic_full.bank -l p2030.20130203.G203.76-01.67.N.b5s0g0.00000.zap -o results.cand0 -c status.cpt -A 0.08 -P 3.0 -f 400.0 -W -z


Hit enter
Noted "[19:54:35][3092][INFO ] Application startup - thank you for supporting Einstein@Home!"
Went and made a cup of coffee.
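
The 'concatenate a command line' step of the recipe can be sketched as follows. This is a hypothetical helper, not a BOINC tool: buildCommandLine and Arg are illustrative names, and the flags are treated as opaque strings copied from the task description in client_state, with no claim about what each one means.

```cpp
#include <string>
#include <utility>
#include <vector>

// One command-line option: a flag plus an optional value.
// An empty value marks a bare switch such as -W or -z.
using Arg = std::pair<std::string, std::string>;

// Joins the executable name and its options with single spaces.
// Assumption: no argument contains spaces, so no quoting is done.
inline std::string buildCommandLine(const std::string& exe,
                                    const std::vector<Arg>& args) {
    std::string cmd = exe;
    for (const auto& arg : args) {
        cmd += ' ';
        cmd += arg.first;
        if (!arg.second.empty()) {
            cmd += ' ';
            cmd += arg.second;
        }
    }
    return cmd;
}
```

Fed with the pairs from the quoted command (-i, -t, -l, -o, -c, -A, -P, -f with their values, then -W and -z as bare switches), it reproduces that line. File names containing spaces would need quoting that this sketch leaves out.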

Edit - came back a few minutes later to find

[19:55:40][3092][INFO ] Checkpoint committed!
[19:56:46][3092][INFO ] Checkpoint committed!
[19:57:50][3092][INFO ] Checkpoint committed!
[19:58:56][3092][INFO ] Checkpoint committed!
[20:00:01][3092][INFO ] Checkpoint committed!
[20:01:07][3092][INFO ] Checkpoint committed!
[20:02:12][3092][INFO ] Checkpoint committed!
[20:03:18][3092][INFO ] Checkpoint committed!
[20:04:22][3092][INFO ] Checkpoint committed!
[20:05:27][3092][INFO ] Checkpoint committed!
[20:06:00][3092][INFO ] OpenCL shutdown complete!
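
To put a number on the checkpoint cadence in that log, a small parser is enough. This is a sketch under the assumption that every line starts with an [HH:MM:SS] stamp, as in the output above; parseTimestamp and checkpointIntervals are hypothetical names, and a run that crosses midnight is not handled.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Returns seconds since midnight for a line starting with "[HH:MM:SS]",
// or -1 if the line does not start with such a stamp.
inline int parseTimestamp(const std::string& line) {
    int h = 0, m = 0, s = 0;
    if (std::sscanf(line.c_str(), "[%d:%d:%d]", &h, &m, &s) != 3) {
        return -1;
    }
    return h * 3600 + m * 60 + s;
}

// Collects the gaps, in seconds, between consecutive
// "Checkpoint committed!" lines. Other lines are ignored.
inline std::vector<int> checkpointIntervals(const std::vector<std::string>& lines) {
    std::vector<int> intervals;
    int previous = -1;
    for (const auto& line : lines) {
        if (line.find("Checkpoint committed!") == std::string::npos) {
            continue;
        }
        const int t = parseTimestamp(line);
        if (t < 0) {
            continue;
        }
        if (previous >= 0) {
            intervals.push_back(t - previous);
        }
        previous = t;
    }
    return intervals;
}
```

For the ten checkpoint lines above, the gaps come out between 64 and 66 seconds.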
