Gamma-Ray Pulsar search (FGRPB1) CPU app version 1.06

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,322
Credit: 250,934,449
RAC: 37,569
Topic 208394

Later today I will be issuing a new (CPU) App version of the Gamma-Ray pulsar search. Its main features are:

- a 64Bit Windows version

- FFTW has been updated to version 3.3.4 (3.3.6-pl2 on Windows & Linux 64Bit)

- a fix for the rare case when the application was interrupted between renaming the two output files (resulting in a "missing output file" error)

The main improvement is that the app now tries to find "wisdom" for the FFT in various places:

- in a file "FGRPB1wisdom.dat" in the project directory

- on Linux in a file "/etc/fftw/wisdomf-<fftw-version>"

- in a system-specific location (see FFTW documentation: fftw_import_system_wisdom), on Linux this is "/etc/fftw/wisdomf"
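The search order in the list above can be sketched in shell as follows. This is only an illustration, not the app's code: the app presumably uses FFTW's import calls directly. The function name `find_wisdom` and its two parameters are hypothetical.

```shell
# Print the first readable wisdom candidate, in the order listed above.
# project_dir and fftw_version are supplied by the caller (hypothetical
# parameters, not something the app exposes).
find_wisdom() {
    project_dir=$1
    fftw_version=$2
    for candidate in \
        "$project_dir/FGRPB1wisdom.dat" \
        "/etc/fftw/wisdomf-$fftw_version" \
        "/etc/fftw/wisdomf"
    do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # No candidate found: the caller would fall back to planning from scratch.
    return 1
}
```

Calling `find_wisdom /path/to/project 3.3.6-pl2` prints the path that would be tried first on that host, or returns non-zero if none of the three files is readable.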

"Wisdom" can be created using the "fftwf-wisdom"-tool from the same version, running

fftwf-wisdom -o FGRPB1wisdom.dat rib67108864 rob67108864
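For reference, 67108864 is 2^26. In the fftw-wisdom problem syntax, "r" stands for a real transform, "i" and "o" for in-place and out-of-place, and "b" for backward. A small sketch that builds the two problem strings from the exponent (the helper name `wisdom_problems` is made up for illustration):

```shell
# Build the in-place and out-of-place real backward problem
# specifications for an FFT length of 2^exponent.
wisdom_problems() {
    n=$((1 << $1))
    printf 'rib%s rob%s\n' "$n" "$n"
}
```

With an exponent of 26 this reproduces the two arguments used in the command line above.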

BM

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

Thanks Bernd

Will the application try to create (leave behind) a wisdom file if there is not one?

Bernd Machenschalk

The current application already leaves a "wisdom.dat" file in the slot directory. However this is really only from a very short plan generation. To be of help, you need to generate a "patient" plan specific for the machine, which usually takes several hours (the above command-line ran 14 and 18h on two modern machines), and to be effective, nothing else should run on that machine during that time.

The 1.06 app versions mostly produced lots of errors, though, with the exception of the 64Bit Linux one. I'll deprecate all versions but that one and investigate.

BM

Sebastian M. Bobrecki
Joined: 20 Feb 05
Posts: 63
Credit: 1,529,603,785
RAC: 110

Hi, Bernd

With which compile flags and configure options was the FFTW that is statically linked into this app built? I am concerned that if I make a wisdom file with my system fftwf-wisdom (also 3.3.6_p2), which was built with different compile flags and configure options, it may contain suboptimal values for the FFTW linked inside the app.

Bernd Machenschalk

My recent experiments actually show that there are even worse compatibility issues with "wisdom" or "fftwf-wisdomf". The current application apparently can't import wisdom produced with any other "fftwf-wisdom" binary than the one that was built as a by-product of the app compilation, even if the same compiler, flags, configure options and even calling script were used on a different machine.

I'll probably update the app bundle to ship the right fftwf-wisdom binary with it, but let me investigate the errors of the other platforms first - maybe I can fix a few along the way with a new app version release.

BM

Sebastian M. Bobrecki

That explains why it closes the system wisdomf almost immediately, after reading only 4k:

open("/etc/fftw/wisdomf", O_RDONLY)     = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=205462, ...}) = 0
read(6, "(fftw-3.3.6-pl2 fftwf_wisdom #x0"..., 4096) = 4096
close(6)
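The data returned by read() shows that a wisdom file starts with a header naming the FFTW version that wrote it. A minimal sketch for pulling that version out of a file (the helper name `wisdom_version` is hypothetical, and the hex words that follow the version in the header are not interpreted here):

```shell
# Print the FFTW version recorded in the first line of a
# single-precision wisdom file, e.g. "3.3.6-pl2".
wisdom_version() {
    head -n 1 "$1" | sed -n 's/^(fftw-\([^ ]*\) fftwf_wisdom.*/\1/p'
}
```

Note that, going by Bernd's post above, a matching version string alone does not mean the app will accept the file.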

I also see that the automatically created wisdom.dat file is made in the slot directory, not in the project directory, so it will be deleted after the computation. SETI@home generates wisdom files in the project directory and names them in a way that indicates the app build (or maybe algorithm) version and the CPU name, so each is unique to an app and CPU combination. For example:

r2728_AMDPhenomtmIIX61075TProcessor_x64.wisdom

r3306_AMDPhenomtmIIX61075TProcessor_x64.wisdom

r3584_AMDPhenomtmIIX61075TProcessor_x64.wisdom

r3306_IntelRCoreTMi7CPUQ840187GHz_x64.wisdom

r2696_IntelRXeonRCPUL5520227GHz_x64.wisdom

r2728_IntelRXeonRCPUL5520227GHz_x64.wisdom

r3306_IntelRXeonRCPUL5520227GHz_x64.wisdom
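Judging only from the file names above, the scheme looks like "r" plus a build number, the CPU model string with everything but letters and digits removed, and an architecture tag. A sketch under that assumption (the function `wisdom_name` and its parameters are hypothetical, and this is not taken from the SETI@home sources):

```shell
# Compose a per-build, per-CPU wisdom file name.
# $1 = build number, $2 = CPU model string, $3 = architecture tag.
wisdom_name() {
    build=$1
    # Keep only ASCII letters and digits from the CPU model string.
    cpu=$(printf '%s' "$2" | tr -cd 'A-Za-z0-9')
    printf 'r%s_%s_%s.wisdom\n' "$build" "$cpu" "$3"
}
```

On Linux the CPU model string could be taken from the "model name" line of /proc/cpuinfo, for example `wisdom_name 2728 "AMD Phenom(tm) II X6 1075T Processor" x64`.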

Bernd Machenschalk

The system wisdom is read more or less as a last resort. However, if it was generated with "fftwf-wisdom -c" ("canonical sizes") it isn't of any help here, since it goes only up to FFT sizes of 2^20, while we use 2^26 here.

BM

Bernd Machenschalk

SETI uses way shorter FFT sizes, thus I assume that creating the necessary wisdom is rather a matter of minutes. Creating wisdom that helps for our application (patient 2^26) takes more than 10h, and to be efficient, nothing else can be run on that machine during that time. This rather prevents generating this type of wisdom in a normal BOINC task.

BM

Sebastian M. Bobrecki

Bernd Machenschalk wrote:
Creating wisdom that helps for our application (patient 2^26) takes more than 10h, and to be efficient, nothing else can be run on that machine during that time. This rather prevents generating this type of wisdom in a normal BOINC task.

Yes, this takes a lot of time. And because they are not used by the app, I just wasted a lot of hours making them. But maybe not totally wasted, as I have some useful observations. On my hosts, with different CPUs (from Intel and from AMD), in-place tuning is much faster than out-of-place, about 6 to 10 times. MEASURE mode is a lot faster: it takes less than an hour for both. So it's a good starting point, as more accurate tuning can be appended to the output wisdom file later.

Bernd Machenschalk wrote:
The system wisdom is read more or less as a last resort. However, if it was generated with "fftwf-wisdom -c" ("canonical sizes") it isn't of any help here, since it goes only up to FFT sizes of 2^20, while we use 2^26 here.

Mine is with MEASURE mode up to 2^27, with PATIENT mode up to 2^26, and some small and popular sizes that I use in EXHAUSTIVE mode (it's over 200k in size).

Bernd Machenschalk wrote:
SETI uses way shorter FFT sizes, thus I assume that creating the necessary wisdom is rather a matter of minutes.

Yes, I know that SETI uses smaller FFTs; I just wanted to point out their way of storing wisdom files. Also, as MEASURE mode takes less than an hour even on old, not so fast hosts, maybe it would be good to run it at app start if a wisdom file doesn't already exist in the project folder. It would extend the runtime by less than 10%, and only for the first run of this app version on the current CPU model.
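A rough sketch of that idea as a wrapper script: generate MEASURE-mode wisdom once if the file is missing, and write it to a temporary name first so an interrupted run never leaves a partial file behind. The function name `ensure_wisdom` is hypothetical, and the generator command is passed in as an argument because, per Bernd's post, it would have to be the fftwf-wisdom binary built together with the app.

```shell
# $1 = target wisdom file, remaining arguments = generator command
# (e.g. the path to a matching fftwf-wisdom binary).
ensure_wisdom() {
    wisdom_file=$1
    shift
    # Nothing to do if a non-empty wisdom file is already there.
    if [ -s "$wisdom_file" ]; then
        return 0
    fi
    tmp="$wisdom_file.tmp.$$"
    # -m selects MEASURE mode, -o names the output file.
    if "$@" -m -o "$tmp" rib67108864 rob67108864; then
        mv "$tmp" "$wisdom_file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Usage would look like `ensure_wisdom "$project_dir/FGRPB1wisdom.dat" fftwf-wisdom`, with `project_dir` standing in for the BOINC project directory.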

 

Sebastian M. Bobrecki

I also compared the results of fftwf-wisdom when run in parallel on many cores. I wanted to see the effect of contention for memory and shared L3 cache bandwidth. In the case of the L3 cache, capacity is shared as well. More parallel tasks also disable the CPU's turbo clock mode.

To save time I used MEASURE mode and a smaller 2^20 rib on a six-core AMD Phenom:
fftwf-wisdom -m -n -o wisdomf.${parallel_tasks_count}.${task_number} rib1048576
So there are 21 files in total, and there are some differences between some of them:
$ find . -type f -name "wisdomf.*" -exec md5sum {} +|sort
15dcee500cf1d748aaf95ec72eeb8079 ./wisdomf.2.2
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.4.1
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.5.3
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.5.5
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.6.4
6ae6c3cb2344fb6943d1d820c4674f7c ./wisdomf.4.4
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.4.2
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.5.2
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.5.4
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.6.1
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.6.3
b2fc9062366ccdece72d1e928be46264 ./wisdomf.2.1
b2fc9062366ccdece72d1e928be46264 ./wisdomf.3.2
b2fc9062366ccdece72d1e928be46264 ./wisdomf.6.2
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.1.1
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.3.3
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.5.1
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.6.5
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.3.1
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.4.3
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.6.6
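The runs behind these files (1 to 6 tasks in parallel, 1+2+3+4+5+6 = 21 files) could be scripted roughly like this. The function `run_parallel` is a hypothetical helper, and the generator command is passed as an argument; it is not necessarily how the runs were actually started.

```shell
# $1 = number of tasks to start at once, remaining arguments = generator
# command (e.g. fftwf-wisdom). Output goes to wisdomf.<count>.<task>.
run_parallel() {
    count=$1
    shift
    task=1
    while [ "$task" -le "$count" ]; do
        # -m MEASURE mode, -n ignore system wisdom, -o output file.
        "$@" -m -n -o "wisdomf.$count.$task" rib1048576 &
        task=$((task + 1))
    done
    # Block until every background task of this round has finished.
    wait
}
```

A full series would then be `for c in 1 2 3 4 5 6; do run_parallel "$c" fftwf-wisdom; done`.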

The first number indicates in how many files the given line is present.
$ cat wisdomf.* |sort|uniq -c
21 )
15 (fftwf_codelet_hc2cbdft_2 0 #x11bcd #x11bdd #x0 #x97f19052 #x7acf5d64 #x8079cadc #x65037d5a)
6 (fftwf_codelet_hc2cbdftv_2_sse2 0 #x11bcd #x11bdd #x0 #x97f19052 #x7acf5d64 #x8079cadc #x65037d5a)
21 (fftwf_codelet_n1bv_128_sse2 0 #x10fdd #x10fdd #x0 #x58131019 #xc77de447 #x3e484e5e #x3578808d)
5 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #x0f540577 #x2d8bbfe1 #x9ffa7db4 #xbf6f9c2b)
13 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #x4f8bf6db #x68039ae4 #xebe79bcd #xb8e9ba13)
3 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #xcac91cd9 #x91a76b30 #x934c58d4 #x45967509)
21 (fftwf_codelet_r2cb_2 2 #x11bdd #x11bdd #x0 #x4ed198bc #x6c2d113a #xda3cf4c7 #x69e43a0f)
21 (fftwf_codelet_r2cbIII_2 2 #x11bdd #x11bdd #x0 #xb525d7dc #xa2287a93 #x4399b3c3 #x9a04e375)
1 (fftwf_codelet_t1bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
15 (fftwf_codelet_t2bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
5 (fftwf_codelet_t3bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
5 (fftwf_ct_genericbuf_register 27 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
3 (fftwf_ct_genericbuf_register 28 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
13 (fftwf_ct_genericbuf_register 29 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
21 (fftwf_dft_buffered_register 1 #x11bdd #x11bdd #x0 #xf9e722a1 #xb7dbff40 #xd4d34cc4 #xa079497d)
21 (fftwf_dft_indirect_register 0 #x10bdd #x10bdd #x0 #xb5c51e03 #x80ede38e #x6e466d6f #xff239aa4)
21 (fftwf_dft_nop_register 0 #x11bdd #x11bdd #x0 #x0b7e3efa #xd9dad36f #xe6aad5b9 #xee5bff7a)
21 (fftwf_dft_r2hc_register 0 #x10bdd #x10bdd #x0 #x78b11004 #xdbe06ea5 #xd180ed97 #x4d95615f)
21 (fftwf_dft_r2hc_register 0 #x11bdd #x11bdd #x0 #xda1c72ab #x2ab24ee2 #x8dd0a5fb #x6177cf51)
21 (fftwf_dft_vrank_geq1_register 0 #x11bdd #x11bdd #x0 #x9d6f9c91 #x6a459a4c #xbe3cf06b #xb2895c51)
21 (fftwf_rdft_rank0_register 1 #x11bdd #x11bdd #x0 #x5e622d64 #x8229e075 #x9e876805 #xed11017f)
21 (fftwf_rdft_rank0_register 2 #x10bdd #x10bdd #x0 #xfeb7d0c8 #xbdc40947 #x7a84fbdd #x8a359b6e)
21 (fftw-3.3.6-pl2 fftwf_wisdom #x08ac4c16 #x457005cc #xea102cf7 #xd7ff9038
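To look only at the entries that differ between runs, the lines present in every file can be filtered out of the count listing. A small sketch (the helper `differing_lines` is hypothetical; it assumes no line is repeated within a single file):

```shell
# Print "count<TAB>line" for every line that is not present in all of
# the given files.
differing_lines() {
    total=$#
    cat "$@" | sort | uniq -c |
        awk -v n="$total" '$1 != n { c = $1; sub(/^ *[0-9]+ /, ""); print c "\t" $0 }'
}
```

Run as `differing_lines wisdomf.*`, this would leave only the codelet and genericbuf entries whose counts in the listing above are below 21.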

I don't know yet whether these kinds of differences are in fact big or small, or whether they have a significant impact on computation time. We will see once the app is in good shape.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1,702,989,778
RAC: 0

Linux running on an AMD x86_64 or Intel EM64T CPU ... 1.06 (ATLAS1)

and

<version>
<platform_short>x86_64-pc-linux-gnu</platform_short>
<platform_long>Linux running on an AMD x86_64 or Intel EM64T CPU</platform_long>
<version_num>106</version_num>
<plan_class>ATLAS1</plan_class>
<date>26 Jun 2017, 12:17:24 UTC</date>
<date_unix>1498479444</date_unix>
</version>
 
What are those? Can my host get to crunch them?
