Later today I will be issuing a new (CPU) App version of the Gamma-Ray pulsar search. Its main features are:
- a 64Bit Windows version
- FFTW has been updated to version 3.3.4 (3.3.6-pl2 on Windows & Linux 64Bit)
- a fix for the rare case when the application was interrupted between renaming the two output files (resulting in a "missing output file" error)
The main improvement is that the app now tries to find "wisdom" for the FFT in various places:
- in a file "FGRPB1wisdom.dat" in the project directory
- on Linux in a file "/etc/fftw/wisdomf-<fftw-version>"
- in a system-specific location (see FFTW documentation: fftw_import_system_wisdom), on Linux this is "/etc/fftw/wisdomf"
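The lookup order above can be sketched roughly as follows. This is only an illustration, not the app's code: the function names are made up, the exact form of the `<fftw-version>` string is an assumption, and whether the app stops at the first file it finds or tries to import each one is not stated in this thread.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def wisdom_candidates(project_dir: str, fftw_version: str) -> List[Path]:
    """Candidate wisdom files in the order listed above (Linux paths)."""
    return [
        Path(project_dir) / "FGRPB1wisdom.dat",
        # Assumption: the version string is appended verbatim.
        Path("/etc/fftw") / f"wisdomf-{fftw_version}",
        Path("/etc/fftw") / "wisdomf",
    ]


def first_existing(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, else None."""
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Calling `first_existing(wisdom_candidates(project_dir, "3.3.6-pl2"))` would then return the project-directory file if it is present, and fall back to the system locations otherwise.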
"Wisdom" can be created using the "fftwf-wisdom"-tool from the same version, running
fftwf-wisdom -o FGRPB1wisdom.dat rib67108864 rob67108864
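For readers unfamiliar with the problem strings on that command line, here is a small decoder. The letter meanings (r = real, i/o = in-place/out-of-place, f/b = forward/backward) follow my reading of the fftw-wisdom man page, so treat them as an assumption; the helper name is made up and it only handles one-dimensional complex or real sizes.

```python
import re

_TYPES = {"c": "complex", "r": "real"}
_PLACEMENTS = {"i": "in-place", "o": "out-of-place"}
_DIRECTIONS = {"f": "forward", "b": "backward"}


def decode_problem(spec: str) -> dict:
    """Decode a 1-D problem string such as 'rib67108864'."""
    match = re.fullmatch(r"([cr])([io])([fb])(\d+)", spec)
    if match is None:
        raise ValueError(f"unsupported problem string: {spec!r}")
    kind, placement, direction, digits = match.groups()
    size = int(digits)
    is_power_of_two = size > 0 and size & (size - 1) == 0
    return {
        "type": _TYPES[kind],
        "placement": _PLACEMENTS[placement],
        "direction": _DIRECTIONS[direction],
        "size": size,
        # Exponent k with size == 2**k, or None if size is not a power of two.
        "log2": size.bit_length() - 1 if is_power_of_two else None,
    }
```

Under that reading, the two problems above are the in-place and out-of-place variants of the same real backward transform of size 67108864 = 2^26.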
BM
Copyright © 2024 Einstein@Home. All rights reserved.
Thanks Bernd
Will the application try to create (leave behind) a wisdom file if there is not one?
The current application already leaves a "wisdom.dat" file in the slot directory. However, this comes only from a very short plan generation. To be of help, you need to generate a "patient" plan specific to the machine, which usually takes several hours (the above command line ran for 14 and 18h on two modern machines), and to be effective, nothing else should run on that machine during that time.
The 1.06 app versions mostly produced lots of errors, though, with the exception of the 64Bit Linux one. I'll deprecate all versions but that one and investigate.
BM
Hi, Bernd
With which compile flags and configure options was the FFTW that is statically linked into this app built? I am concerned that if I create a wisdom file with my system's fftwf-wisdom (also 3.3.6_p2), which was built with different compile flags and configure options, it may contain suboptimal values for the FFTW linked into the app.
My recent experiments actually show that there are even worse compatibility issues with "wisdom" or "fftwf-wisdomf". The current application apparently can't import wisdom produced with any other "fftwf-wisdom" binary than the one that was built as a by-product of the app compilation, even if the same compiler, flags, configure options and even calling script were used on a different machine.
I'll probably update the app bundle to ship the right fftwf-wisdom binary with it, but let me investigate the errors of the other platforms first - maybe I can fix a few along the way with a new app version release.
BM
That explains why it closes the system wisdomf file almost immediately, after reading only 4 KB:
open("/etc/fftw/wisdomf", O_RDONLY) = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=205462, ...}) = 0
read(6, "(fftw-3.3.6-pl2 fftwf_wisdom #x0"..., 4096) = 4096
close(6)
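The first bytes returned by that read show the wisdom header, which begins with the FFTW version. A quick way to inspect it is sketched below; the helper name is made up, and the pattern is inferred only from the header visible in the trace. As noted earlier in the thread, a matching version string alone apparently does not guarantee that the app can import the file.

```python
import re
from typing import Optional


def wisdom_header_version(first_bytes: bytes) -> Optional[str]:
    """Extract the FFTW version from the start of a single-precision wisdom file.

    Returns None if the data does not start with the expected header.
    """
    match = re.match(rb"\(fftw-(\S+) fftwf_wisdom\b", first_bytes)
    return match.group(1).decode("ascii") if match else None
```

For the header shown in the trace, this returns "3.3.6-pl2".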
I also see that the automatically created wisdom.dat file is written to the slot directory, not the project directory, so it is deleted after the computation. SETI@home generates its wisdom files in the project directory and names them to indicate the app build (or maybe algorithm) version and the CPU name, so each file is unique to an app and CPU combination. For example:
r2728_AMDPhenomtmIIX61075TProcessor_x64.wisdom
r3306_AMDPhenomtmIIX61075TProcessor_x64.wisdom
r3584_AMDPhenomtmIIX61075TProcessor_x64.wisdom
r3306_IntelRCoreTMi7CPUQ840187GHz_x64.wisdom
r2696_IntelRXeonRCPUL5520227GHz_x64.wisdom
r2728_IntelRXeonRCPUL5520227GHz_x64.wisdom
r3306_IntelRXeonRCPUL5520227GHz_x64.wisdom
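A naming scheme like this could be produced along the following lines. This is a guess at the pattern from the file names above, not SETI@home's actual code: the function name is made up, and the CPU brand strings in the usage note are my assumption of what the original strings looked like.

```python
import re


def wisdom_filename(build: int, cpu_name: str, arch: str) -> str:
    """Build a per-app, per-CPU wisdom file name such as r3306_<cpu>_x64.wisdom."""
    # Drop everything except ASCII letters and digits from the CPU name.
    cpu = re.sub(r"[^A-Za-z0-9]", "", cpu_name)
    return f"r{build}_{cpu}_{arch}.wisdom"
```

With a brand string such as "AMD Phenom(tm) II X6 1075T Processor", build 2728 and arch "x64", this gives r2728_AMDPhenomtmIIX61075TProcessor_x64.wisdom, matching the first name in the list.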
The system wisdom is read more or less as a last resort. However, if it was generated with "fftwf-wisdom -c" ("canonical sizes") it isn't of any help here, since it goes only up to FFT sizes of 2^20, while we use 2^26 here.
BM
SETI uses way shorter FFT sizes, thus I assume that creating the necessary wisdom is rather a matter of minutes. Creating wisdom that helps for our application (patient 2^26) takes more than 10h, and to be efficient, nothing else can be run on that machine during that time. This rather prevents generating this type of wisdom in a normal BOINC task.
BM
Bernd Machenschalk
Yes, this takes a lot of time. And because the app does not use them, I just wasted many hours creating them. But maybe not totally wasted, as I have some useful observations. On my hosts, with different CPUs (from Intel and from AMD), in-place tuning is much faster than out-of-place, by a factor of about 6 to 10. MEASURE mode is a lot faster: it takes less than an hour for both. So it is a good starting point, since more accurate tuning can be appended to the output wisdom file later.
My wisdom file covers MEASURE mode up to 2^27, PATIENT mode up to 2^26, and some small, popular sizes that I use in EXHAUSTIVE mode (the file is over 200 KB).
Yes, I know that SETI uses smaller FFTs, but I just wanted to point to one idea for storing wisdom files. Also, since MEASURE mode takes less than an hour, even on old and not so fast hosts, maybe it would be good to run it at app start if a wisdom file does not already exist in the project folder. It would extend the runtime by less than 10%, and only for the first run of this app version on the current CPU model.
I also compared the results of fftwf-wisdom when run in parallel on many cores. I wanted to see what effect contention for memory and shared L3 cache bandwidth would have. In the case of the L3 cache, capacity is shared as well. Running more tasks in parallel also disables the CPU's turbo clock mode.
To save time I used MEASURE mode and a smaller 2^20 rib problem on a six-core AMD Phenom:
fftwf-wisdom -m -n -o wisdomf.${parallel_tasks_count}.${task_number} rib1048576
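The output naming in that command can be enumerated as below. This sketch assumes one batch per level of parallelism from 1 to 6, with tasks numbered from 1 within each batch; the function name is made up, the flags are copied from the command above, and nothing is actually launched.

```python
from typing import List


def wisdom_runs(max_parallel: int = 6, size: int = 1 << 20) -> List[List[str]]:
    """Return one fftwf-wisdom command line per (parallel count, task number)."""
    runs = []
    for parallel_tasks_count in range(1, max_parallel + 1):
        for task_number in range(1, parallel_tasks_count + 1):
            output = f"wisdomf.{parallel_tasks_count}.{task_number}"
            runs.append(["fftwf-wisdom", "-m", "-n", "-o", output, f"rib{size}"])
    return runs
```

On a six-core machine this yields 1 + 2 + 3 + 4 + 5 + 6 = 21 command lines, from wisdomf.1.1 to wisdomf.6.6.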
So there are 21 files in total and there are some differences between some of them:
$ find . -type f -name "wisdomf.*" -exec md5sum {} +|sort
15dcee500cf1d748aaf95ec72eeb8079 ./wisdomf.2.2
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.4.1
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.5.3
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.5.5
440160b3041518f6df2b7158f1ddcf5f ./wisdomf.6.4
6ae6c3cb2344fb6943d1d820c4674f7c ./wisdomf.4.4
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.4.2
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.5.2
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.5.4
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.6.1
9ab24bbaefb321ad2487b1f47b1987c0 ./wisdomf.6.3
b2fc9062366ccdece72d1e928be46264 ./wisdomf.2.1
b2fc9062366ccdece72d1e928be46264 ./wisdomf.3.2
b2fc9062366ccdece72d1e928be46264 ./wisdomf.6.2
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.1.1
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.3.3
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.5.1
e80bf6d015ea519bd5dc09ab71576d3e ./wisdomf.6.5
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.3.1
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.4.3
f6cc77cdf21778ba6b3c3f37f2f32c48 ./wisdomf.6.6
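The listing above contains seven distinct checksums among the 21 files. The same grouping can be done programmatically, as in this sketch (the function name is made up; MD5 is used only to mirror md5sum, not for any security purpose):

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Iterable, List


def group_by_digest(paths: Iterable[str]) -> List[List[str]]:
    """Group files with identical content; largest groups first."""
    groups = defaultdict(list)
    for path in paths:
        digest = hashlib.md5(Path(path).read_bytes()).hexdigest()
        groups[digest].append(str(path))
    # Sort names within each group, then order groups by size (descending).
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

Each inner list then holds the names of files that are byte-for-byte identical.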
The first number indicates how many of the files contain the given line.
$ cat wisdomf.* |sort|uniq -c
21 )
15 (fftwf_codelet_hc2cbdft_2 0 #x11bcd #x11bdd #x0 #x97f19052 #x7acf5d64 #x8079cadc #x65037d5a)
6 (fftwf_codelet_hc2cbdftv_2_sse2 0 #x11bcd #x11bdd #x0 #x97f19052 #x7acf5d64 #x8079cadc #x65037d5a)
21 (fftwf_codelet_n1bv_128_sse2 0 #x10fdd #x10fdd #x0 #x58131019 #xc77de447 #x3e484e5e #x3578808d)
5 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #x0f540577 #x2d8bbfe1 #x9ffa7db4 #xbf6f9c2b)
13 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #x4f8bf6db #x68039ae4 #xebe79bcd #xb8e9ba13)
3 (fftwf_codelet_n1fv_128_sse2 0 #x11bdd #x11bdd #x0 #xcac91cd9 #x91a76b30 #x934c58d4 #x45967509)
21 (fftwf_codelet_r2cb_2 2 #x11bdd #x11bdd #x0 #x4ed198bc #x6c2d113a #xda3cf4c7 #x69e43a0f)
21 (fftwf_codelet_r2cbIII_2 2 #x11bdd #x11bdd #x0 #xb525d7dc #xa2287a93 #x4399b3c3 #x9a04e375)
1 (fftwf_codelet_t1bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
15 (fftwf_codelet_t2bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
5 (fftwf_codelet_t3bv_32_sse2 0 #x11bcd #x11bdd #x0 #x5447b4f1 #xbcccba76 #x937a77d0 #xdec4daff)
5 (fftwf_ct_genericbuf_register 27 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
3 (fftwf_ct_genericbuf_register 28 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
13 (fftwf_ct_genericbuf_register 29 #x11bcd #x11bdd #x0 #x854448a4 #xaae4f72d #xfa109504 #xa931a061)
21 (fftwf_dft_buffered_register 1 #x11bdd #x11bdd #x0 #xf9e722a1 #xb7dbff40 #xd4d34cc4 #xa079497d)
21 (fftwf_dft_indirect_register 0 #x10bdd #x10bdd #x0 #xb5c51e03 #x80ede38e #x6e466d6f #xff239aa4)
21 (fftwf_dft_nop_register 0 #x11bdd #x11bdd #x0 #x0b7e3efa #xd9dad36f #xe6aad5b9 #xee5bff7a)
21 (fftwf_dft_r2hc_register 0 #x10bdd #x10bdd #x0 #x78b11004 #xdbe06ea5 #xd180ed97 #x4d95615f)
21 (fftwf_dft_r2hc_register 0 #x11bdd #x11bdd #x0 #xda1c72ab #x2ab24ee2 #x8dd0a5fb #x6177cf51)
21 (fftwf_dft_vrank_geq1_register 0 #x11bdd #x11bdd #x0 #x9d6f9c91 #x6a459a4c #xbe3cf06b #xb2895c51)
21 (fftwf_rdft_rank0_register 1 #x11bdd #x11bdd #x0 #x5e622d64 #x8229e075 #x9e876805 #xed11017f)
21 (fftwf_rdft_rank0_register 2 #x10bdd #x10bdd #x0 #xfeb7d0c8 #xbdc40947 #x7a84fbdd #x8a359b6e)
21 (fftw-3.3.6-pl2 fftwf_wisdom #x08ac4c16 #x457005cc #xea102cf7 #xd7ff9038
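To pick out only the entries that differ between runs, the same counting can be done with a small script. This is a sketch with made-up function names; it treats each wisdom file as plain text and counts identical lines across files, like sort | uniq -c does.

```python
from collections import Counter
from typing import Dict, List


def line_counts(texts: List[str]) -> Counter:
    """Count how often each line occurs across all given file contents."""
    counts = Counter()
    for text in texts:
        counts.update(text.splitlines())
    return counts


def varying_lines(texts: List[str]) -> Dict[str, int]:
    """Return only the lines whose count differs from the number of files."""
    total = len(texts)
    return {line: n for line, n in line_counts(texts).items() if n != total}
```

Applied to the 21 files above, `varying_lines` would drop every line with a count of 21 and keep the entries that differ between runs, such as the three variants of fftwf_codelet_n1fv_128_sse2.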
I don't know yet whether these kinds of differences are in fact big or small, or whether they have a significant impact on computation time. We will see once the app is in good shape.
Linux running on an AMD x86_64 or Intel EM64T CPU ... 1.06 (ATLAS1)
and