Parallella's
Hostid=11389405 -> Using this to test
Hostid=11389404
Pi2's
Hostid=11751336 -> Using this to test
Hostid=11750595
Pi B+
Hostid=11660429
The fftw-wisdom run on the Parallella took around 45 mins. It looks like the client has used it (only one task completed with it so far), even though it picked up fftw v3.3.3 from the (Ubuntu) repo.
I tried generating one on the B+ but it generated an empty wisdom file. It picked up fftw v3.3.4 from the (Debian) repo. I removed it for the time being.
Looking through the results for the Pi2, there looks to be a speed-up. Not sure why they weren't showing earlier. I noted the cut-over date/time as 18 Feb at 10:36 UTC. The first task now showing (sent column) is 19 Feb at 10:45 UTC. It looks like the results have changed since I looked yesterday.
Can you clarify what the file name should be, please? The fftw docs refer to wisdom (i.e. without the f), but you mention a wisdomf in the /etc/fftw directory.
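For anyone else who ends up with an "empty" wisdom file like the one from the B+, here is a rough way to check a file before installing it. This is only a sketch: it assumes the wisdom text is a header line starting with "(fftw-" followed by one parenthesised entry per plan, and the sample strings and hash values below are made up for illustration, not taken from a real run.

```python
from pathlib import Path

# Illustrative samples only: the #x... values are placeholders, not real wisdom.
HEADER_ONLY = "(fftw-3.3.4 fftwf_wisdom #x0 #x0 #x0 #x0\n)\n"
WITH_ENTRY = (
    "(fftw-3.3.4 fftwf_wisdom #x0 #x0 #x0 #x0\n"
    "  (fftwf_codelet_r2cf_32 0 #x1040 #x1040 #x0 #x0 #x0 #x0 #x0)\n"
    ")\n"
)


def count_wisdom_entries(text: str) -> int:
    """Count lines after the header that open a parenthesised entry."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-blank line is the "(fftw-<version> ..." header.
    return sum(1 for ln in lines[1:] if ln.startswith("("))


def wisdom_looks_empty(path: str) -> bool:
    """True if the file is missing, zero bytes, or has a header but no entries."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return True
    return count_wisdom_entries(p.read_text()) == 0


if __name__ == "__main__":
    print(count_wisdom_entries(HEADER_ONLY), count_wisdom_entries(WITH_ENTRY))
```

If the check says the file is empty, regenerating it is probably safer than leaving it in /etc/fftw.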
Just had a light-bulb moment. Are we using single precision? In which case the fftw-wisdom command should be replaced with fftwf-wisdom. I presume that's why you refer to a wisdomf file instead of the default wisdom file.
The release sources use the single-precision FFTW library (--enable-float compilation flag). With the single-precision library we must use the fftwf_import_system_wisdom function and the "wisdomf" file (in the /etc/fftw directory).
The "wisdomf" file should be produced with the fftwf-wisdom executable (same FFTW version as the one the client was compiled with, 3.3.2 by default).
I am running some tests with the wisdomf file and I will post the results here.
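To keep the double/single naming straight, here is a small lookup written as a sketch. The tool names, function names and file paths are the ones mentioned in this thread and in the FFTW docs; the dictionary layout and the helper function are just my own illustration.

```python
# Precision -> names used by FFTW for wisdom handling.
# "single" is what the release sources use (--enable-float).
WISDOM_BY_PRECISION = {
    "double": {
        "prefix": "fftw_",
        "tool": "fftw-wisdom",
        "system_file": "/etc/fftw/wisdom",
    },
    "single": {
        "prefix": "fftwf_",
        "tool": "fftwf-wisdom",
        "system_file": "/etc/fftw/wisdomf",
    },
}


def system_import_function(precision: str) -> str:
    """Name of the C function that loads the system wisdom for a precision."""
    return WISDOM_BY_PRECISION[precision]["prefix"] + "import_system_wisdom"


if __name__ == "__main__":
    for name, info in WISDOM_BY_PRECISION.items():
        print(name, info["tool"], info["system_file"], system_import_function(name))
```

So a file made with fftw-wisdom (double) will not be picked up by a client that calls fftwf_import_system_wisdom.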
You'll need to regen the wisdomf if you get to FFTW 3.3.3 or later...
Quote:
FFTW 3.3.3
Nov 25, 2012
• Fix deadlock bug in MPI transforms (thanks to Michael Pippig for the bug report and patch, and to Graham Dennis for the bug report).
• Use 128-bit ARM NEON instructions instead of 64-bit instructions. This change appears to speed up even ARM processors with a 64-bit NEON pipe.
• Speed improvements for single-precision AVX.
• Speed up planner on machines without "official" cycle counters, such as ARM.
And just to make life interesting, Debian Wheezy seems to have 3.3.2, Debian Jessie 3.3.4 and Ubuntu Trusty 3.3.3.
I generated a wisdomf on the Parallella in patient mode (it was still going after 3 hours in exhaustive mode and I was impatient). Let's see if it makes any difference.
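Since the distros ship different FFTW versions, it may help to check which version a wisdom file was generated with before copying it between boards. A minimal sketch, assuming the file starts with a header of the form "(fftw-<version> fftwf_wisdom ..."; the sample header and its hash values are placeholders, and the function names are mine.

```python
import re

# Assumed header shape: "(fftw-3.3.2 fftwf_wisdom #x... #x... #x... #x..."
HEADER_RE = re.compile(r"^\(fftw-(\d+(?:\.\d+)*) (fftw[fl]?_wisdom)\b")


def wisdom_header(text: str):
    """Return (version, symbol) from the wisdom header, or None if not found."""
    m = HEADER_RE.match(text.lstrip())
    return (m.group(1), m.group(2)) if m else None


def wisdom_matches(text: str, version: str, symbol: str = "fftwf_wisdom") -> bool:
    """True if the wisdom was generated by the given version and precision."""
    return wisdom_header(text) == (version, symbol)


if __name__ == "__main__":
    sample = "(fftw-3.3.3 fftwf_wisdom #x0 #x0 #x0 #x0\n)\n"  # placeholder values
    print(wisdom_header(sample))
    print(wisdom_matches(sample, "3.3.2"))
```

A mismatch here would be a hint to regenerate the wisdomf against the FFTW version the client was actually built with.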
3 hours for rof/rif12M or for all modes?
The EaH client only runs 12M R2C FFTs so you need only rof/rif 12M in the wisdomf file.
fftwf-wisdom -v -x -o wisdom rif12582912
That was while it was running 2 work units at the same time. If I run it when the machine is idle then it's around 30-45 minutes. Patient mode is a lot quicker (about 5 minutes).
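For reference, the problem names break down as r (real), i or o (in-place or out-of-place), f (forward), then the size, and 12582912 is 12 x 2^20. Below is a small sketch that builds the command line for either case. The helper names are hypothetical; -m, -e and -x are the planner flags from the fftw-wisdom man page, and patient mode is assumed to be the tool's default, so no flag is passed for it.

```python
import shlex

# Planner rigor -> flag. Patient is assumed to be the default (no flag).
MODE_FLAGS = {"estimate": "-e", "measure": "-m", "patient": None, "exhaustive": "-x"}

N_12M = 12 * 2**20  # the 12M transform size used by the EaH client


def r2c_problem(n: int, in_place: bool) -> str:
    """Build a real forward problem name, e.g. rif12582912 or rof12582912."""
    return f"r{'i' if in_place else 'o'}f{n}"


def wisdom_command(problems, mode="patient", output="wisdomf",
                   tool="fftwf-wisdom", verbose=True):
    """Return the argument list for a single-precision wisdom run."""
    cmd = [tool]
    if verbose:
        cmd.append("-v")
    flag = MODE_FLAGS[mode]
    if flag is not None:
        cmd.append(flag)
    cmd += ["-o", output, *problems]
    return cmd


if __name__ == "__main__":
    cmd = wisdom_command([r2c_problem(N_12M, True)], mode="exhaustive", output="wisdom")
    print(shlex.join(cmd))
```

Passing both problem names in one call puts rif and rof into the same output file, which is what the client needs in wisdomf.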
I think the best approach is to generate the wisdom with one core under load on a dual-core,
since each EaH client will be running with one core loaded.
I ran "fftwf-wisdom -v -x -o wisdom rof12582912" on an idle quad-core Cortex-A9 (1GHz) in ~6 min,
but rif12582912 took ~33 min. I think rif is way more complex than rof.
On the Parallella the rof wisdom file took about 19 min to produce on an idle CPU.
In the rof case I didn't see any differences in the wisdom files with and without CPU load.
Furthermore, in the rof case (custom build from release sources) I didn't see any performance speedup
from using the wisdom file (TBS2910 - Cortex-A9@1GHz).
I tested FFTW 3.3.2, 3.3.3 and 3.3.4 with and without wisdom files.
I ran 100 templates (100 main loops) for each case (FFTW version, with/without wisdom file) and the
results showed a very small speedup (<0.5%) from using FFTW 3.3.3.
Using a wisdom file with FFTW 3.3.3 also gives a very small speedup (<0.5%).
FFTW 3.3.3 with a wisdom file has a speedup of <1% compared to FFTW 3.3.2 (default EaH client) without a wisdom file.
All the clients in the comparison are custom builds of the release sources (with the addition of import_system_wisdom_file and different FFTW versions)
with extra aggressive optimization flags for ARM Cortex-A9.
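In case it helps when comparing runs, this is how a percentage like the ones above could be computed from two wall-clock times for the same 100 templates. It is a sketch under one assumption: "speedup" is taken to mean the reduction in runtime relative to the baseline, which may not be the exact definition used for the figures in this post. The timings in the example are invented.

```python
def speedup_percent(baseline_s: float, candidate_s: float) -> float:
    """Runtime reduction of candidate relative to baseline, in percent."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("run times must be positive")
    return (baseline_s - candidate_s) / baseline_s * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    baseline = 200.0   # e.g. FFTW 3.3.2 without wisdom
    candidate = 199.0  # e.g. FFTW 3.3.3 with wisdom
    print(f"{speedup_percent(baseline, candidate):.2f}%")
```

With differences this small, repeating each run a few times would show whether the gap is larger than the run-to-run noise.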
I'll leave the FFTW 3.3.3 build with the wisdom file crunching some WUs to see if there is any speedup.
With the FFTW 3.3.2 client with a wisdom file I didn't see any difference in WU crunching (you can see the results from this host here).
I will run some tests with the Parallella CPU as well.
Thank you,
Did you try some timing between rif and rof?
It would be interesting to know if it would be worthwhile building such an ARM app for those that have the free memory. It might benefit your TBS2910 and the Parallella. We're definitely seeing a speedup on the Pi2 with the wisdom.
BOINC blog
Indeed, that's the explanation.
Cheers
HB
For me, the wisdomf file that I generated on the Pi2 and posted here worked quite well on the Parallella too:
http://einsteinathome.org/host/11381212/tasks
HB