Vintage & unusual Computers on E@H Part II

trentnthompson
trentnthompson
Joined: 21 Jan 12
Posts: 6
Credit: 3260442
RAC: 0

I am thinking about picking

I am thinking about picking up another Raspberry Pi here soon, and maybe another one once they make a PoE version.

Without pulling out my 256MB Model B and getting my hands dirty, can you briefly explain how you got the app to work?

I have an older G4 (or G5?) PPC lying on the floor as an old web-app server (Deb6-PPC). It would be cool to utilize its CPU time while I am gone on vacation.

It looks like there was a Mac OS X PPC app at some point.

MarkJ
MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137741139
RAC: 17257

The guys at asteroids have

The guys at Asteroids@home have managed to get their app working, but are concerned about the run time. The current estimate is 85 hours.

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691097075
RAC: 259381

RE: Without pulling out

Quote:

Without pulling out my 256MB Model B and getting my hands dirty, can you briefly explain how you got the app to work?


The BRP4 app source code is available for download following links from the Einstein@Home home page. After some small changes to the build.sh script and Makefile, the source compiled almost out of the box. The changes included setting the necessary compiler flags for the Linux distribution of your choice (e.g. the hard-float ABI) and dropping Intel-specific options (like enabling SSE support). I guess the changes will be in our next release of the source code.

BOINC itself can be installed from the repositories of, for example, the Raspbian distribution; the app is installed via an app_info.xml file as an anonymous platform app. That's all ;-).
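For anyone who wants to try it, an anonymous-platform app_info.xml has roughly the following shape. This is only a sketch: the app name, file name, and version number below are placeholders, so check them against the binary you actually built.

```xml
<app_info>
    <app>
        <name>einsteinbinary_BRP4</name>
    </app>
    <file_info>
        <name>einsteinbinary_BRP4_arm_example</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einsteinbinary_BRP4</app_name>
        <version_num>100</version_num>
        <file_ref>
            <file_name>einsteinbinary_BRP4_arm_example</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

The file goes into the project directory (projects/einstein.phys.uwm.edu) next to the app binary before starting the client.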

I'm currently trying to run BOINC in parallel to the Raspbmc media center installation on a second Raspberry Pi. It seems to work just fine. XBMC reportedly worked well even on the earlier 256 MB Raspi, so the 512 MB model should have enough headroom to do some BOINC work when otherwise idle.

Cheers
HB

Janus
Janus
Joined: 10 Nov 04
Posts: 27
Credit: 18056753
RAC: 13964

Before Xmas one of the little

Before Xmas one of the little ones returned a unit that validated. It took over 8 days running 24/7.

When generating the wisdom file, what sizes and types did you use? - Does BRP use a specific size or is it varying?
Did you run an exhaustive search or simply a patient one?
And where did you place it? - Does the default Einstein app read in the system wisdom file?
How do I know if the wisdom file was actually read?

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691097075
RAC: 259381

RE: Before Xmas one of the

Quote:
Before Xmas one of the little ones returned a unit that validated. It took over 8 days running 24/7.


Yup, sounds ok. That is with some overclocking I guess?

Quote:

When generating the wisdom file, what sizes and types did you use? - Does BRP use a specific size or is it varying?


The size is always the same for one 'run'. For the current BRP4 Arecibo data, for example, the transform is real-to-complex, length 3*2^22, out-of-place, forward, single precision.

Quote:

Did you run an exhaustive search or simply a patient one?


A patient one. I tried exhaustive mode but gave up when it still hadn't finished after a week or so. Usually FFTW 3.3 uses hardware performance counters for 'self-profiling' when generating wisdom, but the FFTW code doesn't support this for ARMv6. This is a bit unfortunate; I understand the Raspi's ARMv6 actually HAS hardware performance counters. So when compiling FFTW on the Raspi, you have to tell it to use the 'slow counters' from the OS, which might slow down wisdom generation. But without telling it to fall back to the slow counters, FFTW won't perform ANY measurements when generating wisdom (only estimate mode is supported then).

Quote:

And where did you place it? - Does the default Einstein app read in the system wisdom file?


The currently used version of BRP4 doesn't use wisdom files at all. If you compile your own version, it's easy to add this feature: just add a call to

int fftwf_import_system_wisdom(void)

before the plan generation happens. This will try to read wisdom from the file /etc/fftw/wisdomf.

Quote:

How do I know if the wisdom file was actually read?

The return value of the function above will indicate whether something was read, but that doesn't necessarily mean that the wisdom was used in plan generation (e.g. if the transform type didn't match). In general it should be faster than the default plan without wisdom, which only uses the ESTIMATE method.

Cheers
HB

Janus
Janus
Joined: 10 Nov 04
Posts: 27
Credit: 18056753
RAC: 13964

RE: Yup, sounds ok. That is

Quote:
Yup, sounds ok. That is with some overclocking I guess?


Yup, 1.1 GHz ARM, 700 MHz core and 610 MHz memory, overvolted to "6" and using the framebuffer polling tricks mentioned above.
The clock seems to run slightly faster under thermal load, and the Pi is in a very hot server room. I'm guessing it is running closer to what would be 1.15 GHz under normal temperatures.

Quote:
The size is always the same for one 'run'. For the current BRP4 Arecibo data, for example, the transform is real-to-complex, length 3*2^22, out-of-place, forward, single precision.


That is very interesting, for multiple reasons. First of all, it would allow running an exhaustive search on just that one size and composition (or was that what you tried?).
Secondly, since this is a power-of-two-ish forward transform in single precision, it could in theory be run on the GPU through OpenGL ES. A single-precision real input data point fits nicely into a single cell of an RGBA texture. I wonder if it would be worth it, since FFT on the GPU is memory bound and the GPU uses the same memory as the ARM core.

Quote:
The currently used version of BRP4 doesn't use wisdom files at all [snip]


Why not, actually?
Even better: Why not something along the lines of this in the app:

if (!exists("../../projects/einstein.phys.uwm.edu/wisdom.fft")) {
   run_wisdom_generation_and_store_in_project_dir();
}
load_awesome_wisdom("../../projects/einstein.phys.uwm.edu/wisdom.fft");


It would seem to make sense even for x86/x86_64 systems to do this at least once, with the exact settings for the search as you mentioned. Any of the CPU-based BRP4 apps could potentially benefit from that, at the small cost of running the wisdom generator in the very first workunit, of course.

Quote:
The return value of the function above will indicate whether something was read, but that doesn't necessarily mean that the wisdom was used in plan generation (e.g. if the transform type didn't match). In general it should be faster than the default plan without wisdom, which only uses the ESTIMATE method.


Right, so using FFTW_WISDOM_ONLY rather than FFTW_ESTIMATE and checking for a NULL plan would be even better than just checking the return value of fftwf_import_system_wisdom()?

Bikeman (Heinz-Bernd Eggenstein)
Bikeman (Heinz-...
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 691097075
RAC: 259381

RE: RE: Yup, sounds ok.

Quote:
Quote:
Yup, sounds ok. That is with some overclocking I guess?

Yup, 1.1 GHz ARM, 700 MHz core and 610 MHz memory, overvolted to "6" and using the framebuffer polling tricks mentioned above.
The clock seems to run slightly faster under thermal load, and the Pi is in a very hot server room. I'm guessing it is running closer to what would be 1.15 GHz under normal temperatures.


Quite impressive overclocking! IIRC, using a wisdom file sped up the app by 25% or so, so this might take your setup to under one week per task.

Quote:

Quote:
The size is always the same for one 'run'. For the current BRP4 Arecibo data, for example, the transform is real-to-complex, length 3*2^22, out-of-place, forward, single precision.

That is very interesting, for multiple reasons. First of all, it would allow running an exhaustive search on just that one size and composition (or was that what you tried?).


Indeed, I tried only this single transform type, and still it would take forever in exhaustive mode. Maybe I can patch FFTW to use hardware counters, if the Raspi's ARMv6 indeed has them, and maybe that would speed things up.

Quote:

Secondly, since this is a power-of-two-ish forward transform in single precision, it could in theory be run on the GPU through OpenGL ES. A single-precision real input data point fits nicely into a single cell of an RGBA texture. I wonder if it would be worth it, since FFT on the GPU is memory bound and the GPU uses the same memory as the ARM core.


From what I've read, doing FFT on the Raspi's GPU would be great, but currently no one knows how to do that with software compatible with our licence (Broadcom hasn't exposed all of the source code that would allow us to do this).

Quote:

Quote:
The currently used version of BRP4 doesn't use wisdom files at all [snip]

Why not, actually?
Even better: Why not something along the lines of this in the app:
if (!exists("../../projects/einstein.phys.uwm.edu/wisdom.fft")) {
   run_wisdom_generation_and_store_in_project_dir();
}
load_awesome_wisdom("../../projects/einstein.phys.uwm.edu/wisdom.fft");

It would seem to make sense even for x86/x86_64 systems to do this at least once, with the exact settings for the search as you mentioned. Any of the CPU-based BRP4 apps could potentially benefit from that, at the small cost of running the wisdom generator in the very first workunit, of course.

We were (and are still) thinking about exactly this scheme. For the most important CPU platforms (x86 and x86_64), we are dealing with a wide range of CPUs, all the way from the Pentium III to the latest Cores, plus their AMD counterparts, so we would not want to generate one-file-fits-all-CPUs wisdom. As you mentioned, the solution would be to have the app generate wisdom on the fly on the client, for the specific CPU used. The problem is that wisdom generation can take a long time, and there is no way to interrupt and later resume the process when the app gets suspended by the core client. So there is a danger of an app never actually getting beyond wisdom generation when it is frequently interrupted.

I'm also not sure there is a clean way in BOINC to manage 'sticky' files generated by the app (there is a sticky-file mechanism for downloaded files). We don't want to leave stale files around when the user detaches from Einstein@Home.

Anyway, we are still looking into this subject. For ARM-based devices we might even try pre-canned 'one-fits-all' wisdom files, not so much because of runtime considerations but to have better control over the runtime memory requirements (which vary depending on the plan).

Cheers
HB

ML1
ML1
Joined: 20 Feb 05
Posts: 347
Credit: 86314908
RAC: 243

RE: I am thinking about

Quote:
I am thinking about picking up another Raspberry Pi here soon, and maybe another one once they make a PoE version.


Get one or two and mod them for PoE yourself, and blog the results!

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

MarkJ
MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137741139
RAC: 17257

Asteroids@home have released

Asteroids@home have released their Raspberry Pi app. They are supporting the Pi directly, so there is no need for an app_info file. The expected run time at stock speed is 76 hours.

MarkJ
MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137741139
RAC: 17257

Various bits

Various bits snipped

Quote:
Quote:
Quote:
The currently used version of BRP4 doesn't use wisdom files at all [snip]

Why not, actually?

We were (and are still) thinking about exactly this scheme.

The problem is that wisdom generation can take a long time, and there is no way to interrupt and later resume the process when the app gets suspended by the core client.

Why not get BOINC to do it? You aren't the only project that uses FFTs, and if it generated a wisdom file (maybe when it does the CPU benchmarks) then ALL projects could make use of it, rather than each having to do their own.

When you say it takes a while, how long are we talking here? The SETI optimised apps generate wisdom files, so it makes sense to have one per host that all projects could utilise.
