[S5R3/R4] How to check Performance when Testing new Apps

Paper Moon
Joined: 12 Apr 08
Posts: 14
Credit: 292,052
RAC: 0


Message 77864 in response to message 77863

Quote:
Everything went back to normal this morning after I put the previous SSE file (_1) back.
So I'll wait for the current WUs to finish and try again with the SSE2 one, triple-checking the spelling.


If it's not the name, we're left with:

EACCES: maybe the 'x' mode on the file is not enabled? Try 'chmod 0755 einstein_S5R3_4.49_i686-pc-linux-gnu_1'

ENOEXEC: very unlikely, but maybe a download error. Try 'md5sum einstein_S5R3_4.49_i686-pc-linux-gnu_1' => 'cf3fa0823f43745ebe8df104a1f3e9c6'

Or take out the switcher:
$ cp -p einstein_S5R3_4.49_i686-pc-linux-gnu einstein_S5R3_4.49_i686-pc-linux-gnu.switcher
$ ln -f einstein_S5R3_4.49_i686-pc-linux-gnu_1 einstein_S5R3_4.49_i686-pc-linux-gnu

I'd expect that last one to fail, because starting the SSE application is OK.
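The first two checks above (execute bit and checksum) can also be scripted. The following is only a minimal sketch, not part of the project's tooling: the function name `diagnose` and its messages are hypothetical, and it only looks at the owner execute bit and an MD5 digest supplied by the caller.

```python
import hashlib
import os
import stat


def diagnose(path, expected_md5=None):
    """Return a list of likely reasons why `path` cannot be executed.

    Hypothetical helper: it mirrors the manual checks (name, 'x' mode,
    md5sum) and is not an official BOINC or Einstein@Home tool.
    """
    if not os.path.isfile(path):
        return ["file not found (check the spelling of the name)"]

    problems = []
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        # Corresponds to the EACCES case: the owner may not execute the file.
        problems.append("owner execute bit not set (EACCES); try chmod 0755")

    if expected_md5 is not None:
        with open(path, "rb") as fh:
            digest = hashlib.md5(fh.read()).hexdigest()
        if digest != expected_md5:
            # Corresponds to the ENOEXEC case: the file content is not what was published.
            problems.append("md5 mismatch (possible download error, ENOEXEC)")
    return problems
```

Calling `diagnose("einstein_S5R3_4.49_i686-pc-linux-gnu_1", "cf3fa0823f43745ebe8df104a1f3e9c6")` in the project directory should return an empty list when the file is present, executable and intact.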

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0


Message 77865 in response to message 77864

Quote:

If it's not the name, we're left with:

I can't be 100% sure yet, but I'm pretty confident.
The current WUs only have one hour of crunching left; I'll investigate further ASAP.

Quote:

EACCES: maybe the 'x' mode on the file is not enabled? Try 'chmod 0755 einstein_S5R3_4.49_i686-pc-linux-gnu_1'

I just did as Bikeman said: "chmod u+x"
http://einsteinathome.org/node/193468&nowrap=true#84195
I ran chmod --help before trying and did not recognize the "u+x" syntax in it. I tried it anyway, and as it accepted the parameters I assumed it had worked.
But I was lazy and didn't read the man page. Maybe that's it. I'll try your way this time and take the time to learn a bit more about file permissions under Linux.

Quote:

ENOEXEC: very unlikely, but maybe a download error. Try 'md5sum einstein_S5R3_4.49_i686-pc-linux-gnu_1' => 'cf3fa0823f43745ebe8df104a1f3e9c6'

As you assumed, checksum is OK
And it worked OK (the SSE2 one) when i used it with the perfs-apps test script.

Quote:


Or take out the switcher:
$ cp -p einstein_S5R3_4.49_i686-pc-linux-gnu einstein_S5R3_4.49_i686-pc-linux-gnu.switcher
$ ln -f einstein_S5R3_4.49_i686-pc-linux-gnu_1 einstein_S5R3_4.49_i686-pc-linux-gnu

I'd expect that last one to fail, because starting the SSE application is OK.

I must say I don't understand these lines right away, but now it's up to me to learn about them.

Thanks for your precious help.
I don't want to pollute this thread any further; the initial post was more about the weird timing results I got between the non-SSE, SSE and SSE2 perf tests.
So I'll just report back if the problem is solved, or otherwise open a new thread in the right section (or maybe just PM you ;) ).

God created a few good looking guys.. and for the rest he put hairs on top..

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0


Everything is fine now!
I checked the spelling, checked the file permissions (ls -l)...
So I still don't know for sure what happened the first time, as I did the exact same thing (except for the double checking).

But I thought of something: the first time, I stopped BOINC Manager and set every task to "suspend" before leaving, but the daemon was still running (service mode was enabled).
This time I first disabled "service mode" and did a clean restart.

-----

And I reran the perf tests as well, which now seem more coherent =>

(AMD 3800+, FSB-210MHz)

./run.sh ./einstein_S5R3_4.49_i686-pc-linux-gnu_0 ### std. app
real 11m46.883s
user 11m33.139s
sys 0m3.576s

./run.sh ./einstein_S5R3_4.49_i686-pc-linux-gnu_1 ### SSE app

real 10m57.928s
user 10m43.968s
sys 0m2.880s

./run.sh ./einstein_S5R3_4.491_i686-pc-linux-gnu_1 ### SSE2 app

real 10m28.651s
user 10m13.114s
sys 0m4.272s


tullio
Joined: 22 Jan 05
Posts: 2,118
Credit: 61,407,735
RAC: 0


Message 77867 in response to message 77866

Quote:

Everything is fine now!
I checked the spelling, checked the file permissions (ls -l)...
So I still don't know for sure what happened the first time, as I did the exact same thing (except for the double checking).

But I thought of something: the first time, I stopped BOINC Manager and set every task to "suspend" before leaving, but the daemon was still running (service mode was enabled).
This time I first disabled "service mode" and did a clean restart.

-----

And I reran the perf tests as well, which now seem more coherent =>

(AMD 3800+, FSB-210MHz)

./run.sh ./einstein_S5R3_4.49_i686-pc-linux-gnu_0 ### std. app
real 11m46.883s
user 11m33.139s
sys 0m3.576s

./run.sh ./einstein_S5R3_4.49_i686-pc-linux-gnu_1 ### SSE app

real 10m57.928s
user 10m43.968s
sys 0m2.880s

./run.sh ./einstein_S5R3_4.491_i686-pc-linux-gnu_1 ### SSE2 app

real 10m28.651s
user 10m13.114s
sys 0m4.272s


Your times are very close to those obtained by my Opteron 1210 at 1.8 GHz, on both SSE and SSE2. My times with the std app were much higher, which means that the optimization works well on the Opteron architecture.
Tullio

koschi
Joined: 17 Mar 05
Posts: 86
Credit: 1,664,597,555
RAC: 0


Hi everyone :)

Thumbs up for the good work so far! I like the reference work unit.
As I run everything in /tmp, it only complained that it couldn't stat stderr.txt.

I did some benchmark runs on my Q6600 @ 3.2 GHz with DDR2-800 @ 4-4-4-6 (powered by 64-bit Kubuntu Hardy). The results may not be surprising anymore, but they are more clearly visible than when looking at cyclic crunch times that differ by some thousand seconds.

The improvements over the last versions are amazing, but have a look yourself:

4.49 SSE2 app - einstein_S5R3_4.49_1_i686-pc-linux-gnu

real 4m48.726s
user 4m45.562s
sys 0m1.176s

4.49 SSE app - einstein_S5R3_4.49_i686-pc-linux-gnu_1

real 5m33.938s
user 5m30.653s
sys 0m1.128s

4.49 nonSSE app - einstein_S5R3_4.49_i686-pc-linux-gnu_0

real 8m56.656s
user 8m52.977s
sys 0m1.488s

4.38 SSE app - einstein_S5R3_4.38_i686-pc-linux-gnu_1

real 6m17.721s
user 6m14.367s
sys 0m1.360s

4.38 nonSSE app - einstein_S5R3_4.38_i686-pc-linux-gnu_0

real 9m33.577s
user 9m30.180s
sys 0m1.412s

4.35 SSE app - einstein_S5R3_4.35_i686-pc-linux-gnu

real 6m1.429s
user 5m58.166s
sys 0m1.260s

4.49 SSE2 is ~16% faster than 4.49 SSE
4.49 SSE is ~13% faster than 4.38 SSE
so 4.49 SSE2 is ~31% faster than 4.38 SSE and ~98% faster than 4.38 nonSSE
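For anyone who wants to redo this arithmetic on their own runs, here is a small sketch. It assumes the percentages are speed ratios of the "real" times (slow time divided by fast time, minus one), which reproduces the figures above; the helper names are made up for illustration.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '4m48.726s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def percent_faster(slow, fast):
    """How much faster (in percent) the run taking `fast` is than the one taking `slow`."""
    return (to_seconds(slow) / to_seconds(fast) - 1.0) * 100.0


# "real" times copied from the benchmark runs in this post.
real = {
    "4.49 SSE2": "4m48.726s",
    "4.49 SSE": "5m33.938s",
    "4.38 SSE": "6m17.721s",
    "4.38 nonSSE": "9m33.577s",
}

if __name__ == "__main__":
    for slow, fast in [("4.49 SSE", "4.49 SSE2"),
                       ("4.38 SSE", "4.49 SSE"),
                       ("4.38 SSE", "4.49 SSE2"),
                       ("4.38 nonSSE", "4.49 SSE2")]:
        gain = percent_faster(real[slow], real[fast])
        print(f"{fast} is {gain:.1f}% faster than {slow}")
```

Using the "user" times instead of "real" would be a reasonable alternative on a loaded machine, since they exclude time spent waiting for the CPU.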

The system was under load from some regular work (Einstein, Cosmology and SETI V8 in a virtual machine) all the time, but the ./HierarchicalSearch process had a 100% CPU share (meaning one core) whenever I checked, so the test environment should be the same.

For the two nonSSE runs of 4.49 and 4.38, the newer one looks faster. Was there any improvement, or might this be caused by my system? Also 4.35 vs. 4.38 SSE looks strange. In that case I will rerun all the tests today, after some hours of sleep ;-)

regards, koschi

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0


Message 77869 in response to message 77839

Quote:

Ok, here's the first try of a reference workunit to measure the performance of Apps: refwu.zip.

It is meant to work on MacOS and Linux (for now), hopefully someone better than me in scripting on Windows could write a batch file similar along the lines of the shell script "run.sh", I'll be happy to include it in the archive.

Hi.

I'll give it a first shot.
Please warn your girlfriend that you're about to laugh for the next 5 minutes while reading the scripts...

Bernd, I would like your comments on the parameters we need to give to the apps depending on the version:
- I changed --DataFiles1='h1_1005.00_S5R3;[...blablabla..];h1_1005.10_S5R3' to --DataFiles1=h1_1005.00_S5R3;[...blablabla..];h1_1005.10_S5R3
because it issued a warning with the ' '. This seems to be a difference between the Unix and Windows apps.
- I searched a long time for the cause of the Hough.cpt warning before I remembered you had changed the checkpoint format. What parameter should be given to older apps (to replace "-o Hough.cpt")?
It does not seem to affect the results, but it would be better to do it cleanly.
- It seems that on early S5R3 versions (4.01 Windows and 4.02 Linux), some parameters are not taken into account. Not a big issue, as the resulting times seem OK. But again...

-------------

Now, about the scripts...

I chose BAT file scripting for maximum compatibility, and to avoid the problems with VBS scripts on XP SP2 and higher.

I didn't auto-download the files used by the EAH-perf-app, to avoid issues with various versions of Windows. If someone wants to give it a shot...
Not really a problem, the files needed are in a Zip file at the end of the post.

Now, the real big issue was to find an equivalent to the Unix "time" command, as nothing similar exists for DOS.
So the script gives the equivalent of the "real" time of the Unix time command, that is, the wall-clock time between the start and stop of the EAH-perf-app.
To get the most interesting one (referred to as "user" in the Unix version), corresponding to the CPU time consumed by the app, I describe two different dirty tricks at the end of the post.

Now the main script:

@echo off

COPY /Y %1 HierarchicalSearch.exe

CLS
echo .
echo .

REM @echo off
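REM Quote the whole assignment so the quotes do not become part of PATH, and keep the system PATH
SET "PATH=C:\Program Files\BOINC\projects\refwuXP;%PATH%"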

REM SET

REM ------- Cleaning previous Runs and merge previous results to log files -----------
@echo off
IF EXIST stderr.txt ( ECHO .............. >> stderr.log
TYPE stderr.txt >> stderr.log
DEL stderr.txt )
IF EXIST timing.txt ( ECHO .............. >> timing.log
TYPE timing.txt >> timing.log
DEL timing.txt )
IF EXIST log.txt ( DEL log.txt )
IF EXIST Hough.out ( DEL Hough.out )
IF EXIST Hough.out.cpt ( DEL Hough.out.cpt )
IF EXIST Hough.out.zip ( DEL Hough.out.zip )
IF EXIST init_data.xml ( DEL init_data.xml )
IF EXIST boinc_finish_called ( DEL boinc_finish_called )

echo .
echo .
echo [-------------- %1 ------------- ]
echo .
echo .
echo [-------------- Crunching ref WU - Wait for completion -------------- ]
echo .
echo .

echo Result from " %1 " run : >> timing.txt
echo Output from " %1 " run : >> stderr.txt

echo start time %TIME% >> timing.txt
@echo off
HierarchicalSearch %1 --method=0 --Freq=1005.20242518 --FreqBand=0.0161398745876 --dFreq=6.71056161393e-06 --f1dot=-1.58548959919e-09 --f1dotBand=1.74403855911e-09 --df1dot=3.88447721545e-10 --skyGridFile=skygrid_1010Hz_S5R3.dat --DataFiles1=h1_1005.00_S5R3;l1_1005.00_S5R3;h1_1005.05_S5R3;l1_1005.05_S5R3;h1_1005.10_S5R3;l1_1005.10_S5R3;h1_1005.15_S5R3;l1_1005.15_S5R3;h1_1005.20_S5R3;l1_1005.20_S5R3;h1_1005.25_S5R3;l1_1005.25_S5R3;h1_1005.30_S5R3;l1_1005.30_S5R3;h1_1005.35_S5R3;l1_1005.35_S5R3 --tStack=90000 --nStacksMax=84 --pixelFactor=0.500 --nf1dotRes=1 --ephemE=earth_05_09 --ephemS=sun_05_09 --nCand1=10000 -o Hough.out --gridType=3 --useWeights=0 --printCand1 --semiCohToplist -d1
echo stop time %TIME% >> timing.txt

REM ------ Emit a speaker beep upon completion -----
REM CALL is needed so that control returns here and the cleanup below still runs
CALL beep.bat

IF EXIST HierarchicalSearch.exe ( DEL HierarchicalSearch.exe )

Don't say I didn't warn you that you would laugh...

It just does a little file cleaning and manages the history of the stderr files, as the Unix script does.
It writes a file "timing.txt" with the start and end time, corresponding to the "real" time, and it logs the results of successive runs. Sorry, you'll need to do a little math for now.
This should be enough to compare 2 apps versions, if you don't decide to recompress your entire AVI porn collection in parallel as running the benchmarks.
To run it, copy it to the same directory as the EAH files you downloaded (the Zip at the end of the post). Here it's "C:\Program Files\BOINC\projects\refwuXP".
Copy the EAH apps you want to test into this folder as well, for example "einstein_S5R3_4.46_windows_intelx86_1.exe".
Then edit the PATH at the beginning of the script if needed, and open a DOS shell (Start -> Run, then type CMD, or use a Windows shortcut).
Move to the folder: CD "C:\Program Files\BOINC\projects\refwuXP" or whichever one you chose.
Then type: run.bat "einstein_S5R3_4.46_windows_intelx86_1.exe"
The parameter is the name of the EAH app you want to test, here the 4.46 SSE one.
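The "little math" on timing.txt can be automated. This is only a sketch under assumptions: it supposes that %TIME% was written as H:MM:SS followed by hundredths, with either a dot or a comma as decimal separator depending on the Windows locale, and that a run may cross midnight at most once. The function names are invented for this example.

```python
import re

TIME_RE = re.compile(r"\s*(\d{1,2}):(\d{2}):(\d{2})[.,](\d{1,2})\s*")


def parse_time(stamp):
    """Convert a %TIME% value such as '14:03:27,45' to seconds since midnight."""
    match = TIME_RE.fullmatch(stamp)
    if match is None:
        raise ValueError(f"unrecognised %TIME% value: {stamp!r}")
    hours, minutes, seconds, frac = match.groups()
    return (int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            + int(frac) / 10 ** len(frac))


def elapsed(start, stop):
    """Wall-clock seconds between two %TIME% values (at most one midnight crossing)."""
    delta = parse_time(stop) - parse_time(start)
    if delta < 0:
        delta += 24 * 3600
    return delta


def elapsed_from_log(text):
    """Extract the 'start time' and 'stop time' lines written by run.bat."""
    start = re.search(r"^start time (.+)$", text, re.MULTILINE)
    stop = re.search(r"^stop time (.+)$", text, re.MULTILINE)
    if start is None or stop is None:
        raise ValueError("start/stop lines not found")
    return elapsed(start.group(1), stop.group(1))
```

For example, `elapsed_from_log(open("timing.txt").read())` would give the "real" time of the last run in seconds.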

--------

To make a clean comparison, some will want the CPU time actually consumed by the app. Sorry, neither of these methods is really clean, but if someone finds a better way, they're welcome. ;)

VERSION 1 :

I made a second script to monitor the EAH thread:


@echo off
SET "PATH=C:\Program Files\BOINC\projects\refwuXP;C:\WINDOWS\system32\dllcache;C:\WINDOWS\system32"
CLS

echo %PATH%

REM ------- Cleaning previous Runs and merge previous results to log files -----------
@echo off
IF EXIST timingCPU.txt ( ECHO .............. >> timingCPU.log
TYPE timingCPU.txt >> timingCPU.log
DEL timingCPU.txt )

echo .
echo .
echo [-------------- Monitoring - Please Wait for completion ------------- ]
echo [-------------- ------- Or use CTRL+C to exit -------- -------------- ]
echo .
echo .

REM ----- Will monitor whether the file "boinc_finish_called", written by the einstein app
REM ----- at the end of the run, is present or not

@ECHO OFF
:TOP
IF EXIST boinc_finish_called GOTO END
REM IF EXIST timingCPU.txt ( DEL timingCPU.txt )
C:\WINDOWS\system32\tasklist.exe /v /fi "IMAGENAME eq HierarchicalSearch.exe" > timingCPU.txt
C:\WINDOWS\system32\ping.exe 127.0.0.1 -n 2 > NUL
GOTO TOP
:END

REM ---- The "ping 127.0.0.1 -n 2 > NUL" is a trick to make a 1 sec sleep in the loop

It uses "tasklist.exe", which is built into Windows and gives information about running processes.
But as it only gives an instantaneous snapshot, the script polls the process every second to get its CPU time, and exits as soon as the file "boinc_finish_called" appears. That file is created by the EAH-perf-app on exit.
Quite dirty, but not very CPU consuming it seems. It should run on single-core CPUs too without disturbing the EAH-perf-app too much, but...

To run it:
First run the first script to start the EAH-perf-app ("run.bat WhateverAppYouWantToBench").
Then, at any time before completion, open a second shell.
Go to the running folder (you might want to edit the PATH as well if you chose another one).
Run the second bat file: "monitor.bat"

When the test is complete, it leaves a timingCPU.txt with your results.
It does a bit of file management as well, to keep the different runs logged in timingCPU.log.
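To turn the last snapshot in timingCPU.txt into seconds, something like the following could be used. It is a sketch resting on an assumption about the `tasklist /v` table layout: that the row starts with the image name and that the CPU time is the only H:MM:SS field in the row. The sample row in the usage note is invented, not copied from a real run.

```python
import re

# CPU time as printed in the verbose tasklist table, e.g. 0:13:05 (assumed format).
CPU_TIME = re.compile(r"\b(\d+):([0-5]\d):([0-5]\d)\b")


def cpu_seconds(tasklist_output, image="HierarchicalSearch.exe"):
    """Return the CPU time in seconds from the first row for `image`, or None if absent."""
    for line in tasklist_output.splitlines():
        if not line.startswith(image):
            continue
        match = CPU_TIME.search(line)
        if match:
            hours, minutes, seconds = (int(g) for g in match.groups())
            return hours * 3600 + minutes * 60 + seconds
    return None
```

With a row such as `HierarchicalSearch.exe  1234 Console  0  45,000 K Running  ALEX\alex  0:13:05 N/A`, the function returns 785. Because monitor.bat overwrites timingCPU.txt every second, the file holds the last sample taken before the app exited, so the value can be up to a second or so short.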

VERSION 2:

This is the one I used before finding a way to get Version 1 to work.
It requires user attention, as it uses the Windows process monitor (Task Manager), or another one you have; I use the sweet TaskInfo here.

The idea is to watch closely for the EAH-perf-app to end, and to read the results in the process monitor before the process disappears from it. You have a few seconds for this. I used the "Print Screen" key to be quick, and the first script emits a motherboard beep at the end of the test.
So you don't have to monitor closely, just react quickly when it beeps.

With TaskInfo, you need to multiply the Time value by 2 on dual-core CPUs to get the actual CPU time consumed by the EAH-perf-app. This is weird, I know...

I'm sure someone with better skills will come up with something a lot cleaner, but for now it does the job...

Note: For Vista users, I have no idea if any of this will work. You will have to find out for yourself.

Cheers.
Alex

The Script files =>
http://dl.free.fr/mv9e4n4Ug/Scripts.zip
Note: the "beep.bat" file contains just one line. Don't edit it, as it contains a special char (Alt+007) that emits a beep when combined with the "echo" command.

The grid/sky/WU's needed by EAH-perf-app =>
http://dl.free.fr/eznjSSWOH/refwuXP.zip


[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0


Hi.

I've done a little bit of digging, crunching this poor WU again and again :D

1°) Are the results of the Windows script accurate enough?
I've run various apps with various methods to check it.
To ensure maximum stability and avoid wiggles, I ran the EAH apps in "real-time" mode, using the persistent per-process priority offered by TaskInfo.
( !!! Only try this on multi-core platforms or your system will badly crash !!! )

Here are some results comparing the script's time measurement against the TaskInfo method, for both User time and CPU time, even if CPU time is the more meaningful one here.

Crunch time in seconds on XP3800+.

Win App Version   Script User   TaskInfo User   Script CPU   TaskInfo CPU
4.01              1549          1548            1543         1542
4.07              1659          1658            1656         1656
4.13              1398          1397            1397         1396
4.15              1399          1398            1398         1398
4.26              1282          1280            1278         1278
4.36_SSE          788           787             785          786
4.46_0            1280          1280            1278         1278
4.46_1_SSE        788           786             785          786

Maybe easier to read this way => [plot not preserved]

So we can say it's accurate enough, sometimes 1-2 sec difference.
User and CPU are real close here, because EAH was run in "real-time" priority.

I also ran each line 3 or 4 times, and again results of each run are within the same second.

2°) How various apps were improved over time (a tribute to the great work done here - apart from the science itself)
Big thanks to Bernd, Olive, Akosf, Bikeman, and probably others...

And not to forget the work done on previous runs too !

All the results here were obtained on an XP3800+; 1 GB RAM, CAS 2-2-2-5; FSB 210 MHz.
Windows XP SP1 or Linux Ubuntu 8.04 Hardy Heron (kernel 2.6.24-17)

I downloaded the various app versions from here: http://einstein.phys.uwm.edu/download/
Maybe there's another place with all the old apps?

Only the CPU time value is used here.
I ran each version more than once (3-4 times) and took the lowest value with an error-free output (stderr.txt).
In "real-time" priority there's no variation in crunch time, but in IDLE priority there can be a bit (more or less, depending on what's running on the PC).
Don't forget to stop BOINC running in the background when testing in "IDLE" priority, or the swapping between the test WU and real WUs will increase the time you get and might mislead later analysis.

And if we plot it => [plot not preserved]

A few comments/questions on this:
- As expected, the results for 4.26 vs 4.46 (Windows) and 4.36 vs 4.46_1 are the same. :D (Note that only the SSE detect/switch mechanism was introduced in 4.4x.)
- The fastest version, Linux 4.49 SSE2, is more than 3 times faster than the slowest one (Linux 4.02)!!!
- Linux is doing better than Windows now. It would be interesting if someone ran the same tests on a triple-boot Mac to get the full overview. Sorry, I don't have that kind of hardware.
- 4.07 is slower than 4.01. Nothing really new, as it was aimed at fixing a few bugs and testing the new checkpoints and separate graphics.
- The SSE2 4.49 version is slightly faster on my CPU with the ref WU. With real WUs the gap is bigger (8% compared to 5%), and it might do even better on other CPU architectures.
So it's an improvement anyway. I know it's new and Bernd is working on it; later, a three-way switch (non-SSE; SSE; SSE2) in future official apps could benefit from this.
- Now, the real point is about 4.24!
Bernd described it as a non-SSE one (see chapter 5°) ).
I wasn't running Linux at that time, so I don't have real experience with it.
But I think this was a mistake in the app description and the app is SSE.
Or I don't understand what happened with later apps.
The big point of 4.24 was the introduction of the linear SIN/COS from akosf and Bikeman. And this is now used (to my knowledge) in all versions.
So why would later non-SSE versions be slower than 4.24, and later SSE apps be around the same speed?
Bikeman, could you rectify/clarify this? I must be mistaken somewhere!

3°) Compare with real WU's

Be aware that the results you get with the ref WU are not an exact reflection of what you will get with real WUs.
Bernd can probably give a better explanation of this; my guess is that the ratio of prefetching versus crunching is not exactly the same.

For example, with the ref WU, the 4.49 SSE2 is about 5% faster than 4.49 SSE on my system.
With real WUs, after correcting for the natural variation between WUs with the
t = a(1 - b|sin(PI*seq/period)|) formula and the freq ceiling, and taking care of the natural wiggles close to the valleys, the average speed-up is 8.1% on my system.

And the difference can be even larger when comparing apps that are not as close as these two.
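To illustrate how such a correction might work, here is a sketch that applies the formula above. It is not the method actually used for the numbers in this post: the function names are invented, `b` and `period` must be fitted beforehand from real logs, and the "freq ceiling" and valley wiggles are ignored.

```python
import math
from statistics import mean


def model_runtime(seq, a, b, period):
    """Runtime predicted by t = a(1 - b|sin(PI*seq/period)|)."""
    return a * (1.0 - b * abs(math.sin(math.pi * seq / period)))


def peak_runtime(measured, seq, b, period):
    """Estimate `a` (the runtime at a peak) implied by one measured WU.

    Requires b < 1, otherwise the denominator can reach zero in a valley.
    """
    return measured / (1.0 - b * abs(math.sin(math.pi * seq / period)))


def speedup_percent(old_runs, new_runs, b, period):
    """Average speed-up of the new app, after removing the sequence-number effect.

    `old_runs` and `new_runs` are lists of (seq, seconds) pairs from real WUs.
    """
    old_a = mean(peak_runtime(t, seq, b, period) for seq, t in old_runs)
    new_a = mean(peak_runtime(t, seq, b, period) for seq, t in new_runs)
    return (old_a / new_a - 1.0) * 100.0
```

Normalising every WU to its implied peak runtime lets results from different sequence numbers be averaged together, instead of only comparing WUs that happen to sit at the same point of the cycle.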

So the plot above needs to be compared with the logs of real WUs.
An update of my logs plot => [plot not preserved]

4°) Different CPU architectures

Now, to give the devs a clearer picture, and maybe a direction where they should put more effort, it would be nice if some of you ran similar tests on different CPU architectures or OSes (Mac for example).
Then we could come up with a small database for them to extract results from.

I heard, for example, that SSE2 could perform differently on the Core2 architecture.
Or that a low amount of RAM or low bandwidth can make the gap between valleys and peaks a lot wider. See for example the machine owned by Herr Datenrat already discussed here (25 ks for valleys and a huge 65 ks for peaks).

I will run the tests on my vintage CPUs as well: non-SSE 486 (SX25-33; DX2-66), P2 (266-333-400), P3-600.

5°) Quick summary of the changes from one app to the next, to link with the test results.
Do you know if there's a backup somewhere of the changelog for each app?
Like a backup of each revision of this page over time =>
http://einstein.phys.uwm.edu/app_test.php

Only the apps for which I found the executable, and which were tested above, are described here.

------------------------------

Windows S5R3 App 4.01 / Linux 4.02 (22 Sep 2007)
http://einsteinathome.org/node/193160

The code of the Apps hasn't changed very much compared to the 4.4x Apps of S5R2. If you would run the same S5R2 workunit with both of them you'd find the S5R3 App being slightly faster.

------------------------------

Windows S5R3 App 4.07 (26 Sep 2007)
http://einsteinathome.org/node/193177

This App addresses two problems that are mentioned in the "Client Errors of S5R2 Apps" thread in the "Problems and Bug Reports" Forum.

It features :
- a new checkpointing mechanism and code that has been implemented from scratch without the legacies from older Apps and their requirements. It is much more simple and thus, I hope, more reliable. However the checkpoint files are incompatible with the previous ones, thus there can't be a "smooth transition" from the current official App to this Beta.

- dynamic decision whether to run graphics or not. It will start up without the graphics libraries GDI32.DLL, OPENGL32.DLL and GLU32.DLL, will try to load them on its own and in case of a failure it will run without graphics instead of crashing with a client error.

------------------------------

Windows S5R3 App 4.13 (19 Oct 2007)
http://einsteinathome.org/node/193242

This App fixes the problem that actually was in the command-line parsing of the 4.11 Beta App (exit status 99, "Ending frequency not contained in SFT (sequence)").

In addition it forces immediate syncing of the checkpoint file (thanks Bikeman!), which should make the checkpointing even more reliable.

------------------------------

Windows S5R3 App 4.15 (23 Oct 2007)
http://einsteinathome.org/node/193262

It has the immediate syncing that has been disabled for the 4.13 release re-enabled and (hopefully) working now. It should make the checkpointing more reliable, e.g. in case of a power failure.

--------------------------

GNU/Linux S5R3 App 4.20 (29 Nov 2007)
http://einsteinathome.org/node/193352

This App should fix the bug that caused a SEGV (signal 11) when the App couldn't write a checkpoint. Speed should be comparable to that of the 4.16.

--------------------------

GNU/Linux S5R3 App 4.24 (14 Jan 2008)
http://einsteinathome.org/node/193438

This App should fix the bug that caused a SEGV (signal 11) when the BOINC Core Client became unresponsive (e.g. due to network access / problems).

This "standard" (non-SSE) App has the "linear SIN/COS" code working (many thanks to Akos and Bikeman) and should thus be somewhat faster than the 4.20.

------------------------

Windows S5R3 App 4.26 (21 Jan 2008)
http://einsteinathome.org/node/193453

From the 4.25 App Thread:
I'll try to build an App with the old Visual Studio of 2003 (instead of VS2005). At least the /G7 optimization should work there. Let's see if it helps...

-------------------------

GNU/Linux S5R3 App 4.31 (8 Feb 2008)
http://einsteinathome.org/node/193497

This App looks a little faster than the previous 4.24 due to some hacking with the sin/cos routine, and it is a new "separate graphics" App (featuring the "extended information" mentioned in the "screensaver competition" thread).

It's probably not the fastest we can do w/o SSE, but in contrast to the quick-fix 4.24 it's an actual release candidate.

-------------------------

GNU/Linux S5R3 App 4.35 (22 Feb 2008)
http://einsteinathome.org/node/193528

# codebase of 4.33 App (MacOS Intel)
# Only run this if you're sure that your CPU supports SSE
# "Hough" prefetching (ass)
# hand-coded SSE hot-loop
# linear sin/cos approximation ("+2")
# graphics in a separate program ("BOINC APIv6")

-------------------------

Windows S5R3 SSE power App 4.36 (26 Feb 2008)
http://einsteinathome.org/node/193533

# Windows App with SSE "hot loop" and prefetching
# Only run this if you're sure that your CPU supports SSE
# graphics in a separate program ("BOINC APIv6")

------------------------

GNU/Linux S5R3 App 4.38 (29 Feb 2008)
http://einsteinathome.org/node/193541

The package includes renamed versions of the Apps formerly known as 4.31 and 4.35 and a little wrapper program that switches between them based on the CPU features it detects.

------------------------

Windows S5R3 App 4.46 (7 May 2008)
http://einsteinathome.org/node/193660

This App combines the Apps previously known as 4.26 and 4.36 (SSE) with the App-switching mechanism we already use on Linux and MacOS PPC

------------------------

GNU/Linux S5R3 App 4.49 (14 May 2008)
http://einsteinathome.org/node/193675

This is a "switching" App as you already know from the 4.38 App.

The SSE App was built with compiler settings that should use the SSE unit for most arithmetics (-mfpmath=sse). There might be a difference in speed overall in one or the other direction, but mainly it should serve two purposes: first we try to avoid the FPU exceptions by avoid using the FPU, and second it should improve the prefetching of the Hough code (actually enable parts of it). This means that the App should run faster and further reduce the variance in run-times between workunits compared to previous versions.

In addition the App was built with BOINC API as of May 7, which means that it should work properly with latest development clients.

------------------------

Cheers,
Alex.

PS: Feel free to comment/rectify any of this
PS2: Sorry for my long posts, bad habit i know.


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3,522
Credit: 693,435,718
RAC: 96,079


Message 77871 in response to message 77870

Hi Alex!

Awesome! Thanks for the many plots and comparative statistics.

Quote:

The big point of 4.24 was the introduction of the linear SIN/COS from akosf and bikeman. And this is now used (from my knowledge) in all versions.


The linear sin/cos code was written by Bernd and Akos, I just helped to optimize the C version a bit :-).

Yes, this variant is now used for all versions.

Quote:

So why should later non-SSE versions being slower that 4.24, and later SSE app being around the same speed ?
Bikeman, could you rectify/clarify this ? I must be misleading somewhere !


IIRC, there could be two reasons:

First: additional self-checking code to detect corruption of the computations. AFAIK all of this is still used now, not quite sure when they were first introduced.

Second: Before the performance critical sections were hand-coded in assembler, it was a bit of luck how many of the FPU registers would be used by the compiler. Depending on the surrounding code, the gcc compiler would compile the same lines of C code rather differently, resulting in rather drastic performance differences. Even providing compiler hints would not help.

Still, the absolute speed of 4.24 that you measured is surprising to me, I think I'll do some tests as well. In fact, my recollection is that 4.24 did NOT improve crunching speed at all, also see this message from Bernd: http://einsteinathome.org/node/193438&nowrap=true#79506

Could there be an error in your measurements?

CU

Bikeman

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0


Message 77872 in response to message 77871

Hi mate ;)

Quote:

The linear sin/cos code was written by Bernd and Akos, I just helped to optimize the C version a bit :-).

That's just what Bernd said at that time. :D
Anyone who contributes deserves credit, whatever the contribution was. ;)
PS: I'm not aware of everything, so feel free to edit my post to add any names I may have forgotten!

Quote:

Yes, this variant is now used for all versions.

Thanks, seems logical.


Quote:


IIRC, there were two reasons:

First: additional self-checking code to detect corruption of the computations in versions > 4.24. AFAIK all of this is still used now.

Second: Before the performance critical sections were hand-coded in assembler, it was a bit of luck how many of the FPU registers would be used by the compiler. Depending on the surrounding code, the gcc compiler would compile the same lines of C code rather differently, resulting in rather drastic performance differences. Even providing compiler hints would not help.

Thanks for the enlightenment !

Quote:

Still, the absolute speed of 4.24 that you measured is a bit surprising to me, I think I'll do some tests as well.

I was quite confused too.
That's why I tried to dig as much as I could, running and re-running it, reading the threads...
But as I never used this app at that time, I was unable to compare with my own results.
And you know a lot better than me that these results could be specific to my XP3800 architecture.
Maybe you will have better clues. ;)

I got this version here:
http://einstein.phys.uwm.edu/download/einstein_S5R3_4.24_i686-pc-linux-gnu
The date of the file stored there is quite coherent.
Is there another place where the old app files are stored?

The results for this particular app are so disturbing that I'm pretty sure it's a glitch, either on my side, in the app I used, or in the parameters I gave to the app...

If you need me to test anything, just ask ;)

Cheers,
Alex.

PS : shell output and stderr.txt of 2 runs =>

root@alex-Ubuntu:/var/lib/boinc-client/projects/refwusidetest# ./run.sh ./einstein_S5R3_4.24_i686-pc-linux-gnu
Now running: ./HierarchicalSearch --method=0 --Freq=1005.20242518 --FreqBand=0.0161398745876 --dFreq=6.71056161393e-06 --f1dot=-1.58548959919e-09 --f1dotBand=1.74403855911e-09 --df1dot=3.88447721545e-10 --skyGridFile=skygrid_1010Hz_S5R3.dat --DataFiles1='h1_1005.00_S5R3;l1_1005.00_S5R3;h1_1005.05_S5R3;l1_1005.05_S5R3;h1_1005.10_S5R3;l1_1005.10_S5R3;h1_1005.15_S5R3;l1_1005.15_S5R3;h1_1005.20_S5R3;l1_1005.20_S5R3;h1_1005.25_S5R3;l1_1005.25_S5R3;h1_1005.30_S5R3;l1_1005.30_S5R3;h1_1005.35_S5R3;l1_1005.35_S5R3' --tStack=90000 --nStacksMax=84 --pixelFactor=0.500 --nf1dotRes=1 --ephemE=earth_05_09 --ephemS=sun_05_09 --nCand1=10000 -o Hough.out --gridType=3 --useWeights=0 --printCand1 --semiCohToplist -d1
cp: cannot stat `stderr.txt': No such file or directory
  adding: Hough.out (deflated 91%)

real 13m59.023s
user 12m54.980s
sys 0m6.256s

Stderr.txt =>
2008-05-26 19:12:29.8275 [normal]: Built at: Feb 21 2008 15:57:05

2008-05-26 19:12:29.8277 [normal]: Start of BOINC application './HierarchicalSearch'.
shmget: Invalid argument
Can't set up shared mem: -1
Will run in standalone mode.
2008-05-26 19:12:29.8287 [normal]: WARNING: Can't boinc-resolve result file 'Hough.out'
2008-05-26 19:12:29.8287 [normal]: WARNING: boinc-resolved result file "Hough.out" in local directory - will zip into "Hough.out.zip"
2008-05-26 19:12:29.8288 [debug]: Set up communication with graphics process.
2008-05-26 19:12:30.2650 [debug]: Reading SFTs and setting up stacks ... done
2008-05-26 19:12:49.4284 [normal]: INFO: Couldn't open checkpoint Hough.out.cpt
2008-05-26 19:12:49.4285 [debug]: Total skypoints = 31. Progress: 0,
$Revision: 1.115 $ OPT:3 SCV:9, SCTRIM:2, HLV:3, HP:7
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, c
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, c
24, 25, 26, 27, 28, 29, 30, done.
FPU status flags: PRECISION
2008-05-26 19:26:26.8072 [normal]: done. calling boinc_finish(0).
called boinc_finish

----------

root@alex-Ubuntu:/var/lib/boinc-client/projects/refwusidetest# ./run.sh ./einstein_S5R3_4.24_i686-pc-linux-gnu
Now running: ./HierarchicalSearch --method=0 --Freq=1005.20242518 --FreqBand=0.0161398745876 --dFreq=6.71056161393e-06 --f1dot=-1.58548959919e-09 --f1dotBand=1.74403855911e-09 --df1dot=3.88447721545e-10 --skyGridFile=skygrid_1010Hz_S5R3.dat --DataFiles1='h1_1005.00_S5R3;l1_1005.00_S5R3;h1_1005.05_S5R3;l1_1005.05_S5R3;h1_1005.10_S5R3;l1_1005.10_S5R3;h1_1005.15_S5R3;l1_1005.15_S5R3;h1_1005.20_S5R3;l1_1005.20_S5R3;h1_1005.25_S5R3;l1_1005.25_S5R3;h1_1005.30_S5R3;l1_1005.30_S5R3;h1_1005.35_S5R3;l1_1005.35_S5R3' --tStack=90000 --nStacksMax=84 --pixelFactor=0.500 --nf1dotRes=1 --ephemE=earth_05_09 --ephemS=sun_05_09 --nCand1=10000 -o Hough.out --gridType=3 --useWeights=0 --printCand1 --semiCohToplist -d1
cp: cannot stat `stderr.txt': No such file or directory
adding: Hough.out (deflated 91%)

real 13m29.012s
user 12m51.356s
sys 0m4.068s

EDIT: What is the best way to verify that the outputs are correct when testing an app with the ref WU?
I did some file compares with Hough.out, for example, but it seems to be timestamped, so a direct ASCII/HEX/BIN compare will not help.
I tried with other files as well, but this led nowhere...
Is there a better way than just verifying there are no errors in stderr.txt?


Mikie Tim T
Joined: 22 Jan 05
Posts: 105
Credit: 263,777,741
RAC: 0


Message 77873 in response to message 77852

Quote:

On a Kubuntu 8.04 64 bit setup with an AMD Turion X2 TL-56 I get the following results with the test workunit:

./einstein_S5R3_4.49_i686-pc-linux-gnu_0
real 19m27.247s
user 19m21.557s
sys 0m2.940s

./einstein_S5R3_4.49_i686-pc-linux-gnu_1(SSE)
real 12m27.322s
user 12m22.746s
sys 0m2.312s

./einstein_S5R3_4.49_1_i686-pc-linux-gnu(SSE2)
real 12m33.268s
user 12m28.435s
sys 0m2.484s

Looks like the SSE2 app has no performance gain on Turion processors.

I stand corrected. After reading all of these posts about pretty much everyone else showing a performance gain with the SSE2 app, I thought back and realized that I hadn't stopped BOINC when I ran the tests, only suspended the project. Retesting the apps after stopping the CC resulted in SSE2 finishing over 30 seconds sooner than the SSE app. So, I've swapped out the _1 app in the project, and we'll see what kind of gains occur on Turion X2's with regular workunits.
