Ready Reckoner Area

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0

RE: Ok, how's it goin? Any

Message 79449 in response to message 79448

Quote:

Ok, how's it goin?

Any problemo's ? ... :-)

Cheers, Mike.

Very nice work Mike !!!
Seems you had fun ( and sometimes unpredicted difficulties ;) ) programming it.

I've been following all this from the beginning, even if in silent mode.
So, thanks to all the contributors who did impressive work understanding the cyclic nature of EAH WUs: Archae, Richard, Bikeman, Gary, Mike, Paul… Most impressive is that the results were very accurate, as Bernd's explanations later confirmed.
Seems like you all had a lot of fun playing with it, as I did myself reading along and experimenting with logs.

Mike, as it seems you are willing to play a little bit more, may I suggest you implement another way to display the data, with a slightly different approach, to make the tool more useful for “non-expert” users.

----------------
First, let me give a bit of my history over the past 3 months and some background on my particular situation.
My crunching machine is quite slow (XP3800+), so I get quite a few WUs at the same frequency from the scheduler. On top of that, because of a bad setup of BoinclogX, a lot of WUs were missing from my old logs, which I discovered later. My PC is also used for a lot of things apart from EAH, which introduces quite some variation in crunch time for similar WUs.
These two points made it quite difficult for me to do data mining, first by hand then with early RR versions, and to extract useful information such as variance or speed improvement for new apps (etc.).
I needed to use all my data points, which gave me the idea of taking it the reverse way. Instead of calculating peaks and variance for each freq, why not transform the data itself into a different “universe” (probably not the right mathematical word for this :/ ) in order to plot it.
Instead of displaying it with the seq number on the abscissa for each freq, I displayed it against a “linear virtual unit” which is a function of:
((X * Period) / (10Hz stepped freq)² )
where X takes the 0.000206 factor into account, even if this is irrelevant in my case for a graphical approach.

This gives something like this on my data:

Not very nice because of the lack of points for some app versions, and because of the “noise” in crunch time caused by other use of the PC. But it still gives a good idea of what I’m talking about.
This is a way to display data with freqs from 320 to 1000 here, without it all getting mixed up.

And with Peanut's clean data:

Despite there still being few points in the data set (too bad I don’t have Linux to use the script, and Gary stopped updating), we can even start to distinguish the envelope of the wiggle in crunch time near the trough on the blue curve, which was well explained by Bikeman's reminder.
----------------

So the idea, Mike, would be to add another tab in RR that displays all the points in the same way as I do.
Then you can add a few very useful features that were not easy for me to do with an Excel graph:

- Trace the “virtual curve” for each app version using your least squares approach, but with a lot more data (by working in the "virtual unit" universe) than when addressing a single frequency. Then you can directly calculate the “improvement percentage” between app releases by comparing their “virtual curves”. This would be quite accurate (more points than with the single-freq approach) and directly useful for the “non-expert user”. This is the only thing that 99% of people are interested in.
- Display peaks and troughs as usual.
- Allow highlighting in a different color the data points corresponding to a specific frequency data set.
- The ability to navigate between points as you already do in current RR versions.
- Etcetera etcetera…

And now, in order to make it a quick plug-and-play tool for beta test users and perf testing, you might also want to add an “auto-import history”, in the same way as the Linux script does with the computer ID as input.
I don’t know if this is difficult for you to program in Java, but without it users will need to already have their log history or run the script on the side, which will put many of them off I think.

These are just crazy thoughts, just for you to have a bit more fun if you wish.

Sorry for my bad English and sorry for this massive post..

Cheers, Alex.

PS: If some of you have large data logs, could you please keep posting updates from time to time in the dedicated thread [url=http://einsteinathome.org/node/193515]here[/url] for us to play with ;)

PS2: My Excel file and data file, if you want to play a bit.
Zip file
Notes:
- If you want to use the auto-import macro (the “import” button), you will need to point to the .txt file that contains the data when prompted. The file structure/separators are the same as for RR.
- You might experience strange behaviour from the macro if using an English version of Excel, because of the format used in my French version: “.” separators are translated to “,” and vice versa. I had to swap some in the macro code in order to translate during the log import. You can have a look at the macro source and discard this part if it causes trouble in English Excel.
- You don't need to use the macro; just paste your data into the first 4 columns. But don't do it by hand: first import the CSV file into a new sheet, then copy and paste the entire block into my file.
- Not a really clean Excel file, as it was used as a fun tool for me.

God created a few good looking guys.. and for the rest he put hairs on top..

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,199
Credit: 133,822,847
RAC: 15,430

RE: Very nice work Mike

Message 79450 in response to message 79449

Quote:
Very nice work Mike !!!
Seems you had fun ( and sometimes unpredicted difficulties ;) ) programming it.


Bit of an odyssey, actually... :-)

Quote:
I've been following all this from the beginning, even if in silent mode.
So, thanks to all the contributors who did impressive work understanding the cyclic nature of EAH WUs: Archae, Richard, Bikeman, Gary, Mike, Paul… Most impressive is that the results were very accurate, as Bernd's explanations later confirmed.


Yes, indeed a nifty group effort. Congrats to all!

Quote:

Seems like you all had a lot of fun playing with it, as I did myself reading along and experimenting with logs.

Mike, as it seems you are willing to play a little bit more, may I suggest you implement another way to display the data, with a slightly different approach, to make the tool more useful for “non-expert” users.


Sure, that's fine, that's why I asked ..... :-)

Quote:

----------------
First, let me give a bit of my history over the past 3 months and some background on my particular situation.
My crunching machine is quite slow (XP3800+), so I get quite a few WUs at the same frequency from the scheduler. On top of that, because of a bad setup of BoinclogX, a lot of WUs were missing from my old logs, which I discovered later. My PC is also used for a lot of things apart from EAH, which introduces quite some variation in crunch time for similar WUs.
These two points made it quite difficult for me to do data mining, first by hand then with early RR versions, and to extract useful information such as variance or speed improvement for new apps (etc.).
I needed to use all my data points, which gave me the idea of taking it the reverse way. Instead of calculating peaks and variance for each freq, why not transform the data itself into a different “universe” (probably not the right mathematical word for this :/ ) in order to plot it.
Instead of displaying it with the seq number on the abscissa for each freq, I displayed it against a “linear virtual unit” which is a function of:
((X * Period) / (10Hz stepped freq)² )
where X takes the 0.000206 factor into account, even if this is irrelevant in my case for a graphical approach.

This gives something like this on my data:

Not very nice because of the lack of points for some app versions, and because of the “noise” in crunch time caused by other use of the PC. But it still gives a good idea of what I’m talking about.
This is a way to display data with freqs from 320 to 1000 here, without it all getting mixed up.

And with Peanut's clean data:

Despite there still being few points in the data set (too bad I don’t have Linux to use the script, and Gary stopped updating), we can even start to distinguish the envelope of the wiggle in crunch time near the trough on the blue curve, which was well explained by Bikeman's reminder.
----------------

So the idea, Mike, would be to add another tab in RR that displays all the points in the same way as I do.
Then you can add a few very useful features that were not easy for me to do with an Excel graph:

- Trace the “virtual curve” for each app version using your least squares approach, but with a lot more data (by working in the "virtual unit" universe) than when addressing a single frequency. Then you can directly calculate the “improvement percentage” between app releases by comparing their “virtual curves”. This would be quite accurate (more points than with the single-freq approach) and directly useful for the “non-expert user”. This is the only thing that 99% of people are interested in.
- Display peaks and troughs as usual.
- Allow highlighting in a different color the data points corresponding to a specific frequency data set.
- The ability to navigate between points as you already do in current RR versions.
- Etcetera etcetera…


Capital ideas! :-)
I was indeed wondering how to sensibly generalise the appearance of plots and the analysis above the 'frequency' level. Your method is valid - a form of normalisation of each frequency curve to a generic one - using the period formula and 10Hz step thingy. Then overlay all the frequencies, marking them by a color code like the Excel displays. I'll have to have a think about the maths of whether one converts to some virtual measure ( or variant ) as you define BEFORE the curve fit, OR do the individual curve fits ( as at present ) and THEN convert to the common benchmark. [ As squares are involved in a 'least squares' method, then I suspect a non-linearity may appear b/w the two alternatives. I'll have to run through my derivations and see if the order matters ... ]
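As a rough illustration of the normalisation discussed here, the sketch below maps a (frequency, sequence number) pair onto a shared axis measured in cycle periods, so that several frequencies can be overlaid on one plot. It assumes the period scales as factor × (stepped frequency)², using the 0.000206 factor quoted in the thread. Rounding to the nearest 10 Hz and all function names are assumptions made for illustration; this is not RR's actual code.

```python
def stepped_freq(freq_hz, step_hz=10.0):
    """Snap a frequency to a multiple of step_hz (nearest multiple is an assumption)."""
    return step_hz * round(freq_hz / step_hz)


def cycle_period(freq_hz, factor=0.000206, step_hz=10.0):
    """Assumed length of one runtime cycle, in sequence numbers."""
    return factor * stepped_freq(freq_hz, step_hz) ** 2


def virtual_unit(freq_hz, seq, factor=0.000206, step_hz=10.0):
    """Sequence number expressed in periods, so all frequencies share one x-axis."""
    return seq / cycle_period(freq_hz, factor, step_hz)


if __name__ == "__main__":
    # (frequency, sequence number) pairs taken from figures mentioned in the thread
    for freq, seq in [(541.95, 181), (749.9, 213), (1048.0, 856)]:
        print(freq, seq, round(virtual_unit(freq, seq), 3))
```

Whether a curve fit is done before or after this conversion is a separate question; the sketch only covers the change of axis.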

Quote:
And now, in order to make it a quick plug-and-play tool for beta test users and perf testing, you might also want to add an “auto-import history”, in the same way as the Linux script does with the computer ID as input.
I don’t know if this is difficult for you to program in Java, but without it users will need to already have their log history or run the script on the side, which will put many of them off I think.


Absolutely, a must I think. I have sketched a preliminary algorithm for automating the interrogation of a user's account to fish out the host/app/freq/seq/runtime elements. I'm looking at some Java libraries for direct programmatic HTTP work, as I'd very much prefer if someone smarter than me has already solved the low-level HTTP session stuff for me. Then I could focus on the parsing of the data from the HTML files.

Quote:
These are just crazy thoughts, just for you to have a bit more fun if you wish.
Sorry for my bad English and sorry for this massive post..


I pronounce you sane, and your English is fine :-)
Thanks for taking the time to give your thoughtful feedback!

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

archae86
Joined: 6 Dec 05
Posts: 2,994
Credit: 4,545,695,036
RAC: 4,417,166

RE: Any problemo's ? ...

Message 79451 in response to message 79448

Quote:
Any problemo's ? ... :-)


Mike, I downloaded it shortly after you posted, but still have failed to try using it. I did, however, have a comment I should have posted right away.

It seems a bit confusing that the correct order of fields for entry in the CSV area differs from the order of the controls provided for manual entry immediately above it.

I assume the CSV order is driven by backward compatibility with (something?), so perhaps you might consider altering the order of the control boxes to match.

That might help someone who has not gotten into the documentation, or just does not remember.

I'll lash myself with a wet noodle and resolve actually to use the tool soon so I can provide other comments.

Brian Silvers
Joined: 26 Aug 05
Posts: 772
Credit: 282,700
RAC: 0

RE: RE: Very nice work

Message 79452 in response to message 79450

Quote:
Quote:
Very nice work Mike !!!
Seems you had fun ( and sometimes unpredicted difficulties ;) ) programming it.

Bit of an odyssey, actually... :-)

Just to let you know I'm still meaning to come around to looking at this. I've gotten involved with finishing up my finals and, finally, getting a job. I have been all but guaranteed a summer internship with my local government's IT shop. It is part-time and no benefits, which means I'll still be having to shell out $300+ for insurance per month, but that runs out in July. The internship runs out in August, and they have claimed to want to move me to full time, but who knows...

Anyway, I'll try out whatever latest version you have up sometime either tomorrow or on Monday.

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0

Hi : I tried to do a bit

Hi :

I tried to do a bit of data mining with the long-term logs Peanut posted here:
http://einsteinathome.org/node/193703

Unfortunately, he didn’t log the app used for each WU (or CPU freq changes…; anything that affects crunch times), so it's difficult to get a clean analysis.
But he did log a date and time stamp associated with the WUs, so using a high and a low threshold on the date stamp it’s still possible to discriminate runs.
Not perfectly accurate though. (By the way, it seems that the early data all have the same date stamp, so about 40% of the data cannot be discriminated this way.)

Note: still using the normalisation of each frequency curve to a generic one described above.

By the way, your correction factor seems to be close to 0.0002045

The frequency/period normalisation still seems to be working pretty well.
As we can see in this zoom on a specific area, the envelope of the wiggles starts to show up pretty well:

BUT, there are also some disturbing things !

If you look at the green arrows, some of the peaks seem not to match,
whereas they match well everywhere else, even across a very wide range of frequencies (400 Hz to 1000+ Hz).
Strange…
And the data here are not easy to analyse, as many of them have the same date stamp in this particular area.
The weird run with the green arrow is all at the same frequency: 749.9, with period from 213 to 323.
I don’t have a clue right now; I will try to investigate more later.
In the meantime, if anyone has an idea…

If anyone wants to play with the data, here’s the Excel sheet and the data file converted to CSV in the “RR way”:
Download Files
Sorry for the proprietary Excel format. To be quick I used my previous sheets.
I’ll try to have a clean file in OpenOffice format next time.
Anyway, if you use OpenOffice on it, all the computation/plot part should work fine.
Only the auto “import from CSV file - graph scale…” macro won’t work, but it is not needed here.
And one last thing: the file is not completely clean; some columns from my previous use are still there and are meaningless. But the date threshold scrollbars, the peak detection… are functional.

Cheers,
Alex.

God created a few good looking guys.. and for the rest he put hairs on top..

peanut
Joined: 4 May 07
Posts: 162
Credit: 9,644,812
RAC: 0

FYI... The date in my data is

FYI... The date in my data is loosely the date gathered. I run my script about once a day so all the things I pull in one day tend to have the same date (the time the SQL query adds the records). The reason early stuff is the same date is that was a big dump from a spreadsheet into my MYSQL database. And I don't have APP in my data as noted. So my data probably is good for making nice graphs but not so good for very detailed analysis.

I do have Task ID and Work Unit Id stored, but I don't know if they are of any value for this kind of thing.

[AF>Futura Sciences]click
Joined: 12 Apr 05
Posts: 34
Credit: 1,923,040
RAC: 0

Hi ! RE: FYI... The

Message 79455 in response to message 79454

Hi !

Quote:
FYI... The date in my data is loosely the date gathered. I run my script about once a day so all the things I pull in one day tend to have the same date (the time the SQL query adds the records). The reason early stuff is the same date is that was a big dump from a spreadsheet into my MYSQL database. And I don't have APP in my data as noted.

Thanks for the info.
I got to a similar explanation myself when I saw the bunches of WUs with the same date.

Quote:

So my data probably is good for making nice graphs but not so good for very detailed analysis.

Makes things a bit harder, but not impossible.
I'm sure we can squeeze more info out of it.

Quote:

I do have Task ID and Work Unit Id stored, but I don't know if they are of any value for this kind of thing.

No, that won't help more.
The useful information is all the "external" factors that affect crunch time apart from the cyclic nature of WU length,
like a new optimized app or a change in CPU clocking, RAM timings...

But you can't provide something you didn't monitor at the time ;)

Graphically, it's easy to see when any of this changed, with a "new line" being drawn beneath the others.
But while data mining, it's quite tricky to sort each one out without complex maths, like Mike's least squares method discussed above.
A quick way could be to set an IF (Crunch time > X1 AND Crunch time < X2 AND Date > Y1 AND Date < Y2).
This works well (except for the records you imported from the SQL base), but it takes some time to sort properly.
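The threshold test described here could be sketched roughly as follows. The record layout (a dict with `runtime` and `date` keys), the function name and the sample values are all invented for illustration; the actual columns of peanut's log are not shown in this thread.

```python
from datetime import date


def select_run(records, t_low, t_high, d_low, d_high):
    """Keep records whose runtime and date both fall strictly inside the given bands."""
    return [
        r for r in records
        if t_low < r["runtime"] < t_high and d_low < r["date"] < d_high
    ]


# Made-up sample records, for illustration only.
sample = [
    {"runtime": 30960, "date": date(2008, 3, 1)},
    {"runtime": 27246, "date": date(2008, 3, 20)},
    {"runtime": 55204, "date": date(2008, 3, 5)},
]

run = select_run(sample, 25000, 40000, date(2008, 2, 25), date(2008, 3, 10))
print(len(run), "record(s) selected")
```

Records that share one bulk-import date stamp cannot be separated by the date band, so for those only the runtime band does any work.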

Haven't worked on it since yesterday, but will do. ;)

Cheers,
Alex.

God created a few good looking guys.. and for the rest he put hairs on top..

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,199
Credit: 133,822,847
RAC: 15,430

Continuing/resurrecting

Continuing/resurrecting development of Ready Reckoner 7 Version F from here.

To be definite, these are the relevant variables so we can discuss by name:

freq_adjustment changed to 0.42
grid_density changed to 0.00029
freq_steps unchanged at 10

Call these changes by the name Ready Reckoner 7 Version G

Ok, Peter's data :
[pre]541.95,181,30960
541.95,180,30563
541.95,179,30876
541.95,177,30186
541.95,176,30071
541.95,175,29709
541.95,174,29272
541.95,172,28926
541.95,171,28646
541.95,170,28600
541.95,169,55204
541.95,167,49156
541.95,166,39407
541.95,165,37621
541.95,163,27246[/pre]
is still disliked by this RR. Now Gary's for the second CPU quoted here
[pre]1164.00,1121,26734
1164.00,1120,26693
1164.00,1119,26729
1164.00,1118,34669
1164.00,1117,39721
1164.00,1116,39744
1164.00,1115,39753
1164.00,1114,39784
1164.00,1113,39900
1164.00,1166,42514
1164.00,1165,42520
1164.00,1164,42556
[/pre]
gives an analysis:
Number of points = 12
Minimum runtime in data = 26693
Maximum runtime in data = 42556
Estimated peak runtime = 45684
Estimated average runtime = 33637
Estimated trough runtime = 26760
Estimated runtime variance = 0.414

although the fit is not a pretty one:

I'll post some more fiddles shortly....
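For anyone wanting to cross-check the figures in the analysis above, here is a minimal sketch that parses Gary's freq,seq,runtime lines and reports the count and extremes. It also shows that the printed "runtime variance" is numerically consistent with (peak − trough) / peak, and the "average" with peak − (2/π)(peak − trough). Those two relationships are inferred from the numbers shown in this post, not taken from RR's source, and the helper name `parse` is just for illustration.

```python
import math

GARY_CSV = """\
1164.00,1121,26734
1164.00,1120,26693
1164.00,1119,26729
1164.00,1118,34669
1164.00,1117,39721
1164.00,1116,39744
1164.00,1115,39753
1164.00,1114,39784
1164.00,1113,39900
1164.00,1166,42514
1164.00,1165,42520
1164.00,1164,42556
"""


def parse(text):
    """Parse 'freq,seq,runtime' lines into (float, int, int) tuples."""
    rows = []
    for line in text.strip().splitlines():
        freq, seq, runtime = line.split(",")
        rows.append((float(freq), int(seq), int(runtime)))
    return rows


rows = parse(GARY_CSV)
runtimes = [r[2] for r in rows]
print("Number of points =", len(rows))
print("Minimum runtime in data =", min(runtimes))
print("Maximum runtime in data =", max(runtimes))

# Estimated peak and trough as printed by RR above.
peak, trough = 45684, 26760
variance = (peak - trough) / peak                  # inferred definition
average = peak - (2 / math.pi) * (peak - trough)   # inferred definition
print("variance ~", round(variance, 3), " average ~", round(average))
```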

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Winterknight
Joined: 4 Jun 05
Posts: 482
Credit: 178,814,359
RAC: 201,150

Continued from post 89912 in

Continued from post 89912 in Windows S5R4 SSE2 power App 6.05 available.
[pre]----------------
ANALYSIS RESULTS
----------------

-----------
App Version : NONE
-----------
Frequency : 1048
Period of task cycle = 320
Task sequence number = 856
runtime = 23194
phase = 0.675
principal value = 0.852
Task sequence number = 857
runtime = 22855
phase = 0.678
principal value = 0.847
Task sequence number = 858
runtime = 23233
phase = 0.681
principal value = 0.842
Task sequence number = 859
runtime = 22857
phase = 0.685
principal value = 0.837
Task sequence number = 860
runtime = 23244
phase = 0.688
principal value = 0.831
Task sequence number = 861
runtime = 23091
phase = 0.691
principal value = 0.826
Task sequence number = 862
runtime = 23115
phase = 0.694
principal value = 0.82
Task sequence number = 864
runtime = 23277
phase = 0.7
principal value = 0.809
Task sequence number = 867
runtime = 23179
phase = 0.71
principal value = 0.791
Task sequence number = 868
runtime = 23213
phase = 0.713
principal value = 0.785
Task sequence number = 869
runtime = 24222
phase = 0.716
principal value = 0.779
Task sequence number = 870
runtime = 23410
phase = 0.719
principal value = 0.773
Task sequence number = 871
runtime = 23243
phase = 0.722
principal value = 0.766
Task sequence number = 925
runtime = 25709
phase = 0.891
principal value = 0.336
Number of points = 14
Minimum runtime in data = 22854.58
Maximum runtime in data = 25708.98
Estimated peak runtime = 27486
Estimated average runtime = 24157
Estimated trough runtime = 22257
Estimated runtime variance = 0.19[/pre]
The seq numbers down to 847 are on my computer (host 252515) waiting to be crunched. But then I have moved to 1048.05Hz.
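The phase and principal value columns in the listing above appear consistent with phase = fractional part of (seq / period) and principal value = |sin(π · phase)|, taking the printed period of 320. The sketch below recomputes a few rows on that assumption. It is a reading of the printed numbers, not RR's actual code, and the function names are made up. Some rows differ by one unit in the last digit, which suggests RR's internal period is slightly below the displayed 320.

```python
import math


def phase(seq, period):
    """Fractional position of a sequence number within its cycle."""
    return (seq / period) % 1.0


def principal_value(seq, period):
    """Assumed form: |sin(pi * phase)|, inferred from the printed values."""
    return abs(math.sin(math.pi * phase(seq, period)))


PERIOD = 320  # as printed in the listing
# seq -> (phase, principal value) as printed in the listing
LISTED = {856: (0.675, 0.852), 864: (0.7, 0.809), 871: (0.722, 0.766), 925: (0.891, 0.336)}

for seq, (listed_phase, listed_pv) in LISTED.items():
    ph = phase(seq, PERIOD)
    pv = principal_value(seq, PERIOD)
    print(seq, round(ph, 3), listed_phase, round(pv, 3), listed_pv)
```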

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,460
Credit: 58,728,660,809
RAC: 55,871,786

RE: Continuing/resurrecting

Message 79458 in response to message 79456

Quote:
Continuing/resurrecting development of Ready Reckoner 7 Version F from here.

Mike,

As previously discussed in the other thread, I've now uploaded 8 results files, six of which belong to Q6600 hosts and two of which belong to AMD64 dual cores.

The Q6600s are 1609769, 1607190, 1608085, 1613641, 1248847, 1621253.
The Windows host is 1248847 and it shows a transition from science app 6.04 to 6.05. All the others are running Linux.

The AMD64 dual cores are 1512497 and 1607993. Both are running Linux.

Cheers,
Gary.
