[S5R3/R4] How to check Performance when Testing new Apps

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,143
Credit: 129,218,220
RAC: 25,772

OK! I've gone rather ballistic on RR_V6A ( 120K ) having scored a freely distributable Javascript graphics library. Basically the same functionality as V5A but I've re-done/savaged the interface to suit - a picture is indeed worth a thousand words!

Before you ask - I am working on a method to print the plots. :-)

As usual, please tell me about the least little thing ..... :-)

Enjoy!

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,354
Credit: 48,525,022,562
RAC: 57,648,534

RE: OK! I've gone rather

Message 77785 in response to message 77784

Quote:
OK! I've gone rather ballistic on RR_V6A ( 120K ) having scored a freely distributable Javascript graphics library. Basically the same functionality as V5A but I've re-done/savaged the interface to suit - a picture is indeed worth a thousand words!

Wow!! Nice work!!

This is a current data file that I used to test V6A with. It has come from a dual PIII coppermine 1Gig HP Netserver running Linux. It's the same machine that I gave you data for back on Feb 2 with a promise of a couple more points when they finished. Well, better late than never and at least you get a few more than just two extra points. The HostID is 946535

0764.50,049,96360,4.27
0764.50,054,93559,4.27
0764.50,109,125899,4.27
0764.50,123,137237,4.27
0764.50,139,119570,4.27
0764.50,144,114911,4.27
0764.50,166,98298,4.27
0764.50,170,98555,4.27
0764.50,180,93351,4.27
0764.50,186,96133,4.27
0764.50,198,97331,4.27
0764.50,206,105605,4.27
0764.50,217,114704,4.27
0764.50,235,135611,4.27
0764.50,241,140024,4.27
0764.50,248,133579,4.27
0764.50,253,129221,4.27
0764.50,266,111334,4.27
0764.50,272,108279,4.27
0764.50,278,103103,4.27
0764.50,282,98828,4.27
0764.50,287,96485,4.27
0764.50,288,97247,4.27
0764.50,298,102002,mixed
0764.50,299,107445,mixed
0764.50,309,119601,4.14
0764.55,060,93651,4.27

I pasted exactly this into the LH 'Input' pane and got output in the RH pane and a very nice full width plot below. I particularly like the 'next' and 'prev' ability. The above data has only one workable frequency so all I could cycle through were the sequence numbers. In doing that, points that are perhaps a little suspect seem to show up very clearly. As an example, seq# 170 seems to have taken a bit longer than it should have compared to the immediate neighbours.

On clicking the Inputs and Outputs Summary, the information produced is pretty much as in the previous version. Here is a small snippet

	Frequency : 764.5
		Period of task cycle = 120.4
			Task sequence number = 49
				runtime = 96360
				phase = 0.407
				principal value = 0.958
			Task sequence number = 54
				runtime = 93559
				phase = 0.449
				principal value = 0.987
			Task sequence number = 109
				runtime = 125899
				phase = 0.905
				principal value = 0.293
.....
.....
			Task sequence number = 288
				runtime = 97247
				phase = 0.392
				principal value = 0.943

Number of point pairs used = 199
Minimum runtime in data = 93351
Maximum runtime in data = 140024
Estimated peak runtime = 142087
Estimated average runtime = 111924
Estimated trough runtime = 94708
Estimated runtime variance = 0.333
Estimated error = 3.2 %

I don't really know how to preserve the indentation but I wanted to ask a question or two about the output. First of all, you said previously

Quote:
The algorithm performs determinations of A and B over all possible combinations of pairs of points ( two equations with two unknowns each time ). For instance, 8 points yields 28 pairwise estimates [ generally N*(N-1)/2 ].


and

Quote:
Actually the two-point solution can only be sensibly obtained if the points are within the same sine excursion. That's because of the absolute value in your equation, which then ruins analytic continuity ( ~ differentiability ) across any peak point. So the sequence numbers of all the given points are mapped into the first cycle [ zero to one period ] prior to pairwise analysis. So it's that image of the points ( 'principal value' ) which I'm really discussing. This is legitimate as we expect no difference in execution times between points with sequence numbers exactly one period apart.

I had not remembered these details until I went back searching for information about 'point pairs' and 'principal value'. I was confused to see in the final block of output that 199 point pairs were used when I had entered only 23 data points. I initially thought that a 'point pair' was a 'runtime, seq#' combination and that you were showing how many of these data pairs were left after any 'irregular' ones were discarded. Of course, going back and finding the first quote quickly sorted that out.
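To make the pair arithmetic concrete, here is a minimal shell sketch of the N*(N-1)/2 count from the first quote. The helper name pair_count is made up for this example and is not part of RR. For 23 points it gives an upper limit of 253 pairs, so a reported 199 would be consistent with some pairs having been left out of the analysis.

```shell
# pair_count is a hypothetical helper name, not part of RR.
# Number of distinct pairs that can be drawn from N points: N*(N-1)/2
pair_count() {
    echo $(( $1 * ($1 - 1) / 2 ))
}

pair_count 8     # prints 28, the example given in the quote
pair_count 23    # prints 253, the upper limit for 23 points
```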

As for 'principal value', I fully understand the need to map higher seq#s to the first cycle. So, from the comments about the term given in the second quote, I imagined that it meant that 'higher order' seq#s would be mapped to seq#s between 0 - 120.4 for the 764.5 frequency. So the 'principal value' would simply be the equivalent 'base' seq#, ie a seq# of 288 would have a principal value of 47.2 ie 47. Obviously 'principal value' isn't the equivalent 'base' seq# but rather some function of it. What exactly is a 'principal value'?

I now intend to play with the tool using different and more extensive datasets so hopefully there will be more feedback to come.

Cheers,
Gary.

Gary Roberts

If there are any people out there using either Bikeman's awk script or my shell script which relies on the awk script, there is a small change you will need to make to the awk script as a result of the new >800 frequency tasks. This line from the script

match($0,"h1_[0-9]+\\.[0-9]+_S5R2__[0-9]+_S5R3[a-z]_[0-9]+");

needs to change to

match($0,"h1_[0-9]+\\.[0-9]+_S5R[23]__[0-9]+_S5R3[a-z]_[0-9]+");

This is needed because old data was referred to as S5R2 data and the new >800 data is referred to as S5R3 data.
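As a quick check of the changed pattern, here is a small sketch that runs a couple of task names through an equivalent match() call. The task names are made up purely to fit the pattern, and matches_task is just a name picked for this example; real names on the website may carry extra fields.

```shell
# matches_task is a hypothetical helper; it returns success when the
# updated pattern is found somewhere in the given string.
matches_task() {
    printf '%s\n' "$1" |
        awk '{ found = match($0, "h1_[0-9]+\\.[0-9]+_S5R[23]__[0-9]+_S5R3[a-z]_[0-9]+") }
             END { exit (found ? 0 : 1) }'
}

# Made-up names, one in the old S5R2 style and one in the new S5R3 style
for name in h1_0764.50_S5R2__49_S5R3a_0 h1_0810.25_S5R3__12_S5R3b_1
do
    if matches_task "$name"
    then
        echo "$name matches"
    else
        echo "$name does not match"
    fi
done
```

Both names should be reported as matching, whereas the old pattern would only accept the first.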

Having got that out of the way, I'd like to report on some enhancements that I've made to my data gathering shell script. I wrote it this way because I don't really have any experience with other scripting languages and the unix shell seems to be able to handle what I wanted to do without too much of a re-learning curve. Obviously you need a unix (linux) machine to run this script.

My goals were to automate the process of data gathering for an unlimited number of machines and to create result files that could be dropped straight into Mike Hewson's RR. The data gathering process should be smart enough to handle things like frequency changes, app version changes, recognition of crunching of a single task by multiple app versions, etc, without requiring user intervention.

I've been testing my latest version of the script for a few days now and it seems to be working OK. It's gathering data for around 100 boxes and a random check on a few data files seems to show that everything is in order. For each successful task that a host completes, four CSV data values are recorded per line in the results file - Frequency, Sequence#, Runtime, App_Version.

Here are some of the features of the script

    * Separate results files for each host which are appended to at a user settable interval with no limit on the number of hosts.
    * Automatic recording of both HostID and Hostname in a file called hostids every time you give a new host to the script. Future runs can simply reuse some or all stored hosts, or can add new ones.
    * Manual operation on just a couple of selected hosts if needed.
    * Each time the script runs, any data that has previously been gathered will not be duplicated in the ongoing results file.
    * The correct app version used for crunching will always be recorded even if the task is "branded" differently in your Boinc Manager list of tasks. If more than one app version was used for a single task the output will show 'mixed'.
    * All website task data for each host will be examined irrespective of how many pages this might involve (ie one or many).
    * The script will attempt to minimise the number of website pages consulted in updating the results file for each host.
    * A rudimentary progress indicator - very useful when auto collecting for 100 hosts :).
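The script itself avoids duplicates by checking each task's seq# and runtime against the existing results file before fetching anything. As a simpler illustration of the same end result (not the method the script uses), here is a minimal sketch that merges new CSV lines into an existing file with sort -u. The helper name merge_results and the file names are invented for the example.

```shell
# merge_results is a hypothetical helper, not part of grabdata.sh.
# Merge new CSV lines into an existing results file, dropping exact duplicates.
merge_results() {
    existing=$1
    new=$2
    touch "$existing"
    sort -u "$existing" "$new" > "$existing.tmp" && mv "$existing.tmp" "$existing"
}

# Demo in a scratch directory with sample lines of Freq,Seq#,Runtime,App_Version
work=$(mktemp -d)
printf '%s\n' '0764.50,049,96360,4.27' '0764.50,054,93559,4.27' > "$work/results.demo"
printf '%s\n' '0764.50,054,93559,4.27' '0764.50,109,125899,4.27' > "$work/results.new"
merge_results "$work/results.demo" "$work/results.new"
cat "$work/results.demo"    # three lines, the repeated seq# 054 appears once
```

This only catches lines that are identical in all four fields, so a task recorded once as 'mixed' and once with a version number would still appear twice.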

Here is the current version (unfortunately indenting is lost) of the script. It is extensively commented if anyone is actually trying to make sense of it, so that partly accounts for the size. Undoubtedly, better ways to do things may emerge so there's a good chance it might shrink in future. I'll find a place to host this so you will get the full indentation if you wait a bit. I'll also post a separate message with proper instructions for use. The only thing you will need is access to a unix/linux machine that understands bash, awk, sed, grep, find, sort, paste, and a few other standard unix utilities.

#!/bin/bash
#
# grabdata.sh - Version 3.0
#
# Script to retrieve successful task data from the Einstein@Home online database for
# one or many hostIDs (either entered singly or read from a file) and to parse it to
# produce a tabulated set of information (frequency, seq#, cpu_time, app_version) as
# a CSV list for each host of interest.
#
# Hosts entered manually may be added automatically to the HostIDs file for future
# reuse.  Extracted results data are added to individual host results files to provide
# an ongoing, updating log of host results statistics.
#
# This script relies on an awk script written by Bikeman.
#
echo
echo "Program to grab stats for successful EAH tasks and write them to a results file."
echo "Data is collected from the EAH website for hosts of interest, using the HostID."
echo "The IDs can be entered manually or read from an existing file named hostids."
echo "HostIDs manually entered will be added to the hostids file if not already there."
echo "The lines in this file contain the HostID and a HostName separated by a space."
hidfile=hostids
mindays=+2
echo
echo "Integer days of enforced wait from previous update before a new one is allowed."
echo -n "(Signed ints like +0 +1 +2 - see -mtime flag on 'find' manpage) (default=+2) : "
read ans
echo
if [ "X$ans" != "X" ]
then
	mindays=$ans
fi
#
# Find all results files older than $mindays & store in $hids - ignore newer files
#
hids=`find results.* -mtime $mindays | sed 's/^results\.//'`
#
# If there is no file of HostIDs (ie 1st time run) create a new one
#
if [ ! -f  $hidfile ]
then
	touch $hidfile
fi
#
# See if we are doing an auto update of all out-of-date hosts or perhaps a manual run
#
echo
echo -n "Using hosts from hostids file? (y or n - use Enter if entering manually) : "
read ans
echo
if [ "$ans" == "y" ]
then
	if [ ! -s $hidfile ]
#
# Can't autorun if there are no hosts filed.  Suggest manual host entry
#
	then
		echo "HostIDs file $hidfile does not contain hosts - use manual entry ..."
		echo
		exit
	else
		echo "Any outdated results for hosts in $hidfile file will be retrieved."
		echo
	fi
else
#
# Manual run.  Collect HostIDs.  Check if HostID is on the list of out-of-date results
# Only accept it if it is.  If it's not, check if we have a results file for it.
# Offer to update HostIDs file if we don't already have this host filed.
#
	mhids=""
	allhids=`cat $hidfile | sed 's/ .*//'`
	while true
	do
		echo -n "Next hostID to use for results ( Enter only when finished ) : "
		read hid
		if [ "X$hid" != "X" ]
		then
			inhids=n
			for i in $hids
			do
				if [ "$hid" == "$i" ]
				then
					inhids=y
					mhids="$mhids$hid "
					break
				fi
			done
			if [ "$inhids" == "n" ]
			then
				if [ -f results.$hid ]
				then
					inhids=r
					echo "HostID $hid was recently updated -- ignoring this HostID ..."
				else
					mhids="$mhids$hid "
				fi
				inallhids=n
				for j in $allhids
				do
					if [ "$j" == "$hid" ]
					then
					  inallhids=y
					  break
					fi
				done
				if [ "$inallhids" == "n" ]
				then
					echo -n "Hostname for updating HostIDs file $hidfile (Enter = Don't update) : "
					read host
					if [ "X$host" != "X" ]
					then
						echo "$hid $host" >> $hidfile
					fi
				else
					echo "HostID $hid already in $hidfile - no need to update that file ..."
				fi
			else
				echo "HostID $hid already in $hidfile - no need to update that file ..."
			fi
		else
			break
		fi
	done
	echo "Manually entered HostIDs requiring an update are :-"
	echo "$mhids"
	echo
	echo -n "If these are OK hit Enter to continue or q to quit : "
	read tmp
	if [ "X$tmp" != "X" ]
	then
		exit
	else
		hids=$mhids
	fi
fi
#
# Variable $hids now contains just those hosts (either auto-determined or manually
# entered) that require updating - if there are any.
#
if [ "X$hids" == "X" ]
then
	echo "There are no out-of-date results files that need updating -- exiting ... "
	echo
	exit
fi
#
# Grab the data from the website for each valid host
#
for hid in $hids
do
#
# Get the first page of (up to 20) results   Test if there is a "Next" page
#
	curl -s "http://einstein.phys.uwm.edu/results.php?hostid=$hid" > results.curl
	while true
	do
		tmp=`grep Next results.curl | tail -1 | sed -e 's/^.*offset.//' -e 's/.Next.*//'`
		if [ "X$tmp" != "X" ]
		then
#
# There is a next page - keep grabbing until no more
#
			while true
			do
				curl -s "http://einstein.phys.uwm.edu/results.php?hostid=$hid&offset=$tmp" > results.ext
				tmp1=`grep Next results.ext | tail -1 | sed -e 's/^.*offset.//' -e 's/.Next.*//'`
				cat results.ext >> results.curl
				tmp=$tmp1
				if [ "X$tmp1" == "X" ]
				then
					break 2
				fi
			done
		else
			break
		fi
	done
#
# All grabbed pages for a host have been concatenated.  Pass through Bikeman's awk script
# Massage to create a CSV list of Freq,Seq#,Runtime and a list of task IDs.  Use the TIDs
# to grab the page with stderr.out so that the app version can be obtained.
#
	awk -f parser.awk results.curl > results.raw
	cut -c-9 results.raw > results.tid
	tids=`cat results.tid`
	cut -c10- results.raw > results.cut
	sed -e 's/ /,/g' -e '/,[0-9][0-9],/s/,/,0/' -e '/,[0-9],/s/,/,00/' -e 's/...$//' results.cut > results.tmp
	paste -d, results.tid results.tmp | sed 's/ //' > results.csv
#
# Progress indicator.  Each dot means another set of raw host results has been parsed
#
	echo -n .
	for tid in $tids
	do
#
# For each TID obtained from the website, check to see if those details are already recorded
# in the results file.  Regard a match of seq# and runtime as sufficient to prove a match.
# Do not grab the page with stderr.out if we already have the data recorded
#
		runtime=`grep $tid results.csv | sed 's/^.*,//'`
		seqno=`grep $tid results.csv | cut -c18- | sed s/,.*//`
		tmp=""
		if [ -f results.$hid ]
		then
			tmp=`grep $runtime results.$hid | grep $seqno`
		fi
		if [ "X$tmp" == "X" ]
		then
#
# No match for seq# and runtime for this TID so we need to grab the page.
#
			freq=`grep $tid results.csv | sed -e 's/,[0-9][0-9][0-9],.*//' -e 's/^.*,//'`
			ver=`curl -s "http://einstein.phys.uwm.edu/result.php?resultid=$tid" | grep einstein_S5R3 | sed -e 's/^.*S5R3_//' -e 's/_[iwp][6io].*//'`
			numv=`echo $ver | wc -w`
#
# Check if more than one app version was used to crunch the data - if so record as 'mixed'
#
			if [ $numv != 1 ]
			then
				flag=1
				ver1=`echo $ver | sed 's/ .*//'`
				for i in $ver
				do
					if [ $i != $ver1 ]
					then
						flag=2
						break
					fi
				done
				if [ $flag == 1 ]
				then
					ver=$ver1
				else
					ver=mixed
				fi
			fi
#
# Assemble a line of new data and store it in a temporary results file
#
			echo $freq,$seqno,$runtime,$ver >> results.new
		fi
	done
#
# Progress indicator.  Each plus means another host's new results have been assembled.
#
	echo -n +
	if [ ! -f results.new ]
	then
#
# There are actually no new results at this time so create an empty file as a placeholder
#
		touch results.new
	fi
	if [ -f results.$hid ]
	then
#
# If there are existing results for this host add in the new ones.  For no existing results
# any new ones will become the future existing ones.  Clean up any temp files
#
		mv results.$hid results.sav
		cat results.sav results.new | sort > results.$hid
		rm -f results.sav results.new
	else
		mv results.new results.$hid
	fi
	rm -f results.raw results.c* results.t* results.ext
done
echo

This script has been tested a few times but I wouldn't call it "extensive" by any means. During testing, I would often think of new features that would be nice so some of the more recent additions haven't been tested much at all. I'll be very surprised if there aren't at least a few logic bugs and probably more logic deficiencies. The most recent test involving about 100 hosts took less than 10 minutes to update the results files of all hosts. Most hosts had an average of about 10-15 results listed on the website, with just a few going past 20.

I'll be interested to see if anyone is kind (?foolish?) enough to give it a trial :). There is one known deficiency which I've just remembered and will fix before I host it somewhere. The script only understands results crunched on Windows or Linux. I've got to look at what other platforms say about the app version and then make a few minor adjustments to suit.

EDIT:
A very small change has been made to the above script to allow it to handle results for MacOS PPC. Windows, Linux and MacOS X Intel were already correctly handled.

To actually trial the script, all you need to do is create a directory and in it place this script and Bikeman's awk script with the latest mod mentioned at the start of this post. Make the script executable and from a console window (I use an xterm) 'cd' to the directory and just run the script. It will ask you for whatever it needs.

I don't anticipate having much time to add new features but I'm certainly interested in bug reports and particularly in suggestions of smarter ways to do things. I'm quite limited in my knowledge of tools outside of what was available in unix 15-25 years ago :).

Cheers,
Gary.

Mike Hewson

RE: Wow!! Nice

Message 77787 in response to message 77785

Quote:
Wow!! Nice work!!


Thanks! Actually I meant to openly acknowledge this guy who wrote the library! :-)
( although it's in the page source code too, as per GNU Lesser GPL )

Quote:
In doing that, points that are perhaps a little suspect seem to show up very clearly.


Indeed they do stand out nicely, as the algorithm is 'half-dumb' with respect to such outliers. :-)

Quote:
As for 'principal value' .... What exactly is a 'principal value'?


It's the value that a 'standard' sine function takes on - height along the vertical axis - for a given phase, the argument along the horizontal axis. I normalise it to the range [0,1].

[ this is maths terminology : while each angle has a single sine(angle), a given sine value corresponds to infinitely many angles. 'Principal value' implies that you have chosen a limited domain ( x-axis interval ). Various purposes ..... ]

A single point is defined by sequence number & runtime - a pair of numbers - in the usual x & y co-ordinate sense. Point pairs are two points grouped together ( 2 x 2 = 4 numbers now ) for examination in order to attempt deduction of what sine curve they may belong to.

The algorithm extracts all possible pairs of points for a given frequency, works out the parameters of the particular sine curve for each ( with some pairs exempted from later analysis - complex ). Those curve parameters - pairs of specific peak and variance values each representing a candidate sinusoid for a given point pair - are a collection which is then subject to measurements of center ( average ) and spread ( standard deviation ). I form a 'Mr Average Sine' from that and the rest follows. I've been kludging/guessing a bit with the choice of some of the algorithm's parameters - but hey, this is applied maths! :-)

The algorithm's weak aspects are evident with :

(1) too few points

(2) points too close in principal value ( denominator stuff )

(3) the given point set is whacko and doesn't actually reflect sinusoidal behaviour

(4) outliers

I cull and/or refuse to estimate those scenarios ....

So for a given particular sequence value you map it back to the equivalent sequence number for the first cycle ( if it's not already there ), work out how far along the cycle it is as a fraction - phase equals sequence divided by period ( PI multiplies in as well - 'because' ). Finally take the sine of that value, which is labelled as the principal value. So the peaks will have principal value of zero, the troughs have principal value of one, and the average has principal value ~ 0.64 ( that is 2/PI ). The problem lends itself to this approach, and your earlier comment about phase gave me the epiphany of normalising! :-)
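As a rough illustration of that recipe, here is a minimal shell/awk sketch. It assumes the principal value is sin(PI * phase), with phase taken as the fractional part of sequence divided by period; the function name phase_and_pv is made up for this example and is not part of RR. With a period of 120.4 it reproduces the phase and principal value figures shown in the summary output quoted earlier in the thread.

```shell
# phase_and_pv is a hypothetical helper name, not part of RR.
# Assumed formula: phase = fractional part of seq/period, pv = sin(PI * phase)
phase_and_pv() {
    awk -v seq="$1" -v period="$2" 'BEGIN {
        pi = atan2(0, -1)
        cycles = seq / period
        phase = cycles - int(cycles)    # map back into the first cycle
        pv = sin(pi * phase)            # never negative for phase in [0, 1)
        printf "%.3f %.3f\n", phase, pv
    }'
}

phase_and_pv 49 120.4     # prints 0.407 0.958
phase_and_pv 288 120.4    # prints 0.392 0.943
```

A phase of 0 ( a peak ) gives a principal value of 0, and a phase of 0.5 ( a trough ) gives 1.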

Quote:
I now intend to play with the tool using different and more extensive datasets so hopefully there will be more feedback to come.

Looking forward to it! We can visualise/compare the detail now just by posting/passing CSV blocks and plugging them into RR ..... :-)

Cheers, Mike.


archae86
Joined: 6 Dec 05
Posts: 2,930
Credit: 3,708,204,050
RAC: 4,724,087

RE: Here is the current

Message 77788 in response to message 77786

Quote:

Here is the current version (unfortunately indenting is lost) of the script.


For those who want to see your indenting as you wrote it, this dodge still works:

Click the "reply" button for the message containing the code of interest.

Scroll through the quoted material in the message formatting box to find the part you want, then just select and copy that.

Then don't click the "post reply" button.

I stumbled on this many months ago when some of us were trying to help each other with ap_info file issues. Those are really harder to read with the indenting gone.

Gary Roberts

RE: RE: Here is the

Message 77789 in response to message 77788

Quote:
Quote:

Here is the current version (unfortunately indenting is lost) of the script.

For those who want to see your indenting as you wrote it, this dodge still works:

Thanks very much for the tip. I've just tested it out and it works fine. The tabs are all back where they should be. I had the tab interval set to 4 instead of the default 8 since in some places there were about 6 or 7 levels of nesting :).

Now that you mention the trick again, I think I did see you post it previously somewhere a while ago ...

Thanks again.

Cheers,
Gary.

Gary Roberts

RE: RE: As for 'principal

Message 77790 in response to message 77787

Quote:
Quote:
As for 'principal value' .... What exactly is a 'principal value'?

It's the value that a 'standard' sine function takes on ...

Thanks for all the details as I'm sure others will now appreciate more fully, just how your tool works. For my purposes "Principal Value = |sin(phase)|" would have done nicely :). It's obvious now but I couldn't see it at the time I asked.

Also, whilst it's nice to see all the cycles in the data (which gives a wide graph), it might also be useful to see just one period. Would it be possible to have a tick box or similar that would "compress" the multiple periods of a normal plot into just one period? Using the "Next"/"Prev" buttons for seq# would then move to the appropriate data point and show the true seq# in the numeric display but the plotted point would be at its converted position in the single period. This would effectively allow the relationship of the actual data points to the model line to be seen more accurately, I think.
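The "compression" itself is just a remainder calculation. As a minimal sketch, assuming the period is already known (RR reported 120.4 for frequency 764.5 in the output posted earlier), the following adds a fifth column holding each seq# mapped into the first cycle. The helper name fold_seq is invented for this example.

```shell
# fold_seq is a hypothetical helper; the period must be supplied by the user.
# Appends a fifth column: the seq# mapped into the first cycle.
fold_seq() {
    awk -F, -v period="$1" '{
        folded = $2 - period * int($2 / period)
        printf "%s,%.1f\n", $0, folded
    }'
}

printf '%s\n' '0764.50,049,96360,4.27' '0764.50,288,97247,4.27' | fold_seq 120.4
# prints:
# 0764.50,049,96360,4.27,49.0
# 0764.50,288,97247,4.27,47.2
```

Plotting runtime against that fifth column would put seq# 49 and seq# 288 almost side by side, which is the single-period view described above.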

Quote:
Quote:
I now intend to play with the tool using different and more extensive datasets so hopefully there will be more feedback to come.

Looking forward to it! We can visualise/compare the detail now just by posting/passing CSV blocks and plugging them into RR ..... :-)

Yes, indeed. I have around 100 data files accumulating so as I start looking at them and find anything interesting, I'll just pass you the CSV data. Too easy!!

Cheers,
Gary.

Mike Hewson

RE: Also, whilst it's nice

Message 77791 in response to message 77790

Quote:
Also, whilst it's nice to see all the cycles in the data (which gives a wide graph), it might also be useful to see just one period. Would it be possible to have a tick box or similar that would "compress" the multiple periods of a normal plot into just one period? Using the "Next"/"Prev" buttons for seq# would then move to the appropriate data point and show the true seq# in the numeric display but the plotted point would be at its converted position in the single period. This would effectively allow the relationship of the actual data points to the model line to be seen more accurately, I think.


Quite right! :-)

I'll bung in a button ( now talking RR_V7A already! ) to swap between the two view types. It's a quick-ish redraw using the first-cycle mapping ....

Other possibles:

- I'm also pondering/reviewing the estimates aspect - I'd mentioned earlier somewhere of going to a median measure to reduce outlier sensitivity. Needs thought ...

- prissy changes to colors etc, more 'web safe' for want of a better phrase.

- put in the brief/verbose reporting selection that I promised.

- 'simple' editing of headings for the plots.

- print the plots by themselves. Alas this is actually quite non-trivial if you want to remain cross-platform!

- a look at inter-sequence variability to examine that cliff/ledges/step/0.45 business noted earlier elsewhere.
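
To show why a median might help with the first item, here is a minimal sketch of a median over a list of estimates. The numbers are made up, not real RR estimates, and the helper name median is just for this example.

```shell
# median is a hypothetical helper: reads one number per line, prints the median.
median() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Made-up peak estimates with one wild outlier
printf '%s\n' 141000 142500 141800 250000 142100 | median    # prints 142100
```

The plain average of those five values is 163480, dragged well away from the cluster by the single outlier, while the median stays put.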

Cheers, Mike.

( edit ) Here's an example ( thanks to archae86 ) of where the analysis wobbles off, by the points straying from ~ sinusoidal pattern :

368.35,75,31944
368.35,74,31804
368.35,73,31783
368.35,72,31816
368.35,71,31658
368.35,70,31532
368.35,69,31304
368.35,68,32480
368.35,67,31637
368.35,66,32473
368.35,65,33310
368.35,64,34886
368.35,63,33528
368.35,62,34145
368.35,61,35229
368.35,60,34687
368.35,59,34699
368.35,58,35342
368.35,57,35567
368.35,56,37068
368.35,55,36548
368.35,54,36520
368.35,53,35622
368.35,52,34807
368.35,51,34080
368.35,50,33324
368.35,49,33094
368.35,48,32907
368.35,47,32655
368.35,46,32210
368.35,45,31440
368.35,44,31344
368.35,43,31493
368.35,42,31302
368.35,41,31090
368.35,40,30518
368.35,39,29759
368.35,38,29805
368.35,37,29231

What happened there? You may well ask .... :-)


Gary Roberts

RE: I'll bung in a button (

Message 77792 in response to message 77791

Quote:
I'll bung in a button ( now talking RR_V7A already! ) to swap between the two view types. It's a quick-ish redraw using the first-cycle mapping ....

Thanks very much!

Quote:

( edit ) Here's an example ( thanks to archae86 ) of where the analysis wobbles off, by the points straying from ~ sinusoidal pattern :

....

What happened there? You may well ask .... :-)

I haven't plotted the points but I can see exactly what you are referring to :).

Something probably happened during or around seq# 40 which gave a step improvement of close to 10% in crunching performance. Could it have been an app change to a faster app, or could it have been a hardware improvement like upping the overclock a bit? :) Maybe Peter might have some thoughts on this.

Cheers,
Gary.

Gary Roberts

As promised, here are the two files required to get automatic data collection happening on Linux for any hosts you may have an interest in.

Firstly the shell script and secondly the Bikeman awk script that the shell script uses.

INSTRUCTIONS
============

Create a new working directory of your choice (mine is /home/gary/EAH_Results) and place the two files there. They should be owned by you. I gave mine 755 permissions, but 644 would be OK if you feed the shell script to sh. Change to the working directory and execute the shell script. Here are the questions it will ask:-

    * How many days of enforced wait to set? The idea here is to be kind to the servers and only allow results files older than a certain number of days to be updated. The default is +2 which means that an existing results file needs to be greater than 2 days old to be allowed to be updated (find counts age in whole days, so in practice that means at least 3 full days old). This question allows you to override the default. On your first run there are no existing results files so just accept the default.
    * Do you want to use stored HostIDs? On your first run you will not have any stored HostIDs so selecting 'y' will draw a complaint. The question tells you to just hit Enter for manual hosts entry. This is what you need for the first run.
    * You will then be prompted for a HostID (write them down from the website before you start) and then you will be asked for a HostName. Both will be stored in a file called 'hostids' and the idea of the HostName is to make the file human readable if you are monitoring many hosts. This is fine if they are your own hosts but you will need to 'invent' a suitable name if you were monitoring an otherwise unidentified host. If you don't give a HostName the script will assume that this is a 'throw away' or 'once-off' data collection and it won't pollute your hostids file which is really intended for your own hosts. As you enter HostIDs, the script will check to see if any already have recently collected data, more recent than your enforced wait and will drop any that it finds. On a first run none will be dropped. If any do get dropped, you would notice this at the next stage.
    * After you finish HostID entry with a null entry, the script will present you with a list of those for which data will be collected and will ask for permission to start that process. You can bail out at this point if you change your mind.
    * Data collection can take a little while and a rudimentary progress meter will be constructed using decimal points and plus signs. A pair of these represents a completely finished collection for a single host. When all host data has been collected the program will simply exit.

After a first run you will find a complete hostids file ready for subsequent use and a series of results files - results.nnnnnn - where nnnnnn is the HostID you entered. The results files are in CSV (comma separated values) format and there should be four items per line - Frequency, Sequence No, Runtime, App_version. If more than one app version was used to crunch a task the value will be recorded as 'mixed'. If you rerun the program every so often, new tasks that your host has crunched in the meantime will be added to your saved results. The program should be smart enough to avoid any duplication.
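
Before pasting a results file anywhere it can help to eyeball what it holds. As a minimal sketch (the helper name runtime_range is made up here, and skipping 'mixed' lines is an assumption rather than a requirement), the following reports the task count and the minimum and maximum runtime for each frequency:

```shell
# runtime_range is a hypothetical helper name, not part of grabdata.sh.
# Reads Freq,Seq#,Runtime,App_Version lines from files or stdin and prints,
# per frequency: frequency,task count,minimum runtime,maximum runtime.
# Lines recorded as 'mixed' are skipped, which is a choice made for this sketch.
runtime_range() {
    awk -F, '
        NF == 4 && $4 != "mixed" {
            n[$1]++
            if (!($1 in min) || min[$1] > $3 + 0) min[$1] = $3 + 0
            if (!($1 in max) || $3 + 0 > max[$1]) max[$1] = $3 + 0
        }
        END {
            for (f in n) printf "%s,%d,%d,%d\n", f, n[f], min[f], max[f]
        }' "$@" | sort
}

# A few lines taken from the data posted earlier in this thread
printf '%s\n' \
    '0764.50,180,93351,4.27' \
    '0764.50,241,140024,4.27' \
    '0764.50,298,102002,mixed' \
    '0764.50,309,119601,4.14' \
    '0764.55,060,93651,4.27' | runtime_range
# prints:
# 0764.50,3,93351,140024
# 0764.55,1,93651,93651
```

To run it against a saved file instead, pass the file name as an argument, for example runtime_range results.nnnnnn with your own HostID in place of nnnnnn.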

The results files are designed so that they can be simply pasted into the data entry area of Mike Hewson's RR_V6A which was announced in this thread quite recently.

At this point there are few (if any) sanity checks in the program. Please be careful when you enter a HostID and please think about the servers when deciding which hosts you need to monitor. Select your hosts carefully and don't force unnecessary repetitive updates just to see if any new task may have completed.

If you have any questions or comments, please fire away.

Cheers,
Gary.
