I've just run RR_V4 with a new set of data and it looks very good. In particular, the summary page of inputs and outputs is just what I need. I can print the new window to a single page and so preserve the calculations for later comparison with runs on other datasets. Since I have many similar boxes to compare, I'd also like to input a few (optional) details about the particular host that could be output as part of the header on the summary page, things like
* hostname / hostid
* hardware string - eg Dual PIII 1400/1024/36.4
* OS details
* BOINC version
* Science App version
Well, I can create a text area so that you can input whatever takes your fancy prior to summary page production. I didn't buttonise 'print' [ no great loss ] as it turns out there is some MS dependence in that. ( I am determined to remain platform neutral )
Quote:
I thought that it might be easy for the Inputs and Outputs Summary button to lead to an intermediate data input screen with a "continue" button at the bottom. The input values could be wholly or partially left blank and so users who didn't need the info wouldn't be inconvenienced. Any data that was supplied would simply appear immediately below the header on the summary page.
Hmmmm .... I could provide check boxes next to each item, for the user to tick what they want to be included in the summary page?
Quote:
Also, for consistency in terminology with the latest versions of BOINC, (ie "results" are now "tasks") perhaps the header should say "S5R3 TASK CRUNCH TIME ..." but this isn't important and I'm NOT nitpicking :).
Whoops, I've struck twice on that. All references to 'work unit' are now task'ed ..... :-)
Quote:
Also, I've noticed that I'm mainly interested in step 5 so what I do is enter a 9th data point at step 4 on the way through. This is usually the next sequence number after the 8 I'm planning on using in step 5. That way, once step 5 has been completed and I've transferred the estimates to step 6, the estimated runtime for the next sequence number will be immediately showing as an output in step 6. Quite often (in fact always if I gather more new data) I would like to add a 10th, 11th, ... data point so rather than scrolling back to step 4 to change the sequence number would it be possible to put an input box for sequence number immediately after the A & B inputs in step 6, thanks?
Ha! Great minds think alike eh? ( or fools never .... ) :-)
I was indeed thinking of swapping Step 4 and 5 entirely to obviate the reverse scroll. They are logically independent steps anyway.
Quote:
YES, YES, PLEASE, PLEASE!!
Yes, I thought that would tickle.... :-)
Quote:
This is probably the best order since data grabbers (eg Bikeman's awk script) will give that order naturally when parsing each line of collected data.
Precisely. In fact one could leave the column headings in at the first line of the CSV ( as they are wont to appear ), because I will input validate using regular expressions ( so a line will be float/COMMA/integer/COMMA/float ) and ignore lines thus breaching ( + report rejections to user ). This area is not hard to vary, just create a different regEx pattern as needed, and plonk in a line to test it, within the loop that sucks in each line one at a time.
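A minimal sketch of the validation loop just described ( not RR's actual code - the function and variable names are invented for illustration, and only the float/COMMA/integer/COMMA/float shape comes from the description above ):
[pre]// Hypothetical sketch: accept only float,integer,float lines; report the rest.
var linePattern = /^\s*\d+(\.\d+)?\s*,\s*\d+\s*,\s*\d+(\.\d+)?\s*$/;

function validateCsv(text) {
    var good = [], rejected = [];
    var lines = text.split("\n");
    for (var i = 0; i < lines.length; i++) {
        if (linePattern.test(lines[i])) {
            good.push(lines[i]);
        } else if (lines[i].replace(/\s/g, "").length > 0) {
            rejected.push(i + 1);    // column headings and junk land here
        }
    }
    if (rejected.length > 0) {
        alert("Rejected input lines: " + rejected.join(", "));
    }
    return good;
}[/pre]
Varying the accepted field layout is then just a matter of swapping in a different pattern, as noted above.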
Quote:
Here is another thought for consideration. As mentioned recently by Peter, it looks like (for a given platform anyway) crunch times may be relatively frequency independent. It might be interesting therefore to augment frequency and sequence with a phase value. People doing plots could then simply plot crunch time against phase and use data from the same host even though that host might be (like now) receiving a variety of different frequency work. They could use different coloured points for different frequencies and when all were superimposed, the vertical scatter (or lack of it) would give an indication of the effect of frequency.
Brilliant! I was actually angst'ing over how to cope with the curve fitting without having to assume anything about A and B's frequency dependence ( if any ). Phase .... of course ... not only pretty pictures but subject to stats testing.
Quote:
So, the thought is really that your step 5 screen could contain three inputs per line, frequency, sequence and runtime and the summary page could show 4 columns of output, frequency, sequence, phase and runtime.
Yup, yup and yup .... :-)
Quote:
As always, these are thoughts and not commands :).
Yes ... master :-))
Quote:
Now for some more data for you. I tested RR_V4 with a dual PIII 1Gig server running Linux with the 4.27 app. I transitioned this server quite early and so have plenty of 4.27 data at constant frequency, 764.50. In the list below I've included all the available data (including 4.14 and "mixed" data) and marked those that were used for the 8 data points with a relevant DP# (ie Data Point Number). If a data point wasn't used I've left this last column blank. I've also included the next tasks in the cache that haven't completed yet. The hostid of this machine is 946535.
Hope this is of some use to you. I'll send the last two values for the list above once they have crunched.
Thanks indeedy! I'm also looking at archae86's data for four machines that he kindly sent to me ( thanks! ).
Hypothesis: It seems that dual core machines ( that are enabled to be used as such for E@H ) will be more likely to receive sequence numbers closer together ( eg. 287-288, 298-299 ). If each core is given a task/thread, and they start and finish ~ in sync, then they'll be served with fresh work next to each other in line come update/report time. Indeed one of his machines had dealt with an unbroken sequence from 37 to 75 inclusive at a single frequency! [ This has relevance for the RR curve fitting algorithm with respect to rejection of case pairs with similar sin(phase), and the small difference in sines in the denominator as mentioned earlier ]
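For what that rejection criterion might look like in code - a sketch only, with an assumed threshold; RR's actual test and constant may well differ:
[pre]// Reject case pairs whose sin(phase) values are too close, since the fit
// divides by the difference of the sines ( a small denominator amplifies
// noise ). EPSILON here is an assumed, purely illustrative threshold.
var EPSILON = 0.05;

function pairIsUsable(a, b) {
    return Math.abs(Math.sin(a.phase) - Math.sin(b.phase)) > EPSILON;
}[/pre]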
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
Nice shell script, Gary! However if your workunit cache is so big that it covers the whole (or most) of the first results page, this won't work. I guess you also have to consider the following pages (using the offset URL parameter) to be on the safe side.
I decided to make the script handle multiple pages of results so here is version 2. I just tested it for a host with >20 results and it got them all successfully. It should get all the data however many pages there are, but I haven't given it much of a test with multiple hostids yet. So if it screws up you will just have to yell at me :).
It uses the second version of Bikeman's awk script and so will not collect results that errored out for any reason. I also decided to trim the output of the awk script so that there are just three columns in the output results - frequency, sequence# and crunch time - so that the format is compatible with what Mike Hewson's RR_V5 will need for data input. My shell scripting skills are very rusty so I don't claim any elegance here - just a quick and dirty hack that seems to work. Sure beats retrieving and updating manually.
BTW, the script is supposed to append new data to previously harvested stuff and that hasn't been tested at all yet. I'll run it again when new entries appear on the website to make sure they get added to the file nicely.
There are no sanity checks for the hostid that you are asked to input so if you feed it rubbish you'll get rubbish back. I might add some error checking and recovery at a later stage.
#!/bin/sh
#
# Script to retrieve all the online database results for one or many hostIDs and to
# parse the output to produce a unique set of information for each host of interest.
#
while true
do
    echo -n "Next hostID to collect results for (Hit Enter to exit)? : "
    read hid
    if [ "X$hid" = "X" ]
    then
        exit 0
    else
        curl -s "http://einstein.phys.uwm.edu/results.php?hostid=$hid" > results.raw
    fi
    # Keep fetching pages while the most recently appended page still shows a
    # 'Next' link, extracting the offset of the next page from that link.
    while true
    do
        tmp=`tail -15 results.raw | grep Next | sed 's/^.*offset.//' | sed 's/.Next.*//'`
        if [ "X$tmp" != "X" ]
        then
            curl -s "http://einstein.phys.uwm.edu/results.php?hostid=$hid&offset=$tmp" >> results.raw
        else
            break
        fi
    done
    # Parse the raw pages and trim to the three columns RR needs.
    awk -f parser.awk results.raw | cut -c10- > results.tmp
    # Merge with any previously harvested data, discarding duplicate lines.
    if [ -f results.$hid ]
    then
        mv results.$hid results.sav
        cat results.sav results.tmp | sort | uniq > results.$hid
        rm -f results.sav results.tmp
    else
        mv results.tmp results.$hid
    fi
    rm -f results.raw
done
This is designed to run in a directory of your choosing on a Linux system. You need the above code (called whatever you like - eg grabdata.sh) together with Bikeman's awk script (2nd version) both in the chosen directory. If you don't make the shell script executable, you can simply feed it to a shell, eg
sh grabdata.sh
and (when it asks) feed it as many hostids (one at a time) as you want to collect data for. If you do set the execute bit then you can run it directly from a command prompt.
The results will appear in a file called "results.$hostid" so you can easily recognise the different ones you collect. You never delete these files as they are appended to each time you run the script and give it that particular hostid. This way you will keep records long after the results disappear from the online database. If these records are important to you then back them up.
Okey dokey ... here's RR_V5A.
Features most of the requests received, I think. :-)
Many thanks in particular to archae86 who provided me with a swag of data from his machines, to test with! :-)
Thanks to Gary for the 'phase' hint! :-)
This is the 'heavy' version which takes in bulk wads of CSV data pasted into a text area. Analysis output goes into an editable text area which is then used in summary generation via spawned pages. The prediction algorithm is basically unchanged, but can now be applied to an unlimited number of data points from whatever sets you choose. Your runtime system will determine behaviour limits ( ie. dynamic arrays used ) - there are none pre-programmed!
Kindly note it is the user's onus to assign significance to estimates - in particular whether or not to assume any dependence of the peak runtime and variance upon sky search frequency. Just drop the appropriate data points in and out of the analysis to suit. Indeed I'm rather curious whether they allow good cross frequency predictions!
You may also need to review your browser's settings for "user data persistence in forms", pop-up blocking, window reuse/recycling, script settings/permissions, and the like.
Please let me know if you think it is vomiting, has a fever, is delirious, drunk and/or disorderly - for any reasons what-so-ever :-)
Thoughts for RR_V6:
#1 For Step 4 - controls to vary analysis output format as in: number precision levels, inclusion/exclusion of particular items, 'brief' and 'verbose' presets etc...
#2 I've discovered a free ( GNU Lesser General Public License ) Javascript graphics library which looks very promising indeed! I feel I may be able to generate graphs of the style we have seen posted here - specifically showing curve fitting to data points against axes. These would display dynamically depending on selection of data subsets and their analysis. Suggestions are welcome, but within the limits of combining the usual types of 2D primitives [ pen/color/thickness/font/fill/rectangle/arc/ellipse/polygon/polyline ... etc ]. :-)
#3 Given that #2 above ~ 'requires', within the HTML, an inclusion path to a separate *.js file ( ~ 24K on local disk & within same directory/folder ), then I might package files into a zip/RAR ( compacts to ~ 20K ) that could unwrap ( maybe self-extract? ) prior to use. What would be suitable across all likely platforms?
I will also do a minor upgrade of RR_V4 to become RR_V5B, now the 'lite' version. I've split into the two types to stop the code getting too rowdy ... :-)
[ It'd be a real shame if the E@H tasks lost their cyclic frequency dependence, eh? I wouldn't be writing sort algorithms on key/value pairs in associative arrays for starters ... :-) ]
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
[ It'd be a real shame if the E@H tasks lost their cyclic frequency dependence, eh? I wouldn't be writing sort algorithms on key/value pairs in associative arrays for starters ... :-) ]
A real shame indeed :). - You seem to be loving every minute of it :).
I'm playing with RR_V5A at the moment and I'll give a full report in due course. I'm also playing with my data gathering script - I'm keeping results for about 80 machines now and sucking in an update for each results file about every second day or so. The current version of the script seems to be working well - it has quite a few extra features with more to come - which is the main reason for this post.
I cut and pasted some data (30 lines) from one of my results files into RR_V5 and that worked fine. I printed out the summary page which also worked fine. There are some things in the output I'll need to report on when I get time. In cutting the input data, I needed to ensure that only one version of the science app was involved. I could do that because I had manually recorded when the changeover had occurred. A bit tedious for 80 hosts so I'd only done that for a few of interest. Also more tedious in the future if Bernd brings out a spate of new betas.
So one of the new features in my script is to grab the version of the app used to crunch the task. It's smart enough to recognise "mixed app" tasks as well. The app version info will now be recorded as a 4th field in my results files. It would be very handy if RR would at least not be too upset and perhaps just ignore the 4th field, or better still parse the field and separate the output into blocks based on app version. That way when Bernd announces a new beta with supposed speed improvements, a machine or two can be immediately switched and once a few results are in RR should give a nice verdict without any manual intervention and fiddling with the data gathering process :).
Here is an example of the new data format showing the transition from 4.21 to 4.27.
Let us know what you think :).
So one of the new features in my script is to grab the version of the app used to crunch the task. It's smart enough to recognise "mixed app" tasks as well.
Good stuff. Which representation of the version does your script snag? Since you recognize (sorry for the "z", but I'm from the USA) mixed apps, I assume you are finding something in stderr.out?
That would be good. I've been using BOINCView logs, which use the simplistic version information created at download time, so at every version change the entire work queue gets mislabeled - not just the results in process at the changeover.
... I assume you are finding something in stderr.out?
Bikeman's awk script collects the taskID as well as the performance info. Previously I'd just be throwing that field away. Now, before throwing it away, I'm using it to grab the task details page (which contains stderr.out) and pass it through grep to find lines containing the string "einstein_S5R3_" and then stripping out everything bar the actual app version.
I then test what's left to see how many different versions were used. Most of the time there's just one, although that same value gets reinserted in stderr.out every time the app is restarted. It's possible that a task could be started with one version, continued with a second and finished with a third so I just record those results, where more than one version is found, as "mixed".
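In JavaScript terms ( RR's language ), the 'mixed' test Gary describes could look something like the sketch below; the exact shape of the version strings in stderr.out is assumed here, not taken from a real task page:
[pre]// Hypothetical sketch: collect the distinct app versions mentioned in a
// task's stderr.out text and return the single version found, or "mixed".
function appVersionOf(stderrText) {
    var seen = {}, distinct = [];
    var matches = stderrText.match(/einstein_S5R3_[0-9.]+/g) || [];
    for (var i = 0; i < matches.length; i++) {
        var v = matches[i].replace("einstein_S5R3_", "");
        if (!seen[v]) { seen[v] = true; distinct.push(v); }  // restarts re-insert the same value
    }
    if (distinct.length === 0) return "unknown";
    return (distinct.length === 1) ? distinct[0] : "mixed";
}[/pre]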
This all means that I don't have to be concerned about the version change or how results are "branded" in the cache of work on hand. Version changes (like the official Windows app becoming 4.26) can happen at short notice when one is not paying attention so I feel a bit "safer" now :).
A real shame indeed :). - You seem to be loving every minute of it :).
Yup, like a pig in the proverbial ..... and I'm certainly happy to continue :-)
[ Careful spotters will note that some portions of development can still be utilised even if the cyclic behaviour is lost .... ]
Quote:
I'm playing with RR_V5A at the moment and I'll give a full report in due course. I'm also playing with my data gathering script - I'm keeping results for about 80 machines now and sucking in an update for each results file about every second day or so. The current version of the script seems to be working well - it has quite a few extra features with more to come - which is the main reason for this post.
I cut and pasted some data (30 lines) from one of my results files into RR_V5 and that worked fine. I printed out the summary page which also worked fine. There are some things in the output I'll need to report on when I get time. In cutting the input data, I needed to ensure that only one version of the science app was involved. I could do that because I had manually recorded when the changeover had occurred. A bit tedious for 80 hosts so I'd only done that for a few of interest. Also more tedious in the future if Bernd brings out a spate of new betas.
So one of the new features in my script is to grab the version of the app used to crunch the task. It's smart enough to recognise "mixed app" tasks as well. The app version info will now be recorded as a 4th field in my results files. It would be very handy if RR would at least not be too upset and perhaps just ignore the 4th field, or better still parse the field and separate the output into blocks based on app version. That way when Bernd announces a new beta with supposed speed improvements, a machine or two can be immediately switched and once a few results are in RR should give a nice verdict without any manual intervention and fiddling with the data gathering process :).
Actually, that won't be too difficult! I've been writing 'loose'/'dynamic' components ( quasi object-oriented ) in expectation, so this type of stuff can be wedged in without much ado. [ For the savvy : I'm using multi-dimensional, variable length, associative arrays - currently flowering in a 5D phase space near you ]. I guess all we need is some agreement on the overall approach. Best to stick with CSV as it is common & real easy to parse.
There's a host of possibilities here. Probably the simplest, easiest and most versatile approach ( future proof ? ) is to:
- have variable ordering of fields ( ie. get the algorithm to do the work ).
- no programmatic upper limit on number of fields.
- the first line in the file is the 'heading line' which declares/specifies the data naming and their order.
- reserve the field names FREQUENCY, SEQUENCE, and RUNTIME for use as per status quo. Require these to be present.
- user can invent their own field names by mentioning them in the first/heading line. The name will propagate as a string through to reports.
- obviously it is the user's onus to ensure subsequent lines contain the data ( and its order ) that reflects the meaning declared on the heading line. [ Punt any lines in their entirety into the bit bucket upon the breach, for instance if too few data elements provided with respect to field declarations ]
- assume sorting upon the fields named FREQUENCY and SEQUENCE.
- a user field can be sorted upon if its declared name is appended with ( say ) an '*S', as in APP_VERSION*S.
- a sortable field must have a self evident ordinal property ( numbers are fine, alphabetic strings are fine, mixtures not so good but do-able ). Assume ascending sort default, but use '*SD' to indicate 'sort descending'.
- for all thus sortable fields ( FREQUENCY, SEQUENCE +/- user's ) the order of their mention in the heading line determines the sort hierarchy in reporting. If you give it a "silly" ordering that's what you'll get!
- for any non-sorted fields ( ie. name mentioned without the appended '*S' ) these get included for reporting within the lowest sort level after the currently reported stuff ( phase et al ).
As an example, CSV input of:
APP_VERSION*SD,FREQUENCY,SEQUENCE,RUNTIME,FOFFLE
4.21,0724.60,040,38297,foffle_val_A
4.21,0724.60,057,37213,foffle_val_C
4.21,0724.60,049,37119,foffle_val_B
4.21,0724.60,069,38735,foffle_val_D
4.21,0724.60,231,46945,foffle_val_E
mixed,0724.65,101,49042,foffle_val_G
4.27,0724.65,080,38987,foffle_val_Z
4.27,0724.70,005,49625,foffle_val_X
4.27,0724.70,018,43731,foffle_val_J
4.27,0724.75,025,40767,foffle_val_A
4.27,0724.75,048,34721,foffle_val_R
4.27,0724.75,049,34541,foffle_val_W
4.27,0724.75,073,36818,foffle_val_Q
would yield in the report:
[pre]APP_VERSION = 4.27
  FREQUENCY = 724.65
    SEQUENCE = 80
      RUNTIME = 38987
      current stuff
      FOFFLE = foffle_val_Z
  FREQUENCY = 724.70
    SEQUENCE = 5
      RUNTIME = 49625
      current stuff
      FOFFLE = foffle_val_X
    SEQUENCE = 18
      RUNTIME = 43731
      current stuff
      FOFFLE = foffle_val_J
  FREQUENCY = 724.75
    SEQUENCE = 25
      RUNTIME = 40767
      current stuff
      FOFFLE = foffle_val_A
    SEQUENCE = 48
      RUNTIME = 34721
      current stuff
      FOFFLE = foffle_val_R
    SEQUENCE = 49
      RUNTIME = 34541
      current stuff
      FOFFLE = foffle_val_W
    SEQUENCE = 73
      RUNTIME = 36818
      current stuff
      FOFFLE = foffle_val_Q
APP_VERSION = 4.21
  FREQUENCY = 724.60
    SEQUENCE = 40
      RUNTIME = 38297
      current stuff
      FOFFLE = foffle_val_A
    SEQUENCE = 49
      RUNTIME = 37119
      current stuff
      FOFFLE = foffle_val_B
    SEQUENCE = 57
      RUNTIME = 37213
      current stuff
      FOFFLE = foffle_val_C
    SEQUENCE = 69
      RUNTIME = 38735
      current stuff
      FOFFLE = foffle_val_D
    SEQUENCE = 231
      RUNTIME = 46945
      current stuff
      FOFFLE = foffle_val_E
APP_VERSION = mixed
  FREQUENCY = 724.65
    SEQUENCE = 101
      RUNTIME = 49042
      current stuff
      FOFFLE = foffle_val_G[/pre]
... you get the idea. :-)
Ta muchly! I'll knead the bread on that ... :-)
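As a rough illustration of how the heading line and its '*S'/'*SD' markers might be digested - a sketch under the rules listed above, not Mike's actual implementation, with invented names throughout:
[pre]// Hypothetical sketch: parse the heading line into field descriptors,
// honouring the '*S' / '*SD' sort markers and the always-sorted status
// of FREQUENCY and SEQUENCE.
function parseHeading(line) {
    var fields = [];
    var names = line.split(",");
    for (var i = 0; i < names.length; i++) {
        var name = names[i].replace(/^\s+|\s+$/g, "");
        var f = { name: name, sort: false, descending: false, column: i };
        if (/\*SD$/.test(name)) {
            f.sort = true; f.descending = true; f.name = name.replace(/\*SD$/, "");
        } else if (/\*S$/.test(name)) {
            f.sort = true; f.name = name.replace(/\*S$/, "");
        }
        if (f.name === "FREQUENCY" || f.name === "SEQUENCE") { f.sort = true; }
        fields.push(f);
    }
    return fields;
}

// Comparator over rows ( arrays of field strings ): sortable fields are
// applied in their order of mention, giving the reporting hierarchy.
function makeComparator(fields) {
    return function (rowA, rowB) {
        for (var i = 0; i < fields.length; i++) {
            if (!fields[i].sort) { continue; }
            var a = rowA[fields[i].column], b = rowB[fields[i].column];
            // Compare numerically where both sides parse, else as strings.
            var na = parseFloat(a), nb = parseFloat(b);
            var cmp = (!isNaN(na) && !isNaN(nb)) ? na - nb
                    : (a < b ? -1 : (a > b ? 1 : 0));
            if (fields[i].descending) { cmp = -cmp; }
            if (cmp !== 0) { return cmp; }
        }
        return 0;
    };
}[/pre]
Something like rows.sort(makeComparator(parseHeading(firstLine))) would then order the data for the report.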
ALERT! I've uncovered an error in some JavaScript implementations, with regard to parsing of strings that represent numeric values if they have leading zeroes. Subsequently, analysis will veer off with weirdness!
I've corrected this problem with a workaround, so please re-download at your earliest opportunity. Leading zeroes in your data are handled correctly with this update.
[ NB. Errr Gary, your data has said leading zeroes - which is how I found out! ]
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
NB. Errr Gary, your data has said leading zeroes - which is how I found out!
I'm sorry they gave you strife but they were deliberately put in there for a couple of (what I consider :).) good reasons.
Firstly human readability. If you are planning to scan a results file looking for anomalies, having all the commas line up is very helpful.
Secondly sorting of lines. Sequence numbers 1, 1x, 1xx all "sort" (when no flags are used) together and before any starting with 2-9, no matter how many digits. I guess there is probably a flag to the sort utility that fixes this so I'll go read the manpage :).
If it's helpful, I'll take out the leading zeroes and find a more elegant solution to getting them to sort correctly. What would happen if they were leading spaces rather than leading zeroes? I do like to scan down neat columns of data :).
It's OK, it's fixed. I've stripped leading zeroes. Continue as you please! :-)
There are parseInt() and parseFloat() JavaScript functions which you'd expect ought not to let any leading zeroes in the string affect the interpreted value of the number it represents. However implementations are not uniform in their behaviour, or I've gravely misunderstood the functional semantics of what is meant by 'parse' ... :-)
As for spaces : I'll alter behaviour so that the fields within a line ( ie. as defined by a comma separator ) may have leading or trailing space. Other spaces within a field, not contiguous with a comma either side or line extremities, will fail that field and thus cause the line to be rejected. Thus:
[pre] stuff , next_one , last_one [/pre]
will be OK. But:
[pre] stuff , next one , last_one [/pre]
will fail.
[ As I will do this on a per-field basis then trimming at the line beginning and end will come out in the wash .... ]
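A small sketch of that per-field rule ( illustrative names only; the real RR regEx may differ ):
[pre]// Hypothetical sketch: trim leading/trailing whitespace from each
// comma-separated field, but fail the whole line if any field still
// contains internal whitespace after trimming.
function splitAndCheck(line) {
    var out = [];
    var fields = line.split(",");
    for (var i = 0; i < fields.length; i++) {
        var f = fields[i].replace(/^\s+|\s+$/g, "");  // trim the ends only
        if (/\s/.test(f)) { return null; }            // internal whitespace: reject line
        out.push(f);
    }
    return out;
}

splitAndCheck(" stuff , next_one , last_one ");  // ["stuff", "next_one", "last_one"]
splitAndCheck(" stuff , next one , last_one ");  // null - line rejected[/pre]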
Cheers, Mike.
( edit ) Changes made, altered version at same link, & spaces will now behave as described here.
( edit ) Oh, and 'space' means 'whitespace' as a generic idea. This includes tabulations et al, as well as ' ', in contiguous combinations.
( edit ) I failed to mention earlier that: if/when one sorts into a hierarchy as per some 'heading' specification given by the user, then that would imply the generated report would contain summary sections at each level reflecting that hierarchy choice. This means that if your lowest sort level is SEQUENCE then the level that immediately bounds/surrounds it will include a summary ( averages etc ) over all SEQUENCE settings. So if say SEQUENCE was then enclosed by FREQUENCY, then for each FREQUENCY value a summary is generated for all SEQUENCE values at that FREQUENCY value. Now if in turn FREQUENCY is enclosed by APP_VERSION say, then for each given value of APP_VERSION a summary will be produced over all values of FREQUENCY enclosed by it ( using the SEQUENCE summaries from each ). That way the amalgamated/aggregated data/information flows upwards towards your first choice of sort parameter - but you can still look deeper/down the pyramid if you like, at secondary and tertiary levels. I'll shut up and slope off to bed now ... :-)
( edit ) Well I did misunderstand 'parse', actually. Leading zeroes imply an octal radix. Tis now fixed with an explicit decimal radix parameter, and I've also chosen a more robust cross-platform choice of whitespace trimming regEx. Sigh .... :-)
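The radix gotcha in that last edit can be shown in two lines ( older engines defaulted to octal on a leading zero; later ECMAScript editions removed that default ):
[pre]parseInt("040");      // may yield 32 - the leading zero taken as an octal radix
parseInt("040", 10);  // 40 always - the explicit decimal radix is the fix

// A cross-platform whitespace trim of the kind mentioned:
function trim(s) { return s.replace(/^\s+|\s+$/g, ""); }[/pre]
And for the per-level summaries described in the third edit above, a toy rollup - with assumed field names, not RR's code - might look like:
[pre]// Hypothetical sketch: group rows by each sort key in turn and roll the
// mean runtime up the hierarchy, e.g. SEQUENCE summaries per FREQUENCY,
// then FREQUENCY summaries per APP_VERSION.
function summarise(rows, keys, level) {
    level = level || 0;
    if (level === keys.length) {
        // Lowest level: average the runtimes of the rows in this bucket.
        var total = 0;
        for (var i = 0; i < rows.length; i++) { total += rows[i].RUNTIME; }
        return { count: rows.length, meanRuntime: total / rows.length };
    }
    // Group by the current key, then summarise each group one level down.
    var groups = {};
    for (var j = 0; j < rows.length; j++) {
        var k = rows[j][keys[level]];
        (groups[k] = groups[k] || []).push(rows[j]);
    }
    var summary = { groups: {}, count: 0, meanRuntime: 0 }, weighted = 0;
    for (var g in groups) {
        var sub = summarise(groups[g], keys, level + 1);
        summary.groups[g] = sub;
        summary.count += sub.count;
        weighted += sub.meanRuntime * sub.count;
    }
    summary.meanRuntime = weighted / summary.count;
    return summary;
}
// summarise(data, ["APP_VERSION", "FREQUENCY"]) aggregates upwards,
// just as described: deeper levels feed the summaries above them.[/pre]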
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal