HostID 1001562 - Richard Haselgrove's Q6600 Quad Core

archae86
Joined: 6 Dec 05
Posts: 3,008
Credit: 4,847,784,355
RAC: 3,348,361

RE: In order to fine-tune

Message 78827 in response to message 78826

Quote:
In order to fine-tune the period estimate, it would be possible to actually check the period length by visualizing the result if you are lucky enough to have results with sequence numbers that are multiples of the supposed period length. I've got a database of a couple of hundred results; I'll check if I have any of those candidates.


Single-point estimates are pretty badly subject to error from the noise in CPU time for an individual sample. I prefer to calculate the residual to an estimate for a large series, and plot it along with the actual data.

When one is a bit below or a bit above the actual period, the residuals drift up or down in a way that is pretty easy to spot on top of the random noise. That is not what I did in creating the original estimates for about a dozen widely diverse frequencies from which I estimated .000206 as a fit in the first place. It is what I did when I rechecked that for recent high frequencies for which I had many samples, and concluded that for the specific frequencies I had, .0002044 was definitely too low, with perhaps .0002055 being about right.
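As a rough illustration of that residual check, here is a minimal Python sketch. The waveform is only a placeholder (an assumption, not the Ready Reckoner's actual formula), the "observations" are synthetic and noise-free, and the names model_cpu_time, residuals and mean_abs are made up for the example. It only shows that a trial period that is slightly off produces residuals that grow in the later cycles.

```python
import math

def model_cpu_time(seq, period, base=1.0, depth=0.3):
    """Placeholder waveform (an assumption): runtime dips once per cycle."""
    return base - depth * abs(math.sin(math.pi * seq / period))

def residuals(seqs, cpu_times, period):
    """Observed minus modelled CPU time for one trial period."""
    return [t - model_cpu_time(s, period) for s, t in zip(seqs, cpu_times)]

def mean_abs(values):
    """Mean absolute value of a list of residuals."""
    return sum(abs(v) for v in values) / len(values)

# Synthetic, noise-free "observations" generated with a true period of 170.
true_period = 170.0
seqs = list(range(0, 510))
observed = [model_cpu_time(s, true_period) for s in seqs]

# Residuals against a trial period that is slightly too short.
res = residuals(seqs, observed, period=169.0)
first_cycle = mean_abs(res[0:170])
third_cycle = mean_abs(res[340:510])
print(f"mean |residual| cycle 1: {first_cycle:.4f}, cycle 3: {third_cycle:.4f}")
```

With real data the same residual list would be plotted against sequence number on top of the CPU times, and the drift judged by eye against the noise.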

I think we are more in trouble on the 10-Hz step function than in overall period estimation or the functional form.

As it happens, my "cleanest" host, the one with the least CPU time noise, quite recently got Work Units spanning much of three cycles at frequencies within a tenth of a Hz or two of 792.15.

Using Mike Hewson's Ready Reckoner V7a which had the .000206 value, and falsifying the frequencies which were not exactly 792.15 to get it to treat them all as one, it generated this graph in the multi-cycle form:

I then falsified the frequency to 802.15 and to 782.15. The cumulative error going out to the later cycles was striking in both cases. Yet that shift of 10 Hz is just what we are currently getting wrong at each boundary, if I correctly understand Bikeman's insight, and he has it right.

By contrast, the waveshape error, arising from the incorrect use of a sine rather than a quadratic form, seems much less. For this specific frequency the cycle period error is also quite low.

I hope this illustrates why I think the order of priority should be:

1. get the 10-Hz step matter resolved: is it really there, and exactly where is the step boundary?

2. with the 10-Hz step in place, back-check that the implied cycle period estimate from Gary's analysis of the skygrid file line counts in fact works well with the considerable historical archive over a broad frequency range. I suspect it will--but I want to check.

3. try out the quadratic waveform representation, for truth, for better waveform accuracy, and for simplicity of computation.
[edited to correct a single spelling error]

archae86
Joined: 6 Dec 05
Posts: 3,008
Credit: 4,847,784,355
RAC: 3,348,361

RE: Might be worth keeping

Message 78828 in response to message 78825

Quote:
Might be worth keeping an eye on Peanut's host. Since the list Gary posted this morning (tasks up to 0909.30), I see he's been issued work at 0909.45 and 0909.50: with any luck, he'll carry on and step right over any 0910.xx transition point.


Do we know a way to establish which skygrid is actually used for a particular result computation?

I can look at my own host's message log and see when one is downloaded relative to work assignment times, and make a pretty strong inference, but I don't know a way to get even that level of assurance on someone else's host. Sadly none of my four hosts is currently working near a 10 Hz boundary.

Anyway, my guess is that the scheduler knows full well when a new skygrid is required, and is unlikely to just march on over the boundary. Sure would be nice, though. We'll see.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,491
Credit: 63,225,900,229
RAC: 54,304,983

RE: Gary, this is very

Message 78829 in response to message 78824

Quote:

Gary, this is very interesting. I have two comments initially:

1. As the Skygrid file gets used over a 10 Hz range, and is labelled for something near the top of that range, and you are adding .433 to that label, there is a built-in reason for the resulting value of the "archae86 constant" to be a bit lower than one based on observation of the cycle behavior of specific results.

Absolutely, and I should have made that clear. I'm not at all saying that your value was inaccurate. I'm just troubled by what we should be using for frequency, and as I experimented with higher values, your constant became lower.

I should have also made it clear that, with a large number of machines to observe, I get to see quite a lot of data filename frequencies and exactly what skygrid file is being used with that data frequency. In general, the skygrid filename frequency is up to 10Hz above the data frequency so without much checking, I originally assumed that the 10Hz range was based on exact whole numbers - ie skygrid_800 applied precisely for data filename frequencies from 790.00 to 800.00.

I do clearly recall seeing an example of a data frequency of nn0.x0 or nn0.x5 that was using the nn0 skygrid, although I don't recall exactly what x was. I recall the event clearly because I thought at the time that it was rather unusual and it was breaking my preconceived notion of how things were supposed to work. This was before you published your cycle model so I didn't really have a reason to investigate further.

I've grepped through the results lists I'm keeping to see if I can find current examples. The problem now is that earlier versions of the app (4.14, 4.15) used to show the command line flags including the skygrid filename. The recent versions 4.27 and 4.32 don't show that information. Does anyone know of another way to work out exactly what skygrid file was used when it no longer seems to appear in stderr.out? All the hosts I'm keeping records for are now on 4.27 (Linux) or 4.32 (Windows), I think, so I don't have an easy way to check after the event. I guess I'll just have to wait until another example comes up.

Quote:
2. As you've focused on the skygrid relation, you've not mentioned the 10 Hz step function. When I first read your post, I thought you might have found that the true transition point is .433 Hz above the labeled frequency, but you actually seem to be finding that as a relationship fit.

If I correctly understand you - yes. I think there will be a '10Hz step function' as you call it but I don't claim to understand why. All I'm doing is pointing out how the addition of .433 seems to bring a remarkable consistency - too consistent to be a coincidence. Smarter minds than mine might be able to explain this.

Quote:
Assuming this is all right, the implied revision seems to be to use a 10-Hz step quantization of the period estimate. There does, so far as I know, remain a need to establish just where that step should be. From someone's (you, Bikeman?) previous post I vaguely recall that the step to use of the next higher skygrid file is about .5 Hz above its label. If it is actually .433, that would unify things even more.

I mentioned it early on in the big thread and have basically repeated it in this post. I intend to attempt to find a current example - ie a running process where I can actually see the skygrid being used by looking in the slot directory. I can also use 'ps ax | grep einstein' on Linux to see all the flags being used. If someone knows a better way, please say so.

Quote:
so instead of:
period = .000206*frequency^2
we should use:
period = .00020417*stepped_frequency^2

I think so.

Quote:

We still don't know how to determine stepped_frequency, but expect it to climb in steps at 10-Hz intervals "about1" .5 Hz above the more obvious 10-Hz integer multiple points, and to take the numeric value of "about2" .433 Hz above the round number top end of that 10-Hz interval which is in the actual skygrid file.

With my usual desire to back theory with observation, I'd like to backcheck this against actual cyclic behavior from my archives if we can get a specific function agreed for review. It seems to me that you have been specific indeed about everything save "about1".

I've been specific about .433 because that is the precise value that "works".
The only reason I mentioned .45 (not .5) was simply the guess that since data filename frequencies went in .05 steps then 800.45 would round to 800 whereas 800.50 would round to 801. My guess then was that data filename values of 790.50 to 800.45 would be associated with the 800 skygrid. Until I can find the observations to support this, it's really no more than a guess. If someone can add light on why .433 "works" I'll be very happy.
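To make the proposal concrete, here is a minimal Python sketch of the stepped-frequency rule as discussed so far. The function names are invented for illustration, and the boundary offset ("about1") defaults to 0.50 only because that matches the guess that 790.50 to 800.45 goes with the 800 skygrid; it is an unconfirmed assumption and is left as a parameter so other values can be tried.

```python
ARCHAE86_CONSTANT = 0.00020417   # fitted value quoted in this thread
LABEL_OFFSET_HZ = 0.433          # "about2", the value that "works"

def skygrid_label(data_freq_hz, boundary_offset_hz=0.50):
    """Label (in Hz) of the skygrid assumed to serve a data frequency.

    boundary_offset_hz is "about1". The default of 0.50 is only a guess
    (790.50 .. 800.45 -> skygrid 800) and has not been confirmed.
    """
    # Work in integer hundredths of a Hz to avoid floating-point edge cases.
    centi = round(data_freq_hz * 100)
    offset = round(boundary_offset_hz * 100)
    return 10 * ((centi - offset) // 1000) + 10

def stepped_frequency(data_freq_hz, boundary_offset_hz=0.50):
    """Skygrid label plus .433 Hz, the frequency fed to the period formula."""
    return skygrid_label(data_freq_hz, boundary_offset_hz) + LABEL_OFFSET_HZ

def cycle_period(data_freq_hz, boundary_offset_hz=0.50):
    """Cycle length in sequence numbers: .00020417 * stepped_frequency^2."""
    stepped = stepped_frequency(data_freq_hz, boundary_offset_hz)
    return ARCHAE86_CONSTANT * stepped ** 2

for f in (790.50, 800.45, 800.50, 900.00, 901.00):
    print(f, skygrid_label(f), round(cycle_period(f), 1))
```

Under these assumptions 900.00 gives a period of about 165.5 and 901.00 about 169.2. Changing boundary_offset_hz moves the step without touching the rest of the calculation.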

Quote:
For those who might be tempted to think the 10-Hz step thing, if real, is modest enough to ignore, I'll point out that for a frequency going from 900 to 901 Hz, this implies a "jump" in cycle length from 165.5 to 169.2. For sequence numbers out near the third peak, this is a peak point shift of almost twelve sequence numbers.

This drift in the peak position is exactly what was concerning me and why I asked Mike to try the mods I suggested on RH's data and see what he thought.

Quote:
At the current time, I think improving this cycle estimate has more impact than ....

I agree.

Quote:
Gary, Bikeman, am I comprehending what you are saying?

Answering for myself - yes, I believe so.

Quote:
More specifically, does anyone know from any source the "correct" value of about1?

No. I think we may need current observations to find it.

Quote:
Before I invest time in backchecking this possible revision, I'd like comments on whether you think it probably the right way forward.

I think it's the right way forward. I'm tempted to think that observing a fast machine doing higher frequencies with lots of sequence numbers will better show the effects of tweaking the parameters being used in the model.

Cheers,
Gary.

peanut
Joined: 4 May 07
Posts: 162
Credit: 9,644,812
RAC: 0

Richard definitely has a lock

Richard definitely has a lock on the 909.15 freq band. The RR predicts the trough at 426 and Richard has a 429. Once again, the -0's are just a quirk in my system. They just mean Richard hasn't crunched them yet.

909.15 429 -0
909.15 430 -0
909.15 431 -0
909.15 432 -0
909.15 433 -0
909.15 434 -0
909.15 435 -0
909.15 436 -0
909.15 437 -0
909.15 438 -0
909.15 439 -0
909.15 440 -0
909.15 441 -0
909.15 442 -0
909.15 443 -0
909.15 444 -0
909.15 445 -0
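For anyone wanting to reproduce the trough figure by hand, a small Python sketch follows. It rests on two assumptions that are not confirmed anywhere in this thread: that peaks sit at whole multiples of the cycle period with troughs half a period later, and that the prediction used the older .000206 constant with the unstepped frequency. The function name trough_positions is made up for the example.

```python
def trough_positions(freq_hz, constant=0.000206, count=3):
    """Sequence numbers of the first `count` troughs.

    Assumes (not confirmed in the thread) that peaks sit at whole multiples
    of the period and troughs fall half a period later.
    """
    period = constant * freq_hz ** 2
    return [round((k + 0.5) * period) for k in range(count)]

print(trough_positions(909.15))
```

With those assumptions the third trough for 909.15 comes out at 426, which matches the RR figure quoted above; that agreement is suggestive rather than proof that RR computes it this way.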

archae86
Joined: 6 Dec 05
Posts: 3,008
Credit: 4,847,784,355
RAC: 3,348,361

RE: RE: Before I invest

Message 78831 in response to message 78829

Quote:
Quote:
Before I invest time in backchecking this possible revision, I'd like comments on whether you think it probably the right way forward.

I think it's the right way forward. I'm tempted to think that observing a fast machine doing higher frequencies with lots of sequence numbers will better show the effects of tweaking the parameters being used in the model.


Gary, thanks for your detailed and thoughtful reply.

I've spent a couple of hours this afternoon backchecking. My first thought was to redo my cycle period estimate for each of my old large samples, then try to work forward for consistency checking with the revised estimate. But your method yields directly a much more precise estimate than I can get, so I decided instead to use Ready Reckoner v7c (the one with an adjustable sky grid density constant) set to your .00020417. I prepped input material by using an Excel cell function to boost the frequency in the result name to the next higher integer multiple of 10 plus .433. Then I copied all the bigger groups into RR and looked at the graph for good fit on repetition cycle frequency.
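The frequency-boosting step can also be done outside Excel. The Python sketch below is only an approximation of it: the pattern for finding the frequency field, the treatment of an exact multiple of 10 as needing no step up, and the names boost_frequency and boost_name are all assumptions for illustration. The sample name is the "0909.25__471" fragment that appears in this thread.

```python
import re

# A four-digit, two-decimal frequency field such as 0909.25 (assumed format).
FREQ_IN_NAME = re.compile(r"(?<!\d)(\d{4}\.\d{2})(?!\d)")

def boost_frequency(freq_hz, offset_hz=0.433):
    """Multiple of 10 Hz at or above freq_hz, plus the offset.

    Treating an exact multiple of 10 as needing no step up is an assumption.
    """
    centi = round(freq_hz * 100)      # integer hundredths of a Hz
    label = 10 * -(-centi // 1000)    # integer ceiling to the next 10 Hz
    return label + offset_hz

def boost_name(result_name):
    """Replace the first nnnn.nn frequency field in a result name."""
    def repl(match):
        return f"{boost_frequency(float(match.group(1))):08.3f}"
    return FREQ_IN_NAME.sub(repl, result_name, count=1)

print(boost_name("0909.25__471"))
```

Whether a given Ready Reckoner version accepts a three-decimal frequency in the pasted name would need checking before relying on this.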

(one comment--RR v7c appears not to have some form of garbage collection for internal state accumulated when one handles successive batches of inputs. It can process a first set of many dozens, even a couple of hundred, inputs in reasonable time, but slows way down if one continues by using the clear input button and just pasting in new sets. I found it faster to kill the tab it was running in and reload the html file once every few sets.)

I had lots of chances to purge outliers caused by application changes for which my label lagged the truth, and a few other instances, so I was not asleep. Generally the frequency cycle fit looked quite good.

This exercise did not have enough precision to show that the revised method is actually better than the old one. The task was not that, but rather a conservative backcheck to assure that the data did not object.

Unfortunately, I don't think I have data at hand which would let me confidently demonstrate the 10-Hz flat step behavior directly from result cycles.

That also leaves the question of what the proper value for "about1" is, the amount above a 10-Hz integer multiple at which things switch to the next higher skygrid. I suspect the answer is at least .05, as I've seen nn0.00 and nn0.05 h1 and l1 files load as part of a big group from just below an nn0 boundary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,040
Credit: 686,903,812
RAC: 927,941

RE: Richard definitely has

Message 78832 in response to message 78830

Quote:
Richard definitely has a lock on the 909.15 freq band. The RR predicts the trough at 426 and Richard has a 429. Once again, the -0's are just a quirk in my system. They just mean Richard hasn't crunched them yet.


Peanut - did you jinx me, or did I jinx myself with the 'Goldilocks' theory? Next one after 429 is - 0909.25__471.

Looking at it, my _429 has still not been issued to a wingman, whereas we've had both tasks issued for 0909.25__471. I think that lends weight to my suggestion that I've been requesting work faster than the workunit generator is prepared to make it available.

I think I'll chase down to 429 as at present, then restart CPDN on one core to slow the progress down - after all, I am crunching a lot faster than the continuous run I got for 4.07. So 909.25 may have slightly different experimental conditions from 909.15. See if I can generate the 'fast machine doing higher frequencies with lots of sequence numbers' Gary wanted. Comments?

I'm plotting a graph with the results so far, and it's much rougher than the previous one. I'll publish it once 429 is in.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,208
Credit: 136,034,357
RAC: 64,188

RE: (one comment--RR v7c

Message 78833 in response to message 78831

Quote:
(one comment--RR v7c appears not to have some form of garbage collection for internal state accumulated when one handles successive batches of inputs. It can process a first set of many dozens, even a couple of hundred, inputs in reasonable time, but slows way down if one continues by using the clear input button and just pasting in new sets. I found it faster to kill the tab it was running in and reload the html file once every few sets.)


Yup, I've noticed that too. JavaScript does allegedly have garbage collection, but alas doesn't specify precisely when an implementation should do that - only that it should happen at all after a variable has gone out of scope/context ( some time this year perhaps? ). With other languages there is an expectation of promptness here, but in fact the specification even allows behaviour with thresholds before action. Hence if one's memory usage is below whatever the implementation has decided that is, then it won't actually get done for that window/tab instance! I've tried 'aggressively' resetting array pointers ( using 'new' ) after their reference is no longer required - but it doesn't seem to trigger any speed change. Apparently MSIE up to and including 6 has a memory leak problem too... :-(

Cheers, Mike.

( edit ) So, out of interest, what is your browser?

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

archae86
Joined: 6 Dec 05
Posts: 3,008
Credit: 4,847,784,355
RAC: 3,348,361

RE: ( edit ) So, out of

Message 78834 in response to message 78833

Quote:

( edit ) So, out of interest, what is your browser?


I've been running RR v7c in Firefox 2.0.0.12.

I also have Internet Explorer of sufficiently recent version that it has tabs and I can't find a "help about" to tell me the version number.

Next time I'm dinking around with RR v7c I'll try comparing the two.

RandyC
Joined: 18 Jan 05
Posts: 3,622
Credit: 111,139,797
RAC: 0

RE: RE: ( edit ) So, out

Message 78835 in response to message 78834

Quote:
Quote:

( edit ) So, out of interest, what is your browser?

I've been running RR v7c in Firefox 2.0.0.12.

I also have Internet Explorer of sufficiently recent version that it has tabs and I can't find a "help about" to tell me the version number.

Sounds like you've somehow unchecked the 'Menu Bar' option. Right-click on one of the tool bars and check it back on. Should be able to get "help about" then.

Seti Classic Final Total: 11446 WU.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,491
Credit: 63,225,900,229
RAC: 54,304,983

I'm looking at how to glean

I'm looking at how to glean, from my large batch of machines, clear evidence of when the transition to the next skygrid file occurs. As an example, if I'm crunching 769.95 data using a 770 skygrid file, how far can I go before a new skygrid file is mandated? Can I go to 770.xx before a change - where xx is anywhere from 00 to 95?

By grepping the results files I've been keeping, there are a number of hosts that have data files very close to one side or another of the supposed 10Hz boundary. But how can we determine for older results, exactly what skygrid file was being used? For older apps like 4.14 or 4.15, it's recorded in stderr.out. For newer apps like 4.26, 4.27, 4.32, it's not.

As an example, I found a host that crunched 780.10 data with app 4.26 - so no record of the skygrid used. I decided to trawl through stdoutdae.txt in the BOINC directory of that machine looking for when the data was first sent to that machine. Here's a snippet of what I found. These messages immediately follow the server instructions to delete the previous (and quite unrelated) data files:-

Quote:

2008-01-27 05:59:01 [Einstein@Home] Reason: requested by project
2008-01-27 05:59:03 [Einstein@Home] File skygrid_0790Hz_S5R3.dat exists already, skipping download
2008-01-27 05:59:03 [Einstein@Home] [file_xfer] Started download of file h1_0780.10_S5R2
2008-01-27 05:59:03 [Einstein@Home] [file_xfer] Started download of file l1_0780.10_S5R2
2008-01-27 06:01:28 [Einstein@Home] [file_xfer] Finished download of file l1_0780.10_S5R2
2008-01-27 06:01:28 [Einstein@Home] [file_xfer] Throughput 22454 bytes/sec
2008-01-27 06:01:28 [Einstein@Home] [file_xfer] Started download of file h1_0780.15_S5R2
2008-01-27 06:01:45 [Einstein@Home] [file_xfer] Finished download of file h1_0780.10_S5R2
2008-01-27 06:01:45 [Einstein@Home] [file_xfer] Throughput 22205 bytes/sec
2008-01-27 06:01:45 [Einstein@Home] [file_xfer] Started download of file l1_0780.15_S5R2
2008-01-27 06:04:02 [Einstein@Home] [file_xfer] Finished download of file l1_0780.15_S5R2
2008-01-27 06:04:02 [Einstein@Home] [file_xfer] Throughput 23633 bytes/sec
2008-01-27 06:04:02 [Einstein@Home] [file_xfer] Started download of file h1_0780.20_S5R2
2008-01-27 06:04:08 [Einstein@Home] [file_xfer] Finished download of file h1_0780.15_S5R2
2008-01-27 06:04:08 [Einstein@Home] [file_xfer] Throughput 22284 bytes/sec
2008-01-27 06:04:08 [Einstein@Home] [file_xfer] Started download of file l1_0780.20_S5R2
2008-01-27 06:06:25 [Einstein@Home] [file_xfer] Finished download of file l1_0780.20_S5R2
.....

As you can see, new data starting at 780.10 was being sent with the 790 skygrid - not the 780 skygrid I was expecting. I wonder if there's a difference when data distribution starts above rather than actually traversing a supposed boundary.
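Trawling by eye gets tedious, so here is a minimal Python sketch of how the pairing could be automated. It assumes the message log lines look like the snippet above, and it only infers the skygrid from the most recent mention before each data download, so it gives a strong hint rather than proof of what the app actually used. The regular expressions and the name pair_data_with_skygrid are made up for the example.

```python
import re

SKYGRID = re.compile(r"skygrid_(\d{4})Hz")
DATA_START = re.compile(r"Started download of file [hl]1_(\d{4}\.\d{2})_")

def pair_data_with_skygrid(log_lines):
    """Map each data frequency to the most recently mentioned skygrid label."""
    current_grid = None
    pairs = {}
    for line in log_lines:
        grid = SKYGRID.search(line)
        if grid:
            current_grid = int(grid.group(1))
            continue
        data = DATA_START.search(line)
        if data:
            # Keep the first association seen for each data frequency.
            pairs.setdefault(data.group(1), current_grid)
    return pairs

sample = [
    "2008-01-27 05:59:01 [Einstein@Home] Reason: requested by project",
    "2008-01-27 05:59:03 [Einstein@Home] File skygrid_0790Hz_S5R3.dat exists already, skipping download",
    "2008-01-27 05:59:03 [Einstein@Home] [file_xfer] Started download of file h1_0780.10_S5R2",
    "2008-01-27 05:59:03 [Einstein@Home] [file_xfer] Started download of file l1_0780.10_S5R2",
    "2008-01-27 06:01:28 [Einstein@Home] [file_xfer] Finished download of file l1_0780.10_S5R2",
    "2008-01-27 06:01:28 [Einstein@Home] [file_xfer] Started download of file h1_0780.15_S5R2",
    "2008-01-27 06:01:45 [Einstein@Home] [file_xfer] Started download of file l1_0780.15_S5R2",
    "2008-01-27 06:04:02 [Einstein@Home] [file_xfer] Started download of file h1_0780.20_S5R2",
]
print(pair_data_with_skygrid(sample))
```

For the lines above this pairs 0780.10, 0780.15 and 0780.20 with the 790 skygrid. Run over a whole stdoutdae.txt, it would list every data frequency alongside the skygrid last seen before it.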

I'll continue to trawl through the identified hosts looking for examples of starting just below a boundary and then traversing it.

Cheers,
Gary.
