As far as I know, the Hough algorithm involves very little arithmetic but relies heavily on matrix updates in which, unfortunately, the cells are not accessed in sequential order (so the underlying array cannot be traversed with a constant stride). The access pattern will be too unpredictable for the CPU's hardware prefetcher to be of any help, so it's possible that memory latency becomes a limiting factor. Maybe some explicit prefetching will help.
CU
Bikeman
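To make the prefetching idea concrete, here is a minimal C sketch. It assumes the map is a flat float array and that the cell indices of upcoming updates are already known a few iterations ahead. The names hough_update, hough_accumulate and PREFETCH_DISTANCE are made up for illustration and are not taken from the actual App. __builtin_prefetch is the GCC/Clang builtin; other compilers simply skip the hint.

```c
#include <assert.h>
#include <stddef.h>

/* One pending update: which cell of the map to touch and by how much.
 * Hypothetical layout, not the project's actual data structure. */
typedef struct {
    size_t cell;   /* flat index into the map */
    float  weight; /* non-binary weight added to that cell */
} hough_update;

/* How many updates ahead to prefetch: a tuning guess, not a measured value. */
#define PREFETCH_DISTANCE 8

/* Apply all updates to the map. While working on update i, ask the CPU to
 * start loading the cell that update i + PREFETCH_DISTANCE will write. */
void hough_accumulate(float *map, size_t map_len,
                      const hough_update *updates, size_t n_updates)
{
    assert(map != NULL && updates != NULL);
    for (size_t i = 0; i < n_updates; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + PREFETCH_DISTANCE < n_updates) {
            /* rw = 1: the line will be written; locality = 1: little reuse */
            __builtin_prefetch(&map[updates[i + PREFETCH_DISTANCE].cell], 1, 1);
        }
#endif
        assert(updates[i].cell < map_len);
        map[updates[i].cell] += updates[i].weight;
    }
}
```

Whether this helps at all depends on how early the target cells are known and on how large the map is compared to the caches. The distance of 8 would have to be tuned by measurement.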
IIRC the size of the matrices varies with frequency. The update is data dependent. In principle the matrix is a "Hough Plane" or "Hough space graph", as it is called in the Wikipedia article about the Hough transform. There is a lookup table (cache) associated with some calculations done before updating the map. The number of hits and misses in this cache, by the way, is responsible for the run-time variation we see for workunits with the same number of templates. The values added are not binary (1 or 0) but involve a weight factor.
The "new" code is currently undergoing an internal review and validation process. Though I doubt that serious issues will arise, it may still be changed.
I guess it's more efficient right now to concentrate on getting the remaining bugs out of the rest of the code and getting the SSE optimization into the other architectures.
BM
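The link between cache hits and run time can be illustrated with a small C sketch. It is only a guess at the general shape: a direct-mapped lookup table in front of an expensive calculation, with counters for hits and misses. The names lut_cache, lut_get, expensive_calc and CACHE_SLOTS are hypothetical, and the "expensive" function is a stand-in, not the calculation the App performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 64  /* hypothetical table size */

typedef struct {
    bool   valid[CACHE_SLOTS];
    int    key[CACHE_SLOTS];
    double value[CACHE_SLOTS];
    size_t hits;    /* lookups answered from the table */
    size_t misses;  /* lookups that had to recompute */
} lut_cache;

/* Stand-in for the costly pre-computation; purely illustrative. */
static double expensive_calc(int key)
{
    double acc = 0.0;
    for (int i = 1; i <= 1000; ++i)
        acc += (double)key / i;
    return acc;
}

/* Return the value for key, recomputing only when the slot holds
 * a different key or is still empty. */
double lut_get(lut_cache *c, int key)
{
    assert(c != NULL && key >= 0);
    size_t slot = (size_t)key % CACHE_SLOTS;
    if (c->valid[slot] && c->key[slot] == key) {
        c->hits++;
        return c->value[slot];
    }
    c->misses++;
    c->valid[slot] = true;
    c->key[slot]   = key;
    c->value[slot] = expensive_calc(key);
    return c->value[slot];
}
```

With a zero-initialised lut_cache, two workloads that make the same number of lut_get calls can differ a lot in cost: every miss pays for expensive_calc, every hit does not. That is the kind of data-dependent variation described above.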
Could you make an overview post of which bugs remain? Is it mostly the input domain issues, and is that with the XLAL code?
I moved this over to the Client Errors of S5R2/S5R3 Apps thread in the Problems & Bug reports board.
BM
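Peanut wrote: "My Mac mini had two results complete and one validated OK already with 4.10. The prior low time was ~51,000 sec and the new app dropped it to ~43,000. Sweet!!!"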
...and that was a "slow" one.
Low times on a 1.83 GHz Core Duo Mac mini can now be well below 40k sec.
Peanut's Mac Pro is also doing well (it's easily found, just look at the top of the "Top Hosts" stats).
Bikeman
I wish I had the cash to buy one of those systems, because I wouldn't buy one. I'd use it to pay down my debt so I can then go to work for $9/hr (the only jobs that will take me) and not have to worry about taking that low pay and then getting a 2nd job to make sure that I can pay all my bills...
:sigh:
I am still amazed by the reduction of CPU secs on my Core Duo Mac minis (1.83 and 1.66 GHz). 4.10 turns a low-power, pretty quick box into a low-power, super fast box. :0
My Mac Pro is definitely a high-priced toy. I am pretty frugal with money and have mostly stayed out of debt. So I had some money to "blow" and help the economy in a small way. If everyone was like me the economy would tank fast from lack of consumer spending.
App 4.10 is good for the Pro for sure, though not as stunning a speed-up as for the Core Duos. I have not seen an invalid result yet.
We get some -185 ("can't start App") errors from 8-core Macs that have a shmem message in stderr_out. The problem is that the number of shared memory segments on MacOS (actually FreeBSD, I believe) is limited to 8. The Client uses shared memory to communicate with the Apps, so if anything other than BOINC that uses shared memory is running on such a system, you get into trouble on one end or the other.
There is a way around this, but it will require a new App (and a recent Core Client).
BM
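For anyone who wants to see how many segments a process can actually get on a given machine, here is a rough C probe using the standard System V calls (shmget, shmat, shmdt, shmctl). The function name count_attachable_segments and the cap of 64 are made up for this sketch. It only reports what one process could obtain at that moment and does not show which kernel limit was hit. It has nothing to do with how the Client itself sets up its segments.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to create and attach up to `wanted` private segments of `bytes` each.
 * Returns how many succeeded. Every segment is detached and removed again
 * before returning, so the probe leaves nothing behind. */
int count_attachable_segments(int wanted, size_t bytes)
{
    assert(wanted >= 0 && wanted <= 64 && bytes > 0);
    int   ids[64];
    void *addrs[64];
    int   n = 0;

    for (; n < wanted; ++n) {
        ids[n] = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
        if (ids[n] == -1)
            break;                           /* could not create another one */
        addrs[n] = shmat(ids[n], NULL, 0);
        if (addrs[n] == (void *)-1) {
            shmctl(ids[n], IPC_RMID, NULL);  /* created but not attachable */
            break;
        }
    }
    for (int i = 0; i < n; ++i) {
        shmdt(addrs[i]);
        shmctl(ids[i], IPC_RMID, NULL);
    }
    return n;
}
```

Calling count_attachable_segments(16, 4096) on an affected Mac while other shared memory users are running would be expected to return fewer than 16, but that is an expectation, not something measured here.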
I have run into the shared memory issue. It caused me to error out 8 tasks once. After that, I did what this site http://www.spy-hill.net/help/apple/SharedMemory.html said and have not had any problems since. I was not aware of the limit of 8, but the default size of the shared memory segment seems too low for many-core machines.
Also, the first post in this thread mentioned a possible improvement to the "stuck" task problem. Since I put 4.10 on my Macs, none of them has had a task get stuck. That is another good point for 4.10.
Thanks for the link. Is it the same for Leopard, or is this a Tiger-only issue?
CU
H-B