Only got thru 5 of these (V0.05) before I had to shut down for the week. They are averaging 12,400 seconds compared to a V0.04 time of 24,400 seconds. These are on an i7 6900K @ 4GHz.
I had wondered if these super fast times were too good to be true. When 0.04 morphed into 0.05 (which was just as good, if not slightly better) I again wondered about this. Was this just a further tweak, or was it perhaps an indicator of real problems with 0.04, because it shouldn't have been that fast?
0.05 has now been removed from the list of applications and there is no test app to replace it. We are back to 0.03 so I guess that means there shouldn't have been that sort of speedup after all :-(. However, it was good while it lasted :-).
EDIT: Just noticed 0.06!! It's not marked as beta. I wonder how this one will go?
Cheers,
Gary.
I've got a 0.06 running now. It's early days but it looks like the speed is much the same as 0.04 and 0.05. This is based on the current rate of progress remaining constant. It's a lot faster than 0.03 was up to this point (7% complete).
A 0.03 took around 30,000s on this host. This 0.06 is projected to take 16,200s.
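A projection like the one above is just elapsed time divided by the fraction complete, assuming the rate of progress stays constant. A minimal Python sketch follows; the function names are hypothetical, and only the 30,000 s and 16,200 s figures come from the post.

```python
def project_total_runtime(elapsed_s: float, fraction_done: float) -> float:
    """Estimate total run time, assuming progress stays linear in time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def speedup(old_s: float, new_s: float) -> float:
    """How many times faster the new run is than the old one."""
    return old_s / new_s


# Figures quoted in the post: a 0.03 task took about 30,000 s on this host,
# and the running 0.06 task is projected to take 16,200 s.
ratio = speedup(30_000, 16_200)
print(f"0.06 is projected to be about {ratio:.2f}x faster than 0.03")
```

On those figures the 0.06 task would finish about 1.85 times faster than the 0.03 one, provided the linear assumption holds.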
Cheers,
Gary.
The 0.06 app turned up last night, just as I was preparing to wind things up for the day. It was listed on the apps page without the beta tag. This morning it's now listed as beta so I've adjusted the thread title to reflect that.
I'm running these CPU tasks on a couple of Athlon 200GE based systems - Ryzen processors but lower core count and clock speed compared to the more expensive members of the series. I have times for apps from 0.03 to 0.06. Initially, I wasn't allowing test apps so there aren't any 0.04 results to give.
There is a short note from Bernd in Technical News about what's going on so it looks like this very welcome speed increase might be here to stay! Thanks Bernd!!
Here are the approx crunch times so far for the above processor model. I should add that the internal Vega graphics capability was being used to crunch FGRPB1G tasks simultaneously. The task mix was 2xO1OD1E and 1xFGRPB1G. The GW times cover some quite different spin frequencies and I don't know if that has any effect on the speed of crunching.
I'll update the above table once there are a few more 0.06 tasks to give a better indication. At this point I wouldn't claim that 0.06 is necessarily worse than 0.05, although I think it could be - slightly. There is a bit of variability in the times so a bigger sample size is needed. I was quite interested to see that the calculations appear to be relatively linear since last night's prediction at 7% done turned out to be quite accurate.
One of the 0.06 tasks has already validated against a 0.03 so that looks hopeful for what Bernd is trying to achieve.
Cheers,
Gary.
Xeon X56.. @ 4 GHz, Windows 7
Avg run time (5 concurrent):
v0.03 : 31951 s
v0.05 : 18923 s
v0.06 (reported 11 Apr 19:35 UTC) : 19352 s
v0.06 (reported 12 Apr 11:15 UTC) : 18227 s
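To put those averages on a common scale, the run-time reduction relative to v0.03 can be worked out directly. This is a minimal Python sketch using only the four averages listed above; the helper name and dictionary labels are hypothetical.

```python
# Average run times in seconds, copied from the list above (5 concurrent tasks).
run_times_s = {
    "v0.03": 31951,
    "v0.05": 18923,
    "v0.06 (11 Apr)": 19352,
    "v0.06 (12 Apr)": 18227,
}


def time_saved_percent(baseline_s: float, new_s: float) -> float:
    """Percentage reduction in run time relative to the baseline."""
    return 100.0 * (baseline_s - new_s) / baseline_s


baseline = run_times_s["v0.03"]
for version, seconds in run_times_s.items():
    if version == "v0.03":
        continue  # the baseline itself has nothing to compare against
    saved = time_saved_percent(baseline, seconds)
    print(f"{version}: {saved:.1f}% less run time than v0.03")
```

On these averages, each of the newer versions cuts run time by roughly 40% against v0.03 on this host.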
In the app versions 0.04-0.06, part of the computation is sped up by using a "lookup table" for a computation that is usually slow (sinc()). This is less precise, but much faster. The difference between these app versions is the size of the table; we're looking for the size that gives enough precision for reliable validation. Speed between these app versions shouldn't vary at all, as there's no difference in the code.
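To illustrate the trade-off described above, here is a minimal Python sketch of a sinc() lookup table. It is not the project's code: the normalized sinc convention, the table range `x_max`, the nearest-entry lookup and all function names are assumptions made for the example.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x)/(pi*x). The convention is an assumption."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def build_table(size: int, x_max: float = 8.0) -> list[float]:
    """Precompute sinc at `size` evenly spaced points on [0, x_max]."""
    step = x_max / (size - 1)
    return [sinc(i * step) for i in range(size)]


def sinc_lookup(x: float, table: list[float], x_max: float = 8.0) -> float:
    """Approximate sinc(x) with the nearest table entry (sinc is even)."""
    x = abs(x)
    if x > x_max:
        return sinc(x)  # outside the table: fall back to the exact value
    step = x_max / (len(table) - 1)
    index = min(round(x / step), len(table) - 1)
    return table[index]


def max_error(size: int, samples: int = 10_000, x_max: float = 8.0) -> float:
    """Largest absolute lookup error over evenly spaced sample points."""
    table = build_table(size, x_max)
    worst = 0.0
    for k in range(samples + 1):
        x = k * x_max / samples
        worst = max(worst, abs(sinc_lookup(x, table, x_max) - sinc(x)))
    return worst


# A bigger table costs more memory but loses less precision.
for size in (64, 512, 4096):
    print(f"table size {size:5d}: max error {max_error(size):.5f}")
```

With these assumptions, `max_error(64)` is noticeably larger than `max_error(4096)`, which is the kind of relationship between precision and table size that the different app versions are probing.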
A doubling of speed is indeed pretty impressive; what I got from averaging over all hosts that ran both is only 20-25%. There may be outliers. We issued these app versions as "Beta Test" only, because validation is probably poor.
BM
I want to upgrade, on this project, my GTX1060. I know AMD GPUs produce a lot more work on BRP tasks than Nvidia GPUs due to their double precision capabilities. Is that also true on the gravity wave apps? I'm thinking of something like an RX580. Any comments would be appreciated.
Thanx in advance.
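Betreger wrote: I want to upgrade, on this project, my GTX1060. I know AMD GPUs produce a lot more work on BRP tasks than Nvidia GPUs due to their double precision capabilities. Is that also true on the gravity wave apps?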
I don't think DP performance was important; GPU memory throughput was. Of course I don't know about the upcoming applications. The GTX 1060 doesn't seem to require short term replacement and the new applications are still in development, so why not just wait a bit? Maybe you'll get to see a direct comparison when the GPU tasks no longer need to be verified by CPUs. And I have prepared a host with both a GTX 750 and a HD 7750 that I plan to start once Linux apps are available. The hardware is a bit old but I hope we can still learn something from it.
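floyd wrote: And I have prepared a host with both a GTX 750 and a HD 7750 that I plan to start when linux apps are there. The hardware is a bit old but I hope we still can learn something from it.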
They are both still quite nice for low power. I suspect the HD 7750 will do better, simply because it is better at OpenCL.
It would of course be nice if they come out with a CUDA version. I would wait and see too.
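Betreger wrote: I know AMD GPUs produce a lot more work on BRP tasks than Nvidia GPUs due to their double precision capabilities ...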
Binary Radio Pulsar searches (BRP) finished for discrete GPUs like yours a long time ago - I think you are probably referring to the current gamma ray pulsar searches based on Fermi satellite data (FGRP). Double precision is only used for the last 10% (the follow up stage calculations) and as long as a GPU has some DP capability, the crunch time won't be significantly altered.
I agree with the advice given by the previous posters. I really do think your current GPU is fine and it would be wise to wait and see how the new search performs when the app is fully developed and mature. That could take a while and I wouldn't be surprised to see it be quite different to how the FGRPB1G app behaves. Best to make your GPU choice when you see the final performance.
From previous posts you have made, I get the impression that RAC is important to you. From the tiny bit of information available so far, switching to the GW GPU app is likely to lead to a significant reduction in RAC, even if there are no 'teething' problems with the development of the app. If the importance of the science (in your own opinion) is the paramount consideration, then by all means switch to the new search but at least wait until things settle down - unless you really do enjoy a bumpy ride :-).
My personal opinion is that whilst it's nice to be involved in finding previously unknown gamma ray pulsars, the holy grail will be the ultimate detection of continuous GW emissions. The huge spike (relatively speaking) from a BH-BH or NS-NS merger has become a bit 'mundane' these days (only joking :-) ) and is not something that E@H would ever be involved in detecting first. E@H is well positioned for the much harder detection of continuous emissions. They must be there - they just need to be found :-).
My current intention is to just wait for a Linux app and then to run just a host or two during the testing phase. Once it seems 'safe' (i.e. a stable app, out of test status and not being changed at regular intervals) I would like to transition the whole fleet to a GW search (hopefully GPU based if that provides the most efficient use of hardware), whatever the RAC situation happens to be. It's nice to see a high RAC but it's insignificant in comparison to the prospect of finding continuous GW.
Cheers,
Gary.