Validation is better than we hoped for. So far we got 60 valid results, NO invalid, NO inconclusive.
We are still having problems building (actually linking) a "compatible" Linux App, i.e. one that does run on systems not identical to the one the App was built on. We keep working on it.
Our analysis basically consists of two parts, only one of which does so far run on the GPU, so quite a bit is still done on the CPU. Fortunately in the O1OD1 setup that part is rather small, and the total computation is largely dominated by the GPU part. The GPU app will reach its full potential when the other part of the computation is also done on the GPU. We are working on this, but it takes some more time.
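The point about the remaining CPU part capping overall speedup is just Amdahl's law. A small illustration (the fractions below are made-up placeholders, not measured values from the project):

```python
# Amdahl's law: if a fraction p of the work is accelerated by factor s,
# the overall speedup is 1 / ((1 - p) + p / s).
def overall_speedup(p, s):
    """p: fraction of runtime moved to the GPU, s: speedup of that part."""
    return 1.0 / ((1.0 - p) + p / s)

# Hypothetical numbers: 90% of the computation on the GPU, GPU part 20x faster.
print(round(overall_speedup(0.90, 20.0), 2))  # prints 6.9 - the serial 10% dominates
# Moving the remaining part to the GPU as well (p -> 0.99) approaches the full 20x.
print(round(overall_speedup(0.99, 20.0), 2))  # prints 16.81
```

This is why the app "will reach its full potential" only once the second part also runs on the GPU.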
... used about 16% of the GPU. FGRP usually uses about 95%.
A more interesting statistic would be crunch time as compared to what a CPU can do, and of course whether they will ultimately give a close enough result for validation. It would also be interesting to know if it's possible to run two GPU tasks concurrently and if that makes any improvement.
Your GPU is a GTX 1070 and the crunch time was approximately 5500s for the two returned tasks so far.
On one of my hosts (low end Athlon 200GE) a CPU task takes ~29,700s.
For a more capable CPU, the crunch time would be somewhat lower so a rough ball-park figure is that the GPU speedup is perhaps around 4x to 5x at the moment. That's reasonable for this early stage so hopefully down the track, some optimisations might be found that increase that factor to something closer to what it is for FGRPB1G.
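That ball-park figure follows directly from the times quoted in this thread (a quick sanity check, nothing more):

```python
gpu_time = 5500.0    # GTX 1070, seconds per task (quoted above)
cpu_time = 29700.0   # Athlon 200GE, seconds per task (quoted above)

speedup = cpu_time / gpu_time
print(round(speedup, 1))  # prints 5.4
# A faster CPU lowers cpu_time, pulling the ratio down toward ~4x,
# hence the quoted 4x to 5x ball-park.
```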
No, it actually should not, or only by a little bit. The 200GE is surely a low-end CPU, BUT it uses the same modern, fast CPU cores, cache and memory controller as current high-end CPUs. Only the core/thread count is low (2/4, compared to 6/12 up to 10/20 in current high-end CPUs). But since E@H CPU apps use only 1 thread per WU, this does not affect task times. Sometimes it even improves run times! (I think this is due to less "competition" for RAM access - a low-end CPU usually has more RAM throughput on a per-core basis.)
Only the number of tasks running in parallel differs (and the total throughput, of course).
For comparison, results from high-end but OLD CPUs on current CPU GW tasks:
Phenom II X6 @ 3.3 GHz ~ 49,000 s per WU (up to 6 in parallel)
FX-8350 @ 4 GHz ~ 37,500 s (up to 8)
your Athlon 200GE ~ 29,700 s (up to 4)
On the first batch of the GPU version I see very unstable total run times (maybe even a bug in time accounting - for one WU the total runtime was < CPU time, which is not possible for a single-threaded app. Here it is: https://einsteinathome.org/task/842693540). Run times range from about 11,000 to 23,000 s, while CPU times are more consistent, in the 6,000 - 8,800 s range.
GPU load is in the 15-25% range and the GPU barely heats up (about 15-20 degrees lower compared to FGRP).
And often the GPU load drops from 15-20% to complete zero (GPU clocks also drop to an idle state, like 0.15/0.3 GHz) while the load of the CPU core assigned to the WU jumps to ~90-100%. So I think some parts of the app still run on the CPU only, in sequential order (CPU-only part ==> CPU+GPU part ==> CPU-only part ...).
I have not tried to run multiple WUs on the GPU yet. I will wait for validation results first. Or even for the next app version.
P.S.
Still decent for the very first GPU app iteration for GW.
Ok Bernd, just for you, I resurrected an old Windows 7 machine with 2 1080Tis still on it. Have it set up and crunching. Changed the priority of the work units from below normal to high (would have gone real time, but let's not tempt fate). CPU affinity set to only real threads. Using about 8% of the GPU, 99.33% of 1 core per work unit. I'll let these run for the next couple of days and see what the results are. Will have to wait until they complete to see how fast they crunch (just started them): 14% at 7 min 53 sec. Touch base later.
Hello, I have problems with the LIGO application on GPU. Some tasks are just increasing time while not using the GPU. I need to manually abort them (https://einsteinathome.org/fr/task/838951630). Other tasks get an access violation (https://einsteinathome.org/fr/task/838792785).
Thanks for your help
That was a v0.08 task. Those had problems that were at least partially fixed already. The server is now delivering v0.11. If you have more v0.08 tasks in the queue you could just abort them. I guess it's useless to run them anymore at this point.
A couple of things for your information.
If you use square brackets as part of your name, it confuses the quoting mechanism, causing your full name to be truncated and the part of your name after the closing square bracket to be included in what is being quoted. Check out Richie's reply to see what I'm talking about. To prevent this, I've edited those square brackets for this reply and replaced them with curly ones. You should also be able to use angle brackets or vertical bars if you don't want curly brackets :-).
The GW GPU app is very new and is going through a number of rapid iterations in order to solve teething problems. By accepting these tasks, you are effectively committing yourself to check regularly for reports on problems and for messages from the Devs about changes being made, and to act on that information. It tends to be a distraction to get further reports about things that have already been dealt with by the release of an updated app version.
Also, with test versions, you should expect to see strange, sub-optimal behaviour. It may be unhelpful to abort prematurely, just because you think, "...just increasing time while not using the GPU." It would be better to allow it to continue until it either fails on its own or completes. There is a maximum time limit for the computation so it will end via that mechanism if it really isn't making progress.
For the link you provided, by reading through the stderr output, the task does appear to have been making regular progress right up to when you aborted it. It seems to have been stopped and restarted from a checkpoint once or twice and in fact it looks like it was trying to restart when it was terminated. Another thing I noticed was that, early on, the number of sky points before a checkpoint was about 15 or 16 and then after the first time it was restarted, that number became larger and much more variable.
The restart itself seems to have taken quite a bit of time to get going. The timestamp when the restart was announced was 10:22:00, but by the time the checkpoint had been read and calculations for sky point 306 had restarted it was 10:24:58 - virtually a full 3 minutes to get started again. Perhaps your GPU is pretty slow?
I decided to see if your GPU had been crunching any FGRPB1G tasks. Yes, it had, and it was taking around 11,000 to 12,000 s for those - extremely slow! From those results, I would suggest that your GPU is going to take an awfully long time to complete an O1OD1E GPU task. Until the app is further improved, you'd be best to avoid this new search for that GPU.
I have one task with the same problem as {AF>EDLS}GuL had. The WU was sitting idle while occupying CPU and GPU slots (which were idle too).
Here it is: https://einsteinathome.org/task/843085434
I aborted it after ~5 hours without progress (stuck at a few % done), while the normal time to fully complete a GW task on my hardware is ~2-3 hours.
There are no errors in the logs either - the WU was running as normal, then did nothing - everything stopped, including logging.
Before aborting the WU I checked Task Manager - the corresponding process (einstein_O1OD1E_0.11_windows_x86_64__GW-opencl-ati-V1.exe) was still in RAM. So it did not crash - it was just sitting idle.
I just received some of these units for the first time, and after 7 hours of computing it is only 1% done.
They claim they need (.9 cpu + 1 amd gpu). The problem is that since they are not asking for a whole core, they were not getting enough CPU power to run properly. I went into BOINC and told it to only use 3 cores, thereby leaving one free for these units. Computing has really sped up.
I never had a problem running the gamma-ray pulsar search, as those WUs call for a whole core.
I hate disabling a whole core, as that means only 2 will be crunching if a gamma-ray unit is being worked on.
Anyway, thought I'd share that on an RX 460 they need a whole core to crunch properly.
Yeah, better to assign a whole CPU core to it. Otherwise the stupid BOINC scheduler will screw something up, like running 5 CPU WUs on a 4-core machine, or 4 while allowed to use only 3. In BOINC logic 4.9 ~= 4, 3.9 ~= 3 and so on. It only rounds down.
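A minimal sketch of the rounding behaviour described above (my own reconstruction of the effect, not actual BOINC code):

```python
import math

# One GPU task reserving 0.9 of a core plus four full CPU tasks:
committed = 4 * 1.0 + 0.9          # 4.9 cores demanded
effective = math.floor(committed)  # the scheduler's rounded-down view: 4 cores
print(committed, effective)        # prints 4.9 4
# So on a 4-core machine BOINC happily runs all five tasks at once,
# and the GPU task's support thread fights a CPU task for one core.
```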
I have already done this via app_config (app_config.xml in the project folder - einstein.phys.uwm.edu). Here is an example if you need it:
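The example itself was not included in the post; a sketch of what such an app_config.xml would look like (the `<name>` value is my assumption based on the executable name mentioned in this thread - check client_state.xml for the exact app name on your host):

```xml
<app_config>
  <app>
    <name>einstein_O1OD1E</name>
    <gpu_versions>
      <!-- one task per GPU; set to 0.5 to run two tasks concurrently -->
      <gpu_usage>1.0</gpu_usage>
      <!-- reserve a full CPU core per GPU task instead of the default 0.9 -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```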
waffleironhead wrote:
I just received some of these units for the first time, and after 7 hours of computing it is only 1% done.
They claim they need (.9 cpu + 1 amd gpu).
There are a couple of reports of very slow progress like this. I run Linux so have not seen any of these new GPU tasks. I was unaware of the default resource allocation settings so thanks for adding that information. The 0.9 CPU is very unfortunate, since there are probably quite a few people using all the CPU cores for CPU tasks who decide to give the new GPU app a run. Exactly the same will happen to them and they may not understand the workaround they could use (exactly what you've done) until the situation is rectified. The Devs really need to change the 0.9 to a full core.
waffleironhead wrote:
The problem comes in that since they are not asking for a whole core, they were not getting enough cpu power to run properly. I went in to boinc and told it to only use 3 cores, thereby allowing one free for these units. Computing has really sped up.
Good catch! The problem is that two separate apps (CPU and GPU) were making heavy demands for the one core so something had to give. Your workaround gives the same outcome as using the suggested app_config.xml, without having to generate that file :-). It's also the same outcome as you would get if the default was 1.0 rather than 0.9. I suspect that when the Devs see this report, the default will be changed to 1.0, at which point you can remove your added BOINC cores restriction.
There is another thing you could try which might give a further improvement. If you were to change the GPU utilization factor for these new tasks to 0.5 GPUs and thereby run 2 GPU tasks concurrently, BOINC would reserve 1.8 cores (i.e. effectively only 1) and you could set the number of cores used by BOINC back to 100%. You would be running 3 CPU tasks and 2 GPU tasks (sharing a CPU core), and it would be very interesting to see if that happened to be sufficient so that the 2 GPU tasks didn't slow to a crawl like before. I suspect you might find that the GPU tasks still process fast enough to give you a higher throughput.
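The core arithmetic behind that suggestion, using the numbers above:

```python
import math

cores = 4
reserved = 2 * 0.9               # two GPU tasks at 0.9 CPU each -> 1.8 cores
freed = math.floor(reserved)     # BOINC effectively sets aside only 1 core
cpu_tasks = cores - freed        # 3 CPU tasks keep running
print(freed, cpu_tasks)          # prints 1 3
# Net effect: 3 CPU tasks + 2 GPU tasks sharing one core, versus
# 3 CPU tasks + 1 GPU task before - potentially higher GPU throughput.
```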
I'd love to try this but unfortunately there's no Linux app yet :-).