My 980 Ti cards in Win 10 went from 7850 seconds per task with version 1.15 to 350 seconds per task with version 1.16.
Edit:
Running 2x, tasks take approximately 480 seconds each. I tested with the stock GPU memory frequency of 3304 MHz in P2 and with an overclocked 3800 MHz in P2, but the higher memory frequency made little difference in performance. The difference was a few seconds in runtime.
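As a rough check on the figures above, the sketch below works out the speedup from 1.15 to 1.16 and the effective time per task when running 2x. The function names are illustrative only, and it assumes both concurrent tasks finish in about the same 480 seconds.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new runtime is than the old one."""
    return old_seconds / new_seconds


def effective_seconds_per_task(elapsed_seconds, concurrent_tasks):
    """Wall-clock seconds per finished task when several run side by side."""
    return elapsed_seconds / concurrent_tasks


v115_to_v116 = speedup(7850, 350)                  # one task at a time
per_task_2x = effective_seconds_per_task(480, 2)   # two tasks at ~480 s each
gain_from_2x = speedup(350, per_task_2x)           # 1x vs 2x on version 1.16

print(f"1.15 -> 1.16 speedup: {v115_to_v116:.1f}x")
print(f"Effective time per task at 2x: {per_task_2x:.0f} s")
print(f"Throughput gain from running 2x: {gain_from_2x:.2f}x")
```

On these numbers, running 2x is worth somewhat less than a 1.5x throughput gain over running one task at a time.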
So, no GPU work for all the systems running 32-bit? Like mine. There are probably a lot of people out there still using a 32-bit OS.
There is a problem with this app. My computer lags with this app (I5-4460, Nvidia GTX 750, Win8).
Task Manager shows one full core for the Einstein GRPBS-1-GPU app, roughly as expected.
However, since the computer lagged, I looked at which other task was active:
There was a task named "system" active at about 11%, which on my computer is close to half a core.
This task went to zero % usage at the time one of the GRPBS-1-GPU tasks ended.
As the next GRPBS-1-GPU task started, this "system" task was up again.
Therefore a GRPBS-1-GPU task in practice uses up to 1.5 cores overall, which in the long run is unmanageable.
Any solutions?
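The "close to half a core" estimate above can be reproduced with a little arithmetic. This is only a sketch: it assumes Task Manager reports usage as a percentage of all logical cores and that the I5-4460 exposes 4 of them. The helper name is made up for illustration.

```python
LOGICAL_CORES = 4  # assumption: I5-4460, 4 cores without Hyper-Threading


def cores_used(percent_of_total, logical_cores=LOGICAL_CORES):
    """Convert a whole-machine CPU percentage into a number of cores."""
    return percent_of_total / 100 * logical_cores


app_cores = 1.0                 # the GPU app's own support thread: one full core
system_cores = cores_used(11)   # the "system" task at about 11%
total_cores = app_cores + system_cores

print(f'"system" task: {system_cores:.2f} cores')
print(f"Total per GPU task: {total_cores:.2f} cores")
```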
I honestly doubt that there are enough 32-bit GPU systems attached to E@H to make it worth investing a lot of time in porting the application to a different FFT library. But we'll try to dig through our DB to find out.
When the FGRP GPU app is a bit more established, we could possibly limit BRP4G to the machines that can't run FGRP, so that when we get new Arecibo data, these machines get work again and it will last a bit longer. However, it's not clear whether we will get more Arecibo data at all, and if so, when.
BM
I don't know how the maximum number of jobs allowed to be downloaded per day is computed, but I have a system which is currently waiting until tomorrow UTC to request more work because it has already hit the 640 task download limit for today for that system.
A little arithmetic on the elapsed times for 1.16 work on that system suggests that it is capable of about 600 tasks/day of this flavor at my current settings and tuning.
So while in principle I'm OK, a slightly faster system that earned the same limit would be starved. Unless there is a near-term plan to gather several of the current tasks into a single BOINC task, perhaps a moderate relaxation of the limit could be considered.
[Note on calculation--which is approximate. The system has one GTX 1070 card which is currently running 2X, and giving elapsed times of a bit over 8 minutes. Rounding down to 8, that gives 360 tasks/day for that card. The 6GB GTX 1060 card is averaging a little under 12 minutes. Rounding up to 12 gives 240 tasks/day for that card.]
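The estimate in the note can be written out as below. This is a sketch of the same arithmetic, not an exact model: the function and constant names are illustrative, and it assumes the GTX 1060 is also running 2X, which is what the 240 tasks/day figure implies.

```python
MINUTES_PER_DAY = 24 * 60
DAILY_LIMIT = 640  # download limit quoted in the post


def tasks_per_day(elapsed_minutes, concurrent_tasks):
    """Tasks finished per day when `concurrent_tasks` run side by side."""
    return MINUTES_PER_DAY / elapsed_minutes * concurrent_tasks


gtx_1070 = tasks_per_day(8, 2)    # a bit over 8 minutes, rounded down
gtx_1060 = tasks_per_day(12, 2)   # a little under 12 minutes, rounded up (2X assumed)
total_per_day = gtx_1070 + gtx_1060

print(f"GTX 1070: {gtx_1070:.0f} tasks/day")
print(f"GTX 1060: {gtx_1060:.0f} tasks/day")
print(f"Total: {total_per_day:.0f} of {DAILY_LIMIT} allowed "
      f"({total_per_day / DAILY_LIMIT:.0%} of the limit)")
```

With the system already consuming most of its allowance, a host only slightly faster would run out of work before the daily reset.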
[Update:]
OK - MAC Units, (1.13 FGRP), still crunching away and completing TWO at a time on TWO EVGA GTX-750TI SC cards. Completion time running 2 at a time is under 36 Min.
Win XP Pro x64 now on 1.16 FGRP Units, (all Arecibo BRP4G Units now done), and crunching on EVGA GTX-760. Completion times are 1 Hr 2 Min. to 1 Hr 3 Min., crunching TWO at a time. MUCH improved over the 1.15 Units.
In closing, between my two systems, I'm crunching 6 Total Units at a time on three GPUs. Systems seem stable, GPU Fan noise is minimal. NO Fan Control on MAC; but, Precision X is running on Win XP Pro x64.
Both systems run 15 Hrs per day, from 6 AM to 9 PM - PST; in summer from 6 PM to 9 AM - PDT.
TL
[EDIT:]
MAC should start on the 1.14 Units tonight before 9 PM; or some time tomorrow morning.
TL
[EDIT 2:]
Still getting Invalids on the MAC due to OpenCL Bug. Now up to 4 Invalids. Prior to FGRP work, MAC was on BRP6 work and NEVER had an Invalid, nor an Inconclusive.
I will keep monitoring.
TL
TimeLord04
The current tasks are still the ones sized for testing, in particular to be validated with CPU jobs, so meant to finish in reasonable time with CPU Apps as well. The next type of workunits that we will probably issue on Monday will run quite a bit longer. These should also have a better ratio of CPU vs. GPU utilization.
BM
It seems that the WU duration has increased since yesterday. Is that correct?
archae86 wrote: I don't know ...
Last I checked (though it was long ago, so it may no longer be relevant), the daily quota was dynamic and calculated from the number of CPU cores and GPUs a specific host has (the quota is also lowered with every failed WU, and restored with validated WUs). So for a faster system the daily quota will be higher.
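A toy model of the behaviour described above might look like the following. Everything in it is hypothetical: the class, the per-device base value, and the penalty are placeholders chosen for illustration, not the project's actual scheduler settings or BOINC's real code.

```python
# Hypothetical placeholders, NOT real Einstein@Home or BOINC values.
BASE_QUOTA_PER_DEVICE = 32
PENALTY_PER_FAILURE = 1


class HostQuota:
    """Daily quota that scales with device count and reacts to results."""

    def __init__(self, cpu_cores, gpus):
        # The ceiling grows with the number of CPU cores and GPUs on the host.
        self.ceiling = BASE_QUOTA_PER_DEVICE * (cpu_cores + gpus)
        self.quota = self.ceiling

    def on_failed(self):
        # Every failed WU lowers the quota, but never below one task per day.
        self.quota = max(1, self.quota - PENALTY_PER_FAILURE)

    def on_validated(self):
        # A validated WU restores the quota to its ceiling.
        self.quota = self.ceiling


host = HostQuota(cpu_cores=4, gpus=2)
host.on_failed()
host.on_failed()
quota_after_failures = host.quota
host.on_validated()
print(quota_after_failures, host.quota)
```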
Trotador wrote: It seems that ...
Not systematically, i.e. not in the "size" of the workunits, which is under our control.
However, the duration of the last part of the computation is data-dependent. If your GPU isn't capable of double precision computation, this part is done on the CPU and will have a noticeable contribution to the overall runtime.
BM