Oh yes, the computers' thermal output certainly helps keep the furnace from cycling so often. The only real issue is with the 970 card machine. It is evidently so heavily loaded with Einstein work that it prevents any new MilkyWay tasks from downloading. I am almost out. I couldn't figure out why the onboard task count was dropping from the default 160 tasks for two cards. The other 1070 machines are keeping their full load of MilkyWay tasks up. I finally found a log entry that said the reason for not downloading any new work was something along the lines of "not getting work - not highest GPU priority project". My resource allocation is 80% SETI, 10% MilkyWay and 10% Einstein. Evidently the combination of SETI and Einstein GPU work has maxed out the GPU workload on the 970-based machine. I hope that after I work off some of those 600+ Gamma Ray tasks the machine will begin to pick up MilkyWay work again. The downside will be the same situation I ran into with the new Einstein work: the machine will go into exclusive MilkyWay running to equalize the project time deficit. I sure wish BOINC was a "set and forget" application. I am getting tired of having to babysit it all the time to get it to do what I want.
1.16 on Widoz is a great improvement over 1.15, but much slower than the Linux machines I've been paired with.
I've been doing distributed computing since '99, and this is the first instance I've seen where the OS has had a significant effect on throughput. My GTX660 was processing about 80K per day on BRP6 beta and 60K after the switch over to BRP4; the way it is now, I doubt it will produce much more than 25K. I don't understand.
That is not much of a surprise. AMD's GPUs are weighted much more towards shader hardware, which is good for compute jobs, than towards fixed-function graphics hardware. The reasons that Nvidia still gets a high market share for GPUs in compute are that legacy CUDA programs that have not been converted to OpenCL still exist, that Nvidia guarantees tighter accuracy tolerances than AMD (which only guarantees the looser OpenCL accuracy tolerances), and that AMD's drivers have for a long time been crashing messes. (It seems that AMD is now making a much needed effort to clean up its drivers, so this hopefully has changed or will change.) The last consumer Nvidia GPUs that have significant double precision hardware support are the original GTX Titan and the GTX Titan Black Edition, which allow the user to enable full double precision support at the cost of limiting the GPU's clock speed due to the additional power required to support all of that double precision hardware.
Now this was interesting to find out. I've been running 'Multi-Directed Continuous Gravitational Wave search CV 1.00 (AVX)' tasks parallel with these FGRPB1G 1.16 GPU tasks in my Windows host with AMD card (host 12467404).
Running even a few of these CV 1.00 (AVX) tasks will cause a significant slow-down on these GPU tasks in this machine! I've used Options > Computing preferences... "use at most xx % of the CPUs" to limit the total number of tasks running.
Running 1.16 (FGRPopencl-Beta-ati) 2x +
4 CV's = 715 s
3 CV's = 675 s
1 CV = 648 s
0 CV's = 720 s ... wtf ??
edit: Now I see. When nothing but 2 GPU tasks is running, with the current BIOS settings this system thinks there's no hurry... the load is so low that the system starts down-clocking the CPU to 2.3GHz and the CPU speed keeps jumping wildly. I can see that in CPU-Z.
When I resume 1 CPU task and the total load is then 3/12 threads, this system stops relaxing the CPU clock speed and it stays constantly at the high limit. So... with the current BIOS settings this system needs at least 1 CPU task alongside those 2 GPU tasks to operate at full speed!
I wonder if the trend in those completion times would be similar if the CPU tasks were something other than 'Multi-Directed Continuous Gravitational Wave search CV'. I'm going to test with 'Gamma-ray pulsar binary search #1 1.05 (FGRPSSE)' tasks:
Running 1.16 (FGRPopencl-Beta-ati) 2x +
1 FGRPSSE = 646 s ... (and system load was sufficient; CPU clock stayed constantly at the high limit)
2 FGRPSSE's = 650 s
3 FGRPSSE's = 652 s
4 FGRPSSE's = 665 s
5 FGRPSSE's = 678 s
6 FGRPSSE's = 711 s
Those times are not absolutely accurate, but that trend was clearly there.
My conclusions:
Those two CPU applications can put notably different amounts of pressure on a system running these GPU tasks. Running 'CV 1.00 (AVX)' CPU tasks on this machine slows down these FGRPB1G 1.16 tasks quite steeply with every additional CPU task. Running a few FGRPSSE tasks results in a much gentler rise in completion times.
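To put rough numbers on that trend, here is a minimal Python sketch that turns the completion times posted above into a percentage slowdown relative to the 1-CPU-task run. Taking the 1-task run as the baseline is my own assumption (the 0-CV run is left out because the CPU was down-clocking), and the names `cv_times`, `fgrpsse_times` and `slowdown_pct` are made up for illustration. The timings themselves were described as not absolutely accurate, so treat the output as indicative only.

```python
# Completion times in seconds for 2x FGRPB1G 1.16 GPU tasks,
# keyed by the number of CPU tasks running alongside (values from the post).
cv_times = {1: 648, 3: 675, 4: 715}
fgrpsse_times = {1: 646, 2: 650, 3: 652, 4: 665, 5: 678, 6: 711}


def slowdown_pct(times):
    """Percent increase in GPU task time relative to the 1-CPU-task run."""
    base = times[1]  # assumed baseline: CPU clock held at the high limit
    return {n: round(100.0 * (t - base) / base, 1) for n, t in times.items()}


cv_slowdown = slowdown_pct(cv_times)
fgrpsse_slowdown = slowdown_pct(fgrpsse_times)

for n in sorted(cv_slowdown):
    print(f"{n} CV task(s): +{cv_slowdown[n]}%")
for n in sorted(fgrpsse_slowdown):
    print(f"{n} FGRPSSE task(s): +{fgrpsse_slowdown[n]}%")
```

With these figures, 4 CV tasks cost roughly as much GPU time as 6 FGRPSSE tasks.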
First, you can disable the underclocking due to lack of work by going into the power options control panel in Windows and selecting the high performance power profile. Second, SIMD instructions like AVX, while very power-efficient for the amount of work they do when programmed well, are power hogs because they do lots of work per instruction. When loaded with these instructions, your CPU will hit its power or thermal limit and have to slow down to stay within it.
I run 3 packages at the same time on my Nvidia 1080, averaging 8 and a half minutes. Now I'm trying 4 packages and getting around 11 min 10 sec. Really happy with the performance.
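For comparing those two settings, a small Python sketch of the arithmetic may help. It assumes the quoted times are wall-clock times for each task while 3 or 4 run concurrently, which the post does not state explicitly; `seconds_per_task` is a made-up helper name.

```python
def seconds_per_task(concurrent, minutes, seconds=0):
    """Effective wall-clock seconds per finished task when `concurrent`
    tasks run side by side and each takes minutes:seconds to complete."""
    return (minutes * 60 + seconds) / concurrent


three_up = seconds_per_task(3, 8, 30)   # 3 tasks at 8 min 30 s each
four_up = seconds_per_task(4, 11, 10)   # 4 tasks at 11 min 10 s each

print(f"3 at a time: {three_up:.1f} s per task")
print(f"4 at a time: {four_up:.1f} s per task")
```

Under that assumption the two settings come out within a few seconds of each other per task, so the gain from going to 4x is small.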
Zalster wrote: I have over 900 ...
I was downloading more tasks just a minute ago and saw a new kind of limiting message I've never seen before:
"Not requesting tasks: too many runnable tasks"
That host had 1009 tasks "in progress" at that point. It looks like that message takes precedence over the daily quota.
Here's a fresh result for R9 270X :
AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 2 tasks : 12min (~710 sec)
(AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 1 task : 9min (~550 sec) )
Nice benefit from running 2x with this AMD card too!
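How big that benefit is can be worked out from the two approximate times above. This is only a sketch in Python; `tasks_per_hour` is a hypothetical helper, and it assumes both tasks in the 2x run finish in about 710 s each.

```python
def tasks_per_hour(concurrent, wall_seconds):
    """Tasks completed per hour when `concurrent` tasks each take wall_seconds."""
    return concurrent * 3600 / wall_seconds


one_up = tasks_per_hour(1, 550)   # R9 270X, 1 task at ~550 s
two_up = tasks_per_hour(2, 710)   # R9 270X, 2 tasks at ~710 s each
gain = two_up / one_up            # relative throughput of 2x versus 1x

print(f"1x: {one_up:.2f} tasks/h, 2x: {two_up:.2f} tasks/h, gain: {gain:.2f}x")
```

On those numbers, running 2x gives a bit over one and a half times the throughput of running 1x.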
Jesse, thank you for that information. High performance power profile sounds like a nice alternative (quicker than fiddling with BIOS).