We have finally begun to automatically deliver CUDA work & applications (plan class "ABP1cuda23") to machines that satisfy the following requirements:
- NVIDIA GPU work enabled in the Einstein@Home preferences
- NVIDIA GPU with at least 450 MB of free memory
- Display driver version 190.38 or later, i.e. CUDA 2.3 capability
- BOINC core client version 6.10 or later
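As a rough illustration, the four requirements above can be expressed as one eligibility check. This is a hypothetical Python sketch, not the project's scheduler code: the `HostInfo` record, its field names and the `eligible_for_abp1cuda23` helper are invented here, and versions are assumed to arrive as dotted strings such as "190.38" or "6.10.18".

```python
from dataclasses import dataclass

# Minimums taken from the announcement above.
MIN_FREE_MEM_MB = 450
MIN_DRIVER = (190, 38)
MIN_CLIENT = (6, 10)


@dataclass
class HostInfo:
    """Hypothetical host description; field names are illustrative only."""
    gpu_work_enabled: bool   # "NVIDIA GPU work" ticked in project preferences
    gpu_free_mem_mb: int     # free (not total) GPU memory
    driver_version: str      # e.g. "190.38"
    client_version: str      # e.g. "6.10.18"


def _version_tuple(version: str) -> tuple[int, ...]:
    # "6.10.18" -> (6, 10, 18); tuples compare numerically, field by field
    return tuple(int(part) for part in version.split("."))


def eligible_for_abp1cuda23(host: HostInfo) -> bool:
    """Return True if the host meets every listed requirement."""
    return (
        host.gpu_work_enabled
        and host.gpu_free_mem_mb >= MIN_FREE_MEM_MB
        and _version_tuple(host.driver_version) >= MIN_DRIVER
        and _version_tuple(host.client_version) >= MIN_CLIENT
    )
```

Comparing versions as integer tuples avoids the plain string-comparison trap, where "6.9" would sort after "6.10". Note that the client log posted later in this thread appears to print driver 191.07 as `19107`, so a real check would first have to normalise that format.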
Do you have any idea what kinds of "Linows" or "Windux" platforms can run the Einstein CUDA 2.3 app?
Now got my first (and probably last) ABP1 3.13 "CUDA" WU finished in over 10 hours on a Q9650/GeForce GTX260, where CPU time was 8.3 hours and GPU time around 2 hours. This means more than 8 hours of wasted GPU time! Does that make any sense?
My last two ABP1 3.12 CPU-only WUs took less than 5 hours on a Q9650.
Are the new WUs more complex or longer than the old ones or is this just another bad joke?
Quote:
Now got my first (and probably last) ABP1 3.13 "CUDA" WU finished in over 10 hours on a Q9650/GeForce GTX260, where CPU time was 8.3 hours and GPU time around 2 hours. This means more than 8 hours of wasted GPU time! Does that make any sense?
You don't have any CUDA tasks on your Q9650. The list of tasks for that machine shows 3 completed ABP1 tasks, all of which took around 17K secs and none of which used a GPU for crunching. I decided to look at your other machines and found the GPU-crunched ABP1 task on your Pentium D. There are no other ABP1 tasks still showing on that machine, so no comparison is possible. Here is the list of tasks for your Pentium D with the GPU-crunched ABP1 task at the top. It's interesting to note that it took much the same time to crunch the ABP1 task (250 credits) as the two previous GW tasks (136 credits): nearly double the credits for a tiny bit more crunch time.
What is even more interesting is the apparent dramatic slowdown after Nov 20. The three earlier GW tasks took around 11K secs each, while the two after that date took 27K and 29K secs respectively. There is some variability in GW crunch times, but there is usually a variation in credits to compensate, at least partially. Since all these GW tasks were awarded the same credit, it's unusual to see such a huge variation in crunch time. Can you think of anything that might have happened to your machine after Nov 20? Something drastic like halving the CPU frequency might do it :-).
Quote:
My last two ABP1 3.12 CPU only WUs took less than 5 hours on a Q9650.
It's not really fair to compare a Pentium D to a Q9650 :-).
Quote:
Are the new WUs more complex or longer than the old ones or is this just another bad joke?
There aren't any new tasks, just the same old tasks being crunched with a new program which is (performance-wise) much the same as the beta test app it replaces.
It might help if you realise that just because one project can make hugely efficient use of a GPU's parallelism, other projects may struggle to do anything like the same even after considerable effort has been expended. You might take that into account when firing off your criticism.
It's normal that the GPU temperature will not rise drastically when running this version of the CUDA app; the same was seen during the beta tests. This is because the app still makes heavy use of the CPU for certain parts of the computations.
This doesn't rule out a speedup of the app from using the GPU:
Let's assume (it's just a simple example) that an app does computations consisting of two parts, A and B, where A has to be executed before B can start. Let's assume that on a CPU, A and B each take 50% of the runtime.
Now assume that only part B can easily be ported to GPU code, resulting in a (say) 25-fold speedup for this part B. A still has to be done on the CPU.
The result: if the total runtime was 1000 s before, it is now 500 s + 500/25 s = 520 s, almost doubling the performance. Only 20 of these 520 seconds will be spent on the GPU, i.e. below 4%. So even small load factors on the GPU can result in reasonable speedups.
Yes, it would be nicer if both parts A and B in our example could be done on the GPU for a total speedup of (say) 25-fold, but that might not be so easy. Some types of computation lend themselves more easily to parallelization than others.
CU
Bikeman
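Bikeman's A/B example is an instance of Amdahl's law: the part left on the CPU caps the overall gain. The short Python sketch below just redoes his arithmetic; the function name `two_phase_runtime` is made up here, and the 50% split and 25-fold GPU speedup are his illustrative numbers, not measurements of ABP1cuda23.

```python
def two_phase_runtime(cpu_runtime_s: float, ported_fraction: float,
                      gpu_speedup: float) -> tuple[float, float]:
    """Return (new total runtime, time spent in the GPU part).

    Part A (1 - ported_fraction of the original runtime) stays on the CPU;
    part B (ported_fraction) runs gpu_speedup times faster on the GPU.
    A must finish before B starts, so the two times simply add up.
    """
    part_a = cpu_runtime_s * (1.0 - ported_fraction)
    part_b = cpu_runtime_s * ported_fraction / gpu_speedup
    return part_a + part_b, part_b


total, gpu_time = two_phase_runtime(1000.0, 0.5, 25.0)
print(f"new runtime: {total:.0f} s")                        # 520 s
print(f"overall speedup: {1000.0 / total:.2f}x")            # 1.92x
print(f"share of runtime on GPU: {gpu_time / total:.1%}")   # 3.8%
# Upper bound with part A left on the CPU: 1 / (1 - 0.5) = 2x,
# however fast the GPU makes part B.
```

This is also why a low GPU temperature is not by itself evidence that the GPU app is useless: the GPU is busy for only a few percent of the wall-clock time even when the task finishes almost twice as fast.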
Are these the actual numbers for ABP1cuda23?
If not, could you post the actual speedup on your machine between the GPU and CPU version?
I find it very bad that it needs 100% of a core and that you set CUDA to enabled by default.
I understand, and we tried not to enable the CUDA app by default. Unfortunately that would have involved a change in the BOINC core client code, which is not under our direct control. Please note that this is one of the reasons why we set quite a few minimum requirements: this way the number of volunteers who receive CUDA work is kept as small as possible.
WRT the efficiency of the current implementation: we are working on a number of improvements. The CPU part of the radio pulsar search received quite a few changes that will benefit not only the CPU-only application but also the CUDA version, thereby shifting the computational ratio towards the GPU. These changes will be released as a new application called "ABP2", probably in the next 1-2 weeks. In parallel, we are working hard to move the remaining CPU part of the CUDA version more or less completely to the GPU.
Please note that even today the CUDA app wouldn't actually require a full CPU. However, as soon as you tell the client you use less than 100%, it doesn't renice the process (reduce its priority) anymore. From our point of view it's better to have the process claiming one CPU at the lowest priority than using, say, 60% at normal priority.
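For readers unfamiliar with the term, "renicing" means raising a process's nice value so the OS scheduler only gives it CPU time that nothing else wants. A minimal Python sketch of what that looks like on Linux/Unix, using the standard `os.getpriority` and `os.setpriority` calls (not available on Windows); the helper name `renice_self_to_lowest` is hypothetical, and this is not how the BOINC client itself is implemented.

```python
import os

# On Linux, nice values run from -20 (highest priority) to 19 (lowest).
LOWEST_PRIORITY_NICE = 19


def renice_self_to_lowest() -> int:
    """Lower this process's scheduling priority as far as possible.

    Unprivileged processes may always *raise* their nice value, so this
    needs no special permissions. Returns the resulting nice value.
    """
    current = os.getpriority(os.PRIO_PROCESS, 0)  # 0 = the calling process
    if current < LOWEST_PRIORITY_NICE:
        os.setpriority(os.PRIO_PROCESS, 0, LOWEST_PRIORITY_NICE)
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print("nice value now:", renice_self_to_lowest())
```

A process niced to 19 can still use a whole core when the machine is otherwise idle, but it yields to any normal-priority work; that is the trade-off described above between claiming one full CPU at the lowest priority and, say, 60% at normal priority.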
Quote:
Are these the actual numbers for ABP1cuda23?
If not, could you post the actual speedup on your machine between the GPU and CPU version?
During our final tests last week we observed up to 100% speedup for the Windows version and about 20% for the Linux version. The reason for this rather large difference is that the CPU parts (as in the CPU-only version) are still faster on Linux than on Windows. As always, your mileage may vary depending on the GPU and CPU (Intel vs. AMD) used.
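Percentage speedups are easy to misread. Assuming "X% speedup" means the GPU version gets X% more work done per unit time (so 100% means twice as fast), the conversion to run times can be sketched in Python as follows; the helper names are hypothetical, and the 1000 s baseline is only an example, not a measured ABP1 run time.

```python
def runtime_after_speedup(runtime_s: float, speedup_percent: float) -> float:
    """Run time after an X% speedup, where X% means X% more throughput."""
    return runtime_s / (1.0 + speedup_percent / 100.0)


def speedup_percent(old_runtime_s: float, new_runtime_s: float) -> float:
    """Inverse: the percentage speedup implied by two run times."""
    return (old_runtime_s / new_runtime_s - 1.0) * 100.0


# A task that took 1000 s with the CPU-only app:
print(runtime_after_speedup(1000.0, 100.0))            # 500.0 (half the time)
print(round(runtime_after_speedup(1000.0, 20.0), 1))   # 833.3
print(round(speedup_percent(1000.0, 520.0), 1))        # 92.3 (Bikeman's 520 s example)
```

Under this reading, a 20% speedup shortens the run time by only about 17%, while a 100% speedup halves it.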
Not sure what you mean by that, but their standard app is working fine on this Mandriva 2010.0 Linux system.
The GPU temps suggest that there is not much GPU utilisation, but then again this is their first attempt.
Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
You don't have any CUDA tasks on your Q9650. The list of tasks for that machine shows 3 completed ABP1 tasks, all of which took around 17K secs and none of which used a GPU for crunching. I decided to look at your other machines and found the GPU-crunched ABP1 task on your Pentium D. There are no other ABP1 tasks still showing on that machine, so no comparison is possible. Here is the list of tasks for your Pentium D with the GPU-crunched ABP1 task at the top. It's interesting to note that it took much the same time to crunch the ABP1 task (250 credits) as the two previous GW tasks (136 credits): nearly double the credits for a tiny bit more crunch time.
What is even more interesting is the apparent dramatic slowdown after Nov 20. The three earlier GW tasks took around 11K secs each, while the two after that date took 27K and 29K secs respectively. There is some variability in GW crunch times, but there is usually a variation in credits to compensate, at least partially. Since all these GW tasks were awarded the same credit, it's unusual to see such a huge variation in crunch time. Can you think of anything that might have happened to your machine after Nov 20? Something drastic like halving the CPU frequency might do it :-).
It's not really fair to compare a Pentium D to a Q9650 :-).
There aren't any new tasks, just the same old tasks being crunched with a new program which is (performance-wise) much the same as the beta test app it replaces.
It might help if you realise that just because one project can make hugely efficient use of a GPU's parallelism, other projects may struggle to do anything like the same even after considerable effort has been expended. You might take that into account when firing off your criticism.
Cheers,
Gary.
Hello
I was happy to see the CUDA-optimised app in my BOINC...
(config: Q9950 + 8800 GT)
But there may be a problem.
My GPU idle temperature is 48-49 deg...
And during crunching the GPU temp is the same: 48-49 deg.
So my GPU doesn't seem to be used.
Best Regards.
It's normal that the GPU temperature will not rise drastically when running this version of the CUDA app; the same was seen during the beta tests. This is because the app still makes heavy use of the CPU for certain parts of the computations.
This doesn't rule out a speedup of the app from using the GPU:
Let's assume (it's just a simple example) that an app does computations consisting of two parts, A and B, where A has to be executed before B can start. Let's assume that on a CPU, A and B each take 50% of the runtime.
Now assume that only part B can easily be ported to GPU code, resulting in a (say) 25-fold speedup for this part B. A still has to be done on the CPU.
The result: if the total runtime was 1000 s before, it is now 500 s + 500/25 s = 520 s, almost doubling the performance. Only 20 of these 520 seconds will be spent on the GPU, i.e. below 4%. So even small load factors on the GPU can result in reasonable speedups.
Yes, it would be nicer if both parts A and B in our example could be done on the GPU for a total speedup of (say) 25-fold, but that might not be so easy. Some types of computation lend themselves more easily to parallelization than others.
CU
Bikeman
Hi all,
Question for you: if these are CUDA WUs, why is my CPU crunching them while my GPU just sits idle?
11/26/2009 11:18:21 PM Starting BOINC client version 6.10.18 for windows_intelx86
11/26/2009 11:18:21 PM log flags: file_xfer, sched_ops, task
11/26/2009 11:18:21 PM Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
11/26/2009 11:18:21 PM Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
11/26/2009 11:18:21 PM Running under account User
11/26/2009 11:18:21 PM Processor: 2 GenuineIntel Intel(R) Pentium(R) 4 CPU 3.20GHz [x86 Family 15 Model 4 Stepping 1]
11/26/2009 11:18:21 PM Processor: 1.00 MB cache
11/26/2009 11:18:21 PM Processor features: fpu tsc sse sse2 mmx
11/26/2009 11:18:21 PM OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
11/26/2009 11:18:21 PM Memory: 1.50 GB physical, 2.85 GB virtual
11/26/2009 11:18:21 PM Disk: 179.31 GB total, 95.15 GB free
11/26/2009 11:18:21 PM Local time is UTC -8 hours
11/26/2009 11:18:21 PM NVIDIA GPU 0: GeForce GTX 260 (driver version 19107, CUDA version 2030, compute capability 1.3, 896MB, 675 GFLOPS peak)
11/26/2009 11:18:21 PM Not using a proxy
11/26/2009 11:18:21 PM Einstein@Home URL http://einstein.phys.uwm.edu/; Computer ID 2063582; resource share 100
11/26/2009 11:18:21 PM SETI@home URL http://setiathome.berkeley.edu/; Computer ID 2378854; resource share 100
11/26/2009 11:18:21 PM SETI@home General prefs: from SETI@home (last modified 04-Mar-2009 23:03:00)
11/26/2009 11:18:21 PM SETI@home Computer location: home
11/26/2009 11:18:21 PM SETI@home General prefs: no separate prefs for home; using your defaults
11/26/2009 11:18:21 PM Preferences limit memory usage when active to 767.36MB
11/26/2009 11:18:21 PM Preferences limit memory usage when idle to 1381.25MB
11/26/2009 11:18:21 PM Preferences limit disk usage to 89.65GB
11/26/2009 11:18:22 PM Einstein@Home Restarting task p2030_54471_60586_0034_G46.39-00.47.S_1.dm_499_1 using einsteinbinary_ABP1 version 313
11/26/2009 11:18:22 PM Einstein@Home Restarting task h1_1085.60_S5R4__1050_S5R6a_1 using einstein_S5R6 version 301
Thanks!
BDDave
I understand, and we tried not to enable the CUDA app by default. Unfortunately that would have involved a change in the BOINC core client code, which is not under our direct control. Please note that this is one of the reasons why we set quite a few minimum requirements: this way the number of volunteers who receive CUDA work is kept as small as possible.
WRT the efficiency of the current implementation: we are working on a number of improvements. The CPU part of the radio pulsar search received quite a few changes that will benefit not only the CPU-only application but also the CUDA version, thereby shifting the computational ratio towards the GPU. These changes will be released as a new application called "ABP2", probably in the next 1-2 weeks. In parallel, we are working hard to move the remaining CPU part of the CUDA version more or less completely to the GPU.
Please note that even today the CUDA app wouldn't actually require a full CPU. However, as soon as you tell the client you use less than 100%, it doesn't renice the process (reduce its priority) anymore. From our point of view it's better to have the process claiming one CPU at the lowest priority than using, say, 60% at normal priority.
Hope this gives a small insight...
Oliver
Einstein@Home Project
During our final tests last week we observed up to 100% speedup for the Windows version and about 20% for the Linux version. The reason for this rather large difference is that the CPU parts (as in the CPU-only version) are still faster on Linux than on Windows. As always, your mileage may vary depending on the GPU and CPU (Intel vs. AMD) used.
Cheers,
Oliver
Einstein@Home Project