GTS 450: Does two CUDA tasks at a time make sense?

In my computer, CUDA work is done by a GeForce GTS 450 graphics card. Does it make sense to have it work on more than one CUDA task (BRPS) at a time? Are 192 CUDA cores enough? The actual GPU usage by a single BRPS task is ~70%.
Regards
Heinrich
It's not the number of cores that counts but the amount of memory on the card.
Regards,
Gundolf
Computers are not everything in life. (Little joke)
Memory is 1024 MB. As others say, that's enough for up to 3 tasks. But if the GPU hasn't got enough computing capacity, I suppose the computing time of a single task will increase in proportion to the number of tasks. So what would be the use of running several GPU tasks in parallel?
Regards
Heinrich
At currently 70% utilization you should see some speedup from running several WUs. I can't say how much, though, except that it shouldn't be higher than 1/0.70 ≈ 1.43: if one task keeps the GPU 70% busy, perfect packing can at best fill the idle 30% ;)
Why not just try and report the results here?
MrS
Scanning for our furry friends since Jan 2002
I'm inclined to give it a try. On the other hand, I'm cautious about the file app_info.xml. More than one person in this forum has written about troubles when using this (unfortunately necessary) file. Archae86, e.g., wrote that "one highly likely result of an error is loss of all current work in queue or in progress".
What I'm looking for is a known-good app_info file, along with explanations or a reference guide that enables the user to adapt the file to his personal needs (hardware, OS, crunching parameters). Does that exist?
I don't like to do things blindly. At least I'd like to have a rough idea of the consequences of what I'm doing.
I tried to learn from what I found in this forum. Looking at the various app_info files suggested there, I find them rather different in what they list (especially the executable files). Which files are essential? The GPU RAM values differ too: one file says 334572800.000000, the next 220200960.000000. What is the right value, and what does it depend on? Do I have to divide the amount of memory available by the number of tasks I want the GPU to run? I've got more of these questions, but will not bother you with them.
As I already wrote, I'd greatly appreciate a reference guide for app_info files.
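In the meantime, here is the kind of skeleton I have pieced together from posts in this forum, for illustration only. Every file name, version number and value below is a placeholder that would have to match the files the project actually sent; the part that apparently makes two tasks run at once is the <count> entry under <coproc>:

<app_info>
    <app>
        <name>einsteinbinary_BRP4</name>
    </app>
    <file_info>
        <name>einsteinbinary_BRP4_1.28_windows_intelx86__BRP4cuda32nv301.exe</name>
        <executable/>
    </file_info>
    <!-- one further <file_info> block per file the project sent
         (CUDA DLLs such as cudart/cufft, data files, ...) -->
    <app_version>
        <app_name>einsteinbinary_BRP4</app_name>
        <version_num>128</version_num>
        <plan_class>BRP4cuda32nv301</plan_class>
        <avg_ncpus>0.2</avg_ncpus>
        <coproc>
            <type>CUDA</type>
            <!-- 0.5 GPU per task = two tasks at once; 0.33 would allow three -->
            <count>0.5</count>
        </coproc>
        <!-- if present: GPU memory one task is assumed to need, in bytes
             (334572800 B = 319 MB), apparently not the card's total
             divided by the number of tasks -->
        <gpu_ram>334572800.000000</gpu_ram>
        <file_ref>
            <file_name>einsteinbinary_BRP4_1.28_windows_intelx86__BRP4cuda32nv301.exe</file_name>
            <main_program/>
        </file_ref>
        <!-- one matching <file_ref> per additional <file_info> above -->
    </app_version>
</app_info>

If that reading is right, the differing gpu_ram values people quote would simply belong to different app versions, not to different task counts. Corrections welcome.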
Regards
Heinrich
Before experimenting, you could set Einstein to No New Tasks and let the existing tasks crunch and report.
Hello!
I'm running an i7 2100 @ 3.2 GHz with a GTX550Ti (1 GB, 192 shaders) in slot 1 of the motherboard and a GT440 (1 GB, 96 shaders) in slot 2. The PCIe link in slot 2 is also 16 lanes wide. All BRP4 tasks are forced to priority level Normal by Process Tamer, without disturbance to normal use, like right now while evaluating and writing this.
For the crunching times I get the following measurements:
GTX550Ti
single task 0.7016 +/- 0.0297 [h], rel. error 4.23% - from 197 tasks
3 tasks parallel 1.707 +/- 0.069 [h], rel. error 4.04% - from 475 tasks
On average you get a finished task every (1.707 +/- 0.069 [h])/3 = 0.569 +/- 0.023 [h].
The throughput increase is 0.7016/0.569 - 1 = 23.3 +/- 7.21 [%]
GT440
single task 1.728 +/- 0.0455 [h], rel. error 2.63% - from 80 tasks
3 tasks parallel 4.551 +/- 0.090 [h], rel. error 1.98% - from 180 tasks
On average you get a finished task every (4.551 +/- 0.090 [h])/3 = 1.517 +/- 0.030 [h].
The throughput increase is 1.728/1.517 - 1 = 13.9 +/- 3.75 [%]
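(The scheme in both cases: mean time per finished task = t_3/3, throughput gain = t_1/(t_3/3) - 1. For the error bars I assume the two relative errors are independent and add in quadrature, e.g. for the GTX550Ti sqrt(4.23^2 + 4.04^2) = 5.85% of the ratio 1.233, which gives the quoted +/- 7.21 percentage points.)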
------------------ GPU load ----------------- Memory load ---------
------------ single task --- 3 tasks ---- single task --- 3 tasks
GTX550Ti ----- ~83% -------- ~96% --------- ~280 MB ------ ~820 MB
GT440 -------- ~60% -------- ~90% --------- ~330 MB ------ ~800 MB
It seems that GPUs with more shaders benefit more from crunching tasks in parallel, though by less than the ratio of their shader counts.
In the Event Log of the BOINC Manager one will find, at startup, the peak crunching power of each GPU listed; in my case:
GTX550Ti 486 [GFLOPS]
GT440 207 [GFLOPS]
The ratio of these is 486/207 = 2.3478.
This is equivalent to the ratio of (shaders * shader clock) for each GPU, in this case (192*1900)/(96*1620) = 2.3457.
If one takes the ratio of the mean crunching times for 3 tasks in parallel, one gets 1.517/0.569 = 2.67, which is just 14% higher than the ratio above. So in this case too the GPU with the higher number of shaders has an advantage, very likely due to the higher GPU clock, memory clock and memory bandwidth of the GTX550Ti.
The relative scatter of the crunching times is significantly higher on the GPU with more shaders.
Kind regards
Martin
Sorry!
In my foregoing post the sentence
is obsolete. I was unable to remove it, as the editing window of 1 h had just closed.
Kind regards
Martin
Easier than waiting with No New Tasks: just copy your data directory and pull your network connection. If you slip up and trash your work, you can simply close BOINC, restore the backup, and try again.
Thank you all for your valuable information. Since you have encouraged me, I'll now start my first experiments with a multitasking GPU.
Regards
Heinrich
Crunching time results running CUDA BRPS tasks in parallel
As already said, I'm running a GeForce GTS 450 GPU with 192 shaders. The board has 1024 MB of GDDR5 RAM. Crunching times are mean values taken from batches of more than 5 tasks each. Each GPU task was fed with data by a separate CPU core. (The Windows Task Manager allows mapping a task onto a certain CPU core.)
1 task : 51.4 min ----------- 1 task: 51.4 min
2 tasks: 85.4 min ---- equals 1 task: 42.7 min
3 tasks: 121 min ------ equals 1 task: 40.3 min
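Explicitly: 85.4/2 = 42.7 min and 121/3 = 40.3 min per task, so the speed ratios are 51.4/42.7 = 1.20 and 51.4/40.3 = 1.28.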
This simple calculation shows a throughput increase of about 20% in the case of double tasking and about 27% in the case of triple tasking. In the latter case, GPU load rose from 70% to values close to 90%. GPU core temperature rose only a little, about 2 °C. In other words, the additional electrical power needed may be considered negligible.
Regards
Heinrich