If you want BOINC to reserve less, enter the factor there. E.g. if you want to run two tasks on the same GPU, use a factor of 0.5; BOINC will then reserve half a GPU per task.
It can't be used if a user has several hosts with different GPUs.
For example, some of my hosts can crunch only 2 GPU tasks at a time, while others can crunch 5. So I have to mess with the app_info.xml file.
Am I right?
Or assign them to different venues and change the setting there. You can have four different sets of preferences: default, home, school, and work.
Thanks for reminding me about that, Richard... I should have mentioned that first, and added that an app_info.xml would only be necessary if the user has more than 4 E@H hosts, each with a different GPU factor setting.
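For anyone who does end up editing app_info.xml: the per-task GPU reservation lives in the <coproc> block of each <app_version> entry. A minimal sketch, assuming the BRP4 CUDA app; the version number is a placeholder and the other required elements of the entry are omitted, so adapt it to what is actually in your file:

<app_version>
    <app_name>einsteinbinary_BRP4</app_name>
    <version_num>128</version_num>           <!-- placeholder: use your host's actual version -->
    <coproc>
        <type>CUDA</type>
        <count>0.5</count>                   <!-- reserve half a GPU per task, i.e. two tasks per GPU -->
    </coproc>
</app_version>

A host that should run 5 tasks at a time would use <count>0.2</count> instead.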
Why are the last 13 BRP4 tasks I received taking more than twice as long to complete as any of the previous tasks? Is this normal or should I start looking for something on my end?
Are you monitoring your GPU activity with a utility like MSI Afterburner or the like? Is your nVidia GPU's core clock only half the frequency it should be? If so, then at some point your GPU decided to throttle back... whether it was due to a video driver reset or something else, I don't know. But the only way to stop the throttling is to restart the computer, as far as I know...
I use Afterburner and all the settings are the same. There has been no throttling back. I run everything at stock. What percent of a CPU core should it be using? BoincTasks shows 15% CPU usage.
EDIT: I may have found the problem. Normally Einstein uses about 65% of the GPU, but for some reason Afterburner was showing 85% usage. I couldn't find what was using the other 20%, but after a reboot the GPU is back to 65% usage. Very strange. WCG has been beta testing a GPU app this week, but there are no tasks for that app running at this time. I have no idea how, but I wonder if that had something to do with it. I'll keep a closer eye out when the next batch of betas is released.
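A note on chasing this sort of mystery GPU load: besides Afterburner, the BOINC client can log which tasks it assigns to the GPU, which helps rule out another BOINC app (such as a WCG beta task) sneaking on. A sketch of a cc_config.xml using BOINC's standard coproc debug flag; place it in the BOINC data directory and re-read the config or restart the client:

<cc_config>
    <log_flags>
        <coproc_debug>1</coproc_debug>   <!-- log GPU detection and task-to-GPU assignment -->
    </log_flags>
</cc_config>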
Hello, I would like to ask for some help (or an explanation) if possible...
Since the beginning of March I haven't received any GPU tasks. I suppose it has something to do with the new BRP4 CUDA application, but I am not sure whether it is because of my project settings or the low performance of my graphics card. When updating the project I receive the message: "no work sent, see scheduler log messages on http://einstein.phys.uwm.edu/host_sched_logs/..."
Here is the log:
Request: [USER#xxxxx] [HOST#xxxxxxx] [IP xxx.xxx.xxx.116] client 6.12.34
[send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
[send] effective_ngpus 1 max_jobs_on_host_gpu 999999
[send] Not using matchmaker scheduling; Not using EDF sim
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] CUDA: req 1.00 sec, 1.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
[send] available disk 24.18 GB, work_buf_min 0
[send] active_frac 0.995855 on_frac 0.639550 DCF 3.890007
[send] [HOST#xxxxxxx] is reliable
[send] set_trust: random choice for error rate 0.000078: yes
[version] Don't need CPU jobs, skipping version 101 for einstein_S6Bucket ()
[version] Checking plan class 'SSE2'
[version] reading plan classes from file '../plan_class_spec.xml'
[version] Don't need CPU jobs, skipping version 101 for einstein_S6Bucket (SSE2)
[version] Checking plan class 'SSE'
[version] Don't need CPU jobs, skipping version 102 for einstein_S6Bucket (SSE)
[version] no app version available: APP#16 (einstein_S6Bucket) PLATFORM#2 (windows_intelx86) min_version 0
[version] Checking plan class 'BRP4cuda32'
[version] parsed project prefs setting 'gpu_util_brp' : false : 0.000000
[version] driver version required max: -29053, supplied: 29573
[version] Checking plan class 'BRP4SSE'
[version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
[version] Don't need CPU jobs, skipping version 122 for einsteinbinary_BRP4 (BRP4SSE)
[version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#2 (windows_intelx86) min_version 0
[version] Don't need CPU jobs, skipping version 23 for hsgamma_FGRP1 ()
[version] no app version available: APP#17 (hsgamma_FGRP1) PLATFORM#2 (windows_intelx86) min_version 0
[debug] [HOST#xxxxxxx] MSG(high) No work sent
[debug] [HOST#xxxxxxx] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/xxxx/xxxxxxx
Sending reply to [HOST#xxxxxxx]: 0 results, delay req 60.00
Scheduler ran 0.162 seconds
My configuration:
i5-430M, 4GB DDR3, NVIDIA GT 420M, 1GB VRAM
Win7, x64, NVIDIA driver 295.73
If it is an issue with my project settings, I would like to ask for advice on how to fix it. Thanks in advance for any help...
Rene

It's because of your graphics driver: driver version required max: -29053, supplied: 29573
(The scheduler encodes your driver 295.73 as 29573, which is newer than the maximum the BRP4cuda32 plan class currently accepts, so no CUDA work is sent.)
See the thread two below this one: Not shipping Windows BRP4 CUDA tasks to recent drivers
and, for the reason, nVidia 295.51 beta driver problems
NOTE THAT USING THIS SETTING IS PRETTY DANGEROUS. Make sure that you know precisely what you are doing before messing around with this. Wrong settings may even damage your computer (see e.g. here)! If in any doubt, better leave it at the default (1).
BM
PS: Kudos to Oliver for doing the worst part of this implementation (fiddling with project_specific_prefs.inc)!
Bernd,
Could we expand on this a bit, please? I tend to be in doubt about everything I once thought I was sure about. But I still want to try it.
It seems to me that physical damage may result from:
a) Overloading the power supply. This can be measured with a power meter or a UPS with an appropriate data connection.
b) Overheating, which can be measured with nVidia or OS tools.
Is there another possibility of physical damage?
I suspect there may also be stability problems if too much GPU memory is allocated by BOINC tasks, but I don't have a clue what "too much" is, or whether we get clean error messages or the system just gets wonky.
It sort of all comes down to the reason I stopped putting "Are you sure" dialog boxes in programs. Most of the people who knew what they were doing answered no and called me for an explanation. All the people who didn't have a clue just said yes and only called if something went wrong.

Joe

ps: Thanks Oliver
Joe,

Unfortunately, yes, there are possibilities for trouble. If one keeps an eye on the temps it's unlikely that you'll damage your hardware, but there is a good chance of crashing your operating system.
Speaking for myself, I usually try to find the limits of my test system. I overclock it until it crashes twice an hour, try to run 3 or 4 WUs on one GPU, change settings to run 6 CPU tasks on a 4-core AMD CPU, and so on.
I have learned that, when running too many tasks on one GPU, the system tends to crash once or twice a day, that AMD drivers tend to crash for no reason, and that running T4T with the virtual machine brings strange CPU usage (which was the reason to experiment with the number of cores used).
My experience after crunching for many years is: running the system within the power and thermal limits will not damage your hardware, and running at standard clocks or with safe overclocking will keep your operating system stable.
Do not forget to do some maintenance from time to time; check the fans and remove the dust.
If you are a 24/7 cruncher, do not forget that your HDD is also running 24/7; you need a drive that is rated for 24/7 operation (the WD RE4 series and VelociRaptor; Seagate and Samsung also have 24/7-rated HDDs; and of course there are server HDDs). Two of my systems now run with SSDs; let's see what happens there.
Keep in mind that running a system near the thermal limits reduces the lifetime dramatically.
Investing in additional fans might be a good idea.

Alexander
Thank you Alex.

I do not overclock, mainly because I value stability over speed, but that might be an interesting experience.
I bravely went ahead and tried running 2 GPU tasks on a GTX 560, in a system with an i7-2600K and 1600 MHz RAM, in a full tower with enough fans to cool a ceramic kiln. It is running Ubuntu 11.04 and currently has an average credit of 20K.
One GPU task:
Timestamp : Wed Mar 21 14:59:45 2012
Driver Version : 270.41.06
Attached GPUs : 1
GPU 0:1:0
    Product Name : GeForce GTX 560
    PCI
        Bus : 1
        Device : 0
        Domain : 0
        Device Id : 120110DE
        Bus Id : 0:1:0
    Fan Speed : 40 %
    Memory Usage
        Total : 1023 Mb
        Used : 460 Mb
        Free : 562 Mb
    Compute Mode : Default
    Temperature
        Gpu : 67 C
Total system power from UPS (Cyberpower CP1000PF) 22.0% of 600W = 132 W
sensors:
Core 0: +58.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +61.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +59.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +58.0°C (high = +80.0°C, crit = +98.0°C)
Two GPU tasks:
power = 40% (240W)
sensors:
Core 0: +57.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +61.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +60.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +57.0°C (high = +80.0°C, crit = +98.0°C)
So far it looks good to me. I'll let it run. In a couple of days I can report throughput.
Joe
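(A quick sanity check on those power numbers, assuming the UPS readout is accurate: one task draws 22% of 600 W = 132 W and two tasks draw 40% = 240 W, so the second concurrent task adds roughly 240 W − 132 W = 108 W. Whether that is a good trade depends on the throughput figures to come.)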