New BRP4 application versions 1.22/1.23

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044464301

RAC: 2021511

RE: RE: If you want BOINC

18 Mar 2012 21:02:24 UTC

Message 108665 in response to message 108663

Quote:

Quote:
If you want BOINC to reserve less, write the factor in there. E.g. if you want to run two tasks on the same GPU, use a factor of 0.5, BOINC will then reserve half a GPU for a task.

it can't be used if a user has several hosts with different GPUs.
for example, some of my hosts can crunch only 2 GPU tasks at a time, while others can crunch 5 tasks. so i have to mess with the app_info file.
am i right?

Or assign them to different venues and change the setting there. You can have four different sets of preferences - default, home, school, and work.

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: Or assign them to

18 Mar 2012 21:14:00 UTC

Message 108666 in response to message 108665

Quote:

Or assign them to different venues and change the setting there. You can have four different sets of preferences - default, home, school, and work.

thanks for reminding me about that Richard...i should have mentioned that first, and appended it with the fact that an app_info.xml would only be necessary if the user has more than 4 E@H hosts, each with different GPU factor settings.
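The arithmetic behind the GPU factor discussed above can be sketched in a few lines of Python. This is only an illustration, assuming each task reserves exactly the configured fraction of one GPU and that BOINC starts as many tasks as fit; `tasks_per_gpu` is a made-up helper name, not part of BOINC.

```python
import math


def tasks_per_gpu(gpu_factor: float) -> int:
    """Tasks that fit on one GPU if each one reserves gpu_factor of it.

    Hypothetical helper for illustration only; not a BOINC API.
    """
    if not 0 < gpu_factor <= 1:
        raise ValueError("factor must be greater than 0 and at most 1")
    # The small epsilon guards against floating-point rounding in 1 / factor.
    return math.floor(1.0 / gpu_factor + 1e-9)


# Factor 1 is the default (one task per GPU); 0.5 and 0.2 correspond to the
# two-task and five-task hosts mentioned in the thread.
for factor in (1.0, 0.5, 0.2):
    print(f"factor {factor}: {tasks_per_gpu(factor)} task(s) per GPU")
```

A host that should run two tasks per GPU and another that should run five would therefore need factors of 0.5 and 0.2, which is why they have to sit in different venues.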

nanoprobe

Joined: 3 Mar 12

Posts: 40

Credit: 12540756

RAC: 0

Why did the last 13 BRP4

20 Mar 2012 20:36:39 UTC

Message 108667

Why did the last 13 BRP4 tasks I received take more than twice as long to complete as any of the previous tasks? Is this normal or should I start looking for something on my end?

Sunny129

Joined: 5 Dec 05

Posts: 162

Credit: 160342159

RAC: 0

RE: Why did the last 13

20 Mar 2012 21:26:51 UTC

Message 108668 in response to message 108667

Quote:

Why did the last 13 BRP4 tasks I received take more than twice as long to complete as any of the previous tasks? Is this normal or should I start looking for something on my end?

are you monitoring your GPU activity w/ a utility like MSI Afterburner or the like? is your nVidia GPU's core clock only half the frequency of what it should be? if so, then at some point your GPU decided to throttle back...whether it was due to a video driver reset or something else, i don't know. but the only way to stop the throttling is to restart the computer as far as i know...

nanoprobe

Joined: 3 Mar 12

Posts: 40

Credit: 12540756

RAC: 0

RE: RE: Why did the last

20 Mar 2012 22:53:32 UTC

Message 108669 in response to message 108668

Quote:

Quote:
Why did the last 13 BRP4 tasks I received take more than twice as long to complete as any of the previous tasks? Is this normal or should I start looking for something on my end?

are you monitoring your GPU activity w/ a utility like MSI Afterburner or the like? is your nVidia GPU's core clock only half the frequency of what it should be? if so, then at some point your GPU decided to throttle back...whether it was due to a video driver reset or something else, i don't know. but the only way to stop the throttling is to restart the computer as far as i know...

I use afterburner and all the settings are the same. There has been no throttling back. I run everything @ stock. What percent of a CPU core should it be using? BoincTasks shows 15% CPU usage.
EDIT: I may have found the problem. Normally Einstein uses about 65% of the GPU but for some reason afterburner was showing 85% usage. I couldn't find what was using the other 20% but after a reboot the GPU is back to 65% usage. Very strange. WCG has been beta testing a GPU app this week but there are no tasks for that app running at this time. I have no idea how but I wonder if that had something to do with it. I'll keep a closer eye out when the next batch of betas are released.

Rene NAD

Joined: 25 Sep 11

Posts: 2

Credit: 1215950

RAC: 0

Hallo, I would like to ask

21 Mar 2012 17:23:17 UTC

Message 108670

Hello, I would like to ask for some help (or an explanation) if possible...
Since the beginning of March I haven't received any GPU tasks. I suppose it has something to do with the new BRP4 CUDA application, but I am not sure whether it is because of my project settings or the low performance of my graphics card. When updating the project I receive the message: "no work sent, see scheduler log messages on http://einstein.phys.uwm.edu/host_sched_logs/..."

here is the log:
Request: [USER#xxxxx] [HOST#xxxxxxx] [IP xxx.xxx.xxx.116] client 6.12.34
[send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
[send] effective_ngpus 1 max_jobs_on_host_gpu 999999
[send] Not using matchmaker scheduling; Not using EDF sim
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] CUDA: req 1.00 sec, 1.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
[send] available disk 24.18 GB, work_buf_min 0
[send] active_frac 0.995855 on_frac 0.639550 DCF 3.890007
[send] [HOST#xxxxxxx] is reliable
[send] set_trust: random choice for error rate 0.000078: yes
[version] Don't need CPU jobs, skipping version 101 for einstein_S6Bucket ()
[version] Checking plan class 'SSE2'
[version] reading plan classes from file '../plan_class_spec.xml'
[version] Don't need CPU jobs, skipping version 101 for einstein_S6Bucket (SSE2)
[version] Checking plan class 'SSE'
[version] Don't need CPU jobs, skipping version 102 for einstein_S6Bucket (SSE)
[version] no app version available: APP#16 (einstein_S6Bucket) PLATFORM#2 (windows_intelx86) min_version 0
[version] Checking plan class 'BRP4cuda32'
[version] parsed project prefs setting 'gpu_util_brp' : false : 0.000000
[version] driver version required max: -29053, supplied: 29573
[version] Checking plan class 'BRP4SSE'
[version] parsed project prefs setting 'also_run_cpu' : true : 0.000000
[version] Don't need CPU jobs, skipping version 122 for einsteinbinary_BRP4 (BRP4SSE)
[version] no app version available: APP#19 (einsteinbinary_BRP4) PLATFORM#2 (windows_intelx86) min_version 0
[version] Don't need CPU jobs, skipping version 23 for hsgamma_FGRP1 ()
[version] no app version available: APP#17 (hsgamma_FGRP1) PLATFORM#2 (windows_intelx86) min_version 0
[debug] [HOST#xxxxxxx] MSG(high) No work sent
[debug] [HOST#xxxxxxx] MSG(high) see scheduler log messages on http://einstein.phys.uwm.edu//host_sched_logs/xxxx/xxxxxxx
Sending reply to [HOST#xxxxxxx]: 0 results, delay req 60.00
Scheduler ran 0.162 seconds

My configuration:
i5-430M, 4GB DDR3, NVIDIA GT 420M, 1GB VRAM
Win7, x64, NVIDIA driver 295.73
If it is an issue with my project settings, I would like to ask for advice on how to fix it. Thanks in advance for your help...
Rene

Richard Haselgrove

Joined: 10 Dec 05

Posts: 2143

Credit: 3044464301

RAC: 2021511

It's because of your graphics

21 Mar 2012 17:53:50 UTC

Message 108671 in response to message 108670

It's because of your graphics driver: driver version required max: -29053, supplied: 29573

See the thread two below this one, "Not shipping Windows BRP4 CUDA tasks to recent drivers", and for the reason, "nVidia 295.51 beta driver problems".
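For anyone who wants to check their own scheduler log, the comparison in that line can be sketched in Python. This is a rough illustration under two assumptions that are not confirmed anywhere in this thread: that the integers encode the driver version as major * 100 + minor (so 29573 is driver 295.73, which matches the configuration posted above), and that the minus sign in front of the "required max" value can be ignored. Both function names are made up.

```python
import re

# The line from the scheduler log quoted above.
LOG_LINE = "[version] driver version required max: -29053, supplied: 29573"


def driver_to_int(version: str) -> int:
    """Encode a two-part driver version such as '295.73' as 29573.

    Assumed encoding; three-part versions are not handled.
    """
    major, minor = version.split(".")
    return int(major) * 100 + int(minor)


def parse_driver_check(line: str) -> tuple[int, int]:
    """Return (max_allowed, supplied) from a 'driver version' log line."""
    match = re.search(r"required max: (-?\d+), supplied: (\d+)", line)
    if match is None:
        raise ValueError("not a driver version line")
    # Assumption: the sign on the maximum carries no meaning for the comparison.
    return abs(int(match.group(1))), int(match.group(2))


max_allowed, supplied = parse_driver_check(LOG_LINE)
if supplied > max_allowed:
    print(f"driver {supplied} is newer than the allowed maximum {max_allowed}")
else:
    print(f"driver {supplied} is within the allowed maximum {max_allowed}")
```

Under these assumptions the supplied driver is newer than the maximum, which is consistent with no CUDA work being sent.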

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

RE: NOTE THAT USING THIS

21 Mar 2012 21:43:43 UTC

Message 108672 in response to message 108626

Quote:

NOTE THAT USING THIS SETTING IS PRETTY DANGEROUS. Make sure that you know precisely what you are doing before messing around with this. Wrong settings may even damage your computer (see e.g. here)! If in any doubt, better leave it at the default (1).

BM

PS: Kudos to Oliver for doing the worst part of this implementation (fiddling with project_specific_prefs.inc)!

Bernd,

Could we expand on this a bit, please? I tend to be in doubt about everything I once thought I was sure about. But I still want to try it.

It seems to me that physical damage may result from
a) Overloading the power supply. This can be measured with a power meter or a UPS with appropriate data connection.

b) Overheating which can be measured with nVidia or OS tools.

Is there another possibility of physical damage?

I suspect there may also be stability problems if too much GPU memory is allocated by BOINC tasks but I don't have a clue what is too much and whether we get clean error messages or if the system just gets wonky.

It sort of all comes down to the reason I stopped putting "Are you sure" dialog boxes in programs. Most of the people who knew what they were doing answered no and called me for an explanation. All the people who didn't have a clue, just said yes and only called if something went wrong.

Joe

ps: Thanks Oliver

Alex

Joined: 1 Mar 05

Posts: 451

Credit: 529080095

RAC: 352757

Joe, Unfortunately, yes,

21 Mar 2012 23:02:16 UTC

Message 108673

Joe,

Unfortunately, yes, there are possibilities for trouble. If one keeps an eye on the temps it's unlikely that you will damage your hardware, but there is a good chance of crashing your operating system.

Speaking for myself, I usually try to find the limits of my test system. I overclock it until it crashes twice an hour, try to run 3 or 4 WUs on one GPU, change settings to run 6 CPU tasks on a 4-core AMD CPU, and so on.
I have learned that, when running too many tasks on one GPU, the system tends to crash once or twice a day, that AMD drivers tend to crash without any reason, and that running T4T with the virtual machine brings a strange CPU usage (which was the reason to experiment with the number of cores used).

My experience after crunching for many years is: running the system within the power and thermal limits will not damage your hardware, and running the system at standard clock or with safe overclocking will keep your operating system stable.

Do not forget to do some maintenance from time to time; check the fans and remove the dust.

If you are a 24/7 cruncher, do not forget that your HDD is also running 24/7; you need to use a drive which is rated for 24/7 operation (WD RE4 series, VelociRaptor; Seagate and Samsung also have 24/7-rated HDDs, and of course server HDDs). Two of my systems now run with SSDs; let's see what happens there.

Keep in mind that running a system near the thermal limits reduces the lifetime dramatically.

Investing in additional fans might be a good idea.

Alexander

joe areeda

Joined: 13 Dec 10

Posts: 285

Credit: 320378898

RAC: 0

Thank you Alex. I do not

21 Mar 2012 23:27:12 UTC

Message 108674 in response to message 108673

Thank you Alex.

I do not overclock, mainly because I value stability over speed, but that might be an interesting experience.

I bravely went ahead and tried running 2 GPU tasks on a GTX 560 in a system with an i7-2600K and 1600 MHz RAM, in a full tower with enough fans to cool a ceramic kiln. It is running Ubuntu 11.04 and currently has an average credit of 20K.

One GPU task:

Timestamp                       : Wed Mar 21 14:59:45 2012
Driver Version                  : 270.41.06
Attached GPUs                   : 1
GPU 0:1:0
    Product Name                : GeForce GTX 560
    PCI
        Bus                     : 1
        Device                  : 0
        Domain                  : 0
        Device Id               : 120110DE
        Bus Id                  : 0:1:0
    Fan Speed                   : 40 %
    Memory Usage
        Total                   : 1023 Mb
        Used                    : 460 Mb
        Free                    : 562 Mb
    Compute Mode                : Default
    Temperature
        Gpu                     : 67 C

Total system power from UPS (Cyberpower CP1000PF) 22.0% of 600W = 132 W

sensors:
Core 0: +58.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +61.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +59.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +58.0°C (high = +80.0°C, crit = +98.0°C)

Two GPU tasks:

    Fan Speed                   : 43 %
    Memory Usage
        Total                   : 1023 Mb
        Used                    : 769 Mb
        Free                    : 254 Mb
    Temperature
        Gpu                     : 71 C

power = 40% (240W)

sensors:
Core 0: +57.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +61.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +60.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +57.0°C (high = +80.0°C, crit = +98.0°C)

So far it looks good to me. I'll let it run. In a couple of days I can report throughput.
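For comparison, the two sets of readings above can be put side by side with a short Python sketch. The numbers are copied from this post; the 600 W figure is the UPS capacity used in the power line above, and the dictionary keys and helper name are made up for illustration.

```python
UPS_CAPACITY_W = 600      # capacity used for the "% of 600W" figures above
GPU_MEM_TOTAL_MB = 1023   # total GPU memory reported for the GTX 560

# Readings copied from the post: UPS load, GPU memory used, GPU temperature.
one_task = {"load_percent": 22.0, "gpu_mem_mb": 460, "gpu_temp_c": 67}
two_tasks = {"load_percent": 40.0, "gpu_mem_mb": 769, "gpu_temp_c": 71}


def load_watts(load_percent: float) -> float:
    """Convert a UPS load percentage into watts."""
    return UPS_CAPACITY_W * load_percent / 100


power_delta_w = load_watts(two_tasks["load_percent"]) - load_watts(one_task["load_percent"])
mem_delta_mb = two_tasks["gpu_mem_mb"] - one_task["gpu_mem_mb"]
temp_delta_c = two_tasks["gpu_temp_c"] - one_task["gpu_temp_c"]
mem_free_mb = GPU_MEM_TOTAL_MB - two_tasks["gpu_mem_mb"]

print(f"extra system power for the second task: {power_delta_w:.0f} W")
print(f"extra GPU memory for the second task:   {mem_delta_mb} MB")
print(f"GPU temperature change:                 {temp_delta_c} C")
print(f"GPU memory left with two tasks:         {mem_free_mb} MB")
```

The memory left over with two tasks gives a rough idea of whether a third task of similar size would still fit.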

Joe
