Machine crash - maybe.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0
Topic 197856

I have, for many years, had two or more machines under my desk at home, but a single screen on the desk. I plug the screen into whichever machine I need to look at. All machines run BOINC. A few weeks back I "upgraded" one machine from XP to Windows 8 ("upgraded" in quotes because XP served me perfectly well and the change was forced by Microsoft dropping XP support), and I have found a problem with this way of working. From time to time, when I plug the screen into that machine, it is "dead" (in quotes because I don't know whether it really is or not). The screen stays blank and the machine does not react to its keyboard or mouse, so I have to press the restart button.

The Einstein work units that run on this system use the GPU on the graphics card, an ASUS HD 7850 with an up to date driver. Einstein has a 10% quota on the system which it shares with 7 other projects which do not use the GPU. There are, therefore, times when there are no Einstein units present.

What I am wondering is if the GPU use of Einstein is affecting the machine when the screen is not attached.

Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117868214872
RAC: 34749042

Quote:
... when I plug the screen into the machine it is "dead", dead in quotes because I don't know if it is or not, suffice to say the screen is blank and it does not react to its keyboard or mouse, I have to press the restart button.


Have you tried 'pinging' its IP address from the other machine to see if it responds? I haven't used Windows for a long time now (I'm Linux only) so I'm quite unfamiliar with Windows utilities you might be able to use to check for details on actual running processes like BOINC or the science apps. For example, is there any quick way to look for a process ID (PID) as a way of telling if the machine is still doing something even though the screen isn't responding?

I run virtually all my machines with only power and network attached. Whenever I add monitor, keyboard and mouse, (once in a blue moon) I always get immediate action - unless the machine truly has died :-). However I know in advance if a machine has died because I'm regularly doing automatic checks on every one through a script running on a server machine. I imagine you must be able to do something like this in Windows.
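A minimal sketch of the kind of automatic check described above, assuming Python is installed on the machine doing the checking. The function names are made up for illustration, and the script only sends a single ping per host; it says nothing about whether BOINC itself is still running. Note that on Windows, `ping` may report success in some "destination unreachable" cases, so treat the result as a hint rather than proof.

```python
import platform
import subprocess


def ping_command(host, system=None):
    """Build a one-packet ping command for the given OS name."""
    system = system or platform.system()
    # Windows ping uses -n for the packet count, most other systems use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, "1", host]


def is_alive(host, run=subprocess.run):
    """Return True if a single ping to host exits with status 0."""
    result = run(ping_command(host),
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_hosts(hosts, probe=is_alive):
    """Map each host to True (responding) or False (silent)."""
    return {host: probe(host) for host in hosts}
```

Calling `check_hosts(["192.168.1.10", "192.168.1.11"])` (hypothetical addresses) from a scheduled task would give a quick yes/no per machine before reaching for the restart button.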

Quote:
The Einstein work units that run on this system use the GPU on the graphics card, an ASUS HD 7850 with an up to date driver. Einstein has a 10% quota on the system which it shares with 7 other projects which do not use the GPU. There are, therefore, times when there are no Einstein units present.


A 7850 can churn out GPU tasks quite quickly. I was extremely surprised to see how long yours are taking (>150,000 secs for BRP5 on average). I have a host almost as old as yours (Q8400 CPU with a 7850) and it does 4 concurrent BRP5 GPU tasks in around 22,000 secs. I'm guessing you haven't kept a CPU core free for GPU support? That can make an incredible difference with AMD GPUs.

Quote:
What I am wondering is if the GPU use of Einstein is affecting the machine when the screen is not attached.


I wouldn't have thought so. It certainly doesn't with Linux.

Cheers,
Gary.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

>>> Have you tried 'pinging'

Nope, I can try that next time it happens.

>>> I'm guessing

You guess quite correctly. The jobs claim to use 0.5 CPUs, so allocating a whole CPU as GPU support would appear to waste CPU time that the other projects could use. I am not actually sure I know how to do that anyway!

I have blown an Ubuntu DVD, and will be using that on the other machines here.

mikey
Joined: 22 Jan 05
Posts: 12714
Credit: 1839116474
RAC: 3613

RE: >>> Have you tried

I have 15 Windows PCs here at my home and use the free software VNC-5.2.1 to sit at one PC and bring up every other one without having to plug or unplug anything. I have used Ultra-VNC and several others in the past, but the one above is the latest one for me and it just seems to work better. The free version ONLY allows anonymous connections, but does require a PC name and password to get into each PC. I use the same VNC password for each PC, but each has its own name of course. You would need to install it on both PCs, but after that no plugging and unplugging is needed. It works on my 32 bit XP machine, my 32 and 64 bit Win7 machines and my 64 bit Windows Home Server machine. I can sit at any one and get into any other one with no problems. Transferring files CAN be a problem, so I use the public folders on my Home Server for that. You could get around that because your PCs are close to each other, so a USB stick would work just fine.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

RE: You guess quite

Quote:
You guess quite correctly, the jobs claim to use 0.5 CPU's, thus allocating a CPU as a GPU support would appear to waste CPU time that the other projects could use. I am not actually sure I know how to do that anyway!


Easiest way would be to go to your computing prefs. and change "On multiprocessors, use at most 100% of the processors" to 75%, then do a manual update of the project in Boinc to get the new prefs downloaded. Or if you're using local prefs set in Boinc Manager, then in the advanced view go to the menu Tools -> Computing preferences and on the Processor usage tab change "On multiprocessor systems, use at most 100% of the processors" to 75%.
Local prefs always override the web based settings.

This will always leave one core free to support the GPU but will waste resources if no GPU tasks are running. To get things more optimized you'll need to create an app_config.xml file, full instructions are here.

Open Notepad and paste the following:

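[pre]
<app_config>
  <app>
    <name>einsteinbinary_BRP4G</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>einsteinbinary_BRP5</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
[/pre]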
Then when saving make sure you choose "All filetypes" and ANSI encoding, name the file app_config.xml and save it to Boinc\projects\einstein.phys.uwm.edu.
When done, open Boinc Manager in advanced view and in the Advanced menu choose "Read config files" and the settings will take effect. This will dynamically reserve one core for GPU support when running GPU tasks but will let all 4 cores run CPU tasks if there are no GPU tasks.
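For anyone who would rather generate the file than type it, here is a rough sketch in Python. It assumes the standard BOINC app_config.xml layout (an `app_config` root with one `app` entry per application, each holding `name` and a `gpu_versions` block with `gpu_usage` and `cpu_usage`). The function name `build_app_config` is made up for this example, and the script only builds the text; saving it into the project folder is left to you.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_names, gpu_usage=1, cpu_usage=1):
    """Return app_config.xml text reserving cpu_usage cores per GPU task."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        versions = ET.SubElement(app, "gpu_versions")
        # gpu_usage: fraction of a GPU per task (0.5 would mean two at once).
        ET.SubElement(versions, "gpu_usage").text = str(gpu_usage)
        # cpu_usage: CPU cores budgeted for each running GPU task.
        ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_app_config(["einsteinbinary_BRP4G", "einsteinbinary_BRP5"]))
```

The two application names are the ones from the post above. Other values for `gpu_usage` and `cpu_usage` can be passed in to experiment, but the effect of any particular combination on crunch times is something to measure rather than assume.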

mikey
Joined: 22 Jan 05
Posts: 12714
Credit: 1839116474
RAC: 3613

RE: RE: You guess quite

I always forget you can do that on the webpage, which IS BETTER for most people! My directions were intended more for those with multiple PCs who want to set each machine differently.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117868214872
RAC: 34749042

RE: ... the jobs claim to

Quote:
... the jobs claim to use 0.5 CPU's, thus allocating a CPU as a GPU support would appear to waste CPU time that the other projects could use.


Yes, that was exactly my attitude when I first started using GPUs quite a while ago. For NVIDIA GPUs (where the 'reservation' is 0.2 CPUs), I found that I could load up all CPU cores without any significant effect on the GPU crunch time. This was for GTX650 GPUs. Since then, I've been using AMD GPUs (7770 and 7850), and if you run a GPU task on those without a free CPU core, there is a fair bit of a penalty.

Others have chimed in with various ways of improving things, all of which should improve the GPU crunch times. I'll offer a slightly different alternative. It would be interesting to try them all and see what works best for you. If you use an app_config.xml, you have ultimate control over the resources allocated to a GPU task at the expense of setting it up and maintaining it if circumstances change. The example given shows how to reserve a full CPU core for a single GPU task. You could choose different values to allow different behaviour - for example crunching two GPU tasks concurrently whilst reserving a full CPU core.

My experience with 7850 GPUs (2GB version like you have) is that they seem to thrive on multiple concurrent tasks. I do 4 concurrent on all of mine. I'm not suggesting that for you, but I would strongly suggest trying two concurrent. The simplest way to do that avoids using app_config.xml completely. Just go to the EAH preferences page on the website and look for the GPU utilization factor of BRP apps. If you haven't set it previously, it will be showing a value of 1. If you have allocated your machine to one of the 4 venues (default, home, work, school), change the setting for the appropriate venue (otherwise the 'default' setting) to be 0.5 rather than 1 and save the change. The setting change will be communicated to your computer the next time it asks for BRP work. You can hasten this with a temporary work cache increase to get immediate results.

The benefit of doing 2 concurrent GPU tasks is that a core will be reserved automatically by BOINC without you having to change the % CPUs setting or having to construct an app_config.xml. If you don't like the results, it's very easy to reverse: edit the setting back to what it was and request a new task to receive the change. I would expect the result of doing two concurrent would be two GPU tasks finishing in a lot less time than it's currently taking to do one. Of course, you will have one less core available for CPU tasks.
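The arithmetic behind that can be sketched as below. This is a deliberately simplified model, not BOINC's actual scheduler: it assumes the number of concurrent GPU tasks is 1 divided by the utilization factor, that each GPU task claims 0.5 CPUs (the figure mentioned earlier in the thread), and that only whole cores' worth of claimed CPU are taken away from CPU tasks. Function names are made up for the example.

```python
import math


def concurrent_gpu_tasks(gpu_utilization_factor):
    """Tasks sharing one GPU, assuming each takes the given GPU fraction."""
    return math.floor(1.0 / gpu_utilization_factor + 1e-9)


def cpu_tasks_left(ncpus, n_gpu_tasks, cpu_per_gpu_task=0.5):
    """CPU task slots remaining under the simplified whole-core model."""
    reserved = math.floor(n_gpu_tasks * cpu_per_gpu_task + 1e-9)
    return ncpus - reserved


for factor in (1.0, 0.5):
    n = concurrent_gpu_tasks(factor)
    print(f"factor {factor}: {n} GPU task(s), "
          f"{cpu_tasks_left(4, n)} CPU task slot(s) on a 4-core host")
```

Under these assumptions, a factor of 1 leaves all four cores busy with CPU tasks (two halves never add up to a whole core), while a factor of 0.5 gives two GPU tasks claiming 1.0 CPU between them, which frees a core.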

Cheers,
Gary.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

I have set this up and will look at it tomorrow. It looks to be an easier method.

adrianxw
Joined: 21 Feb 05
Posts: 242
Credit: 322654862
RAC: 0

Looking at it today, it would seem that it is running two jobs instead of one, and the jobs are taking ~65% less time. Looking at the system in general, however, it is not running at 100% usage; it varies over a few cycles from 60% to 80%.
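As a back-of-envelope check on what those numbers mean for output, assuming the ~65% figure is the drop in elapsed time per task while two tasks run side by side (the function name is made up for the example):

```python
def throughput_gain(n_concurrent, time_reduction):
    """Tasks completed per unit time, relative to one task at the old speed."""
    return n_concurrent / (1.0 - time_reduction)


print(round(throughput_gain(2, 0.65), 1))
```

Two tasks each finishing in about 35% of the old time works out to somewhere between five and six times the previous GPU output, if the assumption holds.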

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117868214872
RAC: 34749042

Looking at your tasks list, I see both BRP4G and BRP5 tasks finishing something like 3 to 5 times faster than they were previously. You've certainly got a real spurt in production showing up :-).

Cheers,
Gary.
