Help! Are my GPU cards starting to fail?

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1069205892
RAC: 852151
Topic 222387

Within the past month, two of my boxes (ID: 8183504, ID: 10810284) with almost identical GPU cards (NVIDIA GTX-760) have started to crash and burn on GPU work. They are overclocked using MSI Afterburner and had been operating well until now. I had not made any software or hardware changes to the systems for months leading up to the crashes. The first couple of crashes wiped out ALL waiting GPU work for both Einstein and Milky Way.

I have been decreasing the core and memory overclock speeds, and the situation is getting better but has not gone away. Right now I only seem to be crashing out on some, but not all, 2.07 Gravitational Wave work units. GPU temperatures are now in the mid-50C range when running those work units and lower (low-50C range) on other work units from Einstein and Milky Way. Before, the temps were in the high-50C and mid-50C ranges, respectively.

I have an app_config file on both computers so that both current Einstein GPU work units (1.22 Gamma Ray and 2.07 Grav. Wave) run with one full CPU core and alone on the GPU. I do run two 1.46 MW@H Separation GPU work units concurrently. After SETI@Home closed down, I switched to MW@H. Just this past week, I updated MSI Afterburner to the latest revision (it was only one revision behind) and updated the GPU cards to the latest NVIDIA drivers (I hadn't updated in several months).

I'm not sure what additional information I need to supply to help.
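
For readers following along, the Einstein side of the app_config setup described above might look roughly like the sketch below. This is not the poster's actual file. The GW app name `einstein_O2MDF` is the one quoted later in this thread; the gamma-ray app name `hsgamma_FGRPB1G` is an assumption and should be checked against client_state.xml or the task list before use.

```xml
<!-- Sketch of an app_config.xml placed in the Einstein@Home project directory. -->
<app_config>
  <app>
    <!-- Gravitational Wave GPU app; name as quoted later in this thread -->
    <name>einstein_O2MDF</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage> <!-- one task at a time per GPU -->
      <cpu_usage>1.0</cpu_usage> <!-- reserve one full CPU core per task -->
    </gpu_versions>
  </app>
  <app>
    <!-- Gamma-ray pulsar GPU app; this name is an assumption, verify locally -->
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Running two MW@H tasks concurrently would be the same pattern in that project's own app_config.xml with `<gpu_usage>0.5</gpu_usage>`. Note that these settings only control how many tasks share a GPU; they do not reduce how much card memory a single task needs.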

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Your GTX 760 cards have 2GB of memory. That is not enough for many of the new GW tasks, which is why they keep crashing. A 4GB card would be required.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1069205892
RAC: 852151

So, the GPU cards are not failing, per se. It's just that they don't have enough memory to run the newer GW tasks?

I was looking at possibly replacing them with a Radeon RX580 or RX5500, both with 8 GB memory. 

Or, I do have additional PCI-E slots on my MB. If I just added the Radeon boards to my system, is there a way to restrict the GW tasks, or all Einstein tasks, to the Radeon board? I have an 850-watt PS in both boxes.

Any thoughts on either card regarding its usefulness crunching Einstein and MW@H tasks?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3977
Credit: 47387282642
RAC: 65306280

put the 760's in a computer that has preferences set to not accept the GW tasks. run the gamma ray tasks only. they should do fine there, if you don't mind the power draw of such old cards.

mikey
Joined: 22 Jan 05
Posts: 12715
Credit: 1839117411
RAC: 3616

Ron Kosinski wrote:

So, the GPU cards are not failing, per se. It's just that they don't have enough memory to run the newer GW tasks?

I was looking at possibly replacing them with a Radeon RX580 or RX5500, both with 8 GB memory. 

Or, I do have additional PCI-E slots on my MB. If I just added the Radeon boards to my system, is there a way to restrict the GW tasks, or all Einstein tasks, to the Radeon board? I have an 850-watt PS in both boxes.

Any thoughts on either card regarding its usefulness crunching Einstein and MW@H tasks?

Yes, you can set up exceptions, for example through an app_config.xml file.

What I did was put my 760s in one machine and the ATI GPUs in a different machine; that way I did not have to do exceptions.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1069205892
RAC: 852151

Everyone,

Thank you for all the help and ideas.  I will put the NVIDIA cards in one box and the Radeon cards in the other box.

Would this work in the app_config file for the box with the NVIDIA cards:

<exclude_gpu>
   [<type>NVIDIA</type>]
   [<app>einstein_O2MDF</app>]
</exclude_gpu>

Or would it be better to create a separate Project Preference for the box with the NVIDIA card and un-check the GW GPU project?

Any preference between the two Radeon cards: RX580 or RX5500? They are both around the same price on Amazon now. I was looking at cards from XFX.
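
On the `<exclude_gpu>` question above: as far as I know, `<exclude_gpu>` is a cc_config.xml setting (inside `<options>`) and not an app_config.xml one, and the square brackets in the BOINC documentation only mark optional elements, so they should not be typed literally. A minimal sketch, assuming the project URL quoted later in this thread (it has to match the URL the client is actually attached with) and the app name from the snippet above:

```xml
<!-- Sketch of a cc_config.xml in the BOINC data directory. -->
<cc_config>
  <options>
    <exclude_gpu>
      <!-- Assumption: must match the attached project URL exactly -->
      <url>http://einstein.phys.uwm.edu/</url>
      <!-- Optional: limit the exclusion to NVIDIA GPUs -->
      <type>NVIDIA</type>
      <!-- Optional: limit the exclusion to the GW app -->
      <app>einstein_O2MDF</app>
    </exclude_gpu>
  </options>
</cc_config>
```

The client only picks up cc_config.xml changes after it re-reads its config files or is restarted. Un-checking the GW application in a separate project preference set achieves the same result on the server side, so the tasks are never sent to that box in the first place.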

mikey
Joined: 22 Jan 05
Posts: 12715
Credit: 1839117411
RAC: 3616

Ron Kosinski wrote:

Any preference between the two Radeon cards: RX580 or RX5500? They are both around the same price on Amazon now. I was looking at cards from XFX.

Here's some info on your basic choices:

Name                 GPU Clock   Boost Clock   Memory Clock
AMD Radeon RX 580    1257 MHz    1266 MHz      2000 MHz
HP RX 5500           1500 MHz    1845 MHz      1750 MHz

https://gpu.userbenchmark.com/Compare/AMD-RX-580-vs-AMD-RX-5500/3923vs4059

To me the key is the long term, because today's crunching is different from tomorrow's crunching. Unless you get one at a VERY VERY good price, you should be thinking about long-term usage.

Jim Brossard
Joined: 4 Apr 20
Posts: 8
Credit: 306115350
RAC: 362936

It turns out all of my GPU cards have 2 GB of RAM.

If I'm reading this correctly, I should un-select any Gravitational Wave applications in settings, but Binary Radio and Gamma-ray Pulsar applications are okay to run?

Regards,

Jim

Keith Myers
Joined: 11 Feb 11
Posts: 4979
Credit: 18797080493
RAC: 7791781

To be absolutely safe from generating errors on the 2GB cards, you should not crunch GW GPU tasks, as there are occasional ones that need more than 3GB of card memory. Those will fail on cards that have less than 4GB of memory, but they come in spurts depending on the frequency band. Your failed tasks just get resent to someone else.

Jim Brossard
Joined: 4 Apr 20
Posts: 8
Credit: 306115350
RAC: 362936

I looked at the wrong box...

It turns out one system has a 4 GB AMD graphics card, and the other a 2 GB NVIDIA graphics card.

On the 2 GB system I created a cc_config.xml file in C:\ProgramData\BOINC containing the following:

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <app>einstein_O2MDF</app>
        </exclude_gpu>
    </options>
</cc_config>

I hope that is correct.

Regards,

Jim

mikey
Joined: 22 Jan 05
Posts: 12715
Credit: 1839117411
RAC: 3616

James wrote:

I looked at the wrong box...

It turns out one system has a 4 GB AMD graphics card, and the other a 2 GB NVIDIA graphics card.

On the 2 GB system I created a cc_config.xml file in C:\ProgramData\BOINC containing the following:

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <app>einstein_O2MDF</app>
        </exclude_gpu>
    </options>
</cc_config>

I hope that is correct.

Regards,

Jim

It would be a lot easier, I think, to put each box in a different venue (i.e. home, work, school) and set the tasks allowed for each venue. No name changes messing things up either.
