Run only GPU application

marsinph

Joined: 11 Jun 06

Posts: 18

Credit: 72627254

RAC: 0

9 Feb 2018 8:33:02 UTC

Topic 213266


Which apps use only the GPU (or 1 CPU + 1 GPU)?

mmonnin

Joined: 29 May 16

Posts: 292

Credit: 3444726540

RAC: 6889


9 Feb 2018 13:02:56 UTC

Message 164211


Select this app:

"Gamma-ray pulsar binary search #1 (GPU)"

Uncheck the CPU option if you don't want to run CPU tasks as well, and also uncheck this option:

"Run CPU versions of applications for which GPU versions are available:"

Some CPU is needed for E@H GPU apps. At 90% progress, the GPU load drops and calculations are run on the CPU.

P_2

Joined: 7 Jul 17

Posts: 3

Credit: 20018845

RAC: 0


9 Feb 2018 14:47:08 UTC

Message 164216


Greetings. I run GPU-only work: Einstein on my AMD 390X Gaming and Milkyway on my Nvidia 1060. I have set my GPU to 0.25 in the Einstein project settings, so it runs four tasks at a time on the AMD and one on the Nvidia. I have noticed on the AMD that my work product has a substantial number of invalids. What could I change to reduce the invalids and still have a high utilization of the GPUs?

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0


9 Feb 2018 16:20:31 UTC

Message 164220 in response to message 164216


P_2 wrote:

I have noticed on the AMD that my work product has a substantial number of invalids. What could I change to reduce the invalids and still have a high utilization of the GPUs?

You could set it to run only 2 or 1 tasks at a time and see whether that brings down the number of invalids. As you are running Windows on that host, you could also easily monitor the temperatures of those GPUs (with GPU-Z or similar software). That could give some hints about how the cards are handling the load.
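For reference, the link between the fractional GPU setting and the number of tasks that run at once can be sketched as follows. This is only an illustration, assuming the client runs as many tasks as fit into one whole GPU; `tasks_per_gpu` is a hypothetical helper, not part of BOINC or Einstein@Home.

```python
import math


def tasks_per_gpu(gpu_fraction: float) -> int:
    """Number of tasks that fit on one GPU when each task claims gpu_fraction of it.

    Hypothetical helper for illustration only.
    """
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in the range (0, 1]")
    # The small epsilon guards against floating-point results such as 3.9999999.
    return math.floor(1 / gpu_fraction + 1e-9)


# Settings mentioned in this thread: 0.25, 0.33, 0.5 and 1.0.
for fraction in (0.25, 0.33, 0.5, 1.0):
    print(fraction, "->", tasks_per_gpu(fraction), "task(s) at a time")
```

Under that assumption, going from 0.25 to 0.5 or 1.0 is what "run only 2 or 1 tasks at a time" amounts to.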

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119388779128

RAC: 25892084


9 Feb 2018 22:02:54 UTC

Message 164225 in response to message 164211


mmonnin wrote:

... At 90% GPU load drops and calculations are run on the CPU.

The calculations are performed in two stages which take approximately 90% and 10% of the total time respectively. The initial stage (performed in single precision) identifies potential candidate signals in the data. The ten most important candidates are then subjected to a re-evaluation in the 'followup' stage. This is performed in double precision. It is also performed on the GPU (not the CPU) if the GPU has double precision capability.

Most of the GPUs that are able to crunch Einstein tasks would likely have some level of double precision capability so I imagine that performing the followup stage on the CPU would be a relatively rare event. There are big tables in Wikipedia for both nvidia and AMD where you can check the double precision capability of any particular GPU.

Cheers,
Gary.

P_2

Joined: 7 Jul 17

Posts: 3

Credit: 20018845

RAC: 0


17 Feb 2018 6:13:36 UTC

Message 164329 in response to message 164220


Thanks for your reply. I have my preferences setting for the GPUs at 0.33, which runs three tasks (0.33 x 3 = 0.99 of GPU capacity). My temperatures are 167 degrees F for the AMD and 126 F for the CPU. These two (7 virtual cores of the CPU and 0.99 of the AMD GPU) run Einstein exclusively. The 1060 is set to 1.0, runs at 132.6 F, and uses 0.667 of a CPU to run Milkyway exclusively, using less than 5% of the GPU's VRAM. Overall, the temperatures for the system (motherboard, M.2 SSDs) are never above 120 F. So all my components are within operating range and, in my view, are not accounting for the invalids. But a good question.

I have used this setup successfully in other applications, running both GPUs at much higher VRAM usage and temperatures for extended durations.

What is a reasonable/acceptable % of invalids that the system expects across hundreds of setups? Maybe the admin could post this, or can I see that % by looking up other users' account info? I haven't tried that, but maybe I'm clicking along just fine. I am nearly at 15 million for the project.
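Since other posts in this thread quote temperatures in Celsius, a small conversion sketch may help when comparing the Fahrenheit readings in this post. The labels in the dictionary are descriptive names chosen here for illustration; the values are the ones quoted above.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


# Readings quoted in the post above (labels are illustrative).
readings_f = {
    "AMD 390X GPU": 167.0,
    "CPU": 126.0,
    "Nvidia 1060 GPU": 132.6,
    "Motherboard / M.2 SSD ceiling": 120.0,
}

for name, deg_f in readings_f.items():
    print(f"{name}: {deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")
```

For example, 167 F works out to 75 C.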

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7372621687

RAC: 2182223


17 Feb 2018 14:04:20 UTC

Message 164330 in response to message 164329


P_2 wrote:

What is a reasonable/acceptable % of invalids that the system expects across hundreds of setups? Maybe the admin could post this, or can I see that % by looking up other users' account info? I haven't tried that, but maybe I'm clicking along just fine. I am nearly at 15 million for the project.

Something on the order of 0.5% invalid, with appreciable fluctuation, seems to be normal behavior here. I don't think anyone really understands why.

Your system is not "clicking along just fine". Reviewing the Status column at:

https://einsteinathome.org/host/12621048/tasks/5/0?sort=desc&order=Sent

We see many entries of "Completed, marked as invalid". That is the usual notation when your returned work unit was compared with that of a quorum partner and the two differed too much to be accepted, so a tie-breaker unit was sent to a third host, and that result agreed better with your original partner's than with yours.

But in your case we also see many entries marked "Validate error". In this case your returned work flunked a sanity check which is performed after the quorum is ready for checking, but before comparison between units.

On healthy systems, the ratio of Invalid results to Valid results showing on the tasks page runs a few tenths of a percent, but zero of the invalid results are "validate error". On your system at a snapshot as I type the ratio of invalid results to valid results is 21%. Worse yet, I count 31 "validate error" units since the first of February (and some in that period will already have disappeared from the log because the WU has been cleared by other successful returns).
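The comparison above can be written out as a small check. This is a rough sketch, not anything the project itself runs: the 1% limit is an assumed cut-off standing in for "a few tenths of a percent", and the task counts in the example are hypothetical, chosen only to reproduce the 21% ratio.

```python
# Assumed cut-off for illustration; the post only says healthy hosts
# show "a few tenths of a percent".
HEALTHY_RATIO_LIMIT = 0.01


def invalid_to_valid_ratio(valid: int, invalid: int) -> float:
    """Ratio of invalid results to valid results on a host's task list."""
    if valid <= 0:
        raise ValueError("need at least one valid result to form a ratio")
    return invalid / valid


def looks_healthy(valid: int, invalid: int, validate_errors: int) -> bool:
    """Healthy here means: no validate errors and a low invalid/valid ratio."""
    ratio = invalid_to_valid_ratio(valid, invalid)
    return validate_errors == 0 and ratio <= HEALTHY_RATIO_LIMIT


# Hypothetical counts that give the 21% ratio, with the 31 validate errors.
print(f"{invalid_to_valid_ratio(100, 21):.0%}", looks_healthy(100, 21, 31))
```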

Your system is not healthy. The easiest way to get that result is to have the GPU running faster than it is able successfully to process the work under existing conditions, where "existing conditions" includes the specific application being run, the data presented to that application, the system temperature, the health of the GPU's own cooling provision, settings for fan speeds, settings for clock speeds, adequacy and good behavior of the system power supply, etc.

My personal suggestion is that you employ MSI Afterburner, or some other overclocking tool of your choice, to reduce your core clock and memory clock rates by 10% and monitor results for a day. If you see a drastic reduction in the rate of production of both types of invalid results, then you have confirmation that you are in fact running faster than you can--and you can work on the details.

Mind you--I suggest reducing the actual clock rates by 10%, not the overclock (if any). My advice applies even if you are already underclocking (though I doubt that).
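To make the distinction concrete, here is a small sketch comparing the two readings of "reduce by 10%". The clock values are hypothetical round numbers, not those of any particular card, and both functions are illustrative helpers only.

```python
def reduce_actual_clock(actual_mhz: int, percent: int) -> int:
    """Cut the clock the card is really running at by the given percentage."""
    return actual_mhz * (100 - percent) // 100


def reduce_overclock_only(stock_mhz: int, actual_mhz: int, percent: int) -> int:
    """Cut only the overclock margin above stock by the given percentage."""
    overclock = actual_mhz - stock_mhz
    return stock_mhz + overclock * (100 - percent) // 100


# Hypothetical card: 1000 MHz stock, currently running at 1100 MHz.
stock, actual = 1000, 1100
print(reduce_actual_clock(actual, 10))           # 990 MHz: the suggested test
print(reduce_overclock_only(stock, actual, 10))  # 1090 MHz: barely any change
```

The first figure is the kind of reduction meant here; the second leaves the card running almost as fast as before.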

archae86

Joined: 6 Dec 05

Posts: 3164

Credit: 7372621687

RAC: 2182223


17 Feb 2018 15:08:01 UTC

Message 164331


In giving a rough typical invalid rate of 0.5%, I should have specified that I was speaking of Einstein GPU applications. It has been a while since I've been involved in Einstein CPU application work, but I recall that healthy systems produced essentially zero invalid results.

P_2

Joined: 7 Jul 17

Posts: 3

Credit: 20018845

RAC: 0


22 Feb 2018 4:00:33 UTC

Message 164373 in response to message 164331


Thank you so much for your extended reply. I have since changed the ratio from 0.25 to 0.33 and then finally, today, to 0.5, and I am considering going to 1.0. I use a Gigabyte-specific underclock, which I was running at about 5%. My temperatures now are 61 C for the 390X and 45 C for the 1060.

I will reduce this further by going to 1.0 tonight since I reread your post.

Tom M

Joined: 2 Feb 06

Posts: 6820

Credit: 9752931400

RAC: 2455447


26 Jul 2020 14:19:25 UTC

Message 179194 in response to message 164225


Gary Roberts wrote:

mmonnin wrote:

... At 90% GPU load drops and calculations are run on the CPU.

The calculations are performed in two stages which take approximately 90% and 10% of the total time respectively. The initial stage (performed in single precision) identifies potential candidate signals in the data. The ten most important candidates are then subjected to a re-evaluation in the 'followup' stage. This is performed in double precision. It is also performed on the GPU (not the CPU) if the GPU has double precision capability.

Most of the GPUs that are able to crunch Einstein tasks would likely have some level of double precision capability so I imagine that performing the followup stage on the CPU would be a relatively rare event. There are big tables in Wikipedia for both nvidia and AMD where you can check the double precision capability of any particular GPU.

Gary,

I ran across an AMD video card with RX 470-class single-precision speed but huge double-precision capacity (much higher than the RX 470). I am guessing that speeding up the last 10% would not be a major time gain? (Currently, all my AMD cards snap from 90% to 100% practically instantly on GR #1 GPU tasks.)

Tom M

A Proud member of the O.F.A. (Old Farts Association).

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119388779128

RAC: 25892084


26 Jul 2020 20:10:49 UTC

Message 179198 in response to message 179194


Tom M wrote:

... I am guessing that speeding up the last 10% would not be a major time gain?

The comment you quoted was written a long time ago when the app behaved differently to what happens today. If I remember correctly, the followup stage used to take around 20-40 seconds. These days there is hardly any delay at all - as you have noted.

A high double precision capability will have essentially no effect on crunch time.
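A rough way to put numbers on this is the usual serial-fraction estimate (Amdahl's law). The sketch below is illustrative only: `followup_share` is the assumed fraction of run time spent in the followup stage, and the 10% figure is the old split quoted above, not a measurement of the current app.

```python
def overall_speedup(followup_share: float, followup_speedup: float) -> float:
    """Whole-task speedup when only the followup stage gets faster.

    followup_share: fraction of total run time spent in the followup stage.
    followup_speedup: factor by which that stage alone is accelerated.
    """
    if not 0 <= followup_share <= 1:
        raise ValueError("followup_share must be between 0 and 1")
    if followup_speedup <= 0:
        raise ValueError("followup_speedup must be positive")
    main_share = 1.0 - followup_share
    return 1.0 / (main_share + followup_share / followup_speedup)


# Old 90%/10% split with a 10x faster double-precision stage.
print(f"{overall_speedup(0.10, 10):.3f}x")
# Even an infinitely fast followup stage cannot beat 1 / 0.9.
print(f"{overall_speedup(0.10, float('inf')):.3f}x")
```

With the followup stage now taking hardly any time, its share is close to zero and the possible gain shrinks toward 1.0x, in line with the statement above.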

Cheers,
Gary.
