Sporadic validate errors using Parkes PMPS XT v1.57-BRP6-Beta-cuda55

Keith Myers
Joined: 11 Feb 11
Posts: 4993
Credit: 18844549263
RAC: 5917729

I've found that both BRP4G and BRP6 tasks respond best to GPU memory clock increases, with the BRP6 Beta CUDA55 tasks seeming to benefit the most. So I would try bumping the memory clock up first until you start to get errors, then back it down and try some small bumps to the core clock until you error again.

Sorry about the hijack, I have a bad tendency about that.

 

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Keith,
You might like to try assigning a different project to one GPU using the 'exclude' option in cc_config.xml, then simply suspending the added project. If you are monitoring the GPUs with SIV or OHM, you will see which GPU drops back to idle and which keeps on crunching.
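For reference, a minimal cc_config.xml sketch of that kind of exclusion. The project URL and device number are placeholders for your own setup: the URL must match what BOINC Manager shows for the project, and the device number is BOINC's numbering, not necessarily the driver's.

<cc_config>
  <options>
    <!-- Keep this project's tasks off GPU 1; it still runs on the other GPU(s) -->
    <exclude_gpu>
      <url>https://einsteinathome.org/</url>   <!-- placeholder: use the project URL as attached in BOINC -->
      <device_num>1</device_num>               <!-- placeholder: BOINC's device number for the excluded GPU -->
      <type>NVIDIA</type>                      <!-- optional: GPU vendor -->
    </exclude_gpu>
  </options>
</cc_config>

Restart the client, or use the Manager's "Read config files" option, for the change to take effect.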

BOINC's way of picking out which GPU is which is a PITA, but it can be determined with a bit of work :-)

What I'd really like to know is why no two WUs take even remotely the same time to crunch :-(
I've had one BRP6 WU complete in 45 minutes, the next in 50 minutes, and so on. Every WU is different from every other WU, so working out how long they will take at 2x or 3x doesn't come up with a standard time.

Cliff,

Been there, Done that, Still no damn T Shirt.

archae86
Joined: 6 Dec 05
Posts: 3160
Credit: 7248159726
RAC: 1357364

Quote:
What I'd really like to know is why no two WUs take even remotely the same time to crunch :-(
I've had one BRP6 WU complete in 45 minutes, the next in 50 minutes, and so on. Every WU is different from every other WU, so working out how long they will take at 2x or 3x doesn't come up with a standard time.


It is not a difference in the Einstein BRP6 work units. They have the same computational content, and on appropriately configured systems take remarkably similar times to complete.

When you have multiple tasks running, which task gets serviced next is far from random: it depends on whether the support task running on the CPU is swapped in or not, how the Windows scheduler decides which core to put it on, how "sticky" that assignment is, and so on.

In short, the answer is in the details of your machine and of the Windows scheduler, not in variations in the work units themselves.
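As a rough illustration of how much the scheduler's core choice can matter, here is a hypothetical Python sketch (using the psutil library) that pins a GPU task's CPU support process to one core so the assignment stays "sticky". The process-name substring is a placeholder, not the real binary name; check Task Manager for the actual process before trying anything like this.

# Hypothetical sketch: pin the CPU-side support process of a GPU task to one
# logical core so the Windows scheduler cannot keep moving it around.
import psutil

TARGET_SUBSTRING = "einsteinbinary"   # placeholder substring; check the real process name in Task Manager
CORE = 2                              # logical CPU to pin the support process to

for proc in psutil.process_iter(["name"]):
    name = proc.info.get("name") or ""
    if TARGET_SUBSTRING.lower() in name.lower():
        try:
            proc.cpu_affinity([CORE])   # set CPU affinity (supported on Windows and Linux)
            print(f"Pinned PID {proc.pid} ({name}) to core {CORE}")
        except psutil.AccessDenied:
            print(f"No permission to change affinity of PID {proc.pid}")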

Some Einstein applications in the past have had substantial variation in WU work content, and that is of course also true at some other projects. My answer is narrowly about BRP6.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Archae86,

Well, I am a bit OCD where BOINC is concerned, i.e. I monitor my computer personally and watch it like a hawk :-)

And watching BOINC crunch BRP6 at 1x on a single GPU [EVGA 980 Ti Hybrid], I see one WU complete in xx minutes and nn seconds and then the very next WU complete in xx+n minutes and nn+n seconds, and the difference in timing can be as much as 10 minutes between WUs.

Hence my question: if ALL WUs are precisely the same size, why don't they take the same time to complete? And that's running at 1x on the same GPU, since I use the 'exclude' option in my cc_config to reserve that particular GPU for E@H tasks only and my other 980 Ti for MW@H, whose tasks DO all take exactly the same time to complete, i.e. 31 seconds.

It's just the E@H WUs that vary, and by a significant amount :-( Hence trying to determine a time to complete is next to impossible for any category of GPU task; even BRP4G WUs vary by some amount from one to the next.
So either my computer is unique in the way it works, or there is something else going on; with my version of BOINC the task properties show the throughput rate, and it varies with each WU.

I've been running at 1x for months, as my old CPU was a bit iffy heat-wise, so running a single instance made sense. Now that I have a brand new CPU and heat isn't so much of a problem, I've decided to try out 2x and now 3x on the GPU, but once again I'm bugged by radically different timings.
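For anyone wanting to try the same thing, running 2x on a GPU is normally done with an app_config.xml in the project's folder; a minimal sketch is below. The app name shown is a placeholder, so check the actual name in client_state.xml or on the project's applications page before using it.

<app_config>
  <app>
    <name>einsteinbinary_BRP6</name>     <!-- placeholder: use the app name from client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>         <!-- 0.5 = two tasks share one GPU (0.33 for 3x) -->
      <cpu_usage>0.2</cpu_usage>         <!-- fraction of a CPU budgeted per task for the support process -->
    </gpu_versions>
  </app>
</app_config>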

Cliff,

Been there, Done that, Still no damn T Shirt.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

I'm not entirely sure what projects you are running, but I might suggest picking one single application, e.g. BRP6, and running it only at 1x with no other tasks.

If that is stable, move to running at 2x, and so on.

I would also run CPU-Z and GPU-Z in a window over the length of several tasks; they may reveal some downclocking (heat-related, for example) which will affect times.
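If you would rather log than watch, here is a rough Python sketch along the same lines. It assumes nvidia-smi is on the PATH; the query fields and the five-second interval are just one possible choice.

# Rough sketch: append GPU clocks, temperature and power draw to a CSV every
# few seconds, so any thermal down-clocking during a task shows up in the log.
import subprocess
import time

QUERY = "timestamp,clocks.sm,clocks.mem,temperature.gpu,power.draw"

with open("gpu_log.csv", "a") as log:
    while True:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        log.write(result.stdout)
        log.flush()
        time.sleep(5)   # sample interval in seconds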

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Quote:
When you have multiple tasks running, which task gets serviced next is far from random: it depends on whether the support task running on the CPU is swapped in or not, how the Windows scheduler decides which core to put it on, how "sticky" that assignment is, and so on.

In short, the answer is in the details of your machine and of the Windows scheduler, not in variations in the work units themselves.

Some Einstein applications in the past have had substantial variation in WU work content, and that is of course also true at some other projects. My answer is narrowly about BRP6.


Well, I now have 3 WUs crunching.
PM138_671_44_1 is running 3% faster than the other 2.
PM138_00671_140_0 & PM138_00671_134_0 are running at 55.080% per hour; the faster WU is running at 61.920% per hour.
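To put those rates in perspective, here is a quick back-of-the-envelope conversion from "% per hour" to time per task; this is plain arithmetic on the two figures quoted above, nothing project-specific.

# Convert the two progress rates quoted above into estimated minutes per task.
for rate_percent_per_hour in (55.080, 61.920):
    minutes = 100.0 / rate_percent_per_hour * 60.0
    print(f"{rate_percent_per_hour:.3f} %/h  ->  about {minutes:.0f} minutes per task")
# About 109 minutes at 55.08 %/h versus about 97 minutes at 61.92 %/h,
# i.e. roughly a 12 minute spread between tasks running side by side.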

And that faster WU was suspended for 5 minutes to try to get it to complete at about the same time as the other 2, and it will need to be suspended yet again to get a similar completion time so that the next 3 will load at the same time.

It's bleeding daft :-(

Cliff,

Been there, Done that, Still no damn T Shirt.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Agent B,

Quote:
I'm not entirely sure what projects you are running, but I might suggest picking one single application, e.g. BRP6, and running it only at 1x with no other tasks.

E@H on GPU0 & MW@H on GPU1; I use 'exclude' to assign each project's WUs to a specific GPU.

Quote:
If that is stable, move to running at 2x, and so on.

I ran at 1x for quite a long time, with several GPUs and with 3 AMD CPUs, and the problem has existed since I started crunching E@H tasks.

Quote:
I would also run CPU-Z and GPU-Z in a window over the length of several tasks; they may reveal some downclocking (heat-related, for example) which will affect times.

I use OHM to monitor both CPU and GPU temps; the only O/C is the GTX 980 Ti memory P02 state set with NVI, and that's 3505 MHz on both GPUs.
Running GPU-Z wouldn't give me any more temperature info than OHM, and running yet another program would simply tie up more CPU cycles. OHM also monitors memory speed and GPU temps on the fly.

As for projects, I run E@H and MW@H, each on a separate GPU.

No CPU tasks are being run for any project on the machine in question.

Cliff,

Been there, Done that, Still no damn T Shirt.

mmonnin
Joined: 29 May 16
Posts: 292
Credit: 3444696540
RAC: 1881334

3505 would be your memory speed on a 980 Ti. Is the core speed stable, or is its boost OC fluctuating?

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi mmonnin,

Quote:
3505 would be your memory speed on a 980 Ti. Is the core speed stable, or is its boost OC fluctuating?

Core speeds are stable on both 980 Tis.

I'm using BOINC 7.6.22, and by highlighting each WU in turn and opening its properties I can check the throughput as BOINC sees it; the WUs often differ, and only the odd 2x pair manages to achieve the same throughput.

Dunno exactly what is up, but far too many WUs are being completed at quite different speeds.
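For what it's worth, the same numbers can be sampled from the command line with boinccmd instead of clicking through each task's properties. Below is a rough Python sketch; it assumes boinccmd is on the PATH and that the --get_tasks output uses the "name:" and "fraction done:" labels, which may differ slightly between client versions.

# Rough sketch: sample each running task's "fraction done" twice via boinccmd
# and print the implied progress rate in percent per hour.
import subprocess
import time

def fractions():
    # Parse `boinccmd --get_tasks` output into {task name: fraction done}.
    out = subprocess.run(["boinccmd", "--get_tasks"],
                         capture_output=True, text=True, check=True).stdout
    done = {}
    name = None
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("name: "):
            name = line[len("name: "):]
        elif line.startswith("fraction done: ") and name:
            done[name] = float(line[len("fraction done: "):])
    return done

INTERVAL = 300                      # seconds between the two samples
first = fractions()
time.sleep(INTERVAL)
second = fractions()

for name, frac in second.items():
    if name in first:
        rate = (frac - first[name]) * 100.0 * 3600.0 / INTERVAL
        print(f"{name}: {rate:.2f} % per hour")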

Cliff,

Been there, Done that, Still no damn T Shirt.
