Gamma-ray pulsar binary search #1 on GPUs

walton748

Joined: 1 Mar 10

Posts: 94

Credit: 1506564710

RAC: 3208657

24 Jan 2017 20:07:55 UTC

Message 154583 in response to message 154581

Matt,

it may just have failed and reset by itself then, but in my experiments BOINC did not (necessarily) recover. That's why I asked.

As I said, I have not yet experienced what you have. I am rather trying "to get a picture" of the new NVidia card, its new technology (FinFET process), the new Einstein app, and some observations that range from "a bit annoying" to "outright disturbing".

Did you reboot the machine in the meantime? If not, can you restart BOINC and check whether the log messages show that it detects your card?

<edit> Oh, I just realized that you are already doing GPU work, even if for another project, so it must</edit>

Cheers,

Walton

Kailee71

Joined: 22 Nov 16

Posts: 35

Credit: 42623563

RAC: 0

25 Jan 2017 7:28:19 UTC

Message 154599 in response to message 154457

archae86 wrote:

Kailee71, Mad_Max,

I have at times seen cases where one of my cards has generated an error WU and continued to generate errors (usually very fast on the subsequent ones, say 12 seconds) until reboot.

Hi all,

It's unfortunately happened again. https://einsteinathome.org/task/605504851

I would really appreciate it if someone could track down the problem; when it happens, the machine gets put on the naughty step for a whole day :-(

OS X 10.11.6, R9 280X, 2 WUs/GPU, 12 cores available (that's 24 threads...) doing nothing else. This used to be rock-solid...

Many thanks in advance for any pointers,

Kailee.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7224284931

RAC: 1015785

25 Jan 2017 14:23:37 UTC

Message 154611 in response to message 154599

Kai Leibrandt wrote:

Many thanks in advance for any pointers

I think the number one candidate would be to try reducing clock rates (both core clock and memory clock).

Regarding the undesirable zippering effect, in which your machine can dispose of every task in its current queue in a few seconds each, then request more work and burn through that too until it can't get any more for a day: during experimentation or periods of concern, I sometimes get a reasonable amount of work on board for current purposes and then suspend the single task whose deadline is farthest in the future. This limits the damage.
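As a rough illustration of this workaround, the sketch below picks, from a list of queued tasks, the one whose deadline is farthest in the future and builds the command that would suspend it. The `Task` type, the project URL and the task names are hypothetical; the `boinccmd --task <project_url> <task_name> suspend` form is assumed to match the BOINC command-line tool, so check it against your client version (the Suspend button in BOINC Manager does the same job by hand).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Task:
    # Hypothetical minimal view of one queued BOINC task.
    project_url: str
    name: str
    deadline: datetime


def task_to_suspend(queue: list[Task]) -> Task:
    """Return the task whose report deadline is farthest in the future."""
    if not queue:
        raise ValueError("queue is empty")
    return max(queue, key=lambda t: t.deadline)


def suspend_command(task: Task) -> list[str]:
    """Build (but do not run) an assumed boinccmd invocation suspending the task."""
    return ["boinccmd", "--task", task.project_url, task.name, "suspend"]


if __name__ == "__main__":
    url = "https://example.org/project/"  # hypothetical project URL
    queue = [
        Task(url, "task_a", datetime(2017, 2, 8)),
        Task(url, "task_b", datetime(2017, 2, 10)),
        Task(url, "task_c", datetime(2017, 2, 9)),
    ]
    print(suspend_command(task_to_suspend(queue)))
```

Only the command is built here, not executed, so the choice of task can be reviewed before anything is suspended.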

Kailee71

Joined: 22 Nov 16

Posts: 35

Credit: 42623563

RAC: 0

25 Jan 2017 18:19:00 UTC

Message 154624 in response to message 154611

Just a thought: would it not be possible to have BOINC do some sanity checks? I.e., if a certain number of tasks error out, at least give the user the option to automatically stop asking for more work until it's sorted out?
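One way such a sanity check could work is a simple circuit breaker: count consecutive task errors and, once a threshold is reached, stop requesting new work until the user resets it. This is only a sketch of the idea, not existing BOINC behavior; the `ErrorBreaker` class and its default threshold of 5 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorBreaker:
    """Hypothetical client-side guard: stop fetching work after too many
    consecutive task errors, until the user explicitly resets it."""
    max_consecutive_errors: int = 5
    consecutive_errors: int = 0
    tripped: bool = False

    def record_result(self, errored: bool) -> None:
        if errored:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.max_consecutive_errors:
                self.tripped = True
        else:
            # A successful task ends the error streak, but does not
            # un-trip the breaker; only the user can do that.
            self.consecutive_errors = 0

    def may_request_work(self) -> bool:
        return not self.tripped

    def reset(self) -> None:
        """Called by the user once the problem has been sorted out."""
        self.consecutive_errors = 0
        self.tripped = False


if __name__ == "__main__":
    breaker = ErrorBreaker(max_consecutive_errors=3)
    for errored in [False, True, True, True]:
        breaker.record_result(errored)
    print("may request work:", breaker.may_request_work())
```

With a threshold of 3, a run of three quick failures (like the 12-second errors mentioned above) trips the breaker, and only an explicit `reset()` lets the client ask for work again. Counting consecutive errors rather than an overall error rate means an occasional bad task among many good ones never blocks work fetch.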

Re: reducing clock rates: I wouldn't mind doing this, but under OS X the only way to achieve it is by flashing the card's firmware, and I'm not brave enough for that...

Thanks for your thoughts,

Kailee.

TimeLord04

Joined: 8 Sep 06

Posts: 1442

Credit: 72378840

RAC: 0

25 Jan 2017 20:00:04 UTC

Message 154628

@Kailee,

Are you still picking up 1.19 Units??? I have yet to receive even one of them on my MAC. I'm still picking up 1.17 Units.

My system is (hardware-wise) the equivalent of a Mac Pro 3,1 and runs El Capitan 10.11.4. It has 16 GB of dual-channel DDR2 at 800 MHz, one 1 TB Western Digital drive with Mac OS, one 1 TB Western Digital drive with Win 7 Pro x64, and two EVGA GTX 750 Ti SC cards with 2 GB of GDDR5 video RAM each. I have the appropriate Alternate NVIDIA Driver and CUDA Driver for the OS.

Like you, because of Mac OS X, I can neither monitor nor adjust the GPUs' clock speeds or fan speeds.

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Kailee71

Joined: 22 Nov 16

Posts: 35

Credit: 42623563

RAC: 0

26 Jan 2017 3:18:14 UTC

Message 154647 in response to message 154628

TimeLord04 wrote:

@Kailee,

Are you still picking up 1.19 Units??? I have yet to receive even one of them on my MAC. I'm still picking up 1.17 Units.

No, I have only been getting 1.17 since 19 Jan. I only got a few 1.19s and 1.18s, but the error rate was very high. As the 1.17s seem to have the zippering effect too, I now need to keep an eye on those as well. For me, just rebooting is not enough either: I need to reset the project or it will keep erroring out after just a few seconds of work (typically 12-15 s; after a reboot some will run for 200-300 s and then crash, and a project reset then fixes it).

Kailee.

TimeLord04

Joined: 8 Sep 06

Posts: 1442

Credit: 72378840

RAC: 0

26 Jan 2017 6:18:51 UTC

Message 154651 in response to message 154647

Kai Leibrandt wrote:

TimeLord04 wrote:
@Kailee,

Are you still picking up 1.19 Units??? I have yet to receive even one of them on my MAC. I'm still picking up 1.17 Units.

No I have only been getting 1.17 since the 19th of Jan. I only got a few 1.19s and 1.18, but error rate was very high. As the 1.17s seems to have the zippering effect I need to keep an eye on those also now. For me just rebooting is also not enough, I need to reset the project or it will keep erroring out after just a few seconds of work (typically 12-15s, but after a reboot some will run for 200-300s and then crash, and a project reset then fixes it).

Kailee.

For me, the 1.17 Units have been stable since they were introduced. Due to the Mac OS OpenCL bug (noted in my prior posts and brought to light by TBar at SETI), I've had quite a few Invalids show up on my NVIDIA cards. Errors, however, have been 0. At present my Invalids have dropped to 0, and no Inconclusives are showing in Pending Units; however, this could change again at any time. From 1.12 onward Invalids have been prevalent, but 1.17 seems to generate fewer of them (unlike at SETI, where MANY MORE Inconclusives show up and a good portion of them turn into Invalids).

I hope you find an answer soon. I'm also enjoying the greater stability of Mac OS compared with Windows. I just wish someone would come up with a utility to monitor and adjust GPU fan speeds at the least; adjusting clock speeds would also be beneficial, as your case shows. You'd think that, for NVIDIA, it wouldn't be hard to port PrecisionX over; but...

TimeLord04

Alexander Favorsky

Joined: 18 Jun 16

Posts: 36

Credit: 176411761

RAC: 74911

28 Jan 2017 2:48:34 UTC

Message 154733

Hi everyone!

Recently very few FGRPB1G tasks (app version 1.18 Beta for Windows, NVidia) have been sent to me. I receive the following messages: 'No work sent' and 'See scheduler log messages on https://einsteinathome.org/host/12298595/log'. There are also lots of 'Only one Beta app version result per WU' messages in the log.

What does all this mean?

Defender

Joined: 17 Jul 12

Posts: 19

Credit: 316381193

RAC: 83951

28 Jan 2017 6:19:37 UTC

Message 154740

That's because of a general lack of beta WUs: results from the beta app shouldn't be validated against each other, so only one beta result is sent per WU. This has also been described in other threads. Don't worry, it's not your fault.

Proud member of SETI.Germany

CElliott

Joined: 9 Feb 05

Posts: 28

Credit: 1001596436

RAC: 640737

29 Jan 2017 14:12:19 UTC

Message 154792 in response to message 154486

@TimeLord04

Thank you for your detailed help. Although my CPU is great for energy conservation, cost less to buy, and costs less to operate, it does not have hyperthreading. Perhaps because of that, I saw a decline in WUs processed per day when I ran two per GPU. The CPU %, as indicated by BoincTasks, declined from 99.xx to 66.xx, so I had to return to one WU per GPU. Nevertheless, I greatly appreciate the time you took. Thanks again.

Forums › Technical News
