Matt,
it may just have failed and reset by itself then, but from what I have seen, BOINC would not (necessarily) recover on its own. That's why I asked.
As I said, I have not yet experienced what you have; I am rather trying "to get a picture" of the new NVidia card, the new technology in it (the FinFET process), the new Einstein app, and some observations that range from "a bit annoying" to "outright disturbing".
Did you reboot the machine in the meantime? If not, can you restart BOINC and check the log messages to see whether it detects your card?
<edit> Oh, I just realized that you already do GPU work, even if for another project, so the card must be getting detected.</edit>
Cheers,
Walton
Hi all,
It's unfortunately happened again. https://einsteinathome.org/task/605504851
Would really appreciate it if someone could track down the problem; when it happens, that machine gets put on the naughty step for a whole day :-(
OSX 10.11.6, R9 280x, 2 WU/GPU, 12 cores available (that's 24 threads...) doing nothing else. This used to be rock solid...
Many thanks in advance for any pointers,
Kailee.
Kailee71,
I have at times seen cases where one of my cards has generated an error WU and then continued to generate errors (usually very quickly on the subsequent ones, in around 12 seconds each) until a reboot.
I think the number one candidate would be to try reducing clock rates (both core clock and memory clock).
Regarding the undesirable zippering effect, in which your machine disposes of its entire current queue in a few seconds per task, requests more work, and burns through that too until it can't get any more for a day: during experimentation or periods of concern, I sometimes get a reasonable amount of work on board for current purposes and then place a suspend on the single task with the deadline farthest in the future. This limits the damage.
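The damage-limiting trick above can be scripted. The sketch below is only an illustration, not anything BOINC ships: it assumes simplified `boinccmd --get_tasks` output containing `name:` and `report deadline:` lines with the deadline printed as a plain number (the actual format varies by client version), and it merely picks the task you would then suspend yourself with `boinccmd --task <project_url> <task_name> suspend`.

```python
import re

def farthest_deadline_task(get_tasks_output):
    """Pick the task whose report deadline is farthest in the future
    from (simplified) `boinccmd --get_tasks` text."""
    tasks = []
    name = None
    for line in get_tasks_output.splitlines():
        line = line.strip()
        m = re.match(r"name:\s*(\S+)", line)
        if m:
            name = m.group(1)
            continue
        m = re.match(r"report deadline:\s*([\d.]+)", line)
        if m and name is not None:
            # assumes the deadline is printed as a numeric timestamp
            tasks.append((name, float(m.group(1))))
            name = None
    # the task you would suspend to limit zippering damage
    return max(tasks, key=lambda t: t[1])[0] if tasks else None
```

Task names in any example input are invented; the actual suspend step is deliberately left as a manual `boinccmd --task` call.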
Just a thought - would it not be possible to have BOINC do some sanity checks? I.e., if a certain number of tasks error out, at least give the user the option to automatically stop asking for more work until it's sorted out?
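The sanity check suggested here could behave like a simple circuit breaker. A toy sketch of the idea only, not actual BOINC code (the class, method, and threshold names are all invented):

```python
class WorkFetchGuard:
    """Stop asking the server for more work after too many
    consecutive task errors, until the user resets the guard."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.consecutive_errors = 0

    def record_result(self, ok):
        # a good task ends the error streak; a bad one extends it
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1

    def may_fetch_work(self):
        # trips once the streak reaches the threshold
        return self.consecutive_errors < self.threshold

    def reset(self):
        # the user's "it's sorted out now" button
        self.consecutive_errors = 0
```

A host zippering through its queue at 12 seconds per error would trip the guard after a handful of tasks instead of burning its whole daily quota.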
Re: reducing clock rates - I wouldn't mind doing this, but under OSX the only way to achieve it is via flashing, and I'm not brave enough for that...
Thanks for your thoughts,
Kailee.
@Kailee,
Are you still picking up 1.19 Units? I have yet to receive even one of them on my Mac; I'm still picking up 1.17 Units.
My system is, hardware-wise, the equivalent of a Mac Pro 3,1, and is on El Capitan 10.11.4. I have 16 GB of DDR2 at 800 MHz in dual channel, one 1 TB Western Digital drive with Mac OS, one 1 TB Western Digital drive with Win 7 Pro x64, and two EVGA GTX 750 Ti SC cards with 2 GB of GDDR5 video RAM each. I have the appropriate alternate NVIDIA driver and CUDA driver for the OS.
Like you (because of Mac OSX), I cannot monitor or manipulate clock speeds or fan speeds for the GPUs.
TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees
No, I have only been getting 1.17 since the 19th of Jan. I only got a few 1.19s and 1.18s, but the error rate was very high. As the 1.17s seem to have the zippering effect too, I need to keep an eye on those now as well. For me, just rebooting is also not enough; I need to reset the project, or it will keep erroring out after just a few seconds of work (typically 12-15 s, though after a reboot some will run for 200-300 s and then crash, and a project reset then fixes it).
Kailee.
For me, the 1.17 Units have been stable since their inception. Due to the Mac OS OpenCL bug (noted in my prior posts, and brought to light by TBar at SETI), I've had quite a few Invalids show up on my NVIDIA cards; Errors, however, have been 0. At present my Invalids have dropped to 0 and no Inconclusives are showing in Pending Units, though this could change again at any time. From 1.12 onward Invalids have been prevalent, but 1.17 seems to generate fewer of them. (Unlike at SETI, where many more Inconclusives show up and a good portion of them turn into Invalids.)
I hope you find an answer soon. I'm also enjoying the higher stability of Mac OS over Windows. I just wish they'd come up with a utility to monitor and adjust GPU fan speeds at the least; clock speed control would also be beneficial, as your case shows. You'd think (for NVIDIA) that it wouldn't be hard to port PrecisionX over; but...
TL
TimeLord04
Hi everyone!
Recently, very few FGRPB1G tasks (version 1.18 Beta for Windows, NVidia) have been sent to me. I receive the following messages: 'No work sent' and 'See scheduler log messages on https://einsteinathome.org/host/12298595/log'. There are also lots of 'Only one Beta app version result per WU' messages in the log.
What does all this mean?
That's because of a general shortage of beta WUs; results from two beta app versions shouldn't be validated against each other, hence the 'Only one Beta app version result per WU' message. This has also been described in other threads. Don't worry, it's not your fault.
Proud member of SETI.Germany
@TimeLord04
Thank you for your detailed help. Although it is great for energy conservation, costs less to buy, and costs less to operate, my CPU does not have hyperthreading. Perhaps because of that, I saw a decline in WUs processed per day when I ran two per GPU; the CPU %, as indicated by BoincTasks, declined from 99.xx to 66.xx, and I had to return to one WU per GPU. Nevertheless, I greatly appreciate the time you took. Thanks again.