At around the same time that O3MD1 tasks started flowing again, I stopped receiving O3MDF tasks for my NVIDIA GPU under Linux. Is anyone else seeing this issue? Any ideas? Host is https://einsteinathome.org/host/12844421
Yes, though my case is a bit odd.
I have three hosts running Einstein, and with the "fixed" application, the one that had errored out all O3 GPU units in early December was now able to run them to completion and validation. Initially all three hosts got a really large fraction of GW tasks relative to BRP tasks, so as a throttle I turned off O3 task download for all but about an hour a day.
But a couple of days ago this started producing zero O3 tasks during the hour I permitted both. The next day, as a test, I turned off BRP permission and still got zero O3 tasks in rather more than an hour. Since gazillions of O3 tasks show as ready to send, it seems something decided my system was in some way unsuitable.
While composing this comment, I've temporarily switched preferences again to request only O3 GPU tasks. I'll see whether any arrive now.
[edit to add observations:
After more than an hour with all three hosts repeatedly requesting only O3 GPU work, zero O3 tasks were sent.
Here are what I imagine are the relevant lines from the work request log from one of those hosts late in this hour:
Quote:
[send] Not using matchmaker scheduling; Not using EDF sim
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] ATI: req 8725.21 sec, 0.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
[send] available disk 58.75 GB, work_buf_min 172800
[send] active_frac 0.869587 on_frac 0.999825 DCF 0.196890
[mixed] sending locality work first (0.0542)
[send] send_old_work() no feasible result older than 336.0 hours
[send] send_old_work() no feasible result younger than 208.7 hours and older than 168.0 hours
[mixed] sending non-locality work second
[send] [HOST#12260865] will accept beta work. Scanning for beta work.
[debug] [HOST#12260865] MSG(high) No work sent
Sending reply to [HOST#12260865]: 0 results, delay req 60.00
Scheduler ran 12.210 seconds
I've decided to request FGRP only again for a while, pending resolution of this situation.]
The same here. No O3MDF tasks for my NVIDIA GPUs for a few days now.
I can confirm this. Still haven't received any O3MDF today.
On my CPU (an AMD 3700X at 45 W, running 8 tasks) the O3MD1 tasks take about 21 hours each.
There was a problem with the project configuration that was fixed just minutes ago. It should work again now.
BM
What is Error 1152 and can I do anything to alleviate it?
https://einsteinathome.org/task/1409447039
You need to look at the first error in the chain; everything after that is just cascading errors as fallout.
Your real issue is that you ran out of VRAM. If you're trying to run 4x tasks, it won't work; there is only enough VRAM on the 3080 Ti for 3x tasks.
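A middle ground between 3x and 1x is to cap concurrency explicitly. BOINC reads a per-project app_config.xml from the Einstein@Home project directory; below is a minimal sketch. The <name> value is only an assumed label for the O3 GPU application, so confirm it against the <app> entries in client_state.xml on your host, and set the numbers to whatever your card's VRAM actually tolerates.

<app_config>
  <app>
    <!-- assumed app name; confirm it in client_state.xml -->
    <name>einstein_O3MDF</name>
    <!-- never run more than 3 of these tasks at once -->
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <!-- each task claims a third of a GPU, i.e. 3 tasks share one card -->
      <gpu_usage>0.33</gpu_usage>
      <!-- reserve one CPU core per GPU task -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

After saving the file, Options -> Read config files in BOINC Manager (or restarting the client) applies it.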
Thanks, I was running 3 tasks at a time but now I'm running just one on all GPU models. So far so good.
This project downloads far too many WUs, so they quickly trigger Running High Priority. That sometimes switches a running WU to Waiting, and with 3 WUs running and one or two Waiting it may have wanted too much VRAM.
If the supply is going to be continuous, it might be a good idea to run in RZM.
I confirm that all three of my hosts received new GW O3 GPU work after the change today. They had not received any since seven days earlier, with the last at 14:26 UTC on January 9.
Aurum wrote: This project downloads far too many WUs, so they quickly trigger Running High Priority.
No, it doesn't. The project tries to supply exactly what the client asks for. Your client needs to stop asking :-).
You have to figure out why the client is asking for so much work that high priority mode gets triggered. Because of exactly the sort of thing you describe, you really, really, really don't want to allow the client to go into high priority mode (panic mode). Things can become really complicated if you run multiple projects, multiple searches per project, and asymmetric resource shares. Perhaps as a first step you might review the work cache size settings to see whether a reduction there lowers the amount of work on hand for Einstein to the point where panic mode is never triggered.
Cheers,
Gary.
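For anyone wanting to act on that suggestion per host rather than through the web preferences, the same cache settings can be overridden locally with a global_prefs_override.xml in the BOINC data directory. A minimal sketch with illustrative values (not a recommendation):

<global_preferences>
  <!-- keep at least this many days of work queued -->
  <work_buf_min_days>0.5</work_buf_min_days>
  <!-- fetch at most this much extra beyond the minimum -->
  <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>

Have the client re-read local preferences (or restart it) after editing. For scale, the scheduler log quoted earlier shows work_buf_min 172800, which is this same minimum buffer expressed in seconds: 172800 / 86400 = 2 days. The client keeps requesting work until it holds that much, so shrinking these values is usually enough to keep Einstein's queue below the panic-mode threshold.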