While some things are odd at Einstein right now, one oddity I don't understand is that one of my three machines has not requested work for the past few hours. At first I thought it was because it had too many tasks with upload pending. But over time the other two machines finished enough tasks to have even more pending, and they continue to request work.
I've reviewed the preferences comparison page (both project and computing) and don't see a mismatch among my three machines.
Yet consistently if I click "update project" the log includes this discouraging text:
[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
[send] ATI: req 0.00 sec, 0.00 instances; est delay 0.00
[send] work_req_seconds: 0.00 secs
No, I don't have suspended tasks. I did have a time-of-day restriction set (compute only during a particular period each day), but I have disabled that the same way I did for the other two machines.
I even tried enabling CPU tasks, and clicked a type, but still get zero request both for CPU and for GPU.
I'm all ears for pet ways to get into (and out of) this condition.
Thanks
Copyright © 2024 Einstein@Home. All rights reserved.
Try the logging option work_fetch_debug.
Keith Myers wrote: Try the logging option work_fetch_debug.
Thanks
I tried it the neophyte's way, by going to:
BOINCmgr|Options|Event Log options
I ticked the box for work_fetch_debug and hit Apply, then Save for good measure
Then did a project update.
The log gives two clear indications that excessive uploads is my problem:
8/27/2022 5:23:20 PM | Einstein@Home | [work_fetch] REC 350013.644 prio -1.000 can't request work: too many uploads in progress
is one, and the other is:
8/27/2022 5:23:20 PM | Einstein@Home | skip: too many uploads in progress
8/27/2022 5:23:20 PM | | [work_fetch] No project chosen for work fetch
I remain puzzled as to why the other two machines, which have more tasks in the "upload pending" state than does the troubled one, have nevertheless continued to request work (and sometimes to receive it).
But I am much more hopeful that once the general project problem with pending uploads on MeerKAT tasks is fixed, this machine will probably resume requesting work. I'll leave it alone until then.
Thanks
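archae86 wrote: I remain puzzled as to why the other two machines, which have more tasks in the "upload pending" state than does the troubled one, have nevertheless continued to request work (and sometimes to receive it).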
Well, now a second machine has stopped getting work. This one actually has a message right in the ordinary message log (without a debug flag set) saying "Not requesting tasks: too many uploads in progress".
That second machine currently has 58 MeerKAT tasks uploading, while the first to stop has only 17. I remain a bit puzzled, but plan to be patient, and have good hope that sometime Monday things will probably clear up.
You can try increasing the number of connections allowed per host and per project in the cc_config.xml file.
The default is only 8 allowed connections per host and only 2 per project.
Try setting the max allowed to 32 connections per host and 16 per project.
As an example:
<max_file_xfers>18</max_file_xfers>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
Then re-read your config files in the Manager.
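For context, those two lines belong inside the <options> element of cc_config.xml in the BOINC data directory. Below is a minimal Python sketch, not an official tool, that renders a complete file around the two example values above. The function name render_cc_config is made up for illustration. If a cc_config.xml already exists, the options are better merged into it by hand than overwritten.

```python
import xml.etree.ElementTree as ET

# Skeleton of a cc_config.xml that sets only the two transfer limits.
CC_CONFIG_TEMPLATE = """<cc_config>
  <options>
    <max_file_xfers>{per_host}</max_file_xfers>
    <max_file_xfers_per_project>{per_project}</max_file_xfers_per_project>
  </options>
</cc_config>
"""


def render_cc_config(per_host, per_project):
    """Return cc_config.xml text with the given transfer limits (hypothetical helper)."""
    if per_project > per_host:
        raise ValueError("per-project limit should not exceed the per-host limit")
    text = CC_CONFIG_TEMPLATE.format(per_host=per_host, per_project=per_project)
    ET.fromstring(text)  # sanity check: raises ParseError if the XML is malformed
    return text


if __name__ == "__main__":
    # Values taken from the example above.
    print(render_cc_config(18, 8))
```

The printed text can be pasted into cc_config.xml before re-reading the config files.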
BOINC has a built-in feature: when your hanging uploads reach 2 x the number of CPU cores, you won't be allowed to download any new work from that project. That is a good idea when no uploads are going through, but not so good if the upload problem concerns only one sub-project.
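Harri Liljeroos wrote: Boinc has a built in feature that when your hanging uploads reach the value of 2 x number of CPU cores, you won't be allowed to download any new work from that project.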
Thanks. You have explained the mystery of the difference in failure threshold of my three machines. As it turns out my three machines include a considerable range of "reported" CPU cores, as I upped the reported number on some in order to be able to get adequate daily work issue sometime in the past, and then forgot about it.
reported cores    pending uploads
4                 17
28                58
16                35
Yes, all three are currently not getting any work for this reason. I had all three set to get MeerKAT only. Now, of course, they can't get anything else until the upload problem gets fixed. I hope that may be Monday.
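The rule Harri described can be checked against these numbers. Here is a small Python sketch, assuming the limit is exactly 2 x the reported core count and that work fetch is blocked once pending uploads exceed it. Whether the boundary itself counts as blocked is an assumption, and the function names are illustrative, not BOINC's own.

```python
def upload_limit(ncpus):
    """Upload limit per the rule quoted in the thread: 2 x reported CPU cores."""
    return 2 * ncpus


def work_fetch_blocked(ncpus, pending_uploads):
    """True if pending uploads exceed the limit (strict '>' is an assumption)."""
    return pending_uploads > upload_limit(ncpus)


# (reported cores, pending uploads) for the three machines listed above.
HOSTS = [(4, 17), (28, 58), (16, 35)]

if __name__ == "__main__":
    for ncpus, pending in HOSTS:
        print(ncpus, pending, upload_limit(ncpus), work_fetch_blocked(ncpus, pending))
```

With limits of 8, 56 and 32 respectively, all three machines come out as blocked, which matches what the logs show.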
A short term fix could be to use the ncpus variable in cc_config to assign a large number of CPU cores. Change the computer preferences to not get any MeerKAT BRP7 work before attempting to get more work, to prevent the situation getting worse.
If you run any CPU work, you'll need to adjust the max_concurrent setting for that application in the app_config to prevent too many CPU tasks from running on fake CPU cores.
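As a rough Python sketch of the arithmetic, assuming the 2 x cores upload rule mentioned earlier in the thread and a strict "more than" comparison (an assumption): the first function gives the smallest ncpus value that would lift the limit above a given number of pending uploads, and the second renders an app_config.xml that caps concurrent tasks. The app name example_cpu_app is a placeholder, not a real Einstein@Home application name, and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def min_ncpus_to_unblock(pending_uploads):
    """Smallest ncpus such that 2 * ncpus > pending_uploads."""
    return pending_uploads // 2 + 1


# Skeleton of an app_config.xml limiting one application's concurrent tasks.
APP_CONFIG_TEMPLATE = """<app_config>
  <app>
    <name>{app_name}</name>
    <max_concurrent>{max_concurrent}</max_concurrent>
  </app>
</app_config>
"""


def render_app_config(app_name, max_concurrent):
    """Return app_config.xml text (app_name is whatever the project calls the app)."""
    text = APP_CONFIG_TEMPLATE.format(app_name=app_name, max_concurrent=max_concurrent)
    ET.fromstring(text)  # sanity check: raises ParseError if the XML is malformed
    return text


if __name__ == "__main__":
    for pending in (17, 58, 35):  # pending upload counts reported in the thread
        print(pending, min_ncpus_to_unblock(pending))
    # Placeholder app name; cap set to a hypothetical 4 real cores.
    print(render_app_config("example_cpu_app", 4))
```

For the counts reported in this thread (17, 58 and 35 pending uploads) this gives ncpus values of 9, 30 and 18. The ncpus value itself goes in the <options> section of cc_config.xml, while max_concurrent should stay at or below the number of real cores.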
_________________________________________________________________________
Ian&Steve C. wrote: A short term fix could be to use the ncpus variable in cc_config to assign a large number of CPU cores.
It took me a while to think of it, but I did think of all that (including turning off accepting Beta work, just in case) and was in the process of implementing it at the time you posted.
It worked.
Thanks