Many problems involving the work fetch decisions made by the scheduling server can be diagnosed more readily by consulting the scheduler logs. For new and/or casual volunteers, just knowing the logs exist, let alone actually finding and using them, is probably a non-trivial exercise. Interpreting the confusingly cryptic entries themselves is a further complication. With this message, I'm attempting to make all of this a little easier if possible :-).
The logs themselves are really a tool for the staff to use in their day-to-day work; they have very kindly made them accessible to all and sundry. The entries are necessarily cryptic, and in many cases you will ignore most of the content and look for the one key entry that explains the cause of your particular difficulty. If you have just a small number of computers, you can find the server log for a particular host by starting at your account -> dashboard and clicking on the link for the host in question that you will see there. This brings up a 'details' page for that host, where you will see a link next to the heading "Last time contacted server:". Follow that link to see the full log entry. The text of the link itself is the date and time of that particular contact event.
If you have a lot of hosts (more than about seven) you should start with the "MORE" link at the bottom of the dashboard list so that you can find all your hosts in pages of 20 at a time. In this full list, the far right hand column is called "Last Contact" and contains the server log links for all hosts. Just click the link for the host you're interested in.
Now that you know that logs exist and how you can see them, let's look at some interpretation. I've chosen an 'easy to interpret' log relating to a question asked on the "Problems and bug reports" board quite recently. The purpose here is just to show some basic things to look for in the log itself so the actual question posted isn't important. Here are the first four records in full to show all the fields. The first 3 fields in each record are the date, time, and process ID (PID) of the scheduler instance making the log entries. These fields are not needed for interpretation so the full log will be reproduced later with them edited out for simplicity. For ease of reference I've inserted a record number at the start of each record.
2017-01-14 15:58:25.9427 [PID=28383] Request: [USER#xxxxx] [HOST#12211894] [IP xxx.xxx.xxx.147] client 7.6.33
2017-01-14 15:58:26.0802 [PID=28383] [debug] have_master:1 have_working: 1 have_db: 1
2017-01-14 15:58:26.0802 [PID=28383] [debug] using working prefs
2017-01-14 15:58:26.0802 [PID=28383] [debug] have db 1; dbmod 1484401578.000000; global mod 1484401578.000000
....
So now here is the full log, minus the first 3 fields, but complete with the added record numbers. I've tried to format it so that there's no line wrapping occurring - I'm not using [ code ] tags this time, for example, so as not to overflow any lines.
1. Request: [USER#xxxxx] [HOST#12211894] [IP xxx.xxx.xxx.147] client 7.6.33
2. [debug] have_master:1 have_working: 1 have_db: 1
3. [debug] using working prefs
4. [debug] have db 1; dbmod 1484401578.000000; global mod 1484401578.000000
5. [send] effective_ncpus 3 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
6. [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
7. [send] Not using matchmaker scheduling; Not using EDF sim
8. [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
9. [send] CUDA: req 14550.53 sec, 0.17 instances; est delay 0.00
10. [send] work_req_seconds: 0.00 secs
11. [send] available disk 17.44 GB, work_buf_min 17280
12. [send] active_frac 0.999988 on_frac 0.991793 DCF 1.000000
13. [send] [HOST#12211894] not reliable; max_result_day 31
14. [send] set_trust: random choice for error rate 0.000023: yes
15. [mixed] sending locality work first (0.7617)
16. [mixed] sending non-locality work second
17. [version] Checking plan class 'FGRPopencl-ati'
18. [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
19. [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
20. [version] No ATI devices found
21. [version] Checking plan class 'FGRPopencl-Beta-ati'
22. [version] beta test app versions not allowed in project prefs.
23. [version] Checking plan class 'FGRPopencl-Beta-nvidia'
24. [version] beta test app versions not allowed in project prefs.
25. [version] Checking plan class 'FGRPopencl-nvidia'
26. [version] parsed project prefs setting 'gpu_util_fgrp': 1.000000
27. [version] NVidia device (or driver) doesn't support OpenCL
28. [version] no app version available: APP#40 (hsgamma_FGRPB1G) PLATFORM#9 (windows_x86_64) min_version 0
29. [version] no app version available: APP#40 (hsgamma_FGRPB1G) PLATFORM#2 (windows_intelx86) min_version 0
30. [debug] [HOST#12211894] MSG(high) No work sent
31. [debug] [HOST#12211894] MSG(high) No work is available for Binary Radio Pulsar Search (Arecibo, GPU)
32. [debug] [HOST#12211894] MSG(high) see scheduler log messages on https://einsteinathome.org/host/12211894/log
33. Sending reply to [HOST#12211894]: 0 results, delay req 60.00
34. Scheduler ran 0.170 seconds
Now for some interpretation. I should point out that there is no 'Technical Reference Manual' to consult, so all of what follows is (hopefully educated) guesswork on my part :-). I would welcome corrections from anyone (e.g. staff) who knows better. Many of the records have an initial 'flag value', e.g. [debug], [send], [version], etc. I think these just indicate the particular section of the code making the entry and what its purpose is. These are my guesses:-
Finally, some extra comments about the [send] flag. Records #5 and #6 show how many CPU cores (effective_ncpus) and how many co-processor devices (effective_ngpus) BOINC has permission to use. If these don't match your actual hardware, chances are a preference setting somewhere is interfering - the above machine is a quad core but the log shows effective_ncpus=3, probably because BOINC has been restricted to 75% of the CPU cores. This area also shows exactly how many seconds of work the BOINC client is requesting - 0.00 secs of CPU work (#8) and 14550.53 secs of GPU work (#9). I guess the 'CUDA' label is a hangover from the past, when all NVIDIA GPUs ran apps that used CUDA rather than OpenCL. I don't know why #10 shows 0.00 secs; maybe 'work_req_seconds' just excludes any GPU request.
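If you'd like to pull these [send] numbers out of a log mechanically, here is a minimal, hypothetical Python sketch (not an official BOINC tool). The patterns simply match the formats visible in the log reproduced above; they are my assumption, not a documented log specification:

```python
import re

# Hypothetical helper: extract a few [send] values from scheduler log records.
# The regexes match the formats seen in the example log above; they are not
# based on any official specification of the log format.
SEND_PATTERNS = {
    "effective_ncpus": re.compile(r"effective_ncpus (\d+)"),
    "effective_ngpus": re.compile(r"effective_ngpus (\d+)"),
    "cpu_req_seconds": re.compile(r"CPU: req ([\d.]+) sec"),
    "gpu_req_seconds": re.compile(r"CUDA: req ([\d.]+) sec"),
}

def parse_send_records(lines):
    """Return a dict of the values found in [send] records."""
    found = {}
    for line in lines:
        if "[send]" not in line:
            continue
        for name, pattern in SEND_PATTERNS.items():
            match = pattern.search(line)
            if match:
                found[name] = float(match.group(1))
    return found

# Records #5, #6, #8 and #9 from the log above.
log = [
    "[send] effective_ncpus 3 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999",
    "[send] effective_ngpus 1 max_jobs_on_host_gpu 999999",
    "[send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00",
    "[send] CUDA: req 14550.53 sec, 0.17 instances; est delay 0.00",
]
print(parse_send_records(log))
```

For the quad-core example, a 75% 'use at most % of the CPUs' preference would give floor(4 * 0.75) = 3 usable cores, which matches the effective_ncpus 3 in record #5.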
As I mentioned, this is a pretty straightforward scheduler log. If you have examples of more complex entries that you can't interpret, please feel free to post them and maybe a staff member might comment if other volunteers can't work out what is going on. Hopefully, people getting started might find this useful.
Cheers,
Gary.
Thank you for that tip! New users won't have to care about it, whereas for others interesting details may be disclosed.
Somewhere one may find a _trusted_ tag. With that, the server seems to categorise a machine as either trusted or not trusted. Currently, for FGRPB1 tasks, the threshold value appears to be 0.05: apparently task results are no longer routinely checked by a second machine once the first machine produces fewer than one invalid result per 20 valid ones. That's quite a speedup for the pulsar search project.
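To make the arithmetic explicit (this is my reading of the logs, not documented behaviour): record #14 above shows an error rate of 0.000023, and one invalid per 20 valid results corresponds to a rate of 1/20 = 0.05, so that host falls well under the apparent trust threshold. A tiny sketch of the comparison:

```python
# Sketch of the trust check as I understand it. The 0.05 threshold is an
# observation from the logs, not a documented constant, and the real server
# also applies a random choice (see record #14's "random choice ... yes").
TRUST_THRESHOLD = 1 / 20  # one invalid per 20 valid results = 0.05

def is_trusted(error_rate, threshold=TRUST_THRESHOLD):
    """A host qualifies as trusted when its error rate is below the threshold."""
    return error_rate < threshold

print(is_trusted(0.000023))  # the host from record #14: True
print(is_trusted(0.06))      # a host with too many invalid results: False
```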
Can you set NNT (no new tasks) for E@H from a cell phone, or do you have to go to BOINC Manager?
No, you can use the RPC command-line utility boinccmd, provided with all BOINC installations.
You can set 'no new tasks' via the RPC terminal command:
Boinccmd tool - BOINC
boinccmd --project http://einstein.phys.uwm.edu/ nomorework