Scheduler borked for Rpi

PorkyPies
Joined: 27 Apr 16
Posts: 197
Credit: 28796567
RAC: 25634
Topic 218524

I have a bunch of Rpi that are suddenly unable to get any work. The server status page says there are 10k BRP4 work units available, and I haven't changed anything recently. They normally live on a diet of BRP4 work units running the 1.47 beta app. The new O1OD1E is also classed as a beta app, so I suspect the recent change for the O1OD1 engineering run has broken it. Below is a cut and paste of a recent scheduler log from one of the 12 Rpi.

 

2019-03-29 11:24:57.9440 [PID=28078]   Request: [USER#xxxxx] [HOST#12639053] [IP xxx.xxx.xxx.69] client 7.6.33
2019-03-29 11:24:57.9859 [PID=28078] [debug]   have_master:1 have_working: 1 have_db: 1
2019-03-29 11:24:57.9859 [PID=28078] [debug]   using working prefs
2019-03-29 11:24:57.9859 [PID=28078] [debug]   have db 1; dbmod 1461748411.000000; global mod 1461748411.000000
2019-03-29 11:24:57.9859 [PID=28078]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2019-03-29 11:24:57.9859 [PID=28078]    [send] effective_ngpus 0 max_jobs_on_host_gpu 999999
2019-03-29 11:24:57.9859 [PID=28078]    [send] Not using matchmaker scheduling; Not using EDF sim
2019-03-29 11:24:57.9859 [PID=28078]    [send] CPU: req 17280.00 sec, 4.00 instances; est delay 0.00
2019-03-29 11:24:57.9859 [PID=28078]    [send] work_req_seconds: 17280.00 secs
2019-03-29 11:24:57.9859 [PID=28078]    [send] available disk 1.99 GB, work_buf_min 3456
2019-03-29 11:24:57.9859 [PID=28078]    [send] active_frac 0.999990 on_frac 0.851088 DCF 4.003089
2019-03-29 11:24:57.9869 [PID=28078]    [mixed] sending non-locality work first (0.9847)
2019-03-29 11:24:58.0038 [PID=28078]    [send] [HOST#12639053] will accept beta work.  Scanning for beta work.
2019-03-29 11:24:58.0147 [PID=28078]    [version] Checking plan class 'NEON'
2019-03-29 11:24:58.0186 [PID=28078]    [version] reading plan classes from file '/BOINC/projects/EinsteinAtHome/plan_class_spec.xml'
2019-03-29 11:24:58.0186 [PID=28078]    [version] plan class ok
2019-03-29 11:24:58.0186 [PID=28078]    [version] Checking plan class 'NEON_Beta'
2019-03-29 11:24:58.0186 [PID=28078]    [version] plan class ok
2019-03-29 11:24:58.0186 [PID=28078]    [version] Best version of app einsteinbinary_BRP4 is 1.47 ID 769 NEON_Beta (1.80 GFLOPS)
2019-03-29 11:24:58.0199 [PID=28078]    Only one Beta app version result per WU (#398060410, re#1)
2019-03-29 11:24:58.0199 [PID=28078]    [send] [HOST#12639053] [WU#398060410 p2030.20170413.G57.94-02.06.C.b4s0g0.00000_110] WU is infeasible: Project-specific customization
2019-03-29 11:24:58.0210 [PID=28078]    Only one Beta app version result per WU (#398060473, re#2)
2019-03-29 11:24:58.0220 [PID=28078]    Only one Beta app version result per WU (#398063366, re#3)
2019-03-29 11:24:58.0231 [PID=28078]    Only one Beta app version result per WU (#397860502, re#4)
...
Repeats 119 times for different WU
...

2019-03-29 11:24:58.1206 [PID=28078]    [mixed] sending locality work second
2019-03-29 11:24:58.1232 [PID=28078] [debug]   [HOST#12639053] MSG(high) No work sent
2019-03-29 11:24:58.1232 [PID=28078] [debug]   [HOST#12639053] MSG(high) see scheduler log messages on https://einsteinathome.org/host/12639053/log
2019-03-29 11:24:58.1232 [PID=28078] [debug]   [HOST#12639053] MSG(high) No work available for the applications you have selected.  Please check your preferences on the web site.
2019-03-29 11:24:58.1232 [PID=28078]    Sending reply to [HOST#12639053]: 0 results, delay req 60.00
2019-03-29 11:24:58.1233 [PID=28078]    Scheduler ran 0.184 seconds
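
For anyone who wants to check their own host log for the same pattern, here is a rough Python sketch. It assumes the log line layout shown above (timestamp, `[PID=...]`, message) and simply collects the WU numbers skipped with "Only one Beta app version result per WU", along with how many results the scheduler finally sent. The function name `summarize` and the regular expressions are illustrative only; they are not part of BOINC or the Einstein@Home server code.

```python
import re

# Assumed layout: "<date> <time> [PID=<n>]   <message>", as in the log above.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[PID=(\d+)\]\s+(.*)$"
)
# Message the scheduler logs when it skips a WU for a beta app version.
BETA_RE = re.compile(
    r"Only one Beta app version result per WU \(#(\d+), re#(\d+)\)"
)
# Final reply line, which carries the number of results actually sent.
REPLY_RE = re.compile(r"Sending reply to \[HOST#(\d+)\]: (\d+) results")


def summarize(log_text):
    """Return the WU ids skipped by the beta rule and the results sent."""
    beta_skipped = []
    results_sent = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # not a scheduler log line (e.g. "..." or blank)
        message = match.group(3)
        beta = BETA_RE.search(message)
        if beta:
            beta_skipped.append(int(beta.group(1)))
        reply = REPLY_RE.search(message)
        if reply:
            results_sent = int(reply.group(2))
    return {"beta_skipped_wus": beta_skipped, "results_sent": results_sent}


# Three lines copied from the log above, used as a small sample.
sample = """\
2019-03-29 11:24:58.0199 [PID=28078]    Only one Beta app version result per WU (#398060410, re#1)
2019-03-29 11:24:58.0210 [PID=28078]    Only one Beta app version result per WU (#398060473, re#2)
2019-03-29 11:24:58.1232 [PID=28078]    Sending reply to [HOST#12639053]: 0 results, delay req 60.00
"""

summary = summarize(sample)
print(summary)
```

If every candidate WU shows up in the skipped list and the reply reports 0 results, the host is being turned away by the beta rule rather than by a shortage of work, which matches what the log above shows.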

 

PorkyPies

After leaving them set to NNT (No New Tasks) overnight, they are now collecting work. I haven't changed any settings on the Rpi or the website, so I think the scheduler is the likely culprit.
