Locality scheduler running very slow

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2954799935
RAC: 711628
Topic 225648

Following the burst of O3AS beta tasks for GPU in the middle of the week, the locality scheduler is struggling to check for available tasks in a reasonable time.

2021-07-04 09:31:28.4366 [PID=22861]    [mixed] sending locality work first (0.2052)
2021-07-04 09:31:28.4396 [PID=22861]    [send] send_old_work() no feasible result older than 336.0 hours
2021-07-04 09:34:21.2783 [PID=22861]    [send] send_old_work() no feasible result younger than 200.9 hours and older than 168.0 hours
2021-07-04 09:34:31.3388 [PID=22861]    [mixed] sending non-locality work second

Almost 3 minutes is longer than my hosts are prepared to wait for the reply, so I have to ask again (several minutes later, owing to the client backoffs), and receive the allocated tasks as 'lost results'. Hopefully, this will automatically clear when O3AS goes into steady production.
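
For anyone wanting to check where the time goes in their own scheduler log, here is a rough sketch that measures the gap between consecutive log lines. It assumes every line starts with a timestamp in the same "YYYY-MM-DD HH:MM:SS.ffff" layout as the excerpt above; the function names are just made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# The four scheduler log lines quoted above, copied verbatim.
LOG_LINES = [
    "2021-07-04 09:31:28.4366 [PID=22861]    [mixed] sending locality work first (0.2052)",
    "2021-07-04 09:31:28.4396 [PID=22861]    [send] send_old_work() no feasible result older than 336.0 hours",
    "2021-07-04 09:34:21.2783 [PID=22861]    [send] send_old_work() no feasible result younger than 200.9 hours and older than 168.0 hours",
    "2021-07-04 09:34:31.3388 [PID=22861]    [mixed] sending non-locality work second",
]


def parse_timestamp(line):
    """Parse the leading date and time fields of a log line (assumed layout)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def gaps_in_seconds(lines):
    """Return the elapsed seconds between each pair of consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap, line in zip(gaps_in_seconds(LOG_LINES), LOG_LINES[1:]):
        # Each gap is the time spent before this line was written.
        print(f"{gap:9.4f} s before: {line[40:]}")
```

On the excerpt above, the second gap (the wait before the second send_old_work() line) comes out at roughly 172.8 seconds, which is the "almost 3 minutes" in question.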

Richard Haselgrove

On the client, this looks like

04/07/2021 10:55:15 | Einstein@Home | Requesting new tasks for NVIDIA GPU
04/07/2021 10:56:22 | Einstein@Home | Scheduler request failed: Timeout was reached
04/07/2021 10:57:51 | Einstein@Home | Another scheduler instance is running for this host

Richard Haselgrove

Out of all those searches (across three machines), I got just one resend - speaks well for the quality of the new GW app.
