Disk space and resend lost results

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2140
Credit: 2769945407
RAC: 932509
Topic 196292

Following a cri de coeur at SETI, I looked at this account.

I suspect that many of the problems arise from inexperience on the part of the user, but since it's the new GTX 680 Kepler that Oliver was keeping an eye on, I thought it was worth a look.

The server log below is interesting: the scheduler decided there was no disk space available for new work (its own calculation comes out at x = -0.28GB), but went ahead and resent lost tasks anyway. There's also a parse error on the last line, suggesting that our server here is beginning to be overtaken by the newest clients.

2012-04-25 08:48:52.6038 [PID=29236] Request: [USER#xxxxx] [HOST#5147355] [IP xxx.xxx.xxx.103] client 7.0.25
2012-04-25 08:48:52.6134 [PID=29236] [send] No disk space available: disk_max_used_gb 0.00GB disk_max_used_pct 50.00 disk_min_free_gb 0.00GB
2012-04-25 08:48:52.6135 [PID=29236] [send] No disk space available: host.d_total 298.00GB host.d_free 121.63GB host.d_boinc_used_total 10.28GB
2012-04-25 08:48:52.6135 [PID=29236] [send] No disk space available: x1 -0.28GB x2 138.72GB x3 111.63GB x -0.28GB
2012-04-25 08:48:52.6135 [PID=29236] [send] effective_ncpus 12 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2012-04-25 08:48:52.6135 [PID=29236] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2012-04-25 08:48:52.6135 [PID=29236] [send] Not using matchmaker scheduling; Not using EDF sim
2012-04-25 08:48:52.6135 [PID=29236] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2012-04-25 08:48:52.6135 [PID=29236] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
2012-04-25 08:48:52.6135 [PID=29236] [send] work_req_seconds: 0.00 secs
2012-04-25 08:48:52.6135 [PID=29236] [send] available disk 0.00 GB, work_buf_min 864000
2012-04-25 08:48:52.6135 [PID=29236] [send] active_frac 0.792569 on_frac 1.000000 DCF 0.471747
2012-04-25 08:48:52.6568 [PID=29236] [version] Checking plan class 'SSE2'
2012-04-25 08:48:52.6572 [PID=29236] [version] reading plan classes from file '../plan_class_spec.xml'
2012-04-25 08:48:52.6572 [PID=29236] [version] Best version of app einstein_S6LV1 is ID 327 (4.85 GFLOPS)
2012-04-25 08:48:52.6572 [PID=29236] [send] est. duration for WU 119285760: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6577 [PID=29236] [debug] Sorted list of URLs follows [host timezone: UTC-14400]
2012-04-25 08:48:52.6577 [PID=29236] [debug] zone=-21600 url=http://einstein-dl4.phys.uwm.edu
2012-04-25 08:48:52.6577 [PID=29236] [debug] zone=-21600 url=http://einstein-dl2.phys.uwm.edu
2012-04-25 08:48:52.6577 [PID=29236] [debug] zone=-28800 url=http://einstein.ligo.caltech.edu
2012-04-25 08:48:52.6577 [PID=29236] [debug] zone=+03600 url=http://einstein-mirror.aei.uni-hannover.de/EatH
2012-04-25 08:48:52.6580 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6582 [PID=29236] [send] [RESULT#281856837] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6594 [PID=29236] [send] est. duration for WU 119285760: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6594 [PID=29236] [HOST#5147355] Sending [RESULT#281856837 h1_0106.25_S6GC1__163_S6LV1B_2] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6607 [PID=29236] [send] est. duration for WU 119278128: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6608 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6610 [PID=29236] [send] [RESULT#281888097] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6622 [PID=29236] [send] est. duration for WU 119278128: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6622 [PID=29236] [HOST#5147355] Sending [RESULT#281888097 h1_0106.30_S6GC1__161_S6LV1B_2] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6636 [PID=29236] [send] est. duration for WU 119037674: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6637 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6639 [PID=29236] [send] [RESULT#281893950] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6651 [PID=29236] [send] est. duration for WU 119037674: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6651 [PID=29236] [HOST#5147355] Sending [RESULT#281893950 h1_0106.35_S6GC1__169_S6LV1B_3] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6668 [PID=29236] [send] est. duration for WU 120386966: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6669 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6671 [PID=29236] [send] [RESULT#281935305] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6683 [PID=29236] [send] est. duration for WU 120386966: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6683 [PID=29236] [HOST#5147355] Sending [RESULT#281935305 h1_0106.15_S6GC1__58_S6LV1B_1] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6696 [PID=29236] [send] est. duration for WU 119301645: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6696 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6700 [PID=29236] [send] [RESULT#281971549] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6711 [PID=29236] [send] est. duration for WU 119301645: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6711 [PID=29236] [HOST#5147355] Sending [RESULT#281971549 h1_0106.30_S6GC1__160_S6LV1B_3] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6730 [PID=29236] [send] est. duration for WU 119301646: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6732 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6733 [PID=29236] [send] [RESULT#282009091] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6743 [PID=29236] [send] est. duration for WU 119301646: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6743 [PID=29236] [HOST#5147355] Sending [RESULT#282009091 h1_0106.30_S6GC1__159_S6LV1B_2] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6783 [PID=29236] [send] est. duration for WU 119323565: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6784 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6785 [PID=29236] [send] [RESULT#282018025] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6795 [PID=29236] [send] est. duration for WU 119323565: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6795 [PID=29236] [HOST#5147355] Sending [RESULT#282018025 h1_0106.35_S6GC1__167_S6LV1B_3] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6833 [PID=29236] [send] est. duration for WU 119325819: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6834 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6835 [PID=29236] [send] [RESULT#282029424] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6874 [PID=29236] [send] est. duration for WU 119325819: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6874 [PID=29236] [HOST#5147355] Sending [RESULT#282029424 h1_0106.30_S6GC1__158_S6LV1B_2] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6889 [PID=29236] [send] est. duration for WU 120433646: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6891 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6900 [PID=29236] [send] [RESULT#282049076] [HOST#5147355] (resend lost work)
2012-04-25 08:48:52.6911 [PID=29236] [send] est. duration for WU 120433646: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6911 [PID=29236] [HOST#5147355] Sending [RESULT#282049076 h1_0106.15_S6GC1__57_S6LV1B_1] (est. dur. 7916.57 seconds)
2012-04-25 08:48:52.6942 [PID=29236] [send] est. duration for WU 120436578: unscaled 13300.41 scaled 7916.57
2012-04-25 08:48:52.6944 [PID=29236] [send] [HOST#5147355] Sending app_version einstein_S6LV1 2 112 SSE2; 4.85 GFLOPS
2012-04-25 08:48:52.6945 [PID=29236] [send] [RESULT#282057729] [HOST#5147355] (res
2012-04-25 08:48:52.1955 [PID=29235] SCHEDULER_REQUEST::parse(): unrecognized: 0
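
For anyone wanting to check the arithmetic behind the three "No disk space available" lines, here is a minimal sketch in Python. It assumes the scheduler takes the smallest of three limits (an absolute cap on BOINC usage, a percentage-of-disk cap, and a minimum-free-space reserve); that formula is my reading of how the check works, not something the log states. The log prints 0.00GB for both disk_max_used_gb and disk_min_free_gb, so the 10 GB figures used below are hypothetical server-side defaults, chosen only because they reproduce the logged x1, x2 and x3. The function name and parameter names are made up for illustration.

```python
def max_allowable_disk(d_total, d_free, d_boinc_used_total,
                       max_used_gb, max_used_pct, min_free_gb):
    """Return (x1, x2, x3, x) in GB under the assumed three-limit rule."""
    # Limit 1: absolute cap on BOINC disk usage, minus what BOINC already uses.
    x1 = max_used_gb - d_boinc_used_total
    # Limit 2: percentage-of-total-disk cap, minus what BOINC already uses.
    x2 = d_total * max_used_pct / 100.0 - d_boinc_used_total
    # Limit 3: free space left after keeping the minimum-free reserve.
    x3 = d_free - min_free_gb
    # The most restrictive limit wins; a negative value means no room at all.
    return x1, x2, x3, min(x1, x2, x3)


# Host figures copied from the log; the two 10.0 values are ASSUMED defaults.
x1, x2, x3, x = max_allowable_disk(
    d_total=298.00,
    d_free=121.63,
    d_boinc_used_total=10.28,
    max_used_gb=10.0,    # assumption: the log shows 0.00GB for this pref
    max_used_pct=50.0,   # from the log
    min_free_gb=10.0,    # assumption: the log shows 0.00GB for this pref
)
print(f"x1 {x1:.2f}GB x2 {x2:.2f}GB x3 {x3:.2f}GB x {x:.2f}GB")
```

If those assumptions hold, the host is over the absolute cap by 0.28GB even though it has more than 120GB free, which would explain why the scheduler reports no space for new work on a disk that is far from full.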