I just noticed that my FX8150/GTX550Ti/Win7-64 host has recently generated several validate errors (and "Completed, marked as invalid"). I'm probably missing something in the stderr output, but I don't see much difference between these invalid tasks and the valid tasks for this host.
Any idea what's happening and what could be causing the problem?
Thanks,
MarkR
Copyright © 2024 Einstein@Home. All rights reserved.
Validate Errors
Oops. I'm seeing the same thing: Invalid tasks for computer 5744895. I did have a few voltage problems with that host before the holidays, but they were diagnosed and fixed, and I've had no problems with it since.
You are having problems with the NVidia GPU version of the Binary Radio Pulsar Search, mine are with the Intel GPU version. I suspect that we might both be suffering from related problems at the server end, but I'll keep an eye on things.
Reading this I checked my account.
3 errors; one compute error saying
couldn't start app: Input file rand_PAS.bank.v3 missing or invalid: md5 checksum failed for file
two download errors saying that this file was corrupted.
All on the same machine: Win7-64, i7, two AMD GPUs. Old main Win7.
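For anyone who wants to double-check a suspect download by hand, here is a minimal sketch of an MD5 check in Python. It assumes you already know the expected MD5 value for the file (that value isn't shown in this thread, so the one below is only a placeholder), and it runs the demo on a temporary file rather than the real rand_PAS.bank.v3. The function names are my own, not anything from BOINC.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large data files don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 matches the expected (placeholder) value."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Demo on a temporary file; substitute the real file path and expected hash.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        name = tmp.name
    try:
        print(file_matches(name, "5d41402abc4b2a76b9719d911017c592"))  # prints True
    finally:
        os.remove(name)
```

A mismatch only tells you the local copy differs from what was expected; it doesn't say whether the file was damaged in transit or on disk.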
Sorry to hear that I'm not the only one...ha-ha. Hopefully we'll get some feedback from the admins soon.
Yeah, but Richard and I are seeing validate errors. :-(
Oh, dear... I might have bigger problems with this host. Every Gravitational Wave S6 Directed Search (CasA) v1.05 (SSE2) is crashing... :-(
Any chance we could get some feedback from the developers or admins on this issue? I've had recurrences since my original post (...and I believe Richard has, too).
Thanks,
MarkR
Yes, that list looks longer and newer, doesn't it?
Not worth disturbing anyone at this time of night (it must be only a small proportion of tasks failing: that host spits out one every 12 minutes, five per hour, 120 per day), but maybe a PM in the morning.
I also have validate errors:
------> Number of samples: 2097152
------> Trial dispersion measure: 266 cm^-3 pc
------> Scale factor: 1.875
[21:49:04][2000][INFO ] Seed for random number generator is 1084926635.
[21:49:08][2000][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-008
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[21:50:06][2000][INFO ] Checkpoint committed!
[21:51:12][2000][INFO ] Checkpoint committed!
[21:52:17][2000][INFO ] Checkpoint committed!
[21:53:23][2000][INFO ] Checkpoint committed!
[21:54:28][2000][INFO ] Checkpoint committed!
[21:55:34][2000][INFO ] Checkpoint committed!
[21:56:39][2000][INFO ] Checkpoint committed!
[21:57:45][2000][INFO ] Checkpoint committed!
[21:58:50][2000][INFO ] Checkpoint committed!
[21:59:56][2000][INFO ] Checkpoint committed!
[22:01:01][2000][INFO ] Checkpoint committed!
[22:02:07][2000][INFO ] Checkpoint committed!
[22:03:12][2000][INFO ] Checkpoint committed!
[22:04:18][2000][INFO ] Checkpoint committed!
[22:05:23][2000][INFO ] Checkpoint committed!
[22:06:29][2000][INFO ] Checkpoint committed!
[22:07:34][2000][INFO ] Checkpoint committed!
[22:08:40][2000][INFO ] Checkpoint committed!
[22:09:45][2000][INFO ] Checkpoint committed!
[22:10:51][2000][INFO ] Checkpoint committed!
[22:11:57][2000][INFO ] Checkpoint committed!
[22:13:02][2000][INFO ] Checkpoint committed!
[22:14:08][2000][INFO ] Checkpoint committed!
[22:15:13][2000][INFO ] Checkpoint committed!
[22:16:19][2000][INFO ] Checkpoint committed!
[22:17:24][2000][INFO ] Checkpoint committed!
[22:18:30][2000][INFO ] Checkpoint committed!
[22:19:35][2000][INFO ] Checkpoint committed!
[22:20:41][2000][INFO ] Checkpoint committed!
[22:21:46][2000][INFO ] Checkpoint committed!
[22:22:52][2000][INFO ] Checkpoint committed!
[22:23:57][2000][INFO ] Checkpoint committed!
[22:25:03][2000][INFO ] Checkpoint committed!
[22:26:08][2000][INFO ] Checkpoint committed!
[22:27:14][2000][INFO ] Checkpoint committed!
[22:28:19][2000][INFO ] Checkpoint committed!
[22:29:25][2000][INFO ] Checkpoint committed!
[22:30:30][2000][INFO ] Checkpoint committed!
[22:31:36][2000][INFO ] Checkpoint committed!
[22:32:41][2000][INFO ] Checkpoint committed!
[22:33:47][2000][INFO ] Checkpoint committed!
[22:34:52][2000][INFO ] Checkpoint committed!
[22:35:58][2000][INFO ] Checkpoint committed!
[22:37:03][2000][INFO ] Checkpoint committed!
[22:38:08][2000][INFO ] Checkpoint committed!
[22:39:14][2000][INFO ] Checkpoint committed!
[22:40:19][2000][INFO ] Checkpoint committed!
[22:41:25][2000][INFO ] Checkpoint committed!
[22:42:30][2000][INFO ] Checkpoint committed!
[22:43:34][2000][INFO ] OpenCL shutdown complete!
[22:43:34][2000][INFO ] Statistics: count dirty SumSpec pages 0 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1100505
[22:43:34][2000][INFO ] Data processing finished successfully!
22:43:34 (2000): called boinc_finish
Like this.
Greetings
Hi!
For the host in the thread starting message I see validate errors and invalid results for both GPU and CPU apps, and at a much higher rate than usual. At the same time, other hosts that were computing the same work were able to return valid results (at least in several cases I checked). All this does point to hardware problems, I'm afraid. Using a memory checking tool and checking that the cooling works are standard things you'll want to try first.
Cheers
HB
And I'm afraid you're probably right, HB. I popped the case open and it didn't take long to find a couple of blown components on the motherboard. I'm very disappointed as this mobo is only a few months old. I'll now get to experience my first RMA with a hardware vendor (in this case, ASUS).
Thanks very much, HB, for your feedback. :-)
MarkR