I don't know whether to laugh or cry at this one.
Name: LATeah0010L_908.0_0_0.0_10782960_0
Work unit ID: 267609487
Created: 29 Dec 2016 16:40:23 GMT
Sent: 29 Dec 2016 17:36:11 GMT
Received: 29 Dec 2016 21:00:16 GMT
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 28 (0x0000001C) Unknown error code
Computer: 6630273
Report deadline: 12 Jan 2017 17:36:11 GMT
Run time (sec): 2,487.47
CPU time (sec): 2,457.66
Validation state: Invalid
Claimed credit: 25.73
Granted credit: 0.00
Application: Gamma-ray pulsar binary search #1 on GPUs v1.17 (FGRPopencl-nvidia)
windows_x86_64 |
<core_client_version>7.6.33</core_client_version> <![CDATA[ <message> The printer is out of paper. (0x1c) - exit code 28 (0x1c) </message> <stderr_txt>
.
.
.
.
.
.
.
% Binary point 12/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
.
.
Lots of '.' and then it starts Binary point 13/1255 and so on. Eventually it runs out of paper.
Any thoughts?
BobM
Copyright © 2024 Einstein@Home. All rights reserved.
Of course if you scroll right to the bottom of the stderr output you will find
which is probably a much better indication of what the problem really was.
Cheers,
Gary.
Ok. But what is the implication of 'FPU status flags: PRECISION'?
A program error, a bad calculation, a GPU problem, ...?
BobM
I've been getting a higher rate of computation error failures in my fleet on the FGRBP1 work than I saw over the last year running the previous two primary Einstein GPU applications. The rate is well under 1% and the failures are scattered across my machines. I've not tried reducing overclocks to find out whether what was a safe overclock for the previous work is just a little too high for this one.
One unifying theme I have seen is seemingly spurious error messages, as reported in the starting post of this thread. In fact I, too, got an out of paper message on at least one computation error stderr.
I have also gotten
archae86 wrote: One unifying ...
I would guess that these error messages are the result of the program asking Windows for the error text associated with an error code. The only problem is that the error code comes from OpenCL (or from the program's own internal codes), not from Windows, so you get a nonsense text.
Windows error codes
BobmALCS wrote: Ok. But what ...
It's not part of the error message - it's there in the stderr output on 'good' results as well as 'bad'.
It's listed as an FPU status flag. Maybe "PRECISION" just means the FPU is giving the best possible precision for the calculations it performs. That would be my guess.
Cheers,
Gary.
Gary Roberts wrote: Of course ...
I have no doubt that someone knows what the error return means, but I don't.
So until there is some indication of what the problem is, and possibly a solution, I won't be running Einstein.
BobM
BOINC on Windows tries to guess the human-readable version of an error code and interprets the exit code of an app as if it were a Windows error code. That is not what the code means here.
As seen in the stderr.log, the real error code is -36, which is OpenCL-specific and translates to CL_INVALID_COMMAND_QUEUE. That is something Bernd will have to look at when he is back.
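For anyone puzzled by the mismatch, here is a minimal Python sketch of the two lookups involved. It is not BOINC's actual code: the table and function names are made up for illustration, and only a handful of codes are listed. The numeric values themselves are the standard ones (Windows system error 28 is ERROR_OUT_OF_PAPER, and OpenCL status -36 is CL_INVALID_COMMAND_QUEUE).

```python
# Illustration only: hypothetical tables and helpers, not BOINC source code.

# A few Windows system error codes and their message text.
WINDOWS_ERRORS = {
    28: "The printer is out of paper.",  # ERROR_OUT_OF_PAPER (0x1c)
}

# A few OpenCL status codes as defined in the OpenCL headers.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def describe_as_windows(exit_code):
    """Mimic a client that treats any app exit code as a Windows error."""
    return WINDOWS_ERRORS.get(exit_code, "Unknown error code")


def describe_as_opencl(cl_status):
    """Look up the symbolic name of an OpenCL status value."""
    return OPENCL_ERRORS.get(cl_status, "unknown OpenCL status %d" % cl_status)


if __name__ == "__main__":
    # Exit code reported for the task, read as if it were a Windows error.
    print(describe_as_windows(28))
    # Status found further down in stderr, read as an OpenCL code.
    print(describe_as_opencl(-36))
```

The first lookup reproduces the "out of paper" text from the task page, while the second gives the name that actually points at the GPU side of the application.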
Christian, thanks for the info. I'll wait patiently.
BobM