Task 'Running out of paper' !!!

BobMALCS
Joined: 13 Aug 10
Posts: 20
Credit: 54539336
RAC: 0
Topic 204174

I don't know whether to laugh or cry at this one.

 

Task 598996629

Name: LATeah0010L_908.0_0_0.0_10782960_0
Work unit ID: 267609487
Created: 29 Dec 2016 16:40:23 GMT
Sent: 29 Dec 2016 17:36:11 GMT
Received: 29 Dec 2016 21:00:16 GMT
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 28 (0x0000001C) Unknown error code
Computer: 6630273
Report deadline: 12 Jan 2017 17:36:11 GMT
Run time (sec): 2,487.47
CPU time (sec): 2,457.66
Validation state: Invalid
Claimed credit: 25.73
Granted credit: 0.00
Application: Gamma-ray pulsar binary search #1 on GPUs v1.17 (FGRPopencl-nvidia) windows_x86_64

Stderr output

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
The printer is out of paper.
 (0x1c) - exit code 28 (0x1c)
</message>
<stderr_txt>

.
.
.
.
.
.
.
% Binary point 12/1255
% Starting semicoherent search over f0 and f1.
% nf1dots: 31 df1dot: 3.344368011e-015 f1dot_start: -1e-013 f1dot_band: 1e-013
% Filling array of photon pairs
.
.

 

Lots of '.' and then it starts Binary point 13/1255 and so on. Eventually it runs out of paper.

Any thoughts?

BobM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110009666549
RAC: 24059297

Of course if you scroll right to the bottom of the stderr output you will find

ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:930: clFinish failed. status=-36
ERROR: opencl_prepare_power_toplist() returned with error 7338939
20:54:28 (10412): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags: PRECISION
20:54:40 (10412): [normal]: done. calling boinc_finish(28).
20:54:40 (10412): called boinc_finish

which is probably a much better indication of what the problem really was.

Cheers,
Gary.
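For anyone who would rather not scroll through pages of dots, here is a minimal Python sketch of the same idea: filter the stderr text down to the lines that carry `ERROR:` and pull out any `status=` value. The helper names `find_error_lines` and `find_status_codes` are made up for this illustration and are not part of BOINC or the Einstein@Home application; the sample text is the tail quoted above.

```python
import re

# Tail of the stderr output quoted in this thread.
STDERR_TAIL = """\
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:930: clFinish failed. status=-36
ERROR: opencl_prepare_power_toplist() returned with error 7338939
20:54:28 (10412): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags: PRECISION
20:54:40 (10412): [normal]: done. calling boinc_finish(28).
20:54:40 (10412): called boinc_finish
"""


def find_error_lines(stderr_text):
    """Return every line that contains the marker 'ERROR:' (hypothetical helper)."""
    return [line for line in stderr_text.splitlines() if "ERROR:" in line]


def find_status_codes(stderr_text):
    """Return the integers that follow 'status=' anywhere in the text (hypothetical helper)."""
    return [int(code) for code in re.findall(r"status=(-?\d+)", stderr_text)]


if __name__ == "__main__":
    for line in find_error_lines(STDERR_TAIL):
        print(line)
    print("status codes:", find_status_codes(STDERR_TAIL))
```

Run against a full stderr dump, this skips the progress dots and leaves only the lines worth reading.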

BobMALCS
Joined: 13 Aug 10
Posts: 20
Credit: 54539336
RAC: 0

OK. But what is the implication of 'FPU status flags: PRECISION'?

Program error, bad calculation, GPU problem, ...?

BobM

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059424931
RAC: 1263442

I've been getting a higher rate of computation error failures in my fleet on the FGRBP1 work than over the last year running the previous two primary Einstein GPU applications. It is well under 1% and scattered across my machines. I have not yet tried reducing overclocks to find out whether what was a safe overclock for the previous work is just a little too high for this one.

One unifying theme I have seen is seemingly spurious error messages, as reported in the starting post of this thread.  In fact I, too, got an out of paper message on at least one computation error stderr.

I have also gotten

The network BIOS session limit was exceeded.
 (0x45) - exit code 69 (0x45)

The network name cannot be found.
 (0x43) - exit code 67 (0x43)

 

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

archae86 wrote:
One unifying theme I have seen is seemingly spurious error messages, as reported in the starting post of this thread.  In fact I, too, got an out of paper message on at least one computation error stderr.

I would guess that these error messages are the result of the program asking Windows for the error text associated with an error code. The only problem is that the error code comes from OpenCL (or is an internal code from the program), not from Windows, so you get nonsense text.

Windows error codes
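A rough Python sketch of that mix-up, assuming the lookup is nothing more than "exit code in, Windows system message out". The table holds only the three messages quoted in this thread, and `describe_exit_code` is a hypothetical name, not a BOINC function.

```python
# Windows system error messages as quoted in this thread, keyed by error code.
WINDOWS_MESSAGES = {
    28: "The printer is out of paper.",
    67: "The network name cannot be found.",
    69: "The network BIOS session limit was exceeded.",
}


def describe_exit_code(exit_code):
    """Render an exit code as if it were a Windows system error (hypothetical helper)."""
    text = WINDOWS_MESSAGES.get(exit_code, "Unknown error code")
    return f"{text} (0x{exit_code:x}) - exit code {exit_code} (0x{exit_code:x})"


if __name__ == "__main__":
    # The application exited with 28 for its own reasons; the lookup has no
    # way of knowing that, so it reports a printer problem.
    print(describe_exit_code(28))
```

On an actual Windows machine, `ctypes.FormatError(28)` should return the real system text for the same code, which is another way to see that the message comes from the operating system's table and not from the science application.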

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110009666549
RAC: 24059297

BobMALCS wrote:
Ok.  But what is the implication of 'FPU status flags: PRECISION'

It's not part of the error message - it's there in the stderr output on 'good' results as well as 'bad'.

It's listed as a FPU status flag.  Maybe "PRECISION" just means the FPU is giving the best possible precision for the calculations it performs.  That would be my guess.

 

Cheers,
Gary.

BobMALCS
Joined: 13 Aug 10
Posts: 20
Credit: 54539336
RAC: 0

Gary Roberts wrote:

Of course if you scroll right to the bottom of the stderr output you will find

ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:930: clFinish failed. status=-36
ERROR: opencl_prepare_power_toplist() returned with error 7338939
20:54:28 (10412): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags: PRECISION
20:54:40 (10412): [normal]: done. calling boinc_finish(28).
20:54:40 (10412): called boinc_finish

which is probably a much better indication of what the problem really was.

I have no doubt that someone knows what the error return means but I don't. 

So until there is some indication of what the problem is, and possibly a solution, I won't be running Einstein.

BobM

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 127321528
RAC: 354165

BOINC on Windows tries to guess the human-readable version of an error code and interprets the exit code of an app as if it were a Windows error code. That is not the case here.

As seen in the stderr.log, the real error code is -36, which is OpenCL-specific and translates to CL_INVALID_COMMAND_QUEUE. That is something Bernd will have to look at when he is back.
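For reference, a small Python sketch of translating an OpenCL status value into its symbolic name. The table is deliberately partial: it lists a handful of constants from the OpenCL `cl.h` header, and `opencl_status_name` is a hypothetical helper, not something the application or BOINC provides.

```python
# A few OpenCL status codes and their names as defined in cl.h (partial list).
OPENCL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -36: "CL_INVALID_COMMAND_QUEUE",
}


def opencl_status_name(status):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return OPENCL_STATUS_NAMES.get(status, f"unrecognized OpenCL status {status}")


if __name__ == "__main__":
    # The status reported by clFinish in the stderr output above.
    print(opencl_status_name(-36))
```

Looking the value up in the OpenCL table gives a lead about the command queue, whereas reading exit code 28 against the Windows table only gives the printer message.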

BobMALCS
Joined: 13 Aug 10
Posts: 20
Credit: 54539336
RAC: 0

Christian, thanks for the info.    I'll wait patiently.

BobM
