All Einstein@Home jobs fail immediately on starting

UnionJack
Joined: 9 Feb 05
Posts: 15
Credit: 70193184
RAC: 13088

No, we're not finished yet. This morning I found seven E@H jobs that had failed with computation errors: output file absent. The client log, stdoutdae.txt, shows more "output file absent" messages than jobs started, which seems odd.

So I'm back to square one. Well, square one-and-a-half, perhaps, because many jobs ran for several hours before reporting "output file absent".

--
Rgds
Peter.

Juha
Joined: 27 Nov 14
Posts: 49
Credit: 4962746
RAC: 23

Quote:
So this was a compiler issue of LLVM? I can't find the libclc library in the dependencies but the gentoo libOpenCL.so has libLLVM as a dependency. Was this the clue to solving this?

Just a guess. Updating libclc is what Paulie and Paul had to do in the "BRP6-opencl-ati with linux mesa opencl" thread, and what Aaron Puchert and JohnRH had to do in "Ubuntu 16.04 LTS Is Deprecating AMD's fglrx (Catalyst)" (released this month). The package seems to consist mostly of header and bytecode files rather than a shared library that libOpenCL.so links against, so it won't show up in ldd output.

Unfortunately it wasn't as easy as updating libclc this time. UnionJack, you could try updating Mesa and LLVM too; the latest releases are Mesa 11.2.2 and LLVM 3.8. It is also possible that your card just isn't supported yet. The others had cards up to Sea Islands / GCN 1.1, but yours is Volcanic Islands / GCN 1.2.

UnionJack
Joined: 9 Feb 05
Posts: 15
Credit: 70193184
RAC: 13088

OK, I'm updating Mesa and LLVM now, together with Clang and libdrm, which were also needed. I'll watch what happens and report back.

I don't know what GCN is, but I can confirm that it's a Volcanic Islands card. According to the system builder, the global list of PCI IDs at http://pci-ids.ucw.cz/read/PC/1002/6938 (where I'm peterh), and pciutils on this machine, it's an AMD/ATI Tonga XT Radeon R9 380X Nitro 4G D5 [1002:6938] with subsystem 174b:e308. Does that match your understanding?

--
Rgds
Peter.

Juha
Joined: 27 Nov 14
Posts: 49
Credit: 4962746
RAC: 23

GCN = Graphics Core Next, a GPU microarchitecture.

Oh, and success!... or not.

[05:44:53][28438][ERROR] Error creating OpenCL FFT plan (error: -45)

That -45 is CL_INVALID_PROGRAM_EXECUTABLE. I don't know how to fix that. Maybe wait for the next LLVM release; 3.8.1 is scheduled for June 15.
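The application's log lines only print the numeric OpenCL status. As a rough aid for decoding them by hand, here is a minimal sketch of a lookup table using the values from the Khronos cl.h header; it covers -45 from the log above and -11 (CL_BUILD_PROGRAM_FAILURE) from the clBuildProgram message quoted further down. The helpers cl_error_name and cl_error_code are hypothetical illustrations, not part of OpenCL or of the Einstein@Home application, and only a small subset of codes is listed.

```c
#include <stddef.h>
#include <string.h>

/* A few OpenCL status codes with the numeric values used in the
 * Khronos cl.h header. Hypothetical helper table: only a subset. */
struct cl_error_entry {
    int code;
    const char *name;
};

static const struct cl_error_entry cl_errors[] = {
    {   0, "CL_SUCCESS" },
    {  -3, "CL_COMPILER_NOT_AVAILABLE" },
    {  -5, "CL_OUT_OF_RESOURCES" },
    { -11, "CL_BUILD_PROGRAM_FAILURE" },
    { -30, "CL_INVALID_VALUE" },
    { -44, "CL_INVALID_PROGRAM" },
    { -45, "CL_INVALID_PROGRAM_EXECUTABLE" },
    { -46, "CL_INVALID_KERNEL_NAME" },
};

/* Return the symbolic name for an OpenCL status code, or
 * "UNKNOWN_CL_ERROR" if the code is not in the table above. */
const char *cl_error_name(int code)
{
    size_t n = sizeof cl_errors / sizeof cl_errors[0];
    for (size_t i = 0; i < n; i++) {
        if (cl_errors[i].code == code)
            return cl_errors[i].name;
    }
    return "UNKNOWN_CL_ERROR";
}

/* Reverse lookup: store the code for a symbolic name in *out and
 * return 1, or return 0 if the name is not in the table. */
int cl_error_code(const char *name, int *out)
{
    size_t n = sizeof cl_errors / sizeof cl_errors[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cl_errors[i].name, name) == 0) {
            *out = cl_errors[i].code;
            return 1;
        }
    }
    return 0;
}
```

For example, cl_error_name(-45) returns "CL_INVALID_PROGRAM_EXECUTABLE". A linear scan is plenty for a table this small; a real tool would list every code from cl.h.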

UnionJack
Joined: 9 Feb 05
Posts: 15
Credit: 70193184
RAC: 13088

Hmm. Maybe I should exclude the GPU from E@H's resources until the next LLVM release. Is there a way to do that for a single project?

--
Rgds
Peter.

Juha
Joined: 27 Nov 14
Posts: 49
Credit: 4962746
RAC: 23

Yes, deselect "Use xyz GPU" (where xyz is your GPU vendor, AMD/ATI in your case) in the Einstein@Home project preferences.

Edit: AgentB has been tracking the status of AMD's new Linux driver in the Ubuntu 16.04 thread. The beta driver package is built for Ubuntu. I imagine it could be installed on Gentoo, but it would take some manual work to convert paths and config files to what Gentoo expects.

UnionJack
Joined: 9 Feb 05
Posts: 15
Credit: 70193184
RAC: 13088

Right. Deselected OK.

Thanks for the Ubuntu thread pointer. I'll watch that one too.

--
Rgds
Peter.

Aaron Puchert
Joined: 30 May 14
Posts: 13
Credit: 2651954
RAC: 0

Quote:
For completeness: the message "[16:32:00][10115][ERROR] Couldn't build OpenCL program (error: -11)!" means CL_BUILD_PROGRAM_FAILURE, which is the return value from clBuildProgram(). According to the OpenCL specification, it is returned "if there is a failure to build the program executable. This error will be returned if clBuildProgram does not return until the build has completed." I don't know why that fails.

This error code indicates a compiler error. In that case you can get the build log via clGetProgramBuildInfo with param_name = CL_PROGRAM_BUILD_LOG. The log would very likely contain a hint about why the build fails.

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 126060661
RAC: 332246

The code already does that, but only in the debug version, not in the release version currently running on Einstein@Home.

Aaron Puchert
Joined: 30 May 14
Posts: 13
Credit: 2651954
RAC: 0

Quote:
The code already does that but only in the debug version not the released working version currently on Einstein@home.

Maybe it would make sense to include this in the release version, since the kernels are compiled on the client, and different clients can obviously have different compilers. With Mesa, for instance, any recent version of LLVM/Clang could be in use.
