CL_BUILD_PROGRAM_FAILURE on 02MDF (GW-opencl-ati)

Joseph Stateson
Joined: 7 May 07
Posts: 173
Credit: 2946598785
RAC: 1257738

I have completed 19 of the beta test app tasks, and all validated.

https://einsteinathome.org/host/12786301/tasks/4/54

It is not obvious that the ROCm drivers are being used.  I did not purge any drivers before installing the ROCm stuff.

I ran a Milkyway job through the system for comparison and it verified OK.  I put its "task" info here for comparison.  Note that it designates OpenCL 2.1 AMD-APP (3004.6).  Can I assume that is ROCm?  I also see that the driver uses OpenCL 1.2 and not 2.1, but that could be because the MW project is coded for only 1.2?

https://stateson.net/images/mw_rocm_job.txt

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109970589732
RAC: 30175881

JStateson wrote:
...  Can I assume that is RocM?

I don't know for sure but my guess is perhaps not.

On my machine, the following is what clinfo tells me about the 'platform'.

Number of platforms:           1
   Platform Profile:           FULL_PROFILE
   Platform Version:           OpenCL 2.1 AMD-APP (2671.3)
   Platform Name:              AMD Accelerated Parallel Processing
   Platform Vendor:            Advanced Micro Devices, Inc.
   Platform Extensions:        cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

I don't know if ROCm has a 'clinfo' - perhaps something equivalent with a different name.  Perhaps you can run whatever will give you the same sort of details about your ROCm platform and see if it still uses the same sort of AMD-APP designation.  My designation is very similar to what you show and that is why I suspect you are not actually using ROCm.  My OpenCL is a little old (the 3rd release in 2018) hence the smaller 2671.3 numerical version it shows.  I have more recent installs (based on 19.50) on machines with GCN 1st Gen (Pitcairn) hardware where the numerical versioning shows as 2906.7.  It would seem that your 3004.6 is the same OpenCL type - just a slightly later build.  This is all just guesswork on my part though.
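For anyone wanting to compare those build numbers mechanically, here is a minimal Python sketch. The function name `parse_platform_version` is made up for illustration, and it assumes the version string always has the shape shown in the clinfo output above. The full string for the 2906.7 build is also an assumption, since only the number was quoted. Note that this only orders the builds; it cannot tell you whether a given build belongs to ROCm or to the proprietary stack.

```python
import re

# Matches strings shaped like "OpenCL 2.1 AMD-APP (2671.3)".
# Assumption: other drivers may format this field differently.
_PLATFORM_VERSION = re.compile(
    r"OpenCL\s+(?P<opencl>\d+\.\d+)\s+(?P<tag>\S+)\s+\((?P<build>\d+(?:\.\d+)*)\)"
)


def parse_platform_version(text):
    """Return (opencl_version, tag, build) or None if the text does not match.

    opencl_version and build are tuples of ints so they compare numerically.
    """
    match = _PLATFORM_VERSION.search(text)
    if match is None:
        return None
    opencl = tuple(int(part) for part in match.group("opencl").split("."))
    build = tuple(int(part) for part in match.group("build").split("."))
    return opencl, match.group("tag"), build


if __name__ == "__main__":
    samples = [
        "OpenCL 2.1 AMD-APP (3004.6)",  # string reported by the Milkyway task
        "OpenCL 2.1 AMD-APP (2671.3)",  # string from the clinfo output above
        "OpenCL 2.1 AMD-APP (2906.7)",  # assumed full string for the 19.50 install
    ]
    # Print the samples from oldest build to newest build.
    for sample in sorted(samples, key=lambda s: parse_platform_version(s)[2]):
        print(sample)
```

You can feed it a whole clinfo line, including the "Platform Version:" label, because it searches inside the text rather than requiring an exact match.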

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5845
Credit: 109970589732
RAC: 30175881

Just a small update.  I've now returned the host to processing tasks in the normal FIFO order.  34 new version tasks were promoted and processed prior to reverting.  28 have validated and 6 are pending.  There are no errors or invalids at this point.

Due to the usual crunch time variations, it's impossible to say whether or not there is any change in crunching performance.  My guess is probably no change.  First of all, the sample size is insufficient to be sure about that.  Secondly, there were variations in spin frequency and in the particular pulsar being targeted.  Thirdly, as anyone running this work would know, there is a lot of variation in crunch time due to the difficulty of 'slicing and dicing' the work into tasks that have repeatable run times.  This is not like the GRP search where run times are very stable.

There doesn't seem to be any evidence of problems so hopefully the OP will now be able to run these tasks without them crashing out.

Cheers,
Gary.

Joseph Stateson
Joined: 7 May 07
Posts: 173
Credit: 2946598785
RAC: 1257738

OK, dug into this whole mess.  I call it a mess as I ended up with a non-working AMD driver that cannot be re-installed and had to do a clean install of 18.04 but I did get it working.  Also, it was probably working back when I first posted about it.

Learned a lot

Learned that ROCm or rocM or whatever it is called is a worthless P.O.S. not worth installing on any mining system where the number of GPUs exceeds the number of GEN3 lanes open to the CPU.  That actually covers a lot of motherboards and CPUs, not just the cheap ones (like mine).

The discussion here got my attention; the keywords to look for are "support gen3 atomics":

https://github.com/RadeonOpenCompute/ROCm/issues/589

I then went to GitHub's RadeonOpenCompute (the bible) and sure enough, PCIe Gen 3 is needed for any AMD card under GFX9 AND the slot it fits in needs to support "atomics", as explained here:

https://github.com/RadeonOpenCompute/ROCm#supported-cpus

 

I then went to my system, tried a few things, and proved I was running ROCm, as shown below.

[Image: 207-208]

The pink color shows that GW tasks 2.07 failed.  Note that the beta task 2.08 passed.  That verifies I was running ROCm.  The black box shows that ONLY THE AMD CARD IN THE X16 SLOT IS USABLE!!!!!  Note the phrase "PCI rejects atomics".  It would appear my H81btc motherboard cannot do PCIe Gen3 on any of the X1 slots.  But then it was only $30 with free shipping and came with a CPU.  Not sure about my other mining rigs:  TB85 and H110btc.  For sure, I will not be running ROCm or rocM or WTF it is called.
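If you want to check your own rig for the same symptom, a small Python sketch like the following would pick out the offending lines. It assumes you have saved the relevant log output as text yourself; the function name `lines_rejecting_atomics` is made up for illustration, and the only thing taken from the screenshot is the phrase "PCI rejects atomics".

```python
def lines_rejecting_atomics(log_text, phrase="PCI rejects atomics"):
    """Return the stripped log lines that contain the phrase (case-insensitive).

    Assumption: the phrase appears on one line per rejected device, as in
    the screenshot described above. The exact log format is not assumed.
    """
    wanted = phrase.lower()
    return [
        line.strip()
        for line in log_text.splitlines()
        if wanted in line.lower()
    ]
```

Call it with the contents of the saved log, for example `lines_rejecting_atomics(open("saved_log.txt").read())`, where `saved_log.txt` is a hypothetical file name. An empty list means the phrase was not found, which is not by itself proof that every slot supports atomics.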

 

My opinion, worth 2c

 [EDIT] - Clarification:  in the BIOS I can assign GEN3 to any of the 6 slots on the motherboard, so I am not stuck with using an X16 slot.  There are just not enough GEN3 "atomics" to go around.  This does not apply to the non-ROCm drivers, which can run GEN1 on any or all slots easily.  I was running a total of 5 GPUs on this board before I messed with ROCm.

Joseph Stateson
Joined: 7 May 07
Posts: 173
Credit: 2946598785
RAC: 1257738

Sagittarius Lupus wrote:
Keith Myers wrote:
Pretty sure NO BOINC project supports RocM drivers.  You must use the stock proprietary AMD drivers with the OpenCL legacy option during install.

That's not helpful. All the other GPU projects I've signed up for are working with these drivers.

It seems this is not true.  Sagittarius must have forgotten about his problem with Milkyway:

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4505

Another ROCm driver failure was spelled out a few weeks earlier on the same Milkyway forum:

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4495

 

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245216538
RAC: 12981

I made 2.08 the new default app version for AMD GPUs. It seems to solve this problem and doesn't cause more problems than the previous version. The only difference is some additional "__private" declarations in the OpenCL code, so I really don't expect anything to change for the hosts where the previous version worked.

 

BM
