Hi Everyone
I'm not able to complete the O2MD GPU tasks on my Radeon Fury box. For example:
https://einsteinathome.org/task/899598434
They seem to hang or crash.
This is the machine: https://einsteinathome.org/host/12219055
Any suggestions?
Thanks!
Copyright © 2024 Einstein@Home. All rights reserved.
Gary Roberts posted about a problem with some GPUs and Gravitational Wave tasks in another thread that may or may not apply to you:
[url]As it turns out, I've very recently explained the cause of this in this message. Before digesting that explanation, look at the 4 messages that preceded my comment because they show the initial query and the responses from Holmis who pointed out the error message which then allowed the problem to be explained.[/url]
Mikey,
My comment that you quoted was directed at problems that seem specific to Pitcairn and Tahiti series GPUs. As I mentioned, those GPUs are quite old and belong to the 1st gen of the GCN architecture. Gaurav clearly mentioned that his GPUs are "Radeon Fury". If you look here you will see the listing for the Radeon R9 Fury (Fiji Pro) where it clearly states that the architecture is GCN 3rd gen. which is rather more recent than the old 1st gen stuff.
I have one GPU that is 3rd gen, an R9 380 (Tonga Pro) and it has no problem with FGRPB1G tasks. I have no reason to even suspect that this card might have a problem with O2MDF or that there might be any problem with 3rd gen cards in general. This particular card has been crunching FGRPB1G tasks without issue.
I wasn't intending to shift the R9 380 to GW at the moment. However, so that Gaurav doesn't go chasing down some unnecessary rabbit holes, I've made a very temporary switch to O2MDF to check for any problems. I've grabbed a small batch of tasks - just 5 tasks in total - and the first has successfully completed on its own at an elapsed time of just over 16 mins. The remaining 4 have crunched in pairs (2x) and the average time per task for those is quite a lot less, so everything is working as expected.
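To make the 2x comparison concrete, here is a minimal sketch of the arithmetic. The function name and the 25 minute paired elapsed time are hypothetical, chosen only for illustration; only the roughly 16 minute single-task time comes from the run described above.

```python
def effective_seconds_per_task(elapsed_seconds: float, concurrency: int) -> float:
    """Wall-clock cost of one task when `concurrency` tasks share the GPU."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return elapsed_seconds / concurrency


# One task running alone, about 16 minutes elapsed.
single = effective_seconds_per_task(16 * 60, 1)

# Hypothetical figure: each task of a pair shows 25 minutes elapsed at 2x.
paired = effective_seconds_per_task(25 * 60, 2)

print(f"1x: {single:.0f} s per task, 2x: {paired:.0f} s per task")
```

Running 2x only pays off if the paired figure comes out lower than the single one, and because work content varies from task to task, a handful of results is only a rough guide.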
Of course, crunch times are rather variable (because of work content variations), so you can't read too much into the values shown for the completed tasks. However, it does seem that GCN 3rd gen in general (and Tonga Pro in particular) doesn't have any issue with the GW app.
Cheers,
Gary.
I see the same message
"Warning: Program terminating, but clFFT resources not freed. Please consider explicitly calling clfftTeardown( )."
at the end of the log of this Task.
The WU was stuck at about 75%. After a few minutes I aborted the WU.
This happened only with the first of 100 WUs. The others look OK so far.
Linux Mint, Radeon VII
DF1DX wrote: I see the same message
That message is a warning that comes after the actual compute error. It's something to do with the way the OpenCL system terminates once processing (successful or otherwise) has finished. As an example, of the hundreds of successful FGRPB1G tasks I've browsed on the server, every single one has had (and still has today) that same warning. I've just checked a GW task and there is no warning, so we can assume the GW app has been written in such a way that the OpenCL gods are appeased and that the program has been terminated in a pedantically correct fashion :-).
In your case notice the lonely 'c' character in the output, preceded (and followed) by lots of 'dots'. Each dot represents a calculation loop. The 'c' usually represents a loop where a checkpoint is (or perhaps can be) written. If you compare your output with Gaurav's, he had 2 'c' chars and significantly fewer dots (if I remember correctly). I suspect your task was really 'spinning its wheels' (for unknown reasons) which you 'fixed' by aborting it. Did you perhaps try stopping and restarting BOINC before aborting? Maybe that might have corrected things.
The 75% progress you saw was probably just some sort of simulated progress. With just one potential checkpoint, it couldn't have been real. The task of my own that I looked at just now (mentioned above) completed in around 16 mins and had about 15 'c' chars. There's supposed to be a checkpoint every minute or so, so 15 checkpoints seems right.
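If you want to compare tasks without counting by eye, something like the following rough sketch would do it. It assumes the progress lines in the stderr output contain nothing but '.' and 'c' characters, which may not hold for every app version; the function name and the sample text are made up for illustration.

```python
def count_progress_marks(stderr_text: str) -> tuple[int, int]:
    """Return (dots, checkpoints) counted over lines made up only of '.' and 'c'."""
    dots = 0
    checkpoints = 0
    for line in stderr_text.splitlines():
        marks = line.strip()
        # Skip ordinary log lines so that a 'c' inside a word is not counted.
        if marks and set(marks) <= {".", "c"}:
            dots += marks.count(".")
            checkpoints += marks.count("c")
    return dots, checkpoints


# Made-up sample, not taken from a real task.
sample = "Starting task\n.....c.....\n..........\nWarning: Program terminating\n"
dots, checkpoints = count_progress_marks(sample)
print(f"{dots} dots, {checkpoints} checkpoint marks")
```

With a checkpoint every minute or so, the number of 'c' marks should be roughly the elapsed time in minutes. A task with lots of dots and only one 'c' is the pattern described above.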
Cheers,
Gary.