AMD GPU + Windows, GW GPU tasks, more than 2 tasks in parallel

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109377939540
RAC: 35985657

Robert Meckley wrote:
Let me add my voice of complaint regarding ....

This thread is about why results are being declared invalid when more than two tasks are run concurrently.  Your problem is about why results are not finishing in the first place.  Please don't divert a thread on one particular topic and take it off into a totally different direction.  Your problem is different.  You should have started a new thread.

Your GPU is not finishing any tasks.  They are all ending in a particular type of compute error - "EXIT_TIME_LIMIT_EXCEEDED".  You can see this if you go to your list of GW tasks on the website and click on the TaskID link for any single failed task.   You have 19 of them and each one has taken around 12,440 secs to get to the point where the plug gets pulled.

As it turns out, I've very recently explained the cause of this in this message.  Before digesting that explanation, look at the 4 messages that preceded my comment because they show the initial query and the responses from Holmis who pointed out the error message which then allowed the problem to be explained.

In a nutshell, your hardware is AMD's Tahiti series.  Both Tahiti and Pitcairn series GPUs belong to the very first 'generation' of AMD's "Graphics Core Next" (GCN) architecture which eventually had a number of further (and improved) 'generations'.  It's not surprising that both seem to exhibit the same behaviour and don't seem to be able to cope with the new GW app.  By all means continue to use those series of GPUs with the FGRPB1G search.  It's possible that a future driver update might correct the situation but I think that's a rather forlorn possibility.

The other thing you should note is that as well as the 19 error tasks, you have a further 859 listed as "in progress".  Once tasks are allocated to your machine, the ONLY way to get rid of them properly is to abort them.  Resetting the project is not going to make them go away.   Just highlight every single one of them that shows in BOINC Manager and click the abort button.  They will all go away without any further fuss.

For future reference, whenever you are changing settings to select a different search from what you've been running, please make sure you set your work cache size to a nice low value (eg 0.1 days) just in case the first new tasks have a crazy low estimate which causes some crazy huge number like 878 to get downloaded :-).  You can then see how they run before over-committing yourself.  The first one to crunch should tell you the true crunch time and then you can always take further controlled 'sips' so that you don't find yourself in such an impossible situation.
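As a rough illustration of why a low runtime estimate inflates the download, here is a sketch of the arithmetic. It assumes the client simply asks for enough work to keep every concurrent slot busy for the whole cache period; the real BOINC work-fetch logic is more involved, and the function name and the numbers are hypothetical.

```python
import math

def tasks_to_fill_cache(cache_days, est_runtime_s, concurrent=1):
    """Rough count of tasks needed to fill the work cache.

    Simplified model (an assumption, not BOINC's actual algorithm):
    seconds of work wanted = cache_days * 86400 * concurrent slots.
    """
    seconds_wanted = cache_days * 86400 * concurrent
    return math.ceil(seconds_wanted / est_runtime_s)

# Hypothetical estimate of 600 s per task:
print(tasks_to_fill_cache(0.1, 600))  # small cache -> 15 tasks
print(tasks_to_fill_cache(3.0, 600))  # large cache -> 432 tasks
```

If the estimate is far too low, every one of those tasks takes much longer than planned, which is why a small cache is the safer starting point.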

Cheers,
Gary.

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Gary Roberts wrote:
Robert Meckley wrote:
Let me add my voice of complaint regarding ....
Just highlight every single one of them that shows in BOINC Manager and click the abort button.  They will all go away without any further fuss.

To make that a bit clearer and the process a bit less tedious: you should be able to select multiple tasks (using the Shift key) and abort them in bunches.

If you don't have all of the 800-something tasks in your cache, the server will resend the "lost tasks" in bunches of 12 (I think) on every scheduler contact. When dealing with lost tasks, I don't think the 1-minute deferral between scheduler contacts applies; a click on the Update button after one or two seconds will get you the next bunch. That should be easy to verify.
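For a feel of how many Update clicks that could mean, here is a quick back-of-the-envelope sketch. It assumes the batch size really is 12 (which is only a guess) and that all 859 "in progress" tasks have to be resent; the function name is made up for illustration.

```python
import math

def updates_needed(lost_tasks, batch_size=12):
    """Scheduler contacts needed to receive every lost task,
    assuming a fixed number of resends per contact."""
    return math.ceil(lost_tasks / batch_size)

# 859 tasks listed as "in progress", 12 resent per contact (assumed)
print(updates_needed(859))  # 72
```

Any tasks still in the local cache don't need resending, so the real number of clicks would be lower.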

DF1DX
Joined: 14 Aug 10
Posts: 102
Credit: 2914430839
RAC: 1050621

Another data point: Not a 5700, but with a Radeon VII:

> So download a version of Linux and put Boinc on it and see what happens...

I added a second HD with Linux Mint 19.2 XFCE 64-bit to this host, installed the AMD driver 19.30 and yesterday ran 100 O2MDF 2.02 WUs, four at a time. After 19:00 UTC I ran five WUs at a time.

Except for the first WU, which got stuck, I don't see any invalid WU yet.

Under Windows 7 on the *same hardware* more than three WUs at the same time are always(!) invalid.

At the moment I have the impression that my hardware is working and the problem is probably with the driver or the app. Or both together.

To be on the safe side, I ordered a second Radeon VII today.

 

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1563236631
RAC: 50741

Gary Roberts wrote:
Richie wrote:
... IF ALL of the lines below hold truth :

... Hopefully, there will be others running multiple tasks under Windows who can shed some actual light on this.

After I had run a test batch of tasks and got an invalid rate above 50%, I replaced the PSU with a more capable one and reduced my substantial undervolting. Not all results in yet, but things are looking good now. Apparently that fixed it.

If so, it may be worth taking a second look at the power draw of O2MDF tasks. While the overall power draw is reduced due to the lower load, there are several spikes. With Windows well known for doing work in the background, might that contribute to small aberrations, so that a task can finish successfully and still be declared invalid?
