R9 290X processing more than 1 task at a time: invalids

chase1902
Joined: 13 Aug 11
Posts: 37
Credit: 1,264,094,642
RAC: 0
Topic 198165

I am having problems getting more than 1 task at a time to run on my 290X without all tasks being invalid.
I was wondering if anybody has managed to get them to work, as I am running out of ideas.
I don't know if Windows 10 would be better; has anybody tried it with better results?
I've tried various drivers, with different run times, but no task will validate when more than one is run at a time.
I did manage to underclock the card, but I had to use an old version of CCC (Catalyst Control Center). It was no better at getting the tasks to validate, and the run times were very bad.
I could always just run one task at a time, but it seems a shame to waste what the card is meant to be able to do.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513,211,304
RAC: 0

I guess it is this host?

I had a look at some of the error tasks, and some of the invalids.

I notice you are checkpointing every second, which seems unusual to me. I think the default is 60 seconds, so you might want to see whether setting it to that helps (look in Preferences).

If a host is generating tasks that end in errors, then the host itself has recognised the failure. Invalids are results that fail validation when compared against the results of other hosts that worked the same task.

Fix errors first, then see if invalids occur.

It is sometimes easier to run only a single project, single application and single task when troubleshooting.

I would expect (at least) x3 on BRP6, so good luck...

Edit: I also noticed from the last contact log ...

http://einstein.phys.uwm.edu/host_sched_logs/11995/11995469

This typo error:

2015-08-01 12:35:14.8597 [PID=18794] Request: [USER#xxxxx] [HOST#11995469] [IP xxx.xxx.xxx.168] client 7.4.42
2015-08-01 12:35:14.8603 [PID=18794] [send] effective_ncpus 6 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-08-01 12:35:14.8603 [PID=18794] [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2015-08-01 12:35:14.8603 [PID=18794] [send] Not using matchmaker scheduling; Not using EDF sim
2015-08-01 12:35:14.8603 [PID=18794] [send] CPU: req 259200.00 sec, 6.00 instances; est delay 0.00
2015-08-01 12:35:14.8603 [PID=18794] [send] ATI: req 0.00 sec, 0.00 instances; est delay 0.00
2015-08-01 12:35:14.8603 2015-08-01 12:35:14.4255 [PID=18787] SCHEDULER_REQUEST::parse(): unrecognized: 0

Keith Myers
Joined: 11 Feb 11
Posts: 1,062
Credit: 1,172,625,871
RAC: 2,756,520

I generated both invalids and errors when I tried to run Einstein and MW (MilkyWay@Home) at the same time while SETI was down earlier in the week. I'd never had those issues before. I guess you just can't do it without running out of resources. I reported this problem here.


Floyd1
Joined: 29 Jun 14
Posts: 14
Credit: 590,463,278
RAC: 0

I can echo the problem running multiple WUs on a 290X.

My AMD card (running at stock speeds) is only running E@H.

I am also running E@H on the iGPU (2 tasks completed in about twice the time of a single task, so there was no throughput gain, and I cut the load down to a single WU to avoid unnecessarily using a CPU core).
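The arithmetic behind "2 tasks in about twice the time" can be sketched as follows. When N tasks share a GPU and each finishes after roughly T seconds of elapsed time, the effective time per task is T / N, and the gain over running one task alone is the single-task time divided by that. This is only an illustration; the function names and the timings in the usage lines are hypothetical, not measurements from anyone's host.

```python
def effective_seconds_per_task(elapsed_s: float, concurrent: int) -> float:
    """Average wall-clock seconds per task when `concurrent` tasks run
    together and each finishes after roughly `elapsed_s` seconds."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return elapsed_s / concurrent


def speedup(single_s: float, elapsed_s: float, concurrent: int) -> float:
    """Throughput gain of running `concurrent` tasks at once versus one at
    a time. 1.0 means no gain; above 1.0 means more tasks per hour."""
    return single_s / effective_seconds_per_task(elapsed_s, concurrent)


if __name__ == "__main__":
    # Hypothetical: one task alone takes 1000 s; two together take ~2000 s each.
    print(speedup(1000.0, 2000.0, 2))  # 1.0 -> no gain, as seen on the iGPU
    # Hypothetical: two together take 1500 s each, a real gain.
    print(speedup(1000.0, 1500.0, 2))
```

A result of 1.0 means the second concurrent task only ties up another CPU core without finishing work any faster, which is why dropping back to a single WU made sense.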

While my quad-core CPU was doing 2 WUs on another project, it left 2 cores for servicing GPU tasks.

When I had 2 x 290X tasks and 2 x iGPU tasks running concurrently, I was getting no errored tasks but 100% invalid BRP6 tasks.

Once I trimmed the 290X tasks to running individually, I was getting 100% validated tasks again.

I guess the BRP6 tasks just don't play well with other tasks, even if those other tasks are from the same family.

AgentB

Quote:


Once I trimmed the 290X tasks to running individually, I was getting 100% validated tasks again.

I guess the BRP6 tasks just don't play well with other tasks, even if those other tasks are from the same family.

Interesting, what happens when you run x2 on R290X alone (not using Intel GPU)?

Floyd1

Quote:
Quote:


Once I trimmed the 290X tasks to running individually, I was getting 100% validated tasks again.

I guess the BRP6 tasks just don't play well with other tasks, even if those other tasks are from the same family.

Interesting, what happens when you run x2 on R290X alone (not using Intel GPU)?

I'm still experimenting with permutations.

I'll post again when I have more results to go by....

chase1902

Thanks AgentB, I forgot I had played with the checkpoint setting (a long time ago); I'll change it back.
I only run Einstein tasks, and I usually prefer only one type of GPU task per machine. I've never found mixing GPU tasks to be a good idea; you can't tell whether there's a problem because the run times change so much.

I have no idea what this means:
2015-08-01 12:35:14.8603 2015-08-01 12:35:14.4255 [PID=18787] SCHEDULER_REQUEST::parse(): unrecognized: 0
or whether it needs fixing. I'm sure it shows on all my computers.

It's strange how with one task everything validates without a problem, but with any more they are all invalid. That was running just the BRP6 jobs, nothing else.

I played with different drivers etc., so the earlier tasks are a bit off; I just wanted to see if I could get more than one task to work, with no luck whatsoever.

The computer is quite happy running just the one GPU task, so I've added CPU tasks to keep it a bit busier, as they only very slightly affect the run times. As it is happy, I'll leave it be for the moment. I've got one of the other computers to fix (it crashes every night at about 2am), although I might have found its problem: when checking the temperatures I found it was up to 96C. Perhaps a bit too hot.

AgentB

Quote:

I have no idea what this means:
2015-08-01 12:35:14.8603 2015-08-01 12:35:14.4255 [PID=18787] SCHEDULER_REQUEST::parse(): unrecognized: 0
or whether it needs fixing. I'm sure it shows on all my computers.

See the cc_config file here. It appears to have stopped; it was visible in the "last contact" link for that host.

Quote:


It's strange how with one task everything validates without a problem, but with any more they are all invalid. That was running just the BRP6 jobs, nothing else.

The computer is quite happy running just the one GPU task, so I've added CPU tasks to keep it a bit busier, as they only very slightly affect the run times. As it is happy, I'll leave it be for the moment. I've got one of the other computers to fix (it crashes every night at about 2am), although I might have found its problem: when checking the temperatures I found it was up to 96C. Perhaps a bit too hot.

I'm not happy running my AMD HD7990 at 90C; I seem to recall it behaving badly at 98C.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,388
Credit: 51,662,394,724
RAC: 69,962,310

Quote:
... I have no idea what this means:
2015-08-01 12:35:14.8603 2015-08-01 12:35:14.4255 [PID=18787] SCHEDULER_REQUEST::parse(): unrecognized: 0
or whether it needs fixing. I'm sure it shows on all my computers.


At some point you must have created a client configuration file (cc_config.xml) and inserted the instruction to allow/disallow multiple clients (i.e. concurrent instances of BOINC) to run simultaneously. The problem is that you have made a typo in the closing tag by using the string 'clennts' instead of 'clients'. I don't understand why you would want such an option, and with that typo the whole instruction is going to be ignored anyway. You should either fix the typo or delete the whole line from cc_config.xml. If you really do need to run multiple clients, you should read the instructions about cc_config.xml and note the warnings about running multiple clients.
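A mismatched closing tag like 'clennts' is easy to catch by checking that cc_config.xml is well-formed XML before restarting BOINC. The sketch below is an illustration only, not a BOINC tool: it uses Python's standard xml.etree.ElementTree, which rejects mismatched open/close tags. allow_multiple_clients is the real cc_config option discussed above, but the sample file contents are hypothetical, and the check only covers XML syntax, not whether BOINC recognises each option.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_cc_config(text: str) -> Optional[str]:
    """Return None if `text` is well-formed XML, else the parser's error.

    Only XML well-formedness is checked; this does not know which
    options the BOINC client actually accepts.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical cc_config.xml contents with the misspelled closing tag.
BROKEN = """<cc_config>
  <options>
    <allow_multiple_clients>0</allow_multiple_clennts>
  </options>
</cc_config>"""

# The same contents with the typo corrected.
FIXED = BROKEN.replace("clennts", "clients")

if __name__ == "__main__":
    print("broken:", check_cc_config(BROKEN))  # reports a mismatched tag
    print("fixed: ", check_cc_config(FIXED))   # None
```

To check a real file, pass it the contents of cc_config.xml from the BOINC data directory, e.g. `check_cc_config(open("cc_config.xml").read())`.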

While you're investigating your problems, maybe you should have a good look at all options in your cc_config.xml. You don't really need this file for standard operations so perhaps you should remove it at least temporarily. Can you tell us what you need it for?

I have no idea why you can't successfully run multiple GPU tasks concurrently. I thought your CPU and elapsed times looked a little unusual with the bulk of the elapsed time being made up by just the CPU component. Seems to indicate the GPU is super slick :-). I had a browse through the top hosts list looking for Hawaii series GPUs and found at least 4 or 5 such hosts, mostly with more than one GPU. I was hoping to find examples of multiple concurrent tasks but all I saw were times quite similar to yours (just a bit slower if anything) so I assume they were just running single tasks like you. Perhaps it's just a driver issue that AMD hasn't yet found/resolved. Perhaps you could send a PM to owners of hosts with this card to see if they'd be willing to try running multiple tasks? It would be interesting to know if they see the same as you.

Cheers,
Gary.

AgentB

Quote:

I have no idea why you can't successfully run multiple GPU tasks concurrently. I thought your CPU and elapsed times looked a little unusual with the bulk of the elapsed time being made up by just the CPU component. Seems to indicate the GPU is super slick :-). I had a browse through the top hosts list looking for Hawaii series GPUs and found at least 4 or 5 such hosts, mostly with more than one GPU. I was hoping to find examples of multiple concurrent tasks but all I saw were times quite similar to yours (just a bit slower if anything) so I assume they were just running single tasks like you. Perhaps it's just a driver issue that AMD hasn't yet found/resolved. Perhaps you could send a PM to owners of hosts with this card to see if they'd be willing to try running multiple tasks? It would be interesting to know if they see the same as you.

This Fury thread suggests that 288larson and Gaurav Khanna were having a problem at x2 (both Windows, both single GPU).

Edit: and you have not fixed the cc_config typo as mentioned above.

Edit++: something weird here: this old thread has exactly the same typo?!

Gary Roberts

Quote:
This Fury thread suggests that 288larson and Gaurav Khanna were having a problem at x2 (both Windows, both single GPU).


The 290X and the Fury X are quite different beasts (Hawaii vs Fiji).

Gaurav made a subsequent post where he stated:-

Quote:
Fury X: On 2 WU .. It seems to complete each in 3600.
So effectively 1,800 secs for each.


Also, he wasn't using Windows, he was using Ubuntu.

There was also a post by another participant who said there was only one invalid showing in 288larson's list of results. 288larson later posted that the driver had been updated. I don't really know what the ultimate upshot was as there haven't been further results reported.

I don't tend to follow discussions of top end, power hungry and expensive GPUs too closely as I know it's unlikely I'll be buying them, particularly when they're rather recently released.

Quote:
Edit++: something weird here: this old thread has exactly the same typo?!


Yes, that is a bit weird. You tend to see examples from time to time where people post their cc_config.xml files that are being used for specific purposes and others, who aren't comfortable with designing their own, may use these as 'templates' and perhaps end up with options (complete with typos) that they don't really need. I imagine something like this must have happened and two different people ended up copying the same typo.

Cheers,
Gary.
