GPU Upgrade Shows No Improvement in Work Unit Completion

Florida Rancher
Joined: 4 Oct 13
Posts: 31
Credit: 23998436
RAC: 0

Archae, I appreciate you staying with me on this thread and looking over my results from time to time. Your insights are informative and I very much appreciate the time you are taking to educate me. My understanding has improved dramatically and I'm enthusiastic about improving performance.

If things continue to improve with the CUDA55s I'm wondering if the 970 may be able to run 3 WUs. Obviously, the 745 will not run 3, so is there a way to set up an app_config file in the Einstein directory that has separate settings for each GPU?

After changing "Use at most N% of the CPUs" to 80%, which should have left one CPU available for GPU crunching, Boinc is still running 8 CPU cores. I made that change days ago but it has never been reflected in the program. I thought maybe it would change when I started crunching the CUDA55s.

Do you think not having a free core is having an impact on GPU performance?

Regards,
Phil

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023174931
RAC: 1830249

Quote:
If things continue to improve with the CUDA55s I'm wondering if the 970 may be able to run 3 WUs. Obviously, the 745 will not run 3, so is there a way to set up an app_config file in the Einstein directory that has separate settings for each GPU?


Either one can run 3x, the question is relative productivity. I suspect the 745 will lose a bit but not very much. Quite possibly the 970 may gain a bit, maybe enough to make it interesting, quite possibly not. I run my 970 + 750 Ti machine at 2x, but have not checked that particular variable in quite some time. I've relied on the Einstein web page way of setting GPU multiplicity, and have not used a config file to exert control over a multi-GPU machine. I suspect it is possible.

Quote:
After changing "Use at most N% of the CPUs" to 80%, which should have left one CPU available for GPU crunching, Boinc is still running 8 CPU cores. I made that change days ago but it has never been reflected in the program. I thought maybe it would change when I started crunching the CUDA55s.


The mechanism works, so you did something wrong.

Unlike changing the number of tasks running on a GPU, nearly all the other changes take effect as soon as your host "phones home", something you can force at almost any time by clicking the Update button in Boincmgr for the project.

There are two basic places to set this type of preference:
1. The web page for your account at Einstein. This setting is specific to one of the four possible "locations" (aka venues: default, home, work, school) and has no effect unless you set it for the location your computer currently occupies. Just to make matters a bit more difficult, the default location is indicated by a dash in some places and by the word "default" elsewhere.
2. A local preference, usually set using Tools|Computing Preferences in BoincMgr directly on the machine itself. If you set preferences in BoincMgr, they take precedence over settings from the web account page.

You did not mention where or how you set the preference, but I'll guess the two most likely mistakes are:
1. You forgot you had set a local preference earlier, and are trying to assert control from the web page without going into boincmgr to the preferences settings and selecting the option to clear local preferences.
2. or you just set a web preference for a different location than the one at which your host resides.
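
The precedence described above can be sketched in a few lines of Python. This is only a model of the rule as stated in this post (a local preference masks the web value, and a web value only counts when it was set for the host's own location); the function name, the dictionary layout, and the numbers are made up for illustration and are not BOINC code.

```python
from typing import Dict, Optional


def effective_cpu_percent(local_override: Optional[float],
                          web_prefs_by_venue: Dict[str, float],
                          host_venue: str) -> Optional[float]:
    """Return the 'use at most N% of the CPUs' value that would apply.

    Hypothetical model: a local preference wins outright; otherwise only
    the web preference set for the host's own venue has any effect.
    """
    if local_override is not None:
        return local_override
    return web_prefs_by_venue.get(host_venue)


# Illustrative values only.
web = {"home": 87.5}
print(effective_cpu_percent(100.0, web, "home"))  # 100.0: old local setting masks the web value
print(effective_cpu_percent(None, web, "home"))   # 87.5: local prefs cleared, venue matches
print(effective_cpu_percent(None, web, "work"))   # None: web value was set for another venue
```

The first and third calls correspond to the two likely mistakes listed above.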

Quote:
Do you think not having a free core is having an impact on GPU performance?


Yes. How much is a matter of experiment, but it is unlikely to make no difference at all. Lowering CPU usage yet more may well have further effects.

By the way, while a great many people here use the term "free core" it is wildly inaccurate as regards the real system behavior. What you are really controlling in this case is how many separate tasks BOINC fires up on your machine. BOINC gets no vote on what runs where, so unless you use non-BOINC means to control task affinity (for example using Process Lasso--which I don't recommend to you at this stage) there is no such thing as a "reserved core" for GPU support, or such.

The effects of reducing number of CPU jobs vary quite a bit with application, GPU type, GPU model, and host system characteristics. Experimentation is key.

Florida Rancher

Archae:

Once again you came to the rescue and made me look like a dunce.

Quote:
2. setting a local preference, usually using Tools|Computing Preferences in BoincMgr directly on the machine itself. If you set preferences in BoincMGr, they take precedence over settings from the web account page.

Originally I made changes to the preferences in BoincMgr but I didn't institute any of the later changes you suggested in the manager but only on the web account page.

In my BoincMgr, computer preferences is under the Options tab. I went there, made the same changes I did on the web account page (use at most 87.5% of CPUs) and voilà the number of CPU tasks decreased from 8 to 7 instantly dedicating one CPU to GPU tasks.

Thanks again my guru,
Phil

archae86

Phil,

An odd thing;

Your list of computers now shows two, which appear actually to be the same computer.

The ID you have been working under up until now last reported work at 25 May 2016, 2:19:48 UTC, while the new ID has made lots of reports since 25 May 2016, 14:48:43 UTC.

Did you want that split for some reason? If not, you may be able to merge them, though that may be blocked if you have changed the computer's name.

Florida Rancher

Archae:

I'm baffled as to how that occurred. I didn't add a 2nd computer and I always choose the --- default computer. I wondered what was happening because my RAC dropped like a rock yesterday and today.

Since I started crunching CUDA55 tasks the WUs are completing much faster but overnight my RAC fell by almost 50,000. I found a way to merge the two computers but it made no change in my total credit or RAC.

The only change I made was to my settings in BoincMgr. Those were the changes you suggested yesterday. I don't think Boinc computing preferences even allows me to choose a computer.

Where did all my work performed in the last 30 hours go? I'm getting some credit but not 100% of what I've performed.

Phil

Florida Rancher

Archae:

I came across this statement in the Merge Computers section: "Sometimes BOINC assigns separate identities to the same computer by mistake. You can correct this by merging old identities with the newest one."

As I stated earlier, merging the computers made no difference in my stats. But something strange is going on. When the CUDA55 tasks for the 970 first began, the run times decreased to 1 hour 43 min from 2 hours 23 min.

Weirdly, some of the WUs for the 970 are now taking up to 4 hours 55 minutes to complete which is twice as long as the CUDA32 WUs were taking. One of the two GPU tasks is taking 2x longer to complete than the other. Why are the times so erratic?

And like I stated earlier my RAC has really fallen off. My current RAC is about the same as it was 3 days ago.

Under the Computers tab in BoincMgr what does this statement represent "While is (sic) BOINC running, fraction of time GPU computing is allowed 0.00%." Shouldn't that number be close to 100%?

I was very much looking forward to the speed increase crunching CUDA55s but now I'm dismayed by the erratic nature of the crunching times.

Dazed and confused,
Phil

archae86

Quote:
I was very much looking forward to the speed increase crunching CUDA55s but now I'm dismayed by the erratic nature of the crunching times.


Most likely you are fiddling too much.

Specifically, you can learn which card(s) actually ran a returned task by reviewing the stderr portion of the task page and searching for the string "CUDA device".

For a currently available example which may have alarmed you:
Task 560062247
is reported as starting running on your 970 at 22:55:43, but at 00:03:29 the next day started up on your 745. Then at 6:22:55 it restarted again on the 970, finishing at 6:29:17.

Tasks can resume from checkpoint on either card. So if you reboot your computer, or even just stop and then restart BOINCmgr, in-process tasks sometimes switch from one card to another.

In normal non-stop processing this does not happen.
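
If you save the stderr text from a task page, a small script can do the search for you. The sketch below only looks for the string "CUDA device" mentioned above; the sample text is a made-up placeholder, not real Einstein output, and the function name is hypothetical.

```python
from typing import List


def cuda_device_lines(stderr_text: str) -> List[str]:
    """Return every line of the stderr text that mentions 'CUDA device'."""
    return [line.strip() for line in stderr_text.splitlines()
            if "CUDA device" in line]


# Placeholder text standing in for a saved copy of a task's stderr output.
SAMPLE_STDERR = """\
(placeholder) task started
(placeholder) CUDA device: hypothetical card A
(placeholder) resumed from checkpoint
(placeholder) CUDA device: hypothetical card B
"""

for line in cuda_device_lines(SAMPLE_STDERR):
    print(line)
```

More than one distinct device in the output for a single task suggests it was restarted on a different card, so its elapsed time is a blend of both cards.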

Changes in RAC reflect not only your own production but the state of your quorum partners. If a bunch of your previously delinquent partners suddenly report, your RAC will surge through no merit of your own, and vice versa.

When doing this sort of adjustment work, it is better to pay close attention to changes in average elapsed time ("run time" in the table) and to check closely for any invalid results, than to track daily changes in RAC. In the long run RAC will closely approximate your steady state, but it can be a nervous indicator in the short term.

In the specific case of Einstein, I don't know what the web page number displayed as

Quote:
While is BOINC running, fraction of time GPU computing is allowed

means, but it also shows as 0.00% on my latest computer, so I don't think it is a problem for you.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109382626158
RAC: 35967714

Quote:
I came across this statement in the Merge Computers section: "Sometimes BOINC assigns separate identities to the same computer by mistake. You can correct this by merging old identities with the newest one."


Here's a fuller description of merging and why you needed to do it.

When you first install and start the BOINC client on your machine, it contacts the project and asks for a host ID. The project assigns the next available one. The client stores the ID in the state file, a file called client_state.xml which is created in the BOINC directory on your machine. Apart from the host ID, that file contains every vital detail about everything to do with tasks, apps, data files, results, etc on your machine. If that file becomes corrupted, BOINC will try to recover what it can, but you will lose something. BOINC will replace missing tasks, apps, data by asking the project for fresh copies, but it can't recover anything if the host ID record is lost. It will just ask the project for the next available ID and start again.
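
For anyone curious to see the stored ID without touching anything, here is a read-only Python sketch. It assumes the host ID sits in a `<hostid>` element inside each `<project>` block of client_state.xml; that layout, the sample XML, and the example URL are assumptions for illustration, so check them against your own file. Never edit the real state file by hand.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for client_state.xml; the real file holds far more.
SAMPLE = """<client_state>
  <project>
    <master_url>http://example.invalid/</master_url>
    <hostid>1234567</hostid>
  </project>
</client_state>"""


def host_ids(xml_text: str) -> dict:
    """Map each project's master URL to the host ID stored for it."""
    root = ET.fromstring(xml_text)
    return {p.findtext("master_url"): p.findtext("hostid")
            for p in root.iter("project")}


print(host_ids(SAMPLE))  # {'http://example.invalid/': '1234567'}
```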

For some reason, your original host ID must have been trashed. In 11 years of running BOINC I've never seen the ID become 'lost' for no good reason. BOINC doesn't ever request a new ID capriciously. There will be a good reason, probably associated with the state file being interfered with in some way. It will not be due to 'normal' operations such as stopping and restarting BOINC or the changing of preferences, either through the manager or the website. If you deleted files, moved files or made other changes outside of BOINC, you may have inadvertently caused BOINC to lose contact with its state file. BOINC actually keeps backup copies (in memory and on disk) of the state file so that if something interferes with the main copy, BOINC can largely recover by just recreating it.

If BOINC has needed to request a new ID, one will be supplied by the project immediately. The old one is still there on the project, so you will be able to see all IDs that have ever been assigned to your account. Inactive IDs are not deleted unless you, the volunteer, request it, or you merge it into a new ID because it was an obvious duplicate. The merge operation, when allowed, will merge the old ID stats into the new ID and delete the old ID.

Your new host ID was created on 25 May 2016, 5:17:19 UTC. If you convert that into your local time, you might be able to remember what you were doing leading up to that time. That's about the only way to figure out exactly what caused your old ID to be lost.
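
The conversion is easy to get wrong by hand, so here is a small Python sketch. It assumes the host was on Eastern Daylight Time (UTC-4) in late May 2016; that offset is an assumption and should be replaced with your own.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is Eastern Daylight Time, 4 hours behind UTC.
EDT = timezone(timedelta(hours=-4), "EDT")

created_utc = datetime(2016, 5, 25, 5, 17, 19, tzinfo=timezone.utc)
created_local = created_utc.astimezone(EDT)

print(created_local.isoformat())  # 2016-05-25T01:17:19-04:00
```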

Quote:
Like I stated earlier merging the computers made no difference in my stats.


The merge has obviously been successful since your old ID is no longer visible. Your new ID has a total credit over 3M and a RAC of 94K. It couldn't possibly have those numbers in less than 2 days if the stats from your previous ID hadn't been properly merged in.

Quote:

But, something strange is going on. When the CUDA55 tasks for the 970 first began, the run times decreased to 1 hour 43 min from 2 hours 23 min.

Weirdly, some of the WUs for the 970 are now taking up to 4 hours 55 minutes to complete which is twice as long as the CUDA32 WUs were taking. One of the two GPU tasks is taking 2x longer to complete than the other. Why are the times so erratic?


Archae86 has commented about this but I want to focus on the bit in red. It implies you are running just two GPU tasks. You are supposed to be running 4, two on each GPU. Because of the differing abilities of each GPU, there will be a big time difference between them anyway. If there is only one task running on the 970, this will ultimately hurt your RAC. You need to tell us properly how many tasks of each type are running. For the settings suggested to you, there should be 4 GPU tasks (2 for each GPU) and 6 CPU tasks (use 75% of cores). Two of the 4 GPU tasks (on the 970) should be running quickly and the two on the other GPU should be running more slowly.

Quote:
I was very much looking forward to the speed increase crunching CUDA55s but now I'm dismayed by the erratic nature of the crunching times.


If you concentrate on getting the settings correct and then take a largely hands off approach, everything will come up to expectations in due course. You just need to be patient.

Cheers,
Gary.

Florida Rancher

Gary:

Once again you came to the rescue and explained things in a way a 1st grader (like me) can understand. You are an excellent educator and moderator.

The only time I made any changes to files in the Boinc directory was to (per your advice) delete the app_config.xml file over a week ago. As it turns out it should have been in the Einstein directory and not the Boinc directory anyway.

2-3 days ago was the 1st time I made use of the "Computing Preferences" in Boinc Manager per Archae's informative suggestion. I had not used this method to change settings until Archae brought it up. I was only using the web preferences but the number of CPU cores would not change by changing the %.

I inquired as to why I was still running 8 CPU cores even though I had changed "Use at most "n" of the CPUs" to 87.50% which leaves a whole CPU core for GPU tasks. Changing it in Boinc Manager had an immediate effect and reduced the CPU cores from 8 to 7.

Surprisingly, I haven't noticed any change in performance or WU times by releasing this one core for GPU tasks to use. Using 7 or 8 seems to make no difference.

I don't know if Boinc added the second computer at that point. I checked the time it occurred and it happened at 1:17 AM Florida time. That is usually the time of day I read my messages and make changes suggested in my "GPU Upgrade" thread before heading off to bed.

Quote:
Archae86 has commented about this but I want to focus on the bit in red. It implies you are running just two GPU tasks. You are supposed to be running 4, two on each GPU. Because of the differing abilities of each GPU, there will be a big time difference between them anyway. If there is only one task running on the 970, this will ultimately hurt your RAC. You need to tell us properly how many tasks of each type are running. For the settings suggested to you, there should be 4 GPU tasks (2 for each GPU) and 6 CPU tasks (use 75% of cores). Two of the 4 GPU tasks (on the 970) should be running quickly and the two on the other GPU should be running more slowly.

Firstly, any tweaking I'm doing is very minor and mostly it involves correcting mistakes or instituting changes you or Archae suggest. I'm not doing any overclocking and have both cards set to their default settings. It may seem like I'm making many foolish changes but I'm not. I don't change anything unless I'm instructed to do so.

I am very patient but when I see an anomaly I usually try to get an opinion from the two of you. Normally I ask because your answers lead to better understanding for me. Do you think you can teach a 66 year OLD MAN new tricks?

I got my first PC in 1981 (a $3500 dual floppy no hard drive slug) and learned MS-DOS commands backwards and forwards. I still remember them today and use them when I'm at the command prompt.

When I reentered college to do my graduate degree I took a computer course and did some programming in BASIC, Pascal, C++ and eventually, CP/M and MP/M. I've forgotten them all.

In 1985 I set up a multi-user, multi-terminal medical billing software program for a large surgical practice using an easily corruptible MP/M system. At the time their accounts receivable stood at an unmanageable $750,000. While still working on my MBA I started a firm called Physicians Consulting Group.

By the time I finished my MBA at the University of Denver I completely changed horses and became a technical analyst for a large securities brokerage firm. Later I bought a large cattle ranch in Florida to have a place to raise my 4 sons on and give them a wholesome, healthy, family-oriented lifestyle.

Now that I was only operating computers and not programming or installing them the technology left me behind. I still have a great interest in what makes systems tick but it is only a retired old man's hobby.

Whew! What made me get started on that.

Quote:
Archae86 has commented about this but I want to focus on the bit in red. It implies you are running just two GPU tasks. You are supposed to be running 4, two on each GPU. Because of the differing abilities of each GPU, there will be a big time difference between them anyway. If there is only one task running on the 970, this will ultimately hurt your RAC. You need to tell us properly how many tasks of each type are running. For the settings suggested to you, there should be 4 GPU tasks (2 for each GPU) and 6 CPU tasks (use 75% of cores). Two of the 4 GPU tasks (on the 970) should be running quickly and the two on the other GPU should be running more slowly.

To answer your question I have a GTX 745 (it came with my Dell) and a GTX 970 installed. Each video card is crunching 2 GPU tasks. Also, my 8 core I7 6700 CPU is running 7 tasks (using 87.5% of the cores) with one CPU left over for GPU tasks.

What would really help is if I could construct an app_config file that has separate settings for each card; something I would never attempt to do myself. I'd like to see how the 970 performs running 3 tasks while the 745 runs 2.

Yes, my goal is to reach a "hands off" approach but right now I feel a few minor tweaks may be required to get my system stable and running smoothly. I calculated tonight that ideally my daily credit should be somewhere around 155,877.

What kind of monster system are you using to achieve an RAC of over 3,000,000?

Thank you again Gary for giving this old dog a few new bones to chew on. I appreciate you stopping by for a visit now and then.

Regards, I guess I should be saying Cheers to our Australian friends.
Phil

Gary Roberts

Quote:
The only time I made any changes to files in the Boinc directory was to (per your advice) delete the app_config.xml file over a week ago.


Yes, the app_config.xml file is NOT part of a normal installation so you are quite free to delete it. The purpose of the file is to allow finer control of a particular project's app - hence the 'app' part of the filename.

Because both of your GPUs (residing in the one host) are running the same BRP6 app, you won't be able to use this file to allow one GPU to crunch 3x and the other to crunch 2x. You could, if each GPU was using a different app version but they are not. You might be able to do it using BOINC's anonymous platform mechanism to set up an exact duplicate of the BRP6 cuda55 app to have a different name/version number and then tie the two versions to the particular GPU with the particular settings you wanted to run with.

I have no experience on how to achieve what you want. I don't have any hosts with two working GPUs (yet) so I've never had to explore any of this. The documentation for client configuration and anonymous platform each refer to the other through the option and it looks like it might be possible. It's not something even an advanced user would find easy to get right. Maybe someone has already tried this and could chime in here about it.

Alternatively, if the GPUs were in separate hosts, you could use two different app_config.xml files, one in each host. You could also achieve this without any files by putting each host in a separate 'location' or 'venue' and setting the different preferences for these. Of course, this is also not appropriate without a 2nd host. I'm certainly not suggesting you should purchase and set up a 2nd host just to achieve this :-).
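
To make the single-host limitation concrete, here is a Python sketch that builds a minimal app_config.xml. The element names (`app_config`, `app`, `name`, `gpu_versions`, `gpu_usage`, `cpu_usage`) follow the BOINC client configuration documentation as I understand it; the app name `einsteinbinary_BRP6` and the CPU figure of 0.5 are assumptions, so check the real app name in your own client before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int,
                     cpu_per_task: float) -> str:
    """Build app_config.xml text that runs `tasks_per_gpu` tasks per GPU.

    gpu_usage is the fraction of a GPU each task claims, so 2x is 0.50
    and 3x is 0.33. The value applies to every GPU running this app.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = f"{1 / tasks_per_gpu:.2f}"
    ET.SubElement(versions, "cpu_usage").text = f"{cpu_per_task:.2f}"
    return ET.tostring(root, encoding="unicode")


# Assumed app name and CPU share, for illustration only.
print(build_app_config("einsteinbinary_BRP6", 2, 0.5))
```

Note there is one `gpu_usage` value per app and nowhere to name a particular card, which is why this file alone cannot give the 970 and the 745 different multiplicities.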

Quote:
Surprisingly, I haven't noticed any change in performance or WU times by releasing this one core for GPU tasks to use. Using 7 or 8 seems to make no difference.


I'm not surprised for a number of reasons. I run a number of low to mid-range NVIDIA GPUs, 550Ti, 650, 650Ti, 750Ti, some at 2x and some at 3x. Limiting the number of CPU tasks on any of these made so little difference that I decided the extra CPU task was more important to me than any slight improvement in GPU crunch time. I don't have any experience with more powerful NVIDIA models.

Another point to consider is the true number of cores - in your case 4. You have 8 virtual cores so if you only use 7 of these, your 4 real cores are largely still all loaded. This is why, in an earlier message (in the bullet points list at the end) I advised to use 75% as the setting. Now that it's clear that you are running 4 GPU tasks concurrently (2 on each GPU) and 7 CPU tasks, going to 75% may make a worthwhile difference. You will only find out by trying the experiment and running quite a number of tasks to get a reasonable average for the new crunch time. Don't base it on just a couple of tasks. Run that way for a couple of days to be sure.
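
The arithmetic behind these percentages can be sketched in Python. The sketch assumes BOINC multiplies the number of logical CPUs by the percentage and rounds down, with a minimum of one; the rounding rule is my assumption, and the function name is made up.

```python
def cpu_task_limit(logical_cpus: int, percent: float) -> int:
    """Number of CPU tasks allowed, assuming the product is rounded down."""
    return max(1, int(logical_cpus * percent / 100))


# An 8-thread CPU (4 real cores with hyperthreading), as in this thread.
for pct in (100, 87.5, 80, 75):
    print(f"{pct}% of 8 -> {cpu_task_limit(8, pct)} CPU tasks")
```

Under that assumption 87.5% gives 7 tasks, while 80% and 75% both give 6, so on this host the two lower settings would behave the same.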

Quote:
... any tweaking I'm doing is very minor and mostly it involves correcting mistakes or instituting changes you or Archae suggest. I'm not doing any overclocking and have both cards set to their default settings. It may seem like I'm making many foolish changes but I'm not. I don't change anything unless I'm instructed to do so.


It really doesn't matter now what it was that caused BOINC to request a new host ID. It seems all is good now and both total credit and RAC are on nice upward trends. Because of timezone differences, it's not always possible to get instant feedback when something arises that raises further questions. Different people quite often genuinely have different opinions so if you start with one suggestion and then decide to follow something different, things can get a bit confused and mistakes can occur.

From past experience, I've found it's best to wait, possibly up to 24 hours, to get the range of responses and then to try to evaluate the suitability of each response for your particular circumstances. Some responses may take you in different directions and be counterproductive if you try to follow them all. If multiple people are making suggestions, I tend to bow out from presenting alternatives so as not to 'muddy the waters' any further. I will always come in later if I know of a 'better' solution. I try to give reasons why I think it's 'better'. I can quite easily be wrong so you need to be cautious about whatever is being suggested.

Quote:
I am very patient but when I see an anomaly I usually try to get an opinion from the two of you. Normally I ask because your answers lead to better understanding for me. Do you think you can teach a 66 year OLD MAN new tricks?


I'm well into my 70's - you're just a spring chicken :-). We all learn by trying things and observing the results - particularly the failed ones :-). From your impressive CV, I'm sure you'll do just fine. I reckon I'll end up learning from you. I learn a lot from what others write.

Quote:
What would really help is if I could construct an app_config file that has separate settings for each card; something I would never attempt to do myself. I'd like to see how the 970 performs running 3 tasks while the 745 runs 2.


As mentioned earlier, I don't think you can do this easily. Maybe someone who knows of some 'trick' to employ or has perfected the anonymous platform mechanism might share the details. I tend to be wary of anonymous platform because it's very easy to trash an entire work cache if you get it wrong. Also, it's not that trivial to fix things if there is a new version of the project supplied app.

You could try running 3 on both. Maybe the low end card will still work without locking up or suffering too much slowdown. The 970 will likely show an improvement and nothing too bad should happen - I suspect but can't promise :-). Another option (which involves spending money) would be to replace the existing card with a single slot 750 or 750Ti, if such a narrow card was available. I run 3x on a 750Ti without problems. Such a card should give a nice further increase to your daily credit. You'd just need to make sure it would fit the available space :-).

Quote:
Yes, my goal is to reach a "hands off" approach but right now I feel a few minor tweaks may be required to get my system stable and running smoothly. I calculated tonight that ideally my daily credit should be somewhere around 155,877.


If you go to one of the stats sites (links on your account page) you can find things like your daily credit increase - Boincstats says you increased by 136,200 in the last day. In the statistics tab of BOINC Manager, you can see separate plots for total credit and RAC. The RAC plot gives a good idea (from its shape) of where your RAC might be headed. It will take about 30 days for the value to reach some sort of plateau.
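
The slow approach to a plateau can be illustrated with a short Python sketch. It assumes RAC behaves like an exponentially decaying average with a half-life of one week, and that the host starts from zero with perfectly steady daily credit; the half-life, the starting point, and the function name are all assumptions, and real RAC also depends on when quorum partners report.

```python
def rac_after(daily_credit: float, days: float,
              half_life_days: float = 7.0) -> float:
    """Approximate RAC after `days` of steady production, starting from zero."""
    return daily_credit * (1.0 - 0.5 ** (days / half_life_days))


# 136,200 is the daily increase Boincstats reported, used here as the steady rate.
for days in (7, 14, 30):
    fraction = rac_after(1.0, days)
    print(f"day {days:2d}: about {fraction:.0%} of steady state, "
          f"RAC ~ {rac_after(136_200, days):,.0f}")
```

Under these assumptions RAC is only halfway to its plateau after a week and roughly 95% of the way there after 30 days, which fits the "about 30 days" rule of thumb.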

Quote:
What kind of monster system are you using to achieve an RAC of over 3,000,000?


Lots of pretty ordinary systems. My most productive one is a recent ebay purchase of a HD7950. It's just reaching its plateau of around 135K or so. When I was finding your daily production on Boincstats, I noticed that I've just had a 4M day (so it says) - so yes, I'm a bit over 3M :-).

Quote:
Thank you again Gary for giving this old dog a few new bones to chew on. I appreciate you stopping by for a visit now and then.


You're most welcome! It's not so much a matter of "now and then" since I've been following the thread quite closely. When others are giving help I tend to stay out rather than risk confusing things. I look forward to following your progress as you power on up the leader boards :-).

Cheers,
Gary.
