Observations on FGRPB1 1.16 for Windows

WhiteWulfe
Joined: 3 Mar 15
Posts: 31
Credit: 62248006
RAC: 5904

Gary Roberts wrote:
A quick look at some of your validated tasks shows elapsed/cpu times in two groups - 290s/255s for one and 450s/390s for the other.  Are you sure your 7:20 (440s) represents running tasks 1x and not 2x??

As far as I can tell, it's set to only run the one work unit, and completion times appear to show this.

Gary Roberts wrote:
Do you remember what the estimate was for the very first GPU tasks you received? If it was a lot lower than the 7:20 you mentioned, I could imagine the BOINC client requesting (and continuing to request) lots of tasks until the first tasks were completed.  Then, the estimate of all those fetched tasks would be adjusted upwards to the true value giving you the excess over your 0.5 day setting.  Other than that, I can't think of a reason why BOINC would over-fetch so dramatically.  Are you sure BOINC is still requesting even more work?  I haven't seen anything like that on the rather old (7.2.42) BOINC version I use. I don't know if it's anything to do with BOINC version.

Initial estimates were around the 3h39min mark if I remember correctly (it was estimating something like 66.5 days to do 446 work units).  BOINC is now estimating around 36m42s per work unit, and after reporting the units that had queued up overnight it has since downloaded another 140 - in seven separate batches of 20.  At 7m20s each, my rig can only do about 192 or so of them in a day, so why it's keeping a queue of more than 96-100 of them is something I don't quite understand.  I use BOINC Manager 7.6.22 alongside BOINCTasks 1.69.

The only reason I could see for it downloading so many is that it thinks I have Gamma-ray pulsar set for four WUs (like I did with Parkes and Arecibo), but even then it should have stopped around the 384-400 work unit mark.  I had to set it to No More Work as it was still snagging work units even with 530 already in the queue (and an estimate of 12.5 days!)

But the incredibly annoying part is that since BOINC sees there's 12 days of work units for the GPU, it won't let any of my other projects even try to pull GPU work...  Yet Einstein@Home keeps grabbing more and more.

Gary Roberts wrote:
Because of the high CPU time component of each GPU task - getting up towards 90% of the elapsed time - and because of the fact that you are using HT, I'm not surprised that allowing BOINC to run six CPU threads on your four real cores is loading all of them.  I don't think BOINC accounts for the CPU component of GPU tasks so your two available virtual cores will be providing the GPU support.  Perhaps you might like to try fewer CPU threads to see if that improves overall performance.  I notice you support other projects.  Do you always have six CPU tasks (from any project) running concurrently?

With GPU work units from most projects you simply "park" a thread or two to allow the GPU "breathing room" (aka keeping it fed with work), and then set BOINC to use whatever is left for CPU projects.  In my case, I have BOINC set to use 75% of the CPU (so six threads), and even high CPU usage projects like the now retired POEM@Home would use those "parked" threads - even Einstein@Home running four Parkes or Arecibo work units would only have an overall load of approximately 93-95% on the CPU.  These new Einstein@Home work units are being treated by BOINC as if they're a CPU ~and~ GPU work unit, so it's not only using the two threads that are specifically parked for a GPU but also one of the ones that's supposed to be used for other projects.
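For reference, that 75% figure is just the "Use at most X% of the CPUs" computing preference.  A minimal sketch of the same setting done locally via global_prefs_override.xml in the BOINC data directory (assuming an 8-thread CPU, so 75% = six threads):

<global_preferences>
   <!-- use at most 75% of the logical CPUs, i.e. 6 of 8 threads -->
   <max_ncpus_pct>75.0</max_ncpus_pct>
</global_preferences>

The client picks it up after Options -> Read local prefs file in BOINC Manager.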

Most CPU projects don't see any real difference between running on a hyperthread or the actual core itself, although a work unit running on a hyperthread might take a tiny bit longer.  Setting fewer CPU threads just means I'd be cannibalizing work from the other projects I'm running, and the time it takes to complete one of the Einstein@Home GPU work units appears to remain consistent.

EDIT: And yup, I pretty much always have six CPU threads running for various other projects, and then the other two threads dedicated for the GPU to do its thing with ^_^

Fifteen minutes later edit: Turns out part of the CPU usage problem was related to Chrome being silly (aka a memory leak that steadily eats up more CPU), so restarting it solved that particular problem.  Averaging 77-80% CPU load now, so that's always good....  BOINC's still treating the Einstein@Home work units as if they're CPU ones, which isn't as good in my eyes, but we'll see what fixes come in over the next few versions. ^_^

Edit the third, aka about 20 mins after initial post: It probably also isn't helping that the work units will reach 11% completion and then finish.

Ace Casino
Ace Casino
Joined: 25 Feb 05
Posts: 36
Credit: 1259352797
RAC: 904300

One of my computers (the one with the GTX 970) has switched over to the new Gamma-Ray Pulsar 1.16 but is only running one GPU WU at a time.  Any reason why?  It's set to run 3.  I've read where others are running at least 2 at a time.

Thanks

archae86
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024474931
RAC: 1808781

Ace Casino wrote:
One of my computers (the one with the GTX 970) has switched over to the new Gamma-Ray Pulsar 1.16 but is only running one GPU WU at a time.  Any reason why?  It's set to run 3.  I've read where others are running at least 2 at a time.

Do you have a restriction on CPU core usage?  1.16 work is downloaded with a parameter value that has the scheduler on your machine assuming each task will use up one full CPU core for support (not a fraction of one, as with most Einstein GPU work).  So if you have restricted the available cores, or if BOINC on your machine is running some tasks "high priority" to avert a deadline problem, you may have the reason.

archae86
archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7024474931
RAC: 1808781

WhiteWulfe wrote:
BOINC sees that there's 12 days of work units for the GPU, so it won't let any of my other projects even try to pull GPU work...  Yet Einstein@Home keeps grabbing more and more.

It is important to understand the relationship: BOINC running on your machine chooses which project it requests work from and how much.  All the project does is honor or deny each request.

Mad_Max
Mad_Max
Joined: 2 Jan 10
Posts: 153
Credit: 2134936302
RAC: 428578

Here are my stats on v.1.16 for the AMD platform:

Radeon HD 7870 2 GB (@1.1 GHz) + FX-8320 (@4.0 GHz)

1 task at a time: 847 s average (from 20 WUs)
2 tasks at a time: 1096 s average (from 6 WUs), i.e. 548 s per WU, ~55% higher throughput

Interesting thing: my HD 7850 2 GB has almost the same run-times (maybe 2-3% slower) despite the GPU being ~30% slower (7870 = 1280 shaders @ 1100 MHz, 7850 = 1024 shaders @ 950 MHz).
Looks like app performance is purely limited by VRAM speed?  (Both my 7850 and 7870 have exactly the same RAM: 2 GB 256-bit GDDR5 @ 1200 (4800 effective) MHz.)
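(For what it's worth, that shared spec works out to the same memory bandwidth on both cards: 256 bit / 8 × 4800 MT/s ≈ 153.6 GB/s.)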

I also checked the runtimes of 2 tasks at a time on a 7850 1 GB card: they're in the 5500-7000 s range, ~5-6 times slower than the 2 GB card with the same GPU.
So running more than 1 task per GB of VRAM is a very bad idea.  And it's another indication that app performance depends heavily on VRAM speed, not GPU core speed.

P.S.
Maybe we need a lower estimated computation size for FGRPB1G tasks?  They are fast now, so DCF goes down and other E@H tasks get very unrealistic runtime estimates.  After a few dozen FGRPB1G WUs completed in 14-16 min each, my DCF fell to 0.2, which brought the runtime estimates for FGRPB1G to a normal level.
But at the same time, Multi-Directed Continuous Gravitational Wave search CPU tasks got a very optimistic runtime estimate of about 1.5 hours, while the real runtime is in the 7-10 hour range.  So after a few GW WUs, DCF jumps back above 1.  That returns the GW WU estimates to normal, but then FGRPB1G gets the wrong estimate (~1.5 hours instead of 14-16 min).  With such a mix, BOINC simply cannot settle on a correct DCF.
With such large misestimations and DCF swings, BOINC can go completely nuts after some time - like downloading hundreds of GW WUs without a chance of finishing them in time.
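Roughly, as I understand it, the client scales every estimate by the same project-wide factor: estimated runtime = base estimate × DCF.  Plugging in the numbers above:

DCF = 0.2: FGRPB1G ~75 min base × 0.2 = ~15 min (about right), GW ~7.5 h base × 0.2 = ~1.5 h (real: 7-10 h)
DCF > 1:  GW estimates return to normal, but FGRPB1G estimates blow up by the same factor

So the two searches keep dragging the single shared DCF in opposite directions.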

Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4704
Credit: 17548533428
RAC: 6435056

Keith Myers wrote:
Can anyone help me decode the log messages?

Gary Roberts wrote:
It's a bit more complicated than that because the server seems to give up after checking only 72 possible candidate tasks to send.  Maybe there are still lots of allowed GPU tasks but the server can't get to them because it is giving up too early.  The difference between tasks 'allowed for the GPU' that have been issued and those 'allowed for the CPU' is likely to be very large because of the speed of GPUs causing those tasks to be consumed much more quickly.

As I said, this is just conjecture.

Thanks Gary.  I have over 660 tasks on each computer now with only a 2-week deadline to clear them.  Once again Einstein has overestimated how much work to send a computer.  No way will they finish in time since Einstein is not my major project and only gets a 10% resource allocation.  Have set all computers to NNT (No New Tasks) and will hope for the best I guess.

Logforme
Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Ace Casino wrote:
One of my computers (the one with the GTX 970) has switched over to the new Gamma-Ray Pulsar 1.16 but is only running one GPU WU at a time.  Any reason why?  It's set to run 3.  I've read where others are running at least 2 at a time.

I found that the FGRP GPU app ignored the web project settings to run multiple tasks at once. I had to add an app_config.xml file in the project data directory (C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu for me). The file contains:

<app_config>
   <app>
      <name>hsgamma_FGRPB1G</name>
      <gpu_versions>
         <!-- 0.5 GPUs per task: two tasks run concurrently per GPU -->
         <gpu_usage>.5</gpu_usage>
         <!-- reserve one full CPU core for each GPU task -->
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
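Note that the client only reads app_config.xml at startup or on demand, so after saving the file either restart BOINC or use Options -> Read config files in BOINC Manager for it to take effect.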

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109406514594
RAC: 35376174

archae86 wrote:
Gary Roberts wrote:
Are you saying there was a deferral (somewhere in the range of 0-24 hours) that took you to something like 20 mins after midnight UTC and once that deferral had run down to zero there was a further 24 hour deferral without any further tasks being acquired at the expiry of the first deferral?  

Yes, there was.

However, when I woke up this morning, with the 24-hour deferral ticking down toward a still-distant midnight UTC, a manual update request was honored.  So it appears that the real daily task quota limit deferral was to some boundary other than midnight UTC, not known correctly to BOINC on my system.  Maybe midnight my time, US west coast time, or ...

I reckon I now understand what's happening.  There seem to be two different causes for a deferral to the next midnight UTC.  One is the daily task limit being exceeded.  The other has nothing to do with a daily limit but rather with some number of consecutive work requests that the server is unable to fill.  Each unfilled request generates a deferral of increasing length until (at some magic number of them) BOINC decides it can't get work and so defers further requests until the next day.  This is being triggered by the current beta task allocation restrictions.

There are a couple of possible ways to circumvent this.  You have discovered one of them - a manual update resulting in at least one task being fetched.  Another way is a script or scheduled process of some sort to automate the update using boinccmd.  If an 'update' can be generated periodically, say once per hour, it would have no harmful effect if not needed but would cancel a deferral if one happened to be in place.  This is very easy in Linux but I have no knowledge about Windows.
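For example, a crontab entry along these lines would do it.  This is just a sketch - it assumes boinccmd is on the PATH and that the URL matches the project's master URL as shown by your client (add --passwd if a GUI RPC password is set):

# ask the local client to contact Einstein@Home once an hour
0 * * * * boinccmd --project http://einstein.phys.uwm.edu/ update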

The problem should be a temporary one and should disappear once the beta restrictions are removed.

Cheers,
Gary.

Ace Casino
Ace Casino
Joined: 25 Feb 05
Posts: 36
Credit: 1259352797
RAC: 904300

Thanks for the help. 

The 1.15s run 3 at a time.  It's the 1.16s that don't want to run 3 at a time, only 1.

I updated BOINC Manager and Drivers.

I'll sit back and see if anything changes before I start adding code. 

Nothing running high priority, and no restrictions.

Thanks

Zalster
Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Keith Myers wrote:
Can anyone help me decode the log messages?

Some would love to have that problem, Keith... lol.

I have over 900 each on 2 different machines.  Not that I'm complaining - it makes sure I have a steady supply.  Seems the error that no one has been able to pin down continues to send me more than I can finish by the deadline, but that is OK with me.  At least the rooms are being kept heated by the GPUs now....