Observations on FGRPB1 1.15 for Windows

Der Mann mit der Ledertasche
Joined: 12 Dec 05
Posts: 151
Credit: 302594178
RAC: 0

I totally agree with you.

If the project isn't able to provide us with useful apps by tomorrow, I will abort all WUs and leave this project.

Too much trouble in the past 3 months.

BR

Der Mann mit der Ledertasche

Greetings from the North

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 188154612
RAC: 332884

Please bear in mind that the FGRPB1G GPU application for Windows is currently still marked as Beta because it doesn't utilize the GPU fully (as you noticed). A lot of the things that the Linux app does on the GPU are still done on the CPU on Windows. Bernd is working on this and it is his top priority.

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80558758
RAC: 320

My setup:

OS:  Win 7 x64

CPU: i7-4790K (4.2GHz)

GPU: GTX 1080 (1708MHz core, 4513MHz memory)

I have been running two tasks concurrently since I started receiving these. The first two I ran were actually a combo of 1 FGRPB1G and 1 GPUGrid task and these only took a few minutes longer to complete than all the others which have been two of these tasks together. I am averaging a little over 2.5 hours per task. 

For now I will disable the option in my account to run Beta apps. I'll keep a watch on this thread and the one in the News section regarding this app, though, and re-enable it when the next Beta version is released.

Mad_Max
Joined: 2 Jan 10
Posts: 154
Credit: 2211514758
RAC: 323031

Gary Roberts wrote:


As an example, if you had at least a 2GB card (based on Linux experience - may be different for Windows), you could try running 2 GPU tasks supported by 1 CPU core (instead of the default 2)  by using the following app_config.xml file, placed in the Einstein project directory.  Note that I'm posting this from a borrowed computer in a hospital room so it's from (possibly faulty) memory.  Check the above link for documentation on app_config.xml

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

If 0.5 CPUs per GPU task is not enough CPU support, you could try (with sufficient GPU memory) running 3 GPU tasks supported by 2 CPU cores.  That would effectively reserve 0.67 CPUs per GPU task.  To do this, just set gpu_usage to 0.33 and cpu_usage to 0.67 in the above example, followed by a 're-read config files' in BOINC Manager.  Please realise that I don't have access to a Windows machine to try any of this so anyone following this will need to experiment to find the optimum conditions.  It may well be that one core per GPU task really is the best option anyway.
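
As a sketch of that three-task variant, the file would look something like the following. It is the same app_config.xml with only the two values changed, and it carries the same caveat: it is untested on Windows, and the 0.33/0.67 split is a starting point for experiment rather than a recommendation.

```xml
<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <!-- 0.33 of a GPU per task, so 3 tasks share one GPU -->
            <gpu_usage>0.33</gpu_usage>
            <!-- 0.67 of a CPU core budgeted per task, about 2 cores for 3 tasks -->
            <cpu_usage>0.67</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```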


Yes, it is different for Windows, at least for the current Windows version. I monitored GPU VRAM consumption and it is about 300 MB per hsgamma_FGRPB1G task, so a 1 GB card can safely run 2 tasks in parallel and 2 GB cards can run 4-5 tasks. I think Windows VRAM usage is much lower because only part of the computation is transferred to the GPU, compared to the *nix and Mac versions.
The default of one CPU core per GPU task really is far from optimal, at least for AMD GPUs (may be different for NVidia).
CPU load jumps around in the 30-100% range. I did a few tests and it looks like the optimum is 0.5 or 0.67 CPU cores per task (2 WUs on 1 CPU core or 3 WUs on 2 CPU cores).
With 4 WUs per GPU running in parallel (and 2 CPU cores to support the CPU part of the computation) I was able to achieve ~90% GPU load on an HD 7870 2 GB (1280 shaders @ 1100 MHz).

But even with such optimizations the current app is very slow and inefficient.
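
For anyone wanting to try the 4-WUs-per-GPU setup described above, a minimal sketch of the corresponding app_config.xml follows. The 0.25 and 0.5 values are simply derived from "4 WUs on 1 GPU supported by 2 CPU cores"; treat them as assumptions to experiment with. At roughly 300 MB per task, four tasks need about 1.2 GB of VRAM.

```xml
<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <!-- 0.25 of a GPU per task, so 4 tasks share one GPU -->
            <gpu_usage>0.25</gpu_usage>
            <!-- 0.5 of a CPU core budgeted per task, 2 cores for 4 tasks -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
```

As with the example quoted earlier, the file goes in the Einstein project directory and is picked up after a 're-read config files' in BOINC Manager.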

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Christian,

I'm happy to know it's being worked on. I don't like to abort tasks because they won't even start before the deadline.

When the app is sorted out I'll be happy to run it, provided it actually uses the GPU properly and doesn't need 1 x CPU.

As you are probably aware, I run 3 projects here, E@H being one of them, and prior to the current Windows app everything ran pretty well. Hopefully it will do so again when the app is fixed and updated.

I'll monitor the forums to see when it's OK to remove NNT.

Regards,

Cliff,

Been there, Done that, Still no damm T Shirt.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1587349079
RAC: 752346

Cliff, I was going to make that comment but you beat me to it. I'll run these thru Friday and if it's not fixed I'll find something else for my GPUs to do. I'll keep an eye on the boards to see when it will be productive for me to come back.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

GTX 780 (347.88) + Intel Q9550 @ 3.7GHz, running 1 task : 13600s (3h 47min)

GTX 760 (376.33), 2 cards, Intel X56xx @ 4GHz, running 3 tasks per card : 14600s (4h 03min)

AMD R9 270X (16.12.1), Intel X56xx @ 4GHz, running 1 task : 20400s (5h 40min)


I think DRAM speed and overall memory speed (motherboard + CPU) have a notable impact on completion times with these tasks.

The system with the Q9550 has the oldest and slowest memory (DDR2), but that CPU has about the same power to support 1 GPU task as the others. Comparing these completion times with my observations of BRP4G tasks, I believe that if system memory speed weren't having an effect here, the difference between the GTX 780 and GTX 760 completion times would easily be wider.

The system in the middle has the fastest DRAM of the three. I'm not sure if Nvidia and AMD OpenCL can be compared at all, but I believe the two hosts with DDR3 memory could theoretically end up with much more similar completion times if they had the same DRAM speed. That's because those two GPU models should have pretty much the same computational power (the AMD could even be the faster of the two).

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283452444
RAC: 0

Hi Betreger

Betreger wrote:
Cliff, I was going to make that comment but you beat me to it. I'll run these thru Friday and if it's not fixed I'll find something else for my GPUs to do. I'll keep an eye on the boards to see when it will be productive for me to come back.

Well now and then I manage to get in a goodish response:-)

Trouble is, I don't think a lot of folks realise what the crunchers put into projects. It's not just time and CPU/GPU cycles, it's the cost of kit, plus the cost of the electricity used to run the kit, and the amount of money paid out for upgrades etc.

I've spent far more this qtr on electricity bills than I'd planned on [extra £256] and also bought 2 second-hand Intel-based workstations at very near £900. I find I can't actually afford another bill of over £200, so I'm trying to keep my computers offline until after midnight when it's a lower rate. That said, I still need to power up 1 to do the usual web prowling or to read an e-book or watch a video. Local cable TV is a wee bit of a joke: endless repeats dating back to the 1930s, 40s etc., or if someone in the acting community dies, their films proliferate across both DTV and cable, and after having already seen almost all of them at least 3 times the PC becomes almost irresistible:-)

Well that's got that mega whinge off my keyboard:-)

Regards

 

Cliff,

Been there, Done that, Still no damm T Shirt.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7219544931
RAC: 982253

As Bernd has posted in the Technical News forum, a new Windows64 version 1.16 has superseded 1.15.

Initial reports and my own observations show completion times with 1.16 to be very greatly reduced compared to 1.15.

I suggest that users consider aborting 1.15 tasks in their cache.  Ordinarily this is considered a bit anti-social, but here the loss of host productivity seems to warrant it.  A bit of initial caution may be warranted, perhaps by suspending 1.15 tasks until some assurance of 1.16 success on one's own host is found.

A word of caution: the progress indication is pretty misleading, with initial progress suggesting only modest improvement.  But unlike 1.15, which greatly slowed in reported rate of progress toward the end, some of us have observed 1.16 to leap from about 13% complete to finishing.  On the other hand I have some 1.16 tasks still running at current reported progress over 13%, so this may not be consistent either.

My most productive host was taking about 9400 seconds of elapsed time to complete a 1.15 task.  The first two 1.16 completions took 495 and 839 seconds, and one of those has validated.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7219544931
RAC: 982253

While GPU utilization is way up compared to 1.15, 1.16 GPU utilization is still pretty low when running 1X.  If you have sufficient CPU cores to provide adequate support, I suspect running somewhat higher X will help more than usual.  But the CPU support required, while less than the virtually 100% needed for 1.15, is still very high, so a bit of caution may be indicated.
