Times (Elapsed/CPU) for BRP6-Beta-cuda55 compared to BRP6-cuda32 - Results and Discussion Thread

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059444931
RAC: 1248010

AgentB wrote:
archae86 wrote:
There is a hint of modes perhaps suggesting my combination of the three hosts is a bit rough and ready.

This chart also seems to show three distinct 1.54 clusters: are they one for each host, or just a coincidence?


My comment was about combining the three 750s, not about the 760. In a couple of days, with more results, I could produce separate graphs for each of the three 750s. But I think it will be more interesting to see data from additional hosts than to torture the data from these five any further. The basic message on these five seems clear enough already.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Quote:
the kink near the 50th percentile of AgentB's 1.54 data plot probably arises from mismatch of the two cards on his host, which on his account differ in the grade of PCIe service employed.

OK, that is a good spot. I had noticed something unusual when the first tasks finished and didn't think to dig any further; this reminded me.

GPU1 (PCIe x4) finishes its tasks FASTER than GPU0 (PCIe x16)! The cards are identical except that GPU0 supports a monitor.

I have never seen this before, but I need to look closer at the 1.52 data to see if GPU0 has started to fade in the warm weather (it typically runs at 70°C compared with 66°C for GPU1).

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059444931
RAC: 1248010

I have data in similar form for two more Fermi hosts, both showing very little productivity impact from the 1.52 to 1.54 change. While both are apparently Fermi-generation chips, one uses the GF104 and the other the GF110, which web sources suggest were mid-life enhanced Fermi offerings.

A GTX 570 belonging to MountKidd shows a nominal productivity improvement of just over 2%.

A GTX 460 belonging to Przemek Wisialski actually showed a degradation of about 2%. One possible reason for results differing from those for AgentB's GTX 460s posted here is that this 460 appears to be running at 1X, that is, one task at a time per GPU (my inference from completion times). Beware the scale: the minimum-to-maximum range on this graph is a small percentage compared to the other graphs, all of which appear to have recorded data from systems running more than one Parkes job at a time per GPU.

While the data in hand is rather slight for such grand conclusions, I'll still say that, as of now, the Linux 1.54 CUDA55 distribution of Parkes PMPS work appears to give a break-even result compared to 1.52 CUDA32 on Fermi parts, with about a 25% improvement on Kepler and Maxwell1 parts. I'm not aware of observations on any Maxwell2 parts, nor on parts older than Fermi.
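
For anyone who wants to reproduce this kind of before/after comparison from their own task lists, here is a minimal Python sketch of the arithmetic; the function name and the listed times are illustrative placeholders, not real measurements:

[pre]
import statistics

def throughput_gain_pct(elapsed_old, elapsed_new):
    """Percent throughput change implied by median elapsed times.

    Positive means the newer app finishes tasks faster.
    """
    return 100.0 * (statistics.median(elapsed_old)
                    / statistics.median(elapsed_new) - 1.0)

# Placeholder elapsed times in seconds; substitute per-task values
# taken from the project web site or the BOINC task list.
cuda32_times = [5210.0, 5185.0, 5240.0, 5199.0]
cuda55_times = [5105.0, 5120.0, 5098.0, 5133.0]

print("gain: %.1f%%" % throughput_gain_pct(cuda32_times, cuda55_times))
[/pre]

Using the median rather than the mean keeps the comparison from being skewed by the occasional outlier task.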

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2141
Credit: 2773527019
RAC: 865579

Quote:
... nor on older generation than Fermi parts.


I have a working 9800GT which I still mount up occasionally for testing, and which can handle up to cuda65, though I won't be able to test this app until there's a Windows version.

Its most recent outing has been at SETI Beta. There, the performance sweetspot is a rough tie between cuda23 and cuda32: cuda42 is worse, and cuda50 is very poor indeed. OpenCL has a different performance curve, and is better than CUDA for some types of task, but in general worse.

Manuel Palacios
Joined: 18 Jan 05
Posts: 40
Credit: 224259334
RAC: 0

Archae86 and others,

I have been following this thread closely with great interest. Thank you, Archae86, for collecting and sifting through this preliminary data and displaying it here for us. I would be more than happy to test some of these workunits on my GTX 970s to see what sort of improvement the cuda55 app generates for Maxwell2 architecture cards. The preliminary observations do tend to show a very significant improvement in processing times, all things considered.

I have set my cache to remain very low, so that the turnover to 1.54 is quick once there is a Windows app available.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5849
Credit: 110011547112
RAC: 23458194

All the posts earlier than this one have been moved here from "Technical News" - see opening post in this new thread for the explanation.

I don't have any Maxwell series GPUs. I have a couple of GTX550Tis, a bunch of GTX650s and a single GTX650Ti. The 550Tis are showing a modest improvement. I haven't really had time to look at the 650s but I've just spent a bit of time getting the available data so far (just 23 results) for the 650Ti into LibreOffice. I've produced the stats and graphs along the lines of what I did for the BRP5 -> BRP6 transition but I want to play around and see if I can also do the 'probplots' that Peter has produced in his excellent series of posts. So far, the 650Ti is showing the 20-25% improvement that Bernd suggested, so it looks like Kepler GPUs like this one (or better) will really benefit.
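
For anyone following along in Python rather than LibreOffice, here is a minimal sketch of the kind of percentile plot ('probplot') being discussed; the data lists are placeholders, not real results:

[pre]
import numpy as np
import matplotlib.pyplot as plt

def percentile_plot(times, label):
    """Plot sorted elapsed times against their percentile rank."""
    t = np.sort(np.asarray(times, dtype=float))
    pct = 100.0 * (np.arange(1, len(t) + 1) - 0.5) / len(t)
    plt.plot(t, pct, marker=".", label=label)

# Placeholder values in seconds; substitute real task times.
percentile_plot([5210, 5185, 5240, 5199, 5222], "1.52 CUDA32")
percentile_plot([4105, 4120, 4098, 4133, 4111], "1.54 CUDA55")
plt.xlabel("Elapsed time (s)")
plt.ylabel("Percentile of tasks")
plt.legend()
plt.show()
[/pre]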

I'm late for other commitments right now so it will be a job for tomorrow or Sunday. There'll be some more results by then - 23 isn't really enough to work on anyway.

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059444931
RAC: 1248010

I noticed this morning that one of my Windows 7 hosts received 1.54 CUDA55 work, so those of us hoping for a Windows version are in luck now.

I'm working to download a supply for all three candidate hosts, then intend to suspend CUDA32 work to get a faster look. An extra complication for performance comparison in my case is that I have been using some thermal throttling, so I've turned that off, but will need to process some CUDA32 work without throttling to get a proper comparison population, assuming initial CUDA55 Windows success here.

Before hitting the "post" button I moved on to suspend my pending CUDA32 word and one running CUDA32 task on one host. Sadly the first CUDA55 job promptly errored out, and as I had failed to take the precaution of suspending all save one of the CUDA55s, another 13 errored out before I stopped things.

The exit status shows as -1073741515 (0xffffffffc0000135).

I used a more cautious initial trial technique on my other two Windows 7 GPU hosts, allowing a single CUDA55 task, and in both of those cases that task also errored out promptly, also with exit status -1073741515 (0xffffffffc0000135).

While I certainly agree with the moderation action moving my performance comparison results to this thread, I've thought this initial error result worthy of a short post in the Technical News thread.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2141
Credit: 2773527019
RAC: 865579

Quote:
I noticed this morning that one of my Windows 7 hosts received 1.54 CUDA55 work ... Sadly the first CUDA55 job promptly errored out ... The exit status shows as -1073741515 (0xffffffffc0000135).


Could you post, or PM me, the segment of client_state.xml referencing the v1.54 Beta work, please?

Error code 0xc0000135 (as it's usually written) means "The application failed to initialize properly", and that's usually because of a missing DLL. I'd guess a problem with the CUDA runtime files in this case. It should be possible to test a bit further by specifying the correct files in an app_info.xml file.
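
For readers wondering how the two forms of the number relate: BOINC reports the exit status as a signed 32-bit decimal, and masking it to 32 bits recovers the usual NTSTATUS form. A quick Python check:

[pre]
# BOINC prints the exit status as a signed decimal; masking to
# 32 bits recovers the NTSTATUS value Windows developers expect.
status = -1073741515
print(hex(status & 0xFFFFFFFF))  # 0xc0000135 = STATUS_DLL_NOT_FOUND
[/pre]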

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7059444931
RAC: 1248010

Quote:
Could you post, or PM me, the segment of client_state.xml referencing the v1.54 Beta work, please?

Is this what you are looking for?
[pre]
<app_version>
    <app_name>einsteinbinary_BRP6</app_name>
    <version_num>154</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>0.200000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <flops>37487392642.033318</flops>
    <plan_class>BRP6-Beta-cuda55</plan_class>
    <api_version>7.2.4</api_version>
    <file_ref>
        <file_name>einsteinbinary_BRP6_1.54_windows_x86_64__BRP6-Beta-cuda55.exe</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>cudart_xp64_55_22.dll</file_name>
        <open_name>cudart64_55.dll</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>cufft_xp64_55_22.dll</file_name>
        <open_name>cufft64_55.dll</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>einsteinbinary_BRP6_1.54_windows_x86_64__BRP6-Beta-cuda55.exe-db.dev</file_name>
        <open_name>db.dev</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>einsteinbinary_BRP6_1.54_windows_x86_64__BRP6-Beta-cuda55.exe-dbhs.dev</file_name>
        <open_name>dbhs.dev</open_name>
        <copy_file/>
    </file_ref>
    <file_ref>
        <file_name>einsteinbinary_BRP4_1.00_graphics_windows_intelx86.exe</file_name>
        <open_name>graphics_app</open_name>
    </file_ref>
    <file_ref>
        <file_name>EULA.txt</file_name>
        <open_name>EULA.txt</open_name>
    </file_ref>
    <coproc>
        <type>NVIDIA</type>
        <count>0.500000</count>
    </coproc>
    <gpu_ram>314572800.000000</gpu_ram>
</app_version>
[/pre]
Sadly I aborted the tasks I had downloaded, and as Bernd's post suggests further downloads are inhibited, I probably can't be the one to test your suggestions.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2141
Credit: 2773527019
RAC: 865579

Well, that's allowed me to download the executable, and Dependency Walker is happy with the CUDA DLL naming - but it is showing an unresolved dependency on LIBWINPTHREAD-1.DLL. einsteinbinary_BRP6_1.52_windows_intelx86__BRP6-Beta-cuda32-nv301.exe didn't need that.
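
For anyone without Dependency Walker to hand, a similar check can be approximated with the third-party Python pefile module (an assumption on my part that it is installed; note this lists only the statically imported DLLs, not delay-loaded or runtime dependencies):

[pre]
import pefile  # third-party: pip install pefile

# List the DLLs a Windows executable imports statically.
pe = pefile.PE("einsteinbinary_BRP6_1.54_windows_x86_64__BRP6-Beta-cuda55.exe")
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print(entry.dll.decode())
[/pre]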
