Information about the new S5 workunits

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692162060
RAC: 1812

Message 37797 in response to message 37795

To give an impression what VTune is about:
Here's a screenshot from a 20-second analysis run I did with VTune near the end of this result:

The columns have the following meaning:

1) name of a function in the einstein app, sorted in descending order by column 2

2) the function's share of the overall runtime of the einstein app.

3) estimated percentage of branches correctly predicted by the CPU while executing this function

4) estimated average number of CPU cycles per assembly instruction executed in that function
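For readers unfamiliar with these metrics, here is a minimal sketch of how columns 2 to 4 can be derived from raw hardware event counts. The helper names, and the assumption that the runtime share is estimated from sampled clock ticks, are illustrative only and not VTune's actual implementation.

```c
#include <stdint.h>

/* Illustrative helpers (hypothetical names): derive VTune-style ratios
   from raw event counts. */

/* Column 2: share of total runtime in percent, assuming it is estimated
   from the number of clock-tick samples that landed in the function. */
double runtime_share_percent(uint64_t func_samples, uint64_t total_samples)
{
    if (total_samples == 0)
        return 0.0;
    return 100.0 * (double)func_samples / (double)total_samples;
}

/* Column 3: percentage of branches the CPU predicted correctly. */
double branch_prediction_percent(uint64_t branches, uint64_t mispredicted)
{
    if (branches == 0)
        return 100.0;   /* no branches executed, so none were mispredicted */
    return 100.0 * (1.0 - (double)mispredicted / (double)branches);
}

/* Column 4: average CPU cycles per retired instruction (CPI). */
double cycles_per_instruction(uint64_t cycles, uint64_t instructions)
{
    if (instructions == 0)
        return 0.0;
    return (double)cycles / (double)instructions;
}
```

For example, 3000 cycles spent on 2000 instructions gives a CPI of 1.5. A CPI well above 1 means the CPU spends many cycles per instruction, for example waiting on memory or recovering from mispredicted branches.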

There are tons of other metrics to choose from.

CU

BRM

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Okay, then I'll try to get this kind of measurement under both OSes. Maybe it'll show something significant.

Ziran
Joined: 26 Nov 04
Posts: 194
Credit: 446394
RAC: 1060

Has anyone had any experience with these tools? Can they be used to check for cache misses?

AMD CodeAnalyst™ Performance Analyzer for Windows®

Windows: http://developer.amd.com/cawin.jsp

Linux: http://developer.amd.com/calinux.jsp

When you're really interested in a subject, there is no way to avoid it: you have to read the manual.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692162060
RAC: 1812

Message 37800 in response to message 37799

Hi!

Keep an eye on the relative share of cycles spent in the modf* functions; I suspect they might have an impact after all.

Is there no quick way to disable SSE2?

CU

BRM

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692162060
RAC: 1812

Hi all!

I think the mysterious performance gap for some (not all) systems under windows is partly resolved.

As I wrote before, the math library used by the app under Windows (compiled with the Microsoft compiler) uses SSE2 instructions for some functions; one of them, modf, is heavily used by Einstein@Home.

If the library thinks that SSE2 is not supported on the host (e.g. on the Pentium III, older Intels, and Athlons without SSE2), it falls back to a version of modf that seems to be slow as hell. Without SSE2 support, up to one third of the overall runtime can be spent in this slow modf function and the function it calls in turn (modf computes the integer and fractional parts of a floating-point number, which is numerically not as easy as it sounds).
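To make the modf contract concrete, here is a small C sketch. Per the C standard, modf(x, &ip) returns the fractional part of x and stores the integral part in ip, and both parts keep the sign of x. The wrap_phase helper is a hypothetical example of a typical numeric use (reducing a phase measured in cycles to [0, 1)); whether the einstein app calls modf for this purpose is an assumption, not taken from its source.

```c
#include <math.h>

/* Hypothetical example use of modf(): reduce a phase given in cycles
   to the interval [0, 1). Not taken from the Einstein@Home source. */
double wrap_phase(double cycles)
{
    double whole;                        /* receives the integral part */
    double frac = modf(cycles, &whole);  /* fractional part, same sign as cycles */
    if (frac < 0.0)                      /* map (-1, 0) onto (0, 1) */
        frac += 1.0;                     /* may round to 1.0 for tiny negative fractions */
    return frac;
}

/* modf(3.75, &ip)  returns  0.75 and sets ip to  3.0
   modf(-2.5, &ip)  returns -0.5  and sets ip to -2.0 */
```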

The GLIBC math library used for the Linux app must have a radically faster implementation, even without SSE2 support.

This does not explain the poor performance on Annika's T2060 Pentium M Dual Core, which DOES support SSE2, or on newer AMD CPUs. Maybe the detection of SSE2 support isn't perfect.
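One way to see what a given processor itself advertises is to query CPUID directly. The sketch below assumes GCC or Clang on x86 (it uses their <cpuid.h> and __get_cpuid); CPUID leaf 1 reports SSE2 in EDX bit 26 and SSE3 in ECX bit 0. It is not the check the Microsoft runtime library performs, so it can only tell you what the CPU reports, not what the library decided. On MSVC the corresponding intrinsic is __cpuid from <intrin.h>.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#define HAVE_X86_CPUID 1
#endif

/* Reads CPUID leaf 1; returns 0 if it is unavailable or not on x86. */
static int read_leaf1(unsigned int *ecx, unsigned int *edx)
{
#ifdef HAVE_X86_CPUID
    unsigned int eax, ebx;
    return __get_cpuid(1, &eax, &ebx, ecx, edx);
#else
    (void)ecx;
    (void)edx;
    return 0;
#endif
}

/* SSE2 support is reported in EDX bit 26 of leaf 1. */
int cpu_has_sse2(void)
{
    unsigned int ecx = 0, edx = 0;
    return read_leaf1(&ecx, &edx) && (edx & (1u << 26)) != 0;
}

/* SSE3 support is reported in ECX bit 0 of leaf 1. */
int cpu_has_sse3(void)
{
    unsigned int ecx = 0, edx = 0;
    return read_leaf1(&ecx, &edx) && (ecx & (1u << 0)) != 0;
}
```

If cpu_has_sse2() returns 1 on a box where the profile still shows the slow modf variant, that would point at the library's detection rather than the hardware.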

If this is confirmed independently by someone else, I think we can begin to beg for a new Win build with some workaround for the ******* modf. Pleeaaaaaase :-).

CU

BRM
P.S.: Profiling output without SSE2, consistent with a 30% to 40% performance loss.
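The 30% to 40% figure can be sanity-checked with Amdahl's law. If a fraction f of the runtime is spent in the slow modf path and the SSE2 path is s times faster, the SSE2 build needs (1 - f) + f/s of the slow build's time, so the slow build loses f(1 - 1/s) of its throughput. The speedup s used below is a hypothetical parameter, not a measured value.

```c
/* Amdahl's-law estimate of the throughput lost to a slow code path.
   slow_fraction: share of runtime spent in the slow path (0..1).
   speedup:       how many times faster the fast path is (assumed, > 0). */
double throughput_loss(double slow_fraction, double speedup)
{
    /* Runtime of the fast build relative to the slow build. */
    double fast_time = (1.0 - slow_fraction) + slow_fraction / speedup;
    return 1.0 - fast_time;
}
```

With f = 1/3 (the upper end from the profile) and an assumed s = 10 the loss comes out at 30%; as s grows it approaches 33%.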

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I seem to remember that modf_p4 was in use on my box, not modf_default. I'll test it later to be sure, but that would mean SSE2 was being used, wouldn't it? Btw, my box should even support SSE3...

EDIT: This is what it looks like on my box:

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692162060
RAC: 1812

Message 37803 in response to message 37802

Quote:
I seem to remember that modf_p4 was in use on my box, not modf_default. I'll test it later to be sure, but that would mean SSE2 was being used, wouldn't it? Btw, my box should even support SSE3...

Yup, if you see modf_pentium4 in your profiling output, the lib is using the faster modf variant, which would mean that your notebook has a separate problem. Maybe it's graphics-related again. In my first profiling output you can see functions that seem to deal with "drawing" lines and circles, even though no graphics were displayed; maybe there's some pre-processing going on for the graphics. It's only a very small fraction of the overall runtime on my system; maybe it's more on yours?

CU

BRM

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I added a screenie to my last post... I can see nothing to explain my performance loss, can you?

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692162060
RAC: 1812

Message 37805 in response to message 37804

Quote:
I added a screenie to my last post... I can see nothing to explain my performance loss, can you?

No :-(, looks completely normal and very similar to what my Pent M is doing. Still, the overall performance is poor so there must be something wrong.

Since it's a dedicated mobile processor, the next suspect on my list would be power management. VTune can detect frequency and voltage transitions as well as thermal trip events.
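Under Linux, a crude alternative check for frequency transitions is to poll the cpufreq sysfs file while the app runs. This is a hypothetical sketch: it assumes a kernel with a cpufreq driver exposing /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq (in kHz), only looks at CPU 0, and cannot see voltage changes or thermal trip events the way VTune can.

```c
#include <stdio.h>

/* Hypothetical Linux-only helper: current clock of CPU 0 in kHz as
   reported by cpufreq, or -1 if the file is missing or unreadable. */
long read_cpu0_khz(void)
{
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq", "r");
    long khz = -1;
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &khz) != 1)
        khz = -1;
    fclose(f);
    return khz;
}

/* Counts how often consecutive valid samples differ, i.e. the number of
   frequency transitions; samples of -1 (failed reads) are skipped. */
int count_transitions(const long *samples, int n)
{
    int transitions = 0;
    long prev = -1;
    for (int i = 0; i < n; i++) {
        if (samples[i] < 0)
            continue;
        if (prev >= 0 && samples[i] != prev)
            transitions++;
        prev = samples[i];
    }
    return transitions;
}
```

Sampling read_cpu0_khz() about once per second into an array during a run and passing that array to count_transitions would show whether the clock changes while the notebook is on AC power.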

CU

BRM

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I doubt it, since the box is doing okay on all other BOINC projects. Besides, I'm using Notebook Hardware Control (dunno if you know it, very nice tool IMO), and it always shows constant CPU voltage, clock speed and temperature when the AC cord is plugged in and the notebook is running BOINC.
Still, it might be worth a try (at least lacking better ideas). But first I'm going to take some samplings under Linux and compare the two.