Information about the new S5 workunits

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692302872
RAC: 1648

RE: I doubt it, since the

Message 37807 in response to message 37806

Quote:
I doubt it, since the box is doing okay on all other BOINC projects. Besides I'm using Notebook Hardware Control (dunno if you know it, very nice tool imo) and that always shows constant CPU voltage, clock speed and temperature when I have the AC cord plugged in and the notebook doing BOINC.
Still, it might be worth a try (at least lacking better ideas). But first I'm going to take some samplings under Linux and compare the two.

I use Notebook Hardware Control myself, and like it a lot. I've set the power profile to "max performance" on AC and dynamic switching on battery. Seems to work perfectly for me.

I'm curious about your Linux results. I guess the "cycles per retired instruction" metric would be cool to have in the output as well, just as a reality check to make sure we are not overlooking something.
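For reference, "cycles per retired instruction" (CPI) is just the ratio of two hardware event counts that a profiler such as VTune reports. A minimal sketch; the counter values below are made up purely for illustration, not measurements from this thread:

```python
def cycles_per_instruction(unhalted_cycles, instructions_retired):
    """Return CPI: average CPU clock cycles spent per retired instruction."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return unhalted_cycles / instructions_retired


# Hypothetical counter readings for the same workunit on the two OSes.
windows_cpi = cycles_per_instruction(9_000_000_000, 6_000_000_000)
linux_cpi = cycles_per_instruction(6_000_000_000, 6_000_000_000)
print(f"Windows CPI: {windows_cpi:.2f}, Linux CPI: {linux_cpi:.2f}")
```

Roughly speaking: a similar instruction count with a higher CPI would point at stalls (cache, memory, throttling), while a very different instruction count would suggest the two builds are executing different code paths.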

BTW, is the other core idle while you test this? I'm wondering about core affinity: could Einstein somehow be hopping between the cores on Windows?
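If core hopping is a suspect, one way to rule it out under Linux is to pin the process to a single core (the `taskset -c` command does the same from the shell; on Windows, Task Manager's "Set affinity" does it interactively). A minimal sketch using Python's `os.sched_setaffinity`, which exists only on Linux; the helper name `pin_to_single_core` is hypothetical:

```python
import os


def pin_to_single_core(pid=0, core=None):
    """Restrict process `pid` (0 = the calling process) to one CPU core.

    Returns the resulting affinity set, or None where os.sched_setaffinity
    is unavailable (e.g. Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)  # cores the process may use right now
    if core is None:
        core = min(allowed)              # default: lowest-numbered allowed core
    if core not in allowed:
        raise ValueError(f"core {core} not in allowed set {sorted(allowed)}")
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


print(pin_to_single_core())
```

Passing the science app's PID instead of 0 would pin that process rather than the script itself.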

CU

BRM

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

RE: I'm curious about your

Message 37808 in response to message 37807

Quote:
I'm curious about your Linux results. I guess the "cycles per retired instruction" metric would be cool to have in the output as well, just as a reality check to make sure we are not overlooking something.


Yeah, as soon as I have figured out how to actually use the thing... the howto is not really helpful; it seems to point to directories that don't actually exist...

Quote:
BTW, is the other core idle while you test this? I'm wondering about core affinity: could Einstein somehow be hopping between the cores on Windows?


Well, since the other core had no Einstein WUs left under Windows (I want to let it run dry, as I'm planning to use Linux more), I used it to make rainbow tables. If you think that could be an issue, I could rerun the tests with the other core idle.

CU Annika

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692302872
RAC: 1648

Might be a good idea to have

Might be a good idea to have the other core idle.

I haven't even tried to install VTune under Linux yet; will do that next. Nice tool anyway, and free for non-commercial use under Linux.

Maybe Akos will join us here and shed some light on this question, maybe he knows all the answers already.

CU

BRM

Akos Fekete
Joined: 13 Nov 05
Posts: 561
Credit: 4527270
RAC: 0

RE: Maybe Akos will join us

Message 37810 in response to message 37809

Quote:
Maybe Akos will join us here and shed some light on this question, maybe he knows all the answers already.


The SSE2 routines are parts of a (math) library. These parts don't run on some SSE2 capable CPUs (not only AMD) because of a wrong CPU identification.
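How exactly the library identifies the CPU isn't stated here, so the following is only a hypothetical illustration of this kind of failure: dispatching on the vendor string instead of on the SSE2 feature bit itself (CPUID leaf 1, EDX bit 26, which Linux exposes as the `sse2` flag in /proc/cpuinfo). All function names are made up for the example:

```python
def parse_cpuinfo(text):
    """Return (vendor_id, flags) for the first CPU in /proc/cpuinfo-style text."""
    vendor, flags = "", set()
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def vendor_based_dispatch(vendor, flags):
    # Fragile: SSE2-capable CPUs from other vendors (AMD, VIA, ...) get the slow path.
    if vendor == "GenuineIntel" and "sse2" in flags:
        return "sse2"
    return "generic"


def feature_based_dispatch(vendor, flags):
    # Robust: only the reported SSE2 feature bit matters; the vendor is ignored.
    return "sse2" if "sse2" in flags else "generic"


sample = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu mmx sse sse2\n"
vendor, flags = parse_cpuinfo(sample)
print(vendor_based_dispatch(vendor, flags))   # generic
print(feature_based_dispatch(vendor, flags))  # sse2
```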

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692302872
RAC: 1648

RE: RE: Maybe Akos will

Message 37811 in response to message 37810

Quote:
Quote:
Maybe Akos will join us here and shed some light on this question, maybe he knows all the answers already.

The SSE2 routines are parts of a (math) library. These parts don't run on some SSE2 capable CPUs (not only AMD) because of a wrong CPU identification.

I guess this could be corrected rather easily. But then the hosts that genuinely lack SSE2 (but do support SSE) will still have a problem under Windows, and even more so once you have SSE-optimized the main hot loop (or will it be SSE2-optimized?), because the fallback modf function will then become the new bottleneck. Right?
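The bottleneck argument is essentially Amdahl's law: once the hot loop gets faster, whatever was left unoptimized takes up a larger share of the runtime. A minimal sketch; the 80/20 split and the 2x speedup are hypothetical numbers, not a profile of the actual app:

```python
def amdahl_speedup(p, s):
    """Whole-program speedup when a fraction p of the runtime is made s times faster."""
    return 1.0 / ((1.0 - p) + p / s)


def unoptimized_share(p, s):
    """Fraction of the NEW runtime spent in the part that was not optimized."""
    new_time = (1.0 - p) + p / s
    return (1.0 - p) / new_time


# Hypothetical profile: hot loop = 80% of runtime, modf fallback = 20%.
p, s = 0.80, 2.0
print(f"overall speedup: {amdahl_speedup(p, s):.2f}x")          # 1.67x
print(f"modf share afterwards: {unoptimized_share(p, s):.0%}")  # 33%
```

In this made-up case the slow modf path grows from a fifth to a third of the runtime, which is why fixing the CPU detection matters more after the hot loop is optimized.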

And for Annika's T2060 Core Duo there must be a different problem, because it definitely uses the SSE2 routines and still shows poor performance compared to when it runs under Linux. Strange.

CU

BRM

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: The SSE2 routines are

Message 37812 in response to message 37810

Quote:
The SSE2 routines are parts of a (math) library. These parts don't run on some SSE2 capable CPUs (not only AMD) because of a wrong CPU identification.

LOL
And this for sure is only by mistake. ;-)
Well, AFAIK this is known from (some of) the Intel compilers, but Bernd uses the Microsoft compiler. Anyway, this really makes me wonder.

Btw, what the hell did you do with your X2 3600+, or rather, what app is running there?

28,387.48 sec for 516.79 credits (pending)!!!
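Converting those figures into credits per hour, the unit used further down the thread, is simple arithmetic (keeping in mind the credit is still pending):

```python
def credits_per_hour(credits, cpu_seconds):
    """Credits earned per hour of CPU time."""
    return credits / (cpu_seconds / 3600.0)


rate = credits_per_hour(516.79, 28_387.48)
print(f"{rate:.1f} cr/h")  # 65.5 cr/h
```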

cu,
Michael

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692302872
RAC: 1648

RE: RE: The SSE2 routines

Message 37813 in response to message 37812

Quote:
Quote:
The SSE2 routines are parts of a (math) library. These parts don't run on some SSE2 capable CPUs (not only AMD) because of a wrong CPU identification.

LOL
And this for sure is only by mistake. ;-)
Well, AFAIK this is known from (some of) the Intel compilers, but Bernd uses the Microsoft compiler. Anyway, this really makes me wonder.

Btw, what the hell did you do with your X2 3600+, or rather, what app is running there?

28,387.48 sec for 516.79 credits (pending)!!!

cu,
Michael

Hi Michael!

See Akos' message here

Quote:

Quote:

My Core2 runs an SSE-optimised version of the XLALComputeFaFb subroutine. It shows about 70% performance improvement. Bernd will implement it into the source code and compile for all x86 based platforms.

Excellent news, isn't it?

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: And for Annika's T2060

Message 37814 in response to message 37811

Quote:

And for Annika's T2060 Core Duo there must be a different problem, because it definitely uses the SSE2 routines and still shows poor performance compared to when it runs under Linux. Strange.

CU

BRM

Some runtime differences can be explained by different WUs. One of my VMs jumped from ~22 cr/h to ~28 cr/h. I have never seen such big differences before (usually ±5%), so it might just be a problem with the time measurement of VMware Player.
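
To put that jump in perspective, here is the relative change compared with the usual ±5% spread, using only the figures from the post above:

```python
def relative_change(old, new):
    """Relative change from old to new (0.10 means +10%)."""
    return (new - old) / old


jump = relative_change(22.0, 28.0)
usual = 0.05
print(f"jump: {jump:.0%} vs. usual ±{usual:.0%}")  # jump: 27% vs. usual ±5%
```

A roughly 27% jump is more than five times the usual spread, which is why a timing artifact seems more plausible than ordinary WU-to-WU variation.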

cu,
Michael

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

RE: Hi Michael! See Akos'

Message 37815 in response to message 37813

Quote:

Hi Michael!

See Akos' message here

Quote:

Quote:

My Core2 runs an SSE-optimised version of the XLALComputeFaFb subroutine. It shows about 70% performance improvement. Bernd will implement it into the source code and compile for all x86 based platforms.

Excellent news, isn't it?


Hi Bikeman,

yeah, great news, and I had already read that, but the difference in speed on his X2 is far more than 70%. Even if I add the "AMD penalty", this damn thing is still much faster.

His credits/hour rose from ~12 to ~65!!!

Getting about 12 cr/h on that X2 3600+ under Windows is already quite a lot, since my father's X2 5000+ (Windows) gets about 14. So Akos's host is probably overclocked. His results have not been validated yet, but if they are, the speed increase will be very much bigger. :-)
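
A quick check of why ~65 cr/h is far beyond the quoted 70%. This assumes "70% performance improvement" means 1.7x throughput and, generously, that it applies to the whole app rather than only to XLALComputeFaFb; both numbers are the approximate cr/h figures from this post:

```python
before, after = 12.0, 65.0          # approximate cr/h quoted above
expected_after = before * 1.70      # assumption: +70% throughput for the whole app
observed_factor = after / before
print(f"expected ~{expected_after:.1f} cr/h, observed factor {observed_factor:.1f}x")
# expected ~20.4 cr/h, observed factor 5.4x
```

Even under that generous reading, the SSE optimisation alone would explain only about 20 cr/h, not 65.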

cu,
Michael

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

I don't really think so. From

I don't really think so. From what I've heard from other crunchers, C/H is normally fairly constant, and all WUs give about equal credit in relation to crunching time. Maybe it really has to do with running BOINC in a VM... I've never tried it, but it sounds plausible to me.
Really great news about the new science app :-D very nice work from Akos (again). Am I correct that the performance increase will apply to all platforms?

EDIT: I was referring to the post about different WU sizes and differences in C/H.
