Gravitational Wave search O1 all-sky tuning (O1AS20-100T)

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 1,869,250,174
RAC: 693,632

RE: Mumak hi, I missed

Quote:


Mumak, hi, I missed this post earlier.

My i5-4690K (yes, it is a K and yes, I confess, it is a little overclocked) had its first panic after stopping all the usual GPU tasks and running 4 AVX tasks. The CPU temp reported 70C just before the panic, which is close to the limit suggested here Intel i5-4690K

You mentioned some generations being more likely to have a problem; can you give a reference list?

That 72 C is the TCASE max for the CPU, i.e. the temperature on top of the IHS. You haven't said which temperature was 70 C - the external diode temperature (TCASE), which is usually measured by a mainboard SIO/LPC chip, or the internal Digital Thermal Sensor (DTS) value. TCASE is not relevant for CPU thermal management, only the DTS, which usually starts to throttle the CPU at ~90-100 C depending on the model.
Note that even if you saw 70 C before the panic, it doesn't have to mean much, since when you put load on the CPU the temperatures can spike within a couple of milliseconds.
I'd suggest running HWiNFO (what else ;-)) and checking+logging temperatures, Distance to Tj,max, throttling status and CPU power usage. But you seem to be running a Linux machine, so I'm not sure what to recommend there...
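For the Linux side, a minimal sketch of what temperature logging could look like, reading the kernel's sysfs thermal zones. The paths and units are an assumption about a typical Linux install (values in millidegrees Celsius); lm-sensors with the coretemp driver is the more complete option for per-core DTS readings:

```python
# Minimal sketch: read CPU temperatures from Linux sysfs thermal zones.
# Assumes /sys/class/thermal/thermal_zone*/temp exists and reports
# millidegrees C; lm-sensors/coretemp gives per-core DTS values instead.
import glob

def millideg_to_c(raw: int) -> float:
    """Convert a sysfs millidegree reading to degrees Celsius."""
    return raw / 1000.0

def read_thermal_zones():
    """Return {zone_name: temperature_C} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob("/sys/class/thermal/thermal_zone*/temp")):
        zone = path.split("/")[-2]
        try:
            with open(path) as f:
                temps[zone] = millideg_to_c(int(f.read().strip()))
        except (OSError, ValueError):
            pass  # zone may be unreadable or empty
    return temps

if __name__ == "__main__":
    for zone, t in read_thermal_zones().items():
        print(f"{zone}: {t:.1f} C")
```

Logging this once a second while the AVX tasks start up would catch the fast load-spike behaviour described above.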

-----

floyd
Joined: 12 Sep 11
Posts: 133
Credit: 186,323,751
RAC: 0

RE: I'm not sure i

Quote:
I'm not sure I understand - there are three 64 bit Linux apps (typo maybe)?
Quote:

[pre]Linux running on an AMD x86_64 or Intel EM64T CPU 1.04 19 Feb 2016, 16:30:35 UTC
Linux running on an AMD x86_64 or Intel EM64T CPU 1.04 (AVX) 19 Feb 2016, 16:30:35 UTC
Linux running on an AMD x86_64 or Intel EM64T CPU 1.04 (X64) 19 Feb 2016, 16:53:47 UTC[/pre]

I am referring to the first and third apps.


We can't browse the download directory anymore, but for other applications in the past those versions featured identical binaries with possibly different configurations.

DanNeely
Joined: 4 Sep 05
Posts: 1,315
Credit: 1,758,705,041
RAC: 932,138

I'm not sure if it's

I'm not sure if it's intentional or not, but the shorter deadlines on these tasks have resulted in my client running them in preference to any Fermi tasks that were issued less than a week ago.

Mark Henderson
Joined: 19 Feb 05
Posts: 34
Credit: 34,279,992
RAC: 759

Speedstep maybe, disable in

SpeedStep maybe; disable it in the BIOS.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,241
Credit: 44,874,328,449
RAC: 37,039,575

RE: I'm not sure i

Quote:
I'm not sure I understand - there are three 64 bit Linux apps (typo maybe)?


Of course there are :-). Unfortunately, this whole conversation is not about the three Linux apps. My original message was in response to a comment about how slow a Windows app seemed to be. That response was directed at Daniels_Parents who has Windows 7 64bit but is running what seems to be a 32bit app (which is fine but perhaps slow).

For Windows, there are only two apps available, the 32bit app and a 64bit app labeled (AVX) - these two.

Windows/x86 							1.04 (SSE2)
Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU 	1.04 (AVX)

I was just surmising that a non-AVX version of the Windows 64bit app seemed to be missing. I have no idea if the (AVX) Windows app is supposed to handle both types of 64bit CPUs (with or without AVX instructions). Perhaps that is why there is no separate non-AVX 64bit app. If so, why didn't the DP machine get the 64bit (AVX) app?

Cheers,
Gary.

Robert
Joined: 5 Nov 05
Posts: 42
Credit: 296,738,268
RAC: 13,003

RE: Why some hosts take 6h

Quote:
Why some hosts take 6h and some 24h we don't know yet. I will dig into that when there are more successful results available to make a proper statistic.

I think I have a clue on the long runtimes, hyperthreading.

I decided to recheck whether running more jobs than there are real cores results in additional work output. I set up my i7-4770K with Windows 7, which has hyperthreading enabled in the BIOS and thus shows 8 cores when one looks at my computers, to run 8 GW v1.04 AVX jobs. This is an Intel architecture, so this particular machine has 4 real cores and 2 threads per core for a total of 8 logical cores. I disabled GPU processing for this test.

I had been running 3 GW jobs with 1 core free to support two GPU jobs on a single AMD 7970. Hyperthreading was enabled but basically unused. The AVX jobs had been completing on average in 23,400 seconds (6.5 hours). I'm sitting here staring at the BOINC task list, and all 8 jobs for this hyperthreading test just hit the 50% complete mark at 7.75 hours, which projects out to 15.5 hours to complete.

In a nutshell, running more jobs than there are real cores reduces work output, certainly not what we have seen in the past.

This is but one data point, which is why I'm putting this out so that others can try with their setups. We are seeing different AVX results from the Linux side, and there is a variety of AVX capabilities, including AMD's different take on virtual cores. Not to mention SSE2 jobs and 32/64 bit executables.
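The figures above can be turned into a quick tasks-per-hour comparison. A sketch using the numbers from this post; the 4-job non-HT baseline is an extrapolation, assuming 4 jobs on 4 real cores would still finish in ~6.5 hours each, as the 3-job runs did:

```python
# Throughput comparison using the figures from the post above.
# Assumption: the non-HT baseline extrapolates the observed 6.5 h/job
# (measured with 3 concurrent jobs) to 4 jobs on the 4 real cores.
def throughput(jobs: int, hours_per_job: float) -> float:
    """Steady-state output in tasks per hour."""
    return jobs / hours_per_job

non_ht = throughput(4, 6.5)   # 4 jobs on 4 real cores, extrapolated
ht = throughput(8, 15.5)      # 8 jobs, projected from 50% done at 7.75 h

if __name__ == "__main__":
    print(f"non-HT: {non_ht:.3f} tasks/h")
    print(f"HT:     {ht:.3f} tasks/h")
    print(f"HT/non-HT ratio: {ht / non_ht:.2f}")
```

On these numbers the 8-job HT configuration delivers roughly 84% of the extrapolated 4-job throughput, which is the "reduced work output" conclusion in quantitative form.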

Let me try to answer questions that are bound to come up. Memory was not an issue; Windows Task Manager has continued to report a steady 2.9 GB of RAM in use. CPU temps are running consistent with the previous setup, and CPU frequencies seem to be holding steady at the same 3.9 GHz, so I don't think it is a throttling issue. The machine has an SSD, and I've had checkpointing set to 10 minute intervals for some time, so there should be no disk issues. As for the recent arrival of version 1.04: I ran a batch of 1.04 jobs on my other machine and saw no real difference in runtime compared to version 1.03.

Generally I would wait for my machine to report back to the server but at this rate it is going to be very early in the morning and I’m hoping others can setup tests to verify what I’m seeing. I know everyone is anxious to get this out of beta test.

Zalster
Joined: 26 Nov 13
Posts: 3,059
Credit: 3,341,604,897
RAC: 0

RE: In a nutshell that

Quote:

In a nutshell that means running more jobs than real cores means reduced work output,

I'm going to agree and disagree with Robert.

Agree that the number of actual cores vs hyperthreaded ones is going to affect how fast the GW work gets processed.

I run a 6-core HT to 12 as well as an 8-core HT to 16.

It's better to run only as many jobs as there are actual cores rather than using the HT ones. Running with HT will significantly increase the time to complete.

GPU loads will also prolong the time to complete, but not as drastically.

Quote:
certainly not what we have seen in the past.

Disagree with this part; I've seen it with Gamma-ray searches on the CPU. So much so that I restrict how many are run while also crunching on the GPUs. It followed that the GW search would behave the same way, which is why I restricted the number I was running at the beginning to only the number of actual cores.

My times have now increased (by about 35 minutes), but only because I have restarted GPU work at the same time.

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 127,335,222
RAC: 9,352

For this host there were two

For this host there were two v1.03 tasks that recently took 44 hours (~175K seconds CPU time) each to complete. Alas, I can't show them because they have slipped out of web access timewise. In any case they completed without error. It's a Win64 v8.1 install and is allegedly well blessed in the performance department. Compared to my Linux box (~40K seconds CPU time) that is a four-fold difference.

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal

archae86
Joined: 6 Dec 05
Posts: 2,844
Credit: 3,391,781,923
RAC: 2,993,554

To mix in one post some

To mix in one post some comments on the "slow" results reported by daniels_parents, and the more recent comments here on hyperthreading with some data:

First, the data. The host mentioned by
Daniels_Parents
is represented as an i7 CPU 920 @ 2.67GHz. That makes it a Bloomfield quad core which in turn means it is a rather modestly modified Nehalem at heart. I believe Arthur runs this with maximum loading, which may mean 8 CPU tasks (as it is running hyperthreaded) and the support CPU tasks for two Nvidia GPUs, all elbowing each other for the resources of four physical cores and the memory resources, both dedicated and shared. An eyeball average suggests he was getting roughly 79,000 second average elapsed times on V1.03/V1.04 x86 SSE2 O1AS20 work.

Meanwhile the host was also running Parkes PMPS work on two GPUs, a GTX 560 Ti and a GTX 760.

As it happens I run one host with a pretty similar CPU, reported as an E5620 @ 2.40GHz. While nominally a Westmere, it too is a lightly modified Nehalem architecturally, though it gets a power efficiency advantage from running on the next-generation process.

Mine, however, runs a much different load configuration. During O1AS20 testing mine was running a single BOINC CPU task, but was supporting two GPUs (one GTX 750, and one GTX 660), both running at multiplicity 2X. As with the DP machine, the executable was labelled for SSE2 exploitation, and x86 (32-bit) character. An eyeball average of elapsed time is just under 52,000 seconds. This would certainly come down some if I suspended the GPU work, as I give the GPU support tasks privileged access by both priority and affinity means.

For power efficiency reasons, I've not been running CPU BOINC jobs at all for some months, and only recently admitted a single CPU task to each of my three GPU-supporting hosts during recent beta activities. So I lack recent carefully compared productivity on current applications for hyperthreading benefit.

However I have done careful, properly controlled, comparisons in the past, and have found the throughput improvement from HT on those cases to range between a little and a lot. The one plausible reported contrary case I recall came quite some time ago when msattler at SETI reported that on his (then heavily overclocked) rig, HT cost enough in maximum clock rate for correct operation that the gain in productivity per clock was overbalanced by the loss in clock rate. My own CPU overclocking days are rather far in the past, so this result, while compelling for him, does not affect me.

If people are actually seeing HT impairing output here, I doubt the reason is a simple failure of HT itself, but perhaps a congestion in some shared resource, most likely cache or external memory. Possibly this may be confounded by mixing in the GPU support tasks so many of us run these days.

I think I can define some controlled tests to investigate a little better the effects of HT itself vs. those of congestion, but lack the interest to take the time to execute them. To reduce the impact of shared memory resource congestion, I'd first make a comparison between a single task restricted to a single core, run non-HT, with all other important active tasks banished from that core by affinity means, and no other BOINC, interactive, or significant background tasks running. The other leg of that first comparison would enable HT, running just two tasks constrained by affinity to the two logical instances of a single physical core, with the same constraints on other work.

A further pair of tests might help promote or denigrate the shared resource congestion notion: run 4 tasks non-HT, then 8 tasks HT. If the throughput scaling from 2 tasks to 8 is inferior to the scaling from 1 task to 4, then congestion might reasonably be suspected.
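The affinity side of such a test can be sketched on Linux with `os.sched_setaffinity` (a Linux-only call; on Windows one would use Task Manager or `start /affinity` instead). Which logical CPU IDs are siblings of the same physical core is machine-specific and an assumption here; it should be checked in `/sys/devices/system/cpu/cpu*/topology/thread_siblings_list` before choosing the pinning sets:

```python
# Sketch: pin the current process to chosen logical CPUs for an HT experiment.
# os.sched_setaffinity is Linux-only; the mapping of logical CPU IDs to
# physical cores varies by machine and must be checked via sysfs topology.
import os

def pin_to_cpus(cpus):
    """Restrict this process to the given logical CPU IDs; return the new set."""
    os.sched_setaffinity(0, set(cpus))  # pid 0 means the calling process
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    # Leg 1 of the comparison: one task pinned to one logical CPU (non-HT case).
    # Leg 2 would launch two tasks, each pinned to the two HT siblings of one
    # physical core, with everything else banished from that core.
    print("allowed CPUs:", pin_to_cpus({0}))
```

Launching each BOINC science task through a small wrapper like this would give the per-core isolation the test design above calls for.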

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,126
Credit: 127,335,222
RAC: 9,352

Ah. Deja vu as regards HT

Ah. Deja vu as regards HT, which we looked at in detail a few years ago and had a vigorous sticky pie fight over as well. The basic deduction was that 4 x 2 < 8 (total host throughput), and not too infrequently 4 x 2 ~ 6 or even less !! Apart from the expected software/hardware overhead, as you say one still has to have minimal other resource contention to open up the throttle fully for these designs.

{ BTW my timings weren't influenced by playing Sid Meier's Civilization V !! That uses only a couple of virtual cores IIRC. I currently use onboard graphics for this rig and do CPU-only WUs. }

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter. Blaise Pascal
