Multi-Directed Gravitational Wave Search

Jonathan Jeckell
Joined: 11 Nov 04
Posts: 114
Credit: 1341945207
RAC: 1

One of my machines, 12425652 (Intel Core i3-2100 CPU running Linux 4.4.0-42-generic (Ubuntu 14.04)), has been consistently failing Multi-Directed Continuous Gravitational Wave search Tuning run G v1.01 (AVX) x86_64-pc-linux-gnu tasks, with errors after only a few seconds.

Only one other machine (12057966, running macOS Sierra) has picked up one of these tasks, but it hasn't run it yet. Keeping my fingers crossed.

Sergey Kovalchuk
Joined: 22 Jan 05
Posts: 2
Credit: 4805392
RAC: 0

Two of my old hosts (Intel P4 and DUO E2160, Linux x86_64) only get the AVX version of the application, which consistently fails with an error within a few seconds.

How can I choose the SSE2 version?

Christian Beer
Joined: 9 Feb 05
Posts: 595
Credit: 118588380
RAC: 110482

Regarding the errors we see right now:

There are two types. One comes from faulty data files. I have already canceled the affected tasks, and as I write we are investigating whether other tasks are also affected.
The other type is signal 8 terminations (floating point exceptions), like the ones Jonathan Jeckell and Sergey Kovalchuk are experiencing. I tested some of those tasks on my own PC and they work, so it does not seem to be a data problem. We will have to wait and see whether other hosts also throw exceptions for them.

Regarding AVX and SSE2:

The app automatically downgrades to SSE2 if the computer does not have AVX capability.
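For anyone who wants to check what their own Linux host reports, here is a minimal sketch in Python. It is not the science app's actual selection logic, which is not shown in this thread. It only assumes that AVX support appears as an `avx` entry on the `flags` line of /proc/cpuinfo, and the function name `pick_code_path` is made up for illustration.

```python
def pick_code_path(cpuinfo_text):
    """Guess which code path a host could use from /proc/cpuinfo text.

    Illustrative only: returns "AVX" if the CPU flags list avx,
    "SSE2" if they list sse2 but not avx, and "generic" otherwise.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags : fpu vme ... sse2 ... avx"
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    if "avx" in flags:
        return "AVX"
    if "sse2" in flags:
        return "SSE2"
    return "generic"


# Made-up sample input; on a real Linux host you would pass the
# contents of /proc/cpuinfo instead.
sample = "model name : Example CPU\nflags : fpu sse sse2\n"
print(pick_code_path(sample))
```

A host whose flags line lacks `avx` would come out as "SSE2" here, which is the fallback case described above.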

Edit: I disabled the work generators until we know what other datafiles are affected. That's also the reason why the app is still in beta. The work generated so far should suffice until tomorrow.

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Christian Beer wrote:

As I wrote yesterday I just started the job generation for CasA and VelaJr. Right now we don't see an alarming rate of errors. There are several errors but only from a handful of computers and other computers are uploading successful results.

The application names can be found as usual on the application overview page but are in general not needed.

The task names have a similar structure as for the all-sky run. You can see what your computer is doing by this name. Here is a short explanation:

h1_0163.60_O1C02Cl1In1G__O1MD1TG_G3473_163.70Hz_0
|          1            |   2   |  3  |   4    |

1: locality scheduling information
2: application
3: target (CasA, VelaJr, G3473)
4: search frequency
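The breakdown above can be sketched as a small Python parser. This is only an illustration based on the single example name shown here: it assumes the locality part is separated from the rest by a double underscore, that the application and target names contain no underscores, and that the frequency ends in `Hz`. The function name `parse_task_name` and the dictionary keys are made up for this sketch.

```python
def parse_task_name(name):
    """Split a task name into the four parts described above.

    Assumes the layout of the one example in this thread:
    <locality>__<application>_<target>_<frequency>Hz_<suffix>
    """
    locality, sep, rest = name.partition("__")
    if not sep:
        raise ValueError("expected a double underscore in the task name")
    parts = rest.split("_")
    if len(parts) < 3:
        raise ValueError("expected application, target and frequency")
    application, target, frequency = parts[:3]
    if not frequency.endswith("Hz"):
        raise ValueError("expected the frequency to end in 'Hz'")
    return {
        "locality": locality,
        "application": application,
        "target": target,
        "frequency_hz": float(frequency[:-2]),
        # Anything after the frequency is kept as-is, uninterpreted.
        "suffix": parts[3:],
    }


info = parse_task_name("h1_0163.60_O1C02Cl1In1G__O1MD1TG_G3473_163.70Hz_0")
print(info["application"], info["target"], info["frequency_hz"])
```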

Thanks Christian but what I was looking for was this

einstein_O1MD1TCV

Since I run 20- and 16-core machines, Einstein decides it wants to run as many tasks as possible on those machines and I end up with almost 100% of all cores running. By using an app_config.xml I can turn down how many run at once, and the computer becomes usable again.
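For readers who have not used app_config.xml, here is a minimal sketch in Python that builds such a file for the application name above. The `<app>`, `<name>` and `<max_concurrent>` elements are standard BOINC app_config.xml elements. The limit of 10 is only an illustrative value, and the helper name `build_app_config` is made up for this sketch.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml content limiting one app's concurrent tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Illustrative limit: at most 10 tasks of this application at a time.
config_xml = build_app_config("einstein_O1MD1TCV", 10)
print(config_xml)
```

The resulting text would be saved as app_config.xml in the Einstein@Home project directory inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) to pick it up.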

Results are being returned, but I'm not seeing any validated yet. On the plus side, they are not erroring either.

Thanks again
Zalster

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752781592
RAC: 1441907

Zalster wrote:
By using an app_config.xml I can turn down how many run at once, and the computer becomes usable again.

Enable the <cpu_sched> diagnostic log flag from the Options menu in BOINC Manager, Advanced View (or by Ctrl+Shift+F).

As each task starts, you will see a line like

13/10/2016 17:16:04 | Einstein@Home | [cpu_sched] Starting task p2030.20141120.G192.71+00.73.S.b3s0g0.00000_544_0 using einsteinbinary_BRP4G version 152 (BRP4G-Beta-opencl-intel_gpu) in slot 4

in the event log. The section I've emphasised (immediately following the keyword 'using', here einsteinbinary_BRP4G) is what you want to use in the <app> section of app_config.xml.

If you need to refine it further, the plan_class needed to identify the <app_version> is also logged on the same line, in parentheses after the version number.

Note that the diagnostic log you need for this is plain <cpu_sched>. You wouldn't want to even see <cpu_sched_debug>.
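As a rough illustration, the pieces of that log line can be pulled out with a short Python sketch. The regular expression is written against the single example line quoted above and is an assumption about the format, not something taken from the BOINC source. The names `STARTING_TASK` and `parse_cpu_sched_line` are made up for this sketch.

```python
import re

# Pattern modelled on the one example line above; other client
# versions may word the message differently.
STARTING_TASK = re.compile(
    r"Starting task (?P<task>\S+) using (?P<app>\S+) "
    r"version (?P<version>\d+)(?: \((?P<plan_class>[^)]*)\))? "
    r"in slot (?P<slot>\d+)"
)


def parse_cpu_sched_line(line):
    """Return task, app name, version, plan class and slot, or None."""
    match = STARTING_TASK.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["version"] = int(info["version"])
    info["slot"] = int(info["slot"])
    return info


line = (
    "13/10/2016 17:16:04 | Einstein@Home | [cpu_sched] Starting task "
    "p2030.20141120.G192.71+00.73.S.b3s0g0.00000_544_0 using "
    "einsteinbinary_BRP4G version 152 (BRP4G-Beta-opencl-intel_gpu) in slot 4"
)
info = parse_cpu_sched_line(line)
print(info["app"], info["plan_class"])
```

The `app` value is what goes in the `<name>` of an `<app>` entry, and `plan_class` is the extra piece needed for an `<app_version>` entry.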

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

Thanks Richard, I was able to locate the name and listed it in my last post.

My worst fear was correct. Einstein saw my 20 cores and is using all of them to crunch the gravitational waves.

Forgive the image, but I find people believe you more when they can see what you are talking about...

I've managed to decrease that to 10, but there is now another issue, which I should post in the server thread. (Since the change, Einstein has ignored the local preference that limits work to 1 day's worth. I now have nearly 200 GW work units, and there is no way I will be able to complete all of them before the deadline. It was doing the same for gamma rays, but those didn't have the higher level of importance that the GW tasks do.)

Zalster

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421661902
RAC: 795881

I have set the box I am running GW on to NNT (No New Tasks) until I see whether the work I've done validates. I see no reason to waste time if the work doesn't validate. It will have run 10 of these little gems, which should be enough of a test for the app on that host.

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0

Christian Beer wrote:

The other type is signal 8 terminations (floating point exceptions), like the ones Jonathan Jeckell and Sergey Kovalchuk are experiencing. I tested some of those tasks on my own PC and they work, so it does not seem to be a data problem. We will have to wait and see whether other hosts also throw exceptions for them.

<waves>

Thanks for the updates, Christian. Some more here, on a Xeon(R) CPU E5-2660 running CV v1.01 (AVX), with stack trace: https://einsteinathome.org/task/581023434

and on an i5-4690K running G v1.01 (AVX): https://einsteinathome.org/task/581002922

Conan
Joined: 19 Jun 05
Posts: 172
Credit: 7099171
RAC: 2415

According to the Server there are plenty of work units available but I am unable to get any, getting the message that "no work is available".

What are the requirements for getting work?

What computer types?

Are they all taken and none left for me?

If I allow Gamma Ray work units I will get as many as I want.

Conan

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Conan_4 wrote:
According to the Server there are plenty of work units available but I am unable to get any, getting the message that "no work is available".

Same for me. I only get the gamma-ray ones. I have tried to disable that, but then I don't get any CPU work at all.

I tried to look in the contact logs but for two of my machines I see no references to gravity wave stuff:

https://einsteinathome.org/host/12238215/log

https://einsteinathome.org/host/11707278/log

On the third machine I get "No work is available for Multi-Directed Continuous Gravitational Wave search Tuning run G", even though there seems to be lots of work available.

https://einsteinathome.org/host/7562651/log
