Gravitational Wave search O2 Multi-Directional ("O2MD1")

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244930831
RAC: 16511
Topic 219700

Hi!

Today the next search for Gravitational Waves has been launched on Einstein@Home. Scientifically it's similar to the previous "Multi-Directed" search and will basically aim for the same targets (G347.3, CasA, VelaJr) but use the more sensitive data from LIGO's second observation run "O2".

Technically we will start this as a "CPU-only" run, and will continue to validate our GW GPU application with the results from the CPU versions.

There is a long weekend ahead in Germany, during which we will have very limited resources for watching over this new run. So until Monday:

- only rather few workunits will be available

- validation will be started on Monday on the results that have been reported until then

- no GPU App versions will be available (in addition to the problem that is fixed in the 1.09 O2AS App version, we found another possible problem in the code that certainly won't affect the O2AS calculation but might affect O2MD1; we need to check this first)

BM

Bernd Machenschalk

Update:

- Validation has started and looks good so far.

- The possible problem in the GPU app code will affect the O2MD1 results, so we'll need to develop a fix or workaround and build new App binaries.

- Most of the client errors we got back so far originate from the old 'compatibility' Linux Apps running on newer systems (glibc >= 2.15), producing a segfault. As the libc version is something we can't automatically detect before running an app, we added a project-specific preference for this case ("Run Linux app versions built with LIBC 2.15"). However, you had to opt in manually for that to work. This was OK as long as hosts with newer libcs were a rare minority, but that is no longer the case. Furthermore, the GW search is still the most demanding of our searches in terms of memory and computation time, so older hosts (which would run older Linux systems) can hardly finish these workunits within the deadline. Bottom line: we'll drop compatibility for pre-libc 2.15 Linux hosts in O2MD1 altogether and make the "LIBC215" Linux app the only Linux App. People who run older Linux systems and have trouble with the new Apps (segfaults etc.) should de-select the "O2MD1" application. FGRP5 (Gamma-Ray pulsar search) will still run on their machines.
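
For anyone unsure which side of the 2.15 line their Linux host falls on, a local check could look roughly like the sketch below. This is not project code: the function names and the MIN_GLIBC constant are made up for illustration, and it assumes a glibc-based Linux where Python's os.confstr("CS_GNU_LIBC_VERSION") is available. On other systems it simply reports that the version is unknown.

```python
import os
import re

# Minimum glibc version needed by the "LIBC215" Linux app, per the post above.
# The constant name is hypothetical.
MIN_GLIBC = (2, 15)


def parse_glibc_version(text):
    """Extract (major, minor) from a string such as 'glibc 2.31'; None if absent."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def host_glibc_version():
    """Return this host's glibc version as a tuple, or None if it can't be determined."""
    try:
        reported = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # Not a glibc system, or the platform does not know this confstr name.
        return None
    return parse_glibc_version(reported) if reported else None


def can_run_libc215_app(version):
    """True if the (major, minor) version meets the 2.15 minimum."""
    # Tuples compare numerically, so (2, 9) is correctly older than (2, 15).
    return version is not None and version >= MIN_GLIBC


if __name__ == "__main__":
    version = host_glibc_version()
    print("glibc version:", version)
    print("meets glibc 2.15 minimum:", can_run_libc215_app(version))
```

Comparing versions as integer tuples rather than strings matters here, since a plain string comparison would rank "2.9" above "2.15".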

BM

Bernd Machenschalk

Another update:

There are now GPU App versions for O2MD1. Preliminary tests show that the speedup compared to CPU versions is even larger in O2MD1 than it was in O2AS. Internal tests show pretty good validation, but can't cover more than a few data points (workunits).

BM

robl
Joined: 2 Jan 13
Posts: 1709
Credit: 1454483596
RAC: 8522

Bernd Machenschalk wrote:

Another update:

There are now GPU App versions for O2MD1. Preliminary tests show that the speedup compared to CPU versions is even larger in O2MD1 than it was in O2AS. Internal tests show pretty good validation, but can't cover more than a few data points (workunits).

Have the GPU versions been released to the general community?  If so I am not seeing them.

Bernd Machenschalk

We found a problem in the app code that will negatively impact the sensitivity of the search. We will re-start that run with new apps shortly.

BM

Bernd Machenschalk

BTW: the GPU app versions will remain 'beta test' versions until further notice.

BM

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

Bernd Machenschalk wrote:
We found a problem in the app code that will negatively impact the sensitivity of the search. We will re-start that run with new apps shortly.

So I should abort any pending O2MD tasks since they won't produce any science?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109404064642
RAC: 35486094

Bernd Machenschalk wrote:
... negatively impact the sensitivity

Does this mean that crunch times are likely to increase if the next app does a more sensitive search?

Bernd Machenschalk wrote:
...re-start that run with new apps shortly.

Do the large data files remain unchanged or do the existing ones need to be scrapped?

Thanks.

Cheers,
Gary.

Bernd Machenschalk

Our internal test showed a runtime increase of about 20% (for both CPU and GPU). We will use the same data files; only the application will change.

BM

Mr Anderson
Joined: 28 Oct 17
Posts: 37
Credit: 136078356
RAC: 275682

A number of mine over the weekend took a very long time to execute and then failed outright. I also aborted one this morning because it was still at 2% after 6 hours. Don't know if they are the CPU or GPU versions. One did validate which had "GWnew" in the name.

Gary Roberts

Mr Anderson wrote:
A number of mine over the weekend took a very long time to execute and then failed outright.

If you click on the task ID link for each failed task, you will see the reason for the failure. In your case it is "197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED". In other words, your old GPU is not able to crunch the tasks in what is regarded as a reasonable time. They were taking way longer than they should, so BOINC pulled the plug on them.

Mr Anderson wrote:
I also aborted one this morning because it was still at 2% after 6 hours. Don't know if they are the CPU or GPU versions. One did validate which had "GWnew" in the name.

One way of telling is to look at the tasks tab in BOINC Manager.  A GPU task (which the failed ones were) will show both CPU and GPU resources being used.  Another way is to look at the application name.  The version 2.01 app has (GW-opencl-ati) which shows the use of an AMD/ATI GPU.  The CPU version is different (2.00) and has (GWnew) attached to the name instead.  If you look at how long your CPU tasks were taking compared to the GPU tasks (they were about 3 times shorter), you can understand why a "time limit exceeded" was invoked.
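
If you'd rather sort a list of tasks than eyeball them, the name check described above can be sketched in a few lines. This is only an illustration: the helper names are hypothetical, the sample strings are shortened stand-ins rather than exact application names, and the only facts used are the ones already mentioned here (the "GW-opencl-ati" and "GWnew" tags, and exit code 197 for EXIT_TIME_LIMIT_EXCEEDED).

```python
# Exit code as reported on the task page quoted above.
EXIT_TIME_LIMIT_EXCEEDED = 197


def format_exit_code(code):
    """Render an exit code the way the task page shows it: decimal plus 8-digit hex."""
    return f"{code} (0x{code:08X})"


def resource_for_app(app_name):
    """Guess from the tag in the application name whether a task ran on GPU or CPU."""
    name = app_name.lower()
    if "opencl" in name:
        return "GPU"  # e.g. the v2.01 app tagged (GW-opencl-ati)
    if "gwnew" in name:
        return "CPU"  # e.g. the v2.00 app tagged (GWnew)
    return "unknown"


if __name__ == "__main__":
    # Illustrative, shortened names; real entries carry the full application name.
    for app in ("v2.01 (GW-opencl-ati)", "v2.00 (GWnew)"):
        print(app, "->", resource_for_app(app))
    print(format_exit_code(EXIT_TIME_LIMIT_EXCEEDED))
```

Anything that matches neither tag is reported as "unknown" rather than guessed at, since other searches use different names.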

If you've successfully used your GPU for the FGRPB1G search you should continue to use it there.  Unless you have access to a more modern GPU, you probably should just use CPU cores for the new GW tasks.

Cheers,
Gary.
