As I wrote yesterday, I just started the job generation for CasA and VelaJr. Right now we don't see an alarming rate of errors. There are several errors, but only from a handful of computers, and other computers are uploading successful results.
The application names can be found as usual on the application overview page but are in general not needed.
The task names have a structure similar to those of the all-sky run, so you can tell what your computer is doing from the name. Here is a short explanation:
1: locality scheduling information
2: application
3: target (CasA, VelaJr, G3473)
4: search frequency
One of my machines, 12425652 (Intel Core i3-2100 CPU running Linux 4.4.0-42-generic (Ubuntu 14.04)), has been consistently crapping out on the Multi-Directed Continuous Gravitational Wave search Tuning run G v1.01 (AVX) x86_64-pc-linux-gnu, with errors after only a few seconds.
Only one other machine (12057966, running macOS Sierra) has picked up one of these tasks, but it hasn't run it yet. Keeping my fingers crossed.
Two of my old hosts (Intel P4 & DUO E2160, Linux x86_64) only get the AVX version of the application, which consistently errors out within a few seconds.
How can I choose the SSE2 version?
Regarding the errors we see right now:
There are two types. One comes from faulty data files; I have already canceled the affected tasks, and as I write we are investigating whether other tasks are also affected.
The other type is signal 8 terminations (floating-point exceptions), like the ones Jonathan Jeckel and Sergey Kovalchuk are experiencing. I tested some of those tasks on my own PC and they work, so it does not seem to be a data problem. We will have to wait and see whether other hosts also throw exceptions on them.
Regarding AVX and SSE2:
The app automatically downgrades to SSE2 if the computer does not have AVX capability.
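For anyone who wants to check which of the two paths a Linux host should end up on, here is a rough sketch. It is not the app's own detection code (that isn't shown in this thread); it only reads the `flags` line in the style of `/proc/cpuinfo`, and the function names `cpu_flags` and `pick_variant` are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def pick_variant(flags):
    """Mimic the fallback described above: AVX if present, else SSE2, else None."""
    if "avx" in flags:
        return "AVX"
    if "sse2" in flags:
        return "SSE2"
    return None
```

On a Linux box you could call `pick_variant(cpu_flags(open("/proc/cpuinfo").read()))`. If it returns "SSE2" but the host still errors out in the AVX-labelled app, that would at least be worth mentioning in a report.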
Edit: I disabled the work generators until we know which other data files are affected. That's also the reason why the app is still in beta. The work generated so far should suffice until tomorrow.
Thanks Christian, but what I was looking for was this:
einstein_O1MD1TCV
Since I run 20- and 16-core machines, Einstein tries to run as many tasks as possible on them and I end up with almost 100% of all cores busy. With an app_config.xml I can turn down how many run at once, and the computers become usable again.
Results are being returned, but I'm not seeing any valid ones yet. On the plus side, they are not erroring either.
Thanks again
Zalster
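For reference, a minimal sketch of the kind of app_config.xml meant here, generated from Python so the XML stays well-formed. The app name is the one quoted in the post; the limit of 10 is only an example value, and `build_app_config` is a hypothetical helper, not part of BOINC. The `<app_config>`, `<app>`, `<name>` and `<max_concurrent>` elements follow the BOINC client configuration format.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text that caps concurrent tasks for one app."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


# Example values only: app name from the post, limit chosen arbitrarily.
xml_text = build_app_config("einstein_O1MD1TCV", 10)
print(xml_text)
```

The resulting text would be saved as app_config.xml in the Einstein@Home project folder inside the BOINC data directory, after which the client has to re-read its config files (or be restarted) for the limit to take effect.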
Enable the <cpu_sched> diagnostic log flag from the Options menu in BOINC Manager, Advanced View (or by Ctrl+Shift+F).
As each task starts, you will see a line like
13/10/2016 17:16:04 | Einstein@Home | [cpu_sched] Starting task p2030.20141120.G192.71+00.73.S.b3s0g0.00000_544_0 using einsteinbinary_BRP4G version 152 (BRP4G-Beta-opencl-intel_gpu) in slot 4
in the event log. The name immediately following the keyword 'using' (einsteinbinary_BRP4G in this example) is what you want to use in the <app> section of app_config.xml.
If you need to refine it further, the plan_class needed to identify the <app_version> is also logged on the same line, in parentheses after the version number.
Note that the diagnostic log you need for this is plain <cpu_sched>. You wouldn't want to even see <cpu_sched_debug>.
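If you have many such lines to go through, the fields can be pulled out mechanically. This is only a sketch: the regular expression is inferred from the single example line above, so treat the pattern (and the helper name `parse_start_line`) as an assumption rather than a documented log format.

```python
import re

# Pattern inferred from the one example line in this post (an assumption).
START_LINE = re.compile(
    r"\[cpu_sched\] Starting task (?P<task>\S+) using (?P<app>\S+) "
    r"version (?P<version>\d+)(?: \((?P<plan_class>[^)]+)\))? "
    r"in slot (?P<slot>\d+)"
)


def parse_start_line(line):
    """Return a dict of task, app, version, plan_class, slot, or None if no match."""
    match = START_LINE.search(line)
    return match.groupdict() if match else None


line = (
    "13/10/2016 17:16:04 | Einstein@Home | [cpu_sched] Starting task "
    "p2030.20141120.G192.71+00.73.S.b3s0g0.00000_544_0 using "
    "einsteinbinary_BRP4G version 152 (BRP4G-Beta-opencl-intel_gpu) in slot 4"
)
info = parse_start_line(line)
print(info["app"], info["plan_class"])
```

Here `info["app"]` is the value for `<name>` in the `<app>` section, and `info["plan_class"]` is the one for `<plan_class>` in an `<app_version>` section.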
Thanks Richard, I was able to locate the name and listed it in my last post.
My worst fear was correct. Einstein saw my 20 cores and is using all of them to crunch the gravity waves.
Forgive the image, but I find people believe you more when they can see what you are talking about...
I've managed to bring that down to 10, but another issue has now come up that I should post in the server thread. (Since the change, Einstein has ignored my local preference limiting work to 1 day's worth. I now have nearly 200 GW work units, and there is no way I will be able to complete all of them before the deadline. It was doing the same for gamma rays, but those didn't have the higher level of importance that the GW tasks do.)
Zalster
I set the box I am running GW on to NNT (No New Tasks) until I see whether the work I've done validates. I see no reason to waste time if the work doesn't validate. It will have run 10 of these little gems; that should be enough of a test for the app on that host.
<waves>
Thanks for the updates, Christian. Some more errors here: on a Xeon(R) CPU E5-2660, CV v1.01 (AVX), with stack trace .... https://einsteinathome.org/task/581023434
and on an i5-4690K, G v1.01 (AVX): https://einsteinathome.org/task/581002922
According to the server there are plenty of work units available, but I am unable to get any; I just get the message "no work is available".
What are the requirements for getting work?
What computer types?
Are they all taken and none left for me?
If I allow Gamma Ray work units I will get as many as I want.
Conan
Same for me. I only get the gamma-ray ones. I have tried to disable that, but then I don't get any CPU work at all.
I tried to look in the contact logs but for two of my machines I see no references to gravity wave stuff:
https://einsteinathome.org/host/12238215/log
https://einsteinathome.org/host/11707278/log
On the third machine I get "No work is available for Multi-Directed Continuous Gravitational Wave search Tuning run G", even though there seems to be lots of work available.
https://einsteinathome.org/host/7562651/log