S5GCE, was: Beyond S5R6

archae86
Joined: 6 Dec 05
Posts: 2,949
Credit: 3,943,197,217
RAC: 4,989,326

RE: Still the point

Message 96863 in response to message 96861

Quote:
Still, the point remains: even my current dual-core 64-bit machine is underperforming due to software limitations.

Is that really so clear? The one really clear-cut advantage of 64-bit operation is a large address range for single segments. But as that limit is already well past a gigabyte in 32-bit implementations, it is not yet very common for that advantage to apply at the application level (you already get the benefit of being allowed to install more than 4 GB of physical RAM regardless).

The 32 vs. 64-bit choice does not affect the format options for floating point. Indeed the 80-bit floating point representation supported by the original 8087 (which indeed is the direct progenitor of the IEEE floating point spec) is not only adequate, but considered overkill for most real-world floating point work--and that was available for 16-bit software on the 8086 by the early 1980s.

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15,872,262
RAC: 0

RE: S5GCE always end with

Message 96864 in response to message 96858

Quote:
S5GCE tasks always end with a compute error on both my machines (tasks 170334428, 170524895, 170556252, 170531847, 170451886, 170439512, 170439508). Each time, a similar message appears in the kernel log file:
kernel: grsec: (default:D:/) denied load of writable library /SYSV01022003 by /var/lib/boinc/projects/einstein.phys.uwm.edu/einstein_S5GCE_1.04_i686-pc-linux-gnu__S5GCESSE2[einstein_S5GCE_:26521] uid/euid:114/114 gid/egid:1022/1022, parent /usr/bin/boinc_client[boinc_client:11869] uid/euid:114/114 gid/egid:1022/1022


Hm, I don't know about grsec-patched kernels, but it could be that the libs (which lib?) should not be writable by your boinc user.
If I run ldd on the Einstein apps, I see no dependencies other than system libs. So what is '/SYSV01022003'?
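
A guess at that name, not a confirmed diagnosis: on Linux, System V shared memory segments show up in /proc/<pid>/maps as /SYSV followed by the key as eight hex digits, so '/SYSV01022003' may be a shared memory segment rather than a library file on disk. A small sketch for checking that, where sysv_key is a hypothetical helper name:

```shell
#!/bin/sh
# Turn a mapping name like /SYSV01022003 into the hex key that ipcs prints.
# Returns non-zero if the name does not have the /SYSV + 8 characters form.
sysv_key() {
    case "$1" in
        /SYSV????????) echo "0x${1#/SYSV}" ;;
        *) return 1 ;;
    esac
}

key=$(sysv_key /SYSV01022003)
echo "$key"

# If ipcs is installed, look for a shared memory segment with that key.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -m | grep -i "$key" || echo "no segment with key $key right now"
fi
```

If a segment with that key is listed while BOINC is running, the grsec message is probably about shared memory and not about a file that ldd would report.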

Quote:

Arecibo tasks finish successfully.

Both machines are running a grsec-patched linux-2.6.33 kernel.

Is it possible to compute only Arecibo tasks until the problem with S5GCE is solved?


Without some "dirty" tricks: No

martin
Joined: 3 Apr 10
Posts: 5
Credit: 4,908,716
RAC: 0

RE: RE: S5GCE always

Message 96865 in response to message 96864

Quote:
Quote:
S5GCE tasks always end with a compute error on both my machines (tasks 170334428, 170524895, 170556252, 170531847, 170451886, 170439512, 170439508). Each time, a similar message appears in the kernel log file:
kernel: grsec: (default:D:/) denied load of writable library /SYSV01022003 by /var/lib/boinc/projects/einstein.phys.uwm.edu/einstein_S5GCE_1.04_i686-pc-linux-gnu__S5GCESSE2[einstein_S5GCE_:26521] uid/euid:114/114 gid/egid:1022/1022, parent /usr/bin/boinc_client[boinc_client:11869] uid/euid:114/114 gid/egid:1022/1022

Hm, I don't know about grsec-patched kernels, but it could be that the libs (which lib?) should not be writable by your boinc user.
If I run ldd on the Einstein apps, I see no dependencies other than system libs. So what is '/SYSV01022003'?

Recompiling glibc solved the problem with the grsec messages, but S5GCE tasks still fail.

So 17. apríl 2010, 12:04:21 CEST Einstein@Home Starting h1_0444.15_S5R4__207_S5GCEa_1
So 17. apríl 2010, 12:04:21 CEST Einstein@Home Starting task h1_0444.15_S5R4__207_S5GCEa_1 using einstein_S5GCE version 104
So 17. apríl 2010, 12:05:03 CEST Einstein@Home Sending scheduler request: To fetch work.
So 17. apríl 2010, 12:05:03 CEST Einstein@Home Requesting new tasks
So 17. apríl 2010, 12:05:08 CEST Einstein@Home Scheduler request completed: got 0 new tasks
So 17. apríl 2010, 12:05:08 CEST Einstein@Home Message from server: BOINC will delete file h1_0393.95_S5R7 when no longer needed
So 17. apríl 2010, 12:05:08 CEST Einstein@Home Got server request to delete file h1_0393.95_S5R7
So 17. apríl 2010, 15:20:55 CEST Einstein@Home Computation for task h1_0444.15_S5R4__207_S5GCEa_1 finished
So 17. apríl 2010, 15:20:55 CEST Einstein@Home Output file h1_0444.15_S5R4__207_S5GCEa_1_0 for task h1_0444.15_S5R4__207_S5GCEa_1 absent
So 17. apríl 2010, 16:05:09 CEST Einstein@Home Sending scheduler request: To fetch work.

The boinc user has write permissions to the directory projects/einstein.phys.uwm.edu/

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1,079
Credit: 341,280
RAC: 0

RE: Recompilation of glibc

Message 96866 in response to message 96865

Quote:
Recompiling glibc solved the problem with the grsec messages, but S5GCE tasks still fail.


Task h1_0444.15_S5R4__207_S5GCEa_1_0 ended with

ERROR: Stepped outside the coarse grid! 
2010-04-17 15:20:52.3172 (912) [CRITICAL]: ERROR: MAIN() returned with error '13'

but many of the other tasks got

APP DEBUG: Application caught signal 8.

FPU status word ffff80c1, flags: ERR_SUMM STACK_FAULT INVALID

which points to problems with kernel configuration option CONFIG_PREEMPT.
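
For reference, the flag names in that log line can be read off the low bits of the status word. A minimal sketch, assuming the standard x87 status-word layout (bit 0 invalid operation, bit 6 stack fault, bit 7 error summary); decode_fpu_status is a made-up helper name, and the labels for the flags not shown in the log are illustrative, not the application's own:

```shell
#!/bin/sh
# Decode the low 8 bits of an x87 FPU status word into flag names.
# Only ERR_SUMM, STACK_FAULT and INVALID are labels taken from the log;
# the other labels are illustrative.
decode_fpu_status() {
    word=$(( $1 ))          # accepts hex such as 0xffff80c1
    out=""
    # mask:label pairs, listed from bit 7 down to bit 0
    for entry in 128:ERR_SUMM 64:STACK_FAULT 32:PRECISION 16:UNDERFLOW \
                 8:OVERFLOW 4:ZERO_DIVIDE 2:DENORMAL 1:INVALID; do
        mask=${entry%%:*}
        name=${entry#*:}
        if [ $(( word & mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}

decode_fpu_status 0xffff80c1
```

The low byte of ffff80c1 is c1, which has bits 7, 6 and 0 set, so the sketch prints the same three flags as the log: ERR_SUMM STACK_FAULT INVALID.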

I did a Google search for "Application caught signal 8" and found "CONFIG_PREEMPT causes corruption of application's FPU stack" (http://old.nabble.com/CONFIG_PREEMPT-causes-corruption-of-application's-FPU-stack-td17293854.htm). One paragraph from there:

Quote:
I tracked this down to a single kernel configuration option. If
CONFIG_PREEMPT is set to 'y' the application will start crashing. If
CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application
will run without errors.


There have been threads about CONFIG_PREEMPT on this forum as well.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9,562,235
RAC: 0

Could the application detect

Message 96867 in response to message 96866

Could the application detect the value of CONFIG_PREEMPT and return a useful error message if it's set to the wrong thing? That's assuming this will fix things for him.

Jord
Joined: 26 Jan 05
Posts: 2,952
Credit: 5,733,350
RAC: 247

It's not a case of

Message 96868 in response to message 96867

This isn't a matter of the application or BOINC, so no. It's the kernel that's doing this.

Quote:

The problem is with a kernel that is compiled with CONFIG_PREEMPT=y while it should be =n.

The CONFIG_PREEMPT option makes the kernel itself preemptible: it permits a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call and would otherwise not be about to reach a natural preemption point. This wrecks the FPU stack, which in turn gives you a signal 8 error outcome.

To check the status of CONFIG_PREEMPT in your kernel, look at the .config file in the /usr/src/linux directory, or a similar directory:
grep PREEMPT /usr/src/linux/.config

Or, using /proc/config.gz:
cat /proc/config.gz | gunzip - | grep PREEMPT

You'll get a list of PREEMPT options: the ones your kernel supports and which of them are set.
You'll have to make sure that CONFIG_PREEMPT isn't set to y; preferably CONFIG_PREEMPT_VOLUNTARY is set instead.

You can do this by recompiling your kernel. The how and what of that depends on your distro and on how comfortable you are finding your way around Linux.

There are also runtime /proc/sys knobs and boot-time flags to turn voluntary preemption (CONFIG_VOLUNTARY_PREEMPT) and kernel preemption (CONFIG_PREEMPT) on/off:

# turn on/off voluntary preemption (if CONFIG_VOLUNTARY_PREEMPT)
echo 1 > /proc/sys/kernel/voluntary_preemption
echo 0 > /proc/sys/kernel/voluntary_preemption

# turn on/off the preemptible kernel feature (if CONFIG_PREEMPT)
echo 1 > /proc/sys/kernel/kernel_preemption
echo 0 > /proc/sys/kernel/kernel_preemption

The 'voluntary-preemption=0/1' and 'kernel-preemption=0/1' boot options can be used to control these flags at boot-time.

Not all distros allow for the latter use of startup flags, though.
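
The config check described above can be wrapped in a small script. A rough sketch, assuming one of the two config locations mentioned above is readable; preempt_model is a hypothetical helper name, and it only reports the three preemption-model options (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT):

```shell
#!/bin/sh
# Print which preemption model a kernel config selects.
# Reads kernel config text on stdin.
preempt_model() {
    # Only lines of the form CONFIG_PREEMPT...=y for the three models count;
    # "# CONFIG_PREEMPT is not set" lines are comments and are ignored.
    grep -E '^CONFIG_PREEMPT(_NONE|_VOLUNTARY)?=y$' | sed -e 's/=y$//'
}

# Try the two locations mentioned in the thread; either may be absent.
if [ -r /proc/config.gz ]; then
    gunzip -c /proc/config.gz | preempt_model
elif [ -r /usr/src/linux/.config ]; then
    preempt_model < /usr/src/linux/.config
else
    echo "no kernel config found" >&2
fi
```

A config that contains CONFIG_PREEMPT_NONE=y and leaves the other two models unset would print CONFIG_PREEMPT_NONE; a line reading CONFIG_PREEMPT in the output is the setting this thread suggests changing.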


Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9,562,235
RAC: 0

Well, I know. So the

Message 96869 in response to message 96868

Well, I know. So the application doesn't have the necessary rights to query said .config file?

ML1
Joined: 20 Feb 05
Posts: 339
Credit: 66,354,585
RAC: 0

That's quite an obscure

That's quite an obscure one!

In case of interest, for my Mandriva kernel here, I get:

$ cat /proc/config.gz | gunzip - | grep PREEMPT
# CONFIG_PREEMPT_RCU is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

Hope that might help,

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)

martin
Joined: 3 Apr 10
Posts: 5
Credit: 4,908,716
RAC: 0

RE: RE: Recompilation of

Message 96871 in response to message 96866

Quote:
Quote:
Recompiling glibc solved the problem with the grsec messages, but S5GCE tasks still fail.

Task h1_0444.15_S5R4__207_S5GCEa_1_0 ended with
ERROR: Stepped outside the coarse grid! 
2010-04-17 15:20:52.3172 (912) [CRITICAL]: ERROR: MAIN() returned with error '13'
but many of the other tasks got
APP DEBUG: Application caught signal 8.

FPU status word ffff80c1, flags: ERR_SUMM STACK_FAULT INVALID

which points to problems with kernel configuration option CONFIG_PREEMPT.

I did a Google search for "Application caught signal 8" and found "CONFIG_PREEMPT causes corruption of application's FPU stack" (http://old.nabble.com/CONFIG_PREEMPT-causes-corruption-of-application's-FPU-stack-td17293854.htm). One paragraph from there:

Quote:
I tracked this down to a single kernel configuration option. If
CONFIG_PREEMPT is set to 'y' the application will start crashing. If
CONFIG_PREEMPT is replaced by CONFIG_PREEMPT_VOLUNTARY, the application
will run without errors.

There have been threads about CONFIG_PREEMPT on this forum as well.

Regards,
Gundolf

Thanks for the help. Changing CONFIG_PREEMPT helped. I found in that thread, "CONFIG_PREEMPT causes corruption of application's FPU stack", that this bug was fixed 10 months ago. The old bug has probably come back.

Martin

DanNeely
Joined: 4 Sep 05
Posts: 1,339
Credit: 1,906,518,630
RAC: 1,423,821

RE: Thanks for help.

Message 96872 in response to message 96871

Quote:


Thanks for the help. Changing CONFIG_PREEMPT helped. I found in that thread, "CONFIG_PREEMPT causes corruption of application's FPU stack", that this bug was fixed 10 months ago. The old bug has probably come back.

Martin

Is this the same bug that was first found when it was discovered to be behind Einstein crashes?
