GNU/Linux S5R3 App 4.24 available for Beta test

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109376836219
RAC: 35991674

Message 76872 in response to message 76866

Quote:


In app_info.xml I see the following section:

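<app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>402</version_num>
    <file_ref>
        <file_name>einstein_S5R3_4.02_i686-pc-linux-gnu</file_name>
        <main_program/>
    </file_ref>
    <file_ref>
        <file_name>einstein_S5R3_4.02_i686-pc-linux-gnu.so</file_name>
    </file_ref>
</app_version>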

I know it's only a warning, so it does not really matter.

But I think there should be a link to 4.24 app file after

The above snippet (in its entirety) is what I'm saying could be removed. If your statement that I've emboldened means that you think it should be 4.24 after rather than 4.02, then please don't do that as it would be wrong. There was a change in the checkpoint format after 4.02 so that anything partially crunched with 4.02 cannot be continued with 4.24.

The simplest solution is to remove everything exactly as you have listed it above.
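
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch. It assumes the section to be removed is an <app_version> block directly under the <app_info> root, identified by its <app_name> and <version_num>. The function name and the sample XML are illustrative only, not a copy of the real app_info.xml shipped with the beta. Stop BOINC and back up the file before trying anything like this.

```python
import xml.etree.ElementTree as ET


def remove_app_version(xml_text: str, app_name: str, version_num: str) -> str:
    """Return xml_text with every matching <app_version> block removed."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy of the list because we remove children as we go.
    for block in list(root.findall("app_version")):
        name = (block.findtext("app_name") or "").strip()
        version = (block.findtext("version_num") or "").strip()
        if name == app_name and version == version_num:
            root.remove(block)
    return ET.tostring(root, encoding="unicode")


# Illustrative sample only (assumed layout, not the real file).
SAMPLE = """<app_info>
  <app><name>einstein_S5R3</name></app>
  <app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>402</version_num>
    <file_ref>
      <file_name>einstein_S5R3_4.02_i686-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>424</version_num>
  </app_version>
</app_info>"""

cleaned = remove_app_version(SAMPLE, "einstein_S5R3", "402")
remaining = [
    v.findtext("version_num")
    for v in ET.fromstring(cleaned).findall("app_version")
]
print(remaining)
```

Only the <app_version> block is touched, which matches the advice above to remove exactly that section and nothing else.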

Cheers,
Gary.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109376836219
RAC: 35991674

I have converted two computers that were previously running the 4.14 version app to the new 4.24 beta. Both of these are running the 5.8.15 version of BOINC. Before doing the conversion, I took the original app_info.xml and edited it to remove all references to the 4.02 app and to add 4.20 and 4.21 as versions that were compatible with 4.24. Each machine had a 4.14 branded task in progress at the time of conversion.
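
The "add 4.20 and 4.21 as compatible versions" step can be sketched in Python as cloning the 4.24 <app_version> block under extra version numbers, so that tasks branded with those versions are run by the 4.24 binary. This is only a sketch under assumptions: the helper names, the sample XML and the 4.24 file name in it are hypothetical, and the real app_info.xml may list more files.

```python
import copy
import xml.etree.ElementTree as ET


def _text(block, tag):
    """Stripped text of a child element, or '' if the child is missing."""
    return (block.findtext(tag) or "").strip()


def add_version_aliases(xml_text, app_name, current, aliases):
    """Clone the <app_version> for `current` once per alias version number."""
    root = ET.fromstring(xml_text)
    blocks = root.findall("app_version")
    known = {(_text(b, "app_name"), _text(b, "version_num")) for b in blocks}
    template = next(
        (b for b in blocks
         if (_text(b, "app_name"), _text(b, "version_num")) == (app_name, current)),
        None,
    )
    if template is None:
        raise ValueError(f"no <app_version> for {app_name} {current}")
    for alias in aliases:
        if (app_name, alias) in known:
            continue  # already listed, do not duplicate it
        clone = copy.deepcopy(template)
        clone.find("version_num").text = alias
        root.append(clone)
    return ET.tostring(root, encoding="unicode")


# Illustrative sample only; the 4.24 file name is an assumption.
SAMPLE = """<app_info>
  <app><name>einstein_S5R3</name></app>
  <app_version>
    <app_name>einstein_S5R3</app_name>
    <version_num>424</version_num>
    <file_ref>
      <file_name>einstein_S5R3_4.24_i686-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>"""

updated = add_version_aliases(SAMPLE, "einstein_S5R3", "424", ["420", "421"])
parsed = ET.fromstring(updated)
versions = [_text(v, "version_num") for v in parsed.findall("app_version")]
files = {f.text for f in parsed.iter("file_name")}
print(versions, files)
```

Note that 4.02 is deliberately not given an alias here, because of the checkpoint format change mentioned earlier in the thread.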

The first machine (AMD Sempron) picked up the partially completed task and finished it off with the new version. The result has validated and the machine is now working on a new 4.24 branded task that it downloaded whilst the 4.14 task was being completed. After attempting to account for the variability due to position in the sequence, it would appear that 4.24 is about 6% slower than 4.14. I'll obviously need to look at more results to get a better estimate.

The second machine (AMD Athlon XP) immediately had a problem when BOINC was restarted with the 4.24 science app. Below is the complete log of the restart:

Quote:

Wed 16 Jan 2008 07:35:07 PM EST||Starting BOINC client version 5.8.15 for i686-pc-linux-gnu
Wed 16 Jan 2008 07:35:07 PM EST||log flags: task, file_xfer, sched_ops
Wed 16 Jan 2008 07:35:07 PM EST||Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
Wed 16 Jan 2008 07:35:07 PM EST||Executing as a daemon
Wed 16 Jan 2008 07:35:07 PM EST||Data directory: /home/gary/BOINC
Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|Found app_info.xml; using anonymous platform
Wed 16 Jan 2008 07:35:07 PM EST||Processor: 1 AuthenticAMD AMD Athlon(tm) [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up ts]
Wed 16 Jan 2008 07:35:07 PM EST||Memory: 504.12 MB physical, 3.90 GB virtual
Wed 16 Jan 2008 07:35:07 PM EST||Disk: 21.61 GB total, 21.30 GB free
Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|URL: http://einstein.phys.uwm.edu/; Computer ID: 948549; location: home; project prefs: home
Wed 16 Jan 2008 07:35:07 PM EST|lhcathome|URL: http://lhcathome.cern.ch/lhcathome/; Computer ID: 9647583; location: home; project prefs: home
Wed 16 Jan 2008 07:35:07 PM EST||General prefs: from Einstein@Home (last modified 2008-01-13 15:52:56)
Wed 16 Jan 2008 07:35:07 PM EST||Host location: home
Wed 16 Jan 2008 07:35:07 PM EST||General prefs: using separate prefs for home
Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|[error] Failed to open init file slots/0/init_data.xml
Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|Deferring communication for 1 min 0 sec
Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|Reason: Unrecoverable error for result h1_0696.20_S5R2__10_S5R3a_1 (Can't write init file)
Wed 16 Jan 2008 07:35:08 PM EST|Einstein@Home|Computation for task h1_0696.20_S5R2__10_S5R3a_1 finished
Wed 16 Jan 2008 07:35:08 PM EST|Einstein@Home|Output file h1_0696.20_S5R2__10_S5R3a_1_0 for task h1_0696.20_S5R2__10_S5R3a_1 absent
Wed 16 Jan 2008 07:36:08 PM EST|Einstein@Home|Sending scheduler request: To fetch work
Wed 16 Jan 2008 07:36:08 PM EST|Einstein@Home|Requesting 8640 seconds of new work, and reporting 1 completed tasks
Wed 16 Jan 2008 07:36:13 PM EST|Einstein@Home|Scheduler RPC succeeded [server version 601]
Wed 16 Jan 2008 07:36:13 PM EST|Einstein@Home|Deferring communication for 1 min 0 sec
Wed 16 Jan 2008 07:36:13 PM EST|Einstein@Home|Reason: requested by project
Wed 16 Jan 2008 07:36:15 PM EST|Einstein@Home|Starting h1_0696.20_S5R2__5_S5R3a_1
Wed 16 Jan 2008 07:36:15 PM EST|Einstein@Home|Starting task h1_0696.20_S5R2__5_S5R3a_1 using einstein_S5R3 version 424

You can see that the error refers to a file in the slots/0 directory. Nothing was touched in any slots directory and the referenced file was there and browseable when I looked just after the new task had commenced. I'm presuming that it would have been there when the 4.14 app was stopped just prior to the changeover to 4.24.
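
For longer logs, picking out the problem lines by eye gets tedious. Here is a small Python sketch that does it, assuming each line has the shape "timestamp|project|message" as in the log above. The names LogEntry, parse_line and find_problems are made up for this example, and the two keywords searched for are simply the ones that appear in this particular log.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    project: str   # empty string for client-wide messages
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp|project|message' line into its three fields."""
    timestamp, project, message = line.rstrip("\n").split("|", 2)
    return LogEntry(timestamp, project, message)


def find_problems(lines):
    """Return entries whose message mentions an error."""
    keywords = ("[error]", "Unrecoverable error")
    entries = (parse_line(l) for l in lines if l.count("|") >= 2)
    return [e for e in entries if any(k in e.message for k in keywords)]


# Four lines copied from the restart log above.
LOG = [
    "Wed 16 Jan 2008 07:35:07 PM EST||General prefs: using separate prefs for home",
    "Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|[error] Failed to open init file slots/0/init_data.xml",
    "Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|Deferring communication for 1 min 0 sec",
    "Wed 16 Jan 2008 07:35:07 PM EST|Einstein@Home|Reason: Unrecoverable error for result h1_0696.20_S5R2__10_S5R3a_1 (Can't write init file)",
]

problems = find_problems(LOG)
for entry in problems:
    print(entry.project, "->", entry.message)
```

Splitting with a limit of two keeps any further "|" characters inside the message field intact.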

This is the first time I've come across this particular error message. Hopefully someone may have a clue as to what caused this. The only thing I can think of is possible corruption of that file if it were being written to just as BOINC was being stopped for the transition to the new beta app. I presume the file is created when a task is first started and that it may be updated from time to time during the life of the task. From the log snippet, something was trying to write to it for some reason.

As you can see, the machine downloaded a replacement task and started crunching it with 4.24 and with no further issues. The replacement is due to complete in about another hour or so.

Cheers,
Gary.

josep
Joined: 9 Mar 05
Posts: 63
Credit: 1156542
RAC: 0

Message 76874 in response to message 76872

Quote:


The above snippet (in its entirety) is what I'm saying could be removed. If your statement that I've emboldened means that you think it should be 4.24 after rather than 4.02, then please don't do that as it would be wrong. There was a change in the checkpoint format after 4.02 so that anything partially crunched with 4.02 cannot be continued with 4.24.

The simplest solution is to remove everything exactly as you have listed it above.

Thanks, Gary.

I have removed this section entirely, as you suggested, and I no longer get any warning message.

tullio
Joined: 22 Jan 05
Posts: 2118
Credit: 61407735
RAC: 0

Finished a WU with 4.20, getting 236.43 credits. Stopped BOINC, loaded 4.24, restarted BOINC and am now waiting for a new Einstein WU. Running QMC at high priority; its deadline is January 28. Waiting for an Opteron CPU, which should be faster than my PII Linux box, although the PII has never had a computation error.
Tullio

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4264
Credit: 244920768
RAC: 17022

Message 76876 in response to message 76873

Quote:
This is the first time I've come across this particular error message. Hopefully someone may have a clue as to what caused this. The only thing I can think of is possible corruption of that file if it were being written to just as BOINC was being stopped for the transition to the new beta app. I presume the file is created when a task is first started and that it may be updated from time to time during the life of the task.


That's entirely true.

If everything is running fine now I wouldn't worry. It might be though that the slots/0 directory has a permission problem that could come up again in one of the next tasks.
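
A quick way to rule the permission theory in or out is to check the slot directory and its init file directly. Below is a minimal Python sketch, with check_slot being a made-up helper name. It only assumes the slots/0/init_data.xml layout seen in the log; the demo runs against a throwaway temporary directory rather than a real BOINC data directory.

```python
import os
import tempfile


def check_slot(data_dir: str, slot: int = 0) -> dict:
    """Report whether the current user can use a slot dir and its init file."""
    slot_dir = os.path.join(data_dir, "slots", str(slot))
    init_file = os.path.join(slot_dir, "init_data.xml")
    return {
        "slot_dir_exists": os.path.isdir(slot_dir),
        # Creating files in a directory needs write and search (execute) access.
        "slot_dir_writable": os.access(slot_dir, os.W_OK | os.X_OK),
        "init_file_exists": os.path.isfile(init_file),
        "init_file_readable": os.access(init_file, os.R_OK),
        "init_file_writable": os.access(init_file, os.W_OK),
    }


# Demo on a temporary stand-in for the BOINC data directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "slots", "0"))
    with open(os.path.join(tmp, "slots", "0", "init_data.xml"), "w") as fh:
        fh.write("<app_init_data/>\n")
    report = check_slot(tmp)

for name, ok in report.items():
    print(f"{name}: {ok}")
```

To be meaningful on a real machine it would have to be run as the same user the BOINC client runs as, pointed at the data directory reported in the startup log.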

BM

PS: as for the app_info.xml file - you're right, I had better strip it of all the references to 4.02. Feel free to do so manually until I find the time. I hope to be able to publish a new App soon anyway, as this one didn't show the speedup I expected, but I want to get the "signal 11" issue fixed.

BM

th3
Joined: 24 Aug 06
Posts: 208
Credit: 2208434
RAC: 0

edit: NVM, thought this was the 4.21 thread... well... =)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109376836219
RAC: 35991674

Message 76878 in response to message 76876

Quote:
If everything is running fine now I wouldn't worry. It might be though that the slots/0 directory has a permission problem that could come up again in one of the next tasks.

Two 4.24 results have been completed successfully so it was just a "once off" issue. I'm fairly sure there was no permission problem with the slots/0 directory so it must have been a corrupt file.

In view of your comments in the 4.15 Windows thread, I'm eagerly looking forward to both Linux and Windows versions that do show the expected speedup from the linear SIN/COS code.

Cheers,
Gary.

Mikie Tim T
Joined: 22 Jan 05
Posts: 105
Credit: 263777741
RAC: 0

3 workunits completed on mine and all have validated successfully.

KSMarksPsych
Moderator
Joined: 15 Oct 05
Posts: 2702
Credit: 4090227
RAC: 0

All my recent results have validated, including the one where the network cable got pulled.

Kathryn :o)

Einstein@Home Moderator

Melvyn Bobo Slacke
Joined: 22 Jan 05
Posts: 32
Credit: 1692164
RAC: 0

10 units done with BOINC 5.2.13.
No problems when disabling the network; it backs off fine, like it should :)
Speed seems to be about the same as 4.20 on an AMD X2.
