Windows S5R4 App 6.10 available for Beta Test

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415

RE: I guess the trick

Message 89201 in response to message 89199

Quote:

I guess the trick might be that there were two different versions of the 6.10 app; the now-official one would then be the one that was uploaded to the server later. If you have that one, its signature will match the expected one. The original 6.10 beta app had a different checksum and will not match.

CU
Bikeman

This explanation is logical. But having 2 different v6.10's was not - just lazy configuration control. The last one should have been v6.11.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023754931
RAC: 1805738

RE: This explanation is

Message 89202 in response to message 89201

Quote:
This explanation is logical. But having 2 different v6.10's was not - just lazy configuration control. The last one should have been v6.11.

I think the truth differs. I think the files are not different--however when the file was installed by app_info.xml, no checksum is logged, and when you teleport into the center of the planet by removing it, the checksum field is not properly configured, so the check fails.

Standing by to be corrected. I thought I could provide a check from a machine I just converted, but I had been on 6.05, not on 6.10, so no comparison is possible.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415

RE: RE: This explanation

Message 89203 in response to message 89202

Quote:
Quote:
This explanation is logical. But having 2 different v6.10's was not - just lazy configuration control. The last one should have been v6.11.

I think the truth differs. I think the files are not different--however when the file was installed by app_info.xml, no checksum is logged, and when you teleport into the center of the planet by removing it, the checksum field is not properly configured, so the check fails.

Standing by to be corrected. I thought I could provide a check from a machine I just converted, but I had been on 6.05, not on 6.10, so no comparison is possible.

This message from Bernd (also referenced by Bikeman) indicates that there were 2 different versions.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7023754931
RAC: 1805738

RE: This message from Bernd

Message 89204 in response to message 89203

Quote:
This message from Bernd (also referenced by Bikeman) indicates that there were 2 different versions.

True, but that is a different matter.

I recall the attempted checksum failing on conversion from app_info to none on a previous transition where the issue you reference did not apply. At least that was my understanding.

Standing by for correction.

Stick
Joined: 24 Feb 05
Posts: 790
Credit: 31192054
RAC: 415

RE: RE: This message from

Message 89205 in response to message 89204

Quote:
Quote:
This message from Bernd (also referenced by Bikeman) indicates that there were 2 different versions.

True, but that is a different matter.

I recall the attempted checksum failing on conversion from app_info to none on a previous transition where the issue you reference did not apply. At least that was my understanding.

Standing by for correction.


All I can say is, I have been using the Einstein Beta Apps (when available) for several years now and this is the first time I have run into this issue. I also checked to see when I had downloaded v6.10. It was on 12/21/08 - 10 days before Bernd posted the update. (Obviously, I didn't have any problems with the older version.) So now, I really don't know what to think. But, I guess it really doesn't matter. The "Reset project" I did fixed the problem.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109396820024
RAC: 35766077

RE: RE: This message from

Message 89206 in response to message 89204

Quote:
Quote:
This message from Bernd (also referenced by Bikeman) indicates that there were 2 different versions.
True, but that is a different matter.


Yes, exactly. The SSE2 component in the 6.10 package had a problem in the original version, and just that component was replaced. People who are using the 6.10 beta test version should have upgraded at the time this was announced; if so, they would be using the exact same files as when 6.10 became official.

Quote:
I recall the attempted checksum failing on conversion from app_info to none on a previous transition where the issue you reference did not apply. At least that was my understanding.


This is precisely my experience as well, which is why I have assumed that the problem is actually the failure of comparing no checksum with an actual checksum, rather than the comparison of two different checksums.
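
The two explanations being discussed can be told apart in a toy model. The sketch below is only an illustration of the distinction, not BOINC's actual verification code; the function name and the idea of representing a missing signature as None are assumptions made for the example.

```python
from typing import Optional


def check_signature(stored: Optional[str], expected: str) -> str:
    """Hypothetical classifier for a signature check (not BOINC code).

    stored   -- signature recorded locally, or None if none was logged
    expected -- signature the server says the file should have
    """
    if stored is None:
        # Case argued by archae86 and Gary: nothing was recorded while the
        # file was installed through app_info.xml, so there is nothing to compare.
        return "missing"
    if stored != expected:
        # Case argued by Bikeman and Stick: the file really is a different build.
        return "mismatch"
    return "ok"
```

Both "missing" and "mismatch" would show up to the user as a failed check, which is why the thread cannot settle the question from the symptom alone.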

Quote:
Standing by for correction.


As am I :-).

Here is another data point just to confuse the issue. I have a Linux machine running the 6.02 version under app_info.xml. Its cache has run dry under NNT (No New Tasks). I have just deleted the app_info.xml and restarted BOINC. BOINC was quite happy with the beta test 6.02 files that it found, as it simply downloaded a couple of new tasks without replacing any app files. I will now repeat the experiment with a machine whose cache hasn't run dry. I'll watch the current task finish (with more tasks left in the cache) and then stop and remove app_info.xml. I expect that the remaining tasks may be invalidated.

I'll report back when I have the answer.

Cheers,
Gary.

samuel7
Joined: 16 Feb 05
Posts: 34
Credit: 1579363
RAC: 0

RE: RE: This message from

Message 89207 in response to message 89204

Quote:
Quote:
This message from Bernd (also referenced by Bikeman) indicates that there were 2 different versions.

True, but that is a different matter.

I recall the attempted checksum failing on conversion from app_info to none on a previous transition where the issue you reference did not apply. At least that was my understanding.

Standing by for correction.

I reported a successful transition from anonymous platform 6.10 to stock 6.10 by simply removing app_info.xml here. Only the skygrid file was redownloaded and the signature(s) were added to client_state.xml as shown.

The 6.10 _2 app was updated 31 Dec. The older version would understandably fail signature verification.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109396820024
RAC: 35766077

RE: I reported a successful

Message 89208 in response to message 89207

Quote:
I reported a successful transition from anonymous platform 6.10 to stock 6.10 by simply removing app_info.xml here. Only the skygrid file was redownloaded and the signature(s) were added to client_state.xml as shown.


Unfortunately, this only happens nicely if you first allow your cache of tasks to completely drain. I have just verified this by performing the extra experiment on a host that had partly drained its cache. There were two tasks left and both were trashed just as archae86 had recalled they would be if the app_info file was removed and BOINC was restarted.

In planning this experiment, I had taken the precaution of disabling comms so that the trashed tasks couldn't be reported to the server. Then I stopped BOINC and performed some extra editing on the state file (client_state.xml). I had two objectives. One was to recover the two trashed tasks and the other was to cut and paste the file signatures from another machine that had already been successfully transitioned back to the stock app.

I can report that I've been fully successful with both objectives. It is a fairly simple matter to repair the damage to the state file by completely removing the trashed results from it. Then when comms are reinstated, the server will simply resend the "lost" results. I have actually used this technique to retrieve damaged results previously so I had a good idea it would simply work. The key is that the server will resend "lost" tasks as long as there has been no communication with the server between when the damage actually occurred and when the results are deleted out of the state file. That way the server cannot know that results have been trashed and it will happily resend them to you with a message about "resending lost results".
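
For anyone who would rather script the removal than edit by hand, here is a minimal sketch. It assumes that client_state.xml parses as well-formed XML and that each task is stored as a <result> element with a <name> child; the function name and the sample task names are hypothetical. Stop BOINC, disable comms and work on a backup copy first, as described above.

```python
import xml.etree.ElementTree as ET


def remove_results(state_xml: str, trashed_names: set) -> str:
    """Return the state XML with the named <result> elements removed."""
    root = ET.fromstring(state_xml)
    # Snapshot the element list first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            name = (child.findtext("name") or "").strip()
            if child.tag == "result" and name in trashed_names:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical miniature state file, for illustration only.
    sample = (
        "<client_state>"
        "<result><name>task_a_0</name></result>"
        "<result><name>task_b_1</name></result>"
        "</client_state>"
    )
    print(remove_results(sample, {"task_a_0"}))
```

This only covers the removal step; it does nothing about file signatures, and it does not check whether the client has contacted the server since the damage occurred.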

The insertion of the missing file signatures also worked as expected. The beta test app files, with their signatures added, were happily accepted and crunching has commenced on the lost results that were resent without having to download any fresh copies of the R4 stock app.

I then increased my cache size and was immediately able to get an R5 task and the full suite of R5 apps. On further cache size increases, I even got an R4 resend task to go with the new R5 tasks. At this point, the machine has three R4 tasks and three R5 tasks. I've suspended the R4 tasks to allow an R5 to crunch and so give a clue about crunch times. The host is a Tualatin PIII, and R4 tasks were taking around 32 hours. The first R5 task is just 1% crunched, and on extrapolation this gives a completion estimate of around 31.5 hours. This is pretty disappointing for a projected halving of the crunch time. Maybe it's right at a peak and I'll get a bunch of credits for it :-).

Cheers,
Gary.

samuel7
Joined: 16 Feb 05
Posts: 34
Credit: 1579363
RAC: 0

RE: RE: I reported a

Message 89209 in response to message 89208

Quote:
Quote:
I reported a successful transition from anonymous platform 6.10 to stock 6.10 by simply removing app_info.xml here. Only the skygrid file was redownloaded and the signature(s) were added to client_state.xml as shown.

Unfortunately, this only happens nicely if you first allow your cache of tasks to completely drain.


Ah sorry, I didn't realise you were talking about doing this with unfinished tasks still on the machine. Yes, I certainly drained my cache before the operation.

Quote:
I can report that I've been fully successful with both objectives. It is a fairly simple matter to repair the damage to the state file by completely removing the trashed results from it. Then when comms are reinstated, the server will simply resend the "lost" results. I have actually used this technique to retrieve damaged results previously so I had a good idea it would simply work. The key is that the server will resend "lost" tasks as long as there has been no communication with the server between when the damage actually occurred and when the results are deleted out of the state file. That way the server cannot know that results have been trashed and it will happily resend them to you with a message about "resending lost results".


This is good to know in case I manage to trash results without taking a backup first. One learns to do that when crunching for CPDN. So I would edit out the , and fields of the trashed results in the state file, is that right?

[edit] Ooops, no edit![/edit]

Quote:
The insertion of the missing file signatures also worked as expected. The beta test app files, with their signatures added, were happily accepted and crunching has commenced on the lost results that were resent without having to download any fresh copies of the R4 stock app.


And this would be required when removing app_info.xml with unfinished tasks on the machine, right?

Quote:
The first R5 task is just 1% crunched and on extrapolation this gives a completion estimate of around 31.5 hours. This is pretty disappointing for a projected halving of the crunch time. Maybe it's right at a peak and I'll get a bunch of credits for it :-).


I've found that 1% is too early to extrapolate the completion time and it'll always give too high an estimate. It should run through at least 20 skypoints. On S5R5 that would be more than with S5R4, about 5%, I think.
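
The extrapolation in question is plain linear scaling: elapsed time divided by the fraction done. A minimal sketch follows; the function name is made up, and the 0.315 hours of elapsed time is only inferred from the quoted 1% and 31.5 hours, not a figure reported in the thread.

```python
def extrapolate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear estimate of total run time from progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


if __name__ == "__main__":
    # 1% done after roughly 0.315 h (about 19 minutes) scales to about 31.5 h.
    print(round(extrapolate_total_hours(0.315, 0.01), 1))
```

Because any fixed startup time is multiplied by 100 at the 1% mark, an estimate taken that early will run high, which is consistent with waiting for more skypoints before trusting it.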

jimoun
Joined: 23 Sep 08
Posts: 2
Credit: 6021755
RAC: 0

I'm looking for my results,

I'm looking at my results, and each result shows the following error:

R5_6.10a_2/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HoughFStatToplist.c:589): doserr:2, ferr:0, errno:2: No such file or directory
c
422, 423, 2009-01-08 00:14:50.9532 [CRITICAL]: ERROR: Couldn't rename
h1_1182.30_S5R4__937_S5R4a_1_0.cpt.tmp: (/home/bema/einsteinathome/HierarchicalSearch/EaH_build_win32_release_einstein_S5R5_6.10a_2/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HoughFStatToplist.c:589): doserr:2, ferr:0, errno:2: No such file or directory
c
.....

838, 839, 840, 2009-01-08 04:04:26.3996 [CRITICAL]: ERROR: Couldn't rename
h1_1182.30_S5R4__937_S5R4a_1_0.cpt.tmp: (/home/bema/einsteinathome/HierarchicalSearch/EaH_build_win32_release_einstein_S5R5_6.10a_2/extra_sources/lalapps-CVS/src/pulsar/hough/src2/HoughFStatToplist.c:589): doserr:2, ferr:0, errno:2: No such file or directory
c
done.
Writing output ...
2009-01-08 04:04:31.9310 [CRITICAL]: Failed to rename HoughFStat file to "../../projects/einstein.phys.uwm.edu/h1_1182.30_S5R4__937_S5R4a_1_0": 2: No such file or directory
done.
FPU status flags: COND_0 PRECISION
2009-01-08 04:04:32.1810 [normal]: done. calling boinc_finish(0).
called boinc_finish

but the valid state is VALID. Is this situation OK, with the results correctly computed and processed, or should I do something to correct it?
