Windows S5R4 App 6.10 available for Beta Test

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,915
Credit: 193,354,862
RAC: 24,157

RE: I'm looking for my

Message 89211 in response to message 89210

Quote:
I'm looking at my results, and each result has the following error:
...
but the validate state is VALID. Is this situation OK, and are the results correctly computed and processed, or should I do something to correct the situation?


Thanks for the report. I'm investigating.
This looks like the checkpointing may not work on your machine, for some reason. You're probably ok as long as you don't need the checkpoints, i.e. the BOINC client isn't stopped and the application is kept in memory while suspended. Anyway I'll see if there is a serious bug in our code.

BM

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,915
Credit: 193,354,862
RAC: 24,157

RE: Anyway I'll see if

Message 89212 in response to message 89211

Quote:
Anyway I'll see if there is a serious bug in our code.


I couldn't find this problem in the results of the same application version (301) on similar machines (Win XP), so I guess it's not a problem in the code of the new App.

What kind of installation are you running BOINC as (service, single user)? It could be that there is something wrong with the slots directory on the machine. Could you run a filesystem check and check the permissions?

BM

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,128
Credit: 36,939,799,012
RAC: 37,800,152

RE: .... So I would edit

Message 89213 in response to message 89209

Quote:
.... So I would edit out the , and fields of the trashed results in the state file, is that right?


In this particular case, I just removed the two trashed ... blocks. In other cases where the damage was more severe I have had to edit or remove in all three places. I particularly look for obvious signs of damage where status flag values have been changed or error messages have been embedded. I find it's pretty obvious what needs to be done to fix things.
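For anyone who would rather script this kind of cleanup than edit by hand, here is a minimal Python sketch. It is a hypothetical helper, not a BOINC tool, and it assumes that the state file (`client_state.xml`) parses as well-formed XML, that the blocks to drop are direct children of the root element, and that each block carries a `<name>` child. The tag name you pass in (for example `result`) is your choice; the exact fields mentioned above are not filled in here. Since damaged entries can appear in several places, you would call it once per kind of block. Stop the BOINC client first (it rewrites the file while running), and keep the `.bak` copy the sketch makes.

```python
"""Remove named blocks from a BOINC-style state file (hypothetical helper).

Assumptions: the file is well-formed XML, the blocks to remove are direct
children of the root element, and each block has a <name> child element.
Stop the BOINC client before running this, and keep the backup it makes.
"""
import shutil
import xml.etree.ElementTree as ET
from typing import Iterable


def remove_named_blocks(root: ET.Element, tag: str, names: Iterable[str]) -> int:
    """Remove direct children of `root` with tag `tag` whose <name> is in `names`.

    Returns the number of blocks removed.
    """
    wanted = {n.strip() for n in names}
    removed = 0
    # Copy the match list first so removals don't disturb the iteration.
    for block in list(root.findall(tag)):
        name = block.findtext("name")
        if name is not None and name.strip() in wanted:
            root.remove(block)
            removed += 1
    return removed


def clean_state_file(path: str, tag: str, names: Iterable[str]) -> int:
    """Back up `path` to `path + '.bak'`, remove matching blocks, rewrite the file."""
    shutil.copy2(path, path + ".bak")
    tree = ET.parse(path)
    removed = remove_named_blocks(tree.getroot(), tag, names)
    # With encoding="utf-8" ElementTree writes no XML declaration by default.
    tree.write(path, encoding="utf-8")
    return removed


if __name__ == "__main__":
    import sys

    # Usage: python clean_state.py client_state.xml result NAME [NAME ...]
    state_path, block_tag, *block_names = sys.argv[1:]
    print(clean_state_file(state_path, block_tag, block_names), "block(s) removed")
```

Parsing and rewriting with ElementTree keeps the remaining blocks intact but may shift whitespace slightly; checking the result by eye before restarting the client is still a good idea.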

Quote:
And this would be required when removing app_info.xml with unfinished tasks on the machine, right?


Yes, absolutely!

Quote:
I've found that 1% is too early to extrapolate the completion time and it'll always give too high an estimate. It should run through at least 20 skypoints. On S5R5 that would be more than with S5R4, about 5%, I think.


My experience has been that crunching behaviour is roughly linear even at only 1% done. I did check again at a later stage and it was still looking like a 31 hour completion time. I'll be checking the host again tomorrow and am hoping to see some sort of a speedup.

I'm running about 150 hosts, so I don't have time to babysit any particular host :-). While waiting for caches to drain, I've now converted quite a few cached-up machines to allow them to start getting R5 immediately. I haven't had any failures so far :-).

Cheers,
Gary.

samuel7
Joined: 16 Feb 05
Posts: 34
Credit: 1,579,363
RAC: 0

RE: I find it's pretty

Message 89214 in response to message 89213

Quote:
I find it's pretty obvious what needs to be done to fix things.


I hope I find it as simple if ever necessary. Will do a backup just in case. Thanks for all the info, Gary!

Quote:
Quote:
I've found that 1% is too early to extrapolate the completion time and it'll always give too high an estimate. It should run through at least 20 skypoints. On S5R5 that would be more than with S5R4, about 5%, I think.

My experience has been that crunching behaviour is roughly linear even at only 1% done. I did check again at a later stage and it was still looking like a 31 hour completion time. I'll be checking the host again tomorrow and am hoping to see some sort of a speedup.


First I must correct my statement about the number of skypoints. Looks like the S5R5 tasks have just 120 or so.

Maybe the platform makes a difference if you're running on Linux for instance. A recent task on my Q9550, Vista64 had done 0.96% at 0:04:40 for a completion estimate of over 8 hours. The task completed in the expected time of 6.5 hours. But let's indeed hope the speedup is there!
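The estimate discussed here is a plain linear extrapolation: total time is roughly elapsed time divided by fraction done. A minimal Python sketch, assuming progress advances at a constant rate (which is exactly what is being debated in this thread); the function names are illustrative only, and the per-skypoint helper additionally assumes every skypoint costs the same.

```python
"""Linear completion-time extrapolation from elapsed time and fraction done.

Assumption: fraction done grows at a constant rate, so
total_time ~= elapsed_time / fraction_done. Function names are hypothetical.
"""


def estimate_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Extrapolate the total run time from elapsed time and fraction done."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_seconds / fraction_done


def estimate_remaining_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Time still to go under the same linear assumption."""
    return estimate_total_seconds(elapsed_seconds, fraction_done) - elapsed_seconds


def fraction_after_skypoints(skypoints_done: int, total_skypoints: int) -> float:
    """Fraction done after some skypoints, assuming each skypoint costs the same."""
    return skypoints_done / total_skypoints


if __name__ == "__main__":
    elapsed = 4 * 60 + 40  # 0:04:40 from the Q9550 example above
    total = estimate_total_seconds(elapsed, 0.0096)  # 0.96% done
    print(f"estimated total: {total / 3600:.1f} h")  # prints "estimated total: 8.1 h"
    # Waiting for 20 of ~120 skypoints means waiting until roughly 17% done.
    print(f"20 of 120 skypoints: {fraction_after_skypoints(20, 120):.0%}")
```

On that Vista64 host the task actually finished in about 6.5 hours, so the early linear figure overshot there, while Gary's Linux host stayed close to its linear projection.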

Quote:
I'm running about 150 hosts so I don't have time to babysit any particular host :-). While waiting for caches to drain, I've now converted quite a few cached up machines to allow them to start getting R5 immediately. I haven't had any failures so far :-).


That's a nice fleet! And truly requires a method of converting 'on the fly.' May the success continue!

Sami.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,128
Credit: 36,939,799,012
RAC: 37,800,152

RE: Maybe the platform

Message 89215 in response to message 89214

Quote:
Maybe the platform makes a difference if you're running on Linux for instance. A recent task on my Q9550, Vista64 had done 0.96% at 0:04:40 for a completion estimate of over 8 hours. The task completed in the expected time of 6.5 hours. But let's indeed hope the speedup is there!


That host is running Linux (more than 90% of my hosts are) but I don't think there is much difference between Windows and Linux these days. I've checked my estimated 31 hour task and it is now 70% complete with a projected total run time of just under 30 hours. It has picked up a little speed on the way, but not much!

On past experience, the first tasks in a new run are always at or near a runtime maximum so I hope to see much lower times in future as the host works down towards a trough in the cycle. Hopefully the new crediting policy (thanks to Bikeman I believe) should adequately reward these long running first tasks.

Cheers,
Gary.

jimoun
Joined: 23 Sep 08
Posts: 2
Credit: 6,021,755
RAC: 0

RE: RE: Anyway I'll see

Message 89216 in response to message 89212

Quote:
Quote:
Anyway I'll see if there is a serious bug in our code.

I couldn't find this problem in the results of the same application version (301) on similar machines (Win XP), so I guess it's not a problem in the code of the new App.

What kind of installation are you running BOINC as (service, single user)? It could be that there is something wrong with the slots directory on the machine. Could you run a filesystem check and check the permissions?

BM

I'm using BOINC 6.2.19. The installation is, I think, multi-user. The filesystem is OK, and I changed the permissions on the folders to full access, but the results are still the same. However, the work seems to proceed without problems: when I restart BOINC or the computer, computation continues and no partial work is lost. So I will continue to compute.
