Switching Apps half way through a result

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4332
Credit: 251666565
RAC: 35776
Topic 191256

Hi!

First: If you don't know or understand what I mean, simply ignore this - this is nothing you should or must do.

Second: If you started thinking about this issue only because this thread showed up - see First.

If you think there really is a need to overcome the standard BOINC client policy of crunching a task with the App it has been assigned to, please use the following procedure:

- stop the client
- remove the files of the "old" App the tasks were already assigned to
- copy the files of the new App, giving the copies the names of the old App files
- start the client again.
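For anyone who prefers to script the two middle steps, here is a minimal sketch in Python. It assumes the client has already been stopped by hand; the function name `swap_app_files` and the file names in the usage note are hypothetical, not real Einstein@Home file names. It only touches the App files themselves and leaves client_state.xml alone.

```python
import shutil
from pathlib import Path


def swap_app_files(project_dir, replacements):
    """Put copies of new App files in place of old ones, under the old names.

    `replacements` maps an old App file name (inside `project_dir`) to the
    path of the new App file that should take its place. Hypothetical helper:
    run it only while the BOINC client is stopped.
    """
    project_dir = Path(project_dir)

    # Check every new file first, so nothing is deleted if one is missing.
    for new_file in replacements.values():
        if not Path(new_file).is_file():
            raise FileNotFoundError(f"new App file missing: {new_file}")

    swapped = []
    for old_name, new_file in replacements.items():
        target = project_dir / old_name
        if target.exists():
            target.unlink()                    # remove the "old" App file
        shutil.copy2(Path(new_file), target)   # copy of the new App, old name
        swapped.append(target)
    return swapped
```

Usage would look something like `swap_app_files(project_dir, {"old_app.exe": "downloads/new_app.exe"})`, with both names replaced by whatever the files are actually called on your host, followed by starting the client again.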

Please do not mess around with e.g. the client_state.xml file.

This will help us to keep an eye on the results that have been (exclusively) done with the new Apps.

BM


Dp
Joined: 27 Aug 05
Posts: 14
Credit: 201564185
RAC: 273552


Hi
I had a disk corruption with a 70-hour unit almost finished.
The e@H files are available and I tried to copy/paste them but they didn't resume.
Should I just forget it or is there a way to salvage this unit?

Thanks
dp

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0


From doing CPDN I seem to remember that to have a successful backup you have to copy the whole BOINC folder, not just the project files...
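A rough sketch of what "copy the whole BOINC folder" could look like in Python, in case it helps. The function name `backup_boinc_dir` and both directory arguments are assumptions for illustration: the location of the BOINC data directory differs between installations, and the client should be stopped before the archive is made so the state files are not changing underneath it.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_dir(data_dir, dest_dir):
    """Zip the entire BOINC data directory into `dest_dir`.

    Hypothetical helper: `data_dir` is wherever your installation keeps
    client_state.xml and the projects/ folder. Stop the client first.
    """
    data_dir = Path(data_dir).resolve()
    dest_dir = Path(dest_dir).resolve()
    if not data_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {data_dir}")
    # Writing the archive inside the folder being archived would make the
    # backup try to include itself.
    if dest_dir == data_dir or data_dir in dest_dir.parents:
        raise ValueError("dest_dir must be outside data_dir")

    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = dest_dir / f"boinc-backup-{stamp}"
    # make_archive appends ".zip" and returns the final archive path.
    return Path(shutil.make_archive(str(base), "zip", root_dir=str(data_dir)))
```

The point of archiving the whole directory rather than just one project's files is that the state files and the project files are captured together at the same moment.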

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 761948911
RAC: 1106105


Knowing that there are a few BOINC developers and/or gurus here as well: Wouldn't it be cool to have a backup/restore mechanism integrated into BOINC itself (so it would take care to suspend projects, zip all relevant stuff etc.)?

This was never an issue with short WUs like those of S@H, Rosetta@H and E@H, but for CPDN it would be useful. Is this on the road map?

CU

BRM

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0


Generally speaking, BOINC will already recover gracefully from most system crashes, as long as the apps running at the time weren't writing something to the disk when it happened. They just resume from the last checkpoint.

If you back up your machine on a regular basis, it's theoretically possible to recover from that as well, but there are a couple of problems you need to consider:

1.) Depending on the frequency of the backups, your host may have deadline problems due to the time lost between the last backup and the crash.

2.) You have to be careful of the RPC sequence numbers for all the attached projects. If your host has contacted any of the projects since the backup occurred, then when the host makes contact with them again after you restore, the project will see an RPC seq # less than the currently stored one. This will cause a reset on the host and a new Host ID to be generated for it.
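To make point 2 a bit more concrete, here is a toy model in Python of the check as described above. It is not BOINC scheduler code: the class `ToyScheduler`, its methods, and the starting ID of 1000 are all made up for illustration, and the real server logic may well differ in detail.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyScheduler:
    """Toy model of the RPC sequence number check; not BOINC scheduler code."""

    # host_id -> last RPC sequence number the project has seen from it
    last_seqno: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1000))

    def register(self):
        """Attach a brand-new host and return its ID."""
        host_id = next(self._ids)
        self.last_seqno[host_id] = 0
        return host_id

    def contact(self, host_id, rpc_seqno):
        """Handle one scheduler contact; return the host ID to use from now on."""
        if rpc_seqno < self.last_seqno[host_id]:
            # Sequence number went backwards, e.g. after restoring an old
            # backup: the project hands out a new host ID.
            new_id = next(self._ids)
            self.last_seqno[new_id] = rpc_seqno
            return new_id
        self.last_seqno[host_id] = rpc_seqno
        return host_id
```

In this model a host that contacts the project with sequence numbers 1 and then 5 keeps its ID, but if it is then restored from a backup taken at sequence number 3, the next contact comes back with a different ID.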

Although you do have a point. I wouldn't be all that thrilled if one of my slugs ended up dumping a 1.2 MSec. or higher run 90% into it.

FWIW, I've got a lot of run time with my old timers on BOINC and the number of times I've lost a WU because the crash corrupted the result output and state files is very low (2-3 times maybe in 2 years), but it can and does happen. OTOH, I'm running WinBoxes, so I couldn't even give you a guess how many times they've been restarted due to a BSOD, hung UI, etc. and BOINC recovered without an issue.

I just look at it as an occupational hazard in the BOINC game. ;-)

Alinator

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 761948911
RAC: 1106105


Message 33568 in response to message 33567

Quote:

Generally speaking, BOINC will already recover gracefully from most system crashes, as long as the apps running at the time weren't writing something to the disk when it happened. They just resume from the last checkpoint.

If you back up your machine on a regular basis, it's theoretically possible to recover from that as well, but there are a couple of problems you need to consider:

1.) Depending on the frequency of the backups, your host may have deadline problems due to the time lost between the last backup and the crash.

2.) You have to be careful of the RPC sequence numbers for all the attached projects. If your host has contacted any of the projects since the backup occurred, then when the host makes contact with them again after you restore, the project will see an RPC seq # less than the currently stored one. This will cause a reset on the host and a new Host ID to be generated for it.

Although you do have a point. I wouldn't be all that thrilled if one of my slugs ended up dumping a 1.2 MSec. or higher run 90% into it.

FWIW, I've got a lot of run time with my old timers on BOINC and the number of times I've lost a WU because the crash corrupted the result output and state files is very low (2-3 times maybe in 2 years), but it can and does happen. OTOH, I'm running WinBoxes, so I couldn't even give you a guess how many times they've been restarted due to a BSOD, hung UI, etc. and BOINC recovered without an issue.

I just look at it as an occupational hazard in the BOINC game. ;-)

Alinator

What seems to happen at CPDN a lot (well, sometimes) is that the science app runs into some error condition (think of it as something similar to the signal 11 stuff here) and then reports the result as an error. Those who are cautious enough to make backups are sometimes able to recover from a situation like this by restoring a backup: the CPDN server is supposedly smart enough to "reopen" a result that had previously been declared an error if new trickle messages are sent. If the application error was only transient (maybe a resource shortage, memory leak, file handle leak, whatever), then the WU would make it past the crash point and complete. I understand that crashed WUs are not immediately re-sent to others at CPDN, so this makes some sense.

CU

BRM
