Switching Apps half way through a result

Bernd Machenschalk

Moderator

Administrator

20 May 2006 15:45:40 UTC

Topic 191256

Hi!

First: If you don't know or understand what I mean, simply ignore this; this is nothing you should or must do.

Second: If you only started thinking about this issue because this thread showed up, see First.

If you think there really is a need to override the standard BOINC client policy of crunching a task with the App it was assigned to, please use the following procedure:

- stop the client
- remove the files of the "old" App the tasks were already assigned to
- copy the files of the new App, giving the copies the names of the old App files
- start the client again.
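For those who prefer to script the file swap, here is a minimal sketch of steps 2 and 3 only. The helper name `swap_app_files` and its parameters are hypothetical, the actual App file names are not given in this thread and must be filled in by you, and the sketch assumes the client has already been stopped (it does not stop or start BOINC itself).

```python
import shutil
from pathlib import Path


def swap_app_files(project_dir, old_names, new_names):
    """Overwrite each old App file with a copy of the matching new App file.

    Hypothetical helper: old_names[i] is replaced by a copy of new_names[i],
    both relative to project_dir. Stop the BOINC client before calling this
    and start it again afterwards; this function does neither.
    """
    project = Path(project_dir)
    if len(old_names) != len(new_names):
        raise ValueError("old_names and new_names must have the same length")
    for old, new in zip(old_names, new_names):
        old_path = project / old
        new_path = project / new
        if old_path == new_path:
            raise ValueError(f"old and new file are the same: {old}")
        if not new_path.is_file():
            raise FileNotFoundError(new_path)
        # Step 2: remove the file of the "old" App, if it is still there.
        if old_path.exists():
            old_path.unlink()
        # Step 3: copy the new App file, giving the copy the old file's name.
        shutil.copy2(new_path, old_path)
```

Usage would look like `swap_app_files("path/to/project_dir", ["old_app"], ["new_app"])`, with placeholder names. The new App's own files are left in place, and client_state.xml is not touched.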

Please do not mess around with, e.g., the client_state.xml file.

This will help us keep an eye on the results that have been done (exclusively) with the new Apps.

Switching Apps half way through a result

12 May 2007 10:05:59 UTC

Message 33564

Hi,
I had a disk corruption with a 70-hour unit almost finished.
The E@H files are available and I tried to copy/paste them, but they didn't resume.
Should I just forget it, or is there a way to salvage this unit?

Thanks
dp

Annika

12 May 2007 11:20:15 UTC

Message 33565

From doing CPDN I seem to remember that to have a successful backup you have to copy the whole BOINC folder, not just the project files...
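A minimal sketch of such a whole-folder backup, assuming the BOINC client has been stopped first so nothing is written mid-copy. The function name `backup_boinc_dir` and the archive naming scheme are illustrative choices, not part of BOINC; the point is that the archive covers the entire data directory (client_state.xml included), not just one project's subfolder.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_dir(boinc_dir, backup_root):
    """Archive the entire BOINC data directory into a timestamped .zip file.

    Assumes the BOINC client has already been stopped. Returns the path
    of the created archive.
    """
    boinc_dir = Path(boinc_dir).resolve()
    backup_root = Path(backup_root).resolve()
    # Refuse to write the archive inside the folder being archived.
    if backup_root.is_relative_to(boinc_dir):
        raise ValueError("backup_root must be outside the BOINC directory")
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = backup_root / f"boinc-backup-{stamp}"
    # make_archive appends ".zip" and stores every file under root_dir,
    # so project files and the client's state files are saved together.
    return shutil.make_archive(str(base), "zip", root_dir=str(boinc_dir))
```

Restoring means stopping the client and unpacking the archive over the data directory, subject to the caveats raised further down in this thread.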

Bikeman (Heinz-...

Moderator

12 May 2007 11:38:39 UTC

Message 33566

Knowing that there are a few BOINC developers and/or gurus here as well: wouldn't it be cool to have a backup/restore mechanism integrated into BOINC itself (so it would take care of suspending projects, zipping all relevant files, etc.)?

This was never an issue with short WUs like those at S@H, Rosetta@H and E@H, but for CPDN it would be useful. Is this on the road map?

BRM

Alinator

12 May 2007 13:12:06 UTC

Message 33567

Generally speaking, BOINC will already recover gracefully from most system crashes, as long as the apps that were running weren't writing to the disk at the moment it happened. They just resume from the last checkpoint.

If you back up your machine on a regular basis, it's theoretically possible to recover from that as well, but there are a couple of problems you need to consider:

1.) Depending on the frequency of the backups, your host may run into deadline problems due to the time lost between the last backup and the crash.

2.) You have to be careful of the RPC sequence numbers for all the attached projects. If your host has contacted any of the projects since the backup occurred, then when the host makes contact with them again after you restore, the project will see an RPC seq # less than the currently stored one. This will cause a reset on the host and a new Host ID to be generated for it.
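To make point 2 concrete, here is a deliberately simplified model of the server-side check described there. It is not BOINC's actual scheduler code; `HostRecord`, `handle_scheduler_rpc` and `next_host_id` are hypothetical names, and the model only captures the rule "a sequence number lower than the stored one means a new host ID".

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    host_id: int
    rpc_seqno: int  # last RPC sequence number the server saw from this host


def handle_scheduler_rpc(record: HostRecord, client_seqno: int,
                         next_host_id: int) -> HostRecord:
    """Simplified model only, not the real BOINC scheduler logic."""
    if client_seqno < record.rpc_seqno:
        # The client's state went backwards (e.g. restored from an older
        # backup), so the server treats it as a different host.
        return HostRecord(host_id=next_host_id, rpc_seqno=client_seqno)
    # Normal case: same host, remember the newer sequence number.
    return HostRecord(host_id=record.host_id, rpc_seqno=client_seqno)
```

For example, if the server last saw sequence number 10 from host 42, a restored client reporting 7 would come back under a fresh host ID, while one reporting 11 keeps ID 42.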

You do have a point, though. I wouldn't be all that thrilled if one of my slugs ended up dumping a run of 1.2 MSec. or more 90% of the way through it.

FWIW, I've got a lot of run time with my old timers on BOINC, and the number of times I've lost a WU because a crash corrupted the result output and state files is very low (maybe 2-3 times in 2 years), but it can and does happen. OTOH, I'm running WinBoxes, so I couldn't even guess how many times they've been restarted due to a BSOD, hung UI, etc., with BOINC recovering without an issue.

I just look at it as an occupational hazard in the BOINC game. ;-)

Alinator

Bikeman (Heinz-...

Moderator

12 May 2007 15:40:23 UTC

Message 33568 in response to message 33567

Quote:

Generally speaking, BOINC will already recover gracefully from most system crashes [...] They just resume from the last checkpoint.

[...]

Alinator

What seems to happen at CPDN a lot (well, sometimes) is that the science app runs into some error condition (think of it as something similar to the signal 11 errors here) and then reports the result as an error. Those who are cautious enough to make backups are sometimes able to recover from a situation like this by restoring a backup; the CPDN server supposedly is smart enough to "reopen" a result that had previously been declared an error if new trickle messages are sent. If the application error was only transient (maybe a resource shortage, memory leak, file handle leak, whatever), then the WU would make it past the crash point and complete. I understand that crashed WUs are not immediately re-sent to others at CPDN, so this makes some sense.

BRM
