Hi!
First: If you don't know or understand what I mean, simply ignore this - this is nothing you should or must do.
Second: If you only started thinking about this issue because this thread showed up - see First.
If you think there really is a need to overcome the standard BOINC client policy of crunching a task with the App it has been assigned to, please use the following procedure:
- stop the client
- remove the files of the "old" App the tasks were already assigned to
- copy the files of the new App, giving the copies the names of the old App files
- start the client again.
Please do not mess around e.g. with the client_state.xml file.
This will help us keep an eye on the results that have been done (exclusively) with the new Apps.
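For anyone who prefers to script the file swap described in the steps above, here is a minimal sketch. It assumes the client is already stopped and that you know which file names belong to the old and new App; the function name, the directory arguments and the file names in `name_map` are all hypothetical and not part of BOINC.

```python
import shutil
from pathlib import Path


def swap_app_files(project_dir, new_app_dir, name_map):
    """Replace old App files with copies of the new ones, keeping the old names.

    name_map maps each old file name to the new file that takes its place.
    All names are hypothetical; run this only while the client is stopped.
    """
    project_dir = Path(project_dir)
    new_app_dir = Path(new_app_dir)

    # Check every source first, so nothing is deleted if a new file is missing.
    for new_name in name_map.values():
        source = new_app_dir / new_name
        if not source.is_file():
            raise FileNotFoundError(f"new App file missing: {source}")

    replaced = []
    for old_name, new_name in name_map.items():
        target = project_dir / old_name
        if target.exists():
            target.unlink()  # remove the file of the "old" App
        # Copy the new App file, giving the copy the old file's name.
        shutil.copy2(new_app_dir / new_name, target)
        replaced.append(target)
    return replaced
```

The sketch does not touch client_state.xml or any other client file; it only replaces the files listed in `name_map`.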
BM
Copyright © 2024 Einstein@Home. All rights reserved.
Switching Apps half way through a result
Hi
I had a disk corruption with a 70 hour unit almost finished.
The E@H files are available and I tried to copy/paste them, but the unit didn't resume.
Should I just forget it or is there a way to salvage this unit?
Thanks
dp
From doing CPDN I seem to remember that to have a successful backup you have to copy the whole BOINC folder, not just the project files...
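A minimal sketch of such a whole-folder backup, assuming the client has been stopped first so that no file changes while it is being copied. The function name and both path arguments are hypothetical; nothing here is a BOINC feature.

```python
import shutil
from pathlib import Path


def backup_boinc_dir(boinc_dir, backup_base):
    """Zip the whole BOINC folder and return the path of the archive.

    backup_base is the archive path without the ".zip" suffix and must lie
    outside boinc_dir, otherwise the archive would try to include itself.
    """
    boinc_dir = Path(boinc_dir).resolve()
    backup_base = Path(backup_base).resolve()
    if not boinc_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {boinc_dir}")
    if boinc_dir == backup_base.parent or boinc_dir in backup_base.parents:
        raise ValueError("backup target must be outside the BOINC folder")
    # Archive everything below boinc_dir, not only the project files.
    archive = shutil.make_archive(str(backup_base), "zip", root_dir=str(boinc_dir))
    return Path(archive)
```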
Knowing that there are a few BOINC developers and/or gurus here as well: Wouldn't it be cool to have a backup/restore mechanism integrated into BOINC itself (so it would take care of suspending projects, zipping all the relevant stuff, etc.)?
This was never an issue with short WUs such as those of S@H, Rosetta@H and E@H, but for CPDN it would be useful. Is this on the road map?
CU
BRM
Generally speaking, BOINC will already recover gracefully from most system crashes, as long as the apps which were running at the time weren't writing something to the disk at the time it happened. They just resume from the last checkpoint.
If you back up your machine on a regular basis, it's theoretically possible to recover from that as well, but there are a couple of problems you need to consider:
1.) Depending on the frequency of the backups, your host may have deadline problems due to the time lost between the last backup and the crash.
2.) You have to be careful of the RPC sequence numbers for all the attached projects. If your host has contacted any of the projects since the backup occurred, then when the host makes contact with them again after you restore, the project will see an RPC seq # less than the currently stored one. This will cause a reset on the host and a new Host ID to be generated for it.
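To make point 2 concrete, here is a minimal sketch of the comparison as described above. It is only an illustration of the rule "sequence number lower than the stored one means a new host", not the actual scheduler code; the function name and the numbers in the usage note are made up.

```python
def classify_scheduler_request(stored_seqno, received_seqno):
    """Classify a request by comparing RPC sequence numbers.

    Returns "new_host" if the received number is lower than the one the
    project has stored (as happens after restoring an older backup),
    otherwise "same_host".
    """
    if received_seqno < stored_seqno:
        return "new_host"
    return "same_host"
```

For example, if the project has stored 45 for a host and the restored client then contacts it with 41, `classify_scheduler_request(45, 41)` gives "new_host", while a host that was never restored and sends 46 stays "same_host".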
Although you do have a point. I wouldn't be all that thrilled if one of my slugs ended up dumping a 1.2 MSec. or higher run 90% into it.
FWIW, I've got a lot of run time with my old timers on BOINC and the number of times I've lost a WU because the crash corrupted the result output and state files is very low (2-3 times maybe in 2 years), but it can and does happen. OTOH, I'm running WinBoxes, so I couldn't even give you a guess how many times they've been restarted due to a BSOD, hung UI, etc. and BOINC recovered without an issue.
I just look at it as an occupational hazard in the BOINC game. ;-)
Alinator
What seems to happen at CPDN a lot (well, sometimes) is that the science app runs into some error condition (think of it as something similar to the signal 11 stuff here) and then reports the result as an error. Those who are cautious enough to make backups are sometimes able to recover from a situation like this by restoring a backup; the CPDN server is supposedly smart enough to "reopen" a result that had previously been declared an error if new trickle messages are sent. If the application error was only transient (maybe a resource shortage, memory leak, file handle leak, whatever), then the WU would make it past the crash point and complete. I understand that crashed WUs are not immediately re-sent to others at CPDN, so this makes some sense.
CU
BRM