1. NTFS5 should be journaling, shouldn't it?
2. The checkpoint files are separate from the client_state files. They are also written according to your settings, so once a minute by default.
3. The temp files can grow large - several MBs. I think for most users it's not a good idea to keep and write two copies of them. Most would prefer to deal with BOINC like with any other program - prevent the machine from crashing, as it may trash work. At least with BOINC it's not your own work you lose, just a bit of CPU time.
BM
> 1. NTFS5 should be journaling, shouldn't it?
I did not know that NTFS5 journaled... But I've had bad experiences using NTFS as the file system for the C: drive, so I avoid it. I would redirect Einstein to put its files on another drive with NTFS if BOINC allowed it...
>
> 2. The checkpoint files are separate from the client_state files. They are also
> written according to your settings, so once a minute by default.
That brings up another challenge with Windows systems that have "deleted file recovery"... HUNDREDS of old copies of "client_state_prev.xml"... Can you just reuse the alternate instead of deleting the old copy and creating a new one??
>
> 3. The temp files can grow large - several MBs. I think for most users it's
> not a good idea to keep and write two copies of them. Most would prefer to
> deal with BOINC like with any other program - prevent the machine from
> crashing, as it may trash work. At least with BOINC it's not your own work you
> lose, just a bit of CPU time.
Perhaps, but that's what started this thread. Ned
>
> BM
>
Ol' Retired IT Geezer
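For anyone wondering what "reuse the alternate" could look like in practice, here is a minimal sketch in Python. It is not BOINC's actual code: the file names `state_a.bin` / `state_b.bin`, the header layout, and the functions `save` and `load_latest` are all made up for illustration. The idea is to keep exactly two slot files and always overwrite the older (or damaged) one in place, so no file is ever deleted or created after the first two saves. Whether that actually stops a given "deleted file recovery" tool from piling up old copies depends on the tool, so treat that part as an assumption.

```python
import os
import struct
import zlib

# Hypothetical slot names; only these two files are ever used.
SLOTS = ("state_a.bin", "state_b.bin")
HEADER = "<QII"  # sequence number, payload length, CRC32 of payload
HEADER_SIZE = struct.calcsize(HEADER)


def _read_slot(path):
    """Return (sequence, payload) if the slot is intact, else None."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None
    if len(raw) < HEADER_SIZE:
        return None
    seq, length, crc = struct.unpack(HEADER, raw[:HEADER_SIZE])
    payload = raw[HEADER_SIZE:HEADER_SIZE + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None  # torn or corrupted write
    return seq, payload


def load_latest(directory):
    """Return the intact slot with the highest sequence number, or None."""
    slots = [_read_slot(os.path.join(directory, name)) for name in SLOTS]
    valid = [s for s in slots if s is not None]
    return max(valid, key=lambda s: s[0]) if valid else None


def save(directory, payload):
    """Overwrite the older (or damaged) slot in place; return the new sequence."""
    slots = [_read_slot(os.path.join(directory, name)) for name in SLOTS]
    seqs = [s[0] if s is not None else -1 for s in slots]
    target = seqs.index(min(seqs))  # never touch the newest good copy
    new_seq = max(seqs) + 1
    path = os.path.join(directory, SLOTS[target])
    header = struct.pack(HEADER, new_seq, len(payload), zlib.crc32(payload))
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.write(header + payload)
        f.truncate()          # drop leftovers from a longer previous state
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    return new_seq
```

If the machine dies halfway through `save`, the slot being written fails its checksum on the next start and `load_latest` falls back to the other slot, so at most one checkpoint interval is lost.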
> > 1. NTFS5 should be journaling, shouldn't it?
>
> I did not know that NTFS5 journaled... But I've had bad experiences using
> NTFS as the file system for the C: drive, so I avoid it. I would redirect
> Einstein to put its files on another drive with NTFS if BOINC allowed it...
>
> > 2. The checkpoint files are separate from the client_state files. They are
> > also written according to your settings, so once a minute by default.
>
> That brings up another challenge with Windows systems that have "deleted file
> recovery"... HUNDREDS of old copies of "client_state_prev.xml"... Can you just
> reuse the alternate instead of deleting the old copy and creating a new one??
>
> > 3. The temp files can grow large - several MBs. I think for most users it's
> > not a good idea to keep and write two copies of them. Most would prefer to
> > deal with BOINC like with any other program - prevent the machine from
> > crashing, as it may trash work. At least with BOINC it's not your own work
> > you lose, just a bit of CPU time.
>
> Perhaps, but that's what started this thread. Ned
>
> > BM
>
Good news: my system crashed again, but this time Einstein recovered well and no processing time was lost (a workunit takes 13 hours on my system, so losing 6 hours of CPU would be very annoying). From this I infer that the previous failure to resume was a timing problem, with the crash occurring during some critical disk write operation.
Bad news: both crashes locked the system hard but left the HD running, forcing a power reset. Twice may be a coincidence, but this has only ever happened since Einstein was installed two days ago.
einstein 4.79
boinc 4.19
Win98SE
--
searching for gravitational waves since 2005
Quite a lot of people (literally thousands by now) are running E@H without problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It may, however, trigger other problems on your system that went unnoticed before. Frequent issues include the graphics driver (E@H makes much more use of OpenGL than most other programs) and, as it is mainly CPU-bound, overheating / cooling problems.
BM
> Quite a lot of people (literally thousands by now) are running E@H without
> problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It
> may, however, trigger other problems on your system that went unnoticed
> before. Frequent issues include the graphics driver (E@H makes much more use
> of OpenGL than most other programs) and, as it is mainly CPU-bound,
> overheating / cooling problems.
>
> BM
>
I agree totally. I have had episodes of crashing in the past which seemed to coincide with a new version of Einstein, but it turned out to be a (relatively new) graphics card that was drawing too much power. I had borrowed it from work, to play with some of its fancy new features, so I just put back the card I had before. Once I swapped that out, everything started running smoothly again. Both were nVidia cards, using the same driver version, so I can't blame the driver.
> 1. NTFS5 should be journaling, shouldn't it?
>
> 2. The checkpoint files are separate from the client_state files. They are also
> written according to your settings, so once a minute by default.
>
> 3. The temp files can grow large - several MBs. I think for most users it's
> not a good idea to keep and write two copies of them. Most would prefer to
> deal with BOINC like with any other program - prevent the machine from
> crashing, as it may trash work. At least with BOINC it's not your own work you
> lose, just a bit of CPU time.
>
> BM
>
Win98SE wouldn't be running NTFS of any variety, but FAT32 as I recall. I'm not aware of FAT32 having journaling capabilities, but please correct me if I'm wrong. Anyway, the real fix is to prevent the crashing in the first place, which removes the need to come up with workarounds for the current checkpointing scheme.
> Quite a lot of people (literally thousands by now) are running E@H without
> problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It
> may, however, trigger other problems on your system that went unnoticed
> before. Frequent issues include the graphics driver (E@H makes much more use
> of OpenGL than most other programs) and, as it is mainly CPU-bound,
> overheating / cooling problems.
Thanks. I'm investigating some possible causes, one of which may be an interaction with the Java 2 runtime environment.
Does Einstein save all the checkpoint files when a normal exit is performed? On restart it continues with almost no loss of progress; hopefully, if another crash occurs, it will resume from the last exit state.
--
searching for gravitational waves since 2005
> Does einstein save all the checkpoint files when a normal exit is performed?
Sure. That's what checkpointing is for.
BM
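As a footnote to the "crash during some critical disk write" theory raised earlier in the thread: a common way for an application to avoid a half-written checkpoint is to write the new state to a temporary file, flush it, and only then rename it over the old one. The sketch below shows that general pattern in Python. It is an illustration only and not how BOINC or the Einstein app is known to do it; `write_checkpoint`, `read_checkpoint`, and the `.ckpt-` prefix are hypothetical names. It also assumes the rename is atomic on the file system in use, which holds on POSIX systems and generally on NTFS, but is not something to count on with FAT32 under Win98SE.

```python
import os
import tempfile


def write_checkpoint(path, data: bytes):
    """Replace `path` with `data` so that readers see the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to disk first
        os.replace(tmp, path)     # swap the finished file into place
    except BaseException:
        # Clean up the partial temp file; the previous checkpoint is untouched.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def read_checkpoint(path):
    """Return the last complete checkpoint, or None if none exists yet."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Note that this pattern creates and replaces a file on every save, so it trades the "hundreds of old copies" complaint for crash safety; overwriting a fixed pair of files in place is the alternative if the recovery tool is the bigger nuisance.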