how often does einstein checkpoint workunits?

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250720461

RAC: 35768

21 Feb 2005 22:57:10 UTC

Message 3970

1. NTFS5 should be journaling, shouldn't it?

2. The checkpoint files are separate from the client_state files. They are also written according to your settings, so once a minute by default.

3. The temp files can grow large - several MBs. I think for most users it's not a good idea to keep and write two copies of them. Most would prefer to deal with BOINC as with any other program - prevent the machine from crashing, as it may trash work. At least with BOINC it's not your own work you lose, just a bit of CPU time.
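The "once a minute by default" behaviour in point 2 amounts to a time-based throttle: the application asks whether enough time has passed, writes a checkpoint if so, and then restarts the interval. The sketch below is a minimal Python illustration under stated assumptions, not the Einstein@Home or BOINC code: `CheckpointThrottle`, `run`, and the 60-second default are hypothetical names and values chosen to mirror the setting described above. BOINC's own C API has a similarly named pair of calls, `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`.

```python
import time


class CheckpointThrottle:
    """Hypothetical helper: allow a checkpoint at most every min_interval seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval  # assumed default: once a minute
        self.clock = clock
        self.last_write = clock()

    def time_to_checkpoint(self):
        # True once at least min_interval seconds have passed since the last write.
        return self.clock() - self.last_write >= self.min_interval

    def checkpoint_completed(self):
        # Restart the interval only after the checkpoint has actually been written.
        self.last_write = self.clock()


def run(work_items, throttle, save):
    """Process items, saving the running state whenever the throttle allows."""
    state = 0
    for item in work_items:
        state += item  # stand-in for one step of the real computation
        if throttle.time_to_checkpoint():
            save(state)
            throttle.checkpoint_completed()
    save(state)  # final checkpoint once the work is done
    return state
```

Whatever the interval, a crash loses at most the work done since the last completed checkpoint, which is why a shorter interval trades more disk writes for less lost CPU time.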

Ned

Joined: 22 Jan 05

Posts: 18

Credit: 24493621

RAC: 0

22 Feb 2005 4:25:03 UTC

Message 3971 in response to message 3970

> 1. NTFS5 should be journaling, shouldn't it?

Did not know that NTFS5 journaled... But I've had bad experiences with using NTFS as C: drive file system, so I avoid it. I would redirect Einstein to put its files on another drive with NTFS if BOINC allowed it...
>
> 2. The checkpoint files are separate from the client_state files. They are also
> written according to your settings, so once a minute by default.

That brings up another challenge with Windows systems that have "deleted file recovery"... HUNDREDS of old copies of "client_state_prev.xml"... Can you just reuse the alternate instead of deleting the old copy and creating a new one??
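The "reuse the alternate" idea can be sketched as two fixed checkpoint slots that are overwritten in turn, so no file is ever deleted and recreated. This is a minimal Python sketch under stated assumptions, not how BOINC actually manages `client_state_prev.xml`: the file names `state_a.dat`/`state_b.dat`, the JSON payload, and the helper names are all hypothetical. Each slot carries a sequence number and a CRC32, so a slot damaged by a crash mid-write is detected and the other slot is used instead. It should avoid the pile of deleted copies, assuming the recovery tool only keeps files that are actually deleted.

```python
import json
import os
import zlib

# Hypothetical file names for the two alternating checkpoint slots.
SLOTS = ("state_a.dat", "state_b.dat")


def _encode(seq, payload):
    body = json.dumps({"seq": seq, "data": payload}).encode("utf-8")
    # Prefix the body with its CRC32 so a torn (half-written) slot is detectable.
    return b"%08x\n" % zlib.crc32(body) + body


def _decode(raw):
    header, _, body = raw.partition(b"\n")
    try:
        if int(header, 16) != zlib.crc32(body):
            return None  # checksum mismatch: slot was damaged mid-write
        return json.loads(body.decode("utf-8"))
    except ValueError:
        return None  # unreadable header or body


def read_slots(directory):
    """Return the newest valid record, or None if neither slot is usable."""
    best = None
    for name in SLOTS:
        try:
            with open(os.path.join(directory, name), "rb") as f:
                record = _decode(f.read())
        except FileNotFoundError:
            continue
        if record is not None and (best is None or record["seq"] > best["seq"]):
            best = record
    return best


def write_slot(directory, payload):
    """Overwrite the slot NOT holding the newest valid record; never delete files."""
    current = read_slots(directory)
    seq = 0 if current is None else current["seq"] + 1
    path = os.path.join(directory, SLOTS[seq % 2])  # alternate between the slots
    with open(path, "wb") as f:
        f.write(_encode(seq, payload))
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before returning
    return seq
```

The cost is the one Bernd mentions in point 3: two copies of the state on disk at all times.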

>
> 3. The temp files can grow large - several MBs. I think for most users it's
> not a good idea to keep and write two copies of them. Most would prefer to
> deal with BOINC as with any other program - prevent the machine from
> crashing, as it may trash work. At least with BOINC it's not your own work you
> lose, just a bit of CPU time.

Perhaps, but that's what started this thread. Ned
>
> BM
>

Ol' Retired IT Geezer

cIclops

Joined: 19 Feb 05

Posts: 26

Credit: 450

RAC: 0

22 Feb 2005 5:19:47 UTC

Message 3972 in response to message 3971

> > 1. NTFS5 should be journaling, shouldn't it?
>
> Did not know that NTFS5 journaled... But I've had bad experiences with using
> NTFS as C: drive file system, so I avoid it. I would redirect Einstein to put
> its files on another drive with NTFS if BOINC allowed it...
> >
> > 2. The checkpoint files are separate from the client_state files. They are also
> > written according to your settings, so once a minute by default.
>
> That brings up another challenge with Windows systems that have "deleted file
> recovery"... HUNDREDS of old copies of "client_state_prev.xml"... Can you just
> reuse the alternate instead of deleting the old copy and creating a new one??
>
> >
> > 3. The temp files can grow large - several MBs. I think for most users it's
> > not a good idea to keep and write two copies of them. Most would prefer to
> > deal with BOINC as with any other program - prevent the machine from
> > crashing, as it may trash work. At least with BOINC it's not your own work
> > you lose, just a bit of CPU time.
>
> Perhaps, but that's what started this thread. Ned
> >
> > BM
> >
>

Good news: my system crashed again, but this time einstein recovered well and no processing time was lost (as a workunit takes 13 hours on my system, losing 6 hours of CPU would be very annoying). From this I infer that the previous failure to resume was a timing problem, with the crash occurring during some critical disk write operation.
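A crash in the middle of a write is the classic way to lose a checkpoint. A common defence is to write the new data to a temporary file, flush it to disk, and only then rename it over the old one, so a crash leaves either the complete old file or the complete new one. The sketch below is a generic Python illustration with a hypothetical `atomic_write` helper, not what einstein 4.79 actually does. How atomic the final rename is depends on the OS and file system: POSIX guarantees it, while FAT32 on Win98SE has no journal and makes no such promise.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes to path using the write-temp-then-rename pattern (sketch)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the new bytes reach the disk first
        os.replace(tmp_path, path)  # swap the new file in under the old name
    except BaseException:
        # On any failure, drop the partial temp file and keep the old checkpoint.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

On POSIX systems, calling `fsync` on the containing directory after the rename is also common practice, so that the rename itself survives a power loss.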

Bad news: both crashes locked the system hard but left the HD running, forcing a power reset. Twice may be a coincidence, but this has only ever happened since einstein was installed two days ago.

einstein 4.79
boinc 4.19
Win98SE

--
searching for gravitational waves since 2005

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250720461

RAC: 35768

22 Feb 2005 11:11:20 UTC

Message 3973

Quite a lot of people (literally thousands by now) are running E@H without problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It may, however, trigger other problems on your system that remained unnoticed before. Frequent issues include the graphics driver (E@H makes much more use of OpenGL than most other programs) and, as it is mainly CPU-bound, overheating / cooling problems.

genes

Joined: 10 Nov 04

Posts: 41

Credit: 2874299

RAC: 8957

22 Feb 2005 13:38:40 UTC

Message 3974 in response to message 3973

> Quite a lot of people (literally thousands by now) are running E@H without
> problems, so it's unlikely that BOINC/E@H causes the crashes by itself. It
> may, however, trigger other problems on your system that remained
> unnoticed before. Frequent issues include the graphics driver (E@H makes much
> more use of OpenGL than most other programs) and, as it is mainly CPU-bound,
> overheating / cooling problems.
>
> BM
>

I agree totally. I have had episodes of crashing in the past which seemed to coincide with a new version of Einstein, but the cause turned out to be a (relatively new) graphics card that was drawing too much power. I had borrowed it from work to play with some of its fancy new features, so I just put back the card I had before. Once I swapped it out, everything ran smoothly again. Both were nVidia cards using the same driver version, so I can't blame the driver.

Mikie Tim T

Joined: 22 Jan 05

Posts: 105

Credit: 263777741

RAC: 0

22 Feb 2005 15:14:23 UTC

Message 3975 in response to message 3970

> 1. NTFS5 should be journaling, shouldn't it?
>
> 2. The checkpoint files are separate from the client_state files. They are also
> written according to your settings, so once a minute by default.
>
> 3. The temp files can grow large - several MBs. I think for most users it's
> not a good idea to keep and write two copies of them. Most would prefer to
> deal with BOINC as with any other program - prevent the machine from
> crashing, as it may trash work. At least with BOINC it's not your own work you
> lose, just a bit of CPU time.
>
> BM
>

Win98SE wouldn't be running NTFS of any variety but FAT32, as I recall. I'm not aware of FAT32 having journalling capabilities, but please correct me if I'm wrong. Anyway, the real fix is to prevent the crashing in the first place, which removes the need to come up with workarounds for the current checkpointing scheme.

cIclops

Joined: 19 Feb 05

Posts: 26

Credit: 450

RAC: 0

22 Feb 2005 15:33:35 UTC

Message 3976 in response to message 3973

Thanks. I'm investigating some possible causes, one of which may be an interaction with the Java 2 runtime environment.

Does einstein save all the checkpoint files when a normal exit is performed? On restart it continues with almost no loss of progress; hopefully, if another crash occurs, it will resume from the last exit state.

--
searching for gravitational waves since 2005

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250720461

RAC: 35768

22 Feb 2005 17:21:47 UTC

Message 3977

> Does einstein save all the checkpoint files when a normal exit is performed?

Sure. That's what checkpointing is for.
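A minimal Python sketch of saving state on a normal exit, using hypothetical `Task` and `run` names (this is not Einstein@Home code): the `finally` clause writes a checkpoint however the loop ends, including a user interrupt, and `atexit` covers an ordinary interpreter shutdown elsewhere in the program. A hard lock-up or power reset like the ones reported above skips both, which is why the periodic checkpoints still matter.

```python
import atexit


class Task:
    """Hypothetical long-running task whose whole state is one integer."""

    def __init__(self, save):
        self.save = save          # callable that persists the state
        self.progress = 0
        self._last_saved = None

    def step(self):
        self.progress += 1        # stand-in for one unit of real work

    def checkpoint(self):
        # Write only if something changed since the last checkpoint.
        if self._last_saved != self.progress:
            self.save(self.progress)
            self._last_saved = self.progress


def run(task, steps, interrupt_at=None):
    """Run the task; the finally clause saves state however the loop ends."""
    try:
        for i in range(steps):
            if i == interrupt_at:
                raise KeyboardInterrupt  # simulate the user stopping the program
            task.step()
    finally:
        task.checkpoint()


if __name__ == "__main__":
    saved = []
    task = Task(saved.append)
    atexit.register(task.checkpoint)  # also covers sys.exit() elsewhere
    run(task, 5)
    print(saved)
```

Skipping unchanged state keeps the exit path cheap when a periodic checkpoint has just been written.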

Forums › Cafe Einstein
