Unrecoverable Error...

LEX LETHAL
Joined: 22 Jan 05
Posts: 4
Credit: 0
RAC: 0
Topic 187791

These are the messages I got:

1. Einstein@Home - 2005-02-20 17:42:15 - Unrecoverable error for result H1_0806.9__0807.3_0.1_T03_Test02_0 (CreateProcess() failed - The process cannot access the file because it is being used by another process. (0x20))

2. Einstein@Home - 2005-02-20 17:42:15 - CreateProcess() failed - The process cannot access the file because it is being used by another process. (0x20)

3. Einstein@Home - 2005-02-20 17:42:15 - Deferring communication with project for 1 minutes and 0 seconds

4. Einstein@Home - 2005-02-20 17:42:15 - Computation for result H1_0806.9__0807.3_0.1_T03_Test02 finished

There is only one project for E@H running. Was the project deleted? It's not under the WORK tab or the TRANSFERS tab. It looks like it's gone. Since the time of the attempted file transfer, E@H has been repeatedly asking for more work and not getting it:

5. Einstein@Home - 2005-02-20 17:45:34 - Message from server: No work available (daily quota exceeded)

LEX

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

> These are the messages I got:
>
> 1. Einstein@Home - 2005-02-20 17:42:15 - Unrecoverable error for result
> H1_0806.9__0807.3_0.1_T03_Test02_0 (CreateProcess() failed - The process
> cannot access the file because it is being used by another process. (0x20))
>
> 2. Einstein@Home - 2005-02-20 17:42:15 - CreateProcess() failed - The process
> cannot access the file because it is being used by another process. (0x20)
>
> 3. Einstein@Home - 2005-02-20 17:42:15 - Deferring communication with project
> for 1 minutes and 0 seconds
>
> 4. Einstein@Home - 2005-02-20 17:42:15 - Computation for result
> H1_0806.9__0807.3_0.1_T03_Test02 finished

Lex, this is a known problem with BOINC. The next version of the BOINC core client should incorporate a fix for it, although the problem itself is *not* well understood.

> There is only one project for E@H running. Was the project deleted? It's not
> under the WORK tab or the TRANSFERS tab. It looks like it's gone. Since the
> time of the attempted file transfer, E@H has been repeatedly asking for more
> work and not getting it:
>
> 5. Einstein@Home - 2005-02-20 17:45:34 - Message from server: No work
> available (daily quota exceeded)

Your system has (unfortunately) generated errors for all the WUs that it downloaded. Please wait a day and you'll get some more work. Hopefully this error won't recur.

Is there anything odd about your system? Are you using anti-virus software? If so, what type?

Cheers,
Bruce

Director, Einstein@Home

LEX LETHAL
Joined: 22 Jan 05
Posts: 4
Credit: 0
RAC: 0

Thanks, Bruce. Here's my system spec:

Win2k, SP4
Intel P4 1.5
Memory: 1,048,052
Available: 510,456
System Cache: 648,040
Total Virtual: 3,569,596
Available: 2,576,168
Page File: 2,521,544
Kernel Total Memory: 109,548
Paged: 75,676
Nonpaged: 33,864
Processes: 44
CPU Usage: 100%
Memory Usage: 450,020 / 2,521,544
Kerio Firewall
AVG 7.0 Pro Anti-virus
DSL Connection

My system is stable.

LEX

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 3879 in response to message 3878

> Thanks, Bruce. Here's my system spec:
>
> Win2k, SP4
> Intel P4 1.5
> Memory: 1,048,052
> Available: 510,456
> System Cache: 648,040
> Total Virtual: 3,569,596
> Available: 2,576,168
> Page File: 2,521,544
> Kernel Total Memory: 109,548
> Paged: 75,676
> Nonpaged: 33,864
> Processes: 44
> CPU Usage: 100%
> Memory Usage: 450,020 / 2,521,544
> Kerio Firewall
> AVG 7.0 Pro Anti-virus
> DSL Connection
>
> My system is stable.
>
> LEX

Is the file indexing service running? I think that's the default in Win2k. You can either stop it or change the properties for the BOINC folder so that it is not indexed - it's part of the advanced properties. When you change the option, select "applies to this folder, subfolders and files".

Undelete programs also hold on to "deleted" files while they get renamed and moved to the "undelete" bin.

If this happens regularly, get FileMon (http://www.sysinternals.com/ntw2k/source/filemon.shtml) from Sysinternals. Set it to trace file activity for the einstein project directory and the run (slots) directories by setting the filter to: "einstein*;slots*" (without the quotes) - you get all the file activity for setting up and running each WU. Also change the options to set "advanced output" and "show milliseconds".

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 3880 in response to message 3879

> Is the file indexing service running? I think that's the default in Win2k.
> You can either stop it or change the properties for the BOINC folder to not
> index it - it's part of the advanced properties. When you change the option,
> select "applies to this folder, subfolders and files".
>
> Undelete programs also hold on to "deleted" files while they get renamed and
> moved to the "undelete" bin.
>
> If this happens regularly, get FileMon
> (http://www.sysinternals.com/ntw2k/source/filemon.shtml) from Sysinternals.
> Set it to trace file activity for the einstein project and run directories
> by setting the filter to: "einstein*;slots*" (without the quotes) - you get
> all the file activity for setting up and running each WU. And change the
> options to set "advanced output" and "show milliseconds".

Walter, would the file indexing service explain these CreateProcess() failures? Our guess was that some virus-scanning program had locked the executable or otherwise made it (at least temporarily) unusable. Our solution was just to retry CreateProcess() a few times, with a short random sleep in between. If you can provide some theory or explanation for the CreateProcess() failures, that would be very helpful.
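
For readers following along, the retry idea described above might look roughly like the sketch below. It is written in Python rather than in the language of the BOINC core client, and the function name `retry_with_random_sleep`, its parameters and its default values are illustrative assumptions, not the actual client code. The `action` argument stands in for whatever call launches the process.

```python
import random
import time


def retry_with_random_sleep(action, attempts=5, max_sleep=0.5, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    action    -- hypothetical callable that raises OSError on failure
                 (for example, a wrapper around a process launch)
    attempts  -- assumed upper bound on the number of tries
    max_sleep -- assumed upper bound, in seconds, for each random pause
    sleep     -- injectable so the pauses can be skipped in tests
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except OSError as err:
            last_error = err
            # Pause for a random interval before the next try, so that
            # whatever is holding the file has a chance to release it.
            if attempt < attempts - 1:
                sleep(random.uniform(0.0, max_sleep))
    # Every attempt failed: report the last error to the caller.
    raise last_error
```

On Windows, something like `retry_with_random_sleep(lambda: subprocess.Popen([exe_path]))` would exercise this, since `subprocess.Popen` raises an `OSError` when the underlying process creation fails; `exe_path` here is a placeholder.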

Bruce

Director, Einstein@Home

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 3881 in response to message 3880

> Walter, would the 'file indexing service running' explain these
> 'CreateProcess()' failures? Our guess was that this was some virus scanning
> program that had locked the executable or otherwise made it (at least
> temporarily) unusable. Our solution was just to retry CreateProcess() a few
> times with a short random sleep in between. If you can provide some
> theory or explanation for the CreateProcess() failures that would be very
> helpful.
>
> Bruce

It might. That's why I suggested running FileMon - the trace shows who does what to each file. It's not the indexing service by itself; it appears to be the indexing service along with something else that also intercepts filesystem calls.

I looked into a similar problem with client_state.xml, where writing a new one didn't work because the old one was still there. Even though a delete file call was made, the file wasn't actually deleted until a few milliseconds later. More detail is in one of the BOINC forums for this problem: Couldn't Write State file: -109 (http://setiweb.ssl.berkeley.edu/forum_thread.php?id=10873).

In the case of client_state.xml, the following happens - from the program's view:

-write client state to client_state_next.xml
-delete client_state_prev.xml
-rename client_state.xml to client_state_prev.xml
-rename client_state_next.xml to client_state.xml

With the indexing service active, trace showed:

-Program wrote client_state_next.xml
-Program deleted client_state_prev.xml
-System intercepted the call and returned success to the program, but the file was not deleted at this time.
-Program renamed client_state.xml to client_state_prev.xml. This failed with a "new name exists" error or something like that.
-System finished deleting the client_state_prev.xml file.

With the indexing service inactive, the trace showed what was expected - the call to "delete file" completed before the rename occurred.
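
To make the sequence concrete, here is a rough Python sketch of the same four steps, with a short retry around the delete and rename calls so that a delete which has been reported as done, but not yet carried out, gets a few milliseconds to finish. The helper `_retry`, the function `rotate_state_file` and the retry counts are assumptions made for illustration; this is not how the BOINC client itself is written.

```python
import os
import time


def _retry(operation, attempts=5, delay=0.05):
    """Run operation(), retrying briefly if the file system reports an error."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def rotate_state_file(directory, new_contents):
    """Write a new client_state.xml, keeping the old one as client_state_prev.xml."""
    next_path = os.path.join(directory, "client_state_next.xml")
    current_path = os.path.join(directory, "client_state.xml")
    prev_path = os.path.join(directory, "client_state_prev.xml")

    # 1. Write the new state to client_state_next.xml.
    with open(next_path, "w") as f:
        f.write(new_contents)

    # 2. Delete client_state_prev.xml, if there is one.
    if os.path.exists(prev_path):
        _retry(lambda: os.remove(prev_path))

    # 3. Rename client_state.xml to client_state_prev.xml. On Windows,
    #    os.rename fails while the destination still exists, which is the
    #    failure seen in the trace; retrying gives a pending delete time
    #    to complete.
    if os.path.exists(current_path):
        _retry(lambda: os.rename(current_path, prev_path))

    # 4. Rename client_state_next.xml to client_state.xml.
    _retry(lambda: os.rename(next_path, current_path))
```

Whether a retry like this is enough depends on how long the intercepting process holds on to the file, which is what the traces are meant to show.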

And by "system", I don't mean Windows doing system-level calls on BOINC's behalf. I mean that the call is intercepted by another process - system process ID 4 - and new filesystem operations are performed on that file before it completes the "delete". On my system the intercepted calls are still performed properly, but I don't have virus scanners intercepting everything either. It's apparent from the differences in the two traces (mine and the one with the problem) that the calls are intercepted twice - once by the indexing service and most likely the second time by the virus scanner.

Suggestion for getting traces and interpreting them:

Get FileMon from the Sysinternals site, install it and set filtering as in my previous message. When you get the problem, save the FileMon trace and make a note of the timestamps for the problem.

Disable the indexing service and the virus scanner. Or run the trace on another system that doesn't have any of those services running, and no undelete, auto backup, ad blockers or anything like that. Run FileMon again to see what "normal" file operations look like.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 3882 in response to message 3881

> > Walter, would the 'file indexing service running' explain these
> > 'CreateProcess()' failures? Our guess was that this was some virus
> > scanning program that had locked the executable or otherwise made it (at
> > least temporarily) unusable. Our solution was just to retry
> > CreateProcess() a few times with a short random sleep in between. If you
> > can provide some theory or explanation for the CreateProcess() failures
> > that would be very helpful.
> >
> > Bruce
>
> It might. That's why I suggested running FileMon - the trace shows who does
> what to each file. It's not the indexing service by itself, it appears to be
> the indexing service along with something else that also intercepts filesystem
> calls.
>
> I looked into a similar problem with client_state.xml, where writing a new one
> didn't work because the old one was still there. Even though a delete file
> call was made, the file wasn't actually deleted until a few milliseconds
> later. More detail is in one of the BOINC forums for this problem: Couldn't
> Write State file: -109
> (http://setiweb.ssl.berkeley.edu/forum_thread.php?id=10873).
>
> In the case of client_state.xml, the following happens - from the program's
> view:
>
> -write client state to client_state_next.xml
> -delete client_state_prev.xml
> -rename client_state.xml to client_state_prev.xml
> -rename client_state_next.xml to client_state.xml
>
> With the indexing service active, trace showed:
>
> -Program wrote client_state_next.xml
> -Program deleted client_state_prev.xml
> -System intercepted the call and returned success to the program, but the
> file was not deleted at this time.
> -Program renamed client_state.xml to client_state_prev.xml. This failed with
> a "new name exists" error or something like that.
> -System finished deleting the client_state_prev.xml file.
>
> With the indexing service inactive, the trace showed what was expected - the
> call to "delete file" completed before the rename occurred.
>
> And by "system", I don't mean Windows doing system-level calls on BOINC's
> behalf. I mean that the call is intercepted by another process - system
> process ID 4 - and new filesystem operations are performed on that file
> before it completes the "delete". On my system the intercepted calls are
> still performed properly, but I don't have virus scanners intercepting
> everything either. It's apparent from the differences in the two traces (mine
> and the one with the problem) that the calls are intercepted twice - once by
> the indexing service and most likely the second time by the virus scanner.
>
> Suggestion for getting traces and interpreting them:
>
> Get FileMon from the Sysinternals site, install it and set filtering as in
> my previous message. When you get the problem, save the FileMon trace and
> make a note of the timestamps for the problem.
>
> Disable the indexing service and the virus scanner. Or run the trace on
> another system that doesn't have any of those services running, and no
> undelete, auto backup, ad blockers or anything like that. Run FileMon again
> to see what "normal" file operations look like.

Walter, thank you for the suggestions. I am hoping that we can reproduce these problems or alternatively have a user who sees these errors do some detective work as you describe.

Cheers,
Bruce

Director, Einstein@Home
