High frequency of client errors

David Worton
Joined: 22 Feb 05
Posts: 20
Credit: 45824
RAC: 0
Topic 188177

I've currently processed eleven complete work units, and all but the first one are shown as "client errors" with no credit awarded. I'm using BOINC 4.24 and splitting time with all the other open BOINC projects. I'm beginning to wonder if there's some problem with my setup for Einstein@home. Am I just very unlucky, or is there a deeper problem? What is likely to cause the "client errors", and is there anything I can do about it?

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Have you tried using BOINC CC 4.19 to see whether that client also produces errors? If it doesn't, the problem may well be in 4.24.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

> I've currently processed eleven complete work units and of these all but the
> first one are shown as "client errors" with no credit awarded. I'm using BOINC
> 4.24 and I'm splitting time with all the other open BOINC projects. I'm
> beginning to wonder if there's some problem with my set up for Einstein@home.
> Am I just very unlucky or is there a deeper problem? What is likely to cause
> the "client errors" and is there anything I can do about it?

If you look at the results on the web, you'll see that stderr_out shows repeated access violations at a particular memory address. That address is not within the Einstein@Home application, so the problem may be a misbehaving graphics driver, bad memory in your system, or some similar system problem. Any idea what was different for the successful workunit?

Cheers,
Bruce

Director, Einstein@Home

Carl Johansson
Joined: 24 Feb 05
Posts: 6
Credit: 27501
RAC: 0

You are using version 4.24 of BOINC, which is a beta version. That may be the cause of your problem. Have you tried the recommended version, 4.19, to see if you get better results? Are you using any special hardware, e.g. overclocked processors, unusual memory components, or anything else that might be incompatible with the software? Do you have any suggestions yourself about what might cause the problem?

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 6521 in response to message 6519

> If you look at the results on the web, you'll see that stderr_out is showing
> repeated access violations at a particular address in memory. This is not an
> address within the Einstein@Home application. So it may be related to a
> graphics driver misbehaving, bad memory in your system, or some other similar
> system problem. Any idea what was different for the successful workunit?
>
> Cheers,
> Bruce
>

David, what's needed is the module at 0x77F580DB. On my system that's ntdll.dll, but it might be different on yours.

Grab Process Explorer from the Sysinternals web site and run it while Einstein@home is running.

In the top window, click the einstein_4.79_windows_intelx86.exe line and check the DLLs shown in the bottom window. If it doesn't show DLLs, click "view", "lower pane view" and "DLLs" so it does.

Sort the DLLs in base address order by clicking the "Base" heading, and look through the list for the DLL that starts immediately before 0x77F580DB.

Reply back here with the details on that line: the name, description, company name, version, base and size. If you don't see all those columns, right click the header, click "select columns", switch to "DLL" and add the missing columns.

Walt

David Worton
Joined: 22 Feb 05
Posts: 20
Credit: 45824
RAC: 0

Thanks for all that information. I suspect that BOINC 4.24 may be the guilty party. I moved to that version of BOINC on Feb 24th, just after processing my 1st successful work unit, so the circumstantial evidence against it looks strong. Perhaps I've jumped onto the beta too early. For what it may be worth to anyone, I've downloaded Process Explorer as suggested. It's a neat utility, by the way; I wish I'd known about it a long time ago!

I used the suspend project facility of BOINC 4.24 to halt all my other BOINC projects and force Einstein@Home to run.

It appears that ntdll.dll is the DLL immediately below the exception address (it starts at 0x77F50000).

The following is extracted from the Process Explorer saved log, with all DLL columns selected:

ntdll.dll
NT Layer DLL
Microsoft Corporation
5.01.2600.1217
0x77F50000
0xA7000
01/05/2003 23:56
C:\WINDOWS\system32\ntdll.dll
0x77F50000
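Walt's procedure (sort by Base, take the last DLL that starts at or before the fault address, then check that the address is still inside it) is just a range lookup. A minimal Python sketch, assuming you've copied each module's name, base and size out of Process Explorer by hand; `Module` and `find_module` are hypothetical helpers, not part of Process Explorer or BOINC. With the ntdll.dll values above (base 0x77F50000, size 0xA7000, so it spans up to 0x77FF7000), it confirms that 0x77F580DB falls inside ntdll.dll, 0x80DB bytes past its base.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple

class Module(NamedTuple):
    name: str
    base: int   # "Base" column from Process Explorer
    size: int   # "Size" column from Process Explorer

def find_module(modules, address: int) -> Optional[Tuple[Module, int]]:
    """Return (module, offset) if address lies in [base, base + size), else None."""
    ordered = sorted(modules, key=lambda m: m.base)  # same as sorting on "Base"
    bases = [m.base for m in ordered]
    i = bisect_right(bases, address) - 1   # last module starting at or before address
    if i < 0:
        return None                        # address is below every module
    mod = ordered[i]
    if address < mod.base + mod.size:
        return mod, address - mod.base
    return None                            # address falls in a gap after that module

# Only the one module reported in this thread.
modules = [Module("ntdll.dll", 0x77F50000, 0xA7000)]
hit = find_module(modules, 0x77F580DB)
if hit:
    mod, offset = hit
    print(f"{mod.name}+0x{offset:X}")      # prints ntdll.dll+0x80DB
```

The size check matters: the DLL that starts "right before" the address only owns it if the address is below base + size.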

Not sure where to go from here but maybe it should be BOINC 4.19 for a while...

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 6523 in response to message 6522

> Thanks for all that information. I suspect that BOINC 4.24 may be the guilty
> party. I moved on to that version of BOINC on Feb 24th just after processing
> my 1st successful work unit so the circumstantial evidence looks strong
> against it. Probably I've jumped on to the Beta too early. For what it may be
> worth to anyone by way of information I've downloaded Process Explorer as
> suggested. This is a neat utility by the way, I wish I'd known about it a long
> time ago!
>
> I used the suspend project facility of BOINC 4.24 to halt all my other BOINC
> projects and force Einstein@Home to run.
>
> It appears that ntdll.dll is the DLL immediately below the exception (starts
> at 0x77F50000)
>
> The following is extracted from the Process Explorer saved log with all DLL
> column information selected :-
>
> ntdll.dll
> NT Layer DLL
> Microsoft Corporation
> 5.01.2600.1217
> 0x77F50000
> 0xA7000
> 01/05/2003 23:56
> C:\WINDOWS\system32\ntdll.dll
> 0x77F50000
>
> Not sure where to go from here but maybe it should be BOINC 4.19 for a
> while...

Thanks for the info, it helps. Also, it's the Einstein@home application that gets the error, not BOINC; BOINC just reports it.

It would be great if the Einstein@home application ran the stackwalker automatically, as it would trace the calls from the application up through the bad call into ntdll.dll (the app doesn't call ntdll.dll directly). It should be part of the exception handling, but for some reason it's turned off. The app does intercept the 0xc0000005 exception, as you can see from the error message in the result output.
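For reference, 0xc0000005 is a Windows NTSTATUS value, STATUS_ACCESS_VIOLATION. A minimal Python sketch that splits such a value into the NTSTATUS bit fields (2-bit severity, customer bit, one reserved bit, 12-bit facility, 16-bit code); `decode_ntstatus` is a hypothetical helper, and its name table holds only the one code seen in this thread.

```python
# Severity values for bits 31-30 of an NTSTATUS code.
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}
# Only the code reported in this thread; not a complete table.
KNOWN = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

def decode_ntstatus(status: int) -> dict:
    status &= 0xFFFFFFFF                               # treat as unsigned 32-bit
    return {
        "severity": SEVERITY[(status >> 30) & 0x3],    # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29: customer-defined code
        "facility": (status >> 16) & 0xFFF,            # bits 27-16 (bit 28 reserved)
        "code": status & 0xFFFF,                       # bits 15-0
        "name": KNOWN.get(status, "unknown"),
    }

print(decode_ntstatus(0xC0000005))
```

Decoding it this way shows an error-severity, Microsoft-defined code (customer bit clear), which is why it is reported as a crash rather than a warning.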

Maybe the developers can reply with a method for turning stackwalker back on.

Meanwhile I'll look up the NTDLL routine; it'll take a while, as I don't have an SP1 machine handy.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 6524 in response to message 6523

> > Thanks for all that information. I suspect that BOINC 4.24 may be the guilty
> > party. I moved on to that version of BOINC on Feb 24th just after processing
> > my 1st successful work unit so the circumstantial evidence looks strong
> > against it. Probably I've jumped on to the Beta too early. For what it may be
> > worth to anyone by way of information I've downloaded Process Explorer as
> > suggested. This is a neat utility by the way, I wish I'd known about it a long
> > time ago!
> >
> > I used the suspend project facility of BOINC 4.24 to halt all my other BOINC
> > projects and force Einstein@Home to run.
> >
> > It appears that ntdll.dll is the DLL immediately below the exception (starts
> > at 0x77F50000)
> >
> > The following is extracted from the Process Explorer saved log with all DLL
> > column information selected :-
> >
> > ntdll.dll
> > NT Layer DLL
> > Microsoft Corporation
> > 5.01.2600.1217
> > 0x77F50000
> > 0xA7000
> > 01/05/2003 23:56
> > C:\WINDOWS\system32\ntdll.dll
> > 0x77F50000
> >
> > Not sure where to go from here but maybe it should be BOINC 4.19 for a
> > while...
>
> Thanks for the info, it helps. Also, its the Einstein@home application that
> gets the error, not BOINC. BOINC just reports it.
>
> It would be great if the Einstein@home application ran the stackwalker
> automatically, as it would trace the calls from the application up thru the
> bad call to ntdll.dll (the app doesn't make any calls to ntdll.dll directly).
> It should be part of the handling exception, but for some reason its turned
> off. It does intercept the 0xc0000005 exception, as you see the error
> message in the result output.
>
> Maybe the developers can reply with a method for turning stackwalker back on.

Walt, we DO call stackwalker! And we distribute the .pdb file with the app as well. So I'm confused why we don't have a stack trace. Any ideas?

Cheers,
Bruce

Director, Einstein@Home

David Worton
Joined: 22 Feb 05
Posts: 20
Credit: 45824
RAC: 0

It sounds as if BOINC isn't at fault, then (especially since, on examining the log for the 1st and only successful unit I've processed, I see that I processed that one with 4.24 too). So I might as well stick with 4.24.

I've looked at the earlier error logs and notice that at least one of them contains the message:

"0: Stackwalker not initialized (or was not able to initialize)!"

This doesn't occur on all of the logs.

The machine isn't overclocked. It's an "out of the box" corporate HP/Compaq machine without any particularly unusual hardware (e.g. graphics cards) that I'm aware of.

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Message 6526 in response to message 6525

> It sounds as if BOINC isn't at fault, then (and especially as on examining the
> log for the 1st and only successful unit I've processed I see that it reports
> that I did process that with 4.24 too). So I might as well stick with 4.24.
>
> I've looked at the earlier error logs and I notice that at least one of them
> has the message:-
>
> "0: Stackwalker not initialized (or was not able to initialize)!"
>
> This doesn't occur on all of the logs.
>
> The machine isn't overclocked. It's an "out of the box" corporate hp/compaq
> piece of kit without any particularly unusual hardware (i.e. graphics cards
> etc..) that I'm aware of.
>

David, what you can do for now is turn graphics off: make sure the screensaver is not set to BOINC (blank is fine), and don't use "show graphics".

Try that and see if it works.

Looking at your results, one WU completed without error: the bottom one in the list. Did you change anything on your system around then? Install or update software, add another project to BOINC (other than Einstein@home), change users?

And if you check BOINC's log, does it show anything like other applications running, or switching to or from another app, around the time the Einstein app fails?

David Worton
Joined: 22 Feb 05
Posts: 20
Credit: 45824
RAC: 0

OK, thanks Walt. I've stopped the BOINC screensaver and won't use the graphics. I can't recall installing anything around the time of the first failure, but I may have taken a Windows update. I do this fairly frequently (although I haven't moved to SP2 of XP on this machine, as company policy is not to upgrade to SP2 at the moment).

I've been connected to other BOINC projects for a while (SETI, CPDN, Predictor and LHC). Einstein@Home was the last one I added, and apart from upgrading the BOINC manager and connecting to Einstein@Home, I didn't change settings or users on any of those projects.

Unfortunately, because I didn't spot this problem for a while, I don't have much message log information. The only example I have of an unrecoverable error logged for Einstein@Home follows a message that the result is being paused and then one that protein predictor is starting. There is also a log message of protein predictor requesting work within the same second that the failure occurs. This may be significant or just a coincidence. For now I'll let things run as suggested, but pay closer attention to the log and see whether a pattern emerges from future failures, should they occur.