Silent Client Errors -- lost over 100 credits -- Now what?

Mike
Joined: 12 Jan 08
Posts: 6
Credit: 1293131
RAC: 0
Topic 194781

Looking at my stats, I noticed I was not receiving credits for work performed. Digging deeper on the website, I found a report showing "client errors" with no further information.

Since this discovery I have begun reading the transfer messages in the BOINC Manager log. There is no indication of a 'client error' in those logs. The only indication appears buried in the website report, which I have copied below.

I use BOINC Manager 6.10.18 (wxWidgets version 2.8.10) running under Windows XP SP3.

When this happened previously, I detached from the project, reloaded BOINC and reattached to the project, losing even more credits.

As you might imagine, it is quite frustrating not only to perform work and not receive credits, but also to get no warning and no hint about how to resolve the problem.

I participate in two other projects and this situation has never occurred in those projects. Perhaps I would be better off just dedicating all of my resources to those projects.

Thanks for reading.

Mike B

Copy of the website report:

Task ID | Work unit ID | Sent | Time reported or deadline | Server state | Outcome | Client state | CPU time (sec) | Claimed credit | Granted credit
162379406 | 67827956 | 18 Feb 2010 4:16:08 UTC | 4 Mar 2010 4:16:08 UTC | In progress | --- | New | --- | --- | ---
162378918 | 67827950 | 18 Feb 2010 4:16:08 UTC | 18 Feb 2010 16:01:10 UTC | Over | Client error | Compute error | 3,090.91 | 2.35 | ---
162150111 | 66181163 | 14 Feb 2010 21:01:06 UTC | 16 Feb 2010 16:25:43 UTC | Over | Client error | Compute error | 2,003.86 | 1.34 | ---
162097308 | 67650411 | 14 Feb 2010 21:01:06 UTC | 18 Feb 2010 4:16:08 UTC | Over | Client error | Compute error | 37,933.94 | 42.11 | ---
161777865 | 67389843 | 12 Feb 2010 11:16:55 UTC | 14 Feb 2010 21:01:05 UTC | Over | Client error | Compute error | 608.84 | 0.39 | ---
161767284 | 67382606 | 12 Feb 2010 11:16:55 UTC | 14 Feb 2010 16:41:28 UTC | Over | Client error | Compute error | 16,539.82 | 14.60 | ---
161456000 | 68663728 | 10 Feb 2010 9:09:52 UTC | 12 Feb 2010 20:11:48 UTC | Over | Client error | Compute error | 2,444.09 | 4.11 | ---
161332650 | 68607976 | 9 Feb 2010 20:09:44 UTC | 12 Feb 2010 11:16:55 UTC | Over | Client error | Compute error | 3,936.00 | 3.22 | ---
161332637 | 68607969 | 9 Feb 2010 20:09:44 UTC | 11 Feb 2010 15:42:02 UTC | Over | Client error | Compute error | 21,844.01 | 17.96 | ---
160597340 | 68280174 | 6 Feb 2010 10:35:52 UTC | 9 Feb 2010 20:09:44 UTC | Over | Client error | Compute error | 17,518.73 | 11.13 | ---
159715396 | 67875837 | 2 Feb 2010 15:21:08 UTC | 4 Feb 2010 5:24:11 UTC | Over | Client error | Compute error | 3,309.48 | 2.08 | ---
159715391 | 67875835 | 2 Feb 2010 15:21:08 UTC | 4 Feb 2010 5:19:57 UTC | Over | Client error | Compute error | 4,642.84 | 3.04 | ---
159491935 | 67773020 | 2 Feb 2010 3:41:58 UTC | 2 Feb 2010 15:21:06 UTC | Over | Client error | Compute error | 14,853.92 | 11.32 | ---
157492290 | 66848040 | 24 Jan 2010 9:12:46 UTC | 24 Jan 2010 17:20:44 UTC | Over | Client error | Compute error | 3,256.88 | 2.15 | ---
155388630 | 65871930 | 14 Jan 2010 2:21:00 UTC | 14 Jan 2010 11:59:48 UTC | Over | Client error | Compute error | 1,435.83 | 0.83 | ---
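
As a rough check on the "over 100 credits" figure, the claimed credit of the errored tasks in the report can be totalled. A minimal Python sketch, with the values copied by hand from the claimed credit column (the variable names are only for illustration):

```python
from decimal import Decimal

# Claimed credit for each task that ended in "Client error / Compute error",
# copied from the report (newest first).
claimed_credit = [
    "2.35", "1.34", "42.11", "0.39", "14.60", "4.11", "3.22",
    "17.96", "11.13", "2.08", "3.04", "11.32", "2.15", "0.83",
]

# Decimal avoids binary floating-point rounding noise in the total.
total_lost = sum(Decimal(value) for value in claimed_credit)

print(f"{len(claimed_credit)} errored tasks, {total_lost} credits claimed but not granted")
```

That comes to 116.63 claimed credits across 14 tasks, none of which were granted.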

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Hi Mike,

Please unhide your computers or tell us your computer ID, so we can have a look.

Or click on one of the task-id links and post the messages there.

Michael

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

The error Mike is getting is:

06:01:46 (1624): Can't acquire lockfile (32) - waiting 35s
06:02:22 (1624): Can't acquire lockfile (32) - exiting
06:02:22 (1624): Error: The process cannot access the file because it is being used by another process. (0x20)

and

Exit status -226 (0xffffffffffffff1e)

Here's a link to the affected machine.
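
For anyone puzzled by the numbers in those messages: the 32 in "Can't acquire lockfile (32)" and the 0x20 at the end of the error line appear to be the same Windows error code (32 decimal is 0x20 hex, the "file in use by another process" sharing violation), and the long hex value after the exit status is just -226 written as an unsigned 64-bit number. A small Python sketch of that arithmetic (the helper name is made up for illustration):

```python
def as_unsigned_64(value: int) -> str:
    """Format a signed integer as its 64-bit two's-complement hex representation."""
    return hex(value & 0xFFFFFFFFFFFFFFFF)

# The lockfile error number from the log, in decimal.
lockfile_error = 32
print(hex(lockfile_error))

# The task's exit status as shown on the website.
exit_status = -226
print(as_unsigned_64(exit_status))
```

The two printed values match the "(0x20)" and "(0xffffffffffffff1e)" shown in the task output.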

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

@Holmis: How did you do that?

@Mike: Are you running an anti-virus tool? If so, try excluding the BOINC directory from its scans.

HTH

Michael

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Message 97106 in response to message 97105

Quote:

@Holmis: How did you do that?

HTH

Michael

Copied one of the task-IDs in the leftmost column in the initial post and went to my own account -> tasks -> clicked on a task-ID and finally replaced the number at the end of the URL.
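
In other words, it is just a substitution of the trailing number in the task page address. A minimal Python sketch of the same trick; the URL below is a placeholder, not the project's real address, and the function name is invented for illustration:

```python
import re

def swap_trailing_id(url: str, new_id: int) -> str:
    """Replace the run of digits at the very end of a URL with new_id."""
    if not re.search(r"\d+$", url):
        raise ValueError("URL does not end in a number")
    return re.sub(r"\d+$", str(new_id), url)

# Hypothetical task page URL; the real site's address and parameter name may differ.
my_task_url = "https://example.org/task.php?taskid=123456789"

# 162378918 is one of the task IDs from the initial post.
print(swap_trailing_id(my_task_url, 162378918))
```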

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Message 97107 in response to message 97106

Quote:
Quote:

@Holmis: How did you do that?

HTH

Michael

Copied one of the task-IDs in the leftmost column in the initial post and went to my own account -> tasks -> clicked on a task-ID and finally copied in the number in the URL.

D'oh, that was easy enough.

Michael

Holmis
Joined: 4 Jan 05
Posts: 1118
Credit: 1055935564
RAC: 0

Message 97108 in response to message 97107

Quote:

D'oh, that was easy enough.

Michael

You're welcome! =)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4305
Credit: 248473691
RAC: 29784

Throttling enabled? Please try that (even if no throttling). We've been hunting this "Can't acquire lockfile" problem for quite a while now.

BM


Mike
Joined: 12 Jan 08
Posts: 6
Credit: 1293131
RAC: 0

Thanks to each of you that replied to my post.

To Michael Karlinsky: I am not sure how I would un-hide my computer. However, it looks like that is not necessary since Holmis figured out how to access my records.

Regarding anti-virus tools: Yes, I use Trend-Micro Internet Security (version 17.1.1365). I allow BOINC full access (specifically boinc.exe, boinccmd and boincmgr). Also, I do not encounter this problem with Rosetta@home and SETI@home.


To Bernd Machenschalk: Yes, I have some processing restrictions. I allow processing only after the computer is idle for 10 minutes and I only allow at most 70% of CPU time (otherwise CPU temp rises to over 50C). I switch between applications every 60 minutes. I rarely power-down so most processing occurs overnight.

I have downloaded the windows application you pointed to and will follow the instructions. At the moment, it is not clear to me what is meant by "please report Tasks finished with this Application (in particular in case of a Client Error) to the server manually." However maybe this will become clearer when I actually run the application.

I will post back here if I have any problems or if this leads to a solution.

Again, thanks for your help.

Mike Behar

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5870
Credit: 115486988890
RAC: 33757652

Message 97111 in response to message 97110

Quote:
I am not sure how I would un-hide my computer.


You go to your account page and click on the link for your Einstein preferences. You change it there. The default is for your computers to be shown. This is a good thing if you ever need help from anybody. You must have chosen to hide them at some point. There are no security concerns with showing your hosts on the website as only non-sensitive (but very useful to those trying to help) information is visible. In the end, the choice is yours.

Quote:
Yes, I use Trend-Micro Internet Security (version 17.1.1365).


The problem you have has apparently been caused by some security suites under some conditions. I'm not sure about your particular one. In the light of your next answer, I would think the 70% CPU use might be the more likely cause, though.

Quote:
I allow processing only after the computer is idle for 10 minutes and I only allow at most 70% of CPU time (otherwise CPU temp rises to over 50C).


The science applications which consume the bulk of your CPU cycles run at the lowest priority. In general, they are very good at releasing the CPU when it's needed for something else. As an example, I regularly burn CDs on an old machine (Athlon XP 2000+) and that machine runs E@H full bore. Many years ago, I used to stop BOINC when I wanted to burn a CD. For at least the last 5 years I haven't bothered and I've never had a problem with a bad burn. Of course you may well be doing things that are impacted by BOINC running in the background but please realise that BOINC's idea of an active computer is all based on keyboard and mouse activity. In other words BOINC won't run when you are surfing, word-processing, emailing, etc., but it potentially will run if you have a compute intensive task that doesn't require frequent keyboard/mouse activity. The normal 'office' type tasks that do make extensive use of keyboard/mouse are likely to be the tasks that 'waste' CPU cycles and therefore least need protection. By forcing BOINC to wait for 10 minutes after activity, you are probably guaranteeing that nothing much gets done when you are even only occasionally using your machine. There are now ways to stop BOINC when specified programs are running which may be a better option than just general keyboard/mouse activity.

Your preference choice to limit your CPU use to 70% is probably what is causing all your tasks to error out. As Bernd said, this is something they have been chasing for a while and the exact cause (or combination of causes) is not known. Your willingness to help with this is really appreciated. However it means that you are likely to suffer more of these same errors until something is resolved.

You may be being a little too strict in trying to limit temps to less than 50C. In my experience even 60C or so is quite OK. I have a machine (Northwood P4) that has operated continuously at 60C or above for the last 6 years without apparent heat-related trouble. I did replace swollen caps on the motherboard and also a dry fan in the PSU, but after these repairs the machine runs as well as ever. The other thing to consider is that BOINC's throttle function doesn't run the CPU at the % value set. Instead, it switches the load on and off: with your 70% setting, in each 10 second period your CPU runs at 100% for a total of 7 seconds and at 0% for a total of 3 seconds. Personally I worry that the thermal cycling caused by continuously ramping up and down (0% - 100% - 0% ...) might have undesirable consequences as well.
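
To put numbers on the on/off behaviour described here, this is a small Python sketch of the split for a given CPU limit. It assumes the simple model in the paragraph above (a fixed 10 second window that is either fully busy or fully idle); the function name is invented, and the real client's timing may be finer-grained than this.

```python
def throttle_split(cpu_limit_percent: float, period_seconds: float = 10.0):
    """Return (seconds at full load, seconds idle) per period for an on/off throttle."""
    if not 0 <= cpu_limit_percent <= 100:
        raise ValueError("cpu_limit_percent must be between 0 and 100")
    busy = period_seconds * cpu_limit_percent / 100.0
    return busy, period_seconds - busy

# The 70% preference discussed in this thread.
busy, idle = throttle_split(70)
print(f"{busy:.0f} s at 100%, {idle:.0f} s at 0% in every 10 s")
```

The point is that the CPU never actually sits at 70% load; it alternates between full load and idle.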

Quote:
I have downloaded the windows application you pointed to and will follow the instructions.


This 'Windows application' is just an alpha test version of BOINC that has some new features. To install it (after you have downloaded it), completely stop your current BOINC, install the new version 'over the top', construct (with a text editor like Notepad) the cc_config.xml file with the given contents, make sure it is saved into your BOINC Data folder with the precise name given, and then restart BOINC. The new version of BOINC will fire up, read the contents of cc_config.xml and enable the new feature required. The filename (and location) must be precise for BOINC to 'see' the file. The name stands for 'Core Client Configuration' and you can read more about this extra configuration here, in the BOINC Wiki. The new BOINC will continue on with whatever tasks were in progress at the time the old version was stopped.
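
Since the exact filename matters, here is a minimal Python sketch for checking that the file landed under the right name and is at least well-formed XML. The function name is invented for illustration, and the check says nothing about whether the contents are the ones from the linked instructions. A common pitfall it looks for is Notepad saving the file as cc_config.xml.txt.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def check_cc_config(data_dir: str) -> str:
    """Report whether cc_config.xml exists under the exact name and parses as XML."""
    folder = Path(data_dir)
    names = [p.name for p in folder.iterdir()]  # names exactly as stored on disk
    if "cc_config.xml" not in names:
        # Notepad may append .txt when the file is saved as a text document.
        if "cc_config.xml.txt" in names:
            return "found cc_config.xml.txt: rename it to cc_config.xml"
        return "cc_config.xml not found in this folder"
    try:
        root = ET.parse(folder / "cc_config.xml").getroot()
    except ET.ParseError as err:
        return f"cc_config.xml is not well-formed XML: {err}"
    return f"cc_config.xml found, root element <{root.tag}>"
```

Call it with the path of your own BOINC Data folder and print the result; the location of that folder depends on how BOINC was installed.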

Quote:
At the moment, it is not clear to me what is meant by "please report Tasks finished with this Application (in particular in case of a Client Error) to the server manually." However maybe this will become clearer when I actually run the application.


When the new BOINC is running, things should go pretty much as before - tasks should still error out. When you see one of these, it may sit on your machine for a while before it actually gets reported. You can speed things up by going to the 'Projects' tab of BOINC Manager and highlighting the E@H project. The 'Update' button will then become 'live' and you can click it to return the error result to the project immediately. This is called 'manual reporting'. If you get to this stage of actually reporting a client error result, post a note here and I'm sure there will be great activity to go examine the details :-).

The other thing that was mentioned in the linked message that Bernd provided, was the stopping of BOINC after an error had occurred to see if there were any residual E@H app instances still running after BOINC had stopped. Normally there wouldn't be but if there are it might help in the diagnosis if you also post that fact.

Please ask if anything in the above is not crystal clear and thanks once again for your willingness to assist.

Cheers,
Gary.

nils
Joined: 6 Sep 09
Posts: 2
Credit: 68489
RAC: 0

Message 97112 in response to message 97110

Hello Mike.

Thanks for reporting this. To debug this problem further, I kindly ask you to follow the instructions in the post mentioned by Bernd Machenschalk. Your help would really be appreciated.

Thank you in advance.
