Client error on Results

Joe
Joined: 14 Oct 05
Posts: 3
Credit: 763292
RAC: 0
Topic 191104

Yesterday I noticed I was getting a lot of client errors on my WUs.

Everything had been running fine, but when I checked the machines I noticed they had exceeded their limit for the day.

Anyone else having issues or could there be something odd with my machines?

A little info: I have one machine that was running E@H 24/7: http://einsteinathome.org/host/531087

When it started hitting errors, I split another machine, which had been running S@H full time, between S@H and E@H to see if it would work OK: http://einsteinathome.org/host/583917. It didn't; results were hit or miss.

The odd thing is that both of these machines are hitting errors, but the third machine I have on the project has not hit any errors yet: http://einsteinathome.org/host/583955

For the time being I have suspended all E@H work on the main machines and redirected their work to other projects. No sense working on units that will just error.

All three have optimized clients on them. When I noticed the errors, I switched back to the normal client to see if that was the issue. It was not.

Looking back through the logs, this seems to have started on the 17th.

Anyone have any ideas?

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

Quote:
Yesterday I noticed I was getting a lot of client errors on my WUs.

Complete error message:

5.2.2
The environment is incorrect. (0xa) - exit code 10 (0xa)

2006-04-18 14:53:40.5720 [normal]: Optimised by akosf (S-39) --> 'projects/einstein.phys.uwm.edu/albert_4.37_windows_intelx86.exe'
2006-04-18 14:53:40.5720 [normal]: Started search at lalDebugLevel = 0
2006-04-18 14:53:42.0095 [normal]: Checkpoint-file 'Fstat.out.ckp' not found.
2006-04-18 14:53:42.0095 [normal]: No usable checkpoint found, starting from beginning.
2006-04-18 15:00:54.6814 [normal]: Fstat file reached MaxFileSizeKB ==> compactifying ...
2006-04-18 15:01:00.6814 [CRITICAL]: Couldn't write compacted toplist to '../../projects/einstein.phys.uwm.edu/z1_1395.5__2356_S4R2a_2_0'
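
For anyone comparing several failed results, here is a rough Python sketch that pulls the [CRITICAL] entries out of stderr text shaped like the output above, even when two entries run together on one line. The function name `critical_entries` and the regular expression are my own assumptions based only on the lines quoted here; this is not part of BOINC or the Einstein@Home application.

```python
import re

# Timestamp shape taken from the quoted output, e.g. "2006-04-18 15:01:00.6814".
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"

# One entry: timestamp, [level], message. The message stops at the next
# timestamp or at the end of the line, so entries that were written
# back to back on a single line are still separated.
LOG_ENTRY = re.compile(
    rf"(?P<ts>{TS}) \[(?P<level>\w+)\]: (?P<msg>.*?)(?={TS} \[|$)",
    re.MULTILINE,
)


def critical_entries(text):
    """Return (timestamp, message) pairs for every [CRITICAL] entry.

    Hypothetical helper; the format is assumed from the quoted log only.
    """
    return [
        (m.group("ts"), m.group("msg").strip())
        for m in LOG_ENTRY.finditer(text)
        if m.group("level") == "CRITICAL"
    ]
```

Paste the stderr text of a failed result into a string and pass it in; an empty list would suggest the result failed for some other reason than the one shown here.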

Quote:


Looking back through the logs, this seems to have started on the 17th.

Anyone have any ideas?

What did you do exactly? It looks like file or directory permissions are messed up.
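
One quick way to test the permissions theory is to try creating a file in the project directory under the same account that runs BOINC. The Python sketch below does only that. The helper name `can_write` is made up for illustration, and the directory you pass in is an assumption: I would guess it is the `projects/einstein.phys.uwm.edu` folder under your BOINC directory, going by the path in the error message.

```python
import os
import tempfile


def can_write(directory):
    """Try to create, write and delete a small file in `directory`.

    Returns (True, "") on success, or (False, reason) on failure.
    Hypothetical helper for illustration; not part of BOINC.
    """
    try:
        # mkstemp creates a uniquely named file and returns an open descriptor.
        fd, path = tempfile.mkstemp(prefix="write_probe_", dir=directory)
    except OSError as err:
        return False, str(err)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"probe")
    except OSError as err:
        return False, str(err)
    finally:
        # Clean up the probe file; ignore failures so they do not mask the result.
        try:
            os.remove(path)
        except OSError:
            pass
    return True, ""


if __name__ == "__main__":
    # Assumed location relative to the BOINC directory; adjust to your install.
    print(can_write(os.path.join("projects", "einstein.phys.uwm.edu")))
```

If this succeeds under the BOINC account, plain permissions are less likely to be the cause, and something else that blocks writes (a full disk, for example) would be worth ruling out.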

Michael

Joe
Joined: 14 Oct 05
Posts: 3
Credit: 763292
RAC: 0

That's the odd thing: these boxes have been running on auto since they started up, with the exception of switching projects when needed.

I saw the "Couldn't write" error and it got me thinking, so I detached E@H, which removed the directory. I then waited a while and reattached it. After about 2 hours of CPU time it errored out again.

It couldn't be a permissions issue; it's running on a local account with sys admin rights to the machine.

I was thinking it might need to be rebooted, but that doesn't make sense because the other machine that was erroring had just been rebooted.

The only thing I have changed lately is my "keep in memory" preference, and if I recall that was well before the 17th. I doubt that would cause this, since my third machine is still running fine.

I am at a loss.
