CPU tasks error out after 12 seconds.

halfempty

Joined: 3 Apr 20

Posts: 14

Credit: 37595576

RAC: 0

20 Dec 2020 16:34:22 UTC

Topic 224263


In addition to the congestion issues everybody has been having, I can't seem to get any CPU tasks because all of mine started to error out. The error message seems to be:

"The name limit for the local computer network adapter card was exceeded.

 (0x44) - exit code 68 (0x44)"

This has me totally confused. Would appreciate any suggestions.

Here's a link to the error task list:

https://einsteinathome.org/host/12820614/tasks/6/0

Richard de Lhorbe

Joined: 15 Dec 05

Posts: 46

Credit: 9512991622

RAC: 1054520

20 Dec 2020 18:26:32 UTC

Message 181778


I am getting similar problems: all my CPU tasks for the Gamma Ray Pulsar search are now failing after about 12 seconds, but with a different error message from the original poster's. Here is a partial cut-and-paste:

13:56:56 (23269): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
13:56:56 (23269): [CRITICAL]: ERROR: MAIN() returned with error '4'

Of course, I now can’t get any more WUs because I can’t upload anything, but I’m confident this will gradually work itself out, as it always does.

San-Fernando-Valley

Joined: 16 Mar 16

Posts: 401

Credit: 10144893455

RAC: 25825681

20 Dec 2020 18:39:59 UTC

Message 181779 in response to message 181778


He has the same error as you; it can be seen further down in the output. I guess he did not see that.

halfempty

Joined: 3 Apr 20

Posts: 14

Credit: 37595576

RAC: 0

20 Dec 2020 20:47:06 UTC

Message 181784 in response to message 181779


You're right, same error further down. Guess I just have to wait for them to work it out. Thanks.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117540030138

RAC: 35315738

20 Dec 2020 21:58:00 UTC

Message 181786


halfempty wrote:

"The name limit for the local computer network adapter card was exceeded."

Stupidly, Windows intervenes and uses the error code which is specific to the app as if it were a Windows error code - which it's not. You need to look elsewhere for the real problem.

Pick any one of your failed tasks and click on its Task ID link. Scroll down and look through what was returned to the project. In this case it is:

Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.

The nearby lines give some context.
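If you have many failed tasks, pulling the app's own error lines out of the returned text can be scripted. This is a minimal sketch, assuming you have pasted a task's returned output (like the one posted further down this thread) into a string. It searches only the <stderr_txt> section, so the misleading Windows message derived from exit code 68 is skipped. The marker list and the `app_errors` helper are my own assumptions based on the failures quoted here, not anything the Einstein@home app documents.

```python
import re

# Markers taken from the failures quoted in this thread; an assumed,
# not official, list of what counts as a "real" application error line.
APP_ERROR = re.compile(r"seems to be damaged|\[CRITICAL\]")

def app_errors(result_text: str) -> list[str]:
    """Return the application's own error lines from a task's returned output.

    Only the <stderr_txt> section is searched, so the Windows-style text in
    <message> (derived from the exit code, not from the app) is ignored.
    """
    match = re.search(r"<stderr_txt>(.*?)</stderr_txt>", result_text, re.DOTALL)
    body = match.group(1) if match else result_text
    return [line.strip() for line in body.splitlines() if APP_ERROR.search(line)]

# Abridged from the output posted in this thread.
sample = """<message>
The name limit for the local computer network adapter card was exceeded.
 (0x44) - exit code 68 (0x44)</message>
<stderr_txt>
02:45:19 (6924): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
02:45:19 (6924): [CRITICAL]: ERROR: MAIN() returned with error '4'
02:45:30 (6924): [normal]: done. calling boinc_finish(68).
</stderr_txt>"""

for line in app_errors(sample):
    print(line)
```

On this sample it prints the "seems to be damaged" line and the [CRITICAL] line, and nothing from the <message> block.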

The file JPLEPH.405 appears to be the problem so that's the first thing to investigate. It's a static data file needed for all GRP tasks which is why they all quickly fail if the file is corrupt. That there is another person reporting the same issue is a concern. However, the best you can do is to see if your copy is corrupt in some way.

I've experienced something similar in the distant past. The first thing I used to do was replace the file. I would rename it to JPLEPH.BAD (so it remained covering the same disk sectors) and replace it with a fresh copy from another machine (or download it afresh). That seemed to work - for a while - but the problem returned. Eventually, by running a memory testing app, I found one of the RAM sticks had a bad location. Replacing that stick permanently fixed the problem.

The three things I would try are: (1) replace the file with a fresh copy, (2) check your disk for bad sectors, and (3) test your RAM. If more people start reporting problems with the same file, maybe it will be something else.
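Step (1) can be partly automated. This is a minimal sketch, assuming you have a copy of JPLEPH.405 you trust (from another machine, or a fresh download) and that the BOINC client is stopped. It compares SHA-256 digests. If they differ, it renames the suspect file to JPLEPH.BAD, as described above, and copies the good one into place. The helper names and paths are hypothetical. A hash mismatch only tells you the copies differ; it does not say whether the disk, the RAM or the download is to blame, so steps (2) and (3) still apply.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def replace_if_different(local: Path, known_good: Path) -> bool:
    """Swap in `known_good` if `local` differs from it.

    The suspect file is renamed (JPLEPH.405 -> JPLEPH.BAD) rather than
    deleted, so it keeps occupying the same disk space. On Windows the
    rename fails if a JPLEPH.BAD already exists.
    Returns True if a replacement was made.
    """
    if sha256_of(local) == sha256_of(known_good):
        return False
    local.rename(local.with_suffix(".BAD"))
    shutil.copy2(known_good, local)
    return True

# Demo on throwaway files only; substitute real paths at your own risk,
# and stop the BOINC client before touching the project directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_local = Path(tmp) / "JPLEPH.405"
    demo_good = Path(tmp) / "good_copy.405"
    demo_local.write_bytes(b"corrupted bytes")
    demo_good.write_bytes(b"known good bytes")
    replaced = replace_if_different(demo_local, demo_good)
    bad_kept = (Path(tmp) / "JPLEPH.BAD").read_bytes()
    now_good = demo_local.read_bytes()
    replaced_again = replace_if_different(demo_local, demo_good)

print(replaced, replaced_again)
```

The second call returns False because the file now matches the reference copy. That makes it safe to re-run after a reboot to check whether the corruption has come back.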

Cheers,
Gary.

mohavewolfpup

Joined: 8 Mar 20

Posts: 9

Credit: 5768052

RAC: 0

20 Dec 2020 23:05:46 UTC

Message 181788


I'm up to 69 failed tasks and counting, so I'm killing the client until this is fixed, lest I get banned.

<core_client_version>7.16.11</core_client_version>
<![CDATA[
<message>
The name limit for the local computer network adapter card was exceeded.
 (0x44) - exit code 68 (0x44)</message>
<stderr_txt>
02:45:19 (6924): [normal]: This Einstein@home App was built at: Jul 26 2017 09:32:43

02:45:19 (6924): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe'.
02:45:19 (6924): [debug]: 2.1e+015 fp, 4.2e+009 fp/s, 495478 s, 137h37m58s42
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRP5_1.08_windows_intelx86__FGRPSSE.exe --inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 --alpha 2.1039176188 --delta -0.9808959836 --skyRadius 0.001361356817 --ldiBins 15 --f0start 1080 --f0Band 16 --firstSkyPoint 586670 --numSkyPoints 58 --f1dot -1.0e-13 --f1dotBand 1.0e-13 --df1dot 1.344493449e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 4194304.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.15 --reftime 56757.0 --f0orbit 0.005 --freeRadiusFactor 2 --mismatch 0.15 --debug 0 -o LATeah1075F_1096.0_586670_0.0_2_0.out
output files: 'LATeah1075F_1096.0_586670_0.0_2_0.out' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_586670_0.0_2_0' 'LATeah1075F_1096.0_586670_0.0_2_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah1075F_1096.0_586670_0.0_2_1'
02:45:19 (6924): [debug]: Flags: i386 SSE GNUC X86 GNUX86
02:45:19 (6924): [debug]: Set up communication with graphics process.
Line 1 in inputfile ../../projects/einstein.phys.uwm.edu/JPLEPH.405 seems to be damaged.
02:45:19 (6924): [CRITICAL]: ERROR: MAIN() returned with error '4'
FPU status flags: PRECISION
02:45:30 (6924): [normal]: done. calling boinc_finish(68).
02:45:30 (6924): called boinc_finish

</stderr_txt>
]]>

halfempty

Joined: 3 Apr 20

Posts: 14

Credit: 37595576

RAC: 0

20 Dec 2020 23:58:40 UTC

Message 181790 in response to message 181786


Thanks for the suggestions. I'm at work right now, but I'll play around with it when I get home.

By the other people having the same problem I'm thinking it could be a server side update gone awry. I'll check the file's timestamp before I do anything.

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5872

Credit: 117540030138

RAC: 35315738

21 Dec 2020 1:00:15 UTC

Message 181791 in response to message 181790


halfempty wrote:

By the other people having the same problem I'm thinking it could be a server side update gone awry.

Yes, I tend to agree that it's server-side or while downloading.

My understanding is that this file (certainly one with the same name) gets used for both CPU and GPU tasks. I'm not seeing the problem for GPUs, but I'm also not getting a new copy, so I don't think it's an updated file being sent to everyone.

Perhaps it's just those who ask for the file because they don't have it, i.e. people who have just joined this search or are starting up a new machine. Maybe some sort of corruption is happening during the download, in which case replacing it with a known good copy from another machine might work for now.

Cheers,
Gary.

Wedge009

Joined: 5 Mar 05

Posts: 122

Credit: 17378414015

RAC: 7131121

21 Dec 2020 3:35:59 UTC

Message 181794


I only found this report just now (since previously I was only checking Technical News and Cruncher's Corner), but I have observed this problem since about 19:24 UTC on 19th December. It looks like there's something wrong with tasks with ID starting with LATeah1075F.

For me the problem is across multiple hosts, and only for CPU FGRP5 work units. I'm still getting repeat work units (e.g. ones ending in _5, indicating the sixth attempt), so I'm quite certain it's the units themselves that are bad, not necessarily the machines processing them or the downloaded data.

Edit: Checking one host's copy of JPLEPH.405, it is dated 2020-05-27, more than half a year ago. So it's quite strange that this problem only manifests now.

Edit: Forced a re-download of JPLEPH.405 at 2020-12-21 03:19 UTC - confirmed there are still FGRP5 CPU tasks that are failing with the same error message. I can only conclude thus far that there is something really bad with this batch of work units.

Soli Deo Gloria

Eugene Stemple

Joined: 9 Feb 11

Posts: 67

Credit: 374093942

RAC: 538680

21 Dec 2020 5:52:24 UTC

Message 181798


I'm seeing some of these also. In my account's error tasks there are six instances of LATeah1075F with a computation error after 13 seconds, exit code 68. The stdout log file has them tagged with "output file absent." These are all CPU tasks, via v1.08 FGRPSSE (Linux). However, not all work units of that LATeah1075F series are failing, at least not recently. I do see successful work units from December 18 and earlier that completed and validated; their (CPU) run times are on the order of 10,000 seconds. Browsing backwards through the stdoutdae.txt log, it's the last six that failed. Their work unit IDs, after the common initial string LATeah1075F, are 1096.0_220748 / 1096.0_303978 / 1064.0_1143644 / 1080.0_358000 / 1096.0_1261558 / 1096.0_850792. All the earlier work units (that finished normally) had IDs of 920.0 / 872.0 / 856.0 / etc., all LESS THAN 1000. It's probably a coincidence that 1000 is the boundary between good and bad... but odd anyway. I don't seem to have any of these in my cache, so there's nothing to monitor more closely.
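To check the same pattern against your own task list, here is a small sketch. It assumes, based only on the names quoted in this thread, that a workunit name splits on underscores into the LATeah1075F prefix, the number compared against 1000 above, and an index. What those fields actually mean is not documented here, so `split_name` is a hypothetical helper. Only the numbers of the earlier successful units were posted, not their full names, so those are used as bare numbers.

```python
def split_name(name: str) -> tuple[str, float, int]:
    """Split a name such as 'LATeah1075F_1096.0_220748' into
    (prefix, number, index). Extra trailing fields, as in the output file
    name 'LATeah1075F_1096.0_586670_0.0_2_0', are ignored.
    The meaning of each field is assumed, not documented."""
    prefix, number, index = name.split("_")[:3]
    return prefix, float(number), int(index)

# The six failures listed above, with the common prefix re-attached.
failed = [
    "LATeah1075F_1096.0_220748", "LATeah1075F_1096.0_303978",
    "LATeah1075F_1064.0_1143644", "LATeah1075F_1080.0_358000",
    "LATeah1075F_1096.0_1261558", "LATeah1075F_1096.0_850792",
]
# Only these numbers were quoted for the earlier successes.
succeeded_numbers = [920.0, 872.0, 856.0]

failed_numbers = [split_name(n)[1] for n in failed]
print(all(x > 1000 for x in failed_numbers))     # True
print(all(x < 1000 for x in succeeded_numbers))  # True
```

Agreement on six failures and three successes is consistent with a threshold but does not establish one.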

:^) maybe it's the Jupiter/Saturn conjunction messing things up...

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4312

Credit: 250434360

RAC: 34952

21 Dec 2020 6:51:26 UTC

Message 181799


Nope, I'm afraid that was us. The idea was to move the FGRP5 workunit generator away from the overloaded upload server, but something went wrong there unnoticed. Sorry for that.

Forums › Problems and Bug Reports
