Linux CUDA validation errors

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Quote:
Quote:
The switch should not be the problem, because Win 7 crunches the BRP tasks without any error. Or do you think Windows uses a different packet size?

This is my suspicion, judging from the symptoms.

Something else to try would be to connect the computers directly, without a switch, just a cable.

BM

I had tried a direct connection to the Internet router (Fritzbox) with the DHCP server. Network activity had been suspended beforehand, and again 50% of all WUs were invalid. I have a crossover cable that I will try, and I can also try a direct connection to the Fritzbox again.
Anyway I did not get any errors doing work for Milkyway, Primegrid and GPUGRID.
I still get the 'transient upload errors' with h1 and BUCKET tasks, but they have been all valid.
So what is the difference to the BRP tasks?

Regards,
Michael

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4273
Credit: 245267071
RAC: 12450

Quote:
I still get the 'transient upload errors' with h1 and BUCKET tasks, but they have been all valid. So what is the difference to the BRP tasks?

There are a few:

- the files are uploaded to different servers (einstein.phys.uwm.edu and einstein-dl.aei.uni-hannover.de). These servers differ in hardware and software.

- previously two different versions of the file upload handler were running on the two servers (not anymore, though)

- the result of a GW task (S5* or S6Bucket) is a single file of a few hundred kB, while a result of a BRP task consists of four files of a few kB.

I'm not sure how this affects your network problems, though.
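To get a feeling for the size difference mentioned above, here is a rough sketch that counts how many full-size TCP segments each kind of upload would need. The file sizes (300 kB and 4 kB) are made-up stand-ins for "a few hundred kB" and "a few kB", and the 1460-byte segment size assumes a standard 1500-byte Ethernet MTU with plain IPv4 and TCP headers; none of these numbers come from the project itself.

```python
import math

# Assumption: 1500-byte MTU minus 20 bytes IPv4 and 20 bytes TCP header.
ASSUMED_MSS = 1460

def segments_needed(size_bytes: int, mss: int = ASSUMED_MSS) -> int:
    """Number of TCP segments needed to carry size_bytes of payload."""
    return math.ceil(size_bytes / mss)

# Hypothetical sizes standing in for "a few hundred kB" and "a few kB".
gw_result_bytes = 300 * 1024
brp_file_bytes = 4 * 1024

print("GW result:", segments_needed(gw_result_bytes), "segments")
print("one BRP file:", segments_needed(brp_file_bytes), "segments")
```

Under these assumptions a GW upload is a long run of full-size packets, while each BRP file fits into a handful, so the two task types stress the network path quite differently.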

Do you have the same trouble with BRP CPU tasks?

BM


M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Today I changed the MTU to 1492 and there was only one transient upload error.
Yesterday and today:
0 invalid, 7 valid and 11 pending BRP tasks.
If it stays like this I will be happy.
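For anyone wanting to check the numbers behind the MTU change, this is a small sketch of what an MTU of 1492 versus the Ethernet default of 1500 means for packet payloads. It assumes plain IPv4 and TCP headers of 20 bytes each without options and an 8-byte ICMP echo header; the function names are only for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits into one packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def ping_payload(mtu: int) -> int:
    """Largest ping payload that fits into one unfragmented packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

for mtu in (1500, 1492):
    print(f"MTU {mtu}: TCP MSS {tcp_mss(mtu)}, max ping payload {ping_payload(mtu)}")
```

The ping payload value can be used to probe the path by hand: on Linux, something like `ping -M do -s 1464 <server>` sends packets with fragmentation prohibited, so a payload that is too large for the path should fail instead of being silently fragmented.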

http://einsteinathome.org/host/737731/tasks&offset=0&show_names=1&state=4

I don't run any BRP CPU tasks, because that is inefficient. If you think it is a good idea, I will try. In general, these are too few results to form an opinion.

If I get the first invalid result I will connect the host directly to the Fritzbox. I guess the DHCP Server in there might negotiate the correct MTU size with that box.

If this works (which would surprise me), I will use the crossover cable to get closer to the solution. The switch was a cheap one from D-Link, and better ones are pretty expensive. :(

What I don't understand is that the communication is over TCP and not UDP, so the receiving server should ask for any faulty packet to be sent again. Doesn't a 'transient upload error' indicate that the server got an invalid packet?

And I probably did not mention before that the Phenom doesn't get any transient upload errors at all.

[Edit] It will take some time to do these checks because of a lot of other work units.

Stephan Goll
Joined: 13 Dec 05
Posts: 25
Credit: 27834196
RAC: 0

Quote:

Replacing the file upload handlers will take a few hours, but should be finished by 16:00 UTC (18:00 CEST). To avoid further validation errors from upload retries I suggest you suspend your network connection until then.

BM

Whatever you did, Bernd: it's magic. :-)

There's a big difference between
http://einsteinathome.org/host/2069906/tasks
and
http://einsteinathome.org/host/2069906/tasks&offset=20&show_names=0&state=0
... as well as
http://einsteinathome.org/host/702599/tasks
It seems that with the new FUH (nearly?) all errors are gone .. at least for me.
Stephan

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Not even a single invalid task so far. :)
Great job updating the FUH!
