I was looking through my tasks and I found:

lazlo

Joined: 20 Nov 19

Posts: 9

Credit: 2958620

RAC: 0

11 Dec 2019 8:04:00 UTC

Topic 220186

One of my wing men has almost seven times more aborted/timed out WU than completed WU!

https://einsteinathome.org/host/12791233/tasks/0/0

Keith Myers

Joined: 11 Feb 11

Posts: 4962

Credit: 18659782064

RAC: 5632203

11 Dec 2019 20:33:39 UTC

Message 174816

Send them a friendly PM that their host is not performing correctly. They may not be aware they only produce bad results.

We have a dedicated thread "Invalid host messaging" over at Seti that documents all the bad hosts on the project.

The possible outcomes are that they review their host and correct the problem, that they ignore the PM, or that they are unreachable because their computers are hidden.

Best scenario is they ask for help in correcting the invalid computer and start producing valid work.

The BOINC mechanism is supposed to automatically throttle the amount of work delivered to a host if it continuously returns errors or invalids. But it rarely works as designed. These bad hosts inflate the size of the project database needlessly.

lazlo

Joined: 20 Nov 19

Posts: 9

Credit: 2958620

RAC: 0

13 Dec 2019 18:05:20 UTC

Message 174838

I sent him a PM, but I don't think it helped. At this time he has 842 "Error" tasks!

Betreger

Joined: 25 Feb 05

Posts: 992

Credit: 1582645810

RAC: 759235

13 Dec 2019 18:33:29 UTC

Message 174839

It seems to me that most of those "bad" hosts belong to people who are of the set-it-and-forget-it variety.

Keith Myers

Joined: 11 Feb 11

Posts: 4962

Credit: 18659782064

RAC: 5632203

14 Dec 2019 4:54:38 UTC

Message 174851

In our "Invalid host messaging" thread we have actually had half a dozen responses from people we PM'd to alert them that the AMD 5700/XT models should not be used for Seti, because they produce nothing but bad results and have a tendency to cross-validate against another 5700/XT. The amount of bad science getting injected into the database is becoming alarming. Those who have responded have said they will remove the card from Seti. But the "bad host" list we have discovered so far is four times that size.

It is going to get worse with the announcement today of an even cheaper Navi 5500 model. And still no response from AMD other than that they are investigating the issue.

We are trying to come up with a way to send work to only a single 5700/XT card and never pair it with another Navi card as its wingman. But no response yet from the scientists. Another solution is simply to exclude Navi cards from receiving any work. Either way, it all has to be coded into the Seti server code for that to happen.

lazlo

Joined: 20 Nov 19

Posts: 9

Credit: 2958620

RAC: 0

14 Dec 2019 10:45:29 UTC

Message 174855

I just found another one with over 300 "Error" tasks:

https://einsteinathome.org/host/12462057/tasks/0/0

The owner does have two other systems that produce valid results. I sent him a PM.

I am a bit surprised by this though. I wonder why the servers are not set to detect that "X amount of failures" are coming from a single host in "Y amount of time" and throttle them down by only sending them one task a day until the system returns "Z amount of good work". That set up seems to work well over at LHC@Home.
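To make the idea concrete, here is a rough sketch of the kind of rule I mean. It is not how the Einstein or LHC@Home servers actually work; the class name `HostThrottle`, its parameters, and the numbers in the usage example are all made up for illustration.

```python
from collections import deque


class HostThrottle:
    """Hypothetical per-host throttle (illustrative only, not BOINC server code)."""

    def __init__(self, max_failures, window_seconds, required_successes, normal_limit):
        self.max_failures = max_failures              # "X amount of failures"
        self.window_seconds = window_seconds          # "Y amount of time"
        self.required_successes = required_successes  # "Z amount of good work"
        self.normal_limit = normal_limit              # tasks per day when healthy
        self.failure_times = deque()                  # timestamps of recent failures
        self.throttled = False
        self.successes_while_throttled = 0

    def record_failure(self, now):
        """Register a failed task at time `now` (seconds)."""
        self.failure_times.append(now)
        # Forget failures that fall outside the sliding window.
        while self.failure_times and now - self.failure_times[0] > self.window_seconds:
            self.failure_times.popleft()
        if len(self.failure_times) >= self.max_failures:
            self.throttled = True
            self.successes_while_throttled = 0

    def record_success(self):
        """Register a valid result; enough of them lift the throttle."""
        if self.throttled:
            self.successes_while_throttled += 1
            if self.successes_while_throttled >= self.required_successes:
                self.throttled = False
                self.failure_times.clear()

    def daily_limit(self):
        """One task a day while throttled, the normal limit otherwise."""
        return 1 if self.throttled else self.normal_limit


# Usage: 3 failures within an hour throttle the host; 2 good results restore it.
host = HostThrottle(max_failures=3, window_seconds=3600,
                    required_successes=2, normal_limit=50)
for t in (0, 10, 20):
    host.record_failure(t)
print(host.daily_limit())  # 1
host.record_success()
host.record_success()
print(host.daily_limit())  # 50
```

With a rule like this, a host that has never returned a good result would stay at one task a day no matter how often it contacts the scheduler.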

Keith Myers

Joined: 11 Feb 11

Posts: 4962

Credit: 18659782064

RAC: 5632203

14 Dec 2019 17:19:40 UTC

Message 174859

The BOINC client does have that mechanism coded into it. But it does not work on the majority of projects in all conditions. The other factor is that Einstein does not run the latest BOINC server code but a version many releases older, and has mostly gone in an independent direction. So the latest improvements the BOINC developers have made to handle bad hosts in the standard BOINC software are not being used here.

archae86

Joined: 6 Dec 05

Posts: 3157

Credit: 7213044931

RAC: 949621

14 Dec 2019 17:52:24 UTC

Message 174861 in response to message 174859

I have seen Einstein limit task dispatch to me because of recent errors. I don't know of any evidence that the function is currently turned off here.

However, the allowed daily task download count, once diminished by errors, doubles with each successfully returned task, so just a few successes can get a user back up to full request level. Many users set their requested task pre-fetch depth to a very high number of days. Furthermore, the estimated task productivity can be wildly off in such common cases as switching from one task type to another (say GPU GRP to GPU GW, or worse yet going between CPU and GPU tasks).

In sum, with the mechanism working as designed, it is possible and common for individual machines to display quite large Error counts on the task list.
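A small sketch of that arithmetic, under stated assumptions: the doubling on success is as described above, but the rule that each error lowers the quota by one down to a floor of one task per day, the quota of 32, and the function names are assumptions for illustration, not values taken from the Einstein server.

```python
def update_quota(quota, outcome, full_quota):
    """Return the new daily quota after one returned task.

    Doubling on success follows the behaviour described in the post.
    Assumption: each error lowers the quota by one, never below 1.
    """
    if outcome == "success":
        return min(quota * 2, full_quota)
    return max(quota - 1, 1)


def simulate(outcomes, full_quota):
    """Apply a sequence of task outcomes and return the quota after each one."""
    quota = full_quota
    history = []
    for outcome in outcomes:
        quota = update_quota(quota, outcome, full_quota)
        history.append(quota)
    return history


# 40 errors in a row drive a quota of 32 down to 1,
# then only 5 valid results bring it all the way back.
history = simulate(["error"] * 40 + ["success"] * 5, full_quota=32)
print(history[39])   # 1
print(history[-5:])  # [2, 4, 8, 16, 32]
```

Under these assumptions a host that fails most of the time but succeeds occasionally keeps regaining a large quota, which is consistent with the large Error counts seen on some task lists.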

Keith Myers

Joined: 11 Feb 11

Posts: 4962

Credit: 18659782064

RAC: 5632203

14 Dec 2019 20:36:47 UTC

Message 174865 in response to message 174861

Quote:

I have seen Einstein limit task dispatch to me because of recent errors. I don't know of any evidence that the function is currently turned off here.

Yes, I agree. I see the same thing when I dump a bunch of work because of a silly mistake. The BOINC mechanism works as designed in that case, limiting you to a single task per day until you start returning valid work again; the mechanism then slowly ramps up the work sent to you until you are back at your standard cache levels.

But why the servers continue to send out work to bad hosts at each scheduler connection when the host hasn't returned a single valid task is beyond my understanding.
