Some Examples of Why You Should Pay Attention to Your Boxes

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0
Topic 192013

Case 1: Why It's a Bad Idea to Set a Max Cache when Crunching Multiple Projects.

Case 1

Pretty self explanatory.

Case 2: Why You Shouldn't Assume BOINC *Always* Knows What It's Doing.

Case 2

For this machine, this represents roughly 33,950,000 seconds' worth of work (or ~393 days). Deadlines might become an issue here. Not only that, but since the host is actually trying to work through them, as far as I can tell, this might lead to "orphaned" results in the DB as it reports expired results which have already been purged. It would be interesting to see what the last few contact log entries for this host looked like.
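As a quick sanity check on the seconds-to-days figure above, here is a minimal sketch of the conversion. The function name is only illustrative; the input is the estimate quoted in this post.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


# Estimated work queued on the host, as quoted in the post.
queued_seconds = 33_950_000
print(f"{seconds_to_days(queued_seconds):.0f} days")  # prints "393 days"
```

This assumes the host crunches around the clock; any downtime only makes the backlog longer.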

Alinator

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Note for Case 2: The 33 Msecs might be a little harsh; let's call it 10-15 Msecs, assuming it drew a mix of long and short, but you get the idea. ;-)

Alinator

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

*lol* That's kinda informative. Do you have any information if the person representing case 1 has been hunted down by other crunchers waiting for validation of their WUs? ;-) Just kidding, but I know how annoying it can be if you usually finish WUs after a day or so and regularly get teamed up with a guy who takes five or more, so... 14 days... wow ^^ Maybe some people should really think again about their cache size.

transient
Joined: 3 Jun 05
Posts: 62
Credit: 115835369
RAC: 0

What about this guy

http://einsteinathome.org/host/33556

I don't know what he is doing but this host seems only to download WU's instead of actually processing them. I know I'm waiting for him to finally turn in some results. :(

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Yep, I'd sure like to know what happened to cause BOINC to go insane like this. It looks like it was running fine until around 8 to 10 days ago, and it would have had to draw a full complement of 144 results a day for that time period to get to this level (1689 results in progress).
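For a rough check of that estimate, here is a small sketch of the arithmetic. It assumes the host started from an empty cache and returned nothing in the meantime; both numbers are the ones quoted in this post.

```python
import math

DAILY_LIMIT = 144    # results per day, as quoted in the post
IN_PROGRESS = 1689   # results currently in progress on the host


def days_to_accumulate(results: int, per_day: int) -> float:
    """Days needed to pile up `results` when drawing `per_day` every day."""
    return results / per_day


days = days_to_accumulate(IN_PROGRESS, DAILY_LIMIT)
print(f"{days:.1f} days, i.e. {math.ceil(days)} calendar days at the full limit")
```

Under those assumptions the host would need a little under 12 days at the full daily limit, so it may have been misbehaving slightly longer than 8 to 10 days, or it already held some work when the trouble started.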

At this point, just aborting most of them would take a while. :-)

Alinator

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

It definitely would ^^ Finally I've found someone whose list looks worse than my CPDN one with 251 results, 250 of them unprocessed... but in that case, it was a server bug and the first 250 WUs never really reached me, whereas Einstein servers seem to run fine, so I assume that guy really has all those 1600 WUs on his box... happy crunching with those :-D At least he can be sure his box won't be idle for the next year or so.

Alinator
Joined: 8 May 05
Posts: 927
Credit: 9352143
RAC: 0

Well, the good news is if it doesn't start reporting results pretty quick, the host will start getting throttled back on new work sent in about 20 hours when the oldest ones start expiring.

The bad news is, there are a lot of hosts which are going to be waiting for validation as this pretty hefty chunk gets reissued and run through the mill again.

Alinator

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 49382 in response to message 49381

Quote:

The bad news is, there are a lot of hosts which are going to be waiting for validation as this pretty hefty chunk gets reissued and run through the mill again.

Alinator

But the typical host will only be waiting for an additional two or three weeks....

Director, Einstein@Home

Jim Bailey
Joined: 31 Aug 05
Posts: 91
Credit: 1452829
RAC: 0

I just checked one machine where the person I'm paired with has 1040 wu's, of which he shares 112 with me, and none of those 112 have been returned. Doesn't look like they will be either.

Bruce Allen
Moderator
Joined: 15 Oct 04
Posts: 1119
Credit: 172127663
RAC: 0

Message 49384 in response to message 49383

Quote:
I just checked one machine where the person I'm paired with has 1040 wu's, of which he shares 112 with me, and none of those 112 have been returned. Doesn't look like they will be either.

Someone has recently suggested a modification to the BOINC scheduler so that it will not send additional work to hosts that have too many workunits 'in progress'. I'll have a look -- this might help solve the problem.
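To illustrate the idea only, here is a minimal sketch of such a cap. This is not the actual BOINC scheduler code: the `Host` type, the `results_to_send` function, and the limit of 200 are all hypothetical names and values picked for the example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    host_id: int
    in_progress: int  # results sent to the host but not yet reported


# Hypothetical per-host cap; a real project would choose its own value.
MAX_IN_PROGRESS = 200


def results_to_send(host: Host, requested: int,
                    max_in_progress: int = MAX_IN_PROGRESS) -> int:
    """Return how many results to send, never pushing the host past the cap."""
    headroom = max(0, max_in_progress - host.in_progress)
    return min(requested, headroom)


# A host sitting on 1689 results gets nothing until it reports some back.
print(results_to_send(Host(host_id=1, in_progress=1689), requested=10))  # 0
```

A cap like this would not fix whatever made the client over-fetch, but it would limit how many results a single runaway host can tie up.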

Cheers,
Bruce

Director, Einstein@Home

Jim Bailey
Joined: 31 Aug 05
Posts: 91
Credit: 1452829
RAC: 0

Message 49385 in response to message 49384

Quote:

Someone has recently suggested a modification to the BOINC scheduler so that it will not send additional work to hosts that have too many workunits 'in progress'. I'll have a look -- this might help solve the problem.

Cheers,
Bruce

The people doing the WUs need to pay attention to what their machines are doing and take action when necessary. There's always that handy little "No New Work" button if the cache becomes too large. With the almost total lack of downtime on Einstein, I've never seen any reason to carry a large cache, but I still check my boxes on a daily basis to make sure everything is OK.
