Server Flakiness == Lost Credit

the_shep (myYahoo)

Joined: 30 Jan 06

Posts: 1

Credit: 5923

RAC: 0

8 Jan 2007 22:56:51 UTC

Topic 192283

Dear E@H,

Due to recent server flakiness over the past 2 weeks, my client was unable to upload/report several results until this morning. But... not all of them received credit because a quorum had already been reached.

May I suggest that you extend the due date of all current WUs until the server becomes stable? Or auto-grant credit for WUs that had a due date during these past 2 weeks, even if they are reported late?

Of the 24 results submitted, 10 WUs did not get credit:
22014514
22014690
22014696
22014698
22014700
22014704
22014710
22014714
22014717
22014719

My PC host id is:
815532

If there is some way to grant 'late' credit, I would greatly appreciate it.

Thanks,
Chad

the_shep

Joined: 28 Jan 06

Posts: 2

Credit: 66479

RAC: 0

Server Flakiness == Lost Credit

8 Jan 2007 23:01:40 UTC

Message 59031

Oops... I meant to post using this E@H ID. Should you wish to contact me, please respond using the email address for 'the_shep'.
Thanks,
Chad

FalconFly

Joined: 16 Feb 05

Posts: 191

Credit: 15650710

RAC: 0

8 Jan 2007 23:22:45 UTC

Message 59032 in response to message 59031

Looking at your Results, it appears the outage was long enough for your Work to exceed the deadline.

While that's very unfortunate, it also suggests your Cache setting is a bit too high: your Computers basically buffered too much work to complete before the outage, thus busting the deadline (which, at a full 14 days, is quite generous compared to other Projects).

Reducing the Cache accordingly will leave more 'headroom' for the work to complete, in case something goes wrong.

A good indicator of the above is the average turnaround time for your Pentium 4 2800MHz (carrying most of your Results): 20.73 days (that's very bad)...
(mine are usually less than a day on all Systems)

Aim for turnaround times as fast as possible, while keeping the cache comfortable enough for your demands. Most people use a day or two, a few use up to 5-6 days, hardly anyone more than that.

With those settings, it's still possible to run out of EAH WorkUnits, but most Users have at least a second Project for this scenario to prevent running out of work completely. Additionally, it's generally better to accept a Project running out of work for a short time, rather than lose many cached Results because of a busted deadline due to a huge cache.

Mahbubur

Joined: 31 Mar 06

Posts: 46

Credit: 258468

RAC: 0

9 Jan 2007 0:07:07 UTC

Message 59033

Your cache doesn't seem particularly big. It's just an unfortunate set of circumstances that has led to you losing the credit. I don't think it could have been avoided unless you manually updated the client during the periods E@home was running.

I don't think you should read too much into the turnaround time, as it's obviously been hugely affected by the current server downtimes.

Annika

Joined: 8 Aug 06

Posts: 720

Credit: 494410

RAC: 0

9 Jan 2007 0:24:29 UTC

Message 59034

Well, I don't think the outages have that much of an effect on the turnaround time unless the host was attached very recently. I'm still at a turnaround time of 0.9 days on my laptop and 0.5 days on the desktop, and those are not even 24/7 hosts...
Okay, I keep a very small cache, but still, this seems to be an indicator that not everyone gets high turnaround times when the project is down...

the_shep

Joined: 28 Jan 06

Posts: 2

Credit: 66479

RAC: 0

Thanks for quick reply

9 Jan 2007 4:12:04 UTC

Message 59035 in response to message 59034

Thanks for the quick reply, FalconFly. Are you an admin for E@H? Can you do anything?

If no admin is able to help, then I'll assume the late results will never get credit. C'est la vie.

This particular PC is at my place of work, and as such, I can't control when the network proxy servers are operating. So, I was using a 7-day cache and manually updating results when they finished, or manually aborting them if they weren't gonna make the deadline. I attempted to refresh my client on Dec. 29th before the New Year holiday week (here in Japan), but E@H was down. And... I now realize that the company had turned off the proxy server until this morning.

I am now re-configuring my clients at work to use a 2-day cache and sign them up with a few more BOINC projects that have a long grace period; probably SETI, Spin, QMC, and WCG.

I never get to connect on weekends, even though my clients can continue crunching. So, I truly appreciate the 14-day allowance that E@H provides, as it is very helpful in my situation.

But... would it be possible to increase the timeout period in between the "no reply" for late results and the sending of an "extra" WU that is needed to complete a quorum to a week?

Thus, late results would still have a chance to get credit, and WUs would essentially have a 3-week window to complete. My other 14 results only got credit because I beat the "extra" WU that was allocated to someone else.

Many different countries all over the world have many different holidays at many different times. I was working on Xmas Day while most of the world wasn't, but not last week while most of the world was. It would be nice to have that extra week to accommodate these cultural differences, and also to accommodate unforeseen troubles at project HQ, like server difficulties, that may compound them.

Can the buffer between a WU due date and the sending of the "extra" WU be increased?

Ocean Archer

Joined: 18 Jan 05

Posts: 92

Credit: 368644

RAC: 0

9 Jan 2007 12:49:52 UTC

Message 59036

The_Shep

Sorry that you lost credit on those WUs Shep, and before I go any further - no, I'm not affiliated with E@H (other than crunching WUs).

Just be careful in adding other projects to your mix, because if you keep a large cache, each project will attempt to fill that cache size and you could get into an "overcommitted" state -- too much work, and too little time. I do work for many projects, but keep a very small cache. My system can normally contact daily if needed, assuming the ISP doesn't go bad. You will have to learn by trial and error just how much work you can accept for the projects you run -- good luck

If I've lived this long - I gotta be that old!

FalconFly

Joined: 16 Feb 05

Posts: 191

Credit: 15650710

RAC: 0

RE: Thanks for quick reply

9 Jan 2007 14:37:30 UTC

Message 59037 in response to message 59035

Quote:

Thanks for quick reply FalconFly. Are you an admin for E@H? Can you do anything?

No, I'm just a normal User and - besides giving advice - unfortunately can't help.
Since there are currently ca. 153000 Users with 322000 Hosts attached here, I'm afraid there'll be nobody to intervene on one Host missing the Deadline.

The Deadline is basically a 'brick wall': miss it and you lose Credit, and that applies to everyone, no exception.

The only exceptions I've ever seen were when Projects were hit by exceptionally long outages (Admins realizing the tremendous amount of work that would be entirely lost if they did not manually disable the deadline for a few days).

PS.
Ocean Archer's Tip is absolutely correct: the total amount of Work is (as far as I understand it) downloaded per Project.
E.g. a 2-day Cache on 3 Projects can result in a total of up to 2x3=6 days' worth of work (reduced only by several BOINC-internal factors concerning each individual Computer).
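
To make that arithmetic concrete, here is a minimal Python sketch. It assumes the simplest case, where every attached Project fills the Cache independently; the function name is made up for illustration, and this is not how the BOINC scheduler actually computes its work requests.

```python
def worst_case_buffered_days(cache_days, n_projects):
    """Upper bound on buffered work, assuming (hypothetically) that each
    project fills the full cache on its own and nothing scales it down."""
    if cache_days < 0 or n_projects < 0:
        raise ValueError("cache_days and n_projects must be non-negative")
    return cache_days * n_projects


# The example from the post: a 2-day cache with 3 projects attached.
print(worst_case_buffered_days(2, 3))  # 6
```

The real figure will usually be lower, because of the BOINC-internal factors mentioned above, so treat the result as a ceiling rather than a prediction.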

Mahbubur

Joined: 31 Mar 06

Posts: 46

Credit: 258468

RAC: 0

9 Jan 2007 18:37:53 UTC

Message 59038

Let me explain what I just said.

The point I was trying to make was that a different cache size would not have helped with the lost credit. Instead, all that would have happened is that you would lose fewer units, but the PC would simply sit idle for longer.

The PC in question only had 550 credit at the time of the thread's creation. So that's around 40 WUs + the 10 lost. If most of those were reported after the outage, the turnaround time would be highly skewed.

But as people have said, the best course of action would be to have a backup project. A 7-day cache shouldn't be a problem if the PCs run 24/7.

FalconFly

Joined: 16 Feb 05

Posts: 191

Credit: 15650710

RAC: 0

10 Jan 2007 0:44:43 UTC

Message 59039 in response to message 59038

I don't think so.

If he had a smaller cache, his last downloads to fill it would not have occurred on 19 Dec, but later, on the 26th Dec or thereabouts.

Any work he would have downloaded before that time (like on the 19th, as he did) would have been finished before the outage; any he would have downloaded later would have had sufficient headroom to endure the few days' outage without busting the deadline.

Looking at the numbers, his cache was about double what it should have been to get past the outage without losing any credit. His avg. return time alone is good proof of that (the outage was only a few days; those with smaller caches all came through unscathed, as far as I can see).
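
The dates can be checked with a small Python sketch. It uses the 14-day deadline and the 19 Dec / 26 Dec download dates mentioned in this thread, and it assumes the results were finally reported on 8 Jan 2007 (the day of the first post); the function name is hypothetical.

```python
from datetime import date, timedelta

DEADLINE_DAYS = 14  # E@H deadline as stated earlier in the thread


def days_late(downloaded, reported, deadline_days=DEADLINE_DAYS):
    """Days past the deadline at report time; zero or negative means on time."""
    due = downloaded + timedelta(days=deadline_days)
    return (reported - due).days


reported = date(2007, 1, 8)  # assumed report date, taken from the first post

# Work downloaded on 19 Dec was due on 2 Jan.
print(days_late(date(2006, 12, 19), reported))  # 6

# Work downloaded on 26 Dec would have been due on 9 Jan.
print(days_late(date(2006, 12, 26), reported))  # -1
```

Under those assumptions, the 19 Dec downloads come in six days late, while anything fetched a week later would still have had a day to spare.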
