Server Flakiness == Lost Credit

the_shep (myYahoo)
Joined: 30 Jan 06
Posts: 1
Credit: 5923
RAC: 0
Topic 192283

Dear E@H,

Due to recent server flakiness over the past 2 weeks, my client was unable to upload/report several results until this morning. But... not all of them received credit because a quorum had already been reached.

May I suggest that you extend the due date of all current WUs until the server becomes stable? Or auto-grant credit for WUs that had a due date during these past 2 weeks, even if they are reported late?

Of the 24 results submitted, 10 WUs did not get credit:
22014514
22014690
22014696
22014698
22014700
22014704
22014710
22014714
22014717
22014719

My PC host id is:
815532

If there is some way to grant 'late' credit, I would greatly appreciate it.

Thanks,
Chad

the_shep
Joined: 28 Jan 06
Posts: 2
Credit: 66479
RAC: 0

Server Flakiness == Lost Credit

Oops... I meant to post using this E@H ID. Should you wish to contact me, please reply to the email address registered for 'the_shep'.
Thanks,
Chad

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

Message 59032 in response to message 59031

Looking at your Results, it appears the outage was long enough for your Work to exceed the deadline.

While that's very unfortunate, it also suggests your Cache setting is a bit too high: your Computers basically buffered more work than they could complete before the outage, thus busting the deadline (which, at a full 14 days, is quite generous compared to other Projects).

Reducing the Cache accordingly will leave more 'headroom' for the work to complete, in case something goes wrong.

A good indicator of this is the average turnaround time for your Pentium 4 2800MHz (carrying most of your Results): 20.73 days (that's very bad)...
(mine are usually less than a day on all Systems)

Aim for turnaround times as fast as possible, while keeping the cache comfortable enough for your demands. Most people use a day or two, a few use up to 5-6 days, hardly anyone more than that.

With those settings, it's still possible to run out of EAH WorkUnits, but most Users have at least a second Project for this scenario to prevent running out of work completely. Additionally, it's generally better to accept a Project running out of work for a short time, rather than lose many cached Results because of a busted deadline due to a huge cache.

Mahbubur
Joined: 31 Mar 06
Posts: 46
Credit: 258468
RAC: 0

Your cache doesn't seem particularly big. It's just an unfortunate set of circumstances that has led to you losing the credit. I don't think it could have been avoided unless you manually updated the client during the periods E@home was running.

I don't think you should read too much into the turnaround time, as it's obviously been hugely affected by the current server downtimes.

Annika
Joined: 8 Aug 06
Posts: 720
Credit: 494410
RAC: 0

Well, I don't think the outages have that much of an effect on the turnaround time unless the host was attached very recently. I'm still at a turnaround time of 0.9 on my laptop and 0.5 on the desktop, and those are not even 24/7 hosts...
Okay I keep a very small cache, but still, this seems to be an indicator that not everyone gets high turnaround times when the project is down...

the_shep
Joined: 28 Jan 06
Posts: 2
Credit: 66479
RAC: 0

Message 59035 in response to message 59034

Thanks for quick reply FalconFly. Are you an admin for E@H? Can you do anything?

If no admin is able to help, then I'll assume the late results will never get credit. C'est la vie.

This particular PC is at my place of work, and as such, I can't control when the network proxy servers are operating. So, I was using a 7-day cache and manually updating results when they finished, or manually aborting them if they weren't gonna make the deadline. I attempted to refresh my client on Dec. 29th before the New Year holiday week (here in Japan), but E@H was down. And... I now realize that the company had turned off the proxy server until this morning.

I am now re-configuring my clients at work to use a 2-day cache and sign them up with a few more BOINC projects that have a long grace period; probably SETI, Spin, QMC, and WCG.

I never get to connect on weekends, even though my clients can continue crunching. So, I truly appreciate the 14-day allowance that E@H provides, as it is very helpful in my situation.

But... would it be possible to increase to a week the timeout between a late result being marked "no reply" and the sending of the "extra" WU that is needed to complete a quorum?

Thus, late results might still have a chance to get credit, and WUs would essentially have a 3-week window to complete. My other 14 results only got credit because I beat the "extra" WU that was allocated to someone else.

Many different countries all over the world have many different holidays at many different times. I was working on Xmas Day while most of the world wasn't, but not last week while most of the world was. It would be nice to have that extra week to accommodate for these cultural differences, and to also accommodate for unforeseen troubles, like server difficulties, at project HQ that may compound together.

Can the buffer between a WU due date and the sending of the "extra" WU be increased?

Ocean Archer
Joined: 18 Jan 05
Posts: 92
Credit: 368644
RAC: 0

The_Shep

Sorry that you lost credit on those WUs Shep, and before I go any further - no, I'm not affiliated with E@H (other than crunching WUs).

Just be careful in adding other projects to your mix, because if you keep a large cache, each project will attempt to fill that cache size and you could get into an "overcommitted" state -- too much work, and too little time. I do work for many projects, but keep a very small cache. My system can normally contact daily if needed, assuming the ISP doesn't go bad. You will have to learn by trial and error just how much work you can accept for the projects you run -- good luck


If I've lived this long - I gotta be that old!

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

RE: Thanks for quick reply

Message 59037 in response to message 59035

Quote:
Thanks for quick reply FalconFly. Are you an admin for E@H? Can you do anything?

No, I'm just a normal User and - besides giving advice - unfortunately can't help.
Since there are currently ca. 153000 Users with 322000 Hosts attached here, I'm afraid there'll be nobody to intervene on one Host missing the Deadline.

The Deadline is basically a 'brick wall': miss it and you lose Credit. That applies to everyone, normally without exception.

The only exception I've ever seen was when Projects were hit by exceptionally long outages (Admins realizing the tremendous amount of work that would be entirely lost if they did not manually disable the deadline for a few days).

PS.
Ocean Archer's tip is absolutely correct: the total amount of Work is (as far as I understand it) downloaded per Project.
E.g. a 2-day Cache on 3 Projects can result in a total of up to 2x3=6 days' worth of work (reduced only by several BOINC-internal factors concerning each individual Computer).
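
As a rough illustration of that arithmetic (a sketch only: the function name is made up here, and the BOINC-internal per-computer factors that reduce the total are not modeled):

```python
def worst_case_buffered_days(cache_days, num_projects):
    """Upper bound on buffered work, assuming every attached project
    fills the cache setting independently (hypothetical helper)."""
    return cache_days * num_projects


# The example from the post: a 2-day cache on 3 projects.
print(worst_case_buffered_days(2, 3))  # 6
```

The same calculation shows why a 7-day cache combined with several extra projects can overshoot a 14-day deadline quite easily.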

Mahbubur
Joined: 31 Mar 06
Posts: 46
Credit: 258468
RAC: 0

Let me explain what I just said.

The point I was trying to make was that a smaller cache would not have prevented the lost credit. Instead, all that would have happened is that you would lose fewer units, but the PC would simply sit idle for longer.

The PC in question only had 550 credit at the time of the thread's creation. So that's around 40 WUs + the 10 lost. If most of those were reported after the outage, the turnaround time would be highly skewed.

But as people have said, the best course of action would be to have a backup project. A 7-day cache shouldn't be a problem if the PCs run 24/7.

FalconFly
Joined: 16 Feb 05
Posts: 191
Credit: 15650710
RAC: 0

Message 59039 in response to message 59038

I don't think so.

If he had a smaller cache, his last downloads to fill it would not have occurred on 19 Dec, but later, around 26 Dec or so.

Anything he downloaded before that time (like on the 19th, as he did) would have been finished before the outage; anything he downloaded later would have had sufficient headroom to endure the few days of outage without busting the deadline.

Looking at the numbers, his cache was about double what it should have been to get past the outage without losing any credit. His avg. return time alone is good proof of that (the outage was only a few days, and those with smaller caches all came through unscathed as far as I can see).
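
To make the date reasoning concrete, here is a minimal sketch. The 14-day deadline and the 19 Dec / 26 Dec download dates come from this thread; the year and the first day on which results could be reported are assumptions chosen purely for illustration, and the helper names are hypothetical.

```python
from datetime import date, timedelta

DEADLINE_DAYS = 14  # deadline length quoted earlier in the thread


def deadline_for(download_day: date) -> date:
    """Due date for work downloaded on download_day."""
    return download_day + timedelta(days=DEADLINE_DAYS)


def misses_deadline(download_day: date, first_report_day: date) -> bool:
    """True if the first chance to report falls after the due date."""
    return first_report_day > deadline_for(download_day)


# ASSUMPTION: the year and first_report_day are illustrative, not from the thread.
first_report_day = date(2007, 1, 5)

for download_day in (date(2006, 12, 19), date(2006, 12, 26)):
    status = "missed" if misses_deadline(download_day, first_report_day) else "in time"
    print(download_day, "->", deadline_for(download_day), status)
```

Under those assumed dates, work fetched on 19 Dec is due on 2 Jan and misses, while work fetched on 26 Dec is due on 9 Jan and still makes it.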
