New (Albert) application and workunits

Ingleside

Joined: 23 Jan 05

Posts: 33

Credit: 82113555

RAC: 0

RE: I still suspect

29 Dec 2005 14:49:35 UTC

Message 22613 in response to message 22605

Quote:

I still suspect something is up with the servers: I have a number of work units listed in my account where one of the three results is listed as "Unsent", and in one of them, two results are "Unsent". How a quorum is going to be formed with that one is beyond me ...

Einstein@home preferentially sends results to users who already have the corresponding large input file. For this reason, a gap of a couple of days between sending out the first and the last result of a wu is normal.

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

Paul D. Buck

Joined: 17 Jan 05

Posts: 754

Credit: 5385205

RAC: 0

RE: RE: I still suspect

29 Dec 2005 16:54:10 UTC

Message 22614 in response to message 22613

Quote:

Quote:
I still suspect something is up with the servers: I have a number of work units listed in my account where one of the three results is listed as "Unsent", and in one of them, two results are "Unsent". How a quorum is going to be formed with that one is beyond me ...

Einstein@home preferentially sends results to users who already have the corresponding large input file. For this reason, a gap of a couple of days between sending out the first and the last result of a wu is normal.

Quite bizarre ... especially as there are many mixed into the sequence where all 3 (or 4) are sent virtually simultaneously. Well, weird anyway ... then again, I never claimed to understand how all this stuff works ...

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: [EDIT December

29 Dec 2005 21:59:11 UTC

Message 22615

Quote:

[EDIT December 27]
Quote:
Is it intentional that the target number of results is three rather than the old value of four?

Yes, this is intentional. It may slow down result validation in some cases but will increase our computing power by ~ 25%.

You are underselling here.

Say you have 12 users. They used to crunch 3 different WUs with a replication of 4; now they crunch 4 different WUs, each replicated 3 times. Throughput going from 3 to 4 is a 33% rise.
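The arithmetic can be spelled out in a few lines of Python. This is a sketch assuming a fixed pool of 12 host-slots, each returning one result; the function names are invented for illustration. The complementary figure, the crunching saved on each individual WU, comes out at 25%, which may be where the ~25% in the announcement comes from.

```python
from fractions import Fraction

def wus_per_pool(host_slots: int, replication: int) -> int:
    """Distinct WUs a pool of host-slots covers at a given replication."""
    return host_slots // replication

def throughput_gain(old_rep: int, new_rep: int) -> Fraction:
    """Relative rise in distinct WUs completed when replication is lowered."""
    return Fraction(old_rep, new_rep) - 1

def saving_per_wu(old_rep: int, new_rep: int) -> Fraction:
    """Fraction of crunching saved on each individual WU."""
    return 1 - Fraction(new_rep, old_rep)

print(wus_per_pool(12, 4), "->", wus_per_pool(12, 3))  # 3 -> 4
print(throughput_gain(4, 3))  # 1/3, i.e. a 33% rise
print(saving_per_wu(4, 3))    # 1/4, i.e. 25% less work per WU
```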

Thank you very much for this change. It is one I asked for, as did many others, though not nearly as many as asked for the real-time clock in the screensaver ;-)

~~gravywavy

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: Of course, with the

29 Dec 2005 22:15:07 UTC

Message 22616 in response to message 22589

Quote:

Of course, with the better versions out there I have no idea why anyone would still use 4.19 ... :)

On a low-resolution screen, the 4.19 manager made much better use of screen space than anything since. Those buttons are so enormous that at 640x480 you can't even attach to a project, because you can't reach the button. Talk about shunning users of old equipment.

Why the next manager can't drop the buttons and use right-click context menus beats me. That is how it should have been done in the first place: let the OS decide how to fit the menu on the desktop when it is clicked. It would also make the manager accessible to those who need very large font sizes. Come on, fixed-layout interfaces should be a no-no.

Also, some people liked the graphics slider showing progress (still available via BOINCview, by the way).

But those are the only areas where 4.19 still wins.

IMO there is no advantage at all to the 4.19 client.

And just to be clear: no, I don't still use it. The advantages of later clients encouraged me to upgrade at whichever point EDF was working reasonably sensibly.

R~~

~~gravywavy

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: RE: RE: I still

29 Dec 2005 22:41:41 UTC

Message 22617 in response to message 22614

Quote:

Quote:
Quote:
I still suspect something is up with the servers: I have a number of work units listed in my account where one of the three results is listed as "Unsent", and in one of them, two results are "Unsent". How a quorum is going to be formed with that one is beyond me ...

Einstein@home preferentially sends results to users who already have the corresponding large input file. For this reason, a gap of a couple of days between sending out the first and the last result of a wu is normal.

Quite bizarre ... especially as there are many mixed into the sequence where all 3 (or 4) are sent virtually simultaneously. Well, weird anyway ... then again, I never claimed to understand how all this stuff works ...

I explained this just 213 days ago and some people have forgotten already ;-)

Seriously, Paul: one day I will keep my promise to start contributing to the wiki, but in the meantime, if you'd like to find a place for this, that would be great.

My guess is that this is a non-deliberate side-effect of other scheduling rules. The patent on all such unintended side effects is held by Murphy.

Consider

rule 1 - wherever possible assign work from the data the client already holds

rule 2 - don't assign consecutive wu to the same pairings of computers

Rule 1 reduces download times, which are bad enough on E@h anyway. Rule 2 means that redundancy is spread out, to reduce the chance of two computers repeatedly making the same mistake on the same wu. Let me be clear: I don't know that rule 2 exists; I am 'reverse engineering' it from what seems to happen.

Rule 1 certainly exists and is also known as locality scheduling (thanks to John Keck for that).

Now suppose A (by luck) is the first computer to be assigned work from a new dataset.

Eventually along comes B, who has no more wu to be assigned from their old data, and they are assigned wu from the same dataset as A. Because of rule 2, B will only be assigned one wu that is shared with A. B's next wu after that will be a different wu from the same dataset. Meanwhile A may well want a second wu.
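The A/B walk-through can be replayed with a small model of the two guessed rules. This is a hypothetical Python sketch, not the real BOINC scheduler: rule 1 is taken as given (every WU in the dictionary uses the data file the host already holds), rule 2 is modelled as "never put two hosts on a second WU together", and the function names are invented.

```python
from typing import Dict, Set

def violates_rule2(assignments: Dict[int, Set[str]], host: str, wu: int) -> bool:
    """True if putting `host` on `wu` would pair it with a host it already
    shares a different WU with (the guessed rule 2)."""
    partners = assignments[wu]
    return any(
        other != wu and host in hosts and bool(hosts & partners)
        for other, hosts in assignments.items()
    )

def assign(assignments: Dict[int, Set[str]], host: str, replication: int) -> int:
    """Hand `host` one result from the current dataset.

    Rule 1 is taken as given: every WU in `assignments` uses the data file the
    host already holds. Open WUs are tried in order; if none is compatible with
    rule 2, a fresh WU is started for this host."""
    for wu in sorted(assignments):
        hosts = assignments[wu]
        if host in hosts or len(hosts) >= replication:
            continue
        if not violates_rule2(assignments, host, wu):
            hosts.add(host)
            return wu
    new_wu = max(assignments, default=0) + 1
    assignments[new_wu] = {host}
    return new_wu

wus: Dict[int, Set[str]] = {}
for host in "ABAB":  # A, then B, then each asks for more work
    print(host, "gets wu", assign(wus, host, replication=4))
```

Run as written, A and B share wu 1, then A opens wu 2 and B, barred from joining A again, opens wu 3, which is the pattern described in the post.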

Then along come C, D and E; each will be assigned at most one WU in common with any other computer. We might have this picture just after G gets their first wu from this dataset:

With an initial allocation of 4 we get:

wu 1 : A, B, C, D
wu 2 : B
wu 3 : A, E, F, G
wu 4 : A
wu 5 : B, E
wu 6 : A
wu 7 : C, E
wu 8 : B, F
wu 9 : A
wu 10: D, F
wu 11: C
wu 12: B
wu 13: A
wu 14: D
wu 15: C
wu 16: B
wu 17: A
wu 18: C
wu 19: D
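The table can be checked against the guessed rule 2 mechanically. A small Python sketch, with the dictionary transcribing the table above and helper names invented for illustration: it lists any pair of hosts that ends up sharing more than one WU, and counts how many results would still show as "Unsent", which is the situation described at the start of this thread.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# gravywavy's table, transcribed: wu number -> hosts holding a result.
TABLE: Dict[int, str] = {
    1: "ABCD", 2: "B", 3: "AEFG", 4: "A", 5: "BE", 6: "A", 7: "CE",
    8: "BF", 9: "A", 10: "DF", 11: "C", 12: "B", 13: "A", 14: "D",
    15: "C", 16: "B", 17: "A", 18: "C", 19: "D",
}

def repeated_pairs(alloc: Dict[int, str]) -> List[Tuple[Tuple[str, str], int, int]]:
    """List (pair, first_wu, later_wu) for every host pair sharing two WUs."""
    first_seen: Dict[Tuple[str, str], int] = {}
    repeats = []
    for wu, hosts in sorted(alloc.items()):
        for pair in combinations(sorted(hosts), 2):
            if pair in first_seen:
                repeats.append((pair, first_seen[pair], wu))
            else:
                first_seen[pair] = wu
    return repeats

def unsent(alloc: Dict[int, str], replication: int) -> int:
    """Results still listed as "Unsent" across the whole table."""
    return sum(replication - len(hosts) for hosts in alloc.values())

print(repeated_pairs(TABLE))        # [] : no pair of hosts shares two WUs
print(unsent(TABLE, replication=4)) # 47
```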

Eventually there are enough people on board that all wu get all their results issued close together. It is only around the startup of a new data file or a new app that I'd expect to see this kind of effect.

Question for a mathematician: what is the smallest N such that N results can be given to 4N people and no result given to the same two people as any other result? After N results, you would expect the issuing of results to start looking sensible instead of all over the place.

But notice that the very first WU, and in fact several others along the way, will get all their results sent out together even while others are kept in solo-crunch state for ages.

Bruce: from the project's point of view, the thing to notice is that you will not get good turnaround on very small batches of WU if your servers are keeping them back for those who've already seen those datasets. That N seems to define the minimum batch size, if, of course, my guess about the scheduler's decision rules is right. If the rules are different, someone just needs to dry-run on paper how many wu / hosts need to go through the process before there are enough returning hosts to make the wu fly out the door nicely.

R~~

~~gravywavy

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: Question for a

30 Dec 2005 0:10:41 UTC

Message 22618 in response to message 22617

Quote:

Question for a mathematician: what is the smallest N such that N results can be given to 4N people and no result given to the same two people as any other result?

No, my mistake: with N results and 4N hosts it's always possible, since everyone gets just one wu! The question I meant to ask is at what point you stop needing almost as many wu as hosts, or something like that. This effect does go away after a startup period, and there must be some way to work out the switch-over point, but it's around midnight and I can't think it through...
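One way into the corrected question is a counting argument. This is a standard packing bound from design theory, not something worked out in the thread, and it only gives a ceiling: with H hosts and replication r, a host on k complete WUs needs k(r-1) distinct partners, so k is at most floor((H-1)/(r-1)), and counting host-slots then caps the number of complete WUs. A Python sketch, assuming rule 2 as guessed above (the function name is invented):

```python
def max_full_wus(hosts: int, replication: int) -> int:
    """Ceiling (a Johnson-type packing bound) on how many WUs can have all
    `replication` results issued to `hosts` hosts when no two hosts may share
    more than one WU. It is an upper limit, not always achievable."""
    if replication < 2:
        raise ValueError("rule 2 only bites with replication >= 2")
    # A host on k full WUs meets k * (replication - 1) distinct partners,
    # and there are only hosts - 1 others, so k is capped per host.
    per_host = (hosts - 1) // (replication - 1)
    # Each full WU uses `replication` host-slots; total slots cap the WU count.
    return hosts * per_host // replication

for h in (4, 7, 10, 13, 25):
    print(h, "hosts: at most", max_full_wus(h, 4), "fully issued WUs")
```

The ceiling grows roughly with the square of the host count, which fits the hunch that the effect fades once enough hosts have joined a dataset.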

R~~ zzzz

~~gravywavy

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: The new WU have

30 Dec 2005 0:42:55 UTC

Message 22619

Quote:

The new WU have different execution times, typically ranging from about 25% to 100% the previous execution times

Hi again, Bruce. If I understand correctly, the different run times originate from the different frequencies, which are known at the outset. Is this right?

If so, it would be very useful for the new WU to come with predicted run times that scale with their actual run times, because that lets the schedulers fill the work cache accurately. If the wu vary in length and the variation is known to the scheduler / client, that is no problem: you just get differing numbers of wu on each connect.

To keep working over a weekend, for example, 2.7 days' work is fine if I know it will be 2.7 days.

If, however, the variation is not known (or knowable) by the scheduler and client at download time, say if it can vary by a factor of four, I'd have to ask for extra work in case the work issued ran short. If it then ran long, I might get into deadline issues, or it might put other projects into EDF, and so on.

So accurate estimates of run lengths, please, based on your testers' experience of crunching the test WU. As accurate as possible: if the science means the numbers can't be predicted, then we'd all have to live with that.
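A hedged Python sketch of why known estimates matter for filling the cache. The 0.5-day estimate and the offered list are invented; the 25% lower bound is the low end of the range Bruce quoted. With per-WU estimates the client simply takes however many WUs add up to 2.7 days. With only one blanket estimate it must pad the request to cover the short case, and if the WUs then run long it holds several times the intended work, which is where deadline trouble and EDF come from.

```python
import math
from typing import List

def fill_with_estimates(estimates_days: List[float], cache_days: float) -> List[float]:
    """Known per-WU estimates: take WUs until their sum covers the cache."""
    taken: List[float] = []
    for est in estimates_days:
        if sum(taken) >= cache_days:
            break
        taken.append(est)
    return taken

def padded_request(cache_days: float, est_days: float, low_frac: float) -> int:
    """Single blanket estimate, real run time possibly only low_frac of it:
    request enough WUs that even the shortest case fills the cache."""
    return math.ceil(cache_days / (est_days * low_frac))

offered = [0.5, 0.125, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5]  # invented estimates, days
print(len(fill_with_estimates(offered, 2.7)), "WUs cover 2.7 days")
n = padded_request(2.7, est_days=0.5, low_frac=0.25)
print(n, "WUs requested; if all run long that is", n * 0.5, "days")
```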

River~~
R~~

~~gravywavy

genes

Joined: 10 Nov 04

Posts: 41

Credit: 2867568

RAC: 8858

RE: rule 1 - wherever

30 Dec 2005 1:50:52 UTC

Message 22620 in response to message 22617

Quote:

rule 1 - wherever possible assign work from the data the client already holds

*This* would explain why a new machine I just attached is getting nothing but Albert WUs while all my older machines get nothing but the original Einsteins.

(not that I'm complaining, just curious)

-Gene

gravywavy

Joined: 22 Jan 05

Posts: 392

Credit: 68962

RAC: 0

RE: RE: rule 1 -

30 Dec 2005 18:35:58 UTC

Message 22621 in response to message 22620

Quote:

Quote:

rule 1 - wherever possible assign work from the data the client already holds

*This* would explain why a new machine I just attached is getting nothing but Albert WUs while all my older machines get nothing but the original Einsteins.

(not that I'm complaining, just curious)

-Gene

Correct. Bruce said that the allocations are random, but you only go into the draw when there are no wu to crunch from the data you already have.

You may have noticed that you get runs of WU whose names start the same way. It is only at the changeover from one such set of WU to another that you have any chance of getting an Albert. Presumably Alberts also come in batches attached to different datasets, in which case, when your computer can't get any more of the same set of Alberts, it may well revert to the Einsteins at that point.

Dial-up users may have noticed that they get very long connect times when the wu name changes over; this is because a huge chunk of new data is downloaded. At all other times, the instructions for the next wu simply tell the app to do something different with the data already on disk.
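The allocation rule described here can be modelled in a few lines. This is a hypothetical Python sketch, not BOINC's actual scheduler code, and the dataset and WU names are invented: the host keeps drawing from data it already holds, enters the random draw only when that runs dry, and pays a big download exactly at the changeover.

```python
import random
from typing import Dict, List, Optional, Set, Tuple

def next_wu(host_files: Set[str], pending: Dict[str, List[str]],
            rng: random.Random) -> Optional[Tuple[str, str, bool]]:
    """Return (dataset, wu_name, big_download) for one work request, or None.

    Assumed behaviour: serve work from a data file the host already holds;
    only when those are exhausted does the host enter a random draw over the
    datasets that still have unsent work, which costs a large download."""
    local = [d for d in sorted(host_files) if pending.get(d)]
    if local:
        dataset, download = local[0], False
    else:
        candidates = sorted(d for d, wus in pending.items() if wus)
        if not candidates:
            return None
        dataset, download = rng.choice(candidates), True
        host_files.add(dataset)
    return dataset, pending[dataset].pop(0), download

# Hypothetical names: one old Einstein data file, one Albert batch.
pending = {"einstein_old": ["e_1", "e_2"], "albert_new": ["a_1", "a_2", "a_3"]}
held = {"einstein_old"}
rng = random.Random(1)
while (job := next_wu(held, pending, rng)) is not None:
    print(job)
```

Only the third request in this run reports a big download; the others reuse the data already on disk.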

Bruce: this makes me think of something else for you to consider...

When Einstein is finally withdrawn, there will be a spate of the server dishing out one-off wu: odds and ends from the old datasets. Old-timers will remember this happening in previous changeovers. Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WU.

The advantage of BOINC, of course, is that even if your primary loyalty is with one project, you can easily go elsewhere and come back to avoid temporary issues. With a warning like that, dial-up users are more likely to come back than if they are not warned and leave in a tizzy over costs. In my opinion :-)

River~~

~~gravywavy

Marck

Joined: 11 Feb 05

Posts: 9

Credit: 23428347

RAC: 0

RE: If so it would be very

30 Dec 2005 23:06:16 UTC

Message 22622

Quote:

If so, it would be very useful for the new WU to come with predicted run times that scale with their actual run times, because that lets the schedulers fill the work cache accurately. If the wu vary in length and the variation is known to the scheduler / client, that is no problem: you just get differing numbers of wu on each connect.

It seems the core client is already aware of the different run times: right now, I've got Albert results waiting in the cache that have different "To completion" times.

Quote:

Some dial-up users got understandably upset, so it may be an idea to post warnings when that is about to happen, and suggest tactfully that the project would understand if its dial-up donors took a month crunching elsewhere, and let the ADSL folk cope with the spate of long downloads on consecutive WU.

I remember that this was already done for the previous changeover (see the news item of April 7, 2005 in the archive), so I see no reason why it wouldn't be done this time, too. :)

Forums › Cruncher's Corner
