"Unsent"

wiseworker
Joined: 16 Sep 06
Posts: 7
Credit: 9,945,744
RAC: 0

I have recently returned to this project following a long break caused by an unresolved problem that has since been rectified, only to find that the project has now developed a problem with the validation process. I have an extensive build-up of completed units awaiting validation, and on inspection I find that the other half of each unit is still waiting to be allocated to another computer.

Waiting for another computer to respond is one thing, but waiting in excess of 14 days before the unit is even allocated is more than my patience can stand. Therefore, yet again, I shall be leaving a project that I believe has potential until Einstein@Home can get its act together.

The project has been running long enough for there not to be this sort of hiccup, so this may well be farewell.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,297
Credit: 246,154,894
RAC: 10,791

As far as I can see, the workunits affected are all from the GW search "S6CasA", names starting with "h1_". Is this correct?

This is the search on Einstein@Home that requires the most data to download (and, for that matter, it is the oldest one on E@H). To make things easier for the participants, we developed a system called "locality scheduling" to minimize the download volume for the clients by assigning tasks based on the files you have already downloaded. Another task of the same workunit is sent to a different client only when that client's files closely (or exactly) match the files that are required. BTW, this is why you are likely to get the same "wingman" with every such task. It may well be that no client matching yours has contacted the server in the last week.
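Conceptually, the matching Bernd describes can be sketched as picking the requester that already holds most of a task's data files. This is a toy illustration in Python, not the actual BOINC scheduler code; the data structures, host names, and function names here are invented for the example.

```python
# Toy sketch of locality scheduling: prefer the host that already holds
# the data files a task needs, so extra download volume stays minimal.
# NOT the real BOINC server logic -- an illustration only.

def extra_download(task_files, host_files):
    """Proxy for download cost: how many required files the host lacks."""
    return len(task_files - host_files)

def pick_host(task_files, hosts):
    """Choose the host with the smallest missing-file count.

    hosts: dict mapping host id -> set of data files already on disk.
    Returns (host_id, missing_count).
    """
    return min(
        ((hid, extra_download(task_files, files)) for hid, files in hosts.items()),
        key=lambda pair: pair[1],
    )

hosts = {
    "alice": {"h1_0415.20", "h1_0415.25"},  # already has the needed files
    "bob":   {"h1_0900.10"},                # would need a full download
}
task = {"h1_0415.20", "h1_0415.25"}

print(pick_host(task, hosts))  # ('alice', 0) -- alice needs nothing new
```

A task's second copy simply waits until some requesting host scores low enough on this cost, which is why an "unsent" replica can sit for days.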

Seeing on the server status page that there are 170k tasks in progress and only 25k unsent, I'd judge that this is not a general problem, though it might be annoying to you personally. I'll have a look at it in any case.

If you care about fast validation and credit, this search is probably not the right one for you to run. You may opt out of it in the Einstein@Home preferences.

BM

wiseworker
Joined: 16 Sep 06
Posts: 7
Credit: 9,945,744
RAC: 0

Yes, all the affected units do start with 'h1'.

I did not realise that any form of selection was made to allocate suitable units to individual crunchers, or that the second half of a unit required for validation would have to wait for a suitable companion.

That being said, I cannot understand why there should be a backlog of unsent units among those awaiting validation. It does not take a rocket scientist, or a politician, to work out that if a unit comes in two halves, then once the first half is allocated, the second half should be allocated to the first suitable cruncher, ahead of that cruncher being allocated a new unit.

If there is, as you say, a considerable backlog of second-half units waiting to be allocated, why on earth was I issued with 'new' units? I would be just as happy spending my computer time validating 'old' units. Unless some skulduggery is going on, one validated unit is worth a dozen or more unvalidated units.

As for waiting for fast credit, that is of no consideration. It does not matter which project I involve myself with; there will always be crunchers who fail to complete units to contend with.

Holmis
Joined: 4 Jan 05
Posts: 1,118
Credit: 1,055,935,564
RAC: 0

As Bernd tried to explain, tasks are just instructions on how to analyze a big set of data files.
So the first time I download a task I also have to download a big set of files to analyze; this set might be in excess of 100 MB in size. The next time I ask for work, the server tries to assign me a task that uses the same big data set, to save me from having to download another 100 MB or so; the tasks themselves are only a few bytes. Each data set can be analyzed in different ways by many tasks; I've seen data sets with more than 1000 tasks, at least in previous searches.

So this means that the server will wait for other participants that have the right data set before assigning the unsent task, to try to minimize the downloads required. In previous searches there has been some limit on how long the server will wait before it just sends the task to the next host that asks for work for this search. Maybe someone from the project can tell us how long this might be?

If you look at a task name like this one:
h1_0415.20_S6Directed__S6CasAf40a_415.6Hz_87_0
The number 87 close to the end indicates that there are another 86 tasks after this one before the data set is completely analyzed, so theoretically I could get 86 more tasks before I have to make another big download. This won't happen, as I'm not the only one asking for work for this data set. This number usually starts high and then goes down as more and more tasks from the same data set are completed.
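The fields Holmis is reading off can be pulled out of a task name with a small parser. This sketch assumes every name follows the exact underscore-separated pattern shown above; the real naming scheme may have more variants.

```python
# Parse an Einstein@Home GW task name of the form seen above:
#   h1_0415.20_S6Directed__S6CasAf40a_415.6Hz_87_0
# Assumption: the second-to-last field is the sequence number within
# the data set, and the last field is the replica (wingman) index.

def parse_task_name(name):
    parts = name.split("_")
    return {
        "dataset_prefix": "_".join(parts[:2]),  # e.g. 'h1_0415.20'
        "sequence": int(parts[-2]),             # e.g. 87
        "replica": int(parts[-1]),              # 0 = first copy sent out
    }

info = parse_task_name("h1_0415.20_S6Directed__S6CasAf40a_415.6Hz_87_0")
print(info["sequence"], info["replica"])  # 87 0
```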

wiseworker
Joined: 16 Sep 06
Posts: 7
Credit: 9,945,744
RAC: 0

That still does not explain WHY the vast majority of units assigned to me were 'new', when the exact same achievement would have been made if I had received units that were waiting for a second computer to validate, thus reducing the backlog rather than increasing it, which surely must be an advantage to the project.

Or is that too simple?

Mike Hewson
Moderator
Joined: 1 Dec 05
Posts: 6,568
Credit: 293,807,431
RAC: 70,542

Quote:

That still does not explain WHY the vast majority of units assigned to me were 'new', when the exact same achievement would have been made if I had received units that were waiting for a second computer to validate, thus reducing the backlog rather than increasing it, which surely must be an advantage to the project.

Or is that too simple?


Yes, it is too simple. :-)

The vast majority of our contributors worldwide - and thus of our donated computing power - are on 28K dial-up access. Our policy of locality scheduling ( minimising a user's total downloads by requesting that they work upon data they already have ) helps to maintain their participation. That can be annoying for those of us who have access to considerably better resources, either in computation speed or communication bandwidth. Our user population is pyramidal in terms of these capabilities - the commonest hardware is the slowest - and if we configured our work allocation to suit the upper/peak parts of the pyramid, we would actually exclude the base.

It is certainly true that if we could micromanage allocations then more efficiency would likely result, but by definition we have limited ( hardware and human ) resources server side -> that's why we are a volunteer project! :-)

Cheers, Mike.

I have made this letter longer than usual because I lack the time to make it shorter ...

... and my other CPU is a Ryzen 5950X :-) Blaise Pascal

wiseworker
Joined: 16 Sep 06
Posts: 7
Credit: 9,945,744
RAC: 0

My postings are obviously being answered by intelligent people, therefore the problem with communication has got to be with me.

I understand what you are saying about catering mainly for the slower/weaker links. Although I am a returning cruncher, I am returning with a new computer and a different internet link than I previously used, so your description of 'getting to know me and my capability' would be valid.

Except that this did not happen: it is quite clear that you have the capability to assess my capability immediately, since you were able, in my first 30 allocated units, to include 24 of these, if I dare call them so, 'time-consuming' 'h1's. I had no objection to being allocated this type of unit; in fact, I actually welcomed them.

My complaint, however, is not about the allocation of these units, because I am sure you are doing a marvellous job of allocating 'new' units to the right people, but you seem to be doing an absolutely lousy job in managing the backlog of units waiting to be allocated for validation purposes. Before sending out a new unit, your system should be checking for, and giving priority to, units waiting to be issued to their second cruncher. I agree units may be held up by crunchers who, although they have the capability, do not use it to its fullest extent, or who most probably request more work than can be processed in the allotted time, but that is a separate issue.

As I said in a previous post, one validated and completed unit has got to be worth a dozen or more units being held in limbo.

fadedrose
Joined: 6 Apr 13
Posts: 263
Credit: 316,405
RAC: 0

I like fast validation and credit, but I don't choose it over having a huge pending file where I'm ahead of the game. I would not like to be so slow that I don't have a pending file.

Now that I understand partners and their resources better, I'm just glad to be able to finish quickly and not hold them up any more than their PC or internet connection holds them up (not to mention lightning and Father Nature (Mother Nature doesn't do bad stuff)).

Betreger
Joined: 25 Feb 05
Posts: 988
Credit: 1,491,378,380
RAC: 696,258

Quote:
As I said in a previous post, one validated and completed unit has got to be worth a dozen or more units being held in limbo.


This is not a sprint; it is an endurance contest.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,869
Credit: 112,909,348,195
RAC: 36,061,386

Quote:
My postings are obviously being answered by intelligent people, therefore the problem with communication has got to be with me.


I don't think there's any problem with communication from either side. There seems to be a problem with unrealistic expectations about the capability of the back-end software and the extent to which the back-end hardware would need to be resourced in order to deliver that capability. Not to mention the manpower resources needed to write the code for the additional functionality.

Quote:
... it is quite clear that you have the capability to assess my capability immediately, since you were able, in my first 30 allocated units, to include 24 of these, if I dare call them so, 'time-consuming' 'h1's.


The scheduler doesn't decide to send 'time consuming' tasks to 'more capable' computers and 'non-time-consuming' tasks to computers with 'lower capability'. That is a complete misunderstanding of what the scheduler does. In any case, S6CasA tasks are not particularly 'time consuming', and there is no great variation in crunch time within that group. FGRP2 tasks take more than twice the time of S6CasA tasks, in the main.

What the scheduler does do is send only those types of tasks that your preferences allow. If you haven't restricted your preferences, you will get a mix of all types of tasks for which you have suitable hardware. The scheduler has 'rules' about the ratios to be used in issuing the mix of various types of tasks. The Devs can (and do) change these ratios if they need to for any reason.
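The mix-by-ratio idea can, conceptually, be pictured as a weighted draw over the searches a user's preferences allow. Again, this is a toy sketch: the ratio values and search weights below are invented for illustration, and the real scheduler also checks hardware requirements and available work before drawing.

```python
import random

# Toy sketch of mixing task types by configured ratios.
# The 3:1 ratio here is hypothetical, not a real project setting.
RATIOS = {"S6CasA": 3, "FGRP2": 1}

def next_task_type(allowed, ratios=RATIOS, rng=random):
    """Pick a task type among those the user's preferences allow."""
    choices = [t for t in ratios if t in allowed]
    weights = [ratios[t] for t in choices]
    return rng.choices(choices, weights=weights, k=1)[0]

rng = random.Random(42)
draws = [next_task_type({"S6CasA", "FGRP2"}, rng=rng) for _ in range(1000)]
# With a 3:1 ratio, roughly three quarters of the draws should be S6CasA.
```

A user who deselects a search in their preferences simply shrinks the `allowed` set, which is all "opting out" needs to mean at this level.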

Quote:
... you seem to be doing an absolutely lousy job in managing the backlog of units waiting to be allocated for validation purposes.


Why do you think this needs to be 'managed'? Have you worked out the 'cost' of doing this, and can you identify any 'gains'? If you look at the current stats on the server status page for the S6CasA run, you will find that we are about 8% into a run that is currently projected to take 435 days. In the overall scheme of things, it really makes no difference to the scientific goals of the run if a duplicate task is issued immediately or in a few days' time. And I take issue with the statement in your initial post that you were '... waiting in excess of 14 days before the unit is allocated ...', since your computer joined the project only approximately 9 days before you made that statement, so a 14-day wait is quite impossible.

I've been a volunteer with this project for many years and I've seen many previous GW runs where locality scheduling has been a lifesaver, both to the volunteer and to the project. It saves enormous bandwidth for both. It has the very minor side effect (particularly early in the run) of leaving some tasks 'unsent' for a relatively short time. The longest 'unsent' time I've ever noticed was about 8-10 days, if I remember correctly. As the run gets into full swing, this phase will surely pass, as it has with each previous run.

Quote:
As I said in a previous post, one validated and completed unit has got to be worth a dozen or more units being held in limbo.


I'm sorry, but this is completely wrong. Any completed task is just as valuable as any other completed task. The system will take care of the validation process in its own good time.

Cheers,
Gary.
