FGRPB1G tasks on my machine vs server status on web

GWGeorge007
Joined: 8 Jan 18
Posts: 391
Credit: 686,222,996
RAC: 2,221,537
Topic 224137

Another "greetings" from a once again perplexed E@H user.

Looking at my tasks for my 3950X machine, I find that I've received just 26 TOTAL tasks for FGRPB1G (out of 2507 in all), 10 in pending and 16 in completed, and the last task I received was on Nov 28.

Does this mean my 3950X machine has a cache size too large because I held onto the tasks to get a large total number?

Looking at the server status web page, I see that FGRPB1G has approximately the same number of total tasks as O2MDF (~1.2M vs. ~1.4M) with the largest difference being that O2MDF has ~200K more tasks failed than FGRPB1G.  With that said, I find it odd that there are still ~180k tasks to send for O2MDF yet only ~4k tasks to send for FGRPB1G.

Would this mean that FGRPB1G is going by the wayside? 

Any help would be appreciated.

George

GWGeorge007
Joined: 8 Jan 18
Posts: 391
Credit: 686,222,996
RAC: 2,221,537

I happened to look at this web page https://einsteinathome.org/content/fgrp and found out that Fermi Gamma-ray Pulsars are relatively rare which might explain some of my questions.

For those that are too impatient (or lazy) to read it all, I'll just quote this one sentence:

"On average only 10 photons per day are detected from a typical pulsar by the LAT onboard the Fermi spacecraft."

George

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,494
Credit: 65,680,163,617
RAC: 54,249,132

George wrote:
Does this mean my 3950X machine has a cache size too large because I held onto the tasks to get a large total number?

No, not at all.

Before replying, I looked at your current list of tasks. You have both GRP and GW searches enabled for both CPU and GPU.  For CPU tasks in your list, the current GRP/GW numbers are 59/767 whilst for GPU they are 28/1622.  These are totals for each type, which includes both 'in progress' and 'completed' types.  Most of the numbers are in the form of completed work and how long such work is 'held' in the online database before being deleted (and is therefore viewable) is at the discretion of the project and their concerns about disk space and database size, etc.

The main reason for the apparent huge disparity (both CPU and GPU tasks) between the GRP and GW numbers  is not something you can modify, if you leave all searches enabled.  When you make a work request with all these types 'allowed' by your preferences, the scheduler gets to make the decision about what particular search type to send tasks for.

The GW search is the highest priority (from the project's viewpoint) so the scheduler will heavily 'prefer' to send you that type, if it is allowed to by your settings.  That's simply it.  It has nothing to do with the current 'ready to send' task numbers (unless they suddenly become zero) :-).  There's no immediate risk (that I know of) of either type of work running out in the near future.

George wrote:
Looking at the server status web page, I see that FGRPB1G has approximately the same number of total tasks as O2MDF (~1.2M vs. ~1.4M) with the largest difference being that O2MDF has ~200K more tasks failed than FGRPB1G.  With that said, I find it odd that there are still ~180k tasks to send for O2MDF yet only ~4k tasks to send for FGRPB1G.

GRP tasks are very quick to generate when needed.  For that reason, the 'ready to send' numbers are always quite low.  As they get used up, a workunit generator will spring to life and create a new batch.  The status page is a snapshot taken every 10 minutes or so.  Quite often, workunit generators will show as 'not running' at the time a snapshot is taken.

For the FGRPB1G search, a snapshot number of 4K would be quite normal.  As soon as that number drops below some 'trigger point' (maybe something like 1K - I don't know) the WU generator will spring to life and quickly top up to some upper limit, probably not far above the number you mention - maybe 5 - 10K or something like that.
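To make the 'trigger point' idea concrete, here is a minimal sketch of a low-water/high-water refill loop.  It is only an illustration of the behaviour described above, not the project's actual workunit generator.  The names LOW_WATER, HIGH_WATER, refill and simulate are hypothetical, and the threshold values are just the guesses mentioned above (about 1K and about 5K).

```python
# Hypothetical thresholds, taken from the guesses in the post above.
LOW_WATER = 1_000    # assumed trigger point
HIGH_WATER = 5_000   # assumed upper limit after a top-up


def refill(ready_to_send: int, low: int = LOW_WATER, high: int = HIGH_WATER) -> int:
    """Return how many new tasks this sketch would generate right now.

    Nothing is generated until the pool drops below the trigger point;
    then the pool is topped up to the upper limit in one batch.
    """
    if ready_to_send >= low:
        return 0
    return high - ready_to_send


def simulate(start: int, demand_per_tick: int, ticks: int) -> list:
    """Track the 'ready to send' count while hosts drain the pool."""
    ready = start
    history = []
    for _ in range(ticks):
        ready = max(0, ready - demand_per_tick)  # tasks handed out to hosts
        ready += refill(ready)                   # generator wakes only if needed
        history.append(ready)
    return history


if __name__ == "__main__":
    # Starting at 4K ready tasks with a made-up demand of 1500 per tick.
    print(simulate(start=4_000, demand_per_tick=1_500, ticks=4))
```

In a model like this the 'ready to send' count stays small and the generator is idle most of the time, which fits with a status snapshot often showing it as 'not running'.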

For the GW search, locality scheduling requires that the 'ready to send' work being held must cover a large range of different frequencies so that the needs of any random host asking for a particular frequency can be met without having to supply a completely new set of large data files each time.  So it stands to reason that there will always be a much larger set of 'ready to send' O2MDF tasks, just to cover that large range of different frequencies.

Cheers,
Gary.

GWGeorge007
Joined: 8 Jan 18
Posts: 391
Credit: 686,222,996
RAC: 2,221,537

Once again, I thank you Gary.  Your explanations are always easy to understand, at least for me.

So, if I am reading this right I shouldn't be worried about the settings in my 'config' files or 'cache', right?

Just one more simple question.  Could you explain the difference between a task and a work unit?  I tend to get them crossed now and then, and I'm not sure I fully understand.

Thanks again!

George

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,494
Credit: 65,680,163,617
RAC: 54,249,132

I didn't notice anything of concern when I looked (very briefly) at your full list of tasks.

If you follow the above link, you'll see that list.  Notice the headings, Task ID, Workunit ID, Sent, ... etc.  A task is your individual copy of the job to be done.  A workunit is the full list of all copies of that task which have been sent to different computers.  By default, this is normally two copies (the initial replication) but this can expand if there are problems.

I just clicked on the oldest workunit ID currently still showing - 500536726 which actually shows 7 identical tasks that were eventually needed to find just two that agreed and gave a validated result.  Yours was one of those two.  You will notice that 5 had some sort of problem that resulted in the extra task copies being created and sent out.  If you really want to be anal, you can use the sent and received times as listed to trace the order/history of what actually happened with all those tasks :-).

That full set of 7 tasks that comprise the workunit is also referred to as the 'quorum' :-).
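If it helps to see the task/workunit relationship as a data structure, here is a rough sketch.  It is not how the server stores things; the class names, the outcome labels, the example task names and the needed_agreeing field are all hypothetical.  Only the workunit ID and the 7/5/2 counts come from the example above, and the order of the outcomes is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One copy of the job, sent to one computer."""
    name: str     # hypothetical name, not a real task name
    outcome: str  # hypothetical labels: "valid" or "problem"


@dataclass
class Workunit:
    """The job itself, plus every task copy issued for it."""
    workunit_id: int
    tasks: List[Task] = field(default_factory=list)
    needed_agreeing: int = 2  # assumed: two results must agree

    def valid_count(self) -> int:
        return sum(1 for t in self.tasks if t.outcome == "valid")

    def is_validated(self) -> bool:
        return self.valid_count() >= self.needed_agreeing


# The example workunit: 7 copies issued, 5 with some problem, 2 that agreed.
# Which copies had the problem is invented here for illustration.
outcomes = ["problem"] * 5 + ["valid"] * 2
wu = Workunit(500536726, [Task(f"example_{i}", o) for i, o in enumerate(outcomes)])

if __name__ == "__main__":
    print(len(wu.tasks), wu.valid_count(), wu.is_validated())
```

So the Task ID column on the website identifies one copy, and the Workunit ID column identifies the group that copy belongs to.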

Hang in there, you'll get used to the lingo :-).

Cheers,
Gary.

GWGeorge007
Joined: 8 Jan 18
Posts: 391
Credit: 686,222,996
RAC: 2,221,537

Gary Roberts wrote:

I didn't notice anything of concern when I looked (very briefly) at your full list of tasks.

If you follow the above link, you'll see that list.  Notice the headings, Task ID, Workunit ID, Sent, ... etc.  A task is your individual copy of the job to be done.  A workunit is the full list of all copies of that task which have been sent to different computers.  By default, this is normally two copies (the initial replication) but this can expand if there are problems.

I just clicked on the oldest workunit ID currently still showing - 500536726 which actually shows 7 identical tasks that were eventually needed to find just two that agreed and gave a validated result.  Yours was one of those two.  You will notice that 5 had some sort of problem that resulted in the extra task copies being created and sent out.  If you really want to be anal, you can use the sent and received times as listed to trace the order/history of what actually happened with all those tasks :-).

That full set of 7 tasks that comprise the workunit is also referred to as the 'quorum' :-).

Hang in there, you'll get used to the lingo :-).

I've got a lot of learning still to do, but I'm trying.  It is taking longer than I expected though.  ;*) 

I don't think I'll get anal anytime soon, so I'm not going to worry about that.  I must admit though, this is the first time I've heard of 'quorum'.  But don't explain it, I can look it up myself online at Merriam-Webster's dictionary and/or Google.

George

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,494
Credit: 65,680,163,617
RAC: 54,249,132

Quorum just means 'group' - in this case a group of identical tasks.  I'm replying because there is one other point of interest you should at least be aware of.

The full task name as listed on the website has a final field (an underscore followed by a 1 or 2 digit string) which is useful to understand.  The first tasks issued for a particular workunit (the initial replication) will have either _0 or _1 for this field.  I tend to call these the 'primary tasks' for a particular workunit.  If there are no errors in crunching these, that will be the end of the matter. 

If further copies are needed (as in the example I linked), they will have _2 or _3 or ... as many as are needed to get a valid result.  Any tasks with _2 or above are not the primary tasks.  We tend to call them 'resends' because that's what they are - extra copies that are sent to replace a failed primary task.  That field will stop at _19 if tasks keep failing (20 tasks in total) at which point the whole workunit will be abandoned.  You can see that limit by looking at the top of any workunit page where it will show:-

Max # of error/total/success tasks: 20, 20, 20

Project staff can configure those limits however they wish.
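For anyone who wants to check that final field mechanically, here is a small sketch that reads the number after the last underscore.  The function names and the example task names are hypothetical.  The only facts taken from above are that _0 and _1 are the primary tasks, _2 and higher are resends, and the limit shown on the workunit page is 20 tasks in total.

```python
MAX_TOTAL_TASKS = 20  # from "Max # of error/total/success tasks: 20, 20, 20"


def replication_index(task_name: str) -> int:
    """Return the number after the last underscore in a task name.

    Raises ValueError if the name does not end in _<digits>.
    """
    _, _, suffix = task_name.rpartition("_")
    return int(suffix)


def is_primary(task_name: str) -> bool:
    """Primary tasks are the initial replication: _0 or _1."""
    return replication_index(task_name) < 2


def is_resend(task_name: str) -> bool:
    """Anything with _2 or above replaces a failed earlier copy."""
    return replication_index(task_name) >= 2


def is_last_allowed(task_name: str, max_total: int = MAX_TOTAL_TASKS) -> bool:
    """True for the final copy before the workunit would be abandoned (_19)."""
    return replication_index(task_name) == max_total - 1


if __name__ == "__main__":
    # Made-up names, just to exercise the helpers.
    for name in ("example_workunit_0", "example_workunit_3", "example_workunit_19"):
        print(name, is_primary(name), is_resend(name), is_last_allowed(name))
```

Because the limits are configurable by project staff, max_total is a parameter here rather than a fixed rule.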

Cheers,
Gary.

GWGeorge007
Joined: 8 Jan 18
Posts: 391
Credit: 686,222,996
RAC: 2,221,537

Gary Roberts wrote:

Quorum just means 'group' - in this case a group of identical tasks.  I'm replying because there is one other point of interest you should at least be aware of.

The full task name as listed on the website has a final field (an underscore followed by a 1 or 2 digit string) which is useful to understand.  The first tasks issued for a particular workunit (the initial replication) will have either _0 or _1 for this field.  I tend to call these the 'primary tasks' for a particular workunit.  If there are no errors in crunching these, that will be the end of the matter. 

If further copies are needed (as in the example I linked), they will have _2 or _3 or ... as many as are needed to get a valid result.  Any tasks with _2 or above are not the primary tasks.  We tend to call them 'resends' because that's what they are - extra copies that are sent to replace a failed primary task.  That field will stop at _19 if tasks keep failing (20 tasks in total) at which point the whole workunit will be abandoned.  You can see that limit by looking at the top of any workunit page where it will show:-

Max # of error/total/success tasks: 20, 20, 20

Project staff can configure those limits however they wish.

Thank you, Gary, for a vivid explanation.  Believe it or not, I do get it.  I actually figured that out by looking at my own tasks.  Oh... I only wish that I could know more!  Maybe someday... just maybe...

George
