I can't get workunits from e@home Servers

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034590031
RAC: 22379806

PorkyPies wrote:
According to the server status page the FGRP5 are showing as 3.3 days left ...

Those figures are for the GPU run (FGRPB1G) not FGRP5 (which is the CPU run).  FGRP5 has available tasks at the moment.

For years now, the numbers for FGRPB1G have shown things 'about to run out' but they keep adding more data so the numbers just can't be believed.

Of course, at some point, that search will end (nothing lasts forever :-) ) but the last time there was an official comment about a similar dire shortage, the indications were that there was plenty still to come.

I would expect there to be an announcement about the future plans well before there is any sudden hard ending.  Of course, "well before" might only be a week or three, but it wouldn't be after the work had actually run out, you would hope :-).

Cheers,
Gary.

Harri Liljeroos
Joined: 10 Dec 05
Posts: 3702
Credit: 2930376109
RAC: 1041245

FGRPB1G tasks are available again. Let's hope this will last.

archae86
Joined: 6 Dec 05
Posts: 3146
Credit: 7060534931
RAC: 1155337

The GPU Gamma-ray pulsar work flowed to me enough to bang against two problems I'll mention here in case they are news to anyone.  I'm trying to help puzzled crunchers, not attempting to post a bug report.

You are most likely to see these issues if you:

1. have a somewhat fast GPU
2. are running Einstein Gamma-Ray pulsar exclusively on the GPU
3. have an appreciable queue size (over a day)

Issue 1: daily Quota.

If your last work request ends with a message in this general form: 

stopping work search - daily quota exceeded (416>=416)
Daily result quota 416 exceeded for host 10706295
No work sent
No work is available for Gamma-ray pulsar binary search #1 on GPUs
MSG(high) (reached daily quota of 416 tasks)
MSG( low) Project has no jobs available

and you see in the project status field a notification like:

deferred for 4:15:20

Then you've met the resource-dependent daily quota on task downloads.

The resources that count here are available GPUs, and available CPUs.

If you have falsified the real CPU count, your falsified number is used (that is why many who falsify do so) unless it exceeds the limit (something like 8 GPUs and 64 CPUs, but don't count on me getting that bit quite right).

If you set the preference item Computing|Processor usage|Use at most nn% of the processors to something below 100%, then for this purpose your number of processors is rounded down based on that limitation.

Currently (and for many months past), your daily quota is 256 tasks per GPU plus 32 tasks per available CPU.

I personally have machines which hit their limits today at 352, 416, and 704 tasks downloaded.
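For anyone who wants to check their own number, here is a minimal sketch of the quota arithmetic described above. It is not the server's code. The function name and parameters are made up for illustration, the 8 GPU / 64 CPU caps are the "something like" figures from this post and may be wrong, and the order in which the percentage setting and the caps are applied is an assumption.

```python
def daily_quota(gpus, cpus, cpu_percent=100,
                per_gpu=256, per_cpu=32,
                max_gpus=8, max_cpus=64):
    """Estimate the daily task quota for one host (hypothetical helper).

    per_gpu / per_cpu come from the post above; max_gpus / max_cpus are
    the post's uncertain recollection, so treat them as assumptions.
    """
    # "Use at most nn% of the processors" rounds the CPU count down.
    usable_cpus = (cpus * cpu_percent) // 100
    # Assumption: caps are applied after the percentage rounding.
    counted_gpus = min(gpus, max_gpus)
    counted_cpus = min(usable_cpus, max_cpus)
    return counted_gpus * per_gpu + counted_cpus * per_cpu


# One GPU plus five counted CPUs would give the 416 seen in the log above.
print(daily_quota(gpus=1, cpus=5))
# Eight reported CPUs limited to 50% count as four.
print(daily_quota(gpus=1, cpus=8, cpu_percent=50))
```

The host configurations in the example are only combinations that happen to reproduce the totals; the post does not say what hardware produced 352, 416, or 704.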

Issue 2: clock mismatch

The code which decides until when you are deferred consistently chooses an amount of time that puts your next fetch request randomly within the first hour after midnight UTC.  And Boincmgr consistently sends a request when the deferral expires.

BUT, when your BOINC sends a good-faith work request after your time in the penalty box has expired, the code which decides what to do nearly always declares that you have asked too soon and that your daily quota is exceeded, even though this is in fact a new UTC day in which you have gotten exactly zero tasks.  That has been my experience over many dozens of observations, including three today.  Presumably it is consulting some sort of clock with some sort of permission algorithm to decide this, but despite strenuous effort I've failed to reverse engineer it.

Then it decides that you should be deferred again, and falls back on that nice piece of code which gives you a deferral until sometime in the first hour after the NEXT midnight UTC.
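To make the deferral behavior concrete, here is a small sketch of how such a target time could be chosen. This is only a model of what is observed from the client side, not the scheduler's actual code; the function name and the example timestamps are hypothetical.

```python
import random
from datetime import datetime, timedelta, timezone


def next_deferral_target(now, rng=random):
    """Pick a time in the first hour after the next midnight UTC.

    `now` must be a timezone-aware datetime in UTC. This mirrors the
    observed behavior only; the real server logic is unknown.
    """
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    # Random offset of 0..3599 seconds into the new day.
    return next_midnight + timedelta(seconds=rng.randrange(3600))


# Hypothetical example: quota hit in the evening (UTC).
hit = datetime(2021, 3, 1, 19, 45, tzinfo=timezone.utc)
first = next_deferral_target(hit)
print("deferred until", first)

# If the request at that time is refused, the same rule pushes the
# next attempt into the first hour of the following day.
second = next_deferral_target(first)
print("deferred again until", second)
```

Applying the same rule at, say, 00:30 UTC lands about a full day later, which matches the second deferral described above.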

But if you fear running out of tasks, or are just curious, and hit the project update button a few hours later, you will often find your request honored, get tasks, and not see the pesky deferral.  The sticky bit: what does "a few hours" mean here?  I've seen it both work and not work for updates over the range between 2 a.m. and 9 a.m. UTC on the "second" day.  I don't think it has ever declined to give me work in this sort of case if I hit the button with deferral still showing when I get out of bed here in USA mountain time, so roughly 11:00 UTC.

This clock mismatch problem has been true for years.  I've posted about it before.  I've generally not been believed. 

 

Keith Myers
Joined: 11 Feb 11
Posts: 4754
Credit: 17706228705
RAC: 5295443

Quote:
This clock mismatch problem has been true for years.  I've posted about it before.  I've generally not been believed. 

I believe you.  I ran into this UTC clock problem (what constitutes a "new" day) when I was first deploying an anonymous GPU application and it was dumping errors the first few times I tried to get it sorted out.

Think it took about 4 or 5 days before I could get the scheduler to send me work and I finally processed it correctly.

 

Tom M
Joined: 2 Feb 06
Posts: 5662
Credit: 7742471208
RAC: 2467015

Harri Liljeroos wrote:

FGRPB1G tasks are available again. Let's hope this will last.

I had the profile on my systems set to run non-preferred tasks if preferred were not available.  And it just downloaded more GW tasks even though GR was available.

So I toggled the non-preferred off.

I would still prefer to have it automated to switch to GW when GR is not available.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5850
Credit: 110034590031
RAC: 22379806

Tom M wrote:
I had the profile on my systems set to run non-preferred tasks if preferred were not available.  And it just downloaded more GW tasks even though GR was available.

This happens (I believe) because of the way generated tasks are cached ready for delivery.  There may be a large number generated (e.g. 5,000 or more, so certainly no availability problem) but only a very small fraction of those are in a 'fast cache' for 'rapid response' delivery.  As the fast cache drains, it is 'topped up' periodically from the main supply.  The idea is to attempt to have as rapid a response as possible when the routine work requests come in.

When the inevitable surges in demand occur, it's possible for there to be not enough tasks in the fast cache to supply all requests instantly.  There is a 'momentary outage' (maybe lasting only seconds) until it gets topped up again.  My impression (and that's all it is) is that this 'momentary outage' allows the scheduler to send you 'non-preferred' tasks at that instant, if you have that setting enabled.

In other words the scheduler is not 'smart enough' to wait for the inevitable top-up so it just takes the 'allowed' alternative.  I always keep the 'allow non-preferred apps' firmly set to 'No' to avoid this possibility.
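A toy model of that impression, for anyone who finds it easier to follow in code. Everything here is hypothetical: the class, the function, the task names, and the tiny cache size are made up for illustration, and the real scheduler may work quite differently.

```python
from collections import deque


class FastCache:
    """Small ready-to-send buffer fed from a large main supply (toy model)."""

    def __init__(self, main_supply, capacity):
        self.main_supply = deque(main_supply)
        self.capacity = capacity
        self.ready = deque()
        self.top_up()

    def top_up(self):
        # Periodic refill from the main supply, up to capacity.
        while len(self.ready) < self.capacity and self.main_supply:
            self.ready.append(self.main_supply.popleft())

    def take(self):
        # Returns None during a 'momentary outage' (empty fast cache).
        return self.ready.popleft() if self.ready else None


def schedule(preferred, fallback, allow_non_preferred):
    """Send a preferred task if one is ready, else fall back if allowed."""
    task = preferred.take()
    if task is None and allow_non_preferred:
        task = fallback.take()
    return task


gr = FastCache([f"GR-{i}" for i in range(5000)], capacity=2)
gw = FastCache([f"GW-{i}" for i in range(5000)], capacity=2)

# A surge of three requests arrives before the next top-up.
surge = [schedule(gr, gw, allow_non_preferred=True) for _ in range(3)]
print(surge)  # the third request gets a GW task despite 4,998 GR tasks in supply

gr.top_up()
print(schedule(gr, gw, allow_non_preferred=True))  # GR again after the top-up
```

With `allow_non_preferred=False` the third request in the surge simply gets nothing and the client asks again later, which is the behavior the 'No' setting buys you.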

Tom M wrote:
I would still prefer to have it automated to switch to GW when GR is not available.

Because of factors like the single DCF at Einstein and the fact that GRP GPU tasks run much quicker than the estimate whilst GW GPU tasks are completely the other way, I try to avoid allowing both types to crunch simultaneously on the one host.  It's easy for me to differentiate since I have lots of hosts.

For someone with a single host and wanting to support both, the least intrusive option is probably to give each search type a set period and switch at the end of that period (e.g. weekly or monthly).  The quick way to switch is to set up two locations (previously called venues), one for GW and one for GRP.  At the end of the allotted period, set a small work cache first and then go to the host details page on the website and change the location.  This will allow the remaining tasks to complete without risking a huge number of the new type if they happen to have wildly low estimates.

If you want the ability to support one type mainly but quickly switch to the other if there is an outage, you could adopt the above technique pretty easily.  As soon as you see there is an outage, reduce work cache and set the alternative location.  When the outage is over, just switch back to the normal location.

Cheers,
Gary.
