BRP6 out of work, Arecibo not available for your type of computer

SuperSluether
Joined: 1 Sep 14
Posts: 4
Credit: 45,360,545
RAC: 101,424
Topic 198277

For some reason, BRP6 (Parkes PMPS XT) is out of work, and my computer won't download BRP4 (Arecibo) because it's "not available" for my type of computer. I checked the list of compatible apps, and I should be getting something here.

I'm on 64-bit Linux with an Nvidia GTX 760. Shouldn't I be getting BRP4G-cuda32-nv270 or BRP4G-Beta-cuda32-nv270? Does the 270 refer to a specific driver or card, and is that why no work is being sent?

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,493
Credit: 65,471,844,761
RAC: 54,011,069

The server status page shows no tasks for both BRP4G and BRP6. BRP4G is expected because data is scarce and available relatively infrequently. There should be plenty of BRP6 data but there is a history of very occasional outages if the task generator daemon stops working for some odd reason. This is probably one of those occasions.

Seeing as it's now still early on a Sunday morning, maybe nobody has noticed just yet. I'm sure someone might notice shortly :-). What happens will depend on the nature of the problem. It may be as simple as restarting the daemon. At worst, it might be something that can't readily be fixed until Monday, in which case one of the Devs is likely to post a short announcement at some point.

One of Murphy's famous laws says that if something can go wrong it will, and it will always be at a time that causes maximum inconvenience :-). I guess we'll find out later that there was some notorious all night party going on in that part of Hannover and that all the staff were (probably still are) in attendance and in no fit state to fix things :-).

Murphy always knows these things :-).

On a less frivolous note, it's always wise to have a cache of work to outlast the weekend. Your GTX 760 should do quite nicely here. I'm on 64bit Linux too and I've recently added a 750Ti to my farm. I'm running 3 tasks concurrently which improves the output. By the look of your crunch times you are running at least two concurrent GPU tasks, maybe more. How many are you actually running?

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 741,860,982
RAC: 1,420,425

I'm not entirely convinced by that. I have a couple of machines which have been concentrating on BRP6/intel_gpu, but didn't get any new work overnight.

The server log shows no attempt to scan the BRP6 plan_classes:

2015-10-18 09:03:51.2336 [PID=32592]   Request: [USER#xxxxx] [HOST#8864187] [IP xxx.xxx.xxx.143] client 7.6.9
2015-10-18 09:03:51.2341 [PID=32592]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
2015-10-18 09:03:51.2342 [PID=32592]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
2015-10-18 09:03:51.2342 [PID=32592]    [send] Not using matchmaker scheduling; Not using EDF sim
2015-10-18 09:03:51.2342 [PID=32592]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] Intel GPU: req 26987.07 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] work_req_seconds: 0.00 secs
2015-10-18 09:03:51.2342 [PID=32592]    [send] available disk 97.71 GB, work_buf_min 43200
2015-10-18 09:03:51.2342 [PID=32592]    [send] active_frac 0.999982 on_frac 0.999373 DCF 0.860931
2015-10-18 09:03:51.2351 [PID=32592]    [send] [HOST#8864187] is reliable
2015-10-18 09:03:51.2353 [PID=32592]    [send] set_trust: random choice for error rate 0.000010: yes
2015-10-18 09:03:51.2353 [PID=32592]    [mixed] sending non-locality work first (0.2788)
2015-10-18 09:03:51.2549 [PID=32592]    [send] [HOST#8864187] will accept beta work.  Scanning for beta work.
2015-10-18 09:03:51.2846 [PID=32592]    [mixed] sending locality work second
2015-10-18 09:03:51.2879 [PID=32592] [debug]   [HOST#8864187] MSG(high) No work sent
2015-10-18 09:03:51.2879 [PID=32592]    Sending reply to [HOST#8864187]: 0 results, delay req 60.00
2015-10-18 09:03:51.2890 [PID=32592]    Scheduler ran 0.060 seconds


The machine is showing no signs of distress: All Parkes PMPS XT tasks for computer 8864187

Now I've added BRP4 to the 'selected apps' list, the Server log for host 8864187 shows a normal plan_class scan and work allocation.
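For anyone who wants to check their own host's server log for the same pattern (a request for GPU seconds followed by zero results), here is a minimal Python sketch. The function names and regular expressions are assumptions made up for illustration, based only on the lines quoted in the log above; they are not an official description of the scheduler log format.

```python
import re

# A few lines copied from the scheduler log quoted above.
LOG = """\
2015-10-18 09:03:51.2342 [PID=32592]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] Intel GPU: req 26987.07 sec, 0.00 instances; est delay 0.00
2015-10-18 09:03:51.2342 [PID=32592]    [send] work_req_seconds: 0.00 secs
2015-10-18 09:03:51.2879 [PID=32592] [debug]   [HOST#8864187] MSG(high) No work sent
2015-10-18 09:03:51.2879 [PID=32592]    Sending reply to [HOST#8864187]: 0 results, delay req 60.00
"""

# Hypothetical patterns, inferred from the sample lines only.
REQ_RE = re.compile(r"\[send\] (?P<resource>[A-Za-z ]+): req (?P<secs>[\d.]+) sec")
REPLY_RE = re.compile(r"Sending reply to \[HOST#\d+\]: (?P<n>\d+) results")


def summarize(log_text):
    """Return ({resource: seconds requested}, number of results sent or None)."""
    requested = {}
    results = None
    for line in log_text.splitlines():
        req = REQ_RE.search(line)
        if req:
            requested[req.group("resource")] = float(req.group("secs"))
        reply = REPLY_RE.search(line)
        if reply:
            results = int(reply.group("n"))
    return requested, results


def starved(requested, results):
    """Resources that asked for work in a request that returned nothing."""
    if results != 0:
        return []
    return [name for name, secs in requested.items() if secs > 0]


if __name__ == "__main__":
    req, n = summarize(LOG)
    print("requested:", req, "results:", n, "starved:", starved(req, n))
```

This only tells you that a request went unanswered. It says nothing about why, which is the open question in this thread.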

Snow Crash
Joined: 24 Dec 09
Posts: 65
Credit: 100,880,785
RAC: 0

I am getting occasional "resends" for BRP6-beta.
My log shows much the same as RH's, except that I am requesting ATI / NVIDIA work depending on the rig. Of course this is the morning I was adding a new rig, and without a "reliable" factor there is no possibility of getting even resends. Thanks, Mr. Murphy :-)

--------------------------
- Crunch, Crunch, Crunch -
--------------------------

SuperSluether
Joined: 1 Sep 14
Posts: 4
Credit: 45,360,545
RAC: 101,424

Quote:

The server status page shows no tasks for both BRP4G and BRP6. BRP4G is expected because data is scarce and available relatively infrequently. There should be plenty of BRP6 data but there is a history of very occasional outages if the task generator daemon stops working for some odd reason. This is probably one of those occasions.

Seeing as it's now still early on a Sunday morning, maybe nobody has noticed just yet. I'm sure someone might notice shortly :-). What happens will depend on the nature of the problem. It may be as simple as restarting the daemon. At worst, it might be something that can't readily be fixed until Monday, in which case one of the Devs is likely to post a short announcement at some point.

One of Murphy's famous laws says that if something can go wrong it will, and it will always be at a time that causes maximum inconvenience :-). I guess we'll find out later that there was some notorious all night party going on in that part of Hannover and that all the staff were (probably still are) in attendance and in no fit state to fix things :-).

Murphy always knows these things :-).

On a less frivolous note, it's always wise to have a cache of work to outlast the weekend. Your GTX 760 should do quite nicely here. I'm on 64bit Linux too and I've recently added a 750Ti to my farm. I'm running 3 tasks concurrently which improves the output. By the look of your crunch times you are running at least two concurrent GPU tasks, maybe more. How many are you actually running?

Thanks for the info. It looks like I mixed up BRP4 with BRP4G. I'm running 3 tasks on my GPU because each task was only using about 30% of it. I have my work buffer set to 0.5 days plus up to one additional day, but I don't want to set it too large because BOINC is running on a 2 GB RAM disk.

I thought the server was being prejudiced against my system or something. Now I see it was just an oversight on my part, seeing as how BOINC has intermittent outages on every project. Thanks again for clearing it up. ☺

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,493
Credit: 65,471,844,761
RAC: 54,011,069

Quote:
I'm not entirely convinced by that. I have a couple of machines which have been concentrating on BRP6/intel_gpu, but didn't get any new work overnight.


Somewhere around 1600 UTC (or a bit later) on Oct 17, the flow of BRP6 primary tasks stopped for me. A trickle of resends has continued. The status page continues to show essentially zero tasks for BRP6 and BRP4G. BRP4 (not for mainstream discrete GPUs) continues to have work for those devices it was intended to support. On the basis that Bernd tries not to make changes on a Friday which might impact the weekend, I don't think the lack of work is deliberate or is associated with server-side programming changes. I'm still guessing that something unexpected has happened to the work generation process. It has happened before and, if I remember correctly, it took a while to be noticed on that occasion too, because a monitoring script failed to report the problem.

Quote:
The server log shows no attempt to scan the BRP6 plan_classes:


My hosts seem to show the same. Maybe the logic is to scan only those plan classes for which there is work to send? After you added BRP4, the log shows only intel GPU plan classes and not BRP6, so if you didn't disable BRP6, it's perhaps still not a 'normal' log?

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,041
Credit: 741,860,982
RAC: 1,420,425

Quote:
After you added BRP4, the log shows only intel GPU plan classes and not BRP6, so if you didn't disable BRP6, it's perhaps still not a 'normal' log?


Yes, I did it that way - currently explicitly allowing intel_gpu (only), BRP4 and BRP6 (only). I might try BRP6+'allow others if none available' tomorrow. And I did get one BRP6 resend after I set those preferences and posted.

cliff
Joined: 15 Feb 12
Posts: 176
Credit: 283,452,444
RAC: 0

Can't get any WUs; it seems the servers are offline. Does anyone have any idea when they might get a kick in the pants?

Cheers,

Cliff,

Been there, Done that, Still no damm T Shirt.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,493
Credit: 65,471,844,761
RAC: 54,011,069

Quote:
I thought the server was being prejudiced against my system or something. Now I see it was just an oversight on my part, seeing as how BOINC has intermittent outages on every project. Thanks again for clearing it up. ☺


Outages at Einstein are, fortunately, relatively infrequent. Quite often, things are back on track relatively quickly but there is always the slight chance of a more serious situation. None of the Devs have said anything yet that I've seen so there is no way of knowing what the problem actually is.

I'm now at the site where the bulk of my hosts are, and the logs there tell me that things were seemingly OK at 7:22PM UTC on Sat, 17 Oct. The next host of mine 'in the queue' failed to get any BRP6 when it asked at 7:27PM, and from that point onwards only 'resend' tasks have come through. Resends don't need to be generated, since they are direct extra copies of already generated tasks. I've been getting around 2 resends per hour over my whole fleet.

All this points to a problem with work generation. On the server status page, the work generator for BRP6 is shown as "Not running", but that tells you nothing because it only runs for a short time when needed to top up the supply of 'ready to send' tasks. Most likely, the data for the status page gets refreshed at a time when the daemon isn't running anyway.

These daemons have been known to quit unexpectedly on occasions but previous comments by the Devs lead me to believe that this should be detected and reported. I seem to recall a previous occasion where that didn't work either but I thought that failure had been fixed.

If the lack of available work causes angst to anyone, the two easiest solutions are either to have a backup project or to cache work for a sensible period based on the most likely type of outage. I would put that figure in the 1-3 day range. There have been longer outages but there are disadvantages in trying to cope with something like a 7-10 day outage, which is a very rare event at this project.
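The 1-3 day figure can be turned into a rough task count, which is handy if disk space is tight (for example, a small RAM disk). A back-of-the-envelope sketch in Python, assuming the per-task time is measured while all concurrent tasks are running. The function name and the example numbers are hypothetical and not taken from any host in this thread.

```python
import math


def tasks_needed(outage_days, task_hours, concurrent_tasks):
    """Estimate how many GPU tasks keep a host busy for `outage_days`.

    task_hours: wall-clock hours per task with `concurrent_tasks` running
    at once (an assumed, measured-by-you value).
    """
    if outage_days < 0 or task_hours <= 0 or concurrent_tasks < 1:
        raise ValueError("need outage_days >= 0, task_hours > 0, concurrent_tasks >= 1")
    tasks_per_day = concurrent_tasks * 24.0 / task_hours
    return math.ceil(outage_days * tasks_per_day)


if __name__ == "__main__":
    # Hypothetical host: 3 tasks at a time, 6 hours each.
    for days in (1, 2, 3):
        print(days, "day(s):", tasks_needed(days, 6, 3), "tasks")
```

BOINC itself takes the cache size in days rather than tasks, so this is only a sanity check on how much work a given setting implies.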

Cheers,
Gary.

Christian Beer
Moderator
Joined: 9 Feb 05
Posts: 595
Credit: 100,014,460
RAC: 2,344

We are working on a solution right now. Stay tuned.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4,031
Credit: 217,970,722
RAC: 50,666

BRP6 pre-processing got stuck over the weekend. Unsent tasks are available again.

BM
