Gravitational Wave All-sky search on LIGO O1 Open Data

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

DoctorNow wrote:floyd

3 Nov 2018 8:10:17 UTC

Message 167630 in response to message 167629

DoctorNow wrote:

floyd wrote:
You can't be sure there's actually tasks ready to send, the server status page hasn't been updated for days.

??? Well that's strange...

I use this URL for the server status page, and every time I refresh it, it shows different numbers, so it can't be true that it isn't updated regularly.

floyd wrote his message at the time when the status page really hadn't been updated for a few days. It was fixed after that.

https://einsteinathome.org/content/eh-server-status-page-not-updated-about-20h

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119603591176

RAC: 24856759

DoctorNow wrote:.... Still

3 Nov 2018 8:20:55 UTC

Message 167631 in response to message 167629

DoctorNow wrote:

.... Still without any work since O1D1 started... :-\

I'm not doing any O1D1 but when a new search starts, the initial tasks may have a shorter than normal deadline. I'm wondering if the scheduler might be ignoring your requests since you seem to be asking for a rather large amount.

I looked at the most recent contact your host made and it was asking for the following

CPU: req 2419200.00 sec, 6.00 instances; est delay 7710.45
CUDA: req 902178.86 sec, 0.00 instances; est delay 48221.14

which seem to be rather large requests: 2,419,200 seconds is exactly 28 days and 902,178 seconds is about 10.4 days. The scheduler seemed to just completely ignore these requests. There was none of the checking of different plan classes that identifies work suited to your request. It could be that you don't have the O1D1 search properly enabled in your preferences.
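As a rough illustration of the conversion above, the following sketch pulls the requested seconds out of the two quoted log lines and converts them to days. The regular expression and the function name `parse_request` are written just for this example and are not part of any BOINC tool.

```python
import re

SECONDS_PER_DAY = 86400

# The two request lines quoted from the scheduler log above.
LOG_LINES = [
    "CPU: req 2419200.00 sec, 6.00 instances; est delay 7710.45",
    "CUDA: req 902178.86 sec, 0.00 instances; est delay 48221.14",
]

# Hypothetical pattern, written to match these two lines only.
REQUEST_RE = re.compile(
    r"^(\w+): req ([\d.]+) sec, ([\d.]+) instances; est delay ([\d.]+)$"
)


def parse_request(line):
    """Return (resource, requested_days, instances) for one request line."""
    match = REQUEST_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised request line: {line!r}")
    resource, seconds, instances, _delay = match.groups()
    return resource, float(seconds) / SECONDS_PER_DAY, float(instances)


for line in LOG_LINES:
    resource, days, instances = parse_request(line)
    print(f"{resource}: {days:.2f} days requested, {instances:.0f} instances")
```

Run as is, this reports 28.00 days for the CPU request and 10.44 days for the CUDA request.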

The log also showed

send_old_work() no feasible result older than 336.0 hours
send_old_work() no feasible result younger than 248.2 hours and older than 168.0 hours

and I don't recall seeing these sorts of messages previously. Maybe it's something to do with the excessively large requests you were making.

If you use different locations (venues), check each one carefully to make sure that the prefs for the correct location allow O1D1 tasks. You should also set your work cache to a very low value - I'd start with 0.2 days to see if that will prompt the scheduler to send you some work. You can always increase it (in small steps) once the work starts to arrive.
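One way to try a small cache on a single host, without touching the website preferences, is a local override file. The sketch below only builds the XML text; it assumes the client reads a `global_prefs_override.xml` file from its data directory and that `work_buf_min_days` and `work_buf_additional_days` are the relevant tag names. The function name `build_cache_override` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days, additional_days=0.0):
    """Build the XML text for a small local work-cache override."""
    root = ET.Element("global_preferences")
    # Assumed tag names for "store at least N days" and "store up to N extra days".
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


# 0.2 days is the starting value suggested above; raise it in small steps later.
print(build_cache_override(0.2))
```

The printed text would go into the override file, after which the client has to re-read its preferences. The same two values can equally be set through the computing preferences on the website.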

Cheers,
Gary.

DoctorNow

Joined: 22 Jan 05

Posts: 12

Credit: 27391545

RAC: 0

Gary Roberts wrote:I'm

3 Nov 2018 11:23:30 UTC

Message 167632 in response to message 167631

Gary Roberts wrote:

I'm wondering if the scheduler might be ignoring your requests since you seem to be asking for a rather large amount.

Well, that's something new to me. I'm always using a cache of 10 days to get enough work to let the host(s) run as long as they can without assistance, and since I never had problems with that I never changed it on any project.

I also checked the venue settings just now and made an update, although I don't remember ever having changed them. This strange Einstein interface always irritates me, but it seems I got it.

One of the things I've just done seems to have made the difference: I got the first tasks! Lucky... ;-)

Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5887

Credit: 119603591176

RAC: 24856759

DoctorNow wrote:.... I'm

4 Nov 2018 1:00:38 UTC

Message 167642 in response to message 167632

DoctorNow wrote:

.... I'm always using a cache of 10 days to get enough work to let the host(s) run as long as they can without assistance, and since I never had problems with that I never changed it on any project.

Strange as it may seem, setting a 10 day cache (do you also set any 'extra' days?) is pretty much a guarantee that you will need to provide much more "assistance" (i.e. continually 'micromanage') to stop BOINC making a total mess of things. Apart from the large cache size, you also support over 50 projects in total and whilst most seem to be inactive, there are around 8 that have a current significant RAC. Without knowing anything about your project mix and the resource shares you allocate, it's quite likely that all the non-Einstein work you already have on board has something to do with why you don't get more Einstein tasks.

I'm glad you were able to get some tasks (3 when I checked just now) but I see that your client was still asking for around 4.5 days worth of extra work so obviously you didn't take much notice of the cache size suggestion I made. You have given very little of the information needed to make recommendations, so I'm not making any, other than to point out that if/when BOINC gets into panic mode (high priority because of approaching deadlines) you are going to have a much harder time getting BOINC to "run without assistance".

I would also point out that Einstein tends to be quite reliable and any outages get fixed relatively quickly. You might need a larger cache size for other projects (I wouldn't know) but not so much here. In any case, with the large number of projects you already support, what does it really matter if one has a temporary outage?

There is also an 'anti-social' aspect to excessively large cache sizes. At the moment the Einstein on-line database shows around 2.6M tasks being tracked. The servers struggle at times to process requests and there are often spikes in load that cause requests to time out and need to be repeated. You might say this is a project problem - nothing to do with the volunteers, and whilst it might be up to the project to provide server resources, I would tend to suggest that it's a volunteer responsibility not to unduly waste those resources.

Can you imagine what would happen if every volunteer decided to double or triple their cache size? I'm sure I don't have to spell it out. The current average cache size over the whole volunteer community here is (at a rough guess) around a day or two - some might have more, some less. If that was doubled, there would be 5.2M tasks in the online database and there would probably be a meltdown. So perhaps that might help you put a 10 day cache size into proper perspective.
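The arithmetic behind that estimate is simple proportionality. The sketch below assumes, as the paragraph above implies, that the number of tracked tasks scales linearly with the average cache size; the function name `tracked_tasks` is made up for this example and the result is only a back-of-envelope figure.

```python
TRACKED_TASKS = 2_600_000  # "around 2.6M tasks" quoted in the post above


def tracked_tasks(cache_factor, current=TRACKED_TASKS):
    """Estimate tracked tasks if every cache grew by cache_factor (linear assumption)."""
    return round(current * cache_factor)


for factor in (1, 2, 3):
    print(f"cache x{factor}: about {tracked_tasks(factor) / 1e6:.1f}M tasks")
```

Doubling gives about 5.2M tasks and tripling about 7.8M under that assumption.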

In my previous response where I showed some lines from the scheduler logs referring to "no feasible tasks older than 336 hours", etc., I think I remember a comment about modifications to 'locality scheduling' (which is used for GW tasks) designed to make sure the second task in a quorum got sent out in a more timely manner. The scheduler used to delay sending the 2nd copy until a host that already had the correct data files on board made a request. This new check must force old '2nd copy' tasks to be sent immediately irrespective of whether or not the requesting host has the correct data. In other words a more timely completion of a quorum is being gained at the expense of the extra large data file downloads needed to make up for a host not having the correct data.

Cheers,
Gary.

Mr Anderson

Joined: 28 Oct 17

Posts: 40

Credit: 154388538

RAC: 34788

Not getting any work on

5 Nov 2018 4:45:39 UTC

Message 167643

Not getting any work on either of my two Windows 10 PCs. I think I'll just let them run idle for now.

Perhaps the event log in BOINC should list the applications that the scheduler believes are selected and then show a short message for the first reason why the scheduler decides not to send any work for each application. Simply showing "got 0 tasks" and "no work sent" is not really cutting it and if there are any problems lurking then this will help to flush them out.

Edit: one day later and suddenly one of my PCs has received work. Don't know what changed.

DoctorNow

Joined: 22 Jan 05

Posts: 12

Credit: 27391545

RAC: 0

Geez, I've not expected such

5 Nov 2018 5:43:21 UTC

Message 167655 in response to message 167642

Geez, I didn't expect such a lecture just because of a cache size setting. I'm a long-time BOINC cruncher, so I know a thing or two about BOINC myself and it was pretty much unnecessary to tell me all that, though I appreciate the effort. ;-)
I have my reasons for my cache size setting and I won't change anything unless it's really necessary (for me), so I won't discuss it here much more.

Gary Roberts wrote:

Apart from the large cache size, you also support over 50 projects in total and whilst most seem to be inactive, there are around 8 that have a current significant RAC.

As you may know, RAC doesn't say anything about what the host/user is actually doing. My host had/has only ONE active CPU project aside from the three Einstein tasks.

Quote:

it's quite likely that all the non-Einstein work you already have on board has something to do with why you don't get more Einstein tasks.

No, most likely not. ;-)
It was pretty much "dry" every time I tried to get work here, so it wasn't really understandable that I got "only" 3 tasks given the amount of work I was asking for - unless the server really had nothing to give.
Most of the time I only do work from 1 or 2 projects - in some rare cases it can be more - until a certain point, then I move on to other ones. I rarely have so much work on my host(s) that the cache is really full - for the reason, see the last passage.

Quote:

that your client was still asking for around 4.5 days worth of extra work

That was just because I experimented several times with different numbers. Then I just let it be while I was away for a while.
Since the three tasks finished it's back to the state before - I get nothing, despite having everything set for getting work and very little work from other projects on board, way under the cache size - which I currently have set pretty low, but that obviously doesn't help with new requests, so it seems it was pure coincidence that I got these tasks... :-\
The server status page says there are around 500 tasks to send - compared with the numbers for the other subprojects, the server is probably dry itself most of the time and doesn't give out anything.
Which brings me to the question: Why isn't there more work, like on the other subprojects? Is it because it is still new? Will there be more sooner or later?

Quote:

Can you imagine what would happen if every volunteer decided to double or triple their cache size? I'm sure I don't have to spell it out. The current average cache size ...
In my previous response where I showed some lines from the scheduler logs referring to "no feasible tasks older than 336 hours"... In other words a more timely completion of a quorum is being gained at the expense of the extra large data file downloads needed to make up for a host not having the correct data.

I'm not sure whether it's already done on the project side, but if you project guys want a better work flow, why don't you limit the work given out to a per-core, per-host or whatever amount? There are quite a few projects which do that. It helps a lot to get work done in a more effective and fluid way. And with such a project-side server setting it almost doesn't matter how much work cache a user wants... ;-)
Think about it...
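To make the per-core limit idea concrete, here is a minimal sketch of such a cap. It is purely illustrative: the function `tasks_to_send` and its parameters are hypothetical and do not describe how the Einstein@Home scheduler actually behaves.

```python
def tasks_to_send(requested, in_progress, cores, max_per_core):
    """Cap a work request so a host never holds more than max_per_core * cores tasks."""
    limit = max_per_core * cores
    headroom = max(0, limit - in_progress)
    return min(requested, headroom)


# A 6-core host with 3 tasks on board asks for 200 tasks under a limit of 4 per core.
print(tasks_to_send(requested=200, in_progress=3, cores=6, max_per_core=4))  # 21
```

With a cap like this, a very large cache setting changes what the client asks for but not how many tasks the host can hold at once.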
I remember some years ago there were days when I asked for Einstein work (and with a lower cache setting of 10 days!) and got ten times the work I wanted just because the BOINC settings were ignored.
Now it's the exact opposite, LOL.
Heck, I wrote way too much - more than I wanted... ;-)

Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg

Richie

Joined: 7 Mar 14

Posts: 656

Credit: 1702989778

RAC: 0

I've begun to think that even

5 Nov 2018 8:10:01 UTC

Message 167656

I've begun to think that even though the app was released from Beta they might still be watching extra cautiously how the results turn out. Maybe the pressure in the delivery pipe is still set extremely low. Maybe there are thousands of computers that haven't got a task yet, but users just aren't writing here in masses. Status page says there are over 15000 tasks in progress, which sort of speaks against that theory though.

edit: Okay, I used Windows Task Scheduler to run a script every 10 minutes to automatically "click update" for Einstein. After 50 minutes (10 automatic contacts + a couple of manual clicks) this Windows 10 host got five non-beta tasks. I must believe now tasks are available. It can just require some patience.

* Quota settings on this machine were mild: 3 cores (1 used, 2 available) + store work for at least 0.5 days and no additional work
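A periodic "click update" like the one described above could be scripted roughly as follows. This is a sketch under assumptions: it supposes `boinccmd` is on the PATH and that the project URL shown is the one the client is attached with; the function names are made up for this example. The timing itself (every 10 minutes) would come from Windows Task Scheduler, not from the script.

```python
import subprocess

# Assumption: this is the URL the client is attached with; adjust if yours differs.
PROJECT_URL = "https://einsteinathome.org/"


def update_command(project_url=PROJECT_URL, boinccmd="boinccmd"):
    """Build the boinccmd call that asks the client to contact the project."""
    return [boinccmd, "--project", project_url, "update"]


def request_update(project_url=PROJECT_URL):
    """Run the update once and return boinccmd's exit code."""
    return subprocess.run(update_command(project_url), check=False).returncode


# Show the command only; the scheduled task would call request_update() instead.
print(" ".join(update_command()))
```

Each run is just one more scheduler contact, so a long interval is kinder to the servers than a short one.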

Jim1348

Joined: 19 Jan 06

Posts: 463

Credit: 257957147

RAC: 0

Richie wrote:Maybe the

5 Nov 2018 14:58:28 UTC

Message 167660 in response to message 167656

Richie wrote:

Maybe the pressure in the delivery pipe is still set extremely low. Maybe there are thousands of computers that haven't got a task yet, but users just aren't writing here in masses.

They have probably given up. The absence of news leads to worst-case assumptions.

DoctorNow

Joined: 22 Jan 05

Posts: 12

Credit: 27391545

RAC: 0

Well, this was a big surprise

5 Nov 2018 18:01:19 UTC

Message 167661

Well, this was a big surprise now:

I came back to my comp after a few hours and it got a bunch of tasks - a few of them have already finished!
If that continues from now on I'm satisfied. ;-)

Life is Science, and Science rules. To the universe and beyond
Proud member of BOINC@Heidelberg

Darren Peets

Joined: 19 Nov 09

Posts: 37

Credit: 111452336

RAC: 52150

The bugs haven't all been

6 Nov 2018 1:06:54 UTC

Message 167665

The bugs haven't all been worked out yet...

https://einsteinathome.org/workunit/376907300

<message>
upload failure: <file_xfer_error>
  <file_name>h1_0104.20_O1C02Cl2In0__O1OD1_104.30Hz_33_8_0</file_name>
  <error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
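For anyone collecting failures like this one, the message above can be picked apart with a few lines of Python. The sketch below only reads the text exactly as quoted; the function name `parse_upload_failure` is made up for this example, and nothing here says why the file exceeded its size limit.

```python
import re
import xml.etree.ElementTree as ET

# The message exactly as quoted above.
MESSAGE = """<message>
upload failure: <file_xfer_error>
  <file_name>h1_0104.20_O1C02Cl2In0__O1OD1_104.30Hz_33_8_0</file_name>
  <error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>"""


def parse_upload_failure(message_xml):
    """Return (file_name, numeric_code, description) from an upload failure message."""
    root = ET.fromstring(message_xml)
    error = root.find("file_xfer_error")
    file_name = error.findtext("file_name").strip()
    code_text = error.findtext("error_code").strip()
    # Split "-131 (file size too big)" into the number and the description.
    match = re.match(r"(-?\d+)\s*\((.*)\)$", code_text)
    return file_name, int(match.group(1)), match.group(2)


print(parse_upload_failure(MESSAGE))
```

For the message above this yields the file name, the code -131 and the description "file size too big".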
