I tested whether setting a work cache of '0.1 days' would limit task fetching on two hosts that already had some results. At that point they also had about 1 day of work in queue. No. Those hosts kept downloading more work (1 task per contact) even after there was already about 100 hours of work in queue. I wonder if they would eventually have hit some kind of ridiculously high daily quota limit. It looks like the work cache setting can behave almost like a non-limiting ON/OFF switch with these tasks. I'd suggest a little bit of manual monitoring on that.
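(For anyone wondering which setting that is: I believe it's the 'Store at least ... days of work' value in the computing preferences, assuming the additional buffer is left at 0. 0.1 days works out to only 0.1 × 86400 = 8640 seconds of requested buffer, so queues of ~100 hours clearly shouldn't keep growing.)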
I see you are running BOINC 7.16.3 too. I had that same problem (even worse) on WCG after upgrading, and posted on their forum. Someone suggested it was 7.16.3, but I doubted it. Now I am beginning to wonder what is going on.
EDIT: I think (but am not sure) that it straightens itself out eventually. Maybe just some initial value is set wrong?
So here is a snippet from the last time my computer contacted the server. I only had 'Continuous Gravitational Wave search O2 All-Sky' checked.
That's interesting, thanks for posting. It might well be that this has something to do with the Boinc version.
I currently have all hosts set to 'no new tasks'. I'll open the gates on those two hosts with '0.1 days' tomorrow and see if the scheduler still thinks more work should be downloaded.
Jim1348 wrote:
EDIT: I think (but am not sure) that it straightens itself out eventually. Maybe just some initial value is set wrong?
Actually, just a moment ago with one other host I saw behaviour that could possibly support that straightening. This other host had a few dozen tasks in queue at its peak, but I had set it to 'no new tasks' and about 6 tasks were left. I tested what would happen now with '0.1 days'. The host started downloading tasks one at a time, but it stopped after there were 16 tasks total (of which 4 are running). The scheduler now says "Not requesting tasks: don't need". So it's happy with 0.1 days being only 16 tasks (which will take about 40 hours to drain out). That is positive already. I hope the other hosts will straighten out likewise.
REAL-TIME-EDIT: Haha... the scheduler was just bluffing me on that "positive" case. It clearly noticed I wasn't watching that host and had started to download more tasks behind my back. There are now 30 'in progress' and the number is still going up. 'No new tasks' for that system too! I absolutely can't leave the gate open for them. I'll see if something eventually changes, but I'm starting to believe this is happening for me because of BOINC v7.16.3. No problems with the O2MD1 itself whatsoever!
Something that might help is that I set all my machines to converge more rapidly to the correct value.
In the cc_config.xml file, insert this:
<cc_config>
   <options>
      <rec_half_life_days>1.000000</rec_half_life_days>
   </options>
</cc_config>
I am not having major problems at the moment, but am not sure everything is back to normal yet.
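(A general BOINC note, in case it saves someone a client restart: the client only reads cc_config.xml from the BOINC data directory at startup or on request, so after editing it you can reload it without restarting, e.g. with
boinccmd --read_cc_config
or via the Manager's 'Read config files' menu entry, which sits under Options or Advanced depending on the version.)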
For anyone running the default BOINC 7.14.2 version who changes to the experimental 7.16 branch: you will definitely see changes in work requests. There have been lots of changes in the latest client to how BOINC balances work fetch, deadlines and device busy time among projects.
The REC half-life change in cc_config is very helpful in balancing work requests among projects.
Thank you both !
I added that extra line to cc_config.xml now. Half-Life was a freakin' great game... and at least a couple of those great moments playing that masterpiece in my man cave back then... I should have recorded some of that atmosphere. I knew. That will be fixed now.
To get an idea of where each project stands in regard to other projects, turn on work_fetch_debug in the Event Log logging options for a brief period. Then look at the output and pay attention to the REC value for each project. That shows you the ratio of resource share between projects. For example, earlier today I changed my resource share for Milkyway from 100 to 200. That caused Milkyway to have to "catch up" to Seti: all my GPUs had to shift to exclusively running MW instead of their normal Seti work. My Seti resource share is 1000, and when I changed MW to 200 the ratio changed from 1/10 to 1/5. MW is now caught up, and you can see that the REC of Seti (59589923) divided by 5 now roughly matches the REC of Milkyway (11947152).
Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] ------- start work fetch state -------
Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] target work buffer: 43200.00 + 0.00 sec
Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] --- project states ---
Mon 07 Oct 2019 03:18:27 PM PDT | Einstein@Home | [work_fetch] REC 1524337.105 prio -9.940 can't request work: "no new tasks" requested via Manager
Mon 07 Oct 2019 03:18:27 PM PDT | GPUGRID | [work_fetch] REC 1846456.036 prio -1.000 can request work
Mon 07 Oct 2019 03:18:27 PM PDT | Milkyway@Home | [work_fetch] REC 11947152.895 prio -3.386 can't request work: scheduler RPC backoff (49.01 sec)
Mon 07 Oct 2019 03:18:27 PM PDT | SETI@home | [work_fetch] REC 59589923.555 prio -2.293 can't request work: scheduler RPC backoff (277.14 sec)
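(If you prefer setting that flag in a file rather than through the Manager, the same option can go in cc_config.xml via the standard log_flags mechanism; a minimal sketch:
<cc_config>
   <log_flags>
      <work_fetch_debug>1</work_fetch_debug>
   </log_flags>
</cc_config>
And to make the ratio above concrete: resource shares of 1000 (Seti) and 200 (MW) give a 5:1 target, and 59589923 / 11947152 ≈ 5.0, so the REC values really have settled at that ratio.)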
So after I made the changes to my preferences there is a difference. I'm now seeing the below, and have 8 new O2MD1 tasks in my cache.
2019-10-08 01:25:21.2616 [PID=9439 ] [version] Checking plan class 'LIBC215'
2019-10-08 01:25:21.2645 [PID=9439 ] [version] reading plan classes from file '/
2019-10-08 01:25:21.4638 [PID=9441 ] SCHEDULER_REQUEST::parse(): unrecognized: <allow_multiple_clients>0</allow_multiple_clients>
2019-10-08 01:25:21.2646 [PID=9439 ] [version] plan class ok
2019-10-08 01:25:21.2647 [PID=9439 ] [version] Best version of app einstein_O2MD1 is 1.01 ID 1188 LIBC215 (8.90 GFLOPS)
The amount of granted credit seems to be tied to the frequency of a task.
21.70 Hz ... 210 credits
21.80 Hz ... 220
22.10 Hz ... 430
22.40 Hz ... 440
42.00 Hz ... 810
43.40 Hz ... 840
55.55 Hz ... 1000
etc.
That feature makes sense with the freq-runtime-curve. Nice !
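(Reading the numbers above: doubling the frequency from ~21.7 Hz (210 credits) to ~43.4 Hz (840 credits) roughly quadruples the credit, at least for these examples, which would fit runtime growing faster than linearly with frequency.)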