Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

robl wrote:
I am seeing credits in the range of 120 to 1000 across two PCs.

The amount of granted credit seems to be tied to the frequency of a task.

21.70 Hz ... 210 credits
21.80 Hz ... 220
22.10 Hz ... 430
22.40 Hz ... 440
42.00 Hz ... 810
43.40 Hz ... 840
55.55 Hz ... 1000
etc.

That makes sense given the frequency-runtime curve. Nice!

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I tested whether setting '0.1 days' of work cache would limit receiving tasks for two hosts that already had some results. At that point they also had work in queue for about 1 day. No. Those hosts kept downloading more work (1 task per contact) even after there was already work in queue for about 100 hours. I wonder if they would've hit some kind of ridiculously high daily quota limit in the end. Looks like that setting for work cache can potentially behave almost like a non-limiting ON/OFF switch with these tasks. I'd suggest a little bit of manual monitoring on that.
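(For anyone wanting to pin that cache locally instead of via the website preferences: a minimal sketch of a global_prefs_override.xml in the BOINC data directory, assuming local overrides are what you want, would be

<global_preferences>
   <work_buf_min_days>0.1</work_buf_min_days>
   <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>

and the client should pick it up after boinccmd --read_global_prefs_override or a restart. Local values override the web preferences, so it's easier to keep an eye on what the scheduler is actually being told.)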

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Richie wrote:
No. Those hosts kept downloading more work (1 task per contact) even after there was already work in queue for about 100 hours. I wonder if they would've hit some kind of ridiculously high daily quota limit in the end. Looks like that setting for work cache can potentially behave almost like a non-limiting ON/OFF switch with these tasks. I'd suggest a little bit of manual monitoring on that.

I see you are running BOINC 7.16.3 too.  I had that same problem (even worse) on WCG after upgrading, and posted on their forum.  Someone suggested it was 7.16.3, but I doubted it.  Now I am beginning to wonder what is going on.

EDIT: I think (but am not sure) that it straightens itself out eventually.  Maybe just some initial value is set wrong?

 

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

So here is a snippet from the last time my computer contacted the server.

2019-10-07 17:26:44.1555 [PID=10141]    [locality] App 'einstein_O1OD1I' (52) not selected
2019-10-07 17:26:44.1555 [PID=10141]    [locality] App 'einstein_O2MD1' (53) not selected

I only had 'Continuous Gravitational Wave search O2 All-Sky' checked:

 Binary Radio Pulsar Search (Arecibo)
 Binary Radio Pulsar Search (Arecibo, GPU)
 Gamma-ray pulsar binary search #1
 Gamma-ray pulsar search #5
 Gamma-ray pulsar binary search #1 (GPU)
 Continuous Gravitational Wave search O2 All-Sky
 Gravitational Wave Injection run on LIGO O1 Open Data
 Gravitational Wave search O2 Multi-Directional

So now I've also checked the last 2 and will see if that makes a difference.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Jim1348 wrote:
I see you are running BOINC 7.16.3 too.  I had that same problem (even worse) on WCG after upgrading, and posted on their forum.  Someone suggested it was 7.16.3, but I doubted it.  Now I am beginning to wonder what is going on.

That's interesting, thanks for posting. It might well be that this has something to do with the BOINC version.

I currently have all hosts set to 'no new tasks'. I'll open the gates on those two hosts with '0.1 days' tomorrow and see if the scheduler still thinks more work should be downloaded.

Jim1348 wrote:
EDIT: I think (but am not sure) that it straightens itself out eventually.  Maybe just some initial value is set wrong?

Actually, just a moment ago with one other host I saw behaviour that could possibly support that straightening out. This other host had a few dozen tasks in queue at peak, but I had set it to 'no new tasks' and about 6 tasks were left. I tested what would happen now with '0.1 days'. The host started downloading tasks one at a time, but it stopped after there were 16 tasks in total (of which 4 are running). The scheduler now says "Not requesting tasks: don't need". So it's happy with 0.1 days being only 16 tasks (which will take about 40 hours to drain out). That is positive already. I hope the other hosts will straighten out likewise.

REAL-TIME-EDIT: Haha... the scheduler was just bluffing me on that "positive" case. It clearly noticed I was not watching that host and had started to download more tasks behind my back. There are now 30 'in progress' and that number was going up. 'No new tasks' for that system too! I absolutely can't leave the gate open for them. I'll see if something eventually changes, but I'm starting to believe this is happening for me because of BOINC v7.16.3. No problems with the O2MD1 itself whatsoever!

Jim1348
Joined: 19 Jan 06
Posts: 463
Credit: 257957147
RAC: 0

Richie wrote:
REAL-TIME-EDIT: Haha... the scheduler was just bluffing me on that "positive" case. It clearly noticed I was not watching that host and had started to download more tasks behind my back. There are now 30 'in progress' and that number was going up.

Something that might help is that I set all my machines to converge more rapidly to the correct value.

In the cc_config.xml file, insert this:

<cc_config>
  <options>
    <rec_half_life_days>1.000000</rec_half_life_days>
  </options>
</cc_config>

I am not having major problems at the moment, but am not sure everything is back to normal yet.
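(After editing cc_config.xml the running client has to re-read it; assuming the command-line tool is installed, something like

boinccmd --read_cc_config

should apply it without restarting. As far as I know, the Manager's 'Read config files' option does the same.)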

 

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18758413457
RAC: 7163457

For anyone running the default BOINC 7.14.2 version who changes to the experimental 7.16 branch, you will definitely see changes in work requests.  There have been lots of changes to how BOINC balances work fetches, deadlines and busy devices among projects in the latest client.

The change to the REC half-life in cc_config is very helpful in balancing work requests among projects.

 

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Thank you both!
I added that extra line now in cc_config.xml. Half-Life was a freakin' great game... and at least a couple of those great moments playing that masterpiece in my man cave back then... I should have recorded some of that atmosphere. I knew. That will be fixed now.

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18758413457
RAC: 7163457

To get an idea of where each project stands in regard to other projects, turn on work_fetch_debug in the Event Log logging options for a brief period.  Then look at the output and pay attention to the REC value for each project.  That shows you the ratio of resource share between projects.  For example, earlier today I changed my resource share for Milkyway from 100 to 200.  That caused Milkyway to have to "catch up" to Seti.  All my gpus had to shift to exclusive running of MW from their normal Seti work.  My Seti resource share is 1000, and when I changed MW to 200, the ratio changed from 1/10 to 1/5.  MW is now caught up, and you can see that the REC of Seti (59589923) divided by 5 is now roughly the REC of Milkyway (11947152).

Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] ------- start work fetch state -------
Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] target work buffer: 43200.00 + 0.00 sec
Mon 07 Oct 2019 03:18:27 PM PDT | | [work_fetch] --- project states ---
Mon 07 Oct 2019 03:18:27 PM PDT | Einstein@Home | [work_fetch] REC 1524337.105 prio -9.940 can't request work: "no new tasks" requested via Manager
Mon 07 Oct 2019 03:18:27 PM PDT | GPUGRID | [work_fetch] REC 1846456.036 prio -1.000 can request work
Mon 07 Oct 2019 03:18:27 PM PDT | Milkyway@Home | [work_fetch] REC 11947152.895 prio -3.386 can't request work: scheduler RPC backoff (49.01 sec)
Mon 07 Oct 2019 03:18:27 PM PDT | SETI@home | [work_fetch] REC 59589923.555 prio -2.293 can't request work: scheduler RPC backoff (277.14 sec)

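(For anyone who would rather set that flag in a file than in the Event Log options dialog: a minimal cc_config.xml sketch, assuming no other options are in use, is

<cc_config>
  <log_flags>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>

Turn it off again afterwards, as it makes the log quite chatty.)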
 

Zalster
Joined: 26 Nov 13
Posts: 3117
Credit: 4050672230
RAC: 0

So after I made the changes to my preferences there is a difference. I'm now seeing the below, and have 8 new O2MD1 tasks in my cache.

2019-10-08 01:25:21.2616 [PID=9439 ]    [version] Checking plan class 'LIBC215'
2019-10-08 01:25:21.2645 [PID=9439 ]    [version] reading plan classes from file '/
2019-10-08 01:25:21.4638 [PID=9441 ]   SCHEDULER_REQUEST::parse(): unrecognized: <allow_multiple_clients>0</allow_multiple_clients>
2019-10-08 01:25:21.2646 [PID=9439 ]    [version] plan class ok
2019-10-08 01:25:21.2647 [PID=9439 ]    [version] Best version of app einstein_O2MD1 is 1.01 ID 1188 LIBC215 (8.90 GFLOPS)

 

 
