All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Gary Roberts

Moderator

Joined: 9 Feb 05

Posts: 5883

Credit: 119076654578

RAC: 24079222

23 Jan 2025 6:28:05 UTC

Message 232418 in response to message 232415

I, too, can't get any BuB work.

I haven't tried with the existing fleet since they all have plenty of the former stuff for the moment. I fired up a couple of 'temporarily retired BRP7 hosts' that have 2GB AMD GPUs and were shut down when the summer heat really hit. Here is the full scheduler log (dates truncated) as of 2025-01-23 05:31:36 UTC for one of them.

 [PID=131157]   Request: [USER#xxxxx] [HOST#xxxxx] [IP xxx.xxx.xxx.80] client 7.16.11
 [PID=131157] [debug]  [HOST#0] rpc_time:0 timezone:36000 d_total:41957146624.000000 d_free:29749141504.000000 d_boinc_used_total:9526673408.000000 d_boinc_used_project:9513607168.000000 d_boinc_max:0.000000
 [PID=131157] [debug]   have_master:1 have_working: 1 have_db: 1
 [PID=131157] [debug]   using working prefs
 [PID=131157] [debug]   have db 1; dbmod 1712097450.000000; global mod 0.000000
 [PID=131157]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
 [PID=131157]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
 [PID=131157]    [send] Not using matchmaker scheduling; Not using EDF sim
 [PID=131157]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
 [PID=131157]    [send] ATI: req 8640.00 sec, 1.00 instances; est delay 0.00
 [PID=131157]    [send] work_req_seconds: 0.00 secs
 [PID=131157]    [send] available disk 11.13 GB, work_buf_min 8640
 [PID=131157]    [send] active_frac 1.000000 on_frac 0.949559 DCF 0.105918
 [PID=131157]    [mixed] sending locality work first (0.3306)
 [PID=131157]    [send] send_old_work() no feasible result older than 336.0 hours
 [PID=131157]    [send] send_old_work() no feasible result younger than 281.8 hours and older than 168.0 hours
 [PID=131157]    [mixed] sending non-locality work second
 [PID=131157] [debug]   [HOST#xxxxx] MSG(high) No work sent
 [PID=131157]    Sending reply to [HOST#92619]: 0 results, delay req 60.00
 [PID=131157]    Scheduler ran 2.090 seconds

Unless I'm quite blind - always possible - I don't see any particular error condition, and I wondered if the problem might have been that the former VRAM restriction hadn't been eased. However, I swapped the 2GB unit for a spare 4GB device and it made no difference. So I guess that just leaves the plan class thing, which didn't show up on the two hosts I tried. Fortunately, I wasn't lucky enough to score a Bu resend when I had the 4GB card in the machine :-).

Cheers,
Gary.
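
A minimal sketch for scanning a scheduler log like the one above for lines carrying a given tag, such as [CRITICAL]. The `[PID=...] [tag]` layout is assumed from the excerpts quoted in this thread only; the helper name `lines_with_tag` is made up for illustration and is not part of BOINC.

```python
import re

# Assumed layout (taken from the excerpts in this thread): "[PID=<n>]"
# followed by whitespace and a bracketed tag such as [debug] or [CRITICAL].
TAG_RE = re.compile(r"\[PID=(\d+)\]\s+\[([A-Za-z]+)\]")

def lines_with_tag(log_text, tag):
    """Return the log lines whose tag matches `tag` (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        m = TAG_RE.search(line)
        if m and m.group(2).lower() == tag.lower():
            hits.append(line.strip())
    return hits

# Sample lines copied from logs quoted in this thread.
sample = """\
 [PID=131157]    [send] work_req_seconds: 0.00 secs
 [PID=131157] [debug]   [HOST#xxxxx] MSG(high) No work sent
2025-01-23 02:50:17.1142 [PID=4019067] [CRITICAL] Unknown plan class: GW-opencl-ati-3
"""

critical = lines_with_tag(sample, "CRITICAL")
for line in critical:
    print(line)
```

Lines without a bracketed tag directly after the PID (for example the "Request:" line) are skipped by this pattern.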

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252815400

RAC: 41141

23 Jan 2025 6:30:18 UTC

Message 232419 in response to message 232414

Ian&Steve C. wrote:

2025-01-23 02:50:17.1142 [PID=4019067] [CRITICAL] Unknown plan class: GW-opencl-ati-3

Oh dear - thanks a ton for reporting! Copy/paste error.

Wedge009

Joined: 5 Mar 05

Posts: 138

Credit: 17821249211

RAC: 6470711

23 Jan 2025 6:57:10 UTC

Message 232421

Thanks for fixing!

Soli Deo Gloria

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252815400

RAC: 41141

23 Jan 2025 11:40:00 UTC

Message 232430 in response to message 232413

Wedge009 wrote:

Yes, I'd been noticing the complaints about WU 'too new' (and 'old') but don't really know what that means. I presume it's not something at my end.

That's a problem of the transition from old to new workunits & tasks. The scheduler picks a task for your client's work request and then tries to find a suitable application version to process it. When this is an old task and it checks an app version for new-style workunits (plan class ending in "-3"), it finds the workunit "too old" for that app version, and vice versa.

The problem is that both app versions have different requirements, and the scheduler first picks the task, then the app version for it. So it could happen that the scheduler can't find an app version suitable for your host that can run the task it already picked, and you won't get any (GW) work sent on that request.
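
A minimal sketch of the "task first, app version second" order described above. This is not the actual BOINC scheduler code: the classes, the function names and the plan class names `example-gpu` / `example-gpu-3` are hypothetical and only illustrate why a request can come back empty even though suitable work exists further down the queue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class AppVersion:
    plan_class: str  # hypothetical names; a "-3" suffix marks new-style

    @property
    def new_style(self) -> bool:
        return self.plan_class.endswith("-3")

@dataclass
class Task:
    name: str
    new_style: bool  # True for tasks from new-style workunits

def pick_app_version(task: Task, host_versions: List[AppVersion]) -> Optional[AppVersion]:
    """Return the first app version on the host that can run this task."""
    for av in host_versions:
        if av.new_style == task.new_style:
            return av
    return None  # workunit "too old" or "too new" for every version

def handle_request(queue: List[Task], host_versions: List[AppVersion]) -> Optional[Tuple[Task, AppVersion]]:
    """Pick the task first, then look for an app version (the assumed order)."""
    if not queue:
        return None
    task = queue[0]  # chosen before any app version is considered
    av = pick_app_version(task, host_versions)
    if av is None:
        return None  # no work sent on this request
    return task, av

queue = [Task("new_wu", new_style=True), Task("old_wu", new_style=False)]
old_only_host = [AppVersion("example-gpu")]
new_host = [AppVersion("example-gpu-3")]

print(handle_request(queue, old_only_host))  # None: "old_wu" is never tried
print(handle_request(queue, new_host))
```

In this sketch the host with only the old-style version gets nothing, although the second task in the queue would have matched it.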

Ian&Steve C.

Joined: 19 Jan 20

Posts: 4115

Credit: 49133131834

RAC: 32310476

23 Jan 2025 15:05:59 UTC

Message 232437

hey Bernd, it's no rush, but do you happen to have an ETA on getting the validator for BuB up and running? I can see the tasks waiting for validation stacking up. about 4000 waiting per the server status page.

_________________________________________________________________________

Link

Joined: 15 Mar 20

Posts: 137

Credit: 12276718

RAC: 37544

23 Jan 2025 20:09:21 UTC

Message 232441 in response to message 232375

Bernd Machenschalk wrote:

The tasks themselves (total frequency range, data required etc.) don't change, the only thing that changes is the command-line for the app and thus how these tasks are processed by it.

For the iGPU of my Ryzen 5700G the runtime increased from around 16k seconds to 18.4k seconds and the CPU time from 2.7k seconds to 4.9k seconds. If we still process the same data and generate the same results, this new way doesn't seem to be very efficient IMHO. Or is that just some kind of iGPU issue, and do "normal" GPUs process the tasks at the same speed as they did with the old way of processing?
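
As a rough check of the figures above, here is the relative increase worked out in a few lines. The inputs are the rounded values quoted in this post, so the percentages are approximate; `percent_increase` is just a helper name chosen for the example.

```python
def percent_increase(old, new):
    """Relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100.0

runtime_pct = percent_increase(16_000, 18_400)  # run time in seconds
cpu_pct = percent_increase(2_700, 4_900)        # CPU time in seconds

print(f"run time: +{runtime_pct:.1f}%")  # run time: +15.0%
print(f"CPU time: +{cpu_pct:.1f}%")      # CPU time: +81.5%
```

So the wall-clock time grew by about 15%, while the CPU time grew by roughly 80%.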

Wedge009

Joined: 5 Mar 05

Posts: 138

Credit: 17821249211

RAC: 6470711

23 Jan 2025 20:48:37 UTC

Message 232442 in response to message 232441

Ian&Steve C. wrote:

yeah pretty much. tasks are slower by a small margin. about the same amount of time that the toplist calculation takes since there's two of those instead of one. plus maybe a little bit of overhead in stopping and starting

Link wrote:

For the iGPU of my Ryzen 5700G the runtime increased from around 16k seconds to 18.4k seconds and the CPU time from 2.7k seconds to 4.9k seconds.

I accept the variation will depend on hardware and run configurations. Of my machines that have completed BuB tasks, the run time has increased by around 40% compared with Bu tasks. If BRP7 didn't have such a high risk of invalid results returned, I'd likely be better off returning to those.

I deliberately do not 'optimise' my run configurations for a number of reasons; I understand that this increase would likely be mitigated if I were to do so.

Soli Deo Gloria

Harri Liljeroos

Joined: 10 Dec 05

Posts: 4562

Credit: 3325377930

RAC: 1760038

23 Jan 2025 22:02:50 UTC

Message 232443

My results for the new O3ASBuB seem to indicate an increase of about 15-16% in run time on both of my hosts running single tasks (one has 2 x GTX 1070 and the other an RTX 4070 Ti Super).

mountkidd

Joined: 14 Jun 12

Posts: 179

Credit: 12937490626

RAC: 6009172

23 Jan 2025 22:25:15 UTC

Message 232445 in response to message 232442

My Win/NV3060ti/ocl/116/x2 system sees a drop of 22% on O3ASBuB without any config changes, reducing RAC by 150k/day to 700k/day. BRP7 was doing 530k/day. BuB for the win!

Bernd Machenschalk

Moderator

Administrator

Joined: 15 Oct 04

Posts: 4345

Credit: 252815400

RAC: 41141

24 Jan 2025 10:20:26 UTC

Message 232464 in response to message 232437

Ian&Steve C. wrote:

hey Bernd, it's no rush, but do you happen to have an ETA on getting the validator for BuB up and running? I can see the tasks waiting for validation stacking up. about 4000 waiting per the server status page.

Working on it. Some trouble with the first ~1000 "new" WUs that were issued.
