All-Sky Gravitational Wave Search on O3 data (O3ASHF1)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5883
Credit: 119076654578
RAC: 24079222

I, too, can't get any BuB work.

I haven't tried with the existing fleet since they all have plenty of the former stuff for the moment.  I fired up a couple of 'temporarily retired BRP7 hosts' that have 2GB AMD GPUs and were shut down when the summer heat really hit.  Here is the full scheduler log (dates truncated) as of 2025-01-23 05:31:36 UTC for one of them.

 



 [PID=131157]   Request: [USER#xxxxx] [HOST#xxxxx] [IP xxx.xxx.xxx.80] client 7.16.11
 [PID=131157] [debug]  [HOST#0] rpc_time:0 timezone:36000 d_total:41957146624.000000 d_free:29749141504.000000 d_boinc_used_total:9526673408.000000 d_boinc_used_project:9513607168.000000 d_boinc_max:0.000000
 [PID=131157] [debug]   have_master:1 have_working: 1 have_db: 1
 [PID=131157] [debug]   using working prefs
 [PID=131157] [debug]   have db 1; dbmod 1712097450.000000; global mod 0.000000
 [PID=131157]    [send] effective_ncpus 4 max_jobs_on_host_cpu 999999 max_jobs_on_host 999999
 [PID=131157]    [send] effective_ngpus 1 max_jobs_on_host_gpu 999999
 [PID=131157]    [send] Not using matchmaker scheduling; Not using EDF sim
 [PID=131157]    [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
 [PID=131157]    [send] ATI: req 8640.00 sec, 1.00 instances; est delay 0.00
 [PID=131157]    [send] work_req_seconds: 0.00 secs
 [PID=131157]    [send] available disk 11.13 GB, work_buf_min 8640
 [PID=131157]    [send] active_frac 1.000000 on_frac 0.949559 DCF 0.105918
 [PID=131157]    [mixed] sending locality work first (0.3306)
 [PID=131157]    [send] send_old_work() no feasible result older than 336.0 hours
 [PID=131157]    [send] send_old_work() no feasible result younger than 281.8 hours and older than 168.0 hours
 [PID=131157]    [mixed] sending non-locality work second
 [PID=131157] [debug]   [HOST#xxxxx] MSG(high) No work sent
 [PID=131157]    Sending reply to [HOST#92619]: 0 results, delay req 60.00
 [PID=131157]    Scheduler ran 2.090 seconds

 

Unless I'm quite blind - always possible - I don't see any particular error condition and I wondered if the problem might have been that the former VRAM restriction hadn't been eased.  However I swapped the 2GB unit for a spare 4GB device and it made no difference.  So I guess that just leaves the plan class thing which didn't show up on the two hosts I tried.  Fortunately, I wasn't lucky enough to score a Bu resend when I had the 4GB card in the machine :-).
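For anyone who would rather not eyeball a log like this, here is a minimal sketch of a scan for the lines that usually matter. It assumes only the line layout visible in this thread (`[PID=...]`, an optional `[tag]`, then the message). The names `LINE_RE` and `scan_log` are made up for illustration and are not part of BOINC.

```python
import re

# Assumed layout, inferred from the log lines quoted in this thread:
#   [PID=<n>] [<tag>] <message>     (the [<tag>] part is optional)
LINE_RE = re.compile(r"\[PID=(\d+)\]\s+(?:\[(\w+)\]\s+)?(.*)")


def scan_log(lines):
    """Return (tag, message) for CRITICAL entries and high-priority messages."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a scheduler log line in the assumed layout
        _pid, tag, msg = m.groups()
        if tag == "CRITICAL" or "MSG(high)" in msg:
            hits.append((tag, msg.strip()))
    return hits


sample = [
    " [PID=131157]    [send] ATI: req 8640.00 sec, 1.00 instances; est delay 0.00",
    " [PID=131157] [debug]   [HOST#xxxxx] MSG(high) No work sent",
]
for tag, msg in scan_log(sample):
    print(tag, "|", msg)
```

On the sample lines only the "No work sent" message is reported. No `[CRITICAL]` entry appears in that part of the log, which matches the observation that there is no obvious error condition in it.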

Cheers,
Gary.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4345
Credit: 252815400
RAC: 41141

Ian&Steve C. wrote:


2025-01-23 02:50:17.1142 [PID=4019067] [CRITICAL] Unknown plan class: GW-opencl-ati-3

Oh dear - thanks a ton for reporting! Copy/paste error.

BM

Wedge009
Joined: 5 Mar 05
Posts: 138
Credit: 17821249211
RAC: 6470711

Thanks for fixing!

Soli Deo Gloria

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4345
Credit: 252815400
RAC: 41141

Wedge009 wrote:

Yes, I'd been noticing the complaints about WU 'too new' (and 'old') but don't really know what that means. I presume it's not something at my end.

That's a problem of the transition from old to new workunits & tasks. The scheduler picks a task for your client's work request and then tries to find a suitable application version to process it. When this is a new task and it checks an app version for new-style workunits (plan class ending in "-3"), it finds the workunit "too old" for that app version. And vice versa.

The problem is that both app versions have different requirements, and the scheduler first picks the task, then the app version for it. So it could happen that the scheduler can't find an app version suitable for your host that can run the task it already picked, and you won't get any (GW) work sent on that request.
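As an illustration only (this is not the actual BOINC scheduler code), the "task first, app version second" order described above can be sketched as follows. Every name here (`Task`, `AppVersion`, `host_ok`, `handle_request`, and the plan class `hypothetical-old-class`) is invented for the sketch. Only the `GW-opencl-ati-3` plan class and the "-3" suffix rule come from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    name: str
    new_style: bool  # hypothetical flag: True for a new-style workunit


@dataclass
class AppVersion:
    plan_class: str
    host_ok: bool  # hypothetical: does this host meet the version's requirements?

    @property
    def new_style(self) -> bool:
        # Per the post, plan classes for new-style workunits end in "-3".
        return self.plan_class.endswith("-3")


def find_app_version(task: Task, versions: List[AppVersion]) -> Optional[AppVersion]:
    for v in versions:
        if v.new_style != task.new_style:
            continue  # workunit is "too old" or "too new" for this version
        if not v.host_ok:
            continue  # host does not satisfy this version's requirements
        return v
    return None


def handle_request(queue: List[Task], versions: List[AppVersion]) -> Optional[Tuple[str, str]]:
    task = queue[0]  # step 1: the task is picked before any app version
    version = find_app_version(task, versions)  # step 2: look for a version
    if version is None:
        return None  # nothing suitable for the task already picked: no work sent
    return (task.name, version.plan_class)


versions = [
    AppVersion("GW-opencl-ati-3", host_ok=False),
    AppVersion("hypothetical-old-class", host_ok=True),
]
print(handle_request([Task("new_task", new_style=True)], versions))
print(handle_request([Task("old_task", new_style=False)], versions))
```

In this toy setup the host only qualifies for the old-style version. A request that happens to pick a new-style task returns nothing, while one that picks an old-style task succeeds.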

BM

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4115
Credit: 49133131834
RAC: 32310476

hey Bernd, it's no rush, but do you happen to have an ETA on getting the validator for BuB up and running? I can see the tasks waiting for validation stacking up. about 4000 waiting per the server status page.

_________________________________________________________________________

Link
Joined: 15 Mar 20
Posts: 137
Credit: 12276718
RAC: 37544

Bernd Machenschalk wrote:
The tasks themselves (total frequency range, data required etc.) don't change, the only thing that changes is the command-line for the app and thus how these tasks are processed by it.

For the iGPU of my Ryzen 5700G the runtime increased from around 16k seconds to 18.4k seconds and the CPU time from 2.7k seconds to 4.9k seconds. If we still process the same data and generate the same results, this new way doesn't seem to be very efficient IMHO. Or is that just some kind of iGPU issue, and do "normal" GPUs process the tasks at the same speed as they did with the old way of processing?
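To put those figures in relative terms, here is a quick calculation using only the four numbers quoted above. The helper name `percent_increase` is just for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


runtime = percent_increase(16_000, 18_400)  # wall-clock seconds per task
cpu_time = percent_increase(2_700, 4_900)   # CPU seconds per task

print(f"runtime:  +{runtime:.1f}%")   # runtime:  +15.0%
print(f"CPU time: +{cpu_time:.1f}%")  # CPU time: +81.5%
```

So the wall-clock cost rose by about 15%, while the CPU time rose by roughly 80%.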

.

Wedge009
Joined: 5 Mar 05
Posts: 138
Credit: 17821249211
RAC: 6470711

Ian&Steve C. wrote:

yeah pretty much. tasks are slower by a small margin. about the same amount of time that the toplist calculation takes since there's two of those instead of one. plus maybe a little bit of overhead in stopping and starting

Link wrote:

For the iGPU of my Ryzen 5700G the runtime increased from around 16k seconds to 18.4k seconds and the CPU time from 2.7k seconds to 4.9k seconds.

I accept the variation will depend on hardware and run configurations. Of my machines that have completed BuB tasks, the run time has increased by around 40% compared with Bu tasks. If BRP7 didn't have such a high risk of invalid results returned, I'd likely be better off returning to those.

I deliberately do not 'optimise' my run configurations, for a number of reasons. I understand that this increase would likely be mitigated if I were to do so.

Soli Deo Gloria

Harri Liljeroos
Joined: 10 Dec 05
Posts: 4562
Credit: 3325377930
RAC: 1760038

My results for the new O3ASBuB seem to indicate an increase of about 15-16% in run time on both of my hosts running single tasks (one has 2 x GTX 1070 and the other an RTX 4070 Ti Super).

mountkidd
Joined: 14 Jun 12
Posts: 179
Credit: 12937490626
RAC: 6009172

My Win/NV3060ti/ocl/116/x2 system sees a drop of 22% on O3ASBuB without any config changes, reducing RAC by 150k/day to 700k/day.  BRP7 was doing 530k/day.  BuB for the win! 

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4345
Credit: 252815400
RAC: 41141

Ian&Steve C. wrote:

hey Bernd, it's no rush, but do you happen to have an ETA on getting the validator for BuB up and running? I can see the tasks waiting for validation stacking up. about 4000 waiting per the server status page.

Working on it. Some trouble with the first ~1000 "new" WUs that were issued.

BM
