Testing Radeon VII on All-Sky Gravitational Wave O3 (O3AS)

pututu
Joined: 6 Apr 17
Posts: 61
Credit: 653,417,392
RAC: 630
Topic 231060

After MilkyWay@home ended its GPU apps, I've been casually looking for BOINC projects that can put the Radeon VII to good use. The Radeon VII is four generations older than the AMD Radeon RX 7000 series, so it's an aging card, but I think it has aged well...

The O3AS sub-project seems to be the best fit, assuming the credit is still 10,000 per task. I'm guessing there are a lot of VII cards out there that could be put to good use to speed up the O3AS search.

A quick summary of O3AS setup:

  • Best paired with a fast CPU. Depending on the GPU and CPU combination, roughly 50% (rough ballpark) of the run time can be spent on CPU calculation while the GPU sits at low utilization.
  • Because of this, run multiple tasks and stagger them so that the GPU stays close to 100% utilized.
  • Running under Linux is faster than under Windows, but I had problems controlling the power limit/voltage/frequency in Linux using the corectrl utility. 
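Why staggering helps can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not how the O3AS app actually schedules work: each task slot is assumed to repeat a fixed GPU phase followed by a fixed CPU-only phase, and phase lengths are assumed not to stretch when tasks share the GPU. The function names `stagger_offsets` and `gpu_busy_fraction` are hypothetical.

```python
def stagger_offsets(n_tasks: int, period_s: float) -> list:
    """Evenly spaced start offsets (seconds) for n_tasks within one task period."""
    return [i * period_s / n_tasks for i in range(n_tasks)]


def gpu_busy_fraction(offsets, gpu_s: float, cpu_s: float, steps: int = 10_000) -> float:
    """Fraction of time at least one task is in its GPU phase.

    Simplified model (assumption): every task slot repeats a GPU phase of
    gpu_s seconds followed by a CPU-only phase of cpu_s seconds, and the
    phase lengths do not change when tasks overlap on the GPU.
    """
    period = gpu_s + cpu_s
    busy = 0
    for k in range(steps):
        t = period * k / steps
        # A slot is on the GPU when its position within its cycle is < gpu_s.
        if any((t - o) % period < gpu_s for o in offsets):
            busy += 1
    return busy / steps


if __name__ == "__main__":
    # Roughly 50% of each task spent on the CPU, as in the ballpark above.
    gpu_s, cpu_s = 150.0, 150.0
    for n in (1, 2, 4):
        aligned = gpu_busy_fraction([0.0] * n, gpu_s, cpu_s)
        staggered = gpu_busy_fraction(stagger_offsets(n, gpu_s + cpu_s), gpu_s, cpu_s)
        print(f"{n} task(s): aligned {aligned:.0%}, staggered {staggered:.0%}")
```

In this model, tasks that line up leave the GPU idle whenever they all hit their CPU phase together, while evenly spread offsets keep at least one task on the GPU. Real runs will look worse because overlapping tasks slow each other down on the GPU.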

 

Results:

  1. Under Windows: I posted the results in a previous post here. The VII was paired with a 7950X running at a fixed 4.2GHz. I undervolted the CPU, but it could easily run at 5GHz with higher voltage and power. Average run time was ~402 secs/task, or about 2.1M PPD. Better run times could be achieved with PBO enabled plus a Curve Optimizer (CO) offset, but I never used it, lol.
  2. Under Linux: I tested this recently, paired with a 3950X, in two scenarios:
    1. @3.6GHz (fixed, no PBO), 4 tasks per GPU, 100% GPU power. Average run time ~312 secs/task, or ~2.7M PPD. I estimate the GPU is drawing north of 250W.
    2. @3.8GHz, 6 tasks per GPU with corectrl control, GPU power ~180W. Estimated average run time ~280 secs/task, or ~3M PPD. Note: after running 20+ tasks, corectrl crashed and the system froze. I still haven't been able to get corectrl working properly since the crash, hence no further testing.
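The PPD figures follow from simple arithmetic. Below is a minimal Python sketch, assuming 10,000 credits per task (as stated at the top of the post) and assuming the run times above are throughput-normalized, i.e. average wall-clock seconds per completed task with all concurrent tasks counted; that interpretation is an inference, since it is what reproduces the quoted PPD. The function name `points_per_day` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def points_per_day(effective_s_per_task: float, credit_per_task: float = 10_000) -> float:
    """Credits per day from the average wall-clock seconds per completed task.

    Assumption: effective_s_per_task already accounts for concurrency
    (total wall time divided by the number of tasks completed).
    """
    tasks_per_day = SECONDS_PER_DAY / effective_s_per_task
    return tasks_per_day * credit_per_task


if __name__ == "__main__":
    runs = [
        ("Windows, 7950X @4.2GHz", 402),
        ("Linux, 3950X @3.6GHz, 4 tasks", 312),
        ("Linux, 3950X @3.8GHz, 6 tasks", 280),
    ]
    for label, secs in runs:
        print(f"{label}: {points_per_day(secs) / 1e6:.2f}M PPD")
```

This prints about 2.15M, 2.77M and 3.09M PPD, in line with the rounded figures above.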

It would be good if someone with a VII card could reproduce the above findings.

Note: Since I didn't run hundreds of tasks, there is a strong likelihood that the task staggering will drift over time, which may impact runtime/PPD. 

 

 

tictoc
Joined: 1 Jan 13
Posts: 43
Credit: 6,854,016,425
RAC: 8,265,509

Unless you babysit the host, there is enough variance in task runtime that they will eventually sync up.

After monitoring a few tasks, maximum VRAM usage looks to be about 2.8GB per task with the current batch of tasks. Your system crash at 6x tasks was likely due to exceeding the 16GB of VRAM on the Radeon VII. 5x is really the highest you can go on the VII, and if it's the primary GPU in the system, I would set it to 4x.

I started running GW tasks on two Radeon VIIs again yesterday, and it looks like they will sit at about 1.9-2M PPD per GPU. https://einsteinathome.org/host/12883788 Those cards are running 4 tasks per GPU with clocks set to 1800 MHz core / 1000 MHz memory and no power limit. Out of the box the efficiency isn't great, but there is significant room for improvement by undervolting and power-limiting the VIIs. 
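The VRAM reasoning above amounts to a floor division. Here is a minimal Python sketch using the 2.8GB/task and 16GB figures from this post; the `reserve_gb` headroom for a GPU that also drives the desktop is a hypothetical value chosen for illustration, not a measured one, and `max_concurrent_tasks` is a hypothetical name.

```python
import math


def max_concurrent_tasks(vram_gb: float, per_task_gb: float, reserve_gb: float = 0.0) -> int:
    """Largest number of tasks whose combined peak VRAM fits on the card.

    reserve_gb is headroom kept free for other uses (e.g. the desktop when
    the card is the primary GPU); the value you pick is an assumption.
    """
    usable = vram_gb - reserve_gb
    if usable <= 0 or per_task_gb <= 0:
        return 0
    return math.floor(usable / per_task_gb)


if __name__ == "__main__":
    # 16GB Radeon VII, ~2.8GB peak per task as observed above.
    print(max_concurrent_tasks(16, 2.8))                  # dedicated card
    print(max_concurrent_tasks(16, 2.8, reserve_gb=3.0))  # hypothetical desktop reserve
```

With these inputs the dedicated card fits 5 tasks (6 x 2.8GB = 16.8GB would exceed 16GB), and the hypothetical 3GB reserve brings it down to 4.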

pututu
Joined: 6 Apr 17
Posts: 61
Credit: 653,417,392
RAC: 630

Yes, some tasks, but not all, will eventually sync up. However, from my experience running MW GPU apps with 6 tasks on the VII, I don't ever recall seeing all 6 tasks sync up at once. Perhaps 2 to 4 tasks, but there is still another group of 2 to 4 tasks that stays out of sync with the first group. I think the key is to keep the GPU busy at all times, so running more tasks, coupled with the natural variation in run time, might help minimize this.

I don't think this is a VRAM limitation, as the corectrl screenshot in the first post shows 13.8GB of usage with six O3 tasks running concurrently. For comparison, on a 3080 Ti under Linux each O3 task needs about 2.2GB of VRAM.

I'm guessing that your longer runtime is related to your CPU speed. Are you running any other CPU projects? My 3950X only runs O3 tasks.

Looking at one of your tasks with a long run time, here is the stderr.txt output showing the longer CPU run time:

2024-05-01 10:06:48.3649 (2725229) [normal]: Finished main analysis.
2024-05-01 10:06:48.3650 (2725229) [normal]: Recalculating statistics for the final toplist(s)...
2024-05-01 10:14:20.4889 (2725229) [normal]: Finished recalculating toplist statistics.
2024-05-01 10:14:20.4890 (2725229) [normal]: Finished in 1029.58 s with peak RAM usage: 1074.0 MB on CPU 'AMD Ryzen Threadripper 3960X 24-Core Processor ', peak VRAM usage: 1681.3 MB on GPU Device: 'gfx906:sramecc+:xnack- ( Platform: AMD Accelerated Parallel Processing )' with backend: 'OpenCL'.

It takes about 7 mins 32 secs (10:06:48 to 10:14:20) to complete the CPU recalculation step, which seems too long compared to mine (see below). Also, the peak VRAM usage is 1681MB, indicating that there is enough VRAM to run six tasks on the VII.
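Rather than subtracting timestamps by hand, the recalculation time can be pulled out of stderr.txt with a short script. This is a minimal Python sketch that assumes the log layout shown in the excerpts in this thread (date and time as the first two fields, followed by the "Recalculating statistics for the final toplist" and "Finished recalculating toplist statistics" messages); the function name `recalc_seconds` is hypothetical, and other app versions may log differently.

```python
from datetime import datetime

# Timestamp layout seen in the stderr excerpts, e.g. "2024-05-01 10:06:48.3650".
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def recalc_seconds(stderr_text: str) -> float:
    """Seconds between the start and end of the toplist recalculation.

    If the recalculation is logged more than once, the last start/end
    pair in the text is measured.
    """
    start = end = None
    for line in stderr_text.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue
        if "Recalculating statistics for the final toplist" in line:
            start = datetime.strptime(f"{parts[0]} {parts[1]}", TS_FORMAT)
        elif "Finished recalculating toplist statistics" in line:
            end = datetime.strptime(f"{parts[0]} {parts[1]}", TS_FORMAT)
    if start is None or end is None:
        raise ValueError("recalculation start/end lines not found")
    return (end - start).total_seconds()


if __name__ == "__main__":
    log = (
        "2024-05-01 10:06:48.3650 (2725229) [normal]: Recalculating statistics for the final toplist(s)...\n"
        "2024-05-01 10:14:20.4889 (2725229) [normal]: Finished recalculating toplist statistics.\n"
    )
    minutes, seconds = divmod(recalc_seconds(log), 60)
    print(f"CPU recalculation: {int(minutes)} min {seconds:.1f} s")
```

For the excerpt above this gives about 7 min 32 s, matching the manual calculation.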

 

Here is the stderr output for one of the completed tasks on my PC. The CPU recalculation step took only 2 mins 52 secs.

2024-04-28 14:32:36.9604 (8442) [normal]: Finished main analysis.
2024-04-28 14:32:36.9605 (8442) [normal]: Recalculating statistics for the final toplist(s)...
2024-04-28 14:35:28.2283 (8442) [normal]: Finished recalculating toplist statistics.
2024-04-28 14:35:28.2284 (8442) [normal]: Finished in 865.46 s with peak RAM usage: 1050.0 MB on CPU 'AMD Ryzen 9 3950X 16-Core Processor            ', peak VRAM usage: 1681.0 MB on GPU Device: 'gfx906:sramecc+:xnack- ( Platform: AMD Accelerated Parallel Processing )' with backend: 'OpenCL'.

 

 

 

tictoc
Joined: 1 Jan 13
Posts: 43
Credit: 6,854,016,425
RAC: 8,265,509

Looking back through some usage logs from last night, running 4x tasks, most of the time the tasks are at different stages of computation, but there are a handful of instances when all 4 tasks are at roughly the same stage.

The VRAM usage reported by the task doesn't track the VRAM usage reported by the GPU/driver. Monitoring the usage reported by the GPU, the highest usage at 4x tasks was 11.1GB. Those GPUs are only running Einstein; a 6900 XT is the primary GPU in that system. 

The outlier CPU runtimes might be due to the fact that I only have 14 threads pinned to the BOINC instance running GPU tasks. That same task you linked spent half as much time in the 2nd recalc. There is likely some CPU contention, depending on how many tasks are in the CPU portion at the same time. If I run any CPU projects on this machine, they are pinned to the first 32 threads and run in a separate BOINC instance. The 3960X in that system is running at a fixed 4.15GHz.

I'll try to run some tasks in a clean environment to see what the results look like. When I did this previously, I think I settled on 4x because the overall throughput was only marginally better at 5x, which wasn't worth the additional CPU usage to me.

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,590,615,759
RAC: 1,610,986

Slightly off topic, but as fellow Radeon VII owners, have you ever seen the driver timeout error message appear? I've seen it multiple times over the years and suspect the card might be on its way out.

If I leave it, it basically doubles the time to complete a task. A PC restart fixes the issue, but not for long.

I'm only running 3 WUs at a time (Meerkat). 

Surely you guys have seen this with your Radeon VIIs?

 

Edit - Actually, I just woke up this morning to find this error and a BOINC message I've never seen before! 

"Postponed: Not enough free CPU/GPU memory available! Delaying next attempt for at least 15 minutes" 

I'm not running any CPU work. Is the GPU memory on its way out? 

Bizarrely, this is only showing for 2 WUs, which are "postponed", although there are still 3 WUs running concurrently. 

 


mikey
Joined: 22 Jan 05
Posts: 12,555
Credit: 1,838,847,225
RAC: 24,737

deleted

MAGIC Quantum Mechanic
Joined: 18 Jan 05
Posts: 1,857
Credit: 1,350,835,936
RAC: 1,530,255

I have one AMD Radeon RX 570, and it started doing that. When I checked, the drivers had magically disappeared, so I tried to download the newest drivers. After hours of trying, it said the drivers would not work with my AMD Radeon RX 570... even though I had picked the newest one for that card and that PC's OS.

Since I still had the old driver saved in my downloads, I just reinstalled it and rebooted, and it magically started working again and is running All-Sky right now (this is my only AMD GPU, and it's paired with an AMD CPU). 

 

pututu
Joined: 6 Apr 17
Posts: 61
Credit: 653,417,392
RAC: 630

Chooka wrote:

Slightly off topic, but as fellow Radeon VII owners, have you ever seen the driver timeout error message appear? I've seen it multiple times over the years and suspect the card might be on its way out.

If I leave it, it basically doubles the time to complete a task. A PC restart fixes the issue, but not for long.

I'm only running 3 WUs at a time (Meerkat). 

Surely you guys have seen this with your Radeon VIIs?

 

Edit - Actually, I just woke up this morning to find this error and a BOINC message I've never seen before! 

"Postponed: Not enough free CPU/GPU memory available! Delaying next attempt for at least 15 minutes" 

I'm not running any CPU work. Is the GPU memory on its way out? 

Bizarrely, this is only showing for 2 WUs, which are "postponed", although there are still 3 WUs running concurrently. 

 

 

I've had no issues running five O3AS tasks with AMD driver 23.5.2, as posted here. I did run BRP7 on the VII with that same driver version, but only for about a dozen tasks early this year. I think I ran 4 tasks per GPU. Perhaps I haven't tested long enough...

 

What driver version are you using on this PC? 

 

Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,590,615,759
RAC: 1,610,986

I'm running 24.3.1, but I've seen this occur across multiple drivers. 

OK, I'm giving O3AS a go. Trying 4 tasks, with CPU crunching of other projects suspended. Stock GPU speeds.

Gee, the utilisation is 89% and temps are 60C! That's impressive for 4 tasks! I hope it holds up. 

It's only pulling 120W! Surely not!! (It's true, though.)

With such low wattage, it may not fail. Fingers crossed.

 


Chooka
Joined: 11 Feb 13
Posts: 134
Credit: 3,590,615,759
RAC: 1,610,986

Like another poster commented, I have noticed that the WUs unfortunately seem to realign themselves, even after I've staggered them out. That's a bit annoying. 

Results for my Radeon VII running 4 tasks: around 645 sec/WU. 

 

https://einsteinathome.org/host/12602626/tasks/4/56

 

No driver failures which is great though!!!


pututu
Joined: 6 Apr 17
Posts: 61
Credit: 653,417,392
RAC: 630

Chooka wrote:

Like another poster commented, I have noticed that the WUs unfortunately seem to realign themselves, even after I've staggered them out. That's a bit annoying. 

Results for my Radeon VII running 4 tasks: around 645 sec/WU. 

 

https://einsteinathome.org/host/12602626/tasks/4/56

 

No driver failures which is great though!!!

You can try running 5 tasks per GPU on the Radeon VII; there is enough VRAM on the card. At most, I usually see two pairs of tasks aligned (say tasks #1/#2 and #3/#4), but that still leaves 3 groups of tasks that maintain some stagger, which should keep GPU utilization above 90%. On my rig, most of the time only one pair of tasks is aligned and I rarely see two pairs aligned, so my GPU runs close to 100% utilization most of the time.

Edit: Looking at how long it takes the CPU to calculate the first statistics, yours took about 7 mins 19 secs for this task.

2024-05-30 23:19:39.9992 (6308) [normal]: Recalculating statistics for the final toplist(s)...
2024-05-30 23:26:58.5108 (6308) [normal]: Finished recalculating toplist statistics.

Are you running any other CPU projects on this PC? I don't know whether the O3AS CPU portion uses AVX, but I notice that the CPU calculation slows down a lot when the CPU is fully loaded on my PC. Someone here also reported the same thing, even with a fast CPU like a 3960X at 4.15GHz. If the CPU portion does use AVX, then it's best to allocate one full core. You may want to reduce the load of the other CPU project (if you are running one), monitor the O3AS CPU run time, and adjust the CPU project load accordingly. A fast CPU is required to speed up O3AS completion time.
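The usual BOINC way to run several GPU tasks at once and budget CPU per task is an app_config.xml file placed in the Einstein@Home project folder under the BOINC data directory. Below is a minimal Python sketch that generates one. The app name `einstein_O3AS` is an assumption (check the actual app name in client_state.xml before using it), `build_app_config` is a hypothetical helper, and 4 tasks with one CPU per task is just an example setting.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpus_per_task: float = 1.0) -> str:
    """Return app_config.xml text that runs tasks_per_gpu tasks on each GPU.

    gpu_usage is the fraction of one GPU each task claims (1/4 lets BOINC
    run 4 at once). cpu_usage only tells BOINC how much CPU to budget per
    task when scheduling other work; it does not pin threads or cores.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpus_per_task:g}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical app name; verify it in client_state.xml first.
    print(build_app_config("einstein_O3AS", tasks_per_gpu=4))
```

After saving the output as app_config.xml, the BOINC Manager's "Read config files" option (or a client restart) picks it up.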
