As you can see, O3AS spends almost half of its run time purely on the CPU. There are tricks such as staggering multiple tasks, but I don't think that is a good enough solution, since many GPUs do not have enough VRAM for it.
First, on my laptop it takes about 1 min to load the data into VRAM, at about 100 MB/s. What is the bottleneck: the CPU, PCIe bandwidth, or SSD bandwidth? If it is the SSD, I should consider buying a better one.
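One way to narrow this down is to time plain file reads outside the application. The sketch below is only an illustration: the function names, the block sizes, and the choice of test file are assumptions, and nothing here reflects how O3AS actually reads its data. Note that the OS page cache can inflate the result if the same file was read recently.

```python
import os
import random
import time


def implied_size_gb(seconds, mb_per_s):
    """Data volume in GB implied by a load time and an average rate."""
    return seconds * mb_per_s / 1000


def read_throughput_mb_s(path, block_size=1 << 20, random_order=False, seed=0):
    """Read the whole file in blocks and return the observed MB/s (1 MB = 10**6 bytes).

    With random_order=True the same blocks are read in shuffled order,
    which approximates a random-read access pattern.
    """
    size = os.path.getsize(path)
    offsets = list(range(0, size, block_size))
    if random_order:
        random.Random(seed).shuffle(offsets)

    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            total += len(f.read(block_size))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total / elapsed / 1e6
```

Calling `read_throughput_mb_s(path)` and then `read_throughput_mb_s(path, block_size=4096, random_order=True)` on a large file that lives on the same SSD gives a rough sequential versus random comparison. `implied_size_gb(60, 100)` shows that one minute at 100 MB/s corresponds to about 6 GB of data.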
Besides, sorting the candidates in the final stage takes a long time, and it does not need to. Sorting can be parallelized (see https://github.com/axel92b/multithreaded_sort). There is also no need to sort exactly: with quicksort rather than mergesort, the last few levels of recursion can be skipped, which saves a lot of function-call overhead. And many entries are bad candidates that can simply be discarded without sorting (which I believe you are already doing).
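To illustrate the idea of discarding bad candidates and avoiding a full sort, here is a minimal sketch. It is not O3AS code: the function name, the `(score, id)` tuple layout, and the `threshold` and `k` parameters are all hypothetical, and it uses heap-based top-k selection from the standard library rather than the truncated quicksort described above.

```python
import heapq


def top_candidates(candidates, k, threshold):
    """Return the k highest-scoring candidates, best first.

    candidates: iterable of (score, candidate_id) tuples (hypothetical layout).
    Entries scoring below `threshold` are dropped before any ordering work,
    and only the k best survivors are ordered, roughly O(n log k) instead of
    the O(n log n) of a full sort.
    """
    survivors = (c for c in candidates if c[0] >= threshold)
    return heapq.nlargest(k, survivors, key=lambda c: c[0])


if __name__ == "__main__":
    demo = [(0.1, "a"), (0.9, "b"), (0.5, "c"), (0.7, "d")]
    print(top_candidates(demo, k=2, threshold=0.3))
```

Whether this helps in practice depends on how many candidates survive the threshold and how many are actually needed at the end.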
One final question: is an X3D CPU a better choice? It runs a few hundred MHz slower but has a larger cache than the non-X3D versions.
Copyright © 2024 Einstein@Home. All rights reserved.
Thank you for these questions. I have been asking myself some of them, and I have already suggested involving more CPU cores in the GPU crunching, if that is possible.
However, since you mentioned a laptop: I am crunching on a very modern desktop with a 7800X3D CPU, and on an HP ZBook Fury, which has really good cooling capacity for a notebook. While the laptop gets more credits with the GW app than with the BRP7 one, I realised that it is at its thermal limit. There is no point in making it more efficient, as laptops in general can't handle the thermals of both GPU and CPU running at 100%.
And that is with an additional big 30cm Noctua fan blowing cold air onto the bottom of the laptop.
With the desktop it's a different story. I have no comparison with non-X3D CPUs, but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook. And since the new tasks ending in -2 use less VRAM, by running 3 at a time I managed to get +60k credits a day compared to running 2 at a time.
I don't know where the bottleneck is, but I noticed that the more modern the hardware, the faster the load times. My HP laptop is from 2021 and takes about 10 seconds to load the VRAM; my desktop system, which I bought this year, loads almost instantly.
seewo wrote: First, on my
If you are willing, show your computer(s) in your account so the actual specs can be seen. It could be the CPU's speed (thermal throttling is very real with laptops and this type of work), or it could be the memory bandwidth and/or speed of the GPU. It is probably not PCIe bandwidth or SSD bandwidth.
B.I.G wrote: Thank you for
Well, my laptop has a 13500H and a 4060 Laptop GPU. It's cold here in winter and I hit no thermal limits. The GPU stays below 65 ℃ when working on Einstein with the fan at 3000 rpm. You can see that it does not even reach 50 W when hitting the voltage limit, even with a 237 MHz overclock. That is far too easy a load for a laptop.
In the power settings I set PCIe to maximum battery life (minimum performance). I wonder whether that's relevant. In GPU-Z the bus interface load is always low. I don't know exactly what that means, but I guess the PCIe link is not the bottleneck in loading anyway. I still guess the SSD may be the problem: its single-thread random read is only 64 MB/s.
When you say "but while the gap in which the GPU has less to do is still there and not insignificant, it's much smaller than on the notebook", do you take CPU frequency into account? Anyway, I'm gonna ask the other projects and then decide.
Boca Raton Community HS
I don't think it's either the thermal limit or memory bandwidth. The CPU is a 13500H fixed at 3 GHz, which puts it at about 75 ℃. CPU speed may slow down the postprocessing, but I don't think it matters much for loading the data. The disk's sequential read is 2000 MB/s, but random read is only 65 MB/s. If the loading is a sequential read, the disk is not the problem. As for memory, the system RAM is 2×16 GB DDR4-3200 and the VRAM is 8 GB GDDR6 set to 1875 MHz (7500 MT/s perhaps?).
seewo wrote: In power config
Yes, that is relevant. Set it to maximum performance and see if there is a difference.
CPU frequency alone doesn't explain it; a CPU is much more than its frequency. The notebook, with an Intel 10750H, boosts up to 4 GHz on a single core; the desktop runs at 4.8 GHz, and because of better cooling I use 7 of the 8 cores. But there is no doubt that Intel chips currently stand no chance against AMD chips. Even back then the 10750H stood no chance against its AMD counterpart. And maybe your i5 chip lacks features of the i7 that are important here.
Also, RAM speed is significantly higher on the desktop (4800 MHz vs 2667 MHz). In addition, my observation outside of Einstein@Home, in my daily work, is that latency can have a huge impact on system performance. And maybe there is also a difference between NVIDIA and AMD.
In theory your notebook should be much more potent than mine. What is the run time of a single task on your notebook?
seewo wrote: Boca Raton
I still do not think it would be bottlenecked at the disk. What about the GPU memory bandwidth? That is actually what I was referring to. Also, are you running other tasks at the same time (for E@H or other projects)?
Boca Raton Community HS
I only run 1 project at a time. The memory bandwidth is 240GB/s.
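The memory figures quoted in this thread can be sanity-checked with the usual peak-bandwidth formula (transfer rate times bus width). This is only a rough sketch under stated assumptions: dual-channel DDR4 with 64 bits per channel and a 128-bit GPU memory bus are assumed here, not confirmed anywhere in the thread, and the function names are made up for illustration.

```python
def peak_bandwidth_gb_s(transfers_per_s, bus_width_bits):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_per_s * bus_width_bits / 8 / 1e9


def per_pin_rate_gt_s(bandwidth_gb_s, bus_width_bits):
    """Transfer rate per pin in GT/s implied by a bandwidth and a bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


# System RAM: DDR4-3200, ASSUMING two 64-bit channels.
ddr4 = peak_bandwidth_gb_s(3200e6, 2 * 64)

# VRAM: 240 GB/s as stated, ASSUMING a 128-bit memory bus.
gddr6_rate = per_pin_rate_gt_s(240, 128)

# Ratio of the implied per-pin rate to the reported 1875 MHz memory clock.
ratio = gddr6_rate * 1e9 / 1875e6

if __name__ == "__main__":
    print(f"DDR4 peak: {ddr4:.1f} GB/s")
    print(f"GDDR6 implied rate: {gddr6_rate:.1f} GT/s per pin ({ratio:.0f}x the 1875 MHz clock)")
```

Under these assumptions the system RAM peaks at about 51 GB/s, and 240 GB/s of VRAM bandwidth corresponds to about 15 GT/s per pin, i.e. eight transfers per reported clock cycle.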
seewo wrote:Boca Raton
That VRAM bandwidth might be the issue. That is just my suggestion though.
Boca Raton Community HS
I doubt that: my W5500m has a memory bandwidth of 224 GB/s, yet it loads into VRAM much faster.
I can't imagine a 4060 performing worse than a W5500.
Such things can be a pain to figure out. I wonder if it has to do with the i5 CPU maybe performing worse with certain calculations?