I currently have a 3080 and am starting to think about what my next upgrade will be. Should I go for a 4080/5080 if they are out sometime next year, or should I switch and go with a 7900 XT (AMD)?
Ah, I missed that specific question. If you're thinking 4080, why not go for the 7900 XTX, which is about 200€ cheaper than the 4080 and outperforms it?
But unless money is no issue and you just want to buy it because you can afford it, I'd wait for the next generation and then see how the two brands compare six months after launch. AMD usually has bad drivers in the beginning but does a great job of updating them to get the maximum out of the cards, while Nvidia has good drivers right from the start but you don't get the improvements later. For example, when the 7900 XTX launched it was about as good as the 4080; six months later, with the latest drivers, it got ahead to the point where, depending on what you do, it is as good as the 4090. In gaming the Nvidia card is still clearly ahead, but in productivity the AMD card is often enough equal, sometimes even faster. And it costs $600 less... or at the moment a whole whopping $1000 less.
But for that knowledge to crystallise you have to wait a while, and it's different with every card generation.
You shouldn't compare gaming benchmarks and apply that performance to expected performance for compute or BOINC. There is far too much gaming-specific, and even per-game, optimization in the drivers; both Nvidia and AMD do this.
But compute is a bit more generalized, as there isn't really much application-specific optimization in the drivers; that comes down to the code and what the hardware is capable of.
Nvidia, on certain platforms, tends to be more tunable, especially if the application is CUDA. AMD tends to do better than Nvidia only when the code is less optimized.
Just look through the leaderboard here: Nvidia is dominant. It takes a monster system with 8x MI100 compute cards to match my 6x 3080Tis with an optimized app and some CUDA tweaks.
I looked at a 7900XTX on the leaderboard doing the BRP7 tasks in about 200s each effective (running 2x in 400s).
My 3080Tis run the BRP7 tasks in about 166s effective (running 3x in 500s).
3080Tis are cheaper than the 7900XTX at this time.
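For anyone wanting to redo this comparison from leaderboard numbers, the "effective" time is just wall-clock time divided by how many tasks run concurrently. A minimal sketch of that arithmetic (the function name is mine; the times are the figures quoted above):

# Minimal sketch: effective per-task time when running N tasks concurrently.
# Effective time = wall-clock time for a batch / number of tasks in the batch.

def effective_time(wall_seconds: float, concurrency: int) -> float:
    """Wall-clock seconds 'spent' per task when running `concurrency` tasks at once."""
    return wall_seconds / concurrency

print(effective_time(400, 2))  # 7900XTX running 2x -> 200.0 s per task
print(effective_time(500, 3))  # 3080Ti running 3x  -> ~166.7 s per task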
I currently have a 3080 and am starting to think about what my next upgrade will be. Should I go for a 4080/5080 if they are out sometime next year, or should I switch and go with a 7900 XT (AMD)?
Ah, I missed that specific question. If you're thinking 4080, why not go for the 7900 XTX, which is about 200€ cheaper than the 4080 and outperforms it?
This is why I'm asking. From what I have seen so far, it looks like the 4080 is the better performer.
The AMD is approximately $1099 cheaper (not on sale). If the 4080 is the better performer, I will eventually end up getting one of those.
Are you able to point me to where it is outperforming the 4080 when it comes to BOINC? Looking at the top hosts here, I cannot see which host is running which card without clicking on each individual user.
You shouldn't compare gaming benchmarks and apply that performance to expected performance for compute or BOINC.
That's why I pointed out that there is a difference between gaming and computing, and even between applications within computing. But in general, gaming cards offer more performance than workstation cards, at least here on Einstein@Home.
Ian&Steve C. wrote:
Nvidia, on certain platforms, tends to be more tunable, especially if the application is CUDA. AMD tends to do better than Nvidia only when the code is less optimized.
Specifically here on BOINC I can't say, but in computing in general it's even more complex, because there are computations where AMD is ahead of NVIDIA with their accelerator cards, and although it's not advertised, that also shows in their gaming cards. That led to some surprises in benchmarks: with one specific computation the 7900 XTX was twice as fast as the 4090, but when the cores that games need for ray tracing can be used to compute something, the NVIDIA card eats everything else alive. And then you have everything in between, depending on the type of computation.
But it's a similar thing with CPUs... I had a five-year-older, much slower Core 2 beating a Xeon by quite a big margin, the Xeon again being much faster than a much newer i7, and the AMD 7800X3D being three times as fast as said Xeon, except with the FGRP5 app, where the now 12-year-old Intel is faster...
I live in the wrong country it seems ;)
Are you able to point me to where it is outperforming the 4080 when it comes to BOINC?
Ah, BOINC only is a different topic, as Ian&Steve C. explained. So you want that card only for BOINC and nothing else, or at least BOINC is the main focus? Then, according to Ian, the 4080 is better.
Maybe a different approach: unless your computer is limited to one card, why not go for several cheaper cards that might deliver more credits for the money and power?
Up to a point, performance scales with price and power consumption, but the closer you get to the top, the more you pay in both power draw and price for ever smaller gains in performance.
@ Ian&Steve C.
Since you seem to be very knowledgeable and wrote about the BRP7 tasks: did you also check the GW tasks? I only have AMD cards in use now, but I noticed that my laptop card, the W5500M, is significantly better with the GW tasks - it could get almost double the credits - while in the desktop the W7600 would get slightly more credits with the BRP7 tasks. How do the Nvidia cards perform with the GW tasks? But since the GW tasks are very CPU heavy, maybe there is a CPU bias in it?
For me the easy answer is to stick with Nvidia as it's essentially plug and play.
With Windows 11 it doesn't matter at all.
As for what card to get: in general, at the moment the AMD cards offer more performance for the money. If you need CUDA for some projects, obviously NVIDIA is the only choice. Other than that - unless we go really into the details - go with whatever brand you like more.
Just don't get the 4090: that power connector doesn't seem to be made to endure that power draw, and under heavy load 24/7 I wouldn't trust it without a fire alarm next to the computer.
I would actually defend the 4090 if it is being used appropriately and safely. It is not a perfect GPU (memory bandwidth being the main issue), but it really is incredible. When running Meerkat x2, they only pull ~270 watts (on average), which is in no way an excessive amount (Linux Mint, default settings). They never get above 60C (usually a steady 45C) and the fans are never higher than 40%. I understand that people might want to push the card to the max (overclock, etc.), but there really is no reason to, especially when using them for scientific purposes. If you increase the clocks on this GPU, expect computational errors.
That being said, I still dislike the power connection... But Nvidia seems to have made it clear they are sticking with it. OEMs are accepting it - one of them is coming out soon with an official 90-degree connector instead of the typical straight one.
If you push a huge amount of current through the connection, then yes, there might be problems. Don't push too much current, seat the connector correctly, and the card itself is amazing.
Are you able to point me to where it is outperforming the 4080 when it comes to BOINC?
Ah, BOINC only is a different topic, as Ian&Steve C. explained. So you want that card only for BOINC and nothing else, or at least BOINC is the main focus? Then, according to Ian, the 4080 is better.
Maybe a different approach: unless your computer is limited to one card, why not go for several cheaper cards that might deliver more credits for the money and power?
Up to a point, performance scales with price and power consumption, but the closer you get to the top, the more you pay in both power draw and price for ever smaller gains in performance.
Since you seem to be very knowledgeable and wrote about the BRP7 tasks: did you also check the GW tasks? I only have AMD cards in use now, but I noticed that my laptop card, the W5500M, is significantly better with the GW tasks - it could get almost double the credits - while in the desktop the W7600 would get slightly more credits with the BRP7 tasks. How do the Nvidia cards perform with the GW tasks? But since the GW tasks are very CPU heavy, maybe there is a CPU bias in it?
It is important to remember that the more modern the GPU, the more efficient it is. For instance, we sometimes still run a couple of AMD FirePro GPUs that are about 10 years old. They pull 150 watts (if I remember correctly), and they are terribly inefficient - barely worth running at all for the amount of work they can accomplish.
It is important to remember that the more modern the GPU, the more efficient it is.
Yes, definitely, always go for the most modern architecture. But like you pointed out, the 4090 consumes 270 watts with 2 BRP7 WUs, and at that load it is crazy efficient. But if you push it to 100% of its capability it'll consume close to 600 watts without achieving double the performance - actually it will be far from double the performance.
So again, depending on the architecture, it is quite common that cheaper GPUs are equally efficient but incapable of reaching the maximum performance of the top GPU. In that case you can get more performance per watt and per dollar by going for a mid-range GPU than for the top-of-the-line one, and maybe get more performance for the same money with two mid-range GPUs than by running one high-end GPU at maximum power.
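To make the diminishing-returns point concrete, here is a small sketch comparing throughput per watt at the two operating points mentioned above. The 270 W and 600 W figures come from the discussion; the assumption that a maxed-out card only delivers about 1.4x the throughput is illustrative, not a measurement:

# Sketch: performance per watt at two operating points of the same GPU.
# The ~1.4x throughput gain at full power is an illustrative assumption.

def perf_per_watt(relative_throughput: float, watts: float) -> float:
    return relative_throughput / watts

moderate = perf_per_watt(1.0, 270)  # 4090 running 2x BRP7 at stock settings
maxed = perf_per_watt(1.4, 600)     # hypothetical: pushed to ~600 W for only ~1.4x the work

print(f"efficiency at max power is {maxed / moderate:.0%} of efficiency at moderate load")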
The GW tasks are very CPU biased/sensitive. About half, or more, of the total runtime is purely done on the CPU, so faster CPUs will finish this half faster. GW tasks will crunch on the GPU for 0-99% of the BOINC progress reporting, and at that point you will notice that the task does not stop, but the GPU will be idle while the CPU keeps going on a single thread. When I tested this a while back (before they changed the app recently), it took about 8 minutes to crunch on the GPU, then another 8 minutes for the final 1% on the CPU only, for a total of 16 minutes. If you keep the CPU the same, that last 1% will take the same amount of time no matter what GPU you have.
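A quick way to see why that CPU tail caps what a faster GPU can buy you: model the task as a GPU portion that shrinks with GPU speed plus a fixed single-threaded CPU portion. A minimal sketch using the 8 + 8 minute split quoted above (the speed-up factors are hypothetical):

# Sketch: why a faster GPU gives diminishing returns on the GW tasks.
# total runtime = GPU portion / gpu_speedup + fixed single-threaded CPU tail

GPU_MINUTES = 8.0  # portion that scales with GPU speed
CPU_MINUTES = 8.0  # CPU-only tail, unchanged by the GPU

def total_runtime(gpu_speedup: float) -> float:
    return GPU_MINUTES / gpu_speedup + CPU_MINUTES

for speedup in (1, 2, 4, 100):
    print(f"GPU {speedup:>3}x faster -> {total_runtime(speedup):.1f} min per task")
# Even an arbitrarily fast GPU can never get below the 8-minute CPU tail.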
Re: new cards and efficiency, the Titan V does break the mold there. I don't know what magic is happening with the Titan V and BRP7 with our optimized app, but on my systems the Titan V is a tad faster than a 3070Ti and uses only 120W doing it. PPD/W was about on par with 4090s that I've seen (about half the speed of a 4090, but half the power draw, and a third of the cost). But that's specific to the BRP7 project; the 40-series cards seem to have a nice clock speed and efficiency advantage for several other projects like GPUGRID, PrimeGrid (especially PG!), and Asteroids.
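That PPD-per-watt and PPD-per-dollar comparison is easy to put into numbers. A small sketch using only the relative figures stated above (the absolute 4090 values are placeholders set to 1.0; only the ratios matter):

# Sketch: relative PPD/W and PPD/$ from the ratios quoted above (Titan V vs 4090 on BRP7).

cards = {
    #            speed, power, price  (all relative to the 4090)
    "RTX 4090": (1.0,   1.0,   1.0),
    "Titan V":  (0.5,   0.5,   1 / 3),  # ~half the speed, half the power, a third of the cost
}

for name, (speed, power, price) in cards.items():
    print(f"{name}: PPD/W = {speed / power:.2f}x, PPD/$ = {speed / price:.2f}x vs the 4090")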
Are you able to point me to where it is outperforming the 4080 when it comes to BOINC?
B.I.G wrote:
Ah, BOINC only is a different topic, as Ian&Steve C. explained. So you want that card only for BOINC and nothing else, or at least BOINC is the main focus? Then, according to Ian, the 4080 is better.
Maybe a different approach: unless your computer is limited to one card, why not go for several cheaper cards that might deliver more credits for the money and power?
Up to a point, performance scales with price and power consumption, but the closer you get to the top, the more you pay in both power draw and price for ever smaller gains in performance.
Yes, you're right, BOINC is my main focus. I only have space for 1 GPU. There is a key difference between my computer and Ian&Steve C.'s: I am running Windows and Ian and Steve are running Linux. I believe Linux will always run work faster than Windows for Einstein (thanks to the special app). I am not wanting to start an OS debate.
About half, or more, of the total runtime is purely done on the CPU, so faster CPUs will finish this half faster. GW tasks will crunch on the GPU for 0-99% of the BOINC progress reporting, and at that point you will notice that the task does not stop, but the GPU will be idle while the CPU keeps going on a single thread.
As mentioned, the exact proportions of the CPU-intensive and the GPU-intensive portions of recent Gravity Wave GPU tasks will depend on the relative speed of your system's GPU and the CPU core in use, but an important detail has recently changed.
Before the change to the -2 work, the pattern was:
The task starts CPU-only, until initial prep gets things ready for GPU work (this phase takes a minute or two on my systems).
Next comes a transition to GPU-intensive work, able to reach near 100% GPU utilization during this period.
The GPU-heavy transition coincides with a snap-back in reported completion percentage, from perhaps 6% suddenly dropping to perhaps 0.3%.
The next transition is to the end work, which is nearly purely CPU.
Formerly this transition happened at a reported completion percentage of 99%.
The work looked like it was failing to thrive, as the reported percentage stayed at 99% for many minutes and the GPU was essentially unused.
Watching the temperature (and hence power consumption and utilization) of the GPU graphically showed this very plainly. So the efficiency trick to get a big output boost from running 2X was to offset the launch times of the two running units to the 50% elapsed time (not 50% reported completion) point. This worked extremely well both on my Nvidia RTX 3060 and on my two AMD cards (a Radeon RX 5700 and a Radeon RX 6800 XT).
That worked very nicely. You could easily see on the temperature graph whether you had the offset about right. Your reward was a temperature graph that showed the GPU working hard most of the time, rather than just half the time. The output increase going from 1X to 2X was by far the largest I'd seen in many years of running Einstein tasks.
BUT the new work, as reported in the Technical News thread, actually splits the task into two sequential segments, so the 50% offset technique is not the best; in fact it is probably the very worst.
Running at 1X, you see a temperature graph which looks very similar to the old one, until you notice that there are two humps per work unit, not one.
So offsetting at the 50% point actually gets these two tasks engaging in their GPU-intense work at the same time, and neglecting the GPU the other half of the time.
So at 2X the correct recipe for start time offset for the new work (application name ending in -2) is to start the second task at 25% elapsed time.
Again, the temperature graph is your friend (I use the one provided by TThrottle, but anything that gives you a time graph should be adequate). If you have a gap at one side, you need to slide the starting offset a bit in the direction that shrinks it.
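If you'd rather compute the stagger than eyeball the temperature graph, the arithmetic is simple. A small sketch, assuming you know a typical elapsed time per task on your host (the numbers and the function are mine, not from any official tool):

# Sketch: start-time offset for running 2X, based on the recipe above.
# Old GW app: one GPU-heavy hump per task -> start the second task at 50% elapsed time.
# New "-2" app: two humps per task        -> start the second task at 25% elapsed time.

def second_task_delay(typical_elapsed_s: float, app_variant: str) -> float:
    """Seconds to wait after starting task 1 before letting task 2 start."""
    fraction = 0.25 if app_variant == "-2" else 0.50
    return typical_elapsed_s * fraction

print(second_task_delay(960, "old"))  # 16-minute tasks, old app -> start task 2 after 480 s
print(second_task_delay(960, "-2"))   # same tasks, new -2 app   -> start task 2 after 240 s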