i don't think you need a "2" file at all. i only ever saw "0" and "1" versions of the file and my anonymous platform setup does not have that file. not sure where Tom got that or if he just made one extra.
but i think they have something to do with the tasks being cut in half and getting the 2nd half of the GPU portion of the task to restart on the proper GPU device. these files weren't present in the older version of the app.
I just tested your point by clearing the exe and all of the .config files. It only downloaded the two. So I jumped to an incorrect conclusion (again) without testing my idea.
Tom M
A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!
Check your awarded credits! Mine have gone from 10,000 per task back down to 5,000 per task.
Guess what's NOT getting Invalids.
Nothing to worry about. Those 5,000-credit tasks are just resends from before the reward was upped to 10k. The credit reward is intrinsic to the task itself, not to when you processed and returned it. Those tasks were created before Bernd increased the reward.
So you are not using Hyper-Threading/SMT for CPU projects?
No, I tried that briefly to see how my system performs under full load, and while CPU output went up, it severely punished GPU performance. Which was expected: you have a CPU-intensive part of a GPU task that runs on 1 core. If you use SMT you take away half of the CPU performance for that task. Best is if each GPU task can have 1 CPU core dedicated to it at max boost clock. Since the cooling is adequate I can run 7 tasks at 4.8 GHz, which is close to the max boost of 5 GHz. I always leave 1 core free for whatever the system wants to do.
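The "1 dedicated CPU core per GPU task" arrangement described above can be requested in BOINC via an app_config.xml in the project directory. A minimal sketch, assuming the app name is einstein_O3AS (a placeholder here — check the real app name in your client_state.xml before using it):

```xml
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <!-- one task per GPU; reserve a full CPU core for its CPU-bound part -->
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Setting gpu_usage to 0.5 or 0.33 instead would allow 2 or 3 concurrent tasks per GPU, which is the "2x"/"3x" multiplicity discussed later in the thread.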
Basically, if you haven't started collecting data at 1x and 2x to go with your 3x results, you can't be completely sure what your most productive setting will be.
Thank you for that explanation; however, the question was whether the X3D CPUs perform better.
Now I happen to have such a CPU, but I can't answer the question because I don't know how to compare its performance to other machines', especially since I have a not very common GPU.
I think we'd need a system with a 7700X CPU and the same GPU to see if the additional 3D cache beats the higher frequency of what is otherwise an identical CPU.
It could also be tested by someone with a 7900X3D or 7950X3D: they could put the tasks on the chiplet with the 3D cache, or on the other chiplet without it, and compare the runtimes. Everything else would be equal.
On a memory-bound system (an EPYCD8 motherboard running only two memory sticks; the rest are in the mail), it appears there is no significant difference in task runtime between 1x and 2x. I want to revisit this test once I have more RAM.
And run through all the way to 4x (again).
This does support the idea that the processing speed on these tasks is much more CPU-dependent than GPU-dependent.
So it may be that on a faster CPU/memory/cache combo, crowding as many tasks as you can onto your GPU maximizes total production (and your RAC).
Shades of the cache heavy CPU discussion :)
Tom M
Yeah, the O3AS GPU tasks need a fast CPU too. The GPU never reaches the power limit that I set, at least on my Radeon VII with OpenCL. Maybe with the future CUDA apps, that will change for Nvidia cards.
Preliminary results seem to show: 1x is about 987 seconds and 2x is about 1,620 seconds. This is on my faster Epyc box with all memory available. The 1x result is similar to what George reported on a 3950X/RTX 3080 Ti system in another message area.
My calculated results show a lower RAC for 1x, and I had already been running 2x, so this system is on to 3x to see if it scales predictably to 3 GPU tasks and still crunches out more total production.
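A quick back-of-the-envelope check on those preliminary numbers (a sketch; the ~987 s and ~1,620 s runtimes are taken from the results quoted above):

```python
# Throughput comparison for the quoted runtimes:
# ~987 s at 1x and ~1620 s at 2x on the same GPU.
def tasks_per_hour(multiplicity, runtime_s):
    """Tasks finished per hour when `multiplicity` tasks run
    concurrently and each takes `runtime_s` seconds wall-clock."""
    return multiplicity * 3600.0 / runtime_s

print(round(tasks_per_hour(1, 987), 2))   # tasks/hour at 1x
print(round(tasks_per_hour(2, 1620), 2))  # tasks/hour at 2x
```

Even though each task takes longer at 2x, total throughput is higher, which matches the lower calculated RAC for 1x.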
I expect to test 4x as well. Then I will be able to compare my results with an Epyc 7601 that I am also testing this on, once more memory arrives for it.
Tally Ho!
Tom M
Fixed the core clock at 4.2 GHz, paired with a Tesla P100, for this host. No other CPU project is running.
With 3 tasks per GPU, the effective time per task is about 630 secs when staggering the 3 tasks. Run-to-run variation is small (look for individual run times in the 1880-1890 secs range). About 1.37M PPD. GPU board power fluctuates from 80 W to 130 W.
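As a rough cross-check of that 1.37M PPD figure (a sketch assuming 10,000 credits per task, as mentioned earlier in the thread, and the ~1,890 s per-task runtime at 3x):

```python
# PPD check: 3 concurrent tasks, ~1890 s each, 10,000 credits per task.
def points_per_day(credit_per_task, multiplicity, runtime_s):
    """Credits earned per day at a given per-GPU task multiplicity."""
    tasks_per_day = multiplicity * 86400.0 / runtime_s
    return credit_per_task * tasks_per_day

print(f"{points_per_day(10_000, 3, 1890):,.0f} PPD")
```

Note that 1890 / 3 = 630, which is exactly the effective per-task time reported above for staggered tasks.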
Tom M wrote: Guess what's NOT getting Invalids.
Still! :)