Bernd, Is it possible to release the VRAM when the recalc phase begins? Was trying to run 3 O3AS WUs on a 12 GB GPU but they fail even though one or two of them are in the recalc phase and barely even using the GPU.
Guess they must be staggered for that to work. What's the BOINC command :-?
manually. Just run 1x until the task hits the CPU portion and switch the host to run 2x to start another one. It will likely shift over time and you’ll probably have to intervene again occasionally. Or just let it go and don’t worry about it.
either way, the fact remains that there is no automated or BOINC level way to do this easily. And the app is not coded with any special command line options to allow for it either. I’m not aware of any BOINC app that ever did that.
we had a Special SETI app that was custom coded with a mutex lock that essentially did this though. When you ran the app and set BOINC to 2x, it would run the first one and the second task would pre-load onto the GPU but did not actually run the task until the GPU context was released. Worked well.
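On the "command" question: switching a host between 1x and 2x is normally done with an app_config.xml file in the project directory rather than a command. Below is a minimal Python sketch that generates such a file. The app name einstein_O3AS is an assumption (check the name your own client reports), and build_app_config is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text letting `tasks_per_gpu` tasks share one GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 0.50 means two at once.
    ET.SubElement(gpu, "gpu_usage").text = f"{1.0 / tasks_per_gpu:.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"
    ET.indent(root)  # one element per line; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 2x: start a second task once the first one reaches its CPU phase.
    print(build_app_config("einstein_O3AS", 2))
```

Save the output as app_config.xml in the Einstein@Home project folder and have the client re-read its config files (in BOINC Manager: Options, Read config files). Changing tasks_per_gpu between 1 and 2 is the manual staggering described above.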
Actually that's an optimization that we already thought of, though we didn't finish the implementation yet. In principle the BOINC client should start only as many tasks in parallel as fit in the available GPU memory, so by adjusting the memory size (and free cores) one should be able to convince the client to start another GPU task when the memory is freed by the one still running. However, I think the client only performs this check every five minutes, which might not be fine-grained enough. I would also change the app such that a memory allocation failure becomes a "transient" error, so the client would start the same task again after some time. For now, though, these are just thoughts; we never tried that, and there are a few other more urgent problems to solve. But we'll keep that in mind.
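To illustrate the idea of starting only as many tasks as fit in GPU memory, here is a small Python sketch. It is not BOINC client code; the Task class, the can_start function, and the 4.5 GB per-task figure are all hypothetical, chosen only so that three tasks do not fit on a 12 GB card.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    vram_gb: float           # VRAM held during the GPU phase (hypothetical figure)
    in_recalc: bool = False  # True once the task is in its CPU-only recalc step


def vram_in_use(running: List[Task], release_in_recalc: bool) -> float:
    """Sum the VRAM of running tasks, optionally skipping tasks in recalc."""
    return sum(
        t.vram_gb for t in running if not (release_in_recalc and t.in_recalc)
    )


def can_start(candidate: Task, running: List[Task],
              total_vram_gb: float, release_in_recalc: bool) -> bool:
    """Admit the candidate only if it still fits in GPU memory."""
    used = vram_in_use(running, release_in_recalc)
    return used + candidate.vram_gb <= total_vram_gb


if __name__ == "__main__":
    running = [Task("A", 4.5, in_recalc=True), Task("B", 4.5)]
    third = Task("C", 4.5)
    # VRAM kept during recalc: 9.0 + 4.5 exceeds 12 GB, so the third task must wait.
    print(can_start(third, running, 12.0, release_in_recalc=False))
    # VRAM released during recalc: 4.5 + 4.5 fits, so the third task can start.
    print(can_start(third, running, 12.0, release_in_recalc=True))
```

The sketch only covers the admission check. Treating an allocation failure as a transient error would be a separate retry step and is not modelled here.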
All Sky takes 5 times as much VRAM as MeerKat and takes more than twice as long to run. Yet O3AS WUs are created with a 7 day Deadline while BRP7 WUs are created with a 14 day Deadline. This is backwards and completely irrational. It makes it very difficult to make more efficient use of a GPU by running multiple projects. These extreme deadlines wreak havoc with the BOINC scheduler. I can easily run 2 O3AS and 1 BRP7 on GPUs with over 9 GB VRAM. The problem is that BRP7 will not start if there's more than one O3AS WU Ready-To-Start (RTS). Since O3AS WUs are delivered at a much faster rate than BRP7 WUs, there is always a surplus of O3AS WUs and the BRP7 WUs just sit there when they could be running. Suspend all the RTS O3AS WUs and a BRP7 WU will start running even if 2 O3AS WUs are already running. There is no point running more than one BRP7 along with an O3AS since a BRP7 WU maintains close to 100% GPU utilization. One can watch this using the Linux utility nvitop: https://github.com/XuehaiPan/nvitop
The Deadline should be the same and I suggest 7 days is too much for either project and they should both be 2 days for such short-running WUs. Such lengthy Deadlines hark back to the days of dial-up modems.
I strongly oppose this comment. BOINC was never meant to be used by power crunchers alone running monster rigs 24/7. That's independent of dial-up modems or gigabit links.
The founding idea of BOINC was to utilize idle CPU cycles of normal desktop computers and laptops that run for a few hours a day. This is citizen science. I find deadlines of 7 days very short and demanding. Shortening them even further excludes all users who use computers (even very old ones) sporadically, perhaps not every day or just a few hours a day. I understand that deadlines cannot be 2 months, in order to keep the size of the server-side BOINC task database small. But 2 days is discriminatory towards 'normal' slow-pace crunchers.
I agree with Aurum's argument that deadlines should correspond to expected runtimes: longer runtimes, longer deadlines.
I too agree with Scrooge McDuck. In order to enable the widest group of crunchers possible, the deadlines should be commensurate with the "expected runtime: longer runtime, longer deadlines", and 2 days is not going to allow those who do things on their PCs besides crunching to participate to the fullest extent possible. If the same few people do all the crunching then the chance of the results being skewed is significantly greater than with a much wider base of crunchers. The 2nd problem is that if one or more of those few crunchers leaves the Project then it could crumble and disappear, and that would not be a good thing at all. Projects need to both entice new people and embrace the power crunchers as well; having plenty of tasks with reasonable deadlines does that.
How long does it take to run an O3AS WU? My 1080 Ti's can run a pair of them in a little over an hour. So you're saying that your computers can only crunch an All-Sky WU for an hour a week? Then you should run Meerkats that take about 15 minutes each.
Another very serious issue is the disparity in estimated run times that the WUs DL with. O3AS does an excellent job of starting them at about 1.5 hours. Meerkat is an abysmal failure estimating that a 15 minute WU will take anywhere from 3.5 to 6.5 hours. That wreaks havoc with the BOINC scheduler. The advantage to configuring to allow running multiple projects is that when one runs out of work you keep working on something without frequent babysitting.
All-Sky wastes 50% of the GPU and consumes massive VRAM. Meerkat makes near full use of a GPU with minimal VRAM. They seem a good pairing until one tries it and discovers the BOINC Scheduler enters a fugue state and can only run one or the other and can only DL one or the other. Sadly, two opposite things happen and ruin everything. Meerkat with its obscenely overstated run times fills up the queue and triggers "Not requesting work: not needed" and DLs grind to a halt. All-Sky with its shorter deadline takes priority and runs alone until all WUs are gone; only then can Meerkat run.
If Meerkat would shorten their deadline to the same 5 days as All-Sky and shorten their estimated run time to less than an hour then they would play nice together and the pairing alleviates the inefficient GPU utilization of All-Sky.
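A rough Python sketch of why overstated run time estimates can stop downloads. This is a deliberately simplified model and not BOINC's actual work-fetch algorithm; the function names, the 10 queued tasks, and the 12 hour work buffer are hypothetical. The 0.25 hour real run time and the 5 hour estimate (inside the 3.5 to 6.5 hour range) come from the post above.

```python
def estimated_queue_hours(task_count: int, est_hours_per_task: float,
                          tasks_in_parallel: int = 1) -> float:
    """Hours of work the client believes it has queued, per its estimates."""
    return task_count * est_hours_per_task / tasks_in_parallel


def wants_more_work(queue_hours: float, buffer_hours: float) -> bool:
    """Simplified rule: ask for work only while the queue is below the buffer."""
    return queue_hours < buffer_hours


if __name__ == "__main__":
    buffer_hours = 12.0  # hypothetical work buffer setting
    tasks = 10           # hypothetical number of queued Meerkat tasks

    overstated = estimated_queue_hours(tasks, 5.0)   # estimate from the WU
    realistic = estimated_queue_hours(tasks, 0.25)   # about 15 minutes each

    # 50.0 hours estimated: the buffer looks full, so no work is requested.
    print(overstated, wants_more_work(overstated, buffer_hours))
    # 2.5 hours realistic: the client would keep requesting work.
    print(realistic, wants_more_work(realistic, buffer_hours))
```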
Credits for the GW searches weren't updated for quite some time. Sorry. Newly validated tasks should receive 5000 instead of 1000 credit.
If you normalize points to Meerkat then you should award 10000 points per O3AS WU.
A 1080 Ti can run a pair of Meerkats every 20 minutes so it has a throughput of 6 WU/hr and earns 3333 points per WU or 20000 points per hour per GPU.
O3AS WUs vary a good bit so I'd say an average of an hour each when running a pair on a 1080 Ti. That means a GPU can run 2 an hour. So you have to award 10000 points per WU if you want to make it the equivalent of Meerkat.
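The arithmetic above, written out as a short Python sketch. It follows the runtime-based normalization used in this post only, which is not necessarily how the project assigns credit. The function name credits_per_hour is hypothetical; the run times and credit values are the ones quoted in the thread.

```python
def credits_per_hour(tasks_in_parallel: int, batch_minutes: float,
                     credit_per_task: float) -> float:
    """Credit earned per GPU-hour when a batch of tasks finishes every batch_minutes."""
    batches_per_hour = 60 / batch_minutes
    return tasks_in_parallel * batches_per_hour * credit_per_task


if __name__ == "__main__":
    # Meerkat: a pair every 20 minutes at 3333 credits each.
    meerkat = credits_per_hour(2, 20, 3333)
    # O3AS: a pair per hour, at the old (1000) and new (5000) credit values.
    o3as_old = credits_per_hour(2, 60, 1000)
    o3as_new = credits_per_hour(2, 60, 5000)
    # Credit per O3AS task needed to match Meerkat at 2 tasks per hour.
    o3as_needed = meerkat / 2

    print(meerkat, o3as_old, o3as_new, o3as_needed)
```

At 2 O3AS tasks per hour, matching Meerkat's roughly 20000 credits per hour works out to roughly 10000 credits per task, which is where the figure in this post comes from.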
they try to make the points equitable in terms of computational effort/difficulty, not runtime. since 50% of the WU runs on the CPU, it requires far fewer flops for that last 50% of the runtime than for the first 50%.
You lost me, half of the GPU is wasted running All-Sky. So you're saying divide to account for the wastage? Can you show how you do the calculation?
like i said it's based on computational effort and not runtime. half of the GPU is not "wasted", it's just not used. you aren't burning the same amount of power when it's idle. they haven't been able to move that final calculation to the GPU (maybe for precision reasons? Bernd would have to clarify) so it goes to the slower CPU.
same reason they don't award the CPU tasks insane amounts of credit for taking much longer. it's a slower device so it gets less credit. the second half of the task runs on a slower device and gets less credit. I'm not sure of the exact formula, if there even is one, just that that's the way they generally approach it
running two O3AS tasks per GPU helps fill in those gaps for a more efficient overall operation. and if you do a little bit of babysitting to keep O3AS tasks staggered, two identical systems can actually come very close to the same overall ppd on either BRP7 or O3AS. but that's on you to configure the system to run optimally. or if you prefer lower power operation just run O3AS one at a time and enjoy less overall power use and less credits.
edit, from the first post in this thread
Bernd said:
Quote:
To reach a better sensitivity, we'll make the resulting list of candidates even longer. This means not only larger result files to be uploaded (well, shouldn't be much of a problem nowadays), but also a longer time taken for the "recalc" step ("recalculating toplist ststistics" is written in stderr). This step is done purely on the CPU. We are working on porting it to the GPUs, but the memory access pattern of this step is so unpredictable that we don't get much speedup from that yet (accessing "global" memory on the GPU is still terribly slow). We hope to get an improved version of the App out during the run.
so probably a combination of required sensitivity and code optimization making the final part better suited to the CPU. sounds like they will be trying to port this to the GPU at some point, and at that time i would guess the app will run much faster/efficiently.
Then how did you expect them to be staggered?