To reach better sensitivity, we'll make the resulting list of candidates even longer. This means not only larger result files to be uploaded (well, that shouldn't be much of a problem nowadays), but also a longer time taken for the "recalc" step ("recalculating toplist statistics" is written to stderr). This step is done purely on the CPU. We are working on porting it to the GPUs, but the memory access pattern of this step is so unpredictable that we don't get much speedup from that yet (accessing "global" memory on the GPU is still terribly slow). We hope to get an improved version of the App out during the run.
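(To make the "unpredictable memory access pattern" point concrete, here is a toy CUDA benchmark - not the actual Einstein@Home code, all names and sizes are invented for illustration. Both kernels move the same number of bytes, but the gather uses data-dependent indices, so neighbouring threads hit unrelated cache lines in global memory; on most cards the gather runs several times slower.)

// Toy comparison (not the E@H app): streaming read vs. data-dependent gather.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void streamRead(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;          // coalesced: one segment per warp
}

__global__ void gatherRead(const float *in, const int *idx, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[idx[i]] * 2.0f;     // uncoalesced: indices are unpredictable
}

// i * (odd constant) mod 2^24 is a bijection, so this is a cheap scatter pattern.
__global__ void makeScatter(int *idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) idx[i] = (int)(((long long)i * 2654435761LL) % n);
}

int main() {
    const int n = 1 << 24, threads = 256, blocks = (n + threads - 1) / threads;
    float *in, *out; int *idx;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMalloc(&idx, n * sizeof(int));
    cudaMemset(in, 0, n * sizeof(float));
    makeScatter<<<blocks, threads>>>(idx, n);

    cudaEvent_t t0, t1; float msStream, msGather;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    cudaEventRecord(t0); streamRead<<<blocks, threads>>>(in, out, n);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msStream, t0, t1);

    cudaEventRecord(t0); gatherRead<<<blocks, threads>>>(in, idx, out, n);
    cudaEventRecord(t1); cudaEventSynchronize(t1);
    cudaEventElapsedTime(&msGather, t0, t1);

    printf("streaming: %.3f ms, gather: %.3f ms\n", msStream, msGather);
    return 0;
}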
So probably a combination of the required sensitivity and code optimization makes the final part better suited to the CPU. Sounds like they will be trying to port this to the GPU at some point, and at that time I would guess the app will run much faster/more efficiently.
The GPU's global memory, as the name suggests, is not associated with any particular compute unit. I had thought/heard that it was page swaps of the GPU's global memory (i.e. oversubscription & cache misses) that were the slowdown (possibly two orders of magnitude slower). Thus those really fast compute units are I/O bound (on the GPU-to-CPU interconnect).
Cheers, Mike.
I have made this letter longer than usual because I lack the time to make it shorter ...
... and my other CPU is a Ryzen 5950X :-) Blaise Pascal
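(For the oversubscription case Mike mentions, a minimal sketch with CUDA unified memory, assuming a Linux host and a Pascal-or-newer card; the ~6 GiB figure is just an example chosen to exceed a 4 GB card. Touching pages that are not resident on the device forces the driver to migrate them over PCIe, which is far slower than on-card DRAM.)

// Oversubscription sketch: allocate more managed memory than the card holds.
// First touch of a non-resident page stalls the kernel on a PCIe migration.
// Error checking omitted; on setups without demand paging (e.g. Windows or
// pre-Pascal GPUs) this will not oversubscribe gracefully.
#include <cuda_runtime.h>

__global__ void touchAll(float *a, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}

int main() {
    const size_t n = (size_t)6 << 28;          // ~6 GiB of floats
    float *a;
    cudaMallocManaged(&a, n * sizeof(float));
    touchAll<<<(unsigned)((n + 255) / 256), 256>>>(a, n);
    cudaDeviceSynchronize();                   // runs, but page faults dominate
    cudaFree(a);
    return 0;
}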
It seems to me the estimated run times for Meerkat and All-Sky are confounded somehow. Yesterday one computer finished its All-Sky WUs and just ran Meerkats for a day with my queue set to 1/0. The TimeLeft (from BoincTasks) dropped from 7 hours to 00:25:49. Then in Preferences I turned off Meerkat and turned on All-Sky, and with my queue set to 1/0.5 it DLed 155 O3AS WUs with a TimeLeft of 00:06:11, while I already had 79 Meerkat WUs. Very strange behavior, since the previous DL of All-Sky had WUs with a TimeLeft of about 1.5 hours.
I have another computer that DLed Meerkats with 11-hour run times. Unless Meerkat runs alone for a day, it can start with any estimate from 3 to 11 hours for a 15-minute task.
There also seems to be interference with FGRP5 CPU WUs. With the queue set to 1/0, if the TimeLeft for the GPU WUs totals a day, then I cannot DL any FGRP5 WUs, even if no CPU WUs of any kind are running.
CPU WUs and GPU WUs should be managed separately, but they appear to be combined on E@H.
The estimated runtime comes from your client.
It's because the project (like all projects, AFAIK) shares the same DCF (duration correction factor) across all task types, and almost always one task type needs a wildly different DCF than another to produce an accurate runtime estimate. So as you process one kind of task, the DCF shifts to converge on the value required for that task, which makes the runtime estimates for the other task types more and more inaccurate.
There's not really a good solution to this other than running only one kind of task, or keeping a really small work cache so that the inaccurate estimates don't impact the scheduler requests as much.
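(A back-of-envelope simulation of that mechanism, as plain host-side code. All numbers are made up, and the update rule is a crude stand-in for the BOINC client's actual one, which raises the DCF quickly and lowers it slowly, making the swings even bigger.)

// Two task types share one server-side estimate but have very different true
// runtimes; a single shared DCF chases whichever type ran last, so the
// predictions for the other type get steadily worse.
#include <cstdio>

int main() {
    const double estimate = 3600.0;   // what the server predicts, seconds
    const double trueGpu  = 900.0;    // 15-minute GPU task -> wants DCF 0.25
    const double trueCpu  = 10800.0;  // 3-hour CPU task    -> wants DCF 3.0

    double dcf = 1.0;
    for (int i = 0; i < 20; ++i) {
        // alternate blocks of five GPU tasks and five CPU tasks
        const double actual = (i % 10 < 5) ? trueGpu : trueCpu;
        printf("predicted %7.0f s, actual %7.0f s, dcf %.2f\n",
               dcf * estimate, actual, dcf);
        dcf += 0.5 * (actual / estimate - dcf);  // drift toward actual/estimate
    }
    return 0;
}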
Great explanation of the things going on with CPU and GPU. Thanks to all.
It is very "educational" and interesting (for me).
I'm just wondering whether it is important to dissect how the credits/run times/etc. are precalculated.
For the sake of understanding the way it works, it's fine.
But I think it is irrelevant to the objective of crunching.
No matter how the calcs are being done (poorly or well), I personally don't mind/care ...
I just want to crunch - which I am trying to do.
I'm very satisfied (probably no one cares about my position) with the way things work here.
cheers
S-F-V
Most projects have ditched the DCF mechanism for tasks; Einstein is one of the single-digit holdouts.
Most BOINC projects moved to a separate APR (average processing rate) calculation for each application type.
That method produces correct runtime estimates for each task type.
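(Continuing the toy simulation above: keep one correction factor per app instead of a single shared one, which is roughly the idea behind per-app estimation. The update rule is the same crude stand-in, not BOINC's actual APR math.)

// Same made-up scenario, but with a per-app factor: each task type converges
// on its own correction, so one type no longer poisons the other's estimates.
#include <cstdio>
#include <map>
#include <string>

int main() {
    const double estimate = 3600.0;            // server-side estimate, seconds
    std::map<std::string, double> factor = {{"gpu", 1.0}, {"cpu", 1.0}};
    const struct { const char *app; double actual; } runs[] = {
        {"gpu", 900.0}, {"cpu", 10800.0}, {"gpu", 900.0}, {"cpu", 10800.0},
        {"gpu", 900.0}, {"cpu", 10800.0}};
    for (const auto &r : runs) {
        double &f = factor[r.app];
        printf("%s: predicted %6.0f s, actual %6.0f s\n",
               r.app, f * estimate, r.actual);
        f += 0.5 * (r.actual / estimate - f);  // same crude update, but per app
    }
    return 0;
}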
It might be a good idea to put the content of Ian's last comment, appropriately described, directly on the project's FAQ page:
https://einsteinathome.org/faq
Every Einstein cruncher initially stumbles over the strangely fluctuating runtime estimates when crunching different task types (the default). Estimates are far too short, then far too long. By observing tasks for long enough, you can get closer to the cause and realize the only solution: keep the task cache small (or confine yourself to one task type). But probably only experienced long-time crunchers, who may also have read the responsible source code of the BOINC client (or run it in full debug mode, flooding the messages window), fully understand the reasons. So I think a short FAQ entry about this 'feature' would be very useful.
We are planning a change to the current run. We will trade a bit of runtime for memory: the workunits that we plan to produce in the future will run a bit longer (~10%), but take significantly less memory, such that GPUs with 4 GB can run them. A few modifications need to be made to the app to achieve this. The app version 1.06 that we just published already incorporates the necessary changes; with the current workunits, however, it should behave identically to the previous 1.04 version.
In (our part of) Germany there is a long weekend ahead, and I'm reluctant to deploy such a change to the workunits right before a longer period of reduced attention on our side. So this change of workunits will likely happen mid next week.
BM
That's great to hear, Bernd.
Was the change mainly to widen the pool of available devices for an overall increase in crunching power?
What is the estimate for the new minimum amount of VRAM needed for the new app/tasks?
That is good news, overall. Originally, you all were working on porting the recalculation step (from CPU to GPU) but said it didn't seem to speed up the work. Is anything in the works related to this? That recalculation step is really intensive. With the drop in VRAM requirements, I would assume more users will want to run more of these work units at the same time, which can lead to some interesting bottlenecks (from what I have seen on our systems).
Yes indeed.
We're still running tests, and it seems that we only have a limited range of machines (and, in particular, OSes), and the app doesn't behave the same on all of them. Also, the memory consumption is data-dependent (i.e. it depends on the individual WU).
But the maximum should be below what was required for the previous run, O3MD1, which was 3.5 GB max.
BM