In the previous O3ASHF search, the effect of splitting the frequency width (and therefore the memory requirements) of the workunits in half on the run times was barely noticeable (a few percent), and getting more GPUs onto the search was a clear benefit.
Thus we tried the same trick with this search. This time, however, the results look less encouraging: the run time of a single task increases considerably (17% on average, taken from the DB over all hosts), and, currently worse, the split also seems to hurt validation.
We are re-examining our validator anyway, but this may take some time and the ultimate outcome is unclear. I'll keep the results in the DB ("waiting for file deletion") to allow for later re-examination (and crediting).
We may ultimately decide to abort that experiment for this search and go back to the single 1 Hz WUs we began with. To me at least it looks like we might lose more with the "split" WUs than we could gain.
BM
_________________________________________________________________________
In light of your other post about the intended devices for the respective subprojects here, you mentioned that BRP7 was kind of the GPU project for slower GPUs and O3AS for the more powerful GPUs. I think trying to get 2GB GPUs onto O3AS is kind of contradictory to that idea. All 2GB GPUs are going to be older/slower, low-end GPUs at this point. I'd be curious how much production and computing power is really gained by opening the pool to them, factoring in the slowdown of each WU to do so. I can't say that I see a large number of users complaining that they can't contribute with their 2GB GPUs, but it was far more common to see 4GB GPU users (and occasionally 3GB users) complaining that they couldn't crunch O3ASHF, so it made more sense to split the WUs to include the 4GB devices. Is the juice worth the squeeze?
O3ASHF never got down to sub-2GB, at least for Nvidia users. I saw each O3ASHF1b-d process using ~2.2GB, and O3ASBu was using ~2.3GB, so not much changed there. O3ASBuB is definitely under 2GB now, at around 1.2GB.
_________________________________________________________________________
_________________________________________________________________________
I'm also seeing more invalids with the new BuB work/validator, even when using the DemodSSE app (1.17/1.15/1.08).
With the O3ASBu work and the older O3ASHF work I had essentially zero invalids with this application, maybe 1 invalid in tens of thousands of completed tasks. I spot-checked several kinds of systems and there doesn't seem to be much of a pattern to it; everyone seems to have a small number of random invalids. Maybe the validator is too strict?
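To illustrate what "too strict" could mean here: BOINC-style validators typically compare the candidate lists returned by redundant tasks numerically, within some tolerance, and a tolerance that is too tight will flag ordinary floating-point differences between GPU vendors and app versions as invalid. A minimal sketch of that kind of check; the function name, data layout, and tolerance values are illustrative assumptions, not the actual Einstein@Home validator:

```python
def results_agree(candidates_a, candidates_b, rel_tol=1e-4):
    """Compare two lists of (frequency, statistic) candidates from
    redundant tasks; True if every pair matches within rel_tol."""
    if len(candidates_a) != len(candidates_b):
        return False
    for (fa, sa), (fb, sb) in zip(sorted(candidates_a), sorted(candidates_b)):
        # Frequencies must line up, and the detection statistics must
        # agree to within a relative tolerance.
        if abs(fa - fb) > rel_tol * max(abs(fa), abs(fb)):
            return False
        if abs(sa - sb) > rel_tol * max(abs(sa), abs(sb)):
            return False
    return True

# Two hosts returning the same candidate with ordinary GPU float noise
# (the statistic differs in the 6th significant digit):
a = [(100.0, 12.3456)]
b = [(100.0, 12.3457)]
print(results_agree(a, b, rel_tol=1e-4))  # True  -> pair validates
print(results_agree(a, b, rel_tol=1e-7))  # False -> marked invalid
```

Tightening `rel_tol` from 1e-4 to 1e-7 flips the same pair of results from valid to invalid, which is the kind of behavior a stricter validator would show as scattered random invalids across otherwise healthy hosts.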
_________________________________________________________________________
Bernd Machenschalk wrote: […]
Yes, I'm surprised at the numbers of results being marked as invalid.
Soli Deo Gloria
_________________________________________________________________________
Bernd Machenschalk wrote: […]
No invalids here yet; some inconclusives, and some of those will likely become invalid.
EDIT: one became invalid now.
_________________________________________________________________________
Ian&Steve C. wrote: […]
For reference, on my Godzilla with GW O3AS BuB (1.17) I have 50 tasks completed and validated, and I'm seeing 7 invalids & 2 errors. With my completed and validated 'Bu' tasks, I have zero invalids or errors.
Proud member of the Old Farts Association
_________________________________________________________________________
GWGeorge007 wrote: […]
Considering this plus the 17% average slowdown, I'd guess a lot of those 2GB cards are required just to compensate for the loss on all the 4+GB cards. Are there that many here? Most of them are around 10 years old.
In general I also think project applications should be compatible with as many devices as possible, so users with older hardware can contribute, but here the price is too high IMHO, in particular since those cards can run BRP7. They could also run BRP4 if an application were available; maybe even older cards with less than 2GB could run it if the old 1.33 CUDA application were still available. So there is a use for those cards without ruining O3 performance on 4+GB cards.
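The break-even question can be put in rough numbers: if every task runs 17% longer, each existing card's throughput drops to 1/1.17 ≈ 85.5%, so the newly admitted 2GB cards would need to add roughly 14.5% of the existing fleet's capacity just to break even. A back-of-the-envelope sketch; the relative speed of a 2GB card is a made-up assumption, not project data:

```python
# Break-even estimate for admitting 2GB cards at the cost of a
# 17% per-task slowdown on the existing 4+GB fleet.
slowdown = 1.17                    # run-time factor from the split WUs
after_split = 1.0 / slowdown       # normalized fleet throughput, ~0.855
deficit = 1.0 - after_split        # capacity lost to the slowdown, ~0.145

print(f"throughput after split: {after_split:.3f}")
print(f"2GB cards must supply {deficit:.1%} of the old fleet's capacity")

# If a ~10-year-old 2GB card does, say, 1/10 the work of a typical
# 4+GB card (pure assumption), that translates to:
rel_speed_2gb = 0.10
print(f"≈ {deficit / rel_speed_2gb * 100:.0f} such cards per 100 existing cards")
```

Even under generous assumptions about their speed, that is a large number of old, slow cards to recruit before the split pays off at all.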
_________________________________________________________________________
Also had virtually no invalids previously but many now.
_________________________________________________________________________
Jumping in to say I'm in the same boat. I almost never had an invalid result on this sub-project, but have 38 since the move to BuB.
_________________________________________________________________________
@Bernd - I've had a look at my O3ASBuB invalids over the past 3 days. Win/NV/1.16ocl is running 26% invalid, Linux/NV/1.16 is running 28%, and Linux/NV/1.17 is 10%. Same HW config as HF/Bu, where zero invalids were recorded with the 1.14 app. Hopefully something can be done with the validation process.