Is this the machine you're talking about? It shows as having a 2GB AMD 7800 series GPU.
I looked at the task list and the only MeerKAT tasks still showing were run with the v0.01 app over a month ago. Have you actually run concurrent tasks more recently with something closer to a production version?
Yes that's the one. Bad choice of words on my part as it is no longer running tasks, but it was and didn't seem to have many issues then.
I'm not too worried I have an issue, just want the devs to know what I'm experiencing for future development of these apps.
For now I'll just let my cache run at 2x on my NVIDIA GPUs and turn off Beta.
Thanks to all who responded and helped.
There's a whole thread about the MeerKAT tasks, in which Bernd, the admin, explains how they are tweaking the apps to fix the validation problems, and what is and isn't working so far. It's here:
https://einsteinathome.org/content/em-searches-brp-raidiopulsar-and-fgrp-gamma-ray-pulsar
In short, validations are improving every day, both against other operating systems and in other respects as well.
With another 24 hours running 0.12 MeerKAT for Windows/AMD on three hosts, things continue to look very good.
I looked at my invalid tasks sorted by sent date. I did see a couple of failures against quorum partners running v0.12 Linux applications. I've not yet spotted either success or failure for my tasks matched against v0.13 Linux work. I have high hopes based on reports from others.
I also saw something new to me: half a dozen tasks with the status showing as "Completed, can't validate".
These turned out to be tasks created from Work Units which reached the 20 task limit.
While I expected these would turn out to be WUs which had piled up computation errors on too many machines, the dominant problem in all cases was "Error while downloading", with the failures logged largely on September 2 and 3.
As with the previous problem with computation errors, it appears that some recent downloading failures are not random, but preferentially associated with tasks created from a small subset of WUs.
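For anyone who wants to check this kind of clustering on their own task list, here is a minimal sketch in Python. It assumes you have copied your tasks into a list of (work unit, outcome) pairs by hand; the WU names and counts below are made up for illustration and are not real project data. It simply counts "Error while downloading" outcomes per WU and reports what share of them fall on the few worst WUs.

```python
from collections import Counter

DOWNLOAD_ERROR = "Error while downloading"


def download_error_concentration(tasks, top_n=3):
    """Count download errors per WU and measure how concentrated they are.

    tasks: iterable of (wu_id, outcome) pairs (a hypothetical record layout).
    Returns (Counter of errors per WU, share of all download errors that
    belong to the top_n WUs with the most errors).
    """
    errors = Counter(wu for wu, outcome in tasks if outcome == DOWNLOAD_ERROR)
    total = sum(errors.values())
    if total == 0:
        return errors, 0.0
    top = sum(count for _, count in errors.most_common(top_n))
    return errors, top / total


# Made-up sample: four WUs, most download errors sit on two of them.
sample_tasks = (
    [("wu_A", DOWNLOAD_ERROR)] * 4
    + [("wu_B", DOWNLOAD_ERROR)] * 3
    + [("wu_C", DOWNLOAD_ERROR)]
    + [("wu_D", "Completed and validated")] * 2
)

per_wu, share = download_error_concentration(sample_tasks, top_n=2)
print(per_wu.most_common())
print(f"Share of download errors in the top 2 WUs: {share:.3f}")
```

A share close to 1.0 with a small top_n would be consistent with the errors clustering on a few WUs, though with only a handful of tasks it is not proof of anything.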
I just noticed that there was a window a couple of hours ago in which my system received MeerKAT work on an 0.14 version.
The full listed application reference is
Binary Radio Pulsar Search (MeerKAT) v0.14 (BRP7-opencl-ati-gcc4) windows_x86_64
There was a cuda55 variant also carrying the 0.14 and gcc4 markings.
As my subsequent work fetches have again been 0.12 work, and the application page does not currently list anything above 0.13, I surmise something was not satisfactory about these. I'm currently processing the seven of these I have in hand ahead of their turn, so I'll have seen some behavior on one of my systems within an hour.
[edit to add: I've run some Windows AMD 0.14 gcc4 tasks to completion on one of my systems. They seemed entirely unremarkable, taking very similar elapsed time to 0.12. However, among the seven tasks, all were initially assigned 0.14 cuda55 quorum partners. Of the three or four of those cuda55 trials that had reported when I looked, all had generated fast fails ("error while computing" with elapsed time under 10 seconds). I suppose that may be why these versions were pulled out of service so quickly.]
"gcc4" sounds like they are playing with an older compiler for their application, though I wonder why using such an old version would be beneficial. gcc4 is VERY old. Depending on which exact subversion they are using, it could date from anywhere between ~2005 and 2016, the span of the GCC 4 releases.
Ubuntu 22.04 has v11.x, for example.
So my seven 0.14 gcc4 tasks have had 0.14 cuda quorum partners return results for four different tasks, all of which have generated fast fails.
To add insult to injury, one of those WUs where I had an initial fast-failing 0.14 cuda partner was then issued as an 0.12 cuda task. For this I got an inconclusive; in other words, my result did not match well enough to count. This is a bit troublesome, as 0.12 cuda tasks have successfully validated against my 0.12 ati tasks quite often.
https://einsteinathome.org/workunit/669056670
If you have some of these 0.14 gcc4 tasks in your queue, you may wish to consider aborting them. I doubt that running them at this stage will add much value.