Outcomes on MeerKAT 0.05

bluestang
Joined: 13 Apr 15
Posts: 34
Credit: 2492970228
RAC: 2903

Gary Roberts wrote:

bluestang wrote:
... my lowly AMD 7870XT is doing 2x happily.

Is this the machine you're talking about?  It shows as having a 2GB AMD 7800 series GPU.

I looked at the tasks list and the only MeerKAT tasks still showing were run with the v0.01 app over a month ago.  Have you actually run concurrent tasks more recently with something closer to a production version?

Yes that's the one.  Bad choice of words on my part as it is no longer running tasks, but it was and didn't seem to have many issues then.

I'm not too worried I have an issue, just want the devs to know what I'm experiencing for future development of these apps.

For now I'll just let my cache run at 2x on my NVIDIA GPUs and turn off Beta.

Thanks to all who responded and helped.

mikey
Joined: 22 Jan 05
Posts: 11944
Credit: 1832571315
RAC: 218293

bluestang wrote:

Gary Roberts wrote:

bluestang wrote:
... my lowly AMD 7870XT is doing 2x happily.

Is this the machine you're talking about?  It shows as having a 2GB AMD 7800 series GPU.

I looked at the tasks list and the only MeerKAT tasks still showing were run with the v0.01 app over a month ago.  Have you actually run concurrent tasks more recently with something closer to a production version?

Yes that's the one.  Bad choice of words on my part as it is no longer running tasks, but it was and didn't seem to have many issues then.

I'm not too worried I have an issue, just want the devs to know what I'm experiencing for future development of these apps.

For now I'll just let my cache run at 2x on my NVIDIA GPUs and turn off Beta.

Thanks to all who responded and helped.

There's a whole thread about the MeerKAT tasks, with Bernd the admin describing how they are tweaking the apps to fix validation and what is and isn't working now. It's here:

https://einsteinathome.org/content/em-searches-brp-raidiopulsar-and-fgrp-gamma-ray-pulsar

In short, every day more tasks are validating against other operating systems, among other things.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056634931
RAC: 1605648

With another 24 hours running 0.12 MeerKAT for Windows/AMD on three hosts things continue to look very good.

I looked at my invalid tasks sorted by sent date.  I did see a couple of failures to quorum partners running v0.12 Linux applications.  I've not yet spotted either success or failure for my tasks matched against v0.13 Linux work. I have high hopes based on reports from others.

I also saw something new to me: half a dozen tasks with the status showing as Completed, can't validate.

These turned out to be tasks created from Work Units which reached the 20 task limit.

While I expected these would turn out to be WUs which had piled up computation errors on too many machines, the dominant problem in all cases was "Error while downloading", with the failures logged largely on September 2 and 3.

As with the previous problem with computation errors, it appears that some recent downloading failures are not random, but preferentially associated with tasks created from a small subset of WUs.
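
One rough way to check whether failures cluster on a few WUs is to count them per work unit. The Python sketch below is only an illustration: the work unit IDs and outcomes are made-up sample records, not data from the project's task pages, and the helper name `download_errors_per_wu` is hypothetical.

```python
from collections import Counter

# Hypothetical (work unit ID, outcome) pairs; not real project data.
tasks = [
    (101, "Error while downloading"),
    (101, "Error while downloading"),
    (101, "Error while downloading"),
    (102, "Completed and validated"),
    (103, "Completed and validated"),
    (104, "Error while downloading"),
    (104, "Error while downloading"),
    (105, "Completed and validated"),
]


def download_errors_per_wu(records):
    """Count 'Error while downloading' outcomes for each work unit."""
    return Counter(
        wu for wu, outcome in records if outcome == "Error while downloading"
    )


errors = download_errors_per_wu(tasks)
all_wus = {wu for wu, _ in tasks}

# If failures were spread evenly, most WUs would show a similar count.
# A few WUs holding all of the failures suggests the problem is not random.
print(f"{len(errors)} of {len(all_wus)} WUs hold {sum(errors.values())} download errors")
for wu, count in errors.most_common():
    print(wu, count)
```

With real task lists, a handful of WUs carrying most of the errors would point the same way as the observation above.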

 

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056634931
RAC: 1605648

I just noticed that there was a window a couple of hours ago in which my system received MeerKAT work on an 0.14 version.

The full listed application reference is 

Binary Radio Pulsar Search (MeerKAT) v0.14 (BRP7-opencl-ati-gcc4) windows_x86_64

There was a cuda55 variant also carrying the 0.14 and gcc4 markings.

As my subsequent work fetches have again been 0.12 work, and the application page does not currently list anything above 0.13, I surmise something was not satisfactory about these.  I'm currently processing the seven of these I have in hand ahead of their turn, so I'll have seen some behavior on one of my systems within an hour.

[edit to add: I've run some Windows AMD 0.14 gcc4 tasks to completion on one of my systems.  It seemed entirely unremarkable, taking very similar elapsed time to 0.12.  However, among the seven tasks, all were initially assigned 0.14 cuda55 quorum partners.  Of the three or four of those cuda55 trials that had reported when I looked, all had generated fast fails ("error while computing" with elapsed time under 10 seconds).  I suppose that may be why these versions were pulled out of service so quickly.]
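
For anyone wanting to pick such fast fails out of their own task list, a minimal Python sketch follows. The records are invented for illustration, and the 10-second cutoff simply mirrors the "under 10 seconds" figure above; the labels and the function name `fast_fails` are hypothetical.

```python
# Hypothetical task records: (task label, status, elapsed seconds).
results = [
    ("cuda55-a", "Error while computing", 4.2),
    ("cuda55-b", "Error while computing", 7.9),
    ("ati-a", "Completed, waiting for validation", 1310.0),
    ("cuda55-c", "Error while computing", 950.0),
]

FAST_FAIL_SECONDS = 10.0  # assumed cutoff, taken from "under 10 seconds"


def fast_fails(records, limit=FAST_FAIL_SECONDS):
    """Return labels of tasks that errored out before `limit` seconds."""
    return [
        label
        for label, status, elapsed in records
        if status == "Error while computing" and elapsed < limit
    ]


print(fast_fails(results))
```

A task that errors after many minutes (like the last sample record) is deliberately left out, since that looks more like a genuine computation problem than an app that fails at startup.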

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3709
Credit: 34642603276
RAC: 41973757

"gcc4" sounds like they are playing with an older compiler for their application, though I wonder why using such an old version would be beneficial. gcc4 is VERY old: depending on the exact subversion they are using, GCC 4 releases spanned roughly 2005 to 2016.

Ubuntu 22.04 has v11.x, for example.

archae86
Joined: 6 Dec 05
Posts: 3145
Credit: 7056634931
RAC: 1605648

So my seven 0.14 gcc4 tasks have had 0.14 cuda quorum partners return results for four different tasks, all of which have generated fast fails.

To add insult to injury, one of those WUs where I had an initial fast-failing 0.14 cuda partner was then issued as an 0.12 cuda task.  For this I got an inconclusive; in other words, my result did not match well enough to count.  This is a bit troublesome, as 0.12 cuda tasks have successfully validated against my 0.12 ati tasks quite often.

https://einsteinathome.org/workunit/669056670

If you have some of these 0.14 gcc4 tasks in your queue, you may wish to consider aborting them.  I doubt that running them at this stage will add much value.
