Gravitational Wave search O2 Multi-Directional ("O2MD1")

Mr Anderson
Joined: 28 Oct 17
Posts: 37
Credit: 135891691
RAC: 269616


Gary Roberts wrote:
If you've successfully used your GPU for the FGRPB1G search you should continue to use it there.  Unless you have access to a more modern GPU, you probably should just use CPU cores for the new GW tasks.

But isn't this why there is an option for running test applications? So that the applications can be developed and tested on a variety of systems, both old and new, to see how they perform? After all, many enthusiasts will cobble together whatever computer hardware they have available to run these searches, and not all of it is going to be the latest equipment. It is therefore important that tasks don't consume massive amounts of time and energy, going nowhere and then erroring out, when run on less-than-ideal systems. Granted, the GPU is not the newest, but in my experience programming embedded systems, often with very limited debugging facilities, it is easy to pass the buck and blame something convenient rather than dig into the problem and find the true cause. In this case, a few things don't sit right with me about the simple "it's too old" explanation:

1. If the hardware is indeed too old, then it should never have been sent the job in the first place. Perhaps this happened because the parameters that decide whether a given piece of hardware is capable of running a job have not been fine-tuned yet. In that case, this information should help in setting those parameters.

2. It is true that the FGRPB1G search runs quite well on my hardware. However, if the credit awarded is a reasonable representation of the computing effort performed, that would suggest the GPU is about 30 times faster than the CPU: the FGRPB1G search earns me roughly 1 credit per second, whereas the CPU-based tasks take about 30 seconds to earn the same (a rough back-of-envelope of that ratio is sketched below). Perhaps that is overly simplistic, but given the difference, one would expect a GPU-based task to run somewhat faster than the CPU one, not far slower. Unless, of course, something went terribly wrong and the task effectively deadlocked, or choked because some resource was entirely used up; but then we come back to the point of beta-testing applications, which is to find and fix exactly these things.
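Here is that back-of-envelope calculation in Python; the 1 credit/s and 1 credit per 30 s figures are only my own observed rates, not official project numbers:

```python
# Rough back-of-envelope of the GPU/CPU ratio implied by the credit rates
# quoted above. The rates are only my observed figures, not project numbers.
gpu_credit_per_sec = 1.0        # FGRPB1G GPU task: ~1 credit per second
cpu_credit_per_sec = 1.0 / 30   # CPU task: ~1 credit per ~30 seconds

speedup = gpu_credit_per_sec / cpu_credit_per_sec
print(f"Implied GPU/CPU speedup: ~{speedup:.0f}x")   # prints ~30x
```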

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4266
Credit: 244924643
RAC: 16631


There was a bit of trouble recently with this search, as apparently the workunit generation got stuck.

We will continue this search today with a new "sub-run" (attentive participants may notice an "O2MD1G2" in the task names), which extends the frequency range of the "G374.3" target to 650 Hz.

Due to different analysis parameters, the timing might be a little different from what you are used to from previous tasks, but the difference in overall run time should stay within the range of normal variation (~10%).

Independently, we issued a new version of the GPU app (2.02). The analysis code, and thus the behavior, should be identical to the previous version; we are simply retiring the hastily, manually built 2.01 versions (which were necessary because our normal CI system failed) in favor of properly, automatically built and tagged versions. An additional benefit of the automated build is that we once again have versions for Mac OS X.

Until further notice, the computing setup will remain the same, i.e. full, fixed replication, with the GPU app versions in Beta test, whose results will be "validated" against those of the CPU versions.
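For anyone wondering what that means in practice, here is a minimal sketch of fixed-replication cross-validation. This is not our actual validator code; the tolerance and names are invented purely for illustration:

```python
# Minimal sketch of fixed-replication cross-validation (illustrative only,
# not the actual Einstein@Home validator; tolerance and names are invented).
from statistics import median

TOLERANCE = 1e-4   # assumed relative tolerance between results

def results_agree(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """Two results 'validate' if they match within a relative tolerance."""
    return abs(a - b) <= tol * max(abs(a), abs(b), 1.0)

def validate_quorum(results: list[float], min_quorum: int = 2):
    """Pick a canonical result and grant credit to results that agree with it."""
    if len(results) < min_quorum:
        return None, []                      # wait for more replications
    canonical = median(results)
    granted = [r for r in results if results_agree(r, canonical)]
    return canonical, granted

# Example: one CPU result and one GPU (Beta) result for the same workunit.
canonical, credited = validate_quorum([0.731024, 0.731031])
print(canonical, credited)
```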

BM

Aurum
Joined: 12 Jul 17
Posts: 77
Credit: 3408507560
RAC: 120298


Bernd Machenschalk wrote:
...results will be "validated" against those of the CPU versions.

Wow!!! I now know why validations may take months.

Is there a mathematical reason for doing this???

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4266
Credit: 244924643
RAC: 16631


Sorry, the previous validator wasn't prepared for the new "sub-run", which caused a lot of "validate errors" over the weekend. The problem has been fixed and I'll issue 're-validation' of the affected results from the weekend, so you will get proper credit.

BM

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421624237
RAC: 800111


I don't think it's quite fixed yet. I still have 3 validate errors, and in all 3 cases the workunit validated between two CPU results. I don't care about the credits, but I think there may still be a small bug running around.

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544800460
RAC: 6409024


I have had a few of those myself. But it is not caused by the application. It is caused by a race condition where the third replication is sent out after the second replication is returned but before it is validated. You get an invalid simply because the workunit was already validated; that's a problem with the validator code here. On other projects, like Seti, you still get credit for the task if it agrees with the canonical result of the first two replications, even after they have been validated.
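A toy sketch of the sequence I mean; this is purely illustrative and not actual BOINC scheduler or validator code, and the numbers and tolerance are made up:

```python
# Toy timeline of the suspected race (illustrative, not actual BOINC code).

TOL = 1e-3  # made-up agreement tolerance

# 1) The first two replications come back and the quorum validates:
returned = [0.7310, 0.7311]
canonical = returned[0]          # the workunit now has a canonical result

# 2) Race: a third replication was already sent out before that validation
#    finished. When it comes back later...
late_result = 0.7310

# ...a strict validator rejects it just because the workunit is already
# validated, while a lenient one (Seti-style) still compares it to the
# canonical result and grants credit if it agrees:
strict_outcome = "validate error"
lenient_outcome = "valid" if abs(late_result - canonical) <= TOL else "invalid"

print(strict_outcome, "vs", lenient_outcome)   # validate error vs valid
```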

 

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421624237
RAC: 800111


I just got another one; that makes 4 as of now.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5842
Credit: 109392196747
RAC: 35869091


Keith Myers wrote:
... On other projects like Seti, you get credit for the task if it agrees with the previous canonical result of the first two replications even after they have been validated.

If the extra result is within the validator's limits, it will be credited here as well, so a 'race' condition that causes the validator to deal improperly with an extra result can't be the explanation for invalid results here.

In any case, Betreger's comment was about validate errors, not about a result that actually gets declared invalid after being compared to other results. It may be something to do with results in the pipeline that had not previously been presented for validation; there is no way to know the full story. We don't know what changes were made to the validator (and exactly when), or how and when the 'adjustment' to previously declared validate errors is going to be made, so everyone needs to wait a while until Bernd has had a chance to follow through on his planned corrective action. I'm guessing that a bunch of validate errors may actually turn out to be a bunch of useful results.
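To make the distinction concrete, here is a toy classifier for the outcomes being discussed; the state names are descriptive only, not the exact BOINC identifiers:

```python
# Toy illustration of the outcomes being discussed; the names are
# descriptive only, not the exact BOINC state identifiers.

def classify(result_value, canonical, validator_ok, tol=1e-4):
    if not validator_ok:
        # The validator itself failed to process the result (for example,
        # the sub-run bug mentioned above): a 'validate error'.
        return "validate error"
    if abs(result_value - canonical) <= tol:
        return "valid"      # agrees with the canonical result, gets credit
    return "invalid"        # was actually compared, but disagreed

print(classify(0.7310, 0.7310, validator_ok=False))  # validate error
print(classify(0.7310, 0.7310, validator_ok=True))   # valid
print(classify(0.9999, 0.7310, validator_ok=True))   # invalid
```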

Cheers,
Gary.

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1421624237
RAC: 800111


I just got another validate error and it was sent to me on the 28th. Methinks a bug still lurks. 

Keith Myers
Joined: 11 Feb 11
Posts: 4700
Credit: 17544800460
RAC: 6409024


All my validate errors have lost out against two GWnew 2.00 CPU tasks, with my GW 2.02 GPU app. My validate-error tasks have all been sent AFTER the first of the two wingmen reported but before the second wingman reported.

I am always the third wingman, never one of the original two replication wingmen.

https://einsteinathome.org/workunit/423017223

https://einsteinathome.org/workunit/422737175

https://einsteinathome.org/workunit/422753854

https://einsteinathome.org/workunit/422781958