Gravitational Wave search O2 Multi-Directional ("O2MD1")

Mr Anderson
Joined: 28 Oct 17
Posts: 28
Credit: 47,375,985
RAC: 27,922

Gary Roberts wrote:
If you've successfully used your GPU for the FGRPB1G search you should continue to use it there.  Unless you have access to a more modern GPU, you probably should just use CPU cores for the new GW tasks.

But isn't this why there is an option for running test applications? So that the applications can be developed and tested on a variety of systems, both old and new, to see how they perform? After all, many enthusiasts cobble together whatever hardware they have available to run these searches, and not all of it is going to be the latest. It is therefore important that tasks don't just consume massive amounts of time and energy, go nowhere, and then error out when run on less-than-ideal systems. Granted, the GPU is not the newest. But in my experience programming embedded systems, often with very little debugging capability, it is easy to pass the buck and blame something convenient rather than delve into the problem and find the true cause. In this case, some things don't sit right with me about the simple "it's too old" explanation:

1. If the hardware is indeed too old, then it should never be getting the job in the first place. Perhaps this happened because the parameters that decide whether a given piece of hardware is capable of running a job have not been fine-tuned yet. In that case, this information should help in setting them.

2. It is true that the FGRPB1G search runs quite well on my hardware. However, if the credit awarded is a reasonable representation of the computing effort performed, that would indicate the GPU is about 30 times better than the CPU: the FGRPB1G search earns me about 1 point per second, whereas the CPU-based tasks take about 30 seconds to earn that much. Perhaps this is overly simplistic, but given that difference one would expect a GPU-based task to run somewhat faster than the CPU one, not far slower. The exception would be if something went terribly wrong, such as the task effectively deadlocking or choking because some resource was entirely used up. But then we come back to the point of beta testing applications: finding these things and fixing them.
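
The credit-rate comparison in point 2 can be written out as a small calculation. This is only a sketch of the arithmetic in the post: the two rates are the rough figures quoted above (about 1 credit per second on the GPU, about 1 credit per 30 seconds for CPU tasks), and the function name is made up for illustration. It assumes, as the post itself cautions, that credit is a fair proxy for computing effort.

```python
def credit_rate(credits: float, seconds: float) -> float:
    """Credits earned per second of run time (hypothetical helper)."""
    return credits / seconds


# Rough figures quoted in the post, not measured values.
gpu_rate = credit_rate(1.0, 1.0)    # FGRPB1G on the GPU: ~1 credit per second
cpu_rate = credit_rate(1.0, 30.0)   # CPU tasks: ~1 credit per 30 seconds

# If credit tracks computing effort, this ratio estimates the GPU's
# throughput advantage over the CPU.
ratio = gpu_rate / cpu_rate
print(f"Estimated GPU/CPU throughput ratio: about {ratio:.0f}x")
```

On that estimate, a GPU task that runs far slower than its CPU counterpart is well outside what the hardware difference alone would predict, which is the point being argued.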

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,906
Credit: 191,338,509
RAC: 51,041

There was a bit of trouble recently with this search, as apparently the workunit generation got stuck.

We will continue this search today with a new "sub-run" (attentive participants may notice an "O2MD1G2" in the task names) that extends the frequency range of the "G374.3" target to 650 Hz.

Due to different analysis parameters, the timing might be a little different from what you are used to from previous tasks, but the difference in overall run-time should stay within normal variation (~10%).

Independently, we issued a new version of the GPU app (2.02). The analysis code, and thus the behavior, should be identical to the previous version; we are simply retiring the hastily, manually built 2.01 versions (which were necessary because our normal CI system failed) in favor of properly, automatically built and tagged versions. An additional benefit of the automated build is that we again have versions for Mac OS X.

Until further notice, the computing setup will remain the same, i.e. full, fixed replication, GPU app versions in Beta test, which results will be "validated" with that of CPU versions.

BM

Aurum
Joined: 12 Jul 17
Posts: 46
Credit: 2,381,715,162
RAC: 168,565

Bernd Machenschalk wrote:
...results will be "validated" with that of CPU versions.

Wow!!! I now know why validations may take months.

Is there a mathematical reason for doing this???

Electric bill: $1037. Computer costs: north of $100K. Value to science: Priceless.
There's nothing "spare" about distributed computing clients.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 3,906
Credit: 191,338,509
RAC: 51,041

Sorry, the previous validator wasn't prepared for the new "sub-run", which caused a lot of "validate errors" over the weekend. The problem has been fixed, and I'll issue a 're-validation' of the affected results from the weekend so that you get proper credit.

BM

Betreger
Joined: 25 Feb 05
Posts: 867
Credit: 460,103,671
RAC: 161,569

I don't think it's quite fixed yet. I still have 3 validate errors, and in all 3 cases the workunit validated with two CPU results. I don't care about the credits, but I think there may still be a small bug running around.

Keith Myers
Joined: 11 Feb 11
Posts: 520
Credit: 505,954,094
RAC: 1,021,384

I have had a few of those myself, but they are not caused by the application. They are caused by a race condition where the third task replication is sent out after the second replication is returned but before it is validated. You get an invalid simply because the task was already validated. That is a problem with the validator code here. On other projects like SETI, you get credit for the task if it agrees with the canonical result of the first two replications, even after they have been validated.

Betreger
Joined: 25 Feb 05
Posts: 867
Credit: 460,103,671
RAC: 161,569

I just got another, for a total of 4 as of now.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,044
Credit: 34,782,402,287
RAC: 35,367,119

Keith Myers wrote:
... On other projects like Seti, you get credit for the task if it agrees with the previous canonical result of the first two replications even after they have been validated.

If the extra result is within the validator's limits, it will be credited here as well, so a 'race' condition that leads to the validator improperly dealing with an extra result can't be the explanation for invalid results here.

In any case, Betreger's comment was about validate errors, not about a result that actually gets declared invalid after being compared to other results. It may have something to do with results that were in the pipeline but not previously presented for validation; there is no way to know the full story. We don't know what changes were made to the validator (or exactly when), nor how and when the 'adjustment' to previously declared validate errors is going to be made, so everyone needs to wait a while until Bernd has had a chance to follow through on his planned corrective action. I'm guessing that a bunch of validate errors may actually turn out to be a bunch of useful results.

Cheers,
Gary.
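
The distinction drawn in the post above, between a late extra result that is compared against the already chosen canonical result and a "validate error" where no comparison happens at all, can be illustrated with a toy classifier. This is a hypothetical sketch, not the project's validator: the function name, the use of a single number to stand in for a result, the tolerance value, and the choice of `None` to represent output the validator could not process are all assumptions made for illustration.

```python
from typing import Optional


def classify_late_result(
    canonical: float,
    late_value: Optional[float],
    tolerance: float = 1e-6,  # assumed stand-in for "the validator's limits"
) -> str:
    """Classify an extra result that arrives after a canonical result exists."""
    if late_value is None:
        # The validator could not process the output at all, so the result
        # is never compared with anything: a "validate error".
        return "validate error"
    if abs(late_value - canonical) <= tolerance:
        # Agrees with the canonical result within limits: credited.
        return "valid"
    # Compared and found to disagree: declared invalid.
    return "invalid"


print(classify_late_result(1.0, 1.0000001))
print(classify_late_result(1.0, 1.5))
print(classify_late_result(1.0, None))
```

Under this picture, a third result that merely arrives late still goes through the comparison branch, so lateness on its own would not produce a validate error.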

Betreger
Joined: 25 Feb 05
Posts: 867
Credit: 460,103,671
RAC: 161,569

I just got another validate error and it was sent to me on the 28th. Methinks a bug still lurks. 

Keith Myers
Joined: 11 Feb 11
Posts: 520
Credit: 505,954,094
RAC: 1,021,384

In all my validate errors, my GW 2.02 GPU app task lost out to two GWnew 2.00 CPU tasks. My validate-error tasks were all sent AFTER the first of the wingmen reported but before the second wingman reported.

I am always the third wingman, never one of the original two.

https://einsteinathome.org/workunit/423017223

https://einsteinathome.org/workunit/422737175

https://einsteinathome.org/workunit/422753854

https://einsteinathome.org/workunit/422781958