Gravitational Wave Engineering run on LIGO O1 Open Data

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,838
Credit: 108,510,457,053
RAC: 33,665,341

DanNeely wrote:

Setting cpu/gpu usage to 0 threw an error message when I tried reading the config file.  Going the other direction and setting hardware requirements well in excess of what my boxes have worked great though:

<app>
    <name>einstein_O1OD1E</name>
    <gpu_versions>
        <gpu_usage>99</gpu_usage>
        <cpu_usage>99</cpu_usage>
    </gpu_versions>
</app>

I guess that makes perfect sense if you think about it :-).  Telling BOINC that a particular app requires much more hardware than you have - so don't even try these tasks - sounds like the logical solution :-).

 

Cheers,
Gary.

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 17,012

DanNeely wrote:
Gary Roberts wrote:
Richie wrote:
DanNeely wrote:
Is there a way to opt out of the GPU tasks from the engineering run until they're able to perform better, while still running CPU work from it?

'ON' for CPU and 'OFF' for all GPUs (AMD, Nvidia, Intel).

I suspect Dan would just want to exclude O1OD1E GPU tasks and not FGRPB1G tasks as well.  Your suggestion excludes all types of GPU crunching.  Off the top of my head (I've never tried it) a possible way would be to use the app_config.xml mechanism and use both the name and plan class tags to identify just the GPU version.  Perhaps setting the cpu_usage and gpu_usage (or maybe the max_concurrent) for that combination to zero might effectively exclude those tasks without affecting anything else.  It would be worth experimenting.

Setting cpu/gpu usage to 0 threw an error message when I tried reading the config file.  Going the other direction and setting hardware requirements well in excess of what my boxes have worked great though:

<app>
    <name>einstein_O1OD1E</name>
    <gpu_versions>
        <gpu_usage>99</gpu_usage>
        <cpu_usage>99</cpu_usage>
    </gpu_versions>
</app>

After ~12 hours on each of two systems I'm reasonably confident this is working as expected.  I'm getting a mix of O1OD1E CPU tasks and Fermi GPU tasks on both, but nothing I don't want.

 

And apparently I'm not as clever as I thought; the server was just toying with me.  Both of my boxes recently got several tasks flagged as needing 99 CPUs and 99 GPUs.  It's late, so I'm not going to experiment to see whether BOINC will attempt to run them until sometime tomorrow.  But it looks like I need a plan C of some sort.

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 17,012

Well, at least the tasks won't run.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,838
Credit: 108,510,457,053
RAC: 33,665,341

DanNeely wrote:
... it looks like I need a plan C of some sort.

The documentation implies that you can use an <app_version> clause to replace an <app> clause.  It says "overrides" but because the <app> clause itself is shown as optional, I suspect you would use one or the other rather than expecting the second to override the first.  Maybe you'll need to try both ways.

<max_concurrent> is an option in an <app> clause but it's not shown at all for <app_version>.  It might be just an oversight so perhaps something like the following might do what you want.  Note that xxxx represents the type of GPU you have, ati or nvidia.  If a <max_concurrent> of zero is accepted, the client might know not to request work for that plan_class.  Maybe you'll get a better idea by checking what actually gets installed in the state file.

<app_version>
    <app_name>einstein_O1OD1E</app_name>
    <plan_class>GW-opencl-xxxx-V1</plan_class>
    <max_concurrent>0</max_concurrent>
    <avg_ncpus>99</avg_ncpus>
    <ngpus>99</ngpus>
</app_version>

The other things that might give some clues are the contents of a sched_request and sched_reply as a result of particular settings used in app_config.xml.  It could also be worthwhile looking at the scheduler logs on the website to see the decision making process the scheduler went through in response to a particular request.
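
For reference, here is a minimal sketch of how such a clause would sit in a complete app_config.xml, assuming the usual layout in which every <app> and <app_version> clause is wrapped in a single <app_config> root element and the file is placed in the project's directory. The nvidia plan class name is simply the xxxx placeholder above filled in, and <max_concurrent> is left out because the documentation only lists it for <app>. This is untested against O1OD1E tasks, and Dan's earlier 99/99 result suggests these settings govern how the client runs tasks rather than what the server sends.

```xml
<!-- app_config.xml, placed in the Einstein@Home project directory -->
<app_config>
    <app_version>
        <app_name>einstein_O1OD1E</app_name>
        <!-- assumed plan class: the "xxxx" placeholder replaced by nvidia -->
        <plan_class>GW-opencl-nvidia-V1</plan_class>
        <!-- deliberately more CPUs/GPUs than the host has -->
        <avg_ncpus>99</avg_ncpus>
        <ngpus>99</ngpus>
    </app_version>
</app_config>
```

After saving the file, re-read the config files from the Manager (or restart the client) and check the event log for parse errors before drawing any conclusions.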

 

Cheers,
Gary.

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 17,012

I just tried adding the nvidia version of that <app_version>, loaded the config file, aborted a block of existing nvidia GW tasks, and had a fresh batch of them downloaded afterward.

Other than when an error occurs and a URL is listed in the event log, I'm not sure how to see a scheduler request/reply.
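
For anyone with the same question, one possible approach, offered as a sketch rather than a verified recipe: the client's cc_config.xml accepts log flags that write scheduler traffic and work-fetch decisions to the event log. The file goes in the BOINC data directory. The client should also leave the most recent request and reply in that directory as files named along the lines of sched_request_<project>.xml and sched_reply_<project>.xml; the exact file names are an assumption here, since they depend on the project URL.

```xml
<cc_config>
    <log_flags>
        <!-- log details of scheduler requests and replies -->
        <sched_op_debug>1</sched_op_debug>
        <!-- log the client's work-fetch decisions (verbose) -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```

Both flags can be set back to 0 once the behaviour has been captured, since work_fetch_debug in particular fills the event log quickly.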

Zalster
Joined: 26 Nov 13
Posts: 3,117
Credit: 4,050,672,230
RAC: 0

Gary, what about an exclude_gpu in the cc_config?  Not sure if you need the device num or not.  I also don't know if it needs the plan class, but it can't hurt to try.

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <device_num>0</device_num>
            <app_name>einstein_O1OD1E</app_name>
            <plan_class>GW-opencl-xxxx-V1</plan_class>
        </exclude_gpu>
    </options>
</cc_config>

DanNeely
Joined: 4 Sep 05
Posts: 1,364
Credit: 3,562,358,667
RAC: 17,012

Your attempt to limit the exclusion to GW GPU tasks didn't work.  It also showed "GPU missing" on my Fermi tasks, failed over to a backup project, and at some point in the process began aborting the Fermi GPU tasks (I managed to stop BOINC and revert the change before it took out more than 50 or 60 of them).
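
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the options the client configuration documentation lists for <exclude_gpu> are <url>, <device_num>, <type> and <app>, with no <app_name> or <plan_class>. If the client skips tags it doesn't recognize, Zalster's block would carry no application filter and would exclude the GPU for every Einstein app, which would match the "GPU missing" result on the Fermi tasks. A sketch using only those tags follows. It is untested here, and because it filters by application rather than by plan class, it would not by itself stop the server from sending O1OD1E GPU tasks.

```xml
<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <!-- optional: omit device_num to cover every GPU of this type -->
            <device_num>0</device_num>
            <type>NVIDIA</type>
            <!-- short application name; the tag is <app>, not <app_name> -->
            <app>einstein_O1OD1E</app>
        </exclude_gpu>
    </options>
</cc_config>
```

The <url> value needs to match the project URL exactly as the client shows it. Suspending GPU work before trying this would limit the damage if the exclusion again turns out broader than intended.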

Betreger
Joined: 25 Feb 05
Posts: 987
Credit: 1,402,009,428
RAC: 798,178

A couple of observations. 

I would much rather run gravity waves than binary pulsars even though the RAC is taking a huge hit. If I wanted credits I would not run Einstein at all but something stupid like Collatz. 

Since the GPU app arrived, most work stalls after completing with a "waiting to acquire lock" message. The tasks eventually clear and validate. I don't know if this is a feature or a bug.

I hope my modest efforts with a GTX1060 are helping to develop a more efficient app. The current one is a real hog.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1,702,989,778
RAC: 0

Betreger wrote:
Since the GPU app arrived, most work stalls after completing with a "waiting to acquire lock" message. The tasks eventually clear and validate. I don't know if this is a feature or a bug.

Hi! That was a bug in v0.12, which is now deprecated; the current version is v0.13. I haven't seen a single v0.12 task that ran properly. They couldn't validate, but some v0.13 tasks run on Nvidia are validating. So are AMD tasks if they were run 1x or maybe 2x, but I think none of those run at 4x has validated yet. There are plenty of validation inconclusives among them instead.

scole of TSBT
Joined: 2 Mar 05
Posts: 10
Credit: 614,056,178
RAC: 75,325

EDIT: I didn't realize that was a month old post :)

zombie67 [MM] wrote:
How to get the tasks for Gravitational Wave Engineering run on LIGO O1 Open Data?  There is no way to select that app in the project preferences.  Or am I missing it?

Do you have "Run test applications" selected?
