Gamma-ray pulsar binary search #1 on GPUs

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

Quote:

CElliott wrote:
TimeLord04 wrote:

[1.18 Update on Win XP Pro x64 System.]

The GTX-760 is still crunching two Units at a time.  I've noticed a considerable improvement in crunching times over the 1.17 Units.  Times are now down to 2 Hours and 20 Minutes per Unit when crunching two Units at a time.

 

How do you make one GPU process two WUs at a time, if you don't mind me asking?

I use "app_config.xml" files for my MAC and my Win XP Pro x64 Systems.

 

[app_config.xml for Win XP Pro x64 with GTX-760 GPU:]

<app_config>
    <app>
        <name>einsteinbinary_BRP6</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>einsteinbinary_BRP5</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>einsteinbinary_BRP4G</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.2</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>hsgamma_FGRP3</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.2</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

 

[app_config.xml for MAC with TWO GTX-750TI SC cards:]

<app_config>
    <app>
        <name>einsteinbinary_BRP6</name>
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>einsteinbinary_BRP4G</name>
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.2</cpu_usage>
        </gpu_versions>
    </app>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>4</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

 

---------------------------- [End app_config.xml Files] -------------------------

The "max_concurrent" lines tell BOINC the maximum number of instances of an application to run at one time.

The "gpu_usage" lines tell BOINC what fraction of a GPU each Task counts as using, which sets how many Tasks run on a particular GPU.  ".5" translates to 2 GPU Tasks on a GPU card.

The "cpu_usage" lines tell BOINC what fraction of a CPU Core to budget for feeding the GPU.  ".5" translates to 1/2 of a CPU Core per Work Unit.  (This setting, however, is only the MINIMUM of a CPU Core that BOINC sets aside to feed the GPU.  In actuality, the Task could use more of the CPU Core to feed the GPU.)
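For anyone who wants to sanity-check a file before dropping it into the project folder, here is a minimal Python sketch. It is not part of BOINC; the `summarize` helper and the trimmed `APP_CONFIG` string are made up for illustration. It parses an app_config.xml snippet with the standard library and works out how many Tasks each GPU would take on, assuming tasks per GPU is simply 1 divided by gpu_usage, rounded down. A syntax slip in the XML shows up as a ParseError.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the config above; a single <app> entry is kept for brevity.
APP_CONFIG = """
<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>2</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
"""


def summarize(config_text):
    """Map app name -> (max_concurrent or None, tasks per GPU, CPU budget per task)."""
    root = ET.fromstring(config_text)  # raises ET.ParseError on malformed XML
    summary = {}
    for app in root.findall("app"):
        name = app.findtext("name").strip()
        limit_text = app.findtext("max_concurrent")
        max_concurrent = int(limit_text) if limit_text else None
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        cpu_usage = float(app.findtext("gpu_versions/cpu_usage"))
        # Assumption: each task claims gpu_usage of one GPU, so 0.5 -> 2 tasks.
        tasks_per_gpu = int(1 / gpu_usage)
        summary[name] = (max_concurrent, tasks_per_gpu, cpu_usage)
    return summary


for name, (limit, per_gpu, cpu) in summarize(APP_CONFIG).items():
    print(f"{name}: max_concurrent={limit}, tasks per GPU={per_gpu}, CPU budget per task={cpu}")
```

The sketch only reads the three settings discussed here; any other tags in a real file are ignored.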

This is how things were explained to me.  I've adapted my original SETI app_config.xml file created by Joe Segur to work here at Einstein.

 

TL

[EDIT:]

The app_config.xml file is placed into the BOINC Data ---> Projects ---> einstein.phys.uwm.edu Folder

[EDIT 2:]

Also, for Windows, use Notepad to create the app_config.xml file.  Use Save As (NOT Save), and make sure that "All File Types" is selected so that .txt is NOT appended to the filename.  Also, make sure that ANSI is shown as the encoding in the Save Parameters.

For MAC, it is recommended to use TextWrangler; however, I 'cheated' and copied over my existing Windows SETI app_config.xml file and pasted it into the appropriate Folder.  Then, I opened the newly pasted File with TextEdit, made my modifications to what I've pasted here, and just Saved the File in place.

Also, for MAC, you WILL need to reinstall BOINC to reset Permissions in the OS for BOINC once the app_config.xml file is in place.

 

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117517076926
RAC: 35368669

TimeLord04 wrote:
CElliott wrote:

How do you make one GPU process two WUs at a time, if you don't mind me asking?

I use "app_config.xml" files for my MAC and my Win XP Pro x64 Systems.

 I hope you realise you have a whole bunch of non-relevant stuff in the examples you listed.  Why don't you at least remove all the cruft like the defunct/non-existent/out-of-data searches such as FGRP3, BRP5, BRP6, BRP4G that would confuse the person you are trying to help?

Also, you should point out that to run multiple concurrent GPU tasks, there is a requirement for a minimum amount of GPU RAM - 2GB to run 2 tasks, for example.  With sufficient RAM, by far the easiest way is to use the project preference setting for "GPU utilization factor of FGRP apps" (since FGRPB1G is the only available GPU search at the moment), rather than all the potential complication of getting the syntax right and correctly positioning an app_config.xml file.

So, unless individual (different) customization of multiple machines in the one 'location' were required, just change the GPU utilization factor from '1' to '0.5' and save the change.  The change to running 2 tasks simultaneously will then occur after new work is fetched (NOT by just an 'update').  To achieve this immediately, just increase the work cache setting slightly so the BOINC client will need to do a work fetch.

 

Cheers,
Gary.

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

Gary Roberts wrote:
TimeLord04 wrote:
CElliott wrote:

How do you make one GPU process two WUs at a time, if you don't mind me asking?

I use "app_config.xml" files for my MAC and my Win XP Pro x64 Systems.

 I hope you realise you have a whole bunch of non-relevant stuff in the examples you listed.  Why don't you at least remove all the cruft like the defunct/non-existent/out-of-data searches such as FGRP3, BRP5, BRP6, BRP4G that would confuse the person you are trying to help?

Also, you should point out that to run multiple concurrent GPU tasks, there is a requirement for a minimum amount of GPU RAM - 2GB to run 2 tasks, for example.  With sufficient RAM, by far the easiest way is to use the project preference setting for "GPU utilization factor of FGRP apps" (since FGRPB1G is the only available GPU search at the moment), rather than all the potential complication of getting the syntax right and correctly positioning an app_config.xml file.

So, unless individual (different) customization of multiple machines in the one 'location' were required, just change the GPU utilization factor from '1' to '0.5' and save the change.  The change to running 2 tasks simultaneously will then occur after new work is fetched (NOT by just an 'update').  To achieve this immediately, just increase the work cache setting slightly so the BOINC client will need to do a work fetch.

 

While "some" of the old apps (BRP6) have NOT had any recent resends, I HAVE received BRP4G resends...  ALL the apps I have listed in my Win app_config.xml File were given to me by BOINC in an Error MSG when I first experimented here and listed FGRPB1G simply as FGRP in the config file...  BOINC came back stating CLEARLY that the KNOWN apps ARE exactly as I have them listed in the Win app_config.xml File.

Likewise for the MAC app_config.xml File.  Until Einstein Servers REMOVE the old app data completely, I'm more inclined to keep ALL the apps listed therein for the occasional resend.

 

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117517076926
RAC: 35368669

TimeLord04 wrote:
... Until Einstein Servers REMOVE the old app data completely, I'm more inclined to keep ALL the apps listed therein for the occasional resend.

But that's really the point.  If you look at the server status page you will find no trace of FGRP3, BRP5, BRP6, as they've long gone and have already been removed.  There's absolutely no point in having them still listed in app_config.xml, particularly if you're trying to help someone wanting to get concurrent tasks going for FGRPB1G.  After all, that's what the query was about.

Good luck with expecting BRP4G resends :-).  There are just 6 pending tasks left in the entire database.  That means there are probably just 6 tasks out there that, sometime in the two week period, might actually get returned.  Of course, one or two might even fail (yet again) so, yes, it is theoretically possible that there still could be a further BRP4G resend or two.  I wouldn't like your chances of actually getting one though :-).

I'm not at all having a go at you.  I think it's great for people to help others less knowledgeable than themselves.  I'm just asking to try to keep it as simple and relevant as possible so that an inexperienced person doesn't immediately get bamboozled by unnecessary complexity.  What might work for you and for previous searches with different needs could be a disaster for someone just needing something current.  In this case, if you really wanted to promote the use of app_config.xml to run 2 tasks concurrently, all you really should suggest for NVIDIA GPUs would be:-

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Normally, this would be quite appropriate.  However, because CElliott has two hosts, one showing as having [3] 2GB GTX760s and the other showing as [3] 4GB GTX970s, and both hosts only having 4 CPU cores each, it's important to ask if what BOINC is reporting is the true state of affairs.  BOINC has a habit of listing the most competent GPU only and implying all three are the same so, really, the 2nd and 3rd GPU in each host could be lesser GPUs not capable of doing 2 tasks simultaneously.  Even if all GPUs were as reported, there would still be a problem because 4 CPU cores may not be sufficient to support 6 GPU tasks attempting to run on each host.
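The two concerns above (GPU RAM on each card and CPU cores across the host) can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: `check_host` is a made-up helper, roughly 1 GB of GPU RAM per task is inferred from the "2GB to run 2 tasks" rule of thumb mentioned earlier, and each task is budgeted a full CPU core as in the config above. The real requirements may differ by app version and driver.

```python
def check_host(gpu_vram_gb, cpu_cores, gpu_usage, cpu_usage, vram_per_task_gb=1.0):
    """Return (tasks per GPU, indexes of GPUs short of RAM, CPU cores budgeted, CPU ok?)."""
    tasks_per_gpu = int(1 / gpu_usage)            # 0.5 -> 2 tasks on every GPU
    vram_needed = tasks_per_gpu * vram_per_task_gb
    # GPUs whose RAM is below what the requested number of tasks would need.
    short = [i for i, vram in enumerate(gpu_vram_gb) if vram < vram_needed]
    cores = tasks_per_gpu * len(gpu_vram_gb) * cpu_usage
    return tasks_per_gpu, short, cores, cores <= cpu_cores


# Three 2GB cards and 4 CPU cores, as BOINC reports for one of the hosts.
print(check_host([2, 2, 2], cpu_cores=4, gpu_usage=0.5, cpu_usage=1.0))
# -> (2, [], 6.0, False): RAM is enough, but 6 cores are budgeted and only 4 exist.

# Hypothetical case where the 2nd and 3rd cards are really lesser 1GB GPUs.
print(check_host([2, 1, 1], cpu_cores=4, gpu_usage=0.5, cpu_usage=1.0))
# -> (2, [1, 2], 6.0, False): GPUs 1 and 2 could not hold two tasks each.
```

Either way the answer depends on what is physically installed, which is why the actual hardware list matters.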

We really should be asking CElliott to spell out exactly what he has in each host so that a workable solution can be suggested.

Cheers,
Gary.

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80558758
RAC: 320

Woke up this morning to find a large number (66) of tasks failed immediately overnight. They all seem to have the following in the stderr:

boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000] 
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
06:58:12 (10320): [CRITICAL]: ERROR: MAIN() returned with error '5'

TimeLord04
Joined: 8 Sep 06
Posts: 1442
Credit: 72378840
RAC: 0

Gary Roberts wrote:
TimeLord04 wrote:
... Until Einstein Servers REMOVE the old app data completely, I'm more inclined to keep ALL the apps listed therein for the occasional resend.

But that's really the point.  If you look at the server status page you will find no trace of FGRP3, BRP5, BRP6, as they've long gone and have already been removed.  There's absolutely no point in having them still listed in app_config.xml, particularly if you're trying to help someone wanting to get concurrent tasks going for FGRPB1G.  After all, that's what the query was about.

Good luck with expecting BRP4G resends :-).  There are just 6 pending tasks left in the entire database.  That means there are probably just 6 tasks out there that, sometime in the two week period, might actually get returned.  Of course, one or two might even fail (yet again) so, yes, it is theoretically possible that there still could be a further BRP4G resend or two.  I wouldn't like your chances of actually getting one though :-).

I'm not at all having a go at you.  I think it's great for people to help others less knowledgeable than themselves.  I'm just asking to try to keep it as simple and relevant as possible so that an inexperienced person doesn't immediately get bamboozled by unnecessary complexity.  What might work for you and for previous searches with different needs could be a disaster for someone just needing something current.  In this case, if you really wanted to promote the use of app_config.xml to run 2 tasks concurrently, all you really should suggest for NVIDIA GPUs would be:-

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Normally, this would be quite appropriate.  However, because CElliott has two hosts, one showing as having [3] 2GB GTX760s and the other showing as [3] 4GB GTX970s, and both hosts only having 4 CPU cores each, it's important to ask if what BOINC is reporting is the true state of affairs.  BOINC has a habit of listing the most competent GPU only and implying all three are the same so, really, the 2nd and 3rd GPU in each host could be lesser GPUs not capable of doing 2 tasks simultaneously.  Even if all GPUs were as reported, there would still be a problem because 4 CPU cores may not be sufficient to support 6 GPU tasks attempting to run on each host.

We really should be asking CElliott to spell out exactly what he has in each host so that a workable solution can be suggested.

If his systems are Quad Core CPUs AND he has THREE GPUs in each system, AND he wants to crunch TWO Units at a time on each card, then he would STILL NEED the "max_concurrent" line in his proposed app_config.xml file; and it would look like this:

<app_config>
    <app>
        <name>hsgamma_FGRPB1G</name>
        <max_concurrent>6</max_concurrent>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

 

Without the "max_concurrent" line, BOINC will SEE all three GPUs in each system, BUT ONLY utilize the FIRST GPU card.  I know this from experience when I installed the second GTX-750TI SC card in the MAC.  With a Quad Core CPU running three GPU cards and feeding TWO tasks at a time to each card, the CPU would still have ONE Core free to the system, and THREE Cores feeding all the GPUs on each system with the "max_concurrent" line I have in my example, above.
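The core count in that paragraph can be checked with a short Python sketch. The `cpu_budget` helper is made up for illustration, and it assumes the client really does schedule work on all three cards and that each task is budgeted exactly cpu_usage of a core (actual CPU use can be higher).

```python
def cpu_budget(n_gpus, gpu_usage, cpu_usage, cpu_cores, max_concurrent=None):
    """Return (tasks running, CPU cores budgeted, CPU cores left over)."""
    gpu_slots = n_gpus * int(1 / gpu_usage)       # 3 GPUs at 0.5 each -> 6 slots
    # max_concurrent, when set, caps how many of those slots are filled.
    running = gpu_slots if max_concurrent is None else min(gpu_slots, max_concurrent)
    reserved = running * cpu_usage
    return running, reserved, cpu_cores - reserved


# Quad core, three GPUs, two tasks per GPU, max_concurrent of 6.
print(cpu_budget(3, 0.5, 0.5, cpu_cores=4, max_concurrent=6))
# -> (6, 3.0, 1.0): three cores feed the GPUs and one core stays free.
```

With max_concurrent left at 2 instead, the same call gives (2, 1.0, 3.0), i.e. only two tasks in total across the whole host.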

As to BRP4G resends, I received one on each of my systems about two weeks ago...  So, it can, (and does), happen...  So, fine, let him choose how he wants to set up his own app_config.xml File.  He can choose my complete example, (a few posts back), or the new one I have just posted a few lines above here...

[EDIT:]

Also, WITHOUT the complete app_config.xml File, (a few posts back), should a resend of another App Type come in, it will hit his system requiring 1 CPU Core and 1 GPU Core...  With my complete app_config.xml File, the Max Core Usage is 0.50...  (Until a new App Type is released, and THEN that App just needs to be added to the app_config.xml File...)

 

TL

TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join SETI Refugees

solling2
Joined: 20 Nov 14
Posts: 219
Credit: 1577481319
RAC: 18694

Matt_145 wrote:
Woke up this morning to find a large number (66) of tasks failed immediately overnight. They all seem to have the following in the stderr:

Did your machine possibly have an overnight automatic update?

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80558758
RAC: 320

solling2 wrote:
Matt_145 wrote:
Woke up this morning to find a large number (66) of tasks failed immediately overnight. They all seem to have the following in the stderr:

Did your machine possibly have an overnight automatic update?

No, I have Windows Update operating manually only. Windows 7.

walton748
Joined: 1 Mar 10
Posts: 94
Credit: 1485400192
RAC: 2144288

Matt wrote

Woke up this morning to find a large number (66) of tasks failed immediately overnight. They all seem to have the following in the stderr: ...

 

This looks like what happens to me when I suspend a task that is actively being worked on or, rarely, when I shut down BOINC altogether: the NVIDIA graphics driver fails catastrophically (including a short screen blackout) and Windows "restores" it, as a Windows message reports when the desktop becomes visible again. After that, tasks no longer seem able to "communicate" with the graphics card, and any gamma-ray pulsar GPU tasks already in the work cache also fail. In my experience it is not enough to shut down BOINC and restart it; to resolve the condition I have to reboot the system. An example task is here; the task being worked on at the moment the video driver was reset reads differently. I have not had errors like that happen on their own, though. The examples given are from a system running Windows 8.1, where I provoked the failure myself in the course of posting here in response to archae86 (not that it got much notice).

Did everything come back to order by itself in your case?

Regards,

Walton

 

ravenigma
Joined: 20 Aug 10
Posts: 69
Credit: 80558758
RAC: 320

walton748 wrote:

Matt wrote

Woke up this morning to find a large number (66) of tasks failed immediately overnight. They all seem to have the following in the stderr: ...

 

This looks like what happens to me when I suspend a task that is actively being worked on or, rarely, when I shut down BOINC altogether: the NVIDIA graphics driver fails catastrophically (including a short screen blackout) and Windows "restores" it, as a Windows message reports when the desktop becomes visible again. After that, tasks no longer seem able to "communicate" with the graphics card, and any gamma-ray pulsar GPU tasks already in the work cache also fail. In my experience it is not enough to shut down BOINC and restart it; to resolve the condition I have to reboot the system. An example task is here; the task being worked on at the moment the video driver was reset reads differently. I have not had errors like that happen on their own, though. The examples given are from a system running Windows 8.1, where I provoked the failure myself in the course of posting here in response to archae86 (not that it got much notice).

Did everything come back to order by itself in your case?

Regards,

Walton

 

No, unfortunately. My entire cache of tasks failed and Einstein has put me in "time out" as it won't send me any more for a while due to the large number of errors. Running GPUGrid in the meantime.

As for your observations on shutting down BOINC or suspending tasks, that isn't what happened here. I left BOINC running overnight to continue working, and I found all the failed tasks when I checked this morning.
