Einstein FGRPB1G Linux/Nvidia Special app "AIO"

biodoc
Joined: 30 Aug 09
Posts: 1
Credit: 1884097542
RAC: 2207362

Thanks Ian and Petri. The v0.95 app is up and running on my 3 gpus.  Well done! 

tito
Joined: 10 Jun 06
Posts: 28
Credit: 1249920965
RAC: 662974

Ian&Steve C. wrote:

just keep an eye on your error rate. if you see a lot more errors, it might make sense to swap over to the 0.95 app.

Nothing unusual so far. Will keep an eye on error ratio.

gordonbb
Joined: 14 May 19
Posts: 26
Credit: 895570568
RAC: 0

Thanks @petri33 & @Ian&Steve C. for making this available. I'm fairly new at BOINC so I don't quite get the nuances of configuring the Anonymous Platform.

Though I tried to set:

    <coproc>
      <type>NVIDIA</type>
      <count>0.5</count>
    </coproc>

in app_info.xml on systems with 8GB VRAM, once I reloaded boinc-client it would fail the second Task with a Computational Error:

<pre>
<core_client_version>7.18.1</core_client_version>
<![CDATA[
<message>
process exited with code 65 (0x41, -191)</message>
<stderr_txt>
13:29:23 (3697396): [normal]: This Einstein@home App (v1.0 by petri33) was built at: Apr 28 2022 18:47:15

13:29:23 (3697396): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0'.
13:29:23 (3697396): [debug]: 1e+16 fp, 6.4e+09 fp/s, 1647640 s, 457h40m40s30
13:29:23 (3697396): [normal]: % CPU usage: 1.000000, GPU usage: 0.500000
command line: ../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L11.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 764.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L11_0772_33406920.dat --debug 0 -o LATeah3012L11_772.0_0_0.0_33406920_0_0.out
output files: 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L11_772.0_0_0.0_33406920_0_0' 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L11_772.0_0_0.0_33406920_0_1'
13:29:23 (3697396): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
13:29:23 (3697396): [debug]: glibc version/release: 2.35/stable
13:29:23 (3697396): [debug]: Set up communication with graphics process.
EAH_SLEEP file found, value 0

kernel_compact 256 threads
kernel_raz 256 threads
kernel_ts_2_phase_diff_sorted 64 threads
kernel_prepare_power_toplist 256 threads
kernel_prepareSort 1024 threads
kernel_SortedPhoton 64 threads
kernel_setupPhotonPairsArray 64 threads
kernel_extractPhotonIndex 512 threads
Eah sleep true, 0
boinc_get_opencl_ids returned [0x55efcd653c40 , 0x55efcd649af0] 
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce GTX 1070 Ti" by: NVIDIA Corporation
Max allocation limit: 2127691776
Global mem size: 8510767104
Could not open file: /tmp/dep-b9eb4b.d
OpenCL device has FP64 support
Could not open file: /tmp/dep-272e61.d
SemiCoh mode 0 start
skypoints(1)read_checkpoint(): Couldn't open file 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cpt': No such file or directory (2)
skypoint loop(1)
S0:
binpoints loop 639
set_up_fft samples:16777216
% fft length: 16777216(0x1000000)
Using alternate fft kernel file: ../../clfft.kernel.Transpose2.cl.alt
Could not open file: /tmp/dep-a7ca6a.d
Using alternate fft kernel file: ../../clfft.kernel.Stockham3.cl.alt
Could not open file: /tmp/dep-ee75f4.d
Using alternate fft kernel file: ../../clfft.kernel.Transpose4.cl.alt
Could not open file: /tmp/dep-3d35bf.d
Using alternate fft kernel file: ../../clfft.kernel.Stockham5.cl.alt
Could not open file: /tmp/dep-e68c2c.d
Using alternate fft kernel file: ../../clfft.kernel.Transpose6.cl.alt
Could not open file: /tmp/dep-743a39.d
% Scratch buffer size: 136314880
ZError in OpenCL context: Unknown error executing clFlush on NVIDIA GeForce GTX 1070 Ti (Device 0).

... {above repeated many times } ...
Failed to allocate tmp buffer for photon data
13:29:28 (3697396): [CRITICAL]: ERROR: MAIN() returned with error '1'
FPU status flags: 
mv: cannot stat 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out': No such file or directory
mv: cannot stat 'LATeah3012L11_772.0_0_0.0_33406920_0_0.out.cohfu': No such file or directory
13:29:28 (3697396): [normal]: done. calling boinc_finish(65).
13:29:28 (3697396): called boinc_finish(65)
Warning:  Program terminating, but clFFT resources not freed.
Please consider explicitly calling clfftTeardown( ).

</stderr_txt>
]]></pre>

Still, running just 1 Task/GPU I'm seeing a 45% decrease in time compared to the stock application (EVGA 1070ti @ 90W; Ubuntu 22.04 LTS; NVIDIA 510.73.05) and similar gains on a 2060, 2060 Super and a 1660ti.
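
For readers skimming the stderr output above, the raw byte counts are easier to compare once converted. The following is only a small sketch: the three log lines are copied from the post, while the helper names (`parse_memory_lines`, `to_mib`) are made up for illustration and are not part of the app or of BOINC. It does not diagnose the failure; it just makes the numbers readable.

```python
import re

# Lines copied verbatim from the stderr output quoted in the post.
LOG = """\
Max allocation limit: 2127691776
Global mem size: 8510767104
% Scratch buffer size: 136314880
"""

# Hypothetical pattern: matches only the three memory-related labels above.
PATTERN = re.compile(
    r"(Max allocation limit|Global mem size|Scratch buffer size):\s*(\d+)"
)


def parse_memory_lines(text):
    """Return {label: size_in_bytes} for the memory lines found in text."""
    return {label: int(value) for label, value in PATTERN.findall(text)}


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


sizes = parse_memory_lines(LOG)
for label, n_bytes in sizes.items():
    print(f"{label}: {n_bytes} bytes = {to_mib(n_bytes):.1f} MiB")
```

Two things stand out from the numbers alone: the scratch buffer is exactly 130 MiB, and the per-allocation limit reported by the driver is exactly one quarter of the global memory size.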
 

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768263734
RAC: 7090329

You don't make the change in task concurrency in the coproc_info.xml file. That file is autogenerated by the client when it detects the system's GPUs. It is not meant to be tampered with by the user.

You make the change to crunch multiple tasks concurrently on the GPU either in the project's Computing Preferences settings or in an app_config.xml file, which needs to be written by the user.

So change either here:

Project Preferences >> GPU utilization factor of FGRP apps: 1.00 >> 0.50

or here:

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.9</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
 

The app_config.xml file goes into the project directory >> einstein.phys.uwm.edu

Choose one or the other method.  Not both.  Project Preferences is the easiest.
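
If you would rather generate the file than type it by hand, something like the following Python sketch produces the same structure as the app_config.xml shown above. The function names (`build_app_config`, `tasks_per_gpu`) are hypothetical helpers, not part of BOINC, and the task-count calculation assumes the usual reading of `gpu_usage` as the fraction of one GPU reserved per task.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage, cpu_usage):
    """Build app_config.xml content for a single app and return it as a string."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


def tasks_per_gpu(gpu_usage):
    """Assumed interpretation: each task reserves gpu_usage of one GPU."""
    # Small epsilon guards against floating-point rounding just below an integer.
    return int(1 / gpu_usage + 1e-9)


# Values taken from the example in the post above.
xml_text = build_app_config("hsgamma_FGRPB1G", 0.5, 0.9)
print(xml_text)
print("tasks per GPU:", tasks_per_gpu(0.5))
```

The output is written on one line rather than indented, which is still valid XML. Save it as app_config.xml in the project directory mentioned above; the client has to re-read its config files or be restarted before the change takes effect.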

 

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47194212642
RAC: 65398519

Keith, that coproc section he posted is actually from the app_info file, not the coproc_info file.

but yeah, use an app_config to do it.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3965
Credit: 47194212642
RAC: 65398519

gordonbb wrote:

Thanks @petri33 & @Ian&Steve C. for making this available. I'm fairly new at BOINC so I don't quite get the nuances of configuring the Anonymous Platform.

Though I tried to set:

    <coproc>
      <type>NVIDIA</type>
      <count>0.5</count>
    </coproc>

in app_info.xml on systems with 8GB VRAM, once I reloaded boinc-client it would fail the second Task with a Computational Error:

[stderr log snipped; identical to the one in gordonbb's post above]

Still, running just 1 Task/GPU I'm seeing a 45% decrease in time compared to the stock application (EVGA 1070ti @ 90W; Ubuntu 22.04 LTS; NVIDIA 510.73.05) and similar gains on a 2060, 2060 Super and a 1660ti.
 

this is the same problem that's popped up for a few folks (mostly Keith) with Ryzen systems.

if you keep getting a lot of errors, you could consider running 2x v0.95 app tasks, which might be faster than 1x v1.0 task.

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768263734
RAC: 7090329

Thanks for the corrections, Ian.  I breezed over the post too fast to read what was really going on.

Yes, with the clFlush errors, you need to drop back to the v0.95 version. That stops the errors on my Ryzen hosts.

Still faster than the stock 1.28 application.

 

gordonbb
Joined: 14 May 19
Posts: 26
Credit: 895570568
RAC: 0

Ian&Steve C. wrote:

this is the same problem that's popped up for a few folks (mostly Keith) with Ryzen systems.

if you keep getting a lot of errors, you could consider running 2x v0.95 app tasks, which might be faster than 1x v1.0 task.

Thank you @Ian&Steve C. & @Keith Myers.

I'll revert to the 0.95 version (yes, these are Ryzen systems) and put the app_config.xml file that I removed back and give it a try.

gordonbb
Joined: 14 May 19
Posts: 26
Credit: 895570568
RAC: 0

Curious.

In my specific use case, running my GPUs at their lowest Power-Limit (Pascal) or at a reduced graphics clock (Turing), the 1.0 version significantly outperforms the 0.95 version, to the point that 1 Task/GPU on the 1.0 version beats 2 Tasks/GPU on the 0.95 version.

For my 1070Ti, for example, the 1.0 version is 45% faster than the native application, but the 0.95 version is only 24.7% faster with 1 Task/GPU and 22.2% faster with 2 Tasks/GPU.
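
To put those percentages on a common scale, here is a rough Python sketch that turns them into throughput relative to the stock app. It assumes that "X% faster" means an X% reduction in effective time per task (consistent with the "45% decrease in time" wording earlier in the thread) and that the 2 Tasks/GPU figure is already expressed per task. The function name `relative_throughput` is made up for illustration.

```python
def relative_throughput(time_reduction_pct):
    """Tasks finished per unit time relative to the stock app (stock = 1.0).

    Assumes the percentage is a reduction in effective time per task.
    """
    return 1.0 / (1.0 - time_reduction_pct / 100.0)


# Percentages quoted in the post above (GTX 1070 Ti).
cases = {
    "v1.0, 1 task/GPU": 45.0,
    "v0.95, 1 task/GPU": 24.7,
    "v0.95, 2 tasks/GPU": 22.2,
}

for label, pct in cases.items():
    print(f"{label}: {relative_throughput(pct):.2f}x stock throughput")
```

Under these assumptions, a 45% time reduction works out to roughly 1.8x the stock throughput, while both v0.95 configurations land around 1.3x.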

Keith Myers
Joined: 11 Feb 11
Posts: 4969
Credit: 18768263734
RAC: 7090329

If I understand correctly, you are able to run the v1.0 application with just a single task per GPU and it doesn't error out?

If so, that is a new datapoint for troubleshooting the application on Ryzen systems and 8GB cards.

 
