Einstein FGRPB1G Linux/Nvidia Special app "AIO"

Tom M
Joined: 2 Feb 06
Posts: 5657
Credit: 7737295473
RAC: 2495904

The BRP7 (Meerkat) app got the Petri "treatment". And if the latest multi-directional GPU app becomes the mainstay, as BRP7 looked like it was going to, I expect it will get optimized for Nvidia too.

Tom M

A Proud member of the O.F.A.  (Old Farts Association).  Be well, do good work, and keep in touch.® (Garrison Keillor)

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5848
Credit: 110007149324
RAC: 24631069

In testing a refurbished machine of mine (stock app - AMD HD7850 GPU), I noticed an early invalid task listed in this workunit.  The quorum contains 4 tasks in total, 3 for nvidia GPUs and my sole AMD.  Mine was the first resend (_2 task) and the quorum was eventually completed using the second one (_3).

I noticed that the _3 task used the anonymous platform so that implied the AIO app.  The host shows as having an Intel 06/55 CPU (whatever that is) and 112 processors - presumably to override daily task limit issues.  The GPU is an RTX 4090 showing only 4095MB VRAM - seems too low, perhaps misreported by BOINC?

Out of interest, I had a quick look at the full tasks list for that machine and saw that (at the time I looked) there were 1466 pendings, 5009 valids, 588 invalids along with 3 errors.  The high invalids ratio (~1 for every 8.5 valid) quite surprised me.

I'm wondering if other users of the AIO app also get high invalid rates or if the above example is some sort of other issue?  I don't have any nvidia GPUs so I haven't been paying much attention to AIO app results.  My hosts using GCN generation GPUs tend to get about 1 invalid for something like 100 valid.  My host under test has the above single invalid along with 86 valid and 49 pending and no errors, so it looks OK at the moment.

Cheers,
Gary.

GWGeorge007
Joined: 8 Jan 18
Posts: 2822
Credit: 4628455529
RAC: 3625461

Gary Roberts wrote:

...snip...  The GPU is an RTX 4090 showing only 4095MB VRAM - seems too low, perhaps misreported by BOINC?

Hi Gary,

This I can answer because I had the same issue with my NVIDIA GPUs as Anonymous does with his RTX 4090.

The reported 4095MB VRAM is the default value BOINC shows for NVIDIA GPUs. With a "special sauce" app from our own Petri (a GPUUG member), he would then be shown the full amount of VRAM.

As for the remainder of your post, I'll leave that for someone more knowledgeable than me. I wouldn't want to steer you astray.

George

Proud member of the Old Farts Association

Keith Myers
Joined: 11 Feb 11
Posts: 4753
Credit: 17697118508
RAC: 5561384

If a user upgrades BOINC to the 7.20 branch, they get the fix from the commit "Client: use cuDeviceTotalMem_v2() if available to get >4GB mem size for NVIDIA GPUs" (#4757).

BOINC commits for branch 7.20

Fix for not reporting more than 4GB on Nvidia cards in Linux committed on June 5, 2022.
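If you want to check whether a given client should already report the full VRAM, a rough sketch is to compare its version against 7.20. This assumes, based on the post above, that every 7.20.x and later release includes the fix. The helper names are hypothetical and not part of BOINC. The version string is the one BOINC prints in `<core_client_version>`.

```python
from typing import Tuple

# Assumption (from the post above): the >4GB NVIDIA VRAM fix is in 7.20.0 and later.
FIX_VERSION = (7, 20, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '7.22.0' into a tuple like (7, 22, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def reports_full_nvidia_vram(client_version: str) -> bool:
    """True if the client is on the 7.20 branch or newer (tuple comparison)."""
    return parse_version(client_version) >= FIX_VERSION


if __name__ == "__main__":
    # 7.22.0 is the client version that appears in stderr output later in this thread.
    print(reports_full_nvidia_vram("7.22.0"))  # True
    print(reports_full_nvidia_vram("7.16.20"))  # False
```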

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660199759
RAC: 29957967

Gary Roberts wrote:

In testing a refurbished machine of mine (stock app - AMD HD7850 GPU), I noticed an early invalid task listed in this workunit.  The quorum contains 4 tasks in total, 3 for nvidia GPUs and my sole AMD.  Mine was the first resend (_2 task) and the quorum was eventually completed using the second one (_3).

I noticed that the _3 task used the anonymous platform so that implied the AIO app.  The host shows as having an Intel 06/55 CPU (whatever that is) and 112 processors - presumably to override daily task limit issues.  The GPU is an RTX 4090 showing only 4095MB VRAM - seems too low, perhaps misreported by BOINC?

Out of interest, I had a quick look at the full tasks list for that machine and saw that (at the time I looked) there were 1466 pendings, 5009 valids, 588 invalids along with 3 errors.  The high invalids ratio (~1 for every 8.5 valid) quite surprised me.

I'm wondering if other users of the AIO app also get high invalid rates or if the above example is some sort of other issue?  I don't have any nvidia GPUs so I haven't been paying much attention to AIO app results.  My hosts using GCN generation GPUs tend to get about 1 invalid for something like 100 valid.  My host under test has the above single invalid along with 86 valid and 49 pending and no errors, so it looks OK at the moment.

The CPU is most likely an engineering sample, and the 112 threads are undoubtedly real. It's either a single-socket 56-core part or two 28-core parts, with two threads per core in both cases. Either is just as likely, but it has nothing to do with getting more work.

BOINC reporting 4GB on Nvidia is normal for older versions of BOINC (before 7.20, as Keith pointed out). It's a BOINC bug that was fixed only recently: BOINC used an Nvidia API call that returns a 32-bit value, which can only represent up to 4GB. This never affected AMD.
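The arithmetic shows why the figure is exactly 4095MB rather than 4096MB. The largest 32-bit unsigned value is 2^32 - 1 bytes, which is one byte short of 4 GiB. That rounds down to 4095 MiB. The sketch below assumes an oversized value is clamped to that maximum rather than wrapped around, which matches the 4095MB seen here. It also treats the RTX 4090 as exactly 24 GiB for illustration.

```python
UINT32_MAX = 2**32 - 1   # largest value a 32-bit unsigned field can hold
MIB = 1024 * 1024        # bytes per MiB (what BOINC displays as "MB")


def reported_mib(actual_bytes: int) -> int:
    """VRAM in MiB as seen through a 32-bit field.

    Assumption: values above UINT32_MAX are clamped, not wrapped.
    """
    return min(actual_bytes, UINT32_MAX) // MIB


if __name__ == "__main__":
    rtx_4090_bytes = 24 * 1024**3          # nominal 24 GiB card (assumed exact size)
    print(reported_mib(rtx_4090_bytes))    # 4095
    print(rtx_4090_bytes // MIB)           # 24576, the real size in MiB
```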

That user's invalid rate is unusually high. My systems see about 4% invalid, which is what most people see and is also comparable to the stock v1.28 Nvidia app. I'm not sure whether his invalid rate is somehow due to his 4090 or to something else wrong with the system. FYI, I'm 99% sure that system, along with the other 4090 system on the leaderboard, belongs to user Trotador.
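To compare Gary's counts with the ~4% figure, here is a small sketch. It counts only validated tasks (valid plus invalid) and leaves pending and error tasks out. That choice is an assumption of this sketch, not necessarily how Einstein@Home computes anything.

```python
from typing import Tuple


def invalid_stats(valid: int, invalid: int) -> Tuple[float, float]:
    """Return (valid tasks per invalid task, invalid share of validated tasks in %)."""
    per_invalid = valid / invalid
    invalid_pct = 100 * invalid / (valid + invalid)
    return per_invalid, invalid_pct


if __name__ == "__main__":
    ratio, pct = invalid_stats(valid=5009, invalid=588)
    print(f"1 invalid per {ratio:.1f} valid, {pct:.1f}% invalid")
    # -> 1 invalid per 8.5 valid, 10.5% invalid
```

By this measure the host is at roughly 10.5% invalid, which is more than twice the ~4% typical of other hosts.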

_________________________________________________________________________

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8432075445
RAC: 194271

I am stuck. I followed the steps outlined in the first post exactly and restarted BOINC, and these tasks immediately had a computation error.

https://einsteinathome.org/host/13125265/tasks/6/0

Any ideas? I tried using the 1.0 version.

I think they are all the same error:

Stderr output

<core_client_version>7.22.0</core_client_version>
<![CDATA[
<message>
process exited with code 13 (0xd, -243)</message>
<stderr_txt>
Process creation (../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0) failed: Error -1, errno=13
execv: Permission denied

</stderr_txt>
]]>
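The key detail in that output is `errno=13`, which on Linux is `EACCES` ("Permission denied"). The `execv: Permission denied` line reports the same thing. The hypothetical helper below is not part of BOINC. It pulls the errno out of such a line and translates it using the standard `errno` and `os` modules, so other codes can be looked up the same way.

```python
import errno
import os
import re
from typing import Optional, Tuple

# Matches the "errno=N" fragment in a BOINC "Process creation ... failed" line.
ERRNO_PATTERN = re.compile(r"errno=(\d+)")


def explain_errno(stderr_line: str) -> Optional[Tuple[int, str, str]]:
    """Return (code, symbolic name, system message), or None if no errno is present."""
    match = ERRNO_PATTERN.search(stderr_line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, errno.errorcode.get(code, "unknown"), os.strerror(code)


if __name__ == "__main__":
    line = ("Process creation (../../projects/einstein.phys.uwm.edu/"
            "HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0) failed: Error -1, errno=13")
    print(explain_errno(line))  # on Linux: (13, 'EACCES', 'Permission denied')
```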



Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660199759
RAC: 29957967

You don't have the proper executable permission set on the application binary. This can happen if you've moved the file across a network drive, with a USB drive, or something similar. My package already has the execute permission set; it only gets wiped if you move the file across drives after extracting the archive.

Right-click the file, click Properties, and go to the Permissions tab. There should be a checkbox for something like "allow running file as a program".

If that's not there, you can use the command line. Make sure you are in the same directory as the file when running this command:

chmod +x HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0
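For anyone scripting this, a rough Python equivalent of `chmod +x` is shown below. It adds the execute bit for owner, group and others. One difference to note: `chmod +x` leaves alone any bits that are set in your umask, while this sketch ignores the umask. The default file name is taken from this thread, and the script assumes you run it from the project directory.

```python
import os
import stat
import sys

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # u+x, g+x, o+x


def is_executable(path: str) -> bool:
    """True if the owner execute bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def make_executable(path: str) -> None:
    """Add execute permission for everyone, keeping the existing permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | EXEC_BITS)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0"
    if not is_executable(target):
        make_executable(target)
    print(target, "executable:", is_executable(target))
```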

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8432075445
RAC: 194271

Ian&Steve C. wrote:

You don't have the proper executable permission set on the application binary. This can happen if you've moved the file across a network drive, with a USB drive, or something similar. My package already has the execute permission set; it only gets wiped if you move the file across drives after extracting the archive.

Right-click the file, click Properties, and go to the Permissions tab. There should be a checkbox for something like "allow running file as a program".

If that's not there, you can use the command line. Make sure you are in the same directory as the file when running this command:

chmod +x HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0

Just to make sure (since I had downloaded the file on a separate PC and moved it to the new one with a flash drive), I downloaded the file directly to the PC where I want to run the app. I then deleted the "old" files and replaced them with these.

The checkbox is there- I made sure it was "checked". The owner (me), group, and others have read and write access.

Once I did this, I received the following error:

<core_client_version>7.22.0</core_client_version>
<![CDATA[
<message>
couldn't start app: Input file hsgamma_FGRPB1G_1.28_x86_64-pc-linux-gnu__FGRPopencl2Pup-nvidia missing or invalid: file missing</message>
]]>

Isn't this the old executable that I replaced with the new file (which I did do)?

Ian&Steve C.
Joined: 19 Jan 20
Posts: 3713
Credit: 34660199759
RAC: 29957967

Please post the contents of your app_info.xml file. Is it still there? Did you restart BOINC?

Also make sure the file names are correct. The special app's name should start with a capital 'HS'.
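Linux file names are case-sensitive, so `hsgamma...` and `HSgamma...` are different files. The hypothetical check below looks for an exact match in a given directory and also reports near misses that differ only in capitalization. The directory to pass is your BOINC project directory. Its location depends on your install, so it is not hard-coded here.

```python
import os
import sys


def check_file_name(project_dir: str, expected: str) -> str:
    """Return 'ok', a case-mismatch report, or 'missing' for the expected file name."""
    names = os.listdir(project_dir)
    if expected in names:
        return "ok"
    near = [n for n in names if n.lower() == expected.lower()]
    if near:
        return "case mismatch: found " + ", ".join(sorted(near))
    return "missing"


if __name__ == "__main__":
    # Usage: python check_name.py <project_dir> <file_name>
    print(check_file_name(sys.argv[1], sys.argv[2]))
```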

Boca Raton Community HS
Joined: 4 Nov 15
Posts: 216
Credit: 8432075445
RAC: 194271

Ian&Steve C. wrote:

Please post the contents of your app_info.xml file. Is it still there? Did you restart BOINC?

I did restart it; I made sure I only made changes while BOINC was closed. Also, I am now in time out, so it will be hard to know if it works (the work units error out immediately, so I try to suspend them as fast as possible, but I hit my quota).

<app_info>
  <file>
    <name>HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0</name>
    <executable/>
  </file>
 
  <app>
    <name>hsgamma_FGRPB1G</name>
    <user_friendly_name>Gamma-ray pulsar binary search #1 on GPUs</user_friendly_name>
    <non_cpu_intensive>0</non_cpu_intensive>
  </app>
 
  <app_version>
    <app_name>hsgamma_FGRPB1G</app_name>
    <version_num>128</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1.00</avg_ncpus>
    <plan_class>FGRPopencl2Pup-nvidia</plan_class>
    <api_version>7.17.0</api_version>
    <file_ref>
      <file_name>HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0</file_name>
      <main_program/>
    </file_ref>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
  </app_version>
 
</app_info>
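A file like the one above can be sanity-checked with a short script. The hypothetical checker below uses Python's standard `xml.etree.ElementTree` and looks for three common anonymous-platform mistakes. First, each `<file_ref>` in an `<app_version>` should name a file declared in a `<file>` block. Second, each `<app_name>` should match an `<app><name>`. Third, each `<app_version>` should mark a `<main_program/>`. It does not check that the files exist on disk, and it is not an official BOINC validator.

```python
import sys
import xml.etree.ElementTree as ET
from typing import List


def check_app_info(xml_text: str) -> List[str]:
    """Return a list of consistency problems found in app_info.xml text."""
    root = ET.fromstring(xml_text)
    declared_files = {f.findtext("name", "").strip() for f in root.findall("file")}
    app_names = {a.findtext("name", "").strip() for a in root.findall("app")}
    problems = []
    for av in root.findall("app_version"):
        app_name = av.findtext("app_name", "").strip()
        if app_name not in app_names:
            problems.append(f"app_version refers to unknown app '{app_name}'")
        refs = av.findall("file_ref")
        if not any(ref.find("main_program") is not None for ref in refs):
            problems.append(f"app_version for '{app_name}' has no <main_program/>")
        for ref in refs:
            name = ref.findtext("file_name", "").strip()
            if name not in declared_files:
                problems.append(f"file_ref '{name}' is not declared in a <file> block")
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "app_info.xml"
    with open(path, encoding="utf-8") as fh:
        found = check_app_info(fh.read())
    print("\n".join(found) if found else "no problems found")
```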
