Einstein FGRPB1G Linux/Nvidia Special app "AIO"

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 240

Credit: 10556415586

RAC: 25133997

13 Mar 2023 13:41:01 UTC

Message 209666 in response to message 209664

Ian&Steve C. wrote:

you can give it a shot. but it also will likely slow down the computation. you'd have to test if the decrease in invalids (if at all) offsets the slower crunch times.

Giving it a try. For testing purposes, I will enable ECC on one of the "twin" 4090 systems, and then leave the other with ECC off. We should be able to see a nice comparison of time difference(s) and error rate(s).
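
The trade-off described in the quote (fewer invalids versus slower crunch times) can be framed as valid results per day. The following is a minimal sketch; the function names are made up for illustration, and the runtimes and invalid rates in the example are hypothetical placeholders, not measurements from either 4090 system.

```python
SECONDS_PER_DAY = 86_400


def valid_tasks_per_day(runtime_s, concurrent, invalid_rate):
    """Valid results per day for one GPU.

    runtime_s    -- wall-clock seconds per task
    concurrent   -- tasks running on the GPU at the same time
    invalid_rate -- fraction of results marked invalid (0.0 to 1.0)
    """
    tasks_per_day = concurrent * SECONDS_PER_DAY / runtime_s
    return tasks_per_day * (1.0 - invalid_rate)


def break_even_slowdown(invalid_before, invalid_after):
    """Largest runtime multiplier at which the lower invalid rate still pays off."""
    return (1.0 - invalid_after) / (1.0 - invalid_before)


# Hypothetical example values, NOT measurements from this thread.
ecc_off = valid_tasks_per_day(runtime_s=100.0, concurrent=3, invalid_rate=0.10)
ecc_on = valid_tasks_per_day(runtime_s=104.0, concurrent=3, invalid_rate=0.05)
print(f"ECC off: {ecc_off:.0f} valid/day, ECC on: {ecc_on:.0f} valid/day")
print(f"break-even slowdown: {break_even_slowdown(0.10, 0.05):.3f}x")
```

With those assumed rates (10% invalid without ECC, 5% with), ECC would only be a net win if it slowed tasks down by less than roughly 5.6%.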

DF1DX

Joined: 14 Aug 10

Posts: 105

Credit: 3852226854

RAC: 4875122

14 Mar 2023 11:19:14 UTC

Message 209710

With my 4090 I am currently testing a reduced GPU clock of 1900 MHz and the memory clock at -500 MHz. A power limit of 200 W results in a GPU utilization of 90%.

The error rate drops from almost 20% to about 10%. Still a bit too high...

One WU takes ~80 s.

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 240

Credit: 10556415586

RAC: 25133997

14 Mar 2023 12:06:29 UTC

Message 209713 in response to message 209710

That is really interesting. It was 20% when running at 100%?

Tom M

Joined: 2 Feb 06

Posts: 6439

Credit: 9568827128

RAC: 8561093

14 Mar 2023 17:33:26 UTC

Message 209723

I am running a two GPU system here. I keep noticing that, after a period of time, the two tasks on the 2nd GPU listed (aka 01) end up at the same percentage complete.

My understanding is that for best production the tasks on a GPU should be staggered. Is there anything I can tinker with to stop the two tasks on that one GPU from converging?

Tom M

A Proud member of the O.F.A. (Old Farts Association). Be well, do good work, and keep in touch.® (Garrison Keillor) I want some more patience. RIGHT NOW!

Keith Myers

Joined: 11 Feb 11

Posts: 4964

Credit: 18716802384

RAC: 6385937

14 Mar 2023 17:54:29 UTC

Message 209725

Stagger the tasks by pausing one for a minute before resuming.

Run other projects concurrently so that they share the GPUs.
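
The pause-and-resume step can also be done from a script instead of by hand in the BOINC Manager. This is a rough sketch that wraps `boinccmd --task URL task_name suspend|resume`; the helper names are made up for illustration, and the project URL and task name in the usage comment are assumptions that need to be replaced with the values from your own host.

```python
import subprocess
import time


def task_command(project_url, task_name, op):
    """Build the boinccmd argument list for suspending or resuming one task."""
    if op not in ("suspend", "resume"):
        raise ValueError(f"unsupported operation: {op}")
    return ["boinccmd", "--task", project_url, task_name, op]


def stagger(project_url, task_name, pause_s=60,
            run=subprocess.run, sleep=time.sleep):
    """Suspend one task, wait pause_s seconds, then resume it.

    run and sleep are injectable so the function can be dry-run
    without a BOINC client present.
    """
    run(task_command(project_url, task_name, "suspend"), check=True)
    sleep(pause_s)
    run(task_command(project_url, task_name, "resume"), check=True)


# Example call (URL and task name are placeholders, not from this thread):
# stagger("https://einstein.phys.uwm.edu/", "SOME_TASK_NAME_0", pause_s=60)
```

Nothing here prevents the tasks from drifting back together later, so the script would likely need to be rerun whenever the percentages converge again.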

DF1DX

Joined: 14 Aug 10

Posts: 105

Credit: 3852226854

RAC: 4875122

14 Mar 2023 18:01:17 UTC

Message 209726 in response to message 209713

Boca Raton Community HS wrote:

That is really interesting. It was 20% when running at 100%?

Yes, up to 20%. There are fewer errors on my host when only one WU is running, currently about 10%.

The parameters of the AIO application in the file EAH_SLEEP are IMHO mainly used in the last calculation phase, the result sorting.

I have not had a single "error while computing", only "marked as invalid".

Linux Mint 21.1 Xfce, Driver 525.85.05, AMD 3700X, X570 Aorus Ultra.

Very strange. I have no problems at all with this card on Primegrid, Folding@home and Asteroids.

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 240

Credit: 10556415586

RAC: 25133997

14 Mar 2023 19:03:07 UTC

Message 209729 in response to message 209726

How many are you running concurrently, just out of curiosity?

I have about a 10% invalid rate on our two 4090 systems. They are running three concurrently, and at 100%. I get all "errors" when I adjust ANY of the speeds (slow down or speed up), really no matter what.

Threadripper 2970WX, driver: 525.85, Linux Mint 21.1

What model of 4090 is it?

Ian&Steve C.

Joined: 19 Jan 20

Posts: 3945

Credit: 46766702642

RAC: 64040800

14 Mar 2023 19:16:48 UTC

Message 209730

While the EAH_SLEEP file has some kernel tuning parameters, there are additional optimizations in the .alt FFT files, as well as optimizations baked into the source code itself and even in the compilation arguments used when the app is built.

You can try running the app without the .alt files (just move them somewhere else) to see if those impact invalid rates, but the app will run slower as a result.

You could even run the stock gamma ray app (remove your app_info.xml file). Again, this will run much slower, but you could at least check the invalid ratio. It’s possible that something in the default code from Einstein doesn’t play well with the 40-series hardware.

I just wanted to stress that Petri did all development on his personal system and only had access to Volta/Turing/Ampere cards to check behavior and performance. He stopped development of this app before the 40-series was released. FGRPB1G’s days are numbered, and Petri doesn’t seem interested in revisiting an app with limited life. Enjoy gamma ray while it lasts and move to BRP7 when it’s gone.
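
For the first experiment (running without the .alt files), moving the files aside could be scripted so that it is easy to undo. This is a sketch only: it assumes the .alt files sit directly in the Einstein project directory, the helper name and backup location are made up for illustration, and it covers nothing but the file move itself.

```python
import shutil
from pathlib import Path


def move_alt_files(project_dir, backup_dir):
    """Move every *.alt file out of project_dir; return the moved file names."""
    project_dir = Path(project_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for path in sorted(project_dir.glob("*.alt")):
        shutil.move(str(path), str(backup_dir / path.name))
        moved.append(path.name)
    return moved


# Example call (both paths are assumptions; adjust to your BOINC data directory):
# move_alt_files("/var/lib/boinc-client/projects/einstein.phys.uwm.edu",
#                "/var/lib/boinc-client/alt_backup")
```

Calling the same function with the two directories swapped moves the files back once the comparison is done.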

Boca Raton Comm...

Joined: 4 Nov 15

Posts: 240

Credit: 10556415586

RAC: 25133997

14 Mar 2023 19:22:05 UTC

Message 209732 in response to message 209730

For sure. I am incredibly impressed by the app and what Petri did; it really is amazing! I am just in the mindset of constant improvement and like to tinker to see the impact.

stiwi

Joined: 16 Jun 12

Posts: 3

Credit: 30631068

RAC: 0

13 May 2023 2:36:12 UTC

Message 212462

Strange: a few months ago everything worked fine, but now all tasks fail immediately.

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 11 (0xb, -245)</message>
<stderr_txt>
03:32:51 (3888): [normal]: This Einstein@home App (v1.0 by petri33) was built at: Apr 28 2022 18:47:15
03:32:51 (3888): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0'.
03:32:51 (3888): [debug]: 1e+16 fp, 7.2e+09 fp/s, 1452987 s, 403h36m27s13
03:32:51 (3888): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: ../../projects/einstein.phys.uwm.edu/HSgammaPulsar_x86_64-pc-linux-gnu-opencl_v1.0 --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L07.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 892.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L07_0900_9781653.dat --debug 0 -o LATeah4021L07_900.0_0_0.0_9781653_1_0.out
output files: 'LATeah4021L07_900.0_0_0.0_9781653_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L07_900.0_0_0.0_9781653_1_0' 'LATeah4021L07_900.0_0_0.0_9781653_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L07_900.0_0_0.0_9781653_1_1'
03:32:51 (3888): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
03:32:51 (3888): [debug]: glibc version/release: 2.37/stable
03:32:51 (3888): [debug]: Set up communication with graphics process.
Eah sleep false, -1
boinc_get_opencl_ids returned [0x559abcc7ce20 , 0x559abcc7e6e0]
Using OpenCL platform provided by: NVIDIA Corporation
Using OpenCL device "NVIDIA GeForce RTX 2080 Ti" by: NVIDIA Corporation
Max allocation limit: 2884550656
Global mem size: 11538202624
OpenCL device has FP64 support
20 warnings generated.
SemiCoh mode 0 start
skypoints(1)read_checkpoint(): Couldn't open file 'LATeah4021L07_900.0_0_0.0_9781653_1_0.out.cpt': No such file or directory (2)
skypoint loop(1)
S0:dpleph[initephem]: Cannot open file .405, result = 104
dpleph[state]: Time 2454683.289515 outside range of ephemeris
dpleph[state]: Time 2454683.289515 outside range of ephemeris

-- signal handler called: signal 1
9 stack frames obtained for this thread:

End of stcaktrace
03:32:52 (3888): called boinc_finish(11)

</stderr_txt>
]]>

Does anyone have an idea what's wrong? The standard Einstein app works fine.
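
The log lines "Cannot open file .405" and "outside range of ephemeris" hint that a file under `--ephemdir` could not be opened, though that is only a reading of the log and not a confirmed diagnosis. One cheap check is whether the paths named on the command line actually exist relative to the slot directory. The sketch below does that; the helper names are made up for illustration, and the slot directory in the usage comment is an assumption.

```python
import os
import shlex

# Flags from the logged command line whose values are file or directory paths.
PATH_FLAGS = ("--inputfile", "--ephemdir", "--BinaryPointFile")


def referenced_paths(command_line):
    """Return {flag: path} for the path-valued flags found in command_line."""
    tokens = shlex.split(command_line)
    found = {}
    for flag, value in zip(tokens, tokens[1:]):
        if flag in PATH_FLAGS:
            found[flag] = value
    return found


def missing_paths(command_line, slot_dir):
    """Return the {flag: path} entries that do not exist relative to slot_dir."""
    return {
        flag: path
        for flag, path in referenced_paths(command_line).items()
        if not os.path.exists(os.path.join(slot_dir, path))
    }


# Example call (slot directory is an assumption; paste the "command line:" text
# from the task's stderr as the first argument):
# print(missing_paths(logged_command_line, "/var/lib/boinc-client/slots/0"))
```

An empty result only shows that the paths exist; it says nothing about whether the ephemeris files inside the directory are complete or readable by the app.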
