CUDA App einsteinbinary 1.10 for Linux available for Beta Test

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686127184
RAC: 579488

Well, actually I got two error results in a row after weeks of flawless operation. I rebooted, and the next result was OK again. Go figure. This could well be a hardware or driver issue; from what I heard from the developers, it doesn't look like an application bug so far.

CU
Bikeman

Ed1934158
Joined: 10 Nov 04
Posts: 62
Credit: 14481483
RAC: 0

Message 94525 in response to message 94524

Quote:

Well, actually I got two error results in a row after weeks of flawless operation. I rebooted, and the next result was OK again. Go figure. This could well be a hardware or driver issue; from what I heard from the developers, it doesn't look like an application bug so far.

CU
Bikeman


I'm not sure what to think. I returned to GPUgrid with the 185.18.36 drivers, and there everything is working fine. I guess I'll try again when I finish the units that I have.

Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@Home. For example, when running GPUgrid (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
  PID USER  PR NI  VIRT RES  SHR S %CPU %MEM     TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R   97  0.9  36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R   97  0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R   87  0.9  70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R   52  0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m  23m S   47  0.7  51:00.66 acemd_6.66_x86_ (gpugrid unit)

And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@Home, I can see four processes at 100% and the GPU stays at idle temperature (or the difference is within normal oscillations), as if the CPU were doing all the work (and, of course, in the end I get the error that I was talking about).
Do you see a notable increase in GPU temperature?
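
For anyone who wants to check this without eyeballing the list, here is a minimal sketch that sums %CPU per command from top-style output. It assumes the 12-column layout pasted above (%CPU in the 9th field, COMMAND in the 12th); the function names `parse_top` and `cpu_by_command` are made up for illustration.

```python
# Process list copied from the post above (the "(gpugrid unit)" note is dropped).
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R 97 0.9 36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R 97 0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R 87 0.9 70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R 52 0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m 23m S 47 0.7 51:00.66 acemd_6.66_x86_
"""


def parse_top(text):
    """Return a list of (command, cpu_percent) tuples from top-style output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 12-column process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append((fields[11], float(fields[8])))
    return rows


def cpu_by_command(rows):
    """Sum %CPU per command name."""
    totals = {}
    for command, cpu in rows:
        totals[command] = totals.get(command, 0.0) + cpu
    return totals


totals = cpu_by_command(parse_top(SAMPLE))
for command, cpu in sorted(totals.items()):
    print(f"{command}: {cpu:.0f}% CPU")
```

A GPU app normally needs only part of one core to feed the card, so four processes pinned near 100% would fit the suspicion that the CPU is doing all the work.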

Bikeman (Heinz-Bernd Eggenstein)

Message 94526 in response to message 94525

Quote:


Could you please tell me what your processes look like? I have a feeling that nothing is happening while I run Einstein@Home. For example, when running GPUgrid (GPU units) and Einstein (non-GPU units), the processes on my 4-processor machine look something like this:
  PID USER  PR NI  VIRT RES  SHR S %CPU %MEM     TIME+ COMMAND
17988 boinc 39 19 77908 73m 1484 R   97  0.9  36:59.72 einstein_S5R5_1
29305 boinc 39 19 76628 72m 1484 R   97  0.9 351:29.07 einstein_S5R5_1
16354 boinc 39 19 77900 73m 1484 R   87  0.9  70:39.78 einstein_S5R5_1
30222 boinc 39 19 76624 72m 1484 R   52  0.9 332:55.69 einstein_S5R5_1
14624 boinc 30 10 97516 55m  23m S   47  0.7  51:00.66 acemd_6.66_x86_ (gpugrid unit)

And the GPU temperature rises from about 50°C to about 66-68°C. While running Einstein@Home, I can see four processes at 100% and the GPU stays at idle temperature (or the difference is within normal oscillations), as if the CPU were doing all the work (and, of course, in the end I get the error that I was talking about).
Do you see a notable increase in GPU temperature?

There are two different kinds of Einstein@Home science apps: "S5R5" (search for gravitational waves in LIGO data) and "ABP1" (search for binary pulsars in Arecibo radio astronomy data). Only the ABP1 Beta test app (that's what this thread is about) will use the GPU; the S5R5 app is CPU only. Whether you get jobs for S5R5 or ABP1 is more or less random.

The ABP1 search will probably show up in your top output as "einsteinbinary_". If you see this one, you should notice a modest rise in GPU temperature, probably not as high as that of the GPUgrid app, though.
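
To make the distinction concrete, here is a small sketch that tells the two searches apart by process name. The prefixes are taken from this thread only ("einstein_S5R5" from the process list above, "einsteinbinary_" as the probable ABP1 name); `classify_app` is a hypothetical helper, not part of BOINC or the science apps.

```python
def classify_app(command):
    """Guess the Einstein@Home search from a process name as shown by top."""
    if command.startswith("einsteinbinary_"):
        return "ABP1 (binary pulsar search, GPU-capable Beta app)"
    if command.startswith("einstein_S5R5"):
        return "S5R5 (gravitational-wave search, CPU only)"
    return "not an Einstein@Home app"


for name in ("einstein_S5R5_1", "einsteinbinary_", "acemd_6.66_x86_"):
    print(f"{name}: {classify_app(name)}")
```

If none of the running processes falls into the first category, no rise in GPU temperature is to be expected from Einstein@Home.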

CU
Bikeman

ralph
Joined: 11 Dec 08
Posts: 1
Credit: 65651
RAC: 0

Message 94527 in response to message 94526

GPUgrid experienced a series of errors when the NVIDIA Linux 185+ drivers came out. They found a workaround that solved the problem. It looks like the errors occurring here are similar in nature: people with 180 drivers can process the WUs, but people with 185 or 190 drivers cannot.
The programmers may want to contact the GPUgrid people to see how they fixed their issue with the 185+ Linux NVIDIA drivers.
I was able to process WUs with the 1.09 version of the application, but the new 1.10 version goes to 100% immediately and stays stuck there. This is identical to the type of error that I used to experience with GPUgrid when the new NVIDIA drivers were released.
Good luck in sorting out the problem.
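
As a rough illustration of the pattern described above, the sketch below flags driver versions from the 185 series onward. The threshold comes only from the reports in this thread, not from a confirmed root cause, and both function names are made up for illustration.

```python
def parse_driver_version(text):
    """Turn a driver string such as '185.18.36' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def may_be_affected(driver, threshold=(185,)):
    """True if the driver is at or above the series reported as problematic.

    The default threshold (185) is an assumption based on forum reports.
    """
    return parse_driver_version(driver) >= threshold


for version in ("180", "185.18.36", "190.36"):
    print(version, "->", "possibly affected" if may_be_affected(version) else "reported OK")
```

Comparing tuples of integers avoids the trap of comparing version strings character by character.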

Stephan Goll
Joined: 13 Dec 05
Posts: 25
Credit: 27834196
RAC: 0

Dear Bernd,

I tried CUDA ... but I got only limited success. Only CUDA 2.3 is detected on my computer; older NVIDIA drivers will load, but the CUDA toolkit will either not compile (2.1) or simply not work (2.2).

It's this little box:
http://einsteinathome.org/host/2069906
http://setiathome.berkeley.edu/show_host_detail.php?hostid=5102101

Intel Atom 330, NVIDIA Ion chipset, 2 GB RAM, Debian 64-bit, kernel 2.6.30 from Debian backports, CUDA software from http://www.nvidia.com/object/cuda_get.html.

S@H WUs seem to work; E@H WUs will not even start.

19-Sep-2009 09:57:41 [---] Starting BOINC client version 6.6.36 for x86_64-pc-linux-gnu
19-Sep-2009 09:57:41 [---] log flags: task, file_xfer, sched_ops
19-Sep-2009 09:57:41 [---] Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
19-Sep-2009 09:57:41 [---] Running as a daemon
19-Sep-2009 09:57:41 [---] Data directory: /home/boinc
19-Sep-2009 09:57:41 [---] Processor: 4 GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz [Family 6 Model 28 Stepping 2]
19-Sep-2009 09:57:41 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm lahf_lm
19-Sep-2009 09:57:41 [---] OS: Linux: 2.6.30
19-Sep-2009 09:57:41 [---] Memory: 1.47 GB physical, 250.98 MB virtual
19-Sep-2009 09:57:41 [---] Disk: 4.58 GB total, 2.12 GB free
19-Sep-2009 09:57:41 [---] Local time is UTC +1 hours
19-Sep-2009 09:57:42 [---] CUDA device: ION (driver version 0, compute capability 1.1, 509MB, est. 6GFLOPS)
19-Sep-2009 09:57:42 [Einstein@Home] Found app_info.xml; using anonymous platform
19-Sep-2009 09:57:42 [---] Not using a proxy
19-Sep-2009 09:57:42 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 5102101; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 2069906; location: home; project prefs: default
19-Sep-2009 09:57:42 [Einstein@Home] General prefs: from Einstein@Home (last modified 02-Mar-2007 22:05:08)
19-Sep-2009 09:57:42 [Einstein@Home] Computer location: home
19-Sep-2009 09:57:42 [---] General prefs: using separate prefs for home
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when active to 752.68MB
19-Sep-2009 09:57:42 [---] Preferences limit memory usage when idle to 1354.82MB
19-Sep-2009 09:57:42 [---] Preferences limit disk usage to 2.29GB
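
The line worth looking at in this log is the "CUDA device" message. Below is a minimal sketch that pulls its fields apart. The regular expression is written for exactly this message format as printed by the 6.6.36 client above; it is an assumption, not a documented BOINC format, and other client versions word the line differently.

```python
import re

# Line copied from the client log above.
LOG_LINE = ("19-Sep-2009 09:57:42 [---] CUDA device: ION (driver version 0, "
            "compute capability 1.1, 509MB, est. 6GFLOPS)")

# Pattern written for this one message format only (an assumption).
CUDA_DEVICE_RE = re.compile(
    r"CUDA device: (?P<name>.+?) \(driver version (?P<driver>\d+), "
    r"compute capability (?P<cc>\d+\.\d+), (?P<mem_mb>\d+)MB, "
    r"est\. (?P<gflops>\d+)GFLOPS\)"
)


def parse_cuda_device(line):
    """Return the device fields as a dict, or None if the line does not match."""
    match = CUDA_DEVICE_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    return {
        "name": info["name"],
        "driver": int(info["driver"]),
        "compute_capability": info["cc"],
        "mem_mb": int(info["mem_mb"]),
        "gflops": int(info["gflops"]),
    }


device = parse_cuda_device(LOG_LINE)
print(device)
```

Here the client reports driver version 0, which is worth mentioning when reporting problems, since the driver series seems to matter in this thread.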

Best regards,
Stephan

Jos van Wolput
Joined: 11 Feb 05
Posts: 47
Credit: 800840
RAC: 0

I installed BOINC 6.10.6, which detects ATI GPUs.
Does this CUDA app 1.10 work with ATI GPUs?

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2139
Credit: 2752801780
RAC: 1435084

Message 94530 in response to message 94529

Quote:
I installed BOINC 6.10.6, which detects ATI GPUs.
Does this CUDA app 1.10 work with ATI GPUs?


No.

'CUDA' is specifically a trade name for the NVidia architecture.

koubi
Joined: 22 Sep 09
Posts: 1
Credit: 1776176
RAC: 0

Hello, I tried CUDA app 1.10:

sam 26 sep 2009 03:09:47 CEST Starting BOINC client version 6.10.4 for x86_64-pc-linux-gnu
sam 26 sep 2009 03:09:47 CEST log flags: task, file_xfer, sched_ops
sam 26 sep 2009 03:09:47 CEST Libraries: libcurl/7.18.0 OpenSSL/0.9.8g zlib/1.2.3.3 c-ares/1.5.1
sam 26 sep 2009 03:09:47 CEST Data directory: /home/koubi/Desktop/BOINC
sam 26 sep 2009 03:09:47 CEST Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ [Family 15 Model 107 Stepping 2]
sam 26 sep 2009 03:09:47 CEST Processor: 512.00 KB cache
sam 26 sep 2009 03:09:47 CEST Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefe
sam 26 sep 2009 03:09:47 CEST OS: Linux: 2.6.30.7
sam 26 sep 2009 03:09:47 CEST Memory: 3.86 GB physical, 956.93 MB virtual
sam 26 sep 2009 03:09:47 CEST Disk: 145.79 GB total, 16.66 GB free
sam 26 sep 2009 03:09:47 CEST Local time is UTC +2 hours
sam 26 sep 2009 03:09:47 CEST NVIDIA GPU 0: GeForce GTX 260 (driver version 0, CUDA version 2020, compute capability 1.3, 895MB, est. 117GFLOPS)
sam 26 sep 2009 03:09:47 CEST Can't load library libaticalrt.so
sam 26 sep 2009 03:09:47 CEST Einstein@Home Found app_info.xml; using anonymous platform
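
A note on the "CUDA version 2020" field in the GPU line: CUDA encodes its version as an integer, 1000 * major + 10 * minor, so 2020 means CUDA 2.2. A minimal sketch of the decoding (`decode_cuda_version` is a made-up helper name):

```python
def decode_cuda_version(code):
    """Decode an integer CUDA version (1000 * major + 10 * minor)."""
    major = code // 1000
    minor = (code % 1000) // 10
    return major, minor


major, minor = decode_cuda_version(2020)
print(f"CUDA {major}.{minor}")
```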

Task ID: 140776480
Name: p2030_53837_39307_0070_G63.81+00.12.C_6.dm_619_1
Workunit: 59072881
Created: 25 Sep 2009 1:57:53 UTC
Sent: 25 Sep 2009 21:29:58 UTC
Received: 26 Sep 2009 7:50:18 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 2093030
Report deadline: 9 Oct 2009 21:29:58 UTC
CPU time: 15969.72
stderr out:

6.10.4

[23:30:19][21754][INFO ] Starting data processing...
[23:30:19][21754][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[23:30:19][21754][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[23:30:19][21754][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[23:30:21][21754][INFO ] Seed for random number generator is -1148624978.
[23:30:22][21754][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[23:31:19][21754][INFO ] Checkpoint committed!

[00:26:09][27652][INFO ] Starting data processing...
[00:26:09][27652][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[00:26:09][27652][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4375
[00:26:09][27652][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[00:26:11][27652][INFO ] Seed for random number generator is -1148624978.
[00:26:12][27652][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[00:27:10][27652][INFO ] Checkpoint committed!
[00:28:10][27652][INFO ] Checkpoint committed!
[00:29:11][27652][INFO ] Checkpoint committed!
[00:30:11][27652][INFO ] Checkpoint committed!
[01:15:26][32689][INFO ] Starting data processing...
[01:15:26][32689][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:15:26][32689][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 4698
[01:15:26][32689][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:15:28][32689][INFO ] Seed for random number generator is -1148624978.
[01:15:29][32689][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:52:58][8338][INFO ] Starting data processing...
[01:52:58][8338][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[01:52:58][8338][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7081
[01:52:58][8338][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[01:53:00][8338][INFO ] Seed for random number generator is -1148624978.
[01:53:01][8338][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881
[01:53:58][8338][INFO ] Checkpoint committed!
[06:47:37][27973][INFO ] Starting data processing...
[06:47:37][27973][INFO ] Using CUDA device #0 "GeForce GTX 260" (979.78 GFLOPS)
[06:47:37][27973][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7157
[06:47:37][27973][INFO ] Header contents:
------> Original WAPP file: p2030_53837_39307_0072_G63.81+00.12.C_6.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53837.454942129632
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 195144.090994
------> DEC (J2000): 270852.772618
------> Galactic l: 63.7035
------> Galactic b: 0.1204
------> Name: G63.81+00.12.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 182.7928
------> ZA at start: 8.9304
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 954.4 cm^-3 pc
------> Scale factor: 7394.48
[06:47:39][27973][INFO ] Seed for random number generator is -1148624978.
[06:47:40][27973][INFO ] Derived global search parameters:
------> f_A probability = 0.04
------> single bin prob(P_noise > P_thr) = 1.2977e-08
------> thr1 = 18.1601
------> thr2 = 21.263
------> thr4 = 26.2923
------> thr8 = 34.674
------> thr16 = 48.9881

[09:50:13][27973][INFO ] Data processing finished successfully!
called boinc_finish

Validate state: Valid
Claimed credit: 77.6536025819315
Granted credit: 250
Application version: 1.10
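
The stderr output above shows the task being started five times and resuming from checkpoints at templates 4375, 4698, 7081 and 7157. A log this long is easier to judge with a quick summary, so here is a minimal sketch. The message strings are copied from the log; the function name and the returned keys are made up for illustration, and the sample is only an excerpt of the last two runs.

```python
import re

# Excerpt of the last two runs from the stderr output above.
STDERR_EXCERPT = """\
[01:52:58][8338][INFO ] Starting data processing...
[01:52:58][8338][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7081
[01:53:58][8338][INFO ] Checkpoint committed!
[06:47:37][27973][INFO ] Starting data processing...
[06:47:37][27973][INFO ] Continuing work on ../../projects/einstein.phys.uwm.edu/p2030_53837_39307_0070_G63.81+00.12.C_6_619.binary at template no. 7157
[09:50:13][27973][INFO ] Data processing finished successfully!
"""


def summarize_stderr(text):
    """Count starts and checkpoints and list the templates resumed from."""
    return {
        "starts": len(re.findall(r"Starting data processing", text)),
        "checkpoints": len(re.findall(r"Checkpoint committed!", text)),
        "resumed_at": [int(n) for n in re.findall(r"at template no\. (\d+)", text)],
        "finished": "Data processing finished successfully!" in text,
    }


summary = summarize_stderr(STDERR_EXCERPT)
print(summary)
```

Many starts with few checkpoints in between would point at the task being interrupted often, which is useful context when a result comes back invalid.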

The GTX 260 (216 SP) GPU is overclocked: core @ 756 MHz, memory @ 1096 MHz, shaders @ 1512 MHz.
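
These clocks seem to explain the 979.78 GFLOPS that the app prints in its stderr output. A minimal sketch of the arithmetic, assuming the app counts 3 floating-point operations per stream processor per shader clock (the usual peak figure for this GPU generation, but an assumption about what the app actually does); `peak_gflops` is a made-up helper name:

```python
def peak_gflops(stream_processors, shader_mhz, flops_per_clock=3):
    """Theoretical peak in GFLOPS: SPs * shader clock (MHz) * FLOPs per clock / 1000."""
    return stream_processors * shader_mhz * flops_per_clock / 1000.0


estimate = peak_gflops(216, 1512)
print(f"{estimate:.2f} GFLOPS")
```

With 216 SPs at 1512 MHz this gives 216 * 1512 * 3 = 979776 MFLOPS, which matches the figure in the log after rounding. The 117 GFLOPS estimated by the BOINC client is evidently computed differently.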

Bikeman (Heinz-Bernd Eggenstein)

Message 94532 in response to message 94531

Quote:

Hello, I tried CUDA app 1.10:

Looks good!!!
CU
Bikeman

Skip Da Shu
Joined: 18 Jan 05
Posts: 140
Credit: 654755967
RAC: 2651315

I'm getting computation errors on between 1/3 and 1/2 of the WUs so far. I lowered the clock on the card a bit tonight, so we'll see if that makes any difference.

Most recent invalid is HERE.

Linux, kernel 2.6.28, 64-bit, GTX 260 running the 190.36 driver.
