Some VERY weird happenings on my system running E@H CUDA.
SETI and E@H are not playing very well with each other. Mostly, SETI is a bully and takes over the system as far as the GPU is concerned. The only way I've been able to get E@H CUDA to run is to suspend SETI completely.
Then last night the system clock reset itself backwards by about 5 hours. See the log segment below. I restarted BOINC, then reset the clock (probably not the best order to do it in), and it looks OK for right now.
[edit]
Actually, I don't know exactly how far back the clock was reset. I did the update listed after 23:15 at about 5:00 am. There are NO log entries between 23:15 and that update. I looked at Task Manager before I restarted BOINC and everything looked idle (System Idle 99%); however, BoincView said things were running. I did not monitor the CPU % complete to see whether it was increasing, though.
[/edit]
06-Aug-2009 19:46:12 [Einstein@Home] Restarting task p2030_53648_84220_0074_G62.52-00.55.N_0.dm_560_0 using einsteinbinary_ABP1 version 307
06-Aug-2009 20:45:51 [Einstein@Home] Resuming task h1_0805.90_S5R4__334_S5R5a_2 using einstein_S5R5 version 305
06-Aug-2009 22:07:46 [SETI@home] task 18fe09ac.4381.1708.7.10.34_0 suspended by user
06-Aug-2009 22:07:52 [SETI@home] resumed by user
06-Aug-2009 22:07:52 [Einstein@Home] Resuming task h1_0805.90_S5R4__315_S5R5a_2 using einstein_S5R5 version 305
06-Aug-2009 22:07:53 [SETI@home] Restarting task 18fe09aa.518.9479.6.10.218_1 using setiathome_enhanced version 608
06-Aug-2009 22:07:54 [SETI@home] Computation for task 18fe09aa.518.9479.6.10.218_1 finished
06-Aug-2009 22:07:54 [SETI@home] Output file 18fe09aa.518.9479.6.10.218_1_0 for task 18fe09aa.518.9479.6.10.218_1 absent
06-Aug-2009 22:07:54 [SETI@home] Starting 18fe09ab.21304.6616.11.10.86_0
06-Aug-2009 22:07:54 [SETI@home] Starting task 18fe09ab.21304.6616.11.10.86_0 using setiathome_enhanced version 608
06-Aug-2009 22:08:51 [SETI@home] task 18fe09ac.4381.1708.7.10.34_0 resumed by user
06-Aug-2009 22:08:51 [SETI@home] Restarting task 18fe09ac.4381.1708.7.10.34_0 using setiathome_enhanced version 608
06-Aug-2009 22:08:59 [SETI@home] update requested by user
06-Aug-2009 22:09:01 [SETI@home] Sending scheduler request: Requested by user.
06-Aug-2009 22:09:01 [SETI@home] Reporting 1 completed tasks, not requesting new tasks
06-Aug-2009 22:09:06 [SETI@home] Scheduler request completed: got 0 new tasks
06-Aug-2009 22:56:17 [SETI@home] Computation for task 18fe09ac.4381.1708.7.10.34_0 finished
06-Aug-2009 22:56:17 [SETI@home] Resuming task 18fe09ab.21304.6616.11.10.86_0 using setiathome_enhanced version 608
06-Aug-2009 22:56:19 [SETI@home] Started upload of 18fe09ac.4381.1708.7.10.34_0_0
06-Aug-2009 22:56:23 [SETI@home] Finished upload of 18fe09ac.4381.1708.7.10.34_0_0
06-Aug-2009 22:57:17 [SETI@home] Sending scheduler request: To fetch work.
06-Aug-2009 22:57:17 [SETI@home] Reporting 1 completed tasks, requesting new tasks
06-Aug-2009 22:57:22 [SETI@home] Scheduler request completed: got 2 new tasks
06-Aug-2009 22:57:22 [SETI@home] Message from server: No work can be sent for the applications you have selected
06-Aug-2009 22:57:22 [SETI@home] Message from server: No work is available for Astropulse v5
06-Aug-2009 22:57:22 [SETI@home] Message from server: You have selected to receive work from other applications if no work is available for the applications you selected
06-Aug-2009 22:57:22 [SETI@home] Message from server: Sending work from other applications
06-Aug-2009 22:57:24 [SETI@home] Started download of 18fe09ad.11473.154468.9.10.173
06-Aug-2009 22:57:24 [SETI@home] Started download of 17oc08ab.19925.8252.6.10.231
06-Aug-2009 22:57:28 [SETI@home] Finished download of 17oc08ab.19925.8252.6.10.231
06-Aug-2009 22:57:29 [SETI@home] Finished download of 18fe09ad.11473.154468.9.10.173
06-Aug-2009 23:06:56 [SETI@home] Resuming task 24fe09ab.21278.9070.15.10.221_1 using setiathome_enhanced version 603
06-Aug-2009 23:09:33 [SETI@home] Resuming task 18fe09ab.21304.7843.11.10.51_1 using setiathome_enhanced version 603
06-Aug-2009 23:14:59 [SETI@home] Computation for task 24fe09ab.21278.9070.15.10.221_1 finished
06-Aug-2009 23:14:59 [SETI@home] Starting 18fe09ae.15593.8253.5.10.75_0
06-Aug-2009 23:14:59 [SETI@home] Starting task 18fe09ae.15593.8253.5.10.75_0 using setiathome_enhanced version 603
06-Aug-2009 23:15:01 [SETI@home] Started upload of 24fe09ab.21278.9070.15.10.221_1_0
06-Aug-2009 23:15:05 [SETI@home] Finished upload of 24fe09ab.21278.9070.15.10.221_1_0
06-Aug-2009 19:50:51 [SETI@home] update requested by user
06-Aug-2009 19:51:12 [SETI@home] suspended by user
06-Aug-2009 19:51:13 [Einstein@Home] Resuming task h1_0805.90_S5R4__334_S5R5a_2 using einstein_S5R5 version 305
06-Aug-2009 19:54:38 [---] Exit requested by user
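A minimal sketch of how a log like the one above could be checked for a clock that ran backwards. It assumes every message line starts with a `DD-Mon-YYYY HH:MM:SS` stamp, as in the excerpt, and that month abbreviations are English; the function name `find_backward_jumps` is made up for illustration and is not part of BOINC.

```python
from datetime import datetime

def find_backward_jumps(lines):
    """Return (previous, current) timestamp pairs where the log goes back in time."""
    jumps = []
    previous = None
    for line in lines:
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        try:
            # Assumed stamp format, taken from the log excerpt: 06-Aug-2009 23:15:05
            current = datetime.strptime(parts[0] + " " + parts[1], "%d-%b-%Y %H:%M:%S")
        except ValueError:
            continue  # not a timestamped log line
        if previous is not None and current < previous:
            jumps.append((previous, current))
        previous = current
    return jumps

sample = [
    "06-Aug-2009 23:15:05 [SETI@home] Finished upload of 24fe09ab.21278.9070.15.10.221_1_0",
    "06-Aug-2009 19:50:51 [SETI@home] update requested by user",
    "06-Aug-2009 19:51:12 [SETI@home] suspended by user",
]

for before, after in find_backward_jumps(sample):
    print(f"log time went back by {before - after} ({before} -> {after})")
```

This only shows the difference between adjacent log stamps. It cannot tell how far the clock was really off, because the wall-clock time that passed during the gap is not recorded in the log.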
Does anybody happen to know, or have the patience to try out, how to write an app_info.xml that runs a CUDA version of an application on each CUDA device and a CPU version of the same application (here: einsteinbinary_ABP1) on the remaining CPU cores?
BM
Is that possible at all? I thought that's why there are different applications (6.03 / 6.08) at SETI for the same data to be processed. However, I have no practical experience with CUDA processing (and app_info.xml writing :-).
Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
No problem at all - it works exactly as you might expect provided the host is running BOINC v6.6.14 or later.
I think it's good practice, but not strictly necessary (except for compatibility with BOINC 6.4 - see below), to keep the version numbers distinct. At the moment we're lucky, with v3.07 for CUDA and v3.08 for CPU. Perhaps you could stick to the odd/even numbering throughout the Beta phase? I'll test one of those and see how it works - report later.
If an app_info like that is run under BOINC v6.4.5/7, only the higher version number will fetch work - so currently 308/CPU would be preferred. With the next release, 309/CUDA would get precedence, and so on.
Edit - Bernd, could you post a direct download link for the 3.08 CPU application, please? None of my hosts have updated themselves yet - they all seem to be busy with S5R5.
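The version precedence described above can be sketched as a small function. This only models the behaviour as stated in this post, not BOINC's actual work-fetch code; the function name is hypothetical, and treating every client older than 6.6.14 like 6.4.x is an assumption made for the example.

```python
def versions_that_fetch_work(client_version, app_versions):
    """client_version: tuple such as (6, 6, 36).
    app_versions: list of (version_num, plan_class) for ONE application."""
    if not app_versions:
        return []
    if client_version >= (6, 6, 14):
        # Newer clients: CUDA and CPU versions are both used side by side.
        return sorted(app_versions, key=lambda v: v[0])
    # Older clients (assumption: anything below 6.6.14 behaves like 6.4.5/7):
    # only the highest version number fetches work.
    return [max(app_versions, key=lambda v: v[0])]

current = [(307, "cuda"), (308, "cpu")]
print(versions_that_fetch_work((6, 4, 5), current))
print(versions_that_fetch_work((6, 6, 36), current))
```

With the odd/even numbering suggested above, an old client would alternate between CPU-only and CUDA-only work from release to release, which is why distinct version numbers matter there.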
Stderr output from a failed CUDA task on a GeForce 8500 GT:
Activated exception handling...
[21:43:15][2672][INFO ] Starting data processing...
[21:43:15][2672][INFO ] Using CUDA device #0 "GeForce 8500 GT" (54.43 GFLOPS)
[21:43:15][2672][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[21:43:15][2672][INFO ] Header contents:
------> Original WAPP file: p2030_53835_35106_0039_G36.72-00.43.C_5.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53835.406319444446
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 190014.12007
------> DEC (J2000): 31010.070266
------> Galactic l: 36.7553
------> Galactic b: -0.5094
------> Name: G36.72-00.43.C
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 334.3573
------> ZA at start: 16.662
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: JD
------> File size (bytes): 16190754
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 14.4 cm^-3 pc
------> Scale factor: 7081.6
[21:43:17][2672][INFO ] Seed for random number generator is 1009009674.
[21:43:17][2672][ERROR] Error creating CUDA FFT plan (error code: 2)
[21:43:17][2672][ERROR] Demodulation failed (error: 3)!
called boinc_finish
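A rough sketch for pulling the error lines out of stderr output like the above. It assumes the number printed after "Error creating CUDA FFT plan" is the raw `cufftResult` value returned by the cuFFT library; the application source is not shown in this thread, so that mapping is a guess. The enum names and values are those from NVIDIA's `cufft.h`; the function name is made up.

```python
import re

# cufftResult values as defined in NVIDIA's cufft.h
CUFFT_RESULT_NAMES = {
    0: "CUFFT_SUCCESS",
    1: "CUFFT_INVALID_PLAN",
    2: "CUFFT_ALLOC_FAILED",
    3: "CUFFT_INVALID_TYPE",
    4: "CUFFT_INVALID_VALUE",
    5: "CUFFT_INTERNAL_ERROR",
    6: "CUFFT_EXEC_FAILED",
    7: "CUFFT_SETUP_FAILED",
    8: "CUFFT_INVALID_SIZE",
    9: "CUFFT_UNALIGNED_DATA",
}

ERROR_LINE = re.compile(r"\[ERROR\]\s*(.*?)\s*\(error(?: code)?: (\d+)\)")

def extract_errors(lines):
    """Return (message, code, cufft_name_or_None) for every [ERROR] line."""
    found = []
    for line in lines:
        match = ERROR_LINE.search(line)
        if not match:
            continue
        message, code = match.group(1), int(match.group(2))
        # Only FFT-plan errors are interpreted as cufftResult (assumption).
        name = CUFFT_RESULT_NAMES.get(code) if "FFT" in message else None
        found.append((message, code, name))
    return found

stderr_lines = [
    "[21:43:17][2672][INFO ] Seed for random number generator is 1009009674.",
    "[21:43:17][2672][ERROR] Error creating CUDA FFT plan (error code: 2)",
    "[21:43:17][2672][ERROR] Demodulation failed (error: 3)!",
]

for message, code, name in extract_errors(stderr_lines):
    print(message, code, name or "")
```

If the assumption holds, code 2 would point to a failed allocation and code 8 to a rejected transform size, but that reading should be checked against the application itself.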
Never mind - found it myself. The following app_info has loaded without errors on my test host, and downloaded two new CUDA tasks. It won't fetch any CPU work at the moment, because I'm wrestling with a weird AQUA multi-threading bug (see boinc_alpha) that runs in EDF for no apparent reason. I should be freed from that millstone in about an hour and a half, and I'll let you know what downloads then.
I've added two new file infos (308.exe and 308_graphics), and a whole new app_version at the end: all that's missing is the api_version line, but frankly I've never noticed it making the slightest difference, and I've never known what it was for.
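<app_info>
    <app>
        <name>einstein_S5R5</name>
    </app>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_0.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_1.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_windows_intelx86_2.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <app>
        <name>einsteinbinary_ABP1</name>
    </app>
    <file_info>
        <name>einsteinbinary_ABP1_3.07_graphics_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einsteinbinary_ABP1_3.08_graphics_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>einsteinbinary_ABP1_3.08_windows_intelx86.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft.dll</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>einstein_S5R5</app_name>
        <version_num>305</version_num>
        <api_version>6.3.0</api_version>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_0.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_1.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_windows_intelx86_2.exe</file_name>
        </file_ref>
        <file_ref>
            <file_name>einstein_S5R5_3.05_graphics_windows_intelx86.exe</file_name>
            <open_name>graphics_app</open_name>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>einsteinbinary_ABP1</app_name>
        <version_num>307</version_num>
        <plan_class>cuda</plan_class>
        <avg_ncpus>1.0</avg_ncpus>
        <max_ncpus>1.0</max_ncpus>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <api_version>6.7.0</api_version>
        <file_ref>
            <file_name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>einsteinbinary_ABP1_3.07_graphics_windows_intelx86.exe</file_name>
            <open_name>graphics_app</open_name>
        </file_ref>
        <file_ref>
            <file_name>cudart.dll</file_name>
        </file_ref>
        <file_ref>
            <file_name>cufft.dll</file_name>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>einsteinbinary_ABP1</app_name>
        <version_num>308</version_num>
        <file_ref>
            <file_name>einsteinbinary_ABP1_3.08_windows_intelx86.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>einsteinbinary_ABP1_3.08_graphics_windows_intelx86.exe</file_name>
            <open_name>graphics_app</open_name>
        </file_ref>
    </app_version>
</app_info>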
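Before dropping an app_info.xml like the one above into the project directory, a quick consistency check can help. The sketch below lists every app_version and reports any file_ref that has no matching file_info. The element names follow BOINC's anonymous-platform format as I understand it, the shortened sample file is invented for the example (cufft.dll is left undeclared on purpose), and `check_app_info` is a hypothetical helper, not a BOINC tool.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up sample; cufft.dll is deliberately not declared.
SAMPLE = """
<app_info>
  <app><name>einsteinbinary_ABP1</name></app>
  <file_info><name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</name><executable/></file_info>
  <file_info><name>einsteinbinary_ABP1_3.08_windows_intelx86.exe</name><executable/></file_info>
  <file_info><name>cudart.dll</name><executable/></file_info>
  <app_version>
    <app_name>einsteinbinary_ABP1</app_name>
    <version_num>307</version_num>
    <plan_class>cuda</plan_class>
    <file_ref><file_name>einsteinbinary_ABP1_3.07_windows_intelx86_cuda.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>cudart.dll</file_name></file_ref>
    <file_ref><file_name>cufft.dll</file_name></file_ref>
  </app_version>
  <app_version>
    <app_name>einsteinbinary_ABP1</app_name>
    <version_num>308</version_num>
    <file_ref><file_name>einsteinbinary_ABP1_3.08_windows_intelx86.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""

def check_app_info(xml_text):
    """Return (summary, missing): all app_versions, and file_refs without a file_info."""
    root = ET.fromstring(xml_text)
    declared = {info.findtext("name") for info in root.findall("file_info")}
    summary, missing = [], []
    for version in root.findall("app_version"):
        app = version.findtext("app_name")
        number = int(version.findtext("version_num"))
        plan = version.findtext("plan_class", default="")  # empty for plain CPU versions
        summary.append((app, number, plan))
        for ref in version.findall("file_ref"):
            name = ref.findtext("file_name")
            if name not in declared:
                missing.append((number, name))
    return summary, missing

summary, missing = check_app_info(SAMPLE)
print(summary)
print(missing)
```

The summary also makes it easy to see at a glance whether the CUDA and CPU versions of the same application carry distinct version numbers.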
Hello,
I've just downloaded and installed the CUDA version and got errors immediately. Check tasks 135844155, 135908112, 135914178. Here are the details for one of these tasks:
Core client version: 6.6.36
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
Activated exception handling...
[17:42:27][5224][INFO ] Starting data processing...
[17:42:29][5224][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).
------> Starting from scratch...
[17:42:29][5224][INFO ] Header contents:
------> Original WAPP file: p2030_53703_72629_0179_G49.79-01.73.N_5.wapp
------> Sample time in microseconds: 128
------> Observation time in seconds: 268.9792
------> Time stamp (MJD): 53703.840613425928
------> Number of samples/record: 512
------> Center freq in MHz: 1440
------> Channel band in MHz: 0.390625
------> Number of channels/record: 256
------> Nifs: 1
------> RA (J2000): 192936.834455
------> DEC (J2000): 141150.559412
------> Galactic l: 49.8928
------> Galactic b: -1.7891
------> Name: G49.79-01.73.N
------> Lagformat: 0
------> Sum: 1
------> Level: 3
------> AZ at start: 432.2594
------> ZA at start: 12.5091
------> AST at start: 0
------> LST at start: 0
------> Project ID: p2030
------> Observers: VickyKaspi
------> File size (bytes): 16190702
------> Data size (bytes): 16179201
------> Number of samples: 2097152
------> Trial dispersion measure: 303.4 cm^-3 pc
------> Scale factor: 6441.3
[17:42:31][5224][INFO ] Seed for random number generator is 1015217833.
[17:42:33][5224][ERROR] Error creating CUDA FFT plan (error code: 8)
[17:42:33][5224][ERROR] Demodulation failed (error: 3)!
called boinc_finish
The platform is Windows. BOINC says:
CUDA device: GeForce 9600 GT (driver version 17779, compute capability 1.1, 512 MB)
Regards,
Andrew
Read the opening post again, and update your GeForce driver to a matching version - 181.20 or higher.
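As a small illustration of the driver requirement: BOINC prints the driver version as one integer (17779, 19038), which I take to be the marketing version with the dot removed, so 17779 would be 177.79. That reading is an assumption, as are the function names below; the 181.20 minimum is the one quoted in this reply.

```python
def driver_tuple_from_boinc(reported):
    """Split BOINC's integer driver version into (major, minor), e.g. 17779 -> (177, 79)."""
    return divmod(reported, 100)

def meets_minimum(reported, minimum=(181, 20)):
    """True if the reported driver is at least the required version (default 181.20)."""
    return driver_tuple_from_boinc(reported) >= minimum

for reported in (17779, 19038):
    major, minor = driver_tuple_from_boinc(reported)
    status = "OK" if meets_minimum(reported) else "too old"
    print(f"driver {major}.{minor:02d}: {status}")
```

Under this reading, the 9600 GT host reporting 17779 is below the minimum, while hosts reporting 19038 are not.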
That worked OK: now that AQUA has finished, BOINC has downloaded a couple of ABP1/308/CPU to match the CUDAs it already has.
Got a CUDA finishing in a couple of minutes - see you in the 3.09 thread. These 3.07/3.08/3.09 timings are going to be interesting.
Edit - ooops, misread. 309 is not a CUDA release. I'll let those 308s run on the existing app, and only then replace 308 with 309 to get comparative timings.
Result p2030_53617_03095_0027_G53.81-00.16.N_2.dm_615_0 finished (App 3.07). Host ID 2028119: CPU E6550 @ 2.33GHz + GF 9600GT (drivers 19038 ASUS), Win XP x86. Wall time 21,812.95 s, granted credit 250. Nothing interesting for credit hunters, OK for testers.
10 Tasks and 10 errors for host 1275368. From 57029869 to 57030024...
Update: BOINC 6.6.36. NVIDIA GeForce 8500 GT.
CUDA device: GeForce 8500 GT (driver version 19038, compute capability 1.1, 256MB, est. 6GFLOPS)