exited with zero status but no 'finished' file

Sector
Joined: 15 Dec 05
Posts: 11
Credit: 68711652
RAC: 39529
Topic 195949

Quote:
06-Sep-2011 20:37:23 [Einstein@Home] Task p2030.20090419.G61.17-01.48.S.b2s0g0.00000.dm_2144_0 exited with zero status but no 'finished' file

All CUDA units have been doing this for a few days now. I've tried resetting the project, but it didn't help.

"NVIDIA GPU 0: GeForce GT 240 (driver version unknown, CUDA version 4000, compute capability 1.2, 511MB, 257 GFLOPS peak)"

BOINC: 6.12.34
GPU: Nvidia 240GT (512mb)
Nvidia drivers: 270.41.19

CPU units are completing normally, and GPU units used to work fine. This started happening without any changes on my end.

06-Sep-2011 21:34:07 [---] NVIDIA GPU 0: GeForce GT 240 (driver version unknown, CUDA version 4000, compute capability 1.2, 511MB, 257 GFLOPS peak)

I updated the Nvidia drivers to 275.09.07, with no change.

Does anyone have any ideas?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Did you see the error message in the erroneous CUDA tasks?

[21:09:27][15072][INFO ] Starting data processing...
Error: API mismatch: the NVIDIA kernel module has version 270.41.19,
but this NVIDIA driver component has version 275.09.07.  Please make
sure that the kernel module and all NVIDIA driver components
have the same version.
[21:09:27][15072][ERROR] Couldn't initialize CUDA driver API (error: 100)!
[21:09:27][15072][ERROR] Demodulation failed (error: 1020)!
21:09:27 (15072): called boinc_finish


Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Sector
Joined: 15 Dec 05
Posts: 11
Credit: 68711652
RAC: 39529

Nope, no version mismatch problem. This started before I updated the Nvidia drivers. I did that to see if it would help; it didn't. Note that if those didn't match, it wouldn't even have let me restart X11 after I updated the drivers. It can happen, though, if I update the drivers but don't restart X11 and leave the old Nvidia module in memory.
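For anyone wanting to check the "old module still in memory" case directly, here is a minimal Python sketch. It assumes a Linux system where the loaded NVIDIA kernel module reports its version in /proc/driver/nvidia/version on a line containing "Kernel Module" followed by the version number; that format is an assumption and may differ between driver releases. The function names are made up for illustration, and the installed version is typed in by hand rather than queried.

```python
import re
from pathlib import Path

# Assumed location where the loaded NVIDIA kernel module reports its version.
PROC_VERSION = Path("/proc/driver/nvidia/version")

# Assumed line format: "... Kernel Module  270.41.19  <build date>"
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def kernel_module_version(text):
    """Return the version found in the module's version text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def versions_match(module_text, installed_version):
    """True if the loaded kernel module matches the installed driver version."""
    return kernel_module_version(module_text) == installed_version


if __name__ == "__main__":
    installed = "275.09.07"  # the driver version installed in this thread
    if PROC_VERSION.exists():
        text = PROC_VERSION.read_text()
        print("loaded module:", kernel_module_version(text))
        print("matches installed driver:", versions_match(text, installed))
    else:
        print("no NVIDIA kernel module information found at", PROC_VERSION)
```

If the two versions differ, the old module is still loaded, which is the situation the "API mismatch" message describes.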

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Did you try rebooting the system? The only way to clear something stuck in the memory of a GPU (on a video card) is to reboot the whole computer.

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
Nope, no version mismatch problem.


That excerpt was from your task 246074743 sent back on 7 Sep 2011 4:38:27 UTC.

Regards,
Gundolf

Computers aren't everything in life. (Just kidding)

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

His tasks on September 7th

Quote:
[20:25:47][5094][INFO ] Seed for random number generator is -1018690778.
[20:25:49][5094][ERROR] Error creating CUDA FFT plan (error code: 2)
[20:25:49][5094][ERROR] Demodulation failed (error: 1011)!
[20:25:49][5094][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[20:31:04][10374][INFO ] Application startup - thank you for supporting Einstein@Home!
[20:31:04][10374][INFO ] Starting data processing...
[20:31:04][10374][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 426 MB (86 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB


September 6th

Quote:
[17:44:29][31634][INFO ] Seed for random number generator is 1109422677.
[17:44:31][31634][ERROR] Error creating CUDA FFT plan (error code: 2)
[17:44:31][31634][ERROR] Demodulation failed (error: 1011)!
[17:44:31][31634][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[17:44:32][31639][INFO ] Application startup - thank you for supporting Einstein@Home!
[17:44:32][31639][INFO ] Starting data processing...
[17:44:32][31639][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 383 MB (129 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB


September 5th

Quote:
[18:46:42][11321][INFO ] Seed for random number generator is -1021593951.
[18:46:44][11321][ERROR] Error allocating power spectrum device memory: 25166336 bytes (error: 2)
[18:46:44][11321][ERROR] Demodulation failed (error: 1006)!
[18:46:44][11321][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[18:46:45][11323][INFO ] Application startup - thank you for supporting Einstein@Home!
[18:46:45][11323][INFO ] Starting data processing...
[18:46:45][11323][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 315 MB (197 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB


and September 4th, show a memory problem.

Quote:
[13:15:46][2366][INFO ] Seed for random number generator is 1141547449.
[13:15:48][2366][ERROR] Error allocating power spectrum device memory: 25166336 bytes (error: 2)
[13:15:48][2366][ERROR] Demodulation failed (error: 1006)!
[13:15:48][2366][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[13:15:49][2368][INFO ] Application startup - thank you for supporting Einstein@Home!
[13:15:49][2368][INFO ] Starting data processing...
[13:15:49][2368][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 315 MB (197 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[13:15:49][2368][INFO ] Using CUDA device #0 "GeForce GT 240" (96 CUDA cores / 385.92 GFLOPS)

Now, of course, you can ignore that and keep trying to run tasks while hoping for a miracle, but it will only be fixed once enough GPU memory is free. That can only reliably be achieved with a system reboot, meaning a full power cycle, and not in any other way.
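To make the numbers in those log excerpts easier to compare, here is a rough Python sketch that pulls the memory figures out of the quoted lines. It assumes the log wording stays exactly as quoted in this thread; the function names and the `LOG_LINES` table are made up for illustration and are not part of the Einstein@Home application.

```python
import re

# Patterns match the wording of the log lines quoted in this thread.
STATUS_RE = re.compile(r"Used in total: (\d+) MB \((\d+) MB free / (\d+) MB total\)")
ALLOC_RE = re.compile(r"device memory: (\d+) bytes")


def memory_status(line):
    """Extract used/free/total MB from a 'CUDA global memory status' line."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    used, free, total = (int(value) for value in match.groups())
    return {"used_mb": used, "free_mb": free, "total_mb": total}


def requested_mib(line):
    """Size of a failed allocation in MiB, taken from the error line."""
    match = ALLOC_RE.search(line)
    return int(match.group(1)) / (1024 * 1024) if match else None


# Status lines copied from the tasks quoted above, keyed by date.
LOG_LINES = {
    "4 Sep": "------> Used in total: 315 MB (197 MB free / 512 MB total)",
    "5 Sep": "------> Used in total: 315 MB (197 MB free / 512 MB total)",
    "6 Sep": "------> Used in total: 383 MB (129 MB free / 512 MB total)",
    "7 Sep": "------> Used in total: 426 MB (86 MB free / 512 MB total)",
}

for day, line in LOG_LINES.items():
    status = memory_status(line)
    print(f"{day}: {status['used_mb']} MB in use, {status['free_mb']} MB free")

failed = "Error allocating power spectrum device memory: 25166336 bytes (error: 2)"
print(f"failed allocation: {requested_mib(failed):.2f} MiB")
```

In every excerpt the application itself reports using 0 MB at that point, so the 315 to 426 MB already in use is held by something else on the card, and the free amount shrinks from 197 MB to 86 MB over the four days.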

Sector
Joined: 15 Dec 05
Posts: 11
Credit: 68711652
RAC: 39529

Quote:
Quote:
Nope, no version mismatch problem.

That excerpt was from your task 246074743 sent back on 7 Sep 2011 4:38:27 UTC.

Regards,
Gundolf

Right, that was after the driver update but before I rebooted. It wasn't in that state for long; I just left BOINC running while I updated a few other things.

And the problem started before then.

Note the CUDA units seem to be completing normally now, without changing anything on this end.

Yes the system has been restarted since the problem began.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Quote:
Yes the system has been restarted since the problem began.


Which is what fixed it. Next time you see loads of errors, don't ask, just reboot first. Chances are high that'll instantly fix the problem.

Sector
Joined: 15 Dec 05
Posts: 11
Credit: 68711652
RAC: 39529

Quote:
Quote:
Yes the system has been restarted since the problem began.

Which is what fixed it. Next time you see loads of errors, don't ask, just reboot first. Chances are high that'll instantly fix the problem.

It was still doing it for a while after the reboot.

Sector
Joined: 15 Dec 05
Posts: 11
Credit: 68711652
RAC: 39529

It started doing it again. Just to be clear: A cold boot DID NOT help.

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0

Well, if your videocard has those errors repeatedly, and it continues immediately after a reboot, there's a good chance it's a problem with the videocard (broken memory, broken capacitors, too much heat, etc.). I don't see any other option but to either replace that videocard, or test with another.
