CUDA fails -- before and after update of BOINC and restart of project

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0
Topic 195724

Last week, on the 21st of March, I got a lot of errors like:

Quote:
Einstein@Home If this happens repeatedly you may need to reset the project.

Okay. I turned off downloading of new WUs and waited until the last E@H WU had finished, which was yesterday evening. Then I reported all E@H WUs and reset the E@H project as advised. After that I stopped BOINC, upgraded to version 6.10.58, restarted the whole thing and got the same error messages again.

http://einsteinathome.org/host/1967526/tasks&offset=0&show_names=0&state=5

The only success: it seems to crash faster, so it wastes fewer CPU cycles.

Fine! I like CUDA!

Which experiment has the best chance of success?

Switch from the newest stable CUDA driver, 260.19.44, to the older experimental version 270.26?

Gundolf Jahn
Joined: 1 Mar 05
Posts: 1079
Credit: 341280
RAC: 0

Quote:
Last week, on the 21st of March, I got a lot of errors like:
Quote:
Einstein@Home If this happens repeatedly you may need to reset the project.


That message is not the problem (and it's BS by the way). The problem is:

[05:58:55][12678][ERROR] Error allocating modulated time series device memory: 50332672 bytes (error: 2)
[05:58:55][12678][ERROR] Demodulation failed (error: 1006)!
[05:58:55][12678][WARN ] CUDA memory allocation problem encountered!

And, sorry, I can't tell you what to do about that.
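The numbers in that log can be unpacked with a little arithmetic. Here is a minimal Python sketch. It assumes the `error: 2` in the log is a raw CUDA status code. In the CUDA headers, the value 2 is `cudaErrorMemoryAllocation` in the runtime API and `CUDA_ERROR_OUT_OF_MEMORY` in the driver API, so either way it is an out-of-memory failure. The helper name `describe_failed_allocation` is made up for this example.

```python
MIB = 1024 * 1024

# Status code 2 means "out of memory" in both CUDA APIs.
# Assumption: the application prints the raw CUDA status code after "error:".
CUDA_ERROR_NAMES = {
    2: "cudaErrorMemoryAllocation / CUDA_ERROR_OUT_OF_MEMORY",
}


def describe_failed_allocation(size_bytes, error_code):
    """Hypothetical helper: express a failed request in MiB and name the error code."""
    whole_mib, rest = divmod(size_bytes, MIB)
    name = CUDA_ERROR_NAMES.get(error_code, "unknown code")
    return f"{size_bytes} bytes = {whole_mib} MiB + {rest} bytes, error {error_code}: {name}"


# Values taken from the "modulated time series" line in the log above
print(describe_failed_allocation(50332672, 2))
# prints: 50332672 bytes = 48 MiB + 1024 bytes, error 2: cudaErrorMemoryAllocation / CUDA_ERROR_OUT_OF_MEMORY
```

So the single request that failed was only about 48 MiB.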

Regards,
Gundolf

Computers aren't everything in life. (Just a little joke)

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0

6-11% of 512MB memory is used.

$ while sleep 1; do nvidia-smi -a | grep Memory; done
            Memory              : 6%
            Memory              : 6%
            Memory              : 6%
            Memory              : 6%
            Memory              : 6%
            Memory              : 9%
            Memory              : 10%
            Memory              : 7%
            Memory              : 6%
            Memory              : 9%
            Memory              : 11%
            Memory              : 8%
            Memory              : 10%
            Memory              : 8%
            Memory              : 6%
            Memory              : 6%
            Memory              : 9%
            Memory              : 9%
            Memory              : 8%
            Memory              : 7%
            Memory              : 11%

and I always see values like these; I never see much larger ones.
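A monitoring run like this can be summarized by parsing the captured `Memory : N%` lines instead of reading them by eye. Below is a small Python sketch. The function name is illustrative, and the sample values are the 21 readings shown above, re-created as text.

```python
import re

# Matches lines such as "            Memory              : 6%"
MEMORY_LINE = re.compile(r"Memory\s*:\s*(\d+)%")


def memory_percentages(text):
    """Return every 'Memory : N%' value found in captured nvidia-smi output."""
    return [int(value) for value in MEMORY_LINE.findall(text)]


# The 21 one-second samples from the loop above, re-created as text
readings = (6, 6, 6, 6, 6, 9, 10, 7, 6, 9, 11, 8, 10, 8, 6, 6, 9, 9, 8, 7, 11)
captured = "\n".join(f"            Memory              : {r}%" for r in readings)

values = memory_percentages(captured)
print(f"{len(values)} samples, min {min(values)}%, max {max(values)}%")
# prints: 21 samples, min 6%, max 11%
```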

mikey
Joined: 22 Jan 05
Posts: 12705
Credit: 1839110349
RAC: 3608

RE: 6-11% of 512MB memory

Quote:

6-11% of 512MB memory is used.

and I always see values like these; I never see much larger ones.

That is not a newer-model GPU. What other projects have you tried it on? If none, try Collatz or PrimeGrid and see if it works there. SETI also has a GPU app, so you might try that too. I think it is just not up to what Einstein wants or is looking for.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2961452603
RAC: 690751

RE: Seti also has a gpu app

Quote:
SETI also has a GPU app, so you might try that too.


No, SETI doesn't have a stock CUDA app for Linux.

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0

RE: RE: 6-11% of 512MB

Quote:
Quote:

6-11% of 512MB memory is used.

and I always see values like these; I never see much larger ones.

That is not a newer-model GPU. What other projects have you tried it on? If none, try Collatz or PrimeGrid and see if it works there. SETI also has a GPU app, so you might try that too. I think it is just not up to what Einstein wants or is looking for.

This is a business machine. I need it for those nasty things like earning money, and this card is what came with it out of the box. Yes, it is close to two years old, and it will keep this configuration for the next three years. Then it might be replaced.

There is only one chance of a new graphics card: when my monitor breaks and I buy a newer model that needs something like two DVI channels (with two monitors I would then need a card with four channels).

BTW: This machine runs E@H, SETI, Rectilinear Crossing (currently dead) and orbit@home (currently no work).

mickydl*
Joined: 7 Oct 08
Posts: 39
Credit: 200374822
RAC: 0

RE: RE: Last week, on the

Quote:
Quote:
Last week, on the 21st of March, I got a lot of errors like:
Quote:
Einstein@Home If this happens repeatedly you may need to reset the project.


That message is not the problem (and it's BS by the way). The problem is:
[05:58:55][12678][ERROR] Error allocating modulated time series device memory: 50332672 bytes (error: 2)
[05:58:55][12678][ERROR] Demodulation failed (error: 1006)!
[05:58:55][12678][WARN ] CUDA memory allocation problem encountered!

And, sorry, I can't tell you what to do about that.

Regards,
Gundolf

I believe the real reason becomes obvious if you read just a few more lines of the log.

[06:09:52][13372][INFO ] Seed for random number generator is 1069646598.
[06:09:54][13372][ERROR] Error allocating modulated time series device memory: 50332672 bytes (error: 2)
[06:09:54][13372][ERROR] Demodulation failed (error: 1006)!
[06:09:54][13372][WARN ] CUDA memory allocation problem encountered!
------> Returning control to BOINC, delaying restart for at least five minutes...
------> If this problem persists you should consider aborting this task.
[06:09:54][13377][INFO ] Application startup - thank you for supporting Einstein@Home!
[06:09:54][13377][INFO ] Starting data processing...
[06:09:54][13377][INFO ] CUDA global memory status (initial GPU state, including context):
------> Used in total: 332 MB (180 MB free / 512 MB total) -> Used by this application (assuming a single GPU task): 0 MB
[06:09:54][13377][INFO ] Using CUDA device #0 "GeForce 9500 GT" (32 CUDA cores / 129.60 GFLOPS)
[06:09:54][13377][INFO ] Version of installed CUDA driver: 3020
[06:09:54][13377][INFO ] Version of CUDA driver API used: 3020
[06:09:54][13377][INFO ] Checkpoint file unavailable: status.cpt (No such file or directory).

It says that only 180 MB of your 512 MB are free. If I recall the Einstein requirements correctly, this is not enough. The only way I see to make it work is to reduce your memory consumption.
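The status line quoted here already contains everything needed for the comparison. Here is a minimal Python sketch. The regular expression is written against this one log line and assumes other status lines use the same wording. The same line also reports 0 MB as used by the Einstein@Home task itself, which means the 332 MB were already in use before the task allocated anything.

```python
import re

# Copied from the Einstein@Home log above
STATUS_LINE = "------> Used in total: 332 MB (180 MB free / 512 MB total)"

# Assumption: other status lines follow exactly this wording
STATUS = re.compile(r"Used in total: (\d+) MB \((\d+) MB free / (\d+) MB total\)")


def parse_memory_status(line):
    """Return (used, free, total) in MB from a 'CUDA global memory status' line."""
    match = STATUS.search(line)
    if match is None:
        raise ValueError("not a CUDA global memory status line")
    used, free, total = (int(group) for group in match.groups())
    return used, free, total


used, free, total = parse_memory_status(STATUS_LINE)
print(f"used {used} MB ({100 * used / total:.1f}%), free {free} MB ({100 * free / total:.1f}%)")
# prints: used 332 MB (64.8%), free 180 MB (35.2%)
```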

Michael

Wurgl (speak^Wcrunching for Special: Off-Topic)
Joined: 11 Feb 05
Posts: 321
Credit: 140550008
RAC: 0

RE: It says that there are

Quote:


It says that only 180 MB of your 512 MB are free. If I recall the Einstein requirements correctly, this is not enough. The only way I see to make it work is to reduce your memory consumption.

Michael

$ nvidia-smi -a

==============NVSMI LOG==============

Timestamp : Fri Mar 25 22:30:34 2011

Driver Version : 260.19.44

GPU 0:
Product Name : GeForce 9500 GT
PCI Device/Vendor ID : 64010de
PCI Location ID : 0:4:0
Board Serial : 1266914972570
Display : Connected
Temperature : 43 C
Fan Speed : 50%
Utilization
GPU : 0%
Memory : 6%
Power State : PSTATE 0
Power Capping : Disabled

If 6% of 512 MB were 332 MB, I would totally agree.

What could be the case is that only 180 MB are available as one contiguous block. Fragmented memory ...
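The fragmentation idea depends on one fact: an allocation needs a single contiguous free region. The total amount of free memory can therefore be much larger than the biggest request that will succeed. Here is a toy Python model. It does not model the actual CUDA allocator. The block sizes are invented so that they add up to the 430 MB free implied by the nvidia-smi reading, with 180 MB as the largest single block.

```python
def can_allocate(free_blocks_mb, request_mb):
    """Toy allocator check: a request fits only if one free block is large enough."""
    return any(block >= request_mb for block in free_blocks_mb)


# Hypothetical layout of free device memory (MB); not measured on any real GPU
free_blocks = [180, 150, 100]

total_free = sum(free_blocks)
largest = max(free_blocks)
print(f"total free {total_free} MB, largest block {largest} MB")
# prints: total free 430 MB, largest block 180 MB

for request in (48, 200):
    print(f"{request} MB request fits: {can_allocate(free_blocks, request)}")
# prints: 48 MB request fits: True
# prints: 200 MB request fits: False
```

In this model, whether a request succeeds depends on the largest free block, not on the total.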

But whenever I ran 'nvidia-smi -a', I never saw a memory usage higher than 16%. And 16% is still just 82 MB of memory used, with 430 MB free.
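The percentages in this post convert to megabytes as shown below. One caveat applies, hedged because the tool shipped with the 260.x driver may behave differently from current releases. The nvidia-smi documentation describes the `Memory` figure under `Utilization` as the percentage of time, over the past sample period, during which device memory was being read or written. That is not the same as the fraction of the 512 MB that is allocated. This Python sketch of the arithmetic takes the readings at face value as allocation percentages:

```python
TOTAL_MB = 512  # GeForce 9500 GT, per the CUDA log


def mb_from_percent(percent, total_mb=TOTAL_MB):
    """Interpret a percentage as a share of total device memory, in MB."""
    return total_mb * percent / 100


for pct in (6, 11, 16):
    used = mb_from_percent(pct)
    print(f"{pct:>2}% of {TOTAL_MB} MB = {used:.1f} MB used, {TOTAL_MB - used:.1f} MB free")

# The CUDA application's own status line reported 332 MB in use
print(f"332 MB of {TOTAL_MB} MB = {100 * 332 / TOTAL_MB:.1f}%")
```

Read this way, the readings never come close to the 332 MB (about 65%) that the application reported. If the nvidia-smi figure is in fact memory activity rather than occupancy, the two numbers simply measure different things.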
