New BRP CUDA Apps 1.07 / 1.08

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138627
RAC: 563921


Quote:

That's bad. Does anyone know what's wrong with BOINC there, or how the project can work around that?

BM

Hi!

Judging by the output from BOINC Manager, although it can't determine the driver version, it does seem to detect that the driver supports CUDA 4.0 (coded as 4000).

The 270.* drivers seem to coincide with CUDA 4.0, so it should be safe to take this as an indication that the installed driver fixes the pre-270 bug.

Right?

CU
HB
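
The heuristic above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual scheduler code: the function names and the `BRP3cuda32fullCPU` plan-class name are hypothetical (the thread only speaks of a "fullCPU" app), and preferring an explicitly reported driver version over the CUDA version is an assumption. The decoding relies on CUDA reporting versions as 1000*major + 10*minor, which is how 4.0 becomes 4000.

```python
from typing import Optional, Tuple

# Thresholds taken from the discussion: the bug is fixed from driver 270.x on,
# and those drivers seem to coincide with CUDA 4.0 (encoded as 4000).
MIN_FIXED_DRIVER = (270, 0)
MIN_FIXED_CUDA = 4000


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_has_fix(driver: Optional[Tuple[int, int]],
                   cuda_encoded: Optional[int]) -> bool:
    """Guess whether the installed driver contains the 270.x fix.

    Assumption: if the driver version is known, trust it; otherwise fall back
    to the reported CUDA version; if neither is known, assume the old driver.
    """
    if driver is not None:
        return driver >= MIN_FIXED_DRIVER
    if cuda_encoded is not None:
        return cuda_encoded >= MIN_FIXED_CUDA
    return False


def choose_plan_class(driver: Optional[Tuple[int, int]],
                      cuda_encoded: Optional[int]) -> str:
    """Pick a plan class; 'BRP3cuda32fullCPU' is a hypothetical name."""
    if driver_has_fix(driver, cuda_encoded):
        return "BRP3cuda32nv270"
    return "BRP3cuda32fullCPU"
```

With an unknown driver version and a reported CUDA version of 4000, `choose_plan_class(None, 4000)` selects the nv270 app, which is the case described in this post.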

Jord
Joined: 26 Jan 05
Posts: 2952
Credit: 5779100
RAC: 0


I don't know what the X config options do, other than that they're configuration options for the X server. Someone running Linux had better check the man pages for that, or check, for example, this one for Ubuntu.

I have furthermore found that there was an earlier 270.x beta driver, which included the following:

Quote:
* New upstream beta release. Changes:
- Updated the NVIDIA kernel module to ensure that all system memory allocated by it for use with GPUs or within user-space components of the NVIDIA driver stack is initialized to zero. A new NVIDIA kernel module option, InitializeSystemMemoryAllocations, allows administrators to revert to the previous behavior.
- Added preliminary support for xserver 1.10.
- Reorganized the NVIDIA driver's /proc file system layout to better reflect current needs: /proc/driver/nvidia/cards/0..N has been moved to /proc/driver/nvidia/gpus/0..N/information
- Added new shared library: libnvidia-ml.so.
NVML provides programmatic access to static information and monitoring data for NVIDIA GPUs, as well as limited management capabilities. It is intended for use with Tesla compute products.
- Added a new X configuration option "3DVisionDisplayType" to specify the display type when NVIDIA 3D Vision is enabled with a non 3D Vision ready display.
- Fixed several bugs relating to hardware-accelerated gradients, which were causing visual corruption in some of the default Ubuntu GNOME themes.
- Modified colormap updates to no longer be synchronized to vblank. This allows applications to send XStoreColor and XStoreColors requests faster than the screen's refresh rate.
* Install libnvidia-ml.so links.
* Drop X ABI provides because it works with multiple video ABI's.
* Add libxv1 to build deps.

All these problems are probably there because these are beta drivers. Just as with beta BOINC, you just know there's something broken in them. ;-)

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244930768
RAC: 16526


Quote:

Judging by the output from BOINC Manager, although it can't determine the driver version, it does seem to detect that the driver supports CUDA 4.0 (coded as 4000).

The 270.* drivers seem to coincide with CUDA 4.0, so it should be safe to take this as an indication that the installed driver fixes the pre-270 bug.

Indeed this looks like a valid workaround for now. I'll look into this tomorrow.

BM


Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9562235
RAC: 0


With the new apps out this shouldn't come up anymore, but what will happen to WUs like this one? The CPU result that was marked invalid was mine, and I think it should probably be valid (if only because it otherwise never produces invalid results). I don't care about the credits, but is the WU a valid scientific result?

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244930768
RAC: 16526


The validation problems arise from a couple of numerical problems that were in the old CUDA apps. These errors affected the technical validation process, but Benjamin confirmed that they don't affect the scientific validity of the results.

BM


Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244930768
RAC: 16526


Quote:
Quote:

Judging by the output from BOINC Manager, although it can't determine the driver version, it does seem to detect that the driver supports CUDA 4.0 (coded as 4000).

The 270.* drivers seem to coincide with CUDA 4.0, so it should be safe to take this as an indication that the installed driver fixes the pre-270 bug.

Indeed this looks like a valid workaround for now. I'll look into this tomorrow.

BM

Done.

Unfortunately, I haven't seen any such requests in the scheduler log since the change, so I'm relying on your feedback here as to whether it's working now.

BM


Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686138627
RAC: 563921


Hi!

Seems to work: I got the nv270 1.08 app on a Linux box with a 270 driver.

Unfortunately, all my Linux boxes have this driver now, so somebody else needs to confirm that Linux boxes with an older driver (< 270.x) are getting the 1.08 fullCPU app.

[send] [HOST#1538267] Sending app_version einsteinbinary_BRP3 1 108 BRP3cuda32nv270; 22.61 GFLOPS
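
For anyone who wants to check their own scheduler logs for lines like the one above, here is a minimal Python sketch that pulls out the fields. It is not part of BOINC: the function and field names are hypothetical, the meaning of the lone "1" after the app name is not stated in this thread (so it is kept as an unnamed raw field), and reading "108" as version 1.08 is an assumption based on the app version named in this post.

```python
import re
from typing import NamedTuple, Optional


class SentAppVersion(NamedTuple):
    host_id: int
    app_name: str
    raw_field: str   # the "1" after the app name; meaning not given in the thread
    version: str     # e.g. "1.08", assuming 108 means major 1, minor 08
    plan_class: str
    gflops: float


# Pattern modelled on the single example line quoted above.
_SEND_RE = re.compile(
    r"\[send\] \[HOST#(?P<host>\d+)\] Sending app_version "
    r"(?P<app>\S+) (?P<raw>\d+) (?P<vernum>\d+) (?P<plan>[^;\s]+); "
    r"(?P<gflops>\d+(?:\.\d+)?) GFLOPS"
)


def parse_send_line(line: str) -> Optional[SentAppVersion]:
    """Return the parsed fields, or None if the line does not match."""
    m = _SEND_RE.search(line)
    if m is None:
        return None
    major, minor = divmod(int(m.group("vernum")), 100)
    return SentAppVersion(
        host_id=int(m.group("host")),
        app_name=m.group("app"),
        raw_field=m.group("raw"),
        version=f"{major}.{minor:02d}",
        plan_class=m.group("plan"),
        gflops=float(m.group("gflops")),
    )
```

Checking whether `plan_class` ends in `nv270` on a host's log lines would then show which variant the scheduler sent.
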
Ver Greeneyes
Joined: 26 Mar 09
Posts: 140
Credit: 9562235
RAC: 0


Quote:
The validation problems arise from a couple of numerical problems that were in the old CUDA apps. These errors affected the technical validation process, but Benjamin confirmed that they don't affect the scientific validity of the results.


Great, that puts my mind at ease :)

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0


Today at 14:30 CET I got these messages:

[pre]Mi 23 Feb 2011 14:30:56 CET Einstein@Home [sched_op_debug] Starting scheduler request
Mi 23 Feb 2011 14:30:56 CET Einstein@Home Sending scheduler request: To fetch work.
Mi 23 Feb 2011 14:30:56 CET Einstein@Home Reporting 1 completed tasks, requesting new tasks for GPU
Mi 23 Feb 2011 14:30:56 CET Einstein@Home [sched_op_debug] CPU work request: 0.00 seconds; 0 idle CPUs
Mi 23 Feb 2011 14:30:56 CET Einstein@Home [sched_op_debug] NVIDIA GPU work request: 101.91 seconds; 0 idle GPUs
Mi 23 Feb 2011 14:31:01 CET Einstein@Home Scheduler request completed: got 1 new tasks
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] Server version 611
Mi 23 Feb 2011 14:31:01 CET Einstein@Home Project requested delay of 60 seconds
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] estimated total CPU job duration: 0 seconds
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] estimated total NVIDIA CPU job duration: 5283 seconds
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] handle_scheduler_reply(): got ack for result PM0047_03311.dm_316_0
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] Deferring communication for 1 min 0 sec
Mi 23 Feb 2011 14:31:01 CET Einstein@Home [sched_op_debug] Reason: requested by project
Mi 23 Feb 2011 14:31:03 CET Einstein@Home Started download of einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
Mi 23 Feb 2011 14:31:03 CET Einstein@Home Started download of PM0047_03521_56.binary
Mi 23 Feb 2011 14:31:03 CET Einstein@Home Started download of PM0047_03521_57.binary
Mi 23 Feb 2011 14:31:16 CET Einstein@Home Finished download of PM0047_03521_57.binary
Mi 23 Feb 2011 14:31:16 CET Einstein@Home Started download of PM0047_03521_58.binary
Mi 23 Feb 2011 14:31:24 CET Einstein@Home Finished download of einsteinbinary_BRP3_1.08_i686-pc-linux-gnu__BRP3cuda32nv270
Mi 23 Feb 2011 14:31:24 CET Einstein@Home Started download of PM0047_03521_59.binary
Mi 23 Feb 2011 14:31:31 CET Einstein@Home Finished download of PM0047_03521_58.binary
Mi 23 Feb 2011 14:31:32 CET Einstein@Home Finished download of PM0047_03521_56.binary
Mi 23 Feb 2011 14:31:36 CET Einstein@Home Finished download of PM0047_03521_59.binary
[/pre]

It's now at 58% and has produced these messages since it started:

[pre]Mi 23 Feb 2011 16:53:56 CET Einstein@Home Computation for task PM0047_03471.dm_400_0 finished
Mi 23 Feb 2011 16:53:56 CET Einstein@Home Starting PM0047_03521.dm_56_1
Mi 23 Feb 2011 16:54:59 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
Mi 23 Feb 2011 16:56:00 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
snip
Mi 23 Feb 2011 17:44:20 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
Mi 23 Feb 2011 17:45:22 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
snip
Mi 23 Feb 2011 17:54:35 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
Mi 23 Feb 2011 17:55:36 CET Einstein@Home [checkpoint_debug] result PM0047_03521.dm_56_1 checkpointed
[/pre]

Seems fine so far.

Edith says:
It finished fine as well: 220809559. It needed more wall-clock time but less CPU time, and 4 WUs ran in parallel. It took ~16% of one core, plus something for Xorg.

Greetings from Sänger

Pete Burgess
Joined: 7 Dec 05
Posts: 21
Credit: 318570870
RAC: 0


Quote:

Seems to work: I got the nv270 1.08 app on a Linux box with a 270 driver.

Unfortunately, all my Linux boxes have this driver now, so somebody else needs to confirm that Linux boxes with an older driver (< 270.x) are getting the 1.08 fullCPU app.

I'm still on the 260 driver here for the moment and can confirm getting the 1.08 fullCPU app.
