Pascal again available, Turing may be coming soon

archae86
Joined: 6 Dec 05
Posts: 3,066
Credit: 5,904,999,322
RAC: 3,314,941

Richie wrote:
Nvidia driver 416.16 is available

Thanks.  I had not spotted that.  I've downloaded it, and will try installing it and make another high-pay WU attempt after I get back from the dentist a few hours from now.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 1,010,445,427
RAC: 1,326,123

BOINC v7.14.0 was also released for alpha testing today (Windows and Mac only, so far). Would anybody mind testing the GFlops peak report at startup, please?

Richie
Joined: 7 Mar 14
Posts: 651
Credit: 1,702,976,395
RAC: 485

4.10.2018 20:36:20 | | Starting BOINC client version 7.14.0 for windows_x86_64

...

4.10.2018 20:36:22 | | CUDA: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 411.63, CUDA version 10.0, compute capability 6.1, 4096MB, 3044MB available, 4568 GFLOPS peak)
4.10.2018 20:36:22 | | OpenCL: NVIDIA GPU 0: GeForce GTX 1060 6GB (driver version 411.63, device version OpenCL 1.2 CUDA, 6144MB, 3044MB available, 4568 GFLOPS peak)

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 1,010,446,427
RAC: 1,326,220

Thanks for the reply, but I was thinking specifically of the RTX 2xxx range (and matching Turing Teslas). Previous versions of BOINC will have overstated the speed by a factor of 2, but this new one should correct that.

archae86
Joined: 6 Dec 05
Posts: 3,066
Credit: 5,904,999,322
RAC: 3,314,941

My log has:

10/4/2018 1:26:54 PM | | CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 411.70, CUDA version 10.0, compute capability 7.5, 4096MB, 3553MB available, 10510 GFLOPS peak)
10/4/2018 1:26:54 PM | | OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 411.70, device version OpenCL 1.2 CUDA, 8192MB, 3553MB available, 10510 GFLOPS peak)

My notes from my initial installation record that the same point in the startup sequence reported 21020 then, exactly double the current value of 10510.  Mission accomplished. Thanks.
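The exact doubling is what one would expect if the estimate has the form SMs x cores per SM x 2 x clock and only the assumed cores-per-SM figure changed. The following is a minimal sketch of that arithmetic, not BOINC's actual code (which is not quoted in this thread). It assumes that form of the formula, the published figure of 46 SMs for the RTX 2080, and 64 FP32 cores per SM on Turing versus 128 on Pascal. The clock is not taken from the log; it is back-computed from the logged 10510 figure.

```python
def peak_gflops(sm_count: int, cores_per_sm: int, clock_ghz: float) -> float:
    """Peak single-precision GFLOPS, counting one fused multiply-add
    (2 floating-point operations) per core per clock cycle."""
    return sm_count * cores_per_sm * 2 * clock_ghz


RTX_2080_SMS = 46        # assumption: published spec, not shown in the log
REPORTED_GFLOPS = 10510  # from the 7.14.0 startup log quoted in this post

# Back-compute the clock implied by the reported figure at 64 cores per SM.
clock_ghz = REPORTED_GFLOPS / (RTX_2080_SMS * 64 * 2)

corrected = peak_gflops(RTX_2080_SMS, 64, clock_ghz)    # Turing-style count
overstated = peak_gflops(RTX_2080_SMS, 128, clock_ghz)  # Pascal-style count

print(f"implied clock: {clock_ghz:.3f} GHz")
print(f"64 cores/SM:   {corrected:.0f} GFLOPS")
print(f"128 cores/SM:  {overstated:.0f} GFLOPS")
```

Under these assumptions the implied clock comes out near 1.785 GHz, and using 128 cores per SM at the same clock reproduces the earlier 21020 figure.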

Now if you could just get my card, Einstein code, and the Nvidia driver to cooperate so the card could run the high-pay flavor of FGRP work units, I'd be a happy camper. Unfortunately, I don't think BOINC rates on my list of plausible suspects very highly at all.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2,059
Credit: 1,010,446,427
RAC: 1,326,220

And thanks for the report. I can go out to the pub and sleep easy!

archae86
Joined: 6 Dec 05
Posts: 3,066
Credit: 5,904,999,322
RAC: 3,314,941

Driver version 416.16 did not fix my high-pay WU failure.

The stderr lines I suppose to be most interesting read:

% Filling array of photon pairs
ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:948: clFinish failed. status=-36
ERROR: opencl_ts_2_phase_diff_sorted() returned with error 661281056
13:59:39 (5980): [CRITICAL]: ERROR: MAIN() returned with error '-36'
FPU status flags:  PRECISION
13:59:51 (5980): [normal]: done. calling boinc_finish(28).
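For readers comparing failures across drivers, the status value in the clFinish line can be turned into its symbolic OpenCL name. This is a minimal sketch: the helper name `parse_cl_failure` and the regular expression are illustrative assumptions based only on the single line shown above, and the table is a small subset of the status codes defined in the OpenCL headers. The name identifies the status, not the root cause.

```python
import re

# Small subset of OpenCL status codes as defined in CL/cl.h.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Pattern modelled on the one stderr line quoted above (an assumption about
# how the app formats such errors in general).
FAILURE_RE = re.compile(
    r"ERROR: (?P<file>\S+):(?P<line>\d+): (?P<call>\w+) failed\. "
    r"status=(?P<status>-?\d+)"
)


def parse_cl_failure(stderr_line: str):
    """Return (source file, line number, OpenCL call, status name) or None."""
    match = FAILURE_RE.search(stderr_line)
    if match is None:
        return None
    status = int(match.group("status"))
    name = OPENCL_STATUS.get(status, f"unknown status {status}")
    return match.group("file"), int(match.group("line")), match.group("call"), name


sample = ("ERROR: /home/bema/fermilat/src/bridge_fft_clfft.c:948: "
          "clFinish failed. status=-36")
print(parse_cl_failure(sample))
```

For the line above this reports clFinish failing with CL_INVALID_COMMAND_QUEUE at line 948 of bridge_fft_clfft.c.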

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5,551
Credit: 78,922,744,609
RAC: 64,789,980

archae86 wrote:
Driver version 416.16 did not fix my high-pay WU failure.

I'm sorry that nothing seems to be working for you.

You could try sending Bernd a PM to see if he has any thoughts.

The author of the GPU app is a volunteer, Christophe Choquet, who has an account here.  He also has been involved in a number of other projects.  He doesn't appear to be active in any at the moment.  It would also be worthwhile seeing if he would respond to a PM.  I reckon there might be a good chance that he may have ideas about what is causing this.

In any case, Bernd would know how to contact him if that were necessary so try PM'ing both :-).

Cheers,
Gary.

archae86
Joined: 6 Dec 05
Posts: 3,066
Credit: 5,904,999,322
RAC: 3,314,941

It seems unlikely there will be a near-term fix for my problem of my 2080 Turing card generating prompt failures on high-pay WUs.  My two primary solution avenues are:

1. Using the RMA procedure to trade my sample of the card for another sample of the same model (this is hoping I have a defective sample, which seems unlikely).
2. Testing each newly released Nvidia driver, hoping that by happy accident some developer somewhere stumbled on a reportable bug, Nvidia acted on it, and that fixing it fixes my issue.

Oddly, this puts me on two sides of a fence regarding Einstein WU generation.  Continued delivery of work of the most recent flavor (such as work from file LATeah1026L) seems likely to keep my card working, so I can leave it plugged in and continue to contribute to Einstein using this card.  But only work of the immediately previous flavor (such as work from file LATeah0104Q) has been seen to generate the problem.  Without such WUs in stock on my machine, I currently lack a means to test whether a new driver has fixed the problem.

I vaguely recall that years ago, very early in my Einstein participation, I dealt with a related problem by creating a sort of "walled garden".  The basic idea was to store on my computer an image of a sufficient portion of the Einstein environment to allow new processing of old work with newly released code versions.  Anyone remember the remarkable speedups obtained by code revisions by the person posting as akosf?  

I still have 22 of the "high-pay" units in cache, all received on September 30.  So for the moment I can keep them in stock by a somewhat tedious procedure of keeping them suspended save for brief periods when I download a gulp of fresh work.  I don't think their shelf life will expire for about another week.  But I'd like to free myself from my procedure, and suspect I may need a much longer shelf life.  So I need a refrigeration method.

Does anyone posting here know a method to accomplish the end I have in mind?  My vague idea is to stop BOINC, then do a full copy of the directory tree below c:\ProgramData\BOINC to somewhere else.  Then, I suppose, for each trial I'd need to copy the "refrigerated" copy to a third place just for that trial, and somehow start up BOINC pointing to that place instead of to ProgramData\BOINC.  Maybe instead of all of BOINC I'd need to start just the Einstein executable--with the same switches active?  I'll probably want to disable Internet access for my host during each such trial.
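The copy-and-redirect idea above could be sketched roughly as follows. This is an untested outline under stated assumptions, not a confirmed procedure: the function names are hypothetical, the BOINC client must already be stopped before the first copy, and the sketch relies on the client's `--dir` option for choosing its data directory. It only builds the command line rather than launching anything, and it does nothing about disabling network access.

```python
import shutil
from pathlib import Path


def freeze(live_dir, frozen_dir):
    """Copy the whole BOINC data tree once. Stop BOINC before calling this.
    copytree refuses to overwrite an existing destination, which protects
    an earlier frozen copy from being clobbered."""
    return Path(shutil.copytree(live_dir, frozen_dir))


def thaw_for_trial(frozen_dir, trials_root, label):
    """Make a disposable working copy of the frozen tree for one trial."""
    trial_dir = Path(trials_root) / label
    shutil.copytree(frozen_dir, trial_dir)
    return trial_dir


def client_command(boinc_exe, trial_dir):
    """Command line that would start the client against the trial copy.
    The executable path is whatever applies on the host (an assumption)."""
    return [str(boinc_exe), "--dir", str(trial_dir)]
```

A trial for a new driver would then be `freeze(...)` once, followed by `thaw_for_trial(...)` and `client_command(...)` for each attempt, so the frozen copy itself is never run and never changes.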

Any tips or advice on this would be warmly appreciated.

Keith Myers
Joined: 11 Feb 11
Posts: 2,314
Credit: 5,942,733,624
RAC: 13,246,067

Did anyone ever develop offline benchmark tools for Einstein like we have for Seti?  It is simple enough to copy the workunits over to a separate directory to run the benchmark apps in.  We do so all the time for Seti apps.  We even have a fanout generator tool that fetches a WU from the download directory, so we can grab suspicious WUs that others report before they clear the database and run them offline with different apps.

Anything similar here?