Pascal again available, Turing may be coming soon

archae86
Joined: 6 Dec 05
Posts: 3,116
Credit: 6,402,108,642
RAC: 3,152,348

Short story: I updated the card BIOS, no help.

Longer story.  I've never updated a graphics card BIOS.  Gigabyte's site had various conflicting references as to method, all of which indicated the need for an overhead application.  But the download contents had an executable instead of an obvious BIOS image, so I shut down most applications, held my breath, and double-clicked that executable.  A little window popped up, which asked if I really wanted to do this, then gave moderately reassuring progress indications.  I think the very last text had a comforting message that the deed had been done, but it vanished too quickly to get a confident read.

Anyway, I'm pretty sure it updated the BIOS, and my test case still terminates in a little over 20 elapsed seconds, with the same old primary error indication in stderr.txt.

 

bluestang
Joined: 13 Apr 15
Posts: 30
Credit: 2,396,784,529
RAC: 294,744

Sorry to hear.  Hope it gets straightened out for you through RMA.

When it was working, what kind of power draw did the GPU have when running E@H? How many concurrent tasks for that wattage?  Tx!

archae86
Joined: 6 Dec 05
Posts: 3,116
Credit: 6,402,108,642
RAC: 3,152,348

bluestang wrote:
When it was working, what kind of power draw did the GPU have when running E@H?

With the 2080 installed, but system idling, total power draw from the wall for my PC is 80 watts.  I don't know how much of that is idle power to the GPU, but think it is probably at least 20 watts.

Running 1X on Einstein Gamma-Ray-Pulsar at stock clocks the system draw is 236 watts, so an additional 156 over idle.  Most of that would be on the GPU card, but some would be in other system components drawing extra power to support the GPU, and a little would be in the system power supply for operating at less than 100% efficiency.

The highest system power draw I observed was running Einstein at 2X, with the clocks raised as high as I could get them while still working properly for at least a few hours.  258 watts system power, so 178 added over base idle.

Almost as high was for running 4X at stock clocks, 253 system power, so 173 added.  I believe I could push it to somewhat higher than 258 total by running 4X at overclock, probably up near my max 2X stable condition, but I did not explore that one.  

For this card running this application, the gain in productivity even for running 2x is rather small, and the further gains going to yet higher multiplicity are really tiny.  Given that the current Windows application wants a whole CPU instance in support, and that having several of them running noticeably degrades interactive performance on the PC, and that this is my personal primary use PC, I've actually decided to run the 2080 + 1060 configuration at 1X.  Right now both cards are running at 3/4 of the long-term overclocks I previously found for them, at 1X, and the box is burning about 338 watts.
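
As a quick check on the arithmetic in the post above, here is a minimal sketch that recomputes the "added over idle" figures from the quoted wall readings. The variable and function names are made up for illustration; the numbers are the ones given in the post, and they are whole-system draw at the wall, not GPU-only power.

```python
# Wall-power readings in watts, as quoted in the post above.
# These are whole-system figures measured at the wall, not GPU-only draw.
IDLE_WATTS = 80

readings = {
    "1X stock clocks": 236,
    "2X overclocked": 258,
    "4X stock clocks": 253,
}


def added_over_idle(system_watts, idle_watts=IDLE_WATTS):
    """Extra wall power over the idle baseline for a given running condition."""
    return system_watts - idle_watts


deltas = {label: added_over_idle(watts) for label, watts in readings.items()}

for label, extra in deltas.items():
    print(f"{label}: {readings[label]} W at the wall, {extra} W over idle")
```

Note that the idle baseline already includes the card's own idle draw (guessed above at 20 watts or more), so the card's total draw under load is probably somewhat different from these deltas.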

 

 

bluestang
Joined: 13 Apr 15
Posts: 30
Credit: 2,396,784,529
RAC: 294,744

Ok, thanks for the great info, really appreciate it.

Juha
Joined: 27 Nov 14
Posts: 49
Credit: 4,952,246
RAC: 39

archae86 wrote:

Separately, it occurred to me a day or two ago that the walled garden directory seems to be something that I can burn onto a CD and submit as a test case for an RMA or for a submission on Driver error.  

I've reviewed Newegg's rules, and currently plan to initiate an RMA on Monday, October 15, assuming I've not come across a solution by then.  This won't help unless my problem is a defective sample of the card, but that is an open possibility.

You know. Now that you have set up a way to test the app without BOINC you could reach out to other 2080 owners and ask them to run the test. At least Seti and GPUGRID seem to have people talking about 2080's.

I think it's more likely to be a driver problem and if other people can confirm that the test doesn't run on their cards either then that would save you from RMA hassle.

archae86
Joined: 6 Dec 05
Posts: 3,116
Credit: 6,402,108,642
RAC: 3,152,348

I did burn a CD with what I thought to be a portable test directory on it, but it is not portable enough to work properly on another of my own machines, which also runs Einstein using Pascal cards.  My test case fails fast there, and not in the way it fails fast on my Turing card.  Here is the end of stderr.txt.  Maybe someone can guess where I have erred.

19:21:08 (5828): [debug]: Set up communication with graphics process.
Running in standalone mode, so we take care of OpenCL platform/device management...
set_opencl_params(): ocl_config is NULL
Using OpenCL platform provided by: Intel(R) Corporation
Couldn't retrieve list of OpenCL GPU devices (error: -1)!
initialize_ocl returned error [2004]
OCL context null
OCL queue null
Error generating generic FFT context object [5]
19:21:09 (5828): [CRITICAL]: ERROR: MAIN() returned with error '5'
FPU status flags:
19:21:19 (5828): [normal]: done. calling boinc_finish(69).
19:21:19 (5828): called boinc_finish
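
One possible reading of the log above, offered as a sketch rather than a diagnosis: if the number in "(error: -1)" is the raw OpenCL status code (an assumption, since the application source is not shown here), then -1 is CL_DEVICE_NOT_FOUND in the Khronos cl.h header, which would fit the app having picked the Intel platform and found no GPU device on it. The helper name `describe_opencl_errors` is hypothetical, and the table lists only a few codes.

```python
import re

# A few OpenCL status codes as defined in the Khronos cl.h header.
OPENCL_ERRORS = {
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe_opencl_errors(stderr_text):
    """Return (code, symbolic name) for each '(error: N)' found in the log text.

    Assumes the logged number is the raw OpenCL status code.
    """
    codes = [int(m) for m in re.findall(r"\(error:\s*(-?\d+)\)", stderr_text)]
    return [(code, OPENCL_ERRORS.get(code, "unknown")) for code in codes]


sample = "Couldn't retrieve list of OpenCL GPU devices (error: -1)!"
print(describe_opencl_errors(sample))
```
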
Juha
Joined: 27 Nov 14
Posts: 49
Credit: 4,952,246
RAC: 39

Could you compare init_data.xml of running tasks between the machines?

According to the host list none of your machines have OpenCL drivers for Intel GPU installed. Is that correct? Have they had the driver installed previously?

archae86
Joined: 6 Dec 05
Posts: 3,116
Credit: 6,402,108,642
RAC: 3,152,348

Juha wrote:
Could you compare init_data.xml of running tasks between the machines?

Lots and lots of differences, most of which probably don't matter (for example, differences in minor preferences arising from the fact that I have the two machines in different locations (venues)).  But I suspect my portability problem dwells in there, somewhere.  See first thought below.

Quote:
According to the host list none of your machines have OpenCL drivers for Intel GPU installed. Is that correct? Have they had the driver installed previously?

All I have done, now or in the past, is to install drivers directly from the Nvidia web site.  However, the Nvidia driver on the non-2080 machine on which I did the portability test has not been updated in some months, while on the 2080 machine I've updated four times since the 2080 arrived.  Quite possibly the driver difference gives rise to this difference in a possibly important init_data.xml portion as I grabbed it today from the two machines.

 

2080 machine as it is right now (416.34 most current driver available)

<opencl_platform_version>OpenCL 1.2 CUDA 10.0.132</opencl_platform_version>
 <opencl_device_version>OpenCL 1.2 CUDA</opencl_device_version>
 <opencl_driver_version>416.34</opencl_driver_version>

Compatibility test machine as it is right now (388.13 driver, from a while back)

<opencl_platform_version>OpenCL 1.2 CUDA 9.1.75</opencl_platform_version>
 <opencl_device_version>OpenCL 1.2 CUDA</opencl_device_version>
 <opencl_driver_version>388.13</opencl_driver_version>

I think right now the first move is for me to update the compatibility test machine to be on the same driver version as my 2080 machine, and check again for compatibility.  I also have some suspect lines regarding slot number and device number, but I'll go after the driver difference first.
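
For anyone wanting to repeat the comparison without reading through the whole of init_data.xml by eye, here is a rough sketch that pulls out just the `opencl_*` elements from two copies and reports the ones that differ. It uses a regular expression rather than an XML parser, on the assumption that these elements sit on single lines as in the excerpts above; the function names are made up for illustration.

```python
import re


def opencl_fields(init_data_text):
    """Collect <opencl_*>value</opencl_*> elements from init_data.xml text into a dict."""
    pattern = r"<(opencl_\w+)>(.*?)</\1>"
    return {tag: value.strip() for tag, value in re.findall(pattern, init_data_text)}


def differing_fields(a, b):
    """Return {tag: (value_a, value_b)} for tags that differ or exist on one side only."""
    return {
        tag: (a.get(tag), b.get(tag))
        for tag in sorted(set(a) | set(b))
        if a.get(tag) != b.get(tag)
    }


# Excerpts as posted above for the two machines.
machine_2080 = """
<opencl_platform_version>OpenCL 1.2 CUDA 10.0.132</opencl_platform_version>
<opencl_device_version>OpenCL 1.2 CUDA</opencl_device_version>
<opencl_driver_version>416.34</opencl_driver_version>
"""
test_machine = """
<opencl_platform_version>OpenCL 1.2 CUDA 9.1.75</opencl_platform_version>
<opencl_device_version>OpenCL 1.2 CUDA</opencl_device_version>
<opencl_driver_version>388.13</opencl_driver_version>
"""

diff = differing_fields(opencl_fields(machine_2080), opencl_fields(test_machine))
for tag, (left, right) in diff.items():
    print(f"{tag}: {left!r} vs {right!r}")
```

In practice the two strings would be read from each machine's own copy of the file; the path to it depends on the BOINC installation and slot, so it is left out here.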

I failed until this moment to notice that your OpenCL question related to the Intel GPU (so my Nvidia driver comments above are not responsive).  I made some efforts to use Intel GPUs for Einstein computation perhaps a couple of years ago and gave it up as a bad proposition.  Low performance, poor compatibility with other tasks running on the system, questionable resource consumption reporting (I'm giving impressions from memory).  So I've not made any attempt to keep my Intel GPUs on drivers that might be able to run Einstein.

 

mmonnin
Joined: 29 May 16
Posts: 289
Credit: 3,047,521,292
RAC: 97,912

I noticed this morning that my queue has a higher # of tasks due to tasks completing a couple minutes quicker. Looks like for about the past day tasks have been quicker. Is it erroring out on the 2080 again? I see a LATeah0104R.dat was modified in the project directory at 11PM on the 16th which would have been around my 0.5 day queue time before the faster tasks started.

archae86
Joined: 6 Dec 05
Posts: 3,116
Credit: 6,402,108,642
RAC: 3,152,348

Yes, the new 104R tasks are, so far as I can tell, just like the previous N, P, and Q tasks.  On non 2080 cards they take substantially less elapsed time (hence I call them "high-pay").  They have somewhat different computational characteristics, and they fail on my 2080.  As I postponed the RMA decision thinking it unlikely my problem is actually a defective card, I no longer have that option.  So today I need to work more on the portability problem of my test case, then remove the 2080 card from service pending the return of low-pay work units.
