Observations on FGRBP1 1.17 for Windows

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225828263
RAC: 1055620

eeqmc2_52 wrote:
I started running FGRPopencl-beta-nvidia 1.17 WUs recently and am getting 90% ERRORs.  Is this typical of Beta WUs?  

No, it is not. I suspect your system got into an unusable state because of some transient error. If you have not already, I'd try a reboot, and if that does not help, a full power-down, leaving the machine disconnected from power for a while.

These lines in a typical one of your many stderr files may be a clue:

boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000]
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
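
For anyone with many stderr files to check, here is a minimal sketch of how one might flag this failure signature automatically. The patterns are copied from the excerpt above; the function name and the idea of passing the stderr text in as a string are my own assumptions, not anything provided by BOINC or the Einstein@Home application.

```python
import re

# Signatures taken from the stderr excerpt quoted above.
# Which lines matter most is an assumption made for illustration.
OPENCL_FAILURE_PATTERNS = [
    re.compile(r"boinc_get_opencl_ids returned \[0+ , 0+\]"),
    re.compile(r"Failed to get OpenCL platform/device info from BOINC"),
    re.compile(r"initialize_ocl returned error \[\d+\]"),
]


def find_opencl_init_failures(stderr_text):
    """Return the lines of stderr_text matching a known failure signature.

    Hypothetical helper: the caller is expected to have read the task's
    stderr output into a string already.
    """
    hits = []
    for line in stderr_text.splitlines():
        if any(p.search(line) for p in OPENCL_FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


sample = """boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000]
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]"""

for hit in find_opencl_init_failures(sample):
    print(hit)
```

If every failed task shows these lines, the tasks never got a usable OpenCL device from BOINC at all, which points at the driver or system state rather than at the work units.
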
Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Depending on the history of GPU driver installations and updates that computer has gone through, I would also consider starting with a complete reinstall of the Nvidia driver. Hints for that procedure:

1. Download Display Driver Uninstaller (DDU): http://www.wagnardsoft.com/?q=node/134

2. Extract DDU's installer package and run the program file as administrator.

3. Let DDU restart Windows into safe mode, let it do the cleaning and reboot.

4. Let Windows restart into normal mode and install the Nvidia driver: download the driver manually from the Nvidia website and choose "clean install" in the installer options. Let it reboot Windows at the end, or do that manually.

eeqmc2_52
Joined: 10 May 05
Posts: 38
Credit: 3688530183
RAC: 913630

IGNORE the message below:

I ran all night with zero failed WUs, but as soon as I started using my computer this morning, they returned en masse. Reverting back to Nvidia 376.09 HAS NOT RESOLVED MY ISSUES.

I was able to resolve my 90% ERROR problem on FGRPopencl-beta-nvidia WUs by reverting my video driver off 376.33 back to 376.09.  Nvidia 376.33 doesn't seem to like Einstein@Home GPU CUDA WUs.

There are only 10 kinds of people in the world, those that understand binary and those that don't!

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225828263
RAC: 1055620

Roughly 5 hours ago, new 1.17 work started being issued without the beta classification. I suspect this means that Windows work can be paired with Windows work, instead of requiring a trusted Linux quorum partner. This may somewhat cut down the rate of invalid result determinations caused by slight numeric differences arising from slight calculation order differences.

I have not had any of those yet on 1.17, but I have reached 11 on 1.16. That is just a few tenths of a percent, but it is far more than I am used to on previous Einstein applications running on the same hardware.

I doubt very much that 1.17 differs from 1.16 in this respect. My lack of 1.17 invalid results so far is probably just a pipeline delay effect, as the final declaration of such a result requires returns from at least two different quorum partners in series.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225828263
RAC: 1055620

This morning my host with the shortest task queue suddenly started taking much longer to finish 1.17 tasks. As I've had some downclocking trouble lately, I assumed the GTX 1050 had downclocked, but on review, temperature and monitoring data agree that it is running about normally.

Perhaps the longer WUs which were foreshadowed by Bernd have begun to be distributed?

The long string of work which finished in just over 24 minutes each had task IDs starting with LATeah2003L.

The new work which is taking longer and has much more WU to WU variation in time has task IDs starting with LATeah0010L.

Reviewing pending work queues for my other hosts, I see that LATeah0010L work is arriving at them with much longer estimated completion times than the considerable remaining 2003 work.  So that suggests this extra time was expected.  Some of the batches are just about at 5X longer predicted elapsed time.

 

Mumak
Joined: 26 Feb 13
Posts: 325
Credit: 3525037867
RAC: 1526504

archae86 wrote:
Perhaps the longer WUs which were foreshadowed by Bernd have begun to be distributed?

Yes: https://einsteinathome.org/goto/comment/153199

 

Mad_Max
Joined: 2 Jan 10
Posts: 154
Credit: 2214571377
RAC: 421551

Yes, I also see a new batch of much longer WUs from the LATeah0010L series. This is expected: they are marked as 525,000 GFLOP estimated computation size vs. 105,000 GFLOP for the initial beta batch.

So exactly 5X longer.

archae86
Joined: 6 Dec 05
Posts: 3157
Credit: 7225828263
RAC: 1055620

Mad_Max wrote:

Yes, i also see new batch of much longer WUs from LATeah0010L series. And it is expected - they marked as 525 000 GFLOP estimated computation size vs 105 000 GFLOP on initial beta batch.

So exactly 5X longer

Well, the estimate is exactly 5X longer. I've seen some variability among the units on a given machine, and definitely seen units that took a good bit less than 5X. What I've not seen vary so far is either the predicted GFLOP content or the credit award.

So far all my credit awards on these units are exactly 700.  While it is possible that 693 for the previous day's work was somewhat generous, I think 700 is very, very skimpy on average--based on what I've seen so far.
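
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes the 693-credit units were the ones from the 105,000 GFLOP batch and that a "fair" award would scale linearly with the estimated GFLOP size; both are assumptions for illustration, not project policy.

```python
# Figures quoted in this thread; the pairing of 693 credits with the
# 105,000 GFLOP batch is an assumption made for this sketch.
OLD_GFLOP, OLD_CREDIT = 105_000, 693
NEW_GFLOP, NEW_CREDIT = 525_000, 700


def credit_per_gflop(credit, gflop):
    """Credit awarded per estimated GFLOP of work."""
    return credit / gflop


size_ratio = NEW_GFLOP / OLD_GFLOP            # 5.0
proportional_credit = OLD_CREDIT * size_ratio  # 3465.0 under linear scaling

# How the new award rate compares with the old one (1.0 would mean equal).
relative_rate = (credit_per_gflop(NEW_CREDIT, NEW_GFLOP)
                 / credit_per_gflop(OLD_CREDIT, OLD_GFLOP))

print(f"estimated size ratio: {size_ratio:.1f}x")
print(f"credit if scaled linearly: {proportional_credit:.0f}")
print(f"new credit rate relative to old: {relative_rate:.1%}")
```

Since real run times on my hosts have often been a good bit less than 5X, the linear figure is an upper bound on what the estimate alone would justify, not a claim about what the award should be.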

Perhaps on review of the data the staff will adjust both the GFLOP estimates and the credit, perhaps including more unit-to-unit variability.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1591862356
RAC: 770466

IMO, if the new work is 5X longer, the credits should be 5X.

Logforme
Joined: 13 Aug 10
Posts: 332
Credit: 1714373961
RAC: 0

archae86 wrote:
Perhaps on review of the data the staff will adjust both the GFLOP estimates and the credit, perhaps including more unit-to-unit variability.

Herr Beer stated just that in the technical news section: https://einsteinathome.org/content/gamma-ray-pulsar-binary-search-1-gpus?page=17#comment-153199
