I started running FGRPopencl-beta-nvidia 1.17 WUs recently and am getting 90% ERRORs. Is this typical of Beta WUs?
No, it is not. I suspect your system changed to a non-usable state because of some transient error. If you have not already, I'd try a reboot, and if that does not help, a full power-down with some dwell time in the disconnected-from-power state.
These lines in a typical one of your many stderr files may be a clue:
boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000]
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
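A quick way to check whether the driver is currently exposing any OpenCL-capable NVIDIA GPU at all (one common reason the BOINC client ends up with no platform/device IDs to hand the app) is a small enumeration program along these lines. This is only an illustrative sketch against the standard OpenCL API, not the BOINC code path the application actually uses:

/* Minimal OpenCL enumeration check - illustrative only, not BOINC's code path.
   Build with something like: gcc cl_check.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;

    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        printf("No OpenCL platforms found - the driver/ICD is likely broken.\n");
        return 1;
    }

    for (cl_uint p = 0; p < nplat; p++) {
        char pname[256] = "";
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);

        cl_device_id devs[8];
        cl_uint ndev = 0;
        cl_int err = clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devs, &ndev);

        printf("Platform %u (%s): ", p, pname);
        if (err != CL_SUCCESS || ndev == 0) {
            printf("no GPU devices\n");
            continue;
        }
        for (cl_uint d = 0; d < ndev; d++) {
            char dname[256] = "";
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            printf("%s%s", dname, d + 1 < ndev ? ", " : "\n");
        }
    }
    return 0;
}

If no NVIDIA GPU shows up in a check like this, the problem sits below BOINC and no amount of BOINC-side fiddling will help.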
Depending on what history of GPU driver installations and updates that computer has gone through, I would possibly also start by completely reinstalling the Nvidia driver. Hints for that procedure:
1. Download Display Driver Uninstaller (DDU): http://www.wagnardsoft.com/?q=node/134
2. Extract DDU's installer package and run the program file as administrator.
3. Let DDU restart Windows into safe mode, let it do the cleaning, and reboot.
4. Let Windows restart back into normal mode and install the Nvidia driver: download the driver manually from the Nvidia website and choose "clean install" in the Nvidia installer options. Let it reboot Windows at the end, or do that manually.
IGNORE the message below: I ran all night with zero failed WUs, but as soon as I started using my computer this morning, they returned en masse. Reverting back to Nvidia 376.09 HAS NOT RESOLVED MY ISSUES.
I was able to resolve my 90% ERROR problem on FGRPopencl-beta-nvidia WUs by reverting my video driver from 376.33 back to 376.09. Nvidia 376.33 doesn't seem to like Einstein@Home GPU CUDA WUs.
There are only 10 kinds of people in the world: those who understand binary and those who don't!
Roughly 5 hours ago new 1.17 work started being issued without the beta Classification. I suspect this means that Windows work can be paired with Windows work, instead of requiring a trusted Linux quorum partner. This may cut down somewhat the rate of invalid result determinations caused by slight numeric differences arising from slight calculation order differences.
I have not had any of those yet on 1.17, but I have reached 11 on 1.16. That is just a few tenths of a percent, but it is far more than I am used to on previous Einstein applications running on the same hardware.
I doubt very much that 1.17 differs from 1.16 in this respect. My lack of 1.17 invalid results so far is probably just a pipeline delay effect as the final declaration of such a result requires returns from at least two different quorum partners in series.
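To illustrate the "slight calculation order differences" point: floating-point addition is not associative, so two hosts that sum the same values in a different order (different hardware, compiler, or GPU optimizations) can legitimately end up with slightly different results. A tiny illustrative example, not Einstein@Home code:

/* Floating-point addition is not associative: summing the same values in a
   different order can give slightly different results. Illustrative only. */
#include <stdio.h>

int main(void) {
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    float left_to_right = (a + b) + c;  /* = 0 + 1 = 1.0 */
    float reordered     = a + (b + c);  /* the 1 is lost to rounding, giving 0.0 */

    printf("(a + b) + c = %.1f\n", left_to_right);
    printf("a + (b + c) = %.1f\n", reordered);
    /* A validator comparing results from two hosts has to tolerate small
       differences of this kind, or one of the results gets marked invalid. */
    return 0;
}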
This morning my host with the shortest task queue suddenly started taking much longer to finish 1.17 tasks. As I've had some downclocking trouble lately, I assumed the GTX 1050 had downclocked, but on review the temperature and monitoring data agree it is running about normally.
Perhaps the longer WUs which were foreshadowed by Bernd have begun to be distributed?
The long string of work which finished in just over 24 minutes each had task IDs starting with LATeah2003L.
The new work, which is taking longer and has much more WU-to-WU variation in time, has task IDs starting with LATeah0010L.
Reviewing pending work queues for my other hosts, I see that LATeah0010L work is arriving at them with much longer estimated completion times than the considerable remaining 2003 work. So that suggests this extra time was expected. Some of the batches are just about at 5X longer predicted elapsed time.
Yes: https://einsteinathome.org/goto/comment/153199
Yes, I also see the new batch of much longer WUs from the LATeah0010L series. And it is expected: they are marked with an estimated computation size of 525,000 GFLOPs vs 105,000 GFLOPs for the initial beta batch.
So exactly 5X longer.
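As a quick sanity check of those figures, treating the "just over 24 minutes" LATeah2003L runtime above as the baseline and assuming the predicted elapsed time simply scales with the GFLOP estimate (an assumption on my part, not documented BOINC behaviour):

/* Back-of-the-envelope check of the estimate scaling quoted above.
   Assumes predicted elapsed time scales linearly with the estimated GFLOPs;
   that linear scaling is an assumption for illustration only. */
#include <stdio.h>

int main(void) {
    double old_gflops  = 105000.0;  /* initial beta batch (LATeah2003L) */
    double new_gflops  = 525000.0;  /* new LATeah0010L batch */
    double old_minutes = 24.0;      /* "just over 24 minutes" per task */

    double ratio = new_gflops / old_gflops;
    printf("Estimate ratio: %.1fx\n", ratio);                            /* 5.0x */
    printf("Implied new runtime: ~%.0f minutes\n", ratio * old_minutes); /* ~120 min */
    return 0;
}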
Well, the estimate is exactly 5X longer. I've seen some variability among the units on a given machine, and have definitely seen units that took a good bit less than 5X. What I've not seen any variability in so far is the predicted GFLOP content or the credit award.
So far all my credit awards on these units are exactly 700. While it is possible that 693 for the previous day's work was somewhat generous, I think 700 is very, very skimpy on average--based on what I've seen so far.
Perhaps on review of the data the staff will adjust both the GFLOP estimates and the credit, perhaps including more unit-to-unit variability.
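For what it's worth, here is the arithmetic behind "very, very skimpy", using only the figures quoted in this thread (693 credits for the roughly 24-minute LATeah2003L tasks, 700 credits for the roughly 5X longer LATeah0010L units); the runtimes are rough assumptions, not measured averages:

/* Rough credit-rate comparison using the figures quoted in the thread.
   Runtimes are approximate assumptions for illustration only. */
#include <stdio.h>

int main(void) {
    double old_credit = 693.0, old_minutes = 24.0;        /* LATeah2003L */
    double new_credit = 700.0, new_minutes = 5.0 * 24.0;  /* LATeah0010L, ~5X longer */

    printf("LATeah2003L: %.0f credits/hour\n", old_credit / old_minutes * 60.0);
    printf("LATeah0010L: %.0f credits/hour\n", new_credit / new_minutes * 60.0);
    /* Roughly 1730 vs 350 credits/hour on these assumed runtimes. */
    return 0;
}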
IMO if the new work is 5X longer, the credits should be 5X too.
Herr Beer stated just that (adjusting the estimates and the credit) in the technical news section: https://einsteinathome.org/content/gamma-ray-pulsar-binary-search-1-gpus?page=17#comment-153199