I started running FGRPopencl-beta-nvidia 1.17 WUs recently and am getting 90% ERRORs. Is this typical of Beta WUs?
No, it is not. I suspect your system changed to a non-usable state because of some transient error. If you have not already, I'd try a reboot, and if that does not help, a full power-down with some dwell time in the disconnected-from-power state.
These lines in a typical one of your many stderr files may be a clue:
boinc_get_opencl_ids returned [0000000000000000 , 0000000000000000]
Failed to get OpenCL platform/device info from BOINC (error: -1)!
initialize_ocl(): Got no suitable OpenCL device information from BOINC - boincPlatformId is NULL - boincDeviceId is NULL
initialize_ocl returned error [2004]
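A quick way to check whether the driver is currently exposing any OpenCL-capable NVIDIA GPU at all (one common reason the BOINC client ends up with no platform/device IDs to hand the app) is a small enumeration program along these lines. This is only an illustrative sketch against the standard OpenCL API, not the BOINC code path the application actually uses:

/* Minimal OpenCL enumeration check - illustrative only, not BOINC's code path.
   Build with something like: gcc cl_check.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;

    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        printf("No OpenCL platforms found - the driver/ICD is likely broken.\n");
        return 1;
    }

    for (cl_uint p = 0; p < nplat; p++) {
        char pname[256] = "";
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof pname, pname, NULL);

        cl_device_id devs[8];
        cl_uint ndev = 0;
        cl_int err = clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devs, &ndev);

        printf("Platform %u (%s): ", p, pname);
        if (err != CL_SUCCESS || ndev == 0) {
            printf("no GPU devices\n");
            continue;
        }
        for (cl_uint d = 0; d < ndev; d++) {
            char dname[256] = "";
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof dname, dname, NULL);
            printf("%s%s", dname, d + 1 < ndev ? ", " : "\n");
        }
    }
    return 0;
}

If no NVIDIA GPU shows up in a check like this, the problem sits below BOINC and no amount of BOINC-side fiddling will help.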
Depending on what history of GPU driver installations and updates that computer has gone through, I would possibly also start by completely reinstalling the Nvidia driver. Hints for that procedure:
1. Download Display Driver Uninstaller (DDU): http://www.wagnardsoft.com/?q=node/134
2. Extract DDU's installer package and run the program file as administrator.
3. Let DDU restart Windows into safe mode, let it do the cleaning, and reboot.
4. Let Windows restart back into normal mode and install the Nvidia driver: download the driver manually from the Nvidia website and choose "clean install" in the Nvidia installer options. Let it reboot Windows at the end, or do that manually.
IGNORE the message below: I ran all night with zero failed WUs, but as soon as I started using my computer this morning, they returned en masse. Reverting back to Nvidia 376.09 HAS NOT RESOLVED MY ISSUES.
I was able to resolve my 90% ERROR problem on FGRPopencl-beta-nvidia WUs by reverting my video driver from 376.33 back to 376.09. Nvidia 376.33 doesn't seem to like Einstein@Home GPU CUDA WUs.
There are only 10 kinds of people in the world: those who understand binary and those who don't!
Roughly 5 hours ago new 1.17 work started being issued without the beta Classification. I suspect this means that Windows work can be paired with Windows work, instead of requiring a trusted Linux quorum partner. This may cut down somewhat the rate of invalid result determinations caused by slight numeric differences arising from slight calculation order differences.
I have not had any of those yet on 1.17, but I have reached 11 on 1.16. That is just a few tenths of a percent, but it is far more than I am used to on previous Einstein applications running on the same hardware.
I doubt very much that 1.17 differs from 1.16 in this respect. My lack of 1.17 invalid results so far is probably just a pipeline delay effect as the final declaration of such a result requires returns from at least two different quorum partners in series.
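To illustrate the "slight calculation order differences" point: floating-point addition is not associative, so two hosts that sum the same values in a different order (different hardware, compiler, or GPU optimizations) can legitimately end up with slightly different results. A tiny illustrative example, not Einstein@Home code:

/* Floating-point addition is not associative: summing the same values in a
   different order can give slightly different results. Illustrative only. */
#include <stdio.h>

int main(void) {
    float a = 1.0e8f, b = -1.0e8f, c = 1.0f;

    float left_to_right = (a + b) + c;  /* = 0 + 1 = 1.0 */
    float reordered     = a + (b + c);  /* the 1 is lost to rounding, giving 0.0 */

    printf("(a + b) + c = %.1f\n", left_to_right);
    printf("a + (b + c) = %.1f\n", reordered);
    /* A validator comparing results from two hosts has to tolerate small
       differences of this kind, or one of the results gets marked invalid. */
    return 0;
}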
This morning my host with the shortest task queue suddenly started taking much longer to finish 1.17 tasks. As I've had some downclocking trouble lately, I assumed the GTX 1050 had downclocked, but on review the temperature and monitoring data agree it is running about normally.
Perhaps the longer WUs which were foreshadowed by Bernd have begun to be distributed?
The long string of work which finished in just over 24 minutes each had task IDs starting with LATeah2003L.
The new work, which is taking longer and has much more WU-to-WU variation in time, has task IDs starting with LATeah0010L.
Reviewing pending work queues for my other hosts, I see that LATeah0010L work is arriving at them with much longer estimated completion times than the considerable remaining 2003 work. So that suggests this extra time was expected. Some of the batches are just about at 5X longer predicted elapsed time.
Yes: https://einsteinathome.org/goto/comment/153199
Yes, I also see the new batch of much longer WUs from the LATeah0010L series. And it is expected: they are marked with an estimated computation size of 525,000 GFLOPs vs 105,000 GFLOPs for the initial beta batch.
So exactly 5X longer.
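As a quick sanity check of those figures, treating the "just over 24 minutes" LATeah2003L runtime above as the baseline and assuming the predicted elapsed time simply scales with the GFLOP estimate (an assumption on my part, not documented BOINC behaviour):

/* Back-of-the-envelope check of the estimate scaling quoted above.
   Assumes predicted elapsed time scales linearly with the estimated GFLOPs;
   that linear scaling is an assumption for illustration only. */
#include <stdio.h>

int main(void) {
    double old_gflops  = 105000.0;  /* initial beta batch (LATeah2003L) */
    double new_gflops  = 525000.0;  /* new LATeah0010L batch */
    double old_minutes = 24.0;      /* "just over 24 minutes" per task */

    double ratio = new_gflops / old_gflops;
    printf("Estimate ratio: %.1fx\n", ratio);                            /* 5.0x */
    printf("Implied new runtime: ~%.0f minutes\n", ratio * old_minutes); /* ~120 min */
    return 0;
}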
Well, the estimate is exactly 5X longer. I've seen some variability among the units on a given machine, and have definitely seen units that took a good bit less than 5X. What I've not seen any variability in so far is the predicted GFLOP content or the credit award.
So far all my credit awards on these units are exactly 700. While it is possible that 693 for the previous day's work was somewhat generous, I think 700 is very, very skimpy on average--based on what I've seen so far.
Perhaps on review of the data the staff will adjust both the GFLOP estimates and the credit, perhaps including more unit-to-unit variability.
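For what it's worth, here is the arithmetic behind "very, very skimpy", using only the figures quoted in this thread (693 credits for the roughly 24-minute LATeah2003L tasks, 700 credits for the roughly 5X longer LATeah0010L units); the runtimes are rough assumptions, not measured averages:

/* Rough credit-rate comparison using the figures quoted in the thread.
   Runtimes are approximate assumptions for illustration only. */
#include <stdio.h>

int main(void) {
    double old_credit = 693.0, old_minutes = 24.0;        /* LATeah2003L */
    double new_credit = 700.0, new_minutes = 5.0 * 24.0;  /* LATeah0010L, ~5X longer */

    printf("LATeah2003L: %.0f credits/hour\n", old_credit / old_minutes * 60.0);
    printf("LATeah0010L: %.0f credits/hour\n", new_credit / new_minutes * 60.0);
    /* Roughly 1730 vs 350 credits/hour on these assumed runtimes. */
    return 0;
}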
IMO if the new work is 5X longer, the credits should be 5X too.
Herr Beer stated just that (adjusting the estimates and the credit) in the technical news section: https://einsteinathome.org/content/gamma-ray-pulsar-binary-search-1-gpus?page=17#comment-153199