Platform:
AuthenticAMD AMD Ryzen 7 5800X 8-Core Processor [Family 25 Model 33 Stepping 2]
Number of processors: 8
Coprocessors: [2] AMD AMD Radeon (TM) R9 390 Series (8192MB)
Operating system: Microsoft Windows 11 Professional x64 Edition, (10.00.22621.00)
BOINC client version: 7.20.2
Memory: 65450.66 MiB
Cache: 512 KiB
Swap space: 69546.66 MiB
Total disk space: 930.78 GiB
Application: 1.03 Multi-Directional Gravitational Wave search on O3 (GPU) (GW-opencl-ati)
Normally I process WUs for MilkyWay@Home at RPI, where ALL my WUs validate, but recently that site went down and my work room was cold, so I returned to Einstein@Home for a few days. As far as I can tell, from March 31 to April 5 I processed about 176 WUs on this computer, of which 39 were judged valid and 137 invalid. On my Intel machine, 145 were deemed valid and 55 invalid.
My request is this: when a high proportion of a user's WUs is held invalid, could Einstein@Home send out a notice, similar to the one sent when a new version of the BOINC client is released, giving the user's valid/invalid statistics and one or more diagnostic messages telling the user why his or her WUs don't pass muster?

This forum, in a different thread, claims that most validate errors occur because of deficiencies in the user's computer. So prove it by giving us information pointing to the cause of the errors. This is a brand-new computer that I just built; if it is causing errors, I would much rather know sooner than later.

In addition, in Southeast Pennsylvania the power company compares consumers' electricity use by neighborhood, and every month it sends me a letter telling me that I use much more electricity than my neighbors. I import electricity from a power company in another state that uses all renewable sources, so I'm not worried about climate change. Still, generating a high proportion of invalid work units is a terrible waste of electricity, not to mention money. It would help a lot if Einstein@Home could tell me what I'm doing wrong.
Thank you in advance for your thoughtful consideration of this request.
Charles Elliott
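As a rough illustration of the kind of per-user check being requested here, the following sketch computes an invalid rate and decides whether a notice should go out. The threshold, minimum sample size, user ID, and notification hook are all hypothetical; this is not anything BOINC or Einstein@Home actually implements.

```python
from dataclasses import dataclass

@dataclass
class UserStats:
    user_id: int  # hypothetical identifier, not a real Einstein@Home user ID
    valid: int
    invalid: int

def invalid_rate(s: UserStats) -> float:
    total = s.valid + s.invalid
    return s.invalid / total if total else 0.0

def should_notify(s: UserStats, threshold: float = 0.5, min_results: int = 50) -> bool:
    # Require a minimum sample so a handful of bad WUs doesn't trigger a notice.
    return (s.valid + s.invalid) >= min_results and invalid_rate(s) > threshold

# The stats reported above: 39 valid, 137 invalid -> ~78% invalid, notice fires.
stats = UserStats(user_id=12345, valid=39, invalid=137)
if should_notify(stats):
    print(f"user {stats.user_id}: {invalid_rate(stats):.0%} invalid -- send diagnostic notice")
```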
Will you please rename the applications listed on the user's account page, Preferences tab, to correspond to the column names on the Einstein@Home server status page? As it stands, it is not possible to tell which statistics apply to which application.
This has been asked for many, many times, yet it hasn't been done so far. It would be nice if this time it finally gets done.
You may get more answers in the Crunchers Corner forum than this one.
I have no idea what's wrong or why it's failing for you, but have you tried updating to the Visual C++ Runtimes All-in-One 2.23 package yet? I have no idea if this will fix it, but I've updated it before to get other projects' GPU tasks working again.
https://www.majorgeeksoft.com/visual-c-runtime-installer-all-in-one/
I downloaded and installed the Visual C++ 2015-2022 / 14.24.31938 Redistributable Package (x64) as you suggested. Apparently only File Explorer and BitDefender use the new cruntime.dll; all the other programs use msvcrt.dll, which has an old file date. These results are according to Resource Monitor, the Windows built-in program.
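For what it's worth, one way to cross-check what Resource Monitor shows is to enumerate each process's mapped DLLs. Here is a minimal sketch assuming the third-party psutil package on Windows (run it elevated to see more processes); vcruntime140.dll is the usual file name of the VC++ 2015-2022 runtime, and the exact DLL names to look for are up to you.

```python
import psutil

def processes_using(dll_name: str) -> list[str]:
    """Names of running processes that have dll_name mapped."""
    hits = set()
    for proc in psutil.process_iter(["name"]):
        try:
            if any(m.path.lower().endswith(dll_name.lower())
                   for m in proc.memory_maps()):
                hits.add(proc.info["name"])
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            pass  # some system processes can't be inspected without admin rights
    return sorted(hits)

print("msvcrt.dll:      ", processes_using("msvcrt.dll"))
print("vcruntime140.dll:", processes_using("vcruntime140.dll"))
```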
It is the MeerKAT workunits that are failing, at a rate of over 50%. Hundreds of hours of processing time and thousands of kilowatt-hours of electricity are being wasted every day. Why can't the project administrators take an interest in this crap they are purveying? If these failures are the users' fault, then tell us what we can do about it. If it is the project's fault, then stop distributing a broken application.

Users are making great sacrifices to conserve electricity, water, gasoline, and other resources. Why can't Einstein@Home and the University of Wisconsin–Milwaukee get with the program?
On the other hand, I am also running the MeerKAT GPU tasks on my own GPUs, and my stats show that they can be run to completion and give valid results; you just have to figure out why yours aren't. The 50 Error tasks in my stats are mostly from me aborting tasks that were running on a GPU that I didn't want to run tasks on.
Bernd made an interesting comment in the thread on the issues with the 4090: the BRP7 applications for each platform were compiled against different FFT libraries.

So each application produces a slightly different answer because of the differences in the FFT libraries, which leads to many more invalids than the other GPU applications see.

That means a much, much higher chance of invalids when the wingmen use different card types: Nvidia won't match against AMD or Intel, Intel won't match against Nvidia or AMD, and AMD won't match against Nvidia or Intel. It is poor application design for the BRP7 application NOT to use the same code path for each card type.
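To see why small library-level differences turn into invalids, consider that a quorum validator has to compare wingmen's numbers within some tolerance. The sketch below is purely illustrative, not Einstein@Home's actual validator; the tolerance and noise levels are made up to show the effect.

```python
import numpy as np

def results_match(a: np.ndarray, b: np.ndarray, rtol: float = 1e-5) -> bool:
    # "Valid" when every value agrees within a relative tolerance.
    return np.allclose(a, b, rtol=rtol)

rng = np.random.default_rng(0)
reference = rng.random(1000)

# Same code path, only last-bit rounding noise: well inside tolerance.
wingman_same = reference * (1 + rng.normal(0, 1e-8, reference.shape))
# Different FFT library: systematically larger deviations, outside tolerance.
wingman_other = reference * (1 + rng.normal(0, 1e-4, reference.shape))

print(results_match(reference, wingman_same))   # True
print(results_match(reference, wingman_other))  # False
```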
Maybe it's the hardware differences between Nvidia, AMD, and Intel enforcing this? Internal number representation, bit widths, precision of vectorized floating-point numbers, rounding rules? Do we need a 21st-century version of a "Father of floating point" for GPUs, as William Kahan was when he designed the i8087 in 1979 and afterwards standardized a universal floating-point arithmetic (IEEE 754) for scientific computers?
Forum - Cruncher's Corner: Generic CPU discussion - ARCHAE86's comment on IEEE754 / i8087
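Even on one machine, IEEE 754 arithmetic already shows why bit-identical results across different code paths are hard: addition is not associative, so merely summing in a different order, as a different FFT implementation would, can change the result. A tiny demonstration:

```python
# IEEE 754 doubles: same three inputs, different grouping, different answer.
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed when added to -1e16 first
```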
Scrooge McDuck wrote: Maybe
)
An easy answer in the meantime would be to only validate each brand of GPU against itself, telling people that this could/WILL slow down validation a bit, but in the end it should produce more valid tasks for each individual user.
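A sketch of what that could look like on the server side. The data model and quorum rule here are invented for illustration and are not how the BOINC validator is actually structured: hold each result until enough wingmen with the same GPU vendor have reported, then compare only within that group.

```python
from collections import defaultdict

def group_by_vendor(results: list[dict]) -> dict[str, list[dict]]:
    groups = defaultdict(list)
    for r in results:
        groups[r["vendor"]].append(r)
    return groups

def ready_to_validate(results: list[dict], quorum: int = 2) -> list[str]:
    # Vendors whose result groups have reached quorum and can be compared.
    return [v for v, rs in group_by_vendor(results).items() if len(rs) >= quorum]

pending = [
    {"host": 1, "vendor": "NVIDIA", "value": 0.12345},
    {"host": 2, "vendor": "AMD",    "value": 0.12346},
    {"host": 3, "vendor": "NVIDIA", "value": 0.12345},
]
print(ready_to_validate(pending))  # ['NVIDIA'] -- AMD still waits for a wingman
```

The trade-off mikey mentions is visible here: the lone AMD result sits in the queue until a second AMD wingman reports, so validation latency grows, but unlike results are never compared against each other.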
It likely has nothing to do with precision, though the rounding issue has been brought up before.

It's the fact that different FFT libraries were used for each application. This is Bernd's comment:

FGRP - HIGH INVALID RATE ON NVIDIA 4090?

N.B. The GPU Users Group also uses one of Petri's custom Linux BRP7 applications built with CUDA 12, so that app uses Nvidia's cuFFT library. So again, a mismatch in FFT libraries against the stock BRP7 apps.
That would require the project admins and developers to run three different validator processes to segregate the results, and they would then still have to compare those results against the other card types' apps.