Validate error - What this really means!

Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18764958960
RAC: 7162945

Beercules48 wrote:

The output file I have access to does not show any ERRORs. Only INFOs.

"

[01:44:25][2092][INFO ] Checkpoint committed!
[01:45:32][2092][INFO ] Checkpoint committed!
[01:46:08][2092][INFO ] OpenCL shutdown complete!
[01:46:08][2092][INFO ] Statistics: count dirty SumSpec pages 67364 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 1926214
[01:46:08][2092][INFO ] Data processing finished successfully!
01:46:08 (2092): called boinc_finish(0)

</stderr_txt>
]]>"

This is an excerpt of the output file of a task marked as "validate error". I checked the whole thing. No mention of an error.

Source: https://einsteinathome.org/task/1519993958

I have downloaded BoincTasks after I read that it solved the issue for another user, so we'll see. Still curious where you found that ERROR message because I couldn't. And I checked thoroughly, or so I thought....

Thanks again for the reply, it is appreciated! While that boosted my confidence in the BOINC community, the same cannot be said for the internals of BOINC as I experience them xD

This task did not error. It just computed a result that did not agree with the consensus of your wingmen.

You will get invalids even with no explicit task errors, usually because you are paired with two wingmen who are using the same platform and application. Small differences in the output file, caused by differences in the API, hardware, software implementations, or overclocking, can keep your result from matching the consensus.

It is just part of the crunching process.  Most projects accept a 5% invalid/error rate in the data returned as normal.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117754792099
RAC: 34835748

Beercules48 wrote:
The output file I have access to does not show any ERRORs. Only INFOs.

A validate error is not a compute error, so the output file can't contain a message about what went wrong: your BOINC client was unaware that anything had gone wrong.

A validate error only occurs when the compute results (not the stderr messages) are assessed for sanity after being returned to the project.  A validate error basically says that the actual compute results were total rubbish or had a format that was outside the specs for a result file, and so no comparison with the results from the duplicate task sent to your quorum partner could even be attempted.  This is all covered in excruciating detail in the opening message of this long-running thread, first posted back in 2011.
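
The two stages described above can be sketched in Python. This is only an illustration under loud assumptions, not the project's actual validator: a result is modelled as a plain list of numbers, and the function names, the tolerance, and the sanity rules are all made up for the example.

```python
import math

def is_sane(values):
    """Stage 1 (assumed rule): a result must be non-empty and fully finite."""
    return len(values) > 0 and all(math.isfinite(v) for v in values)

def agrees(a, b, rel_tol=1e-5):
    """Stage 2 (assumed rule): element-wise match within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def assess(mine, partner, rel_tol=1e-5):
    """Classify my result against one quorum partner that is assumed to be sane."""
    if not is_sane(mine):
        # The result is rubbish, so the comparison stage is never reached.
        return "validate error"
    if agrees(mine, partner, rel_tol):
        return "valid"
    return "invalid"

print(assess([1.0, float("nan")], [1.0, 2.0]))  # fails the sanity stage
print(assess([1.0, 2.0], [1.0, 2.1]))           # sane, but disagrees
```

In this sketch neither outcome touches stderr at all, which is why a task can finish with `boinc_finish(0)` and still be rejected later on the server side.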

Probably the most common reason for this happening is either too aggressive over/under-clocking or too aggressive over/under-volting.  If none of those things apply to you, the next most likely problem could be related to PSU output stability or cleanliness, etc., or to inadequate cooling of CPU or GPU.

I had a very quick look at your tasks list and saw that validate errors seemed to come from MeerKAT tasks done on the GPU.  The interesting bit was that there were long strings of 'good' results followed by long strings of validate errors.  You commented that you 'update' remotely.  Why do that?  The boinc client on that machine is quite capable of updating a batch of results when needed.

I know nothing about Windows - I've always used Unix/Linux - so maybe there is some corruption happening to results files when you force an update via a remote connection.  You should experiment with not forcing a remote update and see if the validate errors suddenly stop.

Cheers,
Gary.

Scrooge McDuck
Joined: 2 May 07
Posts: 1057
Credit: 17955983
RAC: 11712

Giorgio Cannella

Keith Myers wrote:
Your computers are still hidden when I just checked.

Giorgio Cannella wrote:

since September 4th, 2023, Einstein@Home projects on my pc show the report "Calculation error (1 CPU + 1 Intel GPU)".

[...]

"My current suspicion is that these validate errors do happen 'preferably' on 64Bit machines, either Linux or recent Mac OS versions.

Hello Giorgio,

Two things to distinguish:

  • Validate error: Task finished successfully but results do not match those of wingman(s) (who crunch the same task with identical parameters). Majority decision; minority loses: 'validate error'. Gary explained it perfectly here.
  • Computation error: the science app exits with a failure. The BOINC client reports a specific error ID to the Einstein servers. The detailed log file of the failed task can be accessed[*] in your account on the Einstein website: click on the affected computer, open its task list, find the specific task by name, and select it to see the full log file (the messages window of BOINC Manager shows only limited information). Example task (I took a failed task from one of Keith's computers as I have none). It failed because of memory allocation problems:
    Error in OpenCL context: CL_MEM_OBJECT_ALLOCATION_FAILURE error executing CL_COMMAND_NDRANGE_KERNEL on NVIDIA GeForce RTX 2080 (Device 0). 
    
    That's insufficient memory, missing VRAM presumably... too many tasks on GPU in parallel (I don't know anything about BOINC configuration for GPUs).

That's how to look for the causes of computation errors. Sometimes it's easy, sometimes not.

[*] To let others look at your task list and the corresponding result logs, you should enable (tick 'yes') the "show your computers on ... website" option in Einstein's privacy prefs. Don't forget to click "save changes" at the bottom of the preferences page. I could access Keith's computers for the example above because Keith allows it.
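
As a rough illustration of the distinction above, the summary fields on a task's web page can tell you where to look next. The Python sketch below assumes the page text has been copied as `Key:Value` lines; the field names `Outcome` and `Exit status` are the ones shown on Einstein@Home task pages, while the helper names and the exact wording matched for a validate error are assumptions made for the example.

```python
def parse_task_fields(text):
    """Turn 'Key:Value' lines copied from a task page into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields

def where_to_look(fields):
    """Suggest the next place to check, based on the Outcome field."""
    outcome = fields.get("Outcome", "").lower()
    if "computation error" in outcome:
        # The science app failed, so the exit status and stderr explain it.
        return "stderr log: " + fields.get("Exit status", "unknown exit status")
    if "validate error" in outcome:
        # The app finished cleanly, so stderr will not show the problem.
        return "returned result was rejected; stderr will not explain it"
    return "no error reported"

sample = """Outcome:Computation error
Exit status:198 (0x000000C6) EXIT_MEM_LIMIT_EXCEEDED
Validation state:Invalid"""

print(where_to_look(parse_task_fields(sample)))
```

For the sample fields, the hint points at the exit status, which is the first thing worth searching for.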
Giorgio Cannella
Joined: 2 May 19
Posts: 4
Credit: 658477
RAC: 0

"Your computers are still hidden when I just checked."

Hello Scrooge,

you are right.

I clicked Yes on "Should Einstein@Home show your computers on its website?", but I forgot to click on "Save changes".

Now I did it.

Sorry for my mistake.

You should be able to see my computer now.

Giorgio

Giorgio Cannella
Joined: 2 May 19
Posts: 4
Credit: 658477
RAC: 0

"Computation error: science app exits with failure. BOINC client reports a specific error ID to the einstein servers. Detailed logfile of failed task can be accessed[*] in your account at einstein website: click on affected computer, look into its tasklist, look for this specific task (task's name), select this task, to check the full logfile (messages window of BOINC manager shows only few information)."

Hello again Scrooge,

below I paste the full log of the last task from my PC on the Einstein@Home website.

Giorgio

TASK 1520959712

Name: LATeah3012L06_764.0_0_0.0_13423473_1
Workunit ID: 754766805
Created: 4 Sep 2023 6:03:21 UTC
Sent: 4 Sep 2023 7:54:17 UTC
Report deadline: 18 Sep 2023 7:54:17 UTC
Received: 5 Sep 2023 11:41:48 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 198 (0x000000C6) EXIT_MEM_LIMIT_EXCEEDED
Computer: 12775856
Run time (sec): 54.29
CPU time (sec): 51.91
Peak working set size (MB): 891.15
Peak swap size (MB): 929.19
Peak disk usage (MB): 0.01
Validation state: Invalid
Granted credit: 0
Application: Gamma-ray pulsar binary search #1 on GPUs v1.22 (FGRPopencl-intel_gpu) windows_x86_64

Stderr output

<core_client_version>7.22.2</core_client_version>
<![CDATA[
<message>
working set size > client RAM limit: 804.00MB > 802.38MB</message>
<stderr_txt>
10:28:01 (9580): [normal]: This Einstein@home App was built at: May  8 2019 13:29:27

10:28:01 (9580): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe'.
10:28:01 (9580): [debug]: 1e+016 fp, 2.2e+009 fp/s, 4716797 s, 1310h13m16s87
10:28:01 (9580): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L06.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 756.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L06_0764_13423473.dat --debug 0 --device 0 -o LATeah3012L06_764.0_0_0.0_13423473_1_0.out
output files: 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L06_764.0_0_0.0_13423473_1_0' 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L06_764.0_0_0.0_13423473_1_1'
10:28:01 (9580): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
10:28:01 (9580): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [00000000052aefe0 , 0000000005250850]
Using OpenCL platform provided by: Intel(R) Corporation
Using OpenCL device "Intel(R) HD Graphics 520" by: Intel(R) Corporation
Max allocation limit: 841357312
Global mem size: 1682714624
OpenCL device has FP64 support
read_checkpoint(): Couldn't open file 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880
13:39:26 (8872): [normal]: This Einstein@home App was built at: May 8 2019 13:29:27

13:39:26 (8872): [normal]: Start of BOINC application 'projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe'.
13:39:26 (8872): [debug]: 1e+016 fp, 2.2e+009 fp/s, 4716797 s, 1310h13m16s87
13:39:26 (8872): [normal]: % CPU usage: 1.000000, GPU usage: 1.000000
command line: projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe --inputfile ../../projects/einstein.phys.uwm.edu/LATeah3012L06.dat --alpha 2.59819959601 --delta -0.694603692878 --skyRadius 1.890770e-06 --ldiBins 15 --f0start 756.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.69860773e-15 --ephemdir ..\..\projects\einstein.phys.uwm.edu\JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah3012L06_0764_13423473.dat --debug 0 --device 0 -o LATeah3012L06_764.0_0_0.0_13423473_1_0.out
output files: 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah3012L06_764.0_0_0.0_13423473_1_0' 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah3012L06_764.0_0_0.0_13423473_1_1'
13:39:26 (8872): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86
13:39:26 (8872): [debug]: Set up communication with graphics process.
boinc_get_opencl_ids returned [000000000527a210 , 00000000052512f0]
Using OpenCL platform provided by: Intel(R) Corporation
Using OpenCL device "Intel(R) HD Graphics 520" by: Intel(R) Corporation
Max allocation limit: 841357312
Global mem size: 1682714624
OpenCL device has FP64 support
read_checkpoint(): Couldn't open file 'LATeah3012L06_764.0_0_0.0_13423473_1_0.out.cpt': No such file or directory (2)
% fft length: 16777216 (0x1000000)
% Scratch buffer size: 136314880

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007ffc5f3e08d2

Engaging BOINC Windows Runtime Debugger...

********************

BOINC Windows Runtime Debugger Version 7.3.0

Dump Timestamp : 09/05/23 13:40:40
Install Directory : C:\Program Files\BOINC\
Data Directory : C:\ProgramData\BOINC
Project Symstore :
LoadLibraryA( C:\Program Files\BOINC\\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\Program Files\BOINC\\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\Program Files\BOINC\\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\Program Files\BOINC\\version.dll ): GetLastError = 126
Loaded Library : version.dll
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\0;C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu

ModLoad: 0000000000400000 0000000000d22000 C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\hsgamma_FGRPB1G_1.22_windows_x86_64__FGRPopencl-intel_gpu.exe (-nosymbols- Symbols Loaded)

ModLoad: 00000000618a0000 0000000000209000 C:\Windows\SYSTEM32\ntdll.dll (6.2.22000.2295) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 0000000061000000 00000000000be000 C:\Windows\System32\KERNEL32.DLL (6.2.22000.2124) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 000000005f2e0000 0000000000384000 C:\Windows\System32\KERNELBASE.dll (6.2.22000.2295) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 0000000061130000 00000000000af000 C:\Windows\System32\ADVAPI32.dll (6.2.22000.1880) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 000000005f6f0000 00000000000a3000 C:\Windows\System32\msvcrt.dll (7.0.22000.1) (-exported- Symbols Loaded)
File Version : 7.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 7.0.22000.1

ModLoad: 00000000609d0000 000000000009e000 C:\Windows\System32\sechost.dll (6.2.22000.1880) (-exported- Symbols Loaded)
File Version : 10.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1

ModLoad: 0000000057520000 000000000001c000 C:\Windows\SYSTEM32\OpenCL.dll (3.0.1.0) (-exported- Symbols Loaded)
File Version : 3.0.1.0
Company Name : Khronos Group
Product Name : Khronos OpenCL ICD Loader
Product Version :

ModLoad: 000000005f0b0000 000000000009d000 C:\Windows\System32\msvcp_win.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1

ModLoad: 000000005f850000 0000000000121000 C:\Windows\System32\RPCRT4.dll (6.2.22000.2176) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 000000005f150000 0000000000111000 C:\Windows\System32\ucrtbase.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1

ModLoad: 0000000060120000 00000000007c5000 C:\Windows\System32\SHELL32.dll (6.2.22000.2245) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 0000000060c20000 00000000001ad000 C:\Windows\System32\USER32.dll (6.2.22000.1761) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 0000000061340000 0000000000376000 C:\Windows\System32\combase.dll (6.2.22000.1641) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 000000005ed30000 0000000000026000 C:\Windows\System32\win32u.dll (6.2.22000.2245) (-exported- Symbols Loaded)
File Version : 10.0.22000.2245 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.2245

ModLoad: 00000000609a0000 000000000002a000 C:\Windows\System32\GDI32.dll (6.2.22000.1880) (-exported- Symbols Loaded)
File Version : 10.0.22000.1880 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1880

ModLoad: 000000005ed60000 000000000011f000 C:\Windows\System32\gdi32full.dll (6.2.22000.2245) (-exported- Symbols Loaded)
File Version : 10.0.22000.2245 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.2245

ModLoad: 000000005e940000 000000000004c000 C:\Windows\SYSTEM32\cfgmgr32.dll (6.2.22000.2124) (-exported- Symbols Loaded)
File Version : 10.0.22000.2124 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.2124

ModLoad: 00000000600e0000 0000000000031000 C:\Windows\System32\IMM32.DLL (6.2.22000.1641) (-exported- Symbols Loaded)
File Version : 10.0.22000.1641 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1641

ModLoad: 000000005ddc0000 0000000000034000 C:\Windows\SYSTEM32\ntmarta.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 000000005c080000 0000000000038000 C:\Windows\SYSTEM32\dxcore.dll (6.2.22000.653) (-exported- Symbols Loaded)
File Version : 10.0.22000.653 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.653

ModLoad: 000000005dda0000 0000000000018000 C:\Windows\SYSTEM32\kernel.appcore.dll (6.2.22000.71) (-exported- Symbols Loaded)
File Version : 10.0.22000.71 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.71

ModLoad: 000000005f670000 000000000007f000 C:\Windows\System32\bcryptPrimitives.dll (6.2.22000.1455) (-exported- Symbols Loaded)
File Version : 10.0.22000.1455 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1455

ModLoad: 000000005f7a0000 00000000000af000 C:\Windows\System32\clbcatq.dll (2001.12.10941.16384) (-exported- Symbols Loaded)
File Version : 2001.12.10941.16384 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1516

ModLoad: 000000005c0e0000 00000000000f3000 C:\Windows\SYSTEM32\dxgi.dll (6.2.22000.2245) (-exported- Symbols Loaded)
File Version : 10.0.22000.2245 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.2245

ModLoad: 0000000052760000 0000000000044000 C:\Windows\SYSTEM32\directxdatabasehelper.dll (6.2.22000.653) (-nosymbols- Symbols Loaded)
File Version : 10.0.22000.653 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.653

ModLoad: 00000000279f0000 000000000055b000 C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b53c057d22ce6f37\igdrcl64.dll (23.20.101.2115) (-exported- Symbols Loaded)
File Version : 23.20.101.2115
Company Name : Intel Corporation
Product Name : Intel HD Graphics Drivers for Windows(R)
Product Version : 23.20.101.2115

ModLoad: 00000000610c0000 000000000006f000 C:\Windows\System32\WS2_32.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 00000000471a0000 0000000000409000 C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b53c057d22ce6f37\igdgmm64.dll (31.0.101.2115) (-exported- Symbols Loaded)
File Version : 31.0.101.2115
Company Name : Intel Corporation
Product Name : Intel HD Graphics Drivers for Windows(R)
Product Version : 31.0.101.2115

ModLoad: 000000002c950000 0000000000104000 C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b53c057d22ce6f37\igdfcl64.dll (31.0.101.2115) (-exported- Symbols Loaded)
File Version : 31.0.101.2115
Company Name : Intel Corporation
Product Name : Intel HD Graphics Drivers for Windows(R)
Product Version : 31.0.101.2115

ModLoad: 0000000060a70000 000000000019a000 C:\Windows\System32\ole32.dll (6.2.22000.1936) (-exported- Symbols Loaded)
File Version : 10.0.22000.2295 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Sistema operativo Microsoft® Windows®
Product Version : 10.0.22000.2295

ModLoad: 00000000434e0000 0000000003794000 C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b53c057d22ce6f37\igc64.dll (31.0.101.2115) (-exported- Symbols Loaded)
File Version : 31.0.101.2115
Company Name : Intel Corporation
Product Name : Intel HD Graphics Drivers for Windows(R)
Product Version : 31.0.101.2115

ModLoad: 0000000017750000 0000000004de5000 C:\Windows\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_b53c057d22ce6f37\opencl-clang64.dll (2.0.9.0) (-exported- Symbols Loaded)
File Version : 2.0.9.0
Company Name : Intel Corporation
Product Name : Intel® Front-end Library for OpenCL™ software
Product Version : 2.0.9.0

ModLoad: 0000000061250000 00000000000d6000 C:\Windows\System32\OLEAUT32.dll (6.2.22000.2176) (-exported- Symbols Loaded)
File Version : 10.0.22000.2176 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.2176

ModLoad: 000000005a440000 000000000000a000 C:\Windows\SYSTEM32\VERSION.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1

ModLoad: 000000005c710000 0000000000221000 C:\Windows\SYSTEM32\dbghelp.dll (6.2.22000.1) (-exported- Symbols Loaded)
File Version : 10.0.22000.1 (WinBuild.160101.0800)
Company Name : Microsoft Corporation
Product Name : Microsoft® Windows® Operating System
Product Version : 10.0.22000.1

*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 0, Write: 0, Other 0

- I/O Transfers Counters -
Read: 0, Write: 0, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 0, QuotaPeakPagedPoolUsage: 0
QuotaNonPagedPoolUsage: 0, QuotaPeakNonPagedPoolUsage: 0

- Virtual Memory Usage -
VirtualSize: 0, PeakVirtualSize: 0

- Pagefile Usage -
PagefileUsage: 0, PeakPagefileUsage: 0

- Working Set Size -
WorkingSetSize: 0, PeakWorkingSetSize: 0, PageFaultCount: 0

*** Dump of thread ID 5068 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x00007ffc5f3e08d2

- Registers -
rax=0000000000000000 rbx=000000000384eea0 rcx=000000000071ce83 rdx=000000000071ce82 rsi=0000000061023560 rdi=000000005f6f8010
r8=000000000384eea0 r9=0000000003132026 r10=0000000000000001 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000001 rip=000000005f3e08d2 rsp=000000000384e6b8 rbp=0000000000000001
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0384e6b0 005b062c 0384eea0 0071ce82 0384eea0 03132026 KERNELBASE!DebugBreak+0x0
0384fef0 005b084c 6101a6d0 00000000 fff0bdc0 00000000 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0384ff20 610155a0 00000000 00000000 00000000 00000000 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0384ff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
0384ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 3980 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=0000000000000034 rbx=000000000132eb70 rcx=0000000000000000 rdx=000000000132eb70 rsi=00000000002ee000 rdi=0000000000000000
r8=000000000132eb08 r9=0000000000989680 r10=0000000000000000 r11=0000000000000246 r12=000000000132ed08 r13=00000000015a1000
r14=0000000000000000 r15=0000000000000497 rip=00000000619442b4 rsp=000000000132eb08 rbp=0000000000989680
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0132eb00 618fb8c3 00000000 00000000 00000000 00000000 ntdll!ZwDelayExecution+0x0
0132eb30 5f34dc5b 00000000 00000000 0159fa60 00000000 ntdll!RtlDelayExecution+0x0
0132eb60 27ba570a 00000000 00000000 00000000 00000000 KERNELBASE!SwitchToThread+0x0
0132ec40 27ba907d 00000000 00000000 00000497 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0132ec90 27b4e049 00000000 00000000 0159fa60 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0132ed40 27a63939 00000000 27a623c9 052ae4d0 27a62295 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0132edc0 27a6381a 00000000 052ae4d0 0132eed0 00000000 igdrcl64!GTPin_Init+0x0
0132f000 27ac40c4 00000000 052ae4d0 0132f100 0132f1b8 igdrcl64!GTPin_Init+0x0
0132f030 27a13d32 00000000 00000000 00000000 00000000 igdrcl64!GTPin_Init+0x0
0132f150 0042294b 00000010 0132f1b0 00000000 00000000 igdrcl64!+0x0
0132f1d0 0041199d 01000000 0027cc10 01000000 ffffffff hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0132fa30 006d755a 5f784a60 006eb968 00000000 00000001 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0132fe20 004013f8 0000003d 00000002 00000000 00000000 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0132fef0 0040151b 00000000 00000000 00000000 00000000 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0132ff20 610155a0 00000000 00000000 00000000 00000000 hsgamma_FGRPB1G_1.22_windows_x8!+0x0
0132ff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
0132ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 11644 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=00000000000001df rbx=00000000014e2350 rcx=0000000000000018 rdx=00000000014e1fd0 rsi=00000000014b0c50 rdi=00000000014e2350
r8=0000000000000279 r9=0000000000000000 r10=000000003e5b4170 r11=000000007ffe0008 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000061947804 rsp=0000000003a4fc38 rbp=0000000000000000
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
03a4fc30 618b6cdf 014c5ff8 014c5ff8 014b0c50 00000000 ntdll!NtWaitForWorkViaWorkerFactory+0x0
03a4ff20 610155a0 00000000 00000000 00000000 00000000 ntdll!EtwNotificationRegister+0x0
03a4ff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
03a4ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 8772 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=00000000000001dc rbx=0000000000000000 rcx=00000000014ea480 rdx=0000000000000000 rsi=0000000000000000 rdi=00000000014ea480
r8=0000000000000000 r9=0000000000000002 r10=0000000000000000 r11=0000000000000246 r12=0000000000000000 r13=0000000000000000
r14=0000000000000102 r15=00000000014ea518 rip=00000000619477a4 rsp=000000000554fd68 rbp=000000000554fdc0
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0554fd60 6190ab47 00000000 00000000 00000000 00000000 ntdll!NtWaitForAlertByThreadId+0x0
0554fde0 5f34da19 014ea478 014ea470 014ea510 4de863bf ntdll!RtlSleepConditionVariableSRW+0x0
0554fe20 27c78bb9 00000000 27b53380 05e85660 27b538c2 KERNELBASE!SleepConditionVariableSRW+0x0
0554fe50 27c78cd6 4de8605f 014ea450 4de8602f 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0554fe80 27b53ad9 014ea470 00000000 014ea470 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0554fec0 27bfeed6 00080001 05265770 00000000 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0554fef0 27c98c7a 00000000 00000000 00000000 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0554ff20 610155a0 00000000 00000000 00000000 00000000 igdrcl64!clReleaseGlSharedEventINTEL+0x0
0554ff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
0554ffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 7252 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=00000000000001df rbx=00000000070d5ec0 rcx=0000000000000220 rdx=00000000070d5b40 rsi=00000000014def70 rdi=00000000070d5ec0
r8=000000000593c020 r9=00000000000000e0 r10=00000000000c0cc0 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000061947804 rsp=000000000eedfc38 rbp=0000000000000000
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0eedfc30 618b6cdf 014df088 014df088 014def70 00000000 ntdll!NtWaitForWorkViaWorkerFactory+0x0
0eedff20 610155a0 00000000 00000000 00000000 00000000 ntdll!EtwNotificationRegister+0x0
0eedff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
0eedffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Dump of thread ID 7520 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
rax=00000000000001df rbx=0000000000000000 rcx=0000000000000220 rdx=00000000070d6620 rsi=00000000014def70 rdi=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000 r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000 rip=0000000061947804 rsp=000000000f0dfc38 rbp=0000000000000000
cs=0033 ss=002b ds=0000 es=0000 fs=0000 gs=0000 efl=00000246

- Callstack -
ChildEBP RetAddr Args to Child
0f0dfc30 618b6cdf 00000000 00000000 014def70 00000000 ntdll!NtWaitForWorkViaWorkerFactory+0x0
0f0dff20 610155a0 00000000 00000000 00000000 00000000 ntdll!EtwNotificationRegister+0x0
0f0dff50 618a485b 00000000 00000000 00000000 00000000 KERNEL32!BaseThreadInitThunk+0x0
0f0dffd0 00000000 00000000 00000000 00000000 00000000 ntdll!RtlUserThreadStart+0x0

*** Debug Message Dump ****

*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...

</stderr_txt>
]]>


Beercules48
Beercules48
Joined: 18 Jun 23
Posts: 7
Credit: 55048156
RAC: 121903

I saw the same behaviour. A bunch of tasks will return no error, so one thinks all is well and the problem is fixed. Then all of a sudden it decides it doesn't like my tasks again and just throws them away... Sadly, with German energy prices being among the, if not THE, highest in the world, I cannot sustain that. I tried it twice; that is all I'm willing to do. I switched the rig over to a different citizen science project. Works flawlessly.

If the mere fact that I dare to update the project manually, and, *gasp*, from my main rig *remotely* is enough that the Validator(tm) panics and throws my results out, that to me sounds more like a BOINC/Einstein issue and not that smth on my part is wrong. The GPU doesn't overheat, not even close, I check the temps with hwinfo. Both hotspot and GPU core. Both are somewhere in the 60s°C after hours of running. PSU is a 750W unit and the GPU is limited by AMD to 150W. Also the W5700 Pros cannot (reasonably*) be overclocked so it is entirely stock. *with a custom BIOS it is possible but I haven't bothered.

I built the rig with Einstein@Home in mind, so it is sad to switch, but if the Validator(tm) doesn't like my tasks and won't tell me why, there isn't much I can do.

As always, thanks for the answers; that is much appreciated!

Keith Myers
Keith Myers
Joined: 11 Feb 11
Posts: 4968
Credit: 18764958960
RAC: 7162945

Trying to run GPU tasks on an Intel iGPU is always iffy, since it has to share memory resources and access with the CPU.

If you insist on running on the iGPU, I suggest not computing any CPU tasks in competition with it.

Also, Intel has historically updated its drivers very frequently.  Things can go from working one week to not working the next, until another update makes them work again.

Also since you are trying to use the Intel 530 you should review this thread about that gpu.

INTEL HD 530 AND UHD 730 TASKS IMMEDIATELY BEING ABORTED WITH THE ERROR "MISSING COPROCESSOR"

Bernd broke the application trying to make a working Mac Silicon application.

If the application is still broken, you should report it to Bernd in that thread.

Scrooge McDuck
Scrooge McDuck
Joined: 2 May 07
Posts: 1057
Credit: 17955983
RAC: 11712

Giorgio's persistent computation errors are clearly not validation errors, so they are off topic here.

Beercules48's problems with FGRPB1G tasks on the AMD Radeon GPU in his Ryzen 5 computer are clearly sporadically occurring validation errors. His FGRP7 (MeerKAT) tasks are either rejected immediately by the server with a "validate error" or, in a few cases, marked "invalid" only after the results were compared with a wingman's. So all of his FGRPB1G as well as FGRP7 tasks finished successfully. So the causes are presumably different.

I created a new thread in the "problems and bug report" section about Giorgio's problem, describing what I observed in the log files of his failing tasks on his Intel iGPU and what I observed for my own, always successful FGRPB1G tasks on a different type of Intel iGPU:

https://einsteinathome.org/de/content/fgrpb1g-mem-alloc-problem-low-memorybound-new-lateah3012l06-tasks-older-igpus

P.S.: I sympathize with the objection regarding German energy prices even if it is futile (and unhealthy... blood pressure)... Remedy: patience until next federal elections...

Gary Roberts
Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117754792099
RAC: 34835748

Beercules48 wrote:
.... Then all of a sudden it decides it doesn't like my tasks again and just throws them away...

I'm not sure what the "it" is that "doesn't like my tasks" that you refer to, but ...

Beercules48 wrote:
... is enough that the Validator(tm) panics and throws my results out ...

seems to indicate that you're blaming the validator.  I know full well I'm wasting my time, but others may benefit so I'll waste it anyway.

It's nothing to do with the validator, simply because the validator was never called.  Maybe the project should call these "Sanity Fails" (or somesuch) because there is a separate process that checks the 'sanity' of a result file to see that it meets a specification for what it should contain.  This is to ensure that the validator isn't burdened with rubbish results, just ones that meet the spec.  I have no idea what the spec contains or how rigidly compliance is enforced, but I do understand that things like the correct number of lines of data, the formatting of the fields in the file, fields containing only numeric values in a specific range, etc. (or things like that) are important.

These are the very sorts of things that might get a bit mangled if remote desktop software somehow mishandles things.  It was just a random suggestion.  I made it because you actually mentioned that someone else had suggested something like that and a vague memory of this happening in the past was triggered.  I don't use Windows at all, so I have no way of seeing directly if anything like this ever happens.

I mentioned it as a possibility purely because you are in the perfect position to prove or disprove the notion.  If you had bothered to let BOINC return results for a day or two, and checked for 'sanity fails' and then used a lot of remote updates for comparison, you should have gained a pretty compelling clue to where the fault lay.

The next step could have been to ask the Devs to grab one of these 'sanity fails' and disclose exactly what was causing the fail.  It may well have been something as simple as an extra blank line inserted somewhere (or something else quite trivial) and then something could have been done about it.
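
For anyone wondering what a 'sanity' check of this kind might look like, here is a minimal Python sketch.  To be clear, the rules in it (a fixed number of whitespace-separated fields per line, every field numeric and inside a range, no blank lines) are guesses made up purely for illustration, as are the function name and its parameters.  The project's real spec isn't known to me and may look quite different.

```python
import math


def sanity_check(text, expected_columns, low, high):
    """Return a list of problems found in a text result; an empty list means it passed.

    Hypothetical rules, NOT the project's real spec: every line must hold
    exactly `expected_columns` whitespace-separated finite numbers within
    [low, high], and blank lines are not allowed.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            problems.append(f"line {number}: blank line")
            continue
        if len(fields) != expected_columns:
            problems.append(
                f"line {number}: expected {expected_columns} fields, got {len(fields)}"
            )
            continue
        for field in fields:
            try:
                value = float(field)
            except ValueError:
                problems.append(f"line {number}: non-numeric field {field!r}")
                continue
            # NaN and infinities parse as floats, so reject them explicitly.
            if not math.isfinite(value) or not low <= value <= high:
                problems.append(f"line {number}: value {field} outside [{low}, {high}]")
    return problems
```

With rules like these, a single extra blank line in an otherwise perfect file is enough to fail the whole result, which is why knowing the exact reason for a fail would be so useful.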

Beercules48 wrote:
...that to me sounds more like a BOINC/Einstein issue and not that smth on my part is wrong. The GPU doesn't overheat, not even close, I check the temps with hwinfo. Both hotspot and GPU core. Both are somewhere in the 60s°C after hours of running. PSU is a 750W unit and the GPU is limited by AMD to 150W. Also the W5700 Pros cannot (reasonably*) be overclocked so it is entirely stock. *with a custom BIOS it is possible but I haven't bothered.

You weren't being blamed for "smth on my part".  You were being given a standard list of the sorts of things that have caused 'sanity fails' in the past.  These overwhelmingly have been caused by the hardware related issues listed.  I'm happy that your hardware is good.  I'd be even happier if a definitive cause could be found in this particular case.

Maybe the sanity check is too strict and maybe it could be improved as a result of knowing the real cause.

If you're still listening, why not see if you can sort this problem out?

Cheers,
Gary.

Scrooge McDuck
Scrooge McDuck
Joined: 2 May 07
Posts: 1057
Credit: 17955983
RAC: 11712

Gary Roberts wrote:
These are the very sorts of things that might get a bit mangled if remote desktop software somehow mishandles things.

Remote Desktop sessions immediately remove the GPU from the BOINC client (I don't know whether this applies to all types of GPU, but it certainly does to Intel iGPUs). As soon as a Remote Desktop session is initiated, a log message about the GPU being removed/disabled appears in the BOINC Manager's messages tab. That leaves at least one partly finished GPU task, plus further tasks in their initial state (0.00% progress), without the GPU they need. Sometimes the running GPU task fails or returns useless results. To regain the GPU within BOINC, the BOINC client has to be restarted, which isn't possible through a Remote Desktop session because the session blocks the GPU again. That's my experience with BOINC and remote-controlling Windows computers with Remote Desktop.

To get around the problem, I run SSH servers on my Windows hosts (installing Cygwin and configuring the necessary SSH packages on each system) and control the BOINC clients remotely via an SSH session and BOINC's command line interface, "boinccmd". Controlling specific tasks this way isn't comfortable at all, but it works. You can even stop or restart the BOINC client this way. I don't know whether all Windows versions (e.g. Windows 11) are affected by this problem. I avoid using Remote Desktop if possible. If I can't avoid it, I first stop GPU computing through an SSH session (boinccmd --set_gpu_mode never) to prevent failing GPU tasks. There are surely better or more comfortable options for remote-controlling a farm of Windows computers. I don't have one, so I'm fine with my primitive solution to prevent Remote Desktop from shredding any GPU task.
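
If you want to script that last step, something like the following Python sketch could wrap it. It only assembles the boinccmd call mentioned above and hands it to ssh. The host name "cruncher1" is a placeholder, and I'm assuming that ssh works without a password prompt and that boinccmd is on the remote PATH and allowed to talk to the local client; none of that is guaranteed on your setup. By default it only shows the command it would run.

```python
import shlex
import subprocess

VALID_MODES = ("always", "auto", "never")


def set_gpu_mode_args(mode, duration=0):
    """Build the boinccmd argument list that sets the GPU mode.

    `duration` is in seconds; 0 leaves it out, so the mode stays in force
    until it is changed again.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    args = ["boinccmd", "--set_gpu_mode", mode]
    if duration:
        args.append(str(int(duration)))
    return args


def run_over_ssh(host, args, dry_run=True):
    """Run `args` on `host` through ssh, or just return the command if dry_run."""
    command = ["ssh", host, shlex.join(args)]
    if dry_run:
        return command
    return subprocess.run(command, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # "cruncher1" is a hypothetical host name; replace it with your own.
    print(run_over_ssh("cruncher1", set_gpu_mode_args("never")))
```

Calling run_over_ssh(..., dry_run=False) before opening the Remote Desktop session, and again with "auto" afterwards, would mirror what I do by hand.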

Gary Roberts wrote:
I mentioned it as a possibility purely because you are in the perfect position to prove or disprove the notion.  If you had bothered to let BOINC return results for a day or two, and checked for 'sanity fails' and then used a lot of remote updates for comparison, you should have gained a pretty compelling clue to where the fault lay.

A naive thought: if the BOINC client's network connection is temporarily disabled on a remote host (e.g. 'boinccmd --set_network_mode never 3600'), finished result files pile up in Einstein's project directory (C:\ProgramData\BOINC\projects\einstein.phys.uwm.edu\). In a Cygwin SSH session, the path is /cygdrive/c/ProgramData/BOINC... The result files are easily identified by their long filenames ending in "_0", "_1", "_2", ... They can be copied elsewhere (without touching the originals in BOINC's directory). Regardless of filename and suffix, they have been gzipped by the BOINC client prior to upload, so one can unzip the copies (easiest is to rename the copied file to test.txt.gz first, then unzip) and check them with a text editor. Depending on the science app, some results are in binary format (which requires a hex editor), others are ASCII text containing columns of floating point numbers. Maybe one can have a look at the basic file or text structure to see whether something looks suspicious.
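
As a rough illustration of that idea, here is a small Python sketch that looks at a copy of a result file. It gunzips the copy if it starts with the gzip signature, then makes a crude guess at whether the content is ASCII text or binary. The function names are my own invention and the text/binary test is only a heuristic; only ever point it at copies, never at the files inside BOINC's directory.

```python
import gzip
from pathlib import Path


def read_result_copy(path):
    """Return the payload of a copied result file, gunzipping it if needed."""
    raw = Path(path).read_bytes()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(raw)
    return raw


def describe_payload(payload, preview_lines=5):
    """Classify the payload as 'text' or 'binary' and return a short preview.

    Crude heuristic: anything that decodes as pure ASCII is called text.
    Text gets its first lines back, binary gets a hex dump of its first 32 bytes.
    """
    try:
        text = payload.decode("ascii")
    except UnicodeDecodeError:
        return "binary", payload[:32].hex(" ")
    return "text", text.splitlines()[:preview_lines]
```

For a text result, the preview lines are a quick way to spot oddities such as a stray blank line or a column that is not numeric.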
