Interesting, the first link gives "The requested page could not be found"; the other 2 were on ATI cards.
Sorry... that first link had an extra space at the end. Here's a corrected link: https://einsteinathome.org/task/893410246
There was also this one, which had been reported in the same second as the middle one:
https://einsteinathome.org/task/893410245
But that one crashed with a different error after almost three hours of running:
Client state:Compute error Exit status:197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED
Error occured on Tuesday, October 29, 2019 at 16:55:49.
C:\Boinc_data\projects\einstein.phys.uwm.edu\einstein_O2MD1_2.02_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location 39de9522 in module C:\WINDOWS\System32\KERNELBASE.dll.
Well then all 3 were on ATI cards.
Yes, RX 580. Maybe those errors indicated some kind of hardware problem. It's strange that after those tasks in the middle there were tasks that ran fine and even validated successfully. Then there was that last error, and after that the tasks again seemed to run normally. But just as I was watching the progress... tasks suddenly started crashing again after about 40 secs. Four in a row... until I stopped crunching and reported them.
Boinc Event log said "Output file h1_0504..... for task h1_0504..... absent" for all those failing tasks.
(I shortened those task names)
And the corresponding Stderr outputs said the same as earlier for the three: "The target internal file identifier is incorrect. (0x72) - exit code 114 (0x72)".
I see there are also these lines:
ERROR: data missing at end of SFT#513 (GPS 1187586915.000000) expected bin 909462, bin 908999 read from file '..\..\projects\einstein.phys.uwm.edu\h1_0504.95_O2C02Cl1In0.UUap'
Plus many instances of some kind of "I/O error". That's why I think it may be a hardware problem or instability. I'll give that machine a fresh reboot and hope the errors won't keep coming in a row.
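The SFT error line above can be read as a simple count: the reader expected bin 909462 but read bin 908999, a shortfall of 909462 - 908999 = 463 bins at the end of SFT #513. A minimal Python sketch that pulls those numbers out of such a line, assuming the message always uses the exact wording quoted above; the regular expression and the function name `missing_bins` are hypothetical helpers, not part of the Einstein@Home app.

```python
import re

# Hypothetical pattern for the SFT error message quoted above; it assumes the
# wording "SFT#N (GPS T) expected bin A, bin B read" stays the same across tasks.
SFT_ERROR = re.compile(
    r"SFT#(?P<sft>\d+) \(GPS (?P<gps>[\d.]+)\) "
    r"expected bin (?P<expected>\d+), bin (?P<read>\d+) read"
)


def missing_bins(line: str):
    """Return (sft_index, gps_time, shortfall_in_bins), or None if the line does not match."""
    m = SFT_ERROR.search(line)
    if m is None:
        return None
    shortfall = int(m["expected"]) - int(m["read"])
    return int(m["sft"]), float(m["gps"]), shortfall


if __name__ == "__main__":
    line = (r"ERROR: data missing at end of SFT#513 (GPS 1187586915.000000) "
            r"expected bin 909462, bin 908999 read from file "
            r"'..\..\projects\einstein.phys.uwm.edu\h1_0504.95_O2C02Cl1In0.UUap'")
    print(missing_bins(line))  # (513, 1187586915.0, 463)
```

Counting the shortfall per task this way only shows how short the data was; it does not by itself tell a truncated download apart from a bug in how the application computes the expected bin range.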
edit: Windows upgraded itself, which took some time, but that also meant a few reboots. I restarted Boinc now, and one task which had already been running for a few minutes was able to finish in normal time (around 20 minutes). BUT... all the tasks that started in parallel with it seem to crash in about 40 secs. And they crash even when running 1x... with exit code 114. Same as earlier. I don't know what is wrong: is it this machine or those tasks? I've set NNT (No New Tasks) and stopped crunching on that host for now. There are about 30 tasks in the queue, but I'll leave them waiting for another session.
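The exit statuses in this thread are printed in both decimal and hex: 114 is 0x72 and 197 is 0xC5. A minimal Python sketch for decoding them, assuming this small lookup table: 114 read as the Windows system error code ERROR_INVALID_TARGET_HANDLE (whose text is the message quoted in the Stderr output), and 197 as BOINC's EXIT_TIME_LIMIT_EXCEEDED (as the client state line above shows). The table and the name `describe_exit` are illustrative only. Note that the Windows message may just be Windows' generic description of the number 114; the app may use that exit code for a different reason.

```python
# Illustrative lookup of the exit statuses reported in this thread.
# Assumption: 114 is interpreted as a Windows system error code and
# 197 as a BOINC client exit code; the table is not exhaustive.
KNOWN_EXIT_CODES = {
    114: "ERROR_INVALID_TARGET_HANDLE: The target internal file identifier is incorrect.",
    197: "EXIT_TIME_LIMIT_EXCEEDED",
}


def describe_exit(code: int) -> str:
    """Return the decimal and hex forms of an exit code plus a known meaning, if any."""
    meaning = KNOWN_EXIT_CODES.get(code, "unknown")
    return f"{code} (0x{code:X}): {meaning}"


if __name__ == "__main__":
    for code in (114, 197):
        print(describe_exit(code))
```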
Indeed there's still something wrong with the validator. File deletion has been turned off deliberately, so re-validation can be done easily. I'm working on it.
BM
I reinstalled (clean install) AMD driver 19.10.2, which had already been in use. Then I started Boinc and ran the remaining tasks.
All tasks with 'h1_0504..... O2MD1G2.... 505... Hz' in their names kept crashing after 40 secs.
Among the remaining tasks was one task from a different group, a white sheep named 'h1_0316... O2MD1n... 316... Hz'. This task completed without problems in 22-23 min and validated successfully.
Those different groups of tasks must differ in such a way that one group is compatible with my system and the other isn't. The O2MD1G2 tasks that crashed were sent to this host early today, the 30th. In fact, the white sheep was also sent at the same time.
Now this host received a few fresh O2MD1G2 505 Hz tasks and they just kept crashing again, so I can't let it download more.
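The observation above amounts to sorting tasks by the frequency in their names: the crashing ones are in the 505 Hz group (data file h1_0504.95_...), the one that succeeded is at 316 Hz. A minimal Python sketch of that triage, assuming the frequency is the number right after the `h1_` prefix, as in the data file name h1_0504.95_O2C02Cl1In0.UUap quoted earlier; the 1 Hz window around 505 Hz and the names `frequency_of` and `near_suspect_band` are hypothetical choices, since the full task names were shortened in the thread.

```python
import re

# Assumption: the frequency in Hz is the number right after the "h1_" prefix,
# e.g. "h1_0504.95_O2C02Cl1In0.UUap" -> 504.95. Real task names may differ.
FREQ_PREFIX = re.compile(r"h1_(\d+(?:\.\d+)?)")

# Hypothetical window around the suspect 505 Hz band.
SUSPECT_HZ = 505.0
WINDOW_HZ = 1.0


def frequency_of(name: str):
    """Extract the leading frequency from a name, or None if it has no h1_ prefix."""
    m = FREQ_PREFIX.search(name)
    return float(m.group(1)) if m else None


def near_suspect_band(name: str) -> bool:
    """True if the name's frequency lies within WINDOW_HZ of SUSPECT_HZ."""
    freq = frequency_of(name)
    return freq is not None and abs(freq - SUSPECT_HZ) <= WINDOW_HZ


if __name__ == "__main__":
    for name in ("h1_0504.95_O2C02Cl1In0.UUap", "h1_0316"):
        print(name, near_suspect_band(name))
```

A filter like this only groups tasks by band; it cannot say whether the problem lies in the host or in the workunits themselves.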
Could you report these? As of now we don't have any such (finished) tasks in the database.
BM
I reported them immediately after they ran.
Here are links to the tasks that I resumed after they had been suspended overnight:
https://einsteinathome.org/task/893546031
https://einsteinathome.org/task/893594967
https://einsteinathome.org/task/893612640
https://einsteinathome.org/task/893615702
https://einsteinathome.org/task/893626397
https://einsteinathome.org/task/893626407
https://einsteinathome.org/task/893626409
https://einsteinathome.org/task/893626415
https://einsteinathome.org/task/893634134
https://einsteinathome.org/task/893634385
https://einsteinathome.org/task/893634683
https://einsteinathome.org/task/893637055
https://einsteinathome.org/task/893637101
https://einsteinathome.org/task/893637129
https://einsteinathome.org/task/893637510
https://einsteinathome.org/task/893640140
https://einsteinathome.org/task/893640144
https://einsteinathome.org/task/893640149
https://einsteinathome.org/task/893647191
https://einsteinathome.org/task/893647409
https://einsteinathome.org/task/893647487
https://einsteinathome.org/task/893650041
https://einsteinathome.org/task/893655988
https://einsteinathome.org/task/893674476
https://einsteinathome.org/task/893674481
https://einsteinathome.org/task/893682704
https://einsteinathome.org/task/893682706
https://einsteinathome.org/task/893682773
https://einsteinathome.org/task/893682777
https://einsteinathome.org/task/893682838
https://einsteinathome.org/task/893710495
https://einsteinathome.org/task/893710499
https://einsteinathome.org/task/893719616
Here's the white sheep O2MD1Gn that ran successfully somewhere in the middle:
https://einsteinathome.org/task/893679387
Thanks a lot!
This looks like a problem with the workunit setup that we have already seen in O2AS. It seems to be an anomaly in the application code around 505 Hz. For now I turned off sending O2MD1 tasks until I know how to circumvent this.
BM
Mine all finished with an error, for example:
https://einsteinathome.org/task/893062516
I aborted the remaining tasks; 1.0.7 worked fine on that machine.