Discussion Thread for the Continuous GW Search known as O2MD1 (now O2MDF - GPUs only)

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592442348
RAC: 776761

Interesting, the first gives "The requested page could not be found"; the other 2 were on ATI cards.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Sorry... that first link had an extra space at the end... here's the corrected link: https://einsteinathome.org/task/893410246

There was also this one, which was reported in the same second as the middle one:

https://einsteinathome.org/task/893410245

But that one crashed with a different error after almost three hours of running:

Client state:Compute error Exit status:197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

Error occured on Tuesday, October 29, 2019 at 16:55:49.

C:\Boinc_data\projects\einstein.phys.uwm.edu\einstein_O2MD1_2.02_windows_x86_64__GW-opencl-ati.exe caused a Breakpoint at location 39de9522 in module C:\WINDOWS\System32\KERNELBASE.dll.

Betreger
Joined: 25 Feb 05
Posts: 992
Credit: 1592442348
RAC: 776761

Well then all 3 were on ATI cards.

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

Yes, RX 580. Maybe those errors indicate some kind of hardware problem. It's strange that after those tasks in the middle there were tasks that ran fine and even validated successfully. Then there was that last error, and after that tasks again seemed to run normally. But just as I was watching the progress... tasks suddenly started crashing again after about 40 secs. Four in a row... until I stopped crunching and reported them.

The BOINC event log said "Output file h1_0504..... for task h1_0504..... absent" for all those failing tasks.

(I shortened those task names)

And the corresponding stderr outputs said the same as earlier for the three: "The target internal file identifier is incorrect. (0x72) - exit code 114 (0x72)".

I see there are also these lines:

ERROR: data missing at end of SFT#513 (GPS 1187586915.000000) expected bin 909462, bin 908999 read from file '..\..\projects\einstein.phys.uwm.edu\h1_0504.95_O2C02Cl1In0.UUap'

Plus, many times, some kind of "I/O error". That's why I think it may be a hardware problem or instability. I will give that machine a fresh reboot and hope the errors don't keep coming in a row.
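For anyone comparing several of these stderr outputs, the "data missing" line quoted above can be pulled apart with a short script. This is only a rough sketch: the regular expression is written against the single line quoted in this post, so the exact format in other application versions is an assumption, and the names `parse_sft_error` and `bin_shortfall` are made up for illustration. The shortfall is just the arithmetic difference between the expected bin and the bin that was read.

```python
import re

# Pattern matched against the one "data missing" line quoted in this thread.
# Whether other app versions print exactly this format is an assumption.
SFT_ERROR = re.compile(
    r"ERROR: data missing at end of SFT#(?P<sft>\d+) "
    r"\(GPS (?P<gps>\d+(?:\.\d+)?)\) "
    r"expected bin (?P<expected>\d+), bin (?P<read>\d+) read from file "
    r"'(?P<path>[^']*)'"
)


def parse_sft_error(line):
    """Return the fields of a 'data missing' line as a dict, or None if no match."""
    m = SFT_ERROR.search(line)
    if m is None:
        return None
    expected = int(m.group("expected"))
    read = int(m.group("read"))
    return {
        "sft": int(m.group("sft")),
        "gps": float(m.group("gps")),
        "expected_bin": expected,
        "read_bin": read,
        # Plain difference between the two bin numbers in the message.
        "bin_shortfall": expected - read,
        "path": m.group("path"),
    }


# The line exactly as quoted in the post (raw strings keep the backslashes).
line = (r"ERROR: data missing at end of SFT#513 (GPS 1187586915.000000) "
        r"expected bin 909462, bin 908999 read from file "
        r"'..\..\projects\einstein.phys.uwm.edu\h1_0504.95_O2C02Cl1In0.UUap'")

info = parse_sft_error(line)
print(info["sft"], info["expected_bin"], info["read_bin"], info["bin_shortfall"])
```

Running the same function over every failing task's stderr would show whether the tasks all stop at the same SFT and bin, which would point at the data file or workunit rather than at random hardware errors.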

edit: Windows upgraded itself, which took some time, but that gave me the chance to do a few reboots too. I have now restarted BOINC, and one task that had already been running for a few minutes was able to finish in normal time (around 20 minutes). BUT... all the tasks that started in parallel with it seem to crash in about 40 secs. And they crash even when running 1x... with exit code 114, same as earlier. I don't know what is wrong. Is it this machine or those tasks? I've set NNT and stopped crunching on that host for now. There are about 30 tasks in the queue, but I'll leave them waiting for another session.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250601258
RAC: 34581

Indeed there's still something wrong with the validator. File deletion has been turned off deliberately, so re-validation can be done easily. I'm working on it.

BM

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I reinstalled (clean install) AMD driver 19.10.2, which had already been in use. Then I started BOINC and ran the remaining tasks.

All tasks with 'h1_0504..... O2MD1G2.... 505... Hz' in their names kept crashing in 40 secs.

Among the remaining tasks was one task from a different group, a white sheep with the name 'h1_0316... O2MD1n... 316... Hz'. This task completed without problems in 22-23 min and validated successfully.

Those different groups of tasks must differ in such a way that one group is compatible with my system and the other isn't. The O2MD1G2 tasks that crashed were sent to this host early today, on the 30th. In fact, the white sheep was sent at the same time.

Now this host has received a few fresh O2MD1G2 505Hz tasks and they just kept crashing again, so I can't let it download more.

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250601258
RAC: 34581

Richie wrote:
All tasks with 'h1_0504..... O2MD1G2.... 505... Hz' in their names kept crashing in 40 secs.

Could you report these? As of now we don't have any such (finished) tasks in the database.

BM

Richie
Joined: 7 Mar 14
Posts: 656
Credit: 1702989778
RAC: 0

I reported them immediately after they ran.

Here are links to the tasks that I resumed after they had been suspended overnight:

https://einsteinathome.org/task/893546031

https://einsteinathome.org/task/893594967

https://einsteinathome.org/task/893612640

https://einsteinathome.org/task/893615702

https://einsteinathome.org/task/893626397

https://einsteinathome.org/task/893626407

https://einsteinathome.org/task/893626409

https://einsteinathome.org/task/893626415

https://einsteinathome.org/task/893634134

https://einsteinathome.org/task/893634385

https://einsteinathome.org/task/893634683

https://einsteinathome.org/task/893637055

https://einsteinathome.org/task/893637101

https://einsteinathome.org/task/893637129

https://einsteinathome.org/task/893637510

https://einsteinathome.org/task/893640140

https://einsteinathome.org/task/893640144

https://einsteinathome.org/task/893640149

https://einsteinathome.org/task/893647191

https://einsteinathome.org/task/893647409

https://einsteinathome.org/task/893647487

https://einsteinathome.org/task/893650041

https://einsteinathome.org/task/893655988

https://einsteinathome.org/task/893674476

https://einsteinathome.org/task/893674481

https://einsteinathome.org/task/893682704

https://einsteinathome.org/task/893682706

https://einsteinathome.org/task/893682773

https://einsteinathome.org/task/893682777

https://einsteinathome.org/task/893682838

https://einsteinathome.org/task/893710495

https://einsteinathome.org/task/893710499

https://einsteinathome.org/task/893719616

Here's the white sheep O2MD1Gn that ran successfully somewhere in the middle:

https://einsteinathome.org/task/893679387

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4312
Credit: 250601258
RAC: 34581

Thanks a lot!

This looks like a problem with the workunit setup that we have already seen in O2AS. It seems to be an anomaly in the application code around 505Hz. For now I have turned off sending O2MD1 tasks until I know how to circumvent this.

BM

B.I.G
Joined: 26 Oct 07
Posts: 117
Credit: 1177109306
RAC: 983293

Mine all finished with an error, for example:

https://einsteinathome.org/task/893062516

I aborted the remaining tasks; 1.0.7 worked fine on that machine.
