Possible bad batch of faulty Gravity Wave tasks

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3040731341
RAC: 1947490
Topic 229172

Got a bunch of tasks on host 12788501, all of which errored out with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-nvidia: `--tlCompartments' requires `--FreqBand' to be divisible without rest by `--dFreq', is 7e-08

Suspending GW work for the time being.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119632084314
RAC: 24957400

Hi Richard,

Thanks for the heads-up. I presume you have messaged Bernd directly as well?

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3040731341
RAC: 1947490

No, I was just on my way out of the door when I saw them. And I doubt he would want to intervene at this time of a Friday night!

I see the same thing on host 12808716, but that's protected from further downloads by the same preference change.

cecht
Joined: 7 Mar 18
Posts: 1616
Credit: 3027710275
RAC: 1405269

Richard Haselgrove wrote:

Got a bunch of tasks on host 12788501, all of which errored out with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-nvidia: `--tlCompartments' requires `--FreqBand' to be divisible without rest by `--dFreq', is 7e-08

Suspending GW work for the time being.

I'm getting the same error with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-ati

These are all with the VelaJr tasks.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5888
Credit: 119632084314
RAC: 24957400

Richard Haselgrove wrote:

... I doubt he would want to intervene at this time of a Friday night!

I've sent him a message so hopefully he might get it in the morning.

He may want to use remote access, at least to stop the faulty tasks from being sent out. Otherwise nothing might happen until Monday.

Cheers,
Gary.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1129622552
RAC: 833967

It's just the 1.03 GW GPU tasks that are failing.  I am running some 1.03 GW CPU tasks with no issues.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3040731341
RAC: 1947490

Yes, mine were GPU tasks too (for NVidia), so it might be another application problem - I didn't have any CPU tasks for comparison. But from the error message I posted, it seemed possible that the search parameters were at fault. All possibilities will have to be explored.
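
For anyone wondering what "divisible without rest" amounts to in practice, here is a minimal sketch of that kind of check in Python. It is not the application's actual code: the function names, the tolerance and the sample values are all made up for illustration. The only thing taken from the error message is the idea that `--FreqBand` has to be a whole-number multiple of `--dFreq`, and that the leftover reported was 7e-08.

```python
import math


def divisibility_remainder(freq_band: float, d_freq: float) -> float:
    """Distance from freq_band to the nearest whole multiple of d_freq.

    Hypothetical helper: the real application may compute this differently.
    """
    if d_freq <= 0:
        raise ValueError("d_freq must be positive")
    nearest_multiple = round(freq_band / d_freq) * d_freq
    return abs(freq_band - nearest_multiple)


def is_divisible(freq_band: float, d_freq: float, rel_tol: float = 1e-9) -> bool:
    """True if freq_band is a whole multiple of d_freq, within rel_tol * d_freq.

    The tolerance is an assumed value, chosen only to absorb floating-point
    rounding; it is not taken from the application.
    """
    return divisibility_remainder(freq_band, d_freq) <= rel_tol * d_freq


# Illustrative values only (not the parameters of the failed tasks).
good_band = 0.4
bad_band = 0.4 + 7e-08  # off by a remainder like the one in the error message
step = 0.1

print(is_divisible(good_band, step))
print(is_divisible(bad_band, step))
print(divisibility_remainder(bad_band, step))
```

If the real check works along these lines, a frequency band that is off by a tiny amount in the workunit parameters would fail on every host regardless of GPU vendor, which fits what has been reported so far.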

Edit - I looked through some of my failed tasks on the original machine, but couldn't find any that had been reallocated to a CPU. They'd all been tried (and failed) by other users on GPUs, and then cancelled.

Link
Joined: 15 Mar 20
Posts: 137
Credit: 13099256
RAC: 26542

Richard Haselgrove wrote:
Edit - I looked through some of my failed tasks on the original machine, but couldn't find any that had been reallocated to a CPU. They'd all been tried (and failed) by other users on GPUs, and then cancelled.

O3MDF does not have any CPU apps; it would be very unusual if they suddenly became O3MD1 tasks.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3040731341
RAC: 1947490

Bernd cancelled the batch (see tech news).

It's been re-started now. I got one bad task on the first attempt, but the ones being issued now seem to be working.

mikey
Joined: 22 Jan 05
Posts: 12937
Credit: 1884473078
RAC: 32462

Richard Haselgrove wrote:

Bernd cancelled the batch (see tech news)

It's been re-started now. I got one bad task on the first attempt, but the ones being issued now seem to be working. 

Thanks.
