Possible bad batch of faulty Gravity Wave tasks

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3032055542
RAC: 1746577
Topic 229172

Got a bunch of tasks on host 12788501, all of which errored out with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-nvidia: `--tlCompartments' requires `--FreqBand' to be divisible without rest by `--dFreq', is 7e-08

Suspending GW work for the time being.
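
For readers unfamiliar with the message: it seems to say that the frequency band must be a whole-number multiple of the frequency step, and the trailing 7e-08 is presumably the leftover remainder (that reading is an assumption). The sketch below is not the project's code. It is a minimal Python illustration of that kind of check, with hypothetical names (`is_divisible`, `leftover`, `freq_band`, `d_freq`, `rel_tol`) and made-up values rather than the parameters of the failing tasks.

```python
import math


def leftover(freq_band: float, d_freq: float) -> float:
    """Distance from freq_band to the nearest whole multiple of d_freq.

    Hypothetical helper for illustration only; not taken from the
    Einstein@Home application.
    """
    if d_freq <= 0:
        raise ValueError("d_freq must be positive")
    bins = freq_band / d_freq
    return abs(bins - round(bins)) * d_freq


def is_divisible(freq_band: float, d_freq: float, rel_tol: float = 1e-9) -> bool:
    """True if freq_band / d_freq is an integer within a relative tolerance.

    The tolerance absorbs ordinary floating-point rounding, so a band such
    as 0.3 with a step of 0.1 still counts as divisible.
    """
    if d_freq <= 0:
        raise ValueError("d_freq must be positive")
    bins = freq_band / d_freq
    return math.isclose(bins, round(bins), rel_tol=rel_tol, abs_tol=0.0)


if __name__ == "__main__":
    print(is_divisible(1.0, 0.25))  # True: exactly 4 steps
    print(is_divisible(0.3, 0.1))   # True: 3 steps, up to rounding error
    print(is_divisible(1.0, 0.3))   # False: 3.33... steps
```

If the real check works along these lines, a tiny non-zero remainder would point at the band and step values chosen for the work units rather than at the GPU application itself, but that is speculation until the project confirms it.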

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119542138216
RAC: 24855081

Hi Richard,

Thanks for the heads-up. I presume you would have messaged Bernd directly as well?

Cheers,
Gary.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3032055542
RAC: 1746577

No, I was just on my way out of the door when I saw them. And I doubt he would want to intervene at this time of a Friday night!

I see the same thing on host 12808716, but that's protected from further downloads by the same preference change.

cecht
Joined: 7 Mar 18
Posts: 1612
Credit: 3022257023
RAC: 1362251

Richard Haselgrove wrote:

Got a bunch of tasks on host 12788501, all of which errored out with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-nvidia: `--tlCompartments' requires `--FreqBand' to be divisible without rest by `--dFreq', is 7e-08

Suspending GW work for the time being.

I'm getting the same error with:

einstein_O3MDF_1.00_x86_64-pc-linux-gnu__GW-opencl-ati

These are all with the VelaJr tasks.

Ideas are not fixed, nor should they be; we live in model-dependent reality.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5887
Credit: 119542138216
RAC: 24855081

Richard Haselgrove wrote:

... I doubt he would want to intervene at this time of a Friday night!

I've sent him a message so hopefully he might get it in the morning.

He may want to use remote access at least to stop sending faulty tasks. Otherwise nothing might happen until Monday.

Cheers,
Gary.

Ron Kosinski
Joined: 23 Mar 05
Posts: 57
Credit: 1126943680
RAC: 871281

It's just the 1.03 GW GPU tasks that are failing.  I am running some 1.03 GW CPU tasks with no issues.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3032055542
RAC: 1746577

Yes, mine were GPU tasks too (for NVidia), so it might be another application problem - I didn't have any CPU tasks for comparison. But from the error message I posted, it seemed possible that the search parameters were at fault. All possibilities will have to be explored.

Edit - I looked through some of my failed tasks on the original machine, but couldn't find any that had been reallocated to a CPU. They'd all been tried (and failed) by other users on GPUs, and then cancelled.

Link
Joined: 15 Mar 20
Posts: 137
Credit: 13098819
RAC: 38240

Richard Haselgrove wrote:
Edit - I looked through some of my failed tasks on the original machine, but couldn't find any that had been reallocated to a CPU. They'd all been tried (and failed) by other users on GPUs, and then cancelled.

O3MDF does not have any CPU apps; it would be very unusual if they suddenly became O3MD1 tasks.

.

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 3032055542
RAC: 1746577

Bernd cancelled the batch (see tech news)

It's been re-started now. I got one bad task on the first attempt, but the ones being issued now seem to be working.

mikey
Joined: 22 Jan 05
Posts: 12924
Credit: 1884454640
RAC: 44078

Richard Haselgrove wrote:

Bernd cancelled the batch (see tech news)

It's been re-started now. I got one bad task on the first attempt, but the ones being issued now seem to be working. 

thanks
