Maximum elapsed time exceeded

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0
Topic 195741

My last WU ran for over 25 h on my GPU, which is far too long, and it ended with the error mentioned in the title.

6.10.58

Maximum elapsed time exceeded

I usually have no problems with the app, so I'm a bit lost as to what happened and why.

Greetings from Sänger

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2988219926
RAC: 704545

What's your time zone?

I'm trying to correlate

Sent 8 Apr 2011 11:37:33 UTC 
Received 9 Apr 2011 15:20:21 UTC 


from the header for task 226440325, with

[15:57:41][32236][INFO ] Starting data processing...
[17:17:41][32236][INFO ] Checkpoint committed!


from

All the processing times logged seem monotonic, but we can't see from the data if an extra day crept in anywhere - that's the only way it could get up to 91,160.12 seconds of runtime. You didn't change the system clock at all, did you?

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Quote:
What's your time zone?


CEST, that's UTC+2
The clock changed by itself some time ago when summer time began, IIRC at the end of March.
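
For anyone who wants to line up the UTC times from the task header with local log times, here is a minimal Python sketch. It assumes a fixed UTC+2 offset for the whole period (fine for April 2011 in CEST, not a general time zone solution), and the function name is only illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset (CEST) applies to every timestamp involved.
CEST = timezone(timedelta(hours=2), "CEST")

def utc_header_to_local(stamp: str) -> datetime:
    """Parse a task-header time like '8 Apr 2011 11:37:33 UTC' and shift it to CEST."""
    parsed = datetime.strptime(stamp, "%d %b %Y %H:%M:%S UTC")
    return parsed.replace(tzinfo=timezone.utc).astimezone(CEST)

sent_local = utc_header_to_local("8 Apr 2011 11:37:33 UTC")
received_local = utc_header_to_local("9 Apr 2011 15:20:21 UTC")

print(sent_local.strftime("%d-%b-%Y %H:%M:%S"))      # 08-Apr-2011 13:37:33
print(received_local.strftime("%d-%b-%Y %H:%M:%S"))  # 09-Apr-2011 17:20:21
```

So the header times are simply two hours behind whatever the local logs show.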

Greetings from Sänger

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2143
Credit: 2988219926
RAC: 704545

Quote:
Quote:
What's your time zone?

CEST, that's UTC+2
The clock changed by itself some time ago when summer time began, IIRC at the end of March.


Presumably it only changed by an hour, not a whole day? That's what's missing.

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

The WU really did run for 25 h. My machine runs 24/7, and no other GPU project is active here. As you can see, it was the only WU for a whole day.
In my messages it looked like this:

08-Apr-2011 13:37:10 [Einstein@Home] Started download of PM0079_01451_252.binary
08-Apr-2011 13:37:10 [Einstein@Home] Started download of PM0079_01451_253.binary
08-Apr-2011 13:37:10 [Einstein@Home] Started download of PM0079_01451_254.binary
08-Apr-2011 13:37:22 [Einstein@Home] Finished download of PM0079_01451_253.binary
08-Apr-2011 13:37:22 [Einstein@Home] Started download of PM0079_01451_255.binary
08-Apr-2011 13:37:25 [Einstein@Home] Finished download of PM0079_01451_254.binary
08-Apr-2011 13:37:33 [Einstein@Home] Finished download of PM0079_01451_252.binary
08-Apr-2011 13:37:35 [Einstein@Home] Finished download of PM0079_01451_255.binary
08-Apr-2011 15:57:41 [Einstein@Home] Starting PM0079_01451.dm_252_0
08-Apr-2011 15:57:41 [Einstein@Home] Starting task PM0079_01451.dm_252_0 using einsteinbinary_BRP3 version 108
08-Apr-2011 15:58:45 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 15:59:46 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:00:48 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:01:50 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:02:51 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:03:51 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:04:53 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:05:55 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:06:57 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:07:58 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:08:59 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:10:00 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:11:02 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:12:04 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:13:04 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:14:06 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:15:07 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:16:08 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:17:10 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:18:12 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:19:14 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:20:15 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:21:17 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:22:19 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:23:20 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:24:22 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:25:24 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:26:25 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:27:26 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:28:27 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:29:28 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:30:30 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:31:33 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:32:34 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:33:36 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:34:38 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:35:39 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:36:40 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:37:41 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:38:42 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:39:43 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:40:45 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:41:47 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:42:48 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:43:50 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:44:51 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:45:53 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:46:55 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:47:57 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:48:58 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:49:59 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:51:01 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:52:03 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:53:04 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:54:05 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:55:06 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:56:08 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:57:10 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:58:11 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 16:59:12 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:00:14 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:01:15 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:02:17 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:03:19 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:04:21 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:05:23 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:06:24 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:07:25 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:08:27 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:09:28 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:10:29 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:11:32 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:12:34 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:13:35 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:14:37 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:15:39 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:16:40 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:17:42 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed
08-Apr-2011 17:19:51 [Einstein@Home] [sched_op_debug] Starting scheduler request
08-Apr-2011 17:19:52 [Einstein@Home] Sending scheduler request: To fetch work.
08-Apr-2011 17:19:52 [Einstein@Home] Requesting new tasks for GPU
08-Apr-2011 17:19:52 [Einstein@Home] [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
08-Apr-2011 17:19:52 [Einstein@Home] [sched_op_debug] NVIDIA GPU work request: 35.17 seconds; 0.00 GPUs
08-Apr-2011 17:19:57 [Einstein@Home] Scheduler request completed: got 1 new tasks
08-Apr-2011 17:19:57 [Einstein@Home] [sched_op_debug] Server version 611
08-Apr-2011 17:19:57 [Einstein@Home] Project requested delay of 60 seconds
08-Apr-2011 17:19:57 [Einstein@Home] [sched_op_debug] estimated total CPU job duration: 0 seconds
08-Apr-2011 17:19:57 [Einstein@Home] [sched_op_debug] estimated total NVIDIA GPU job duration: 6746 seconds
08-Apr-2011 17:19:57 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
08-Apr-2011 17:19:57 [Einstein@Home] [sched_op_debug] Reason: requested by project
08-Apr-2011 17:19:59 [Einstein@Home] Started download of PM0079_00791_368.binary
08-Apr-2011 17:19:59 [Einstein@Home] Started download of PM0079_00791_369.binary
08-Apr-2011 17:19:59 [Einstein@Home] Started download of PM0079_00791_370.binary
08-Apr-2011 17:20:19 [Einstein@Home] Finished download of PM0079_00791_370.binary
08-Apr-2011 17:20:19 [Einstein@Home] Started download of PM0079_00791_371.binary
08-Apr-2011 17:20:21 [Einstein@Home] Finished download of PM0079_00791_368.binary
08-Apr-2011 17:20:21 [Einstein@Home] Finished download of PM0079_00791_369.binary
08-Apr-2011 17:20:27 [Einstein@Home] Finished download of PM0079_00791_371.binary
09-Apr-2011 17:17:31 [Einstein@Home] Aborting task PM0079_01451.dm_252_0: exceeded elapsed time limit 91159.686236
09-Apr-2011 17:17:31 [Einstein@Home] [sched_op_debug] Deferring communication for 1 min 0 sec
09-Apr-2011 17:17:31 [Einstein@Home] [sched_op_debug] Reason: Unrecoverable error for result PM0079_01451.dm_252_0 (Maximum elapsed time exceeded)
09-Apr-2011 17:17:32 [Einstein@Home] Computation for task PM0079_01451.dm_252_0 finished
09-Apr-2011 17:17:32 [Einstein@Home] Output file PM0079_01451.dm_252_0_2 for task PM0079_01451.dm_252_0 absent
09-Apr-2011 17:17:32 [Einstein@Home] Output file PM0079_01451.dm_252_0_3 for task PM0079_01451.dm_252_0 absent
09-Apr-2011 17:17:32 [Einstein@Home] Starting PM0079_015B1.dm_140_1
09-Apr-2011 17:17:37 [Einstein@Home] Starting task PM0079_015B1.dm_140_1 using einsteinbinary_BRP3 version 108
09-Apr-2011 17:17:41 [Einstein@Home] Started upload of PM0079_01451.dm_252_0_0
09-Apr-2011 17:17:41 [Einstein@Home] Started upload of PM0079_01451.dm_252_0_1
09-Apr-2011 17:17:44 [Einstein@Home] Finished upload of PM0079_01451.dm_252_0_0
09-Apr-2011 17:17:49 [Einstein@Home] Finished upload of PM0079_01451.dm_252_0_1
09-Apr-2011 17:18:41 [Einstein@Home] [checkpoint_debug] result PM0079_015B1.dm_140_1 checkpointed
09-Apr-2011 17:19:38 [Einstein@Home] update requested by user
09-Apr-2011 17:19:42 [Einstein@Home] [checkpoint_debug] result PM0079_015B1.dm_140_1 checkpointed
09-Apr-2011 17:19:43 [Einstein@Home] [sched_op_debug] Starting scheduler request
09-Apr-2011 17:19:43 [Einstein@Home] Sending scheduler request: Requested by user.
09-Apr-2011 17:19:43 [Einstein@Home] Reporting 1 completed tasks, not requesting new tasks
09-Apr-2011 17:19:43 [Einstein@Home] [sched_op_debug] CPU work request: 0.00 seconds; 0.00 CPUs
09-Apr-2011 17:19:43 [Einstein@Home] [sched_op_debug] NVIDIA GPU work request: 0.00 seconds; 0.00 GPUs
09-Apr-2011 17:19:45 [Einstein@Home] Scheduler request completed
09-Apr-2011 17:19:45 [Einstein@Home] [sched_op_debug] Server version 611
09-Apr-2011 17:19:45 [Einstein@Home] Project requested delay of 60 seconds
09-Apr-2011 17:19:45 [Einstein@Home] [sched_op_debug] handle_scheduler_reply(): got ack for result PM0079_01451.dm_252_0

It stopped checkpointing at 08-Apr-2011 17:17:42 and ran until 09-Apr-2011 17:17:31, which is 24 hours later. The time up to the last checkpoint was about what a WU usually takes. I have no idea why it didn't finish correctly but somehow kept hanging there.
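
To double-check the arithmetic, here is a small Python sketch that computes the gaps between the three timestamps taken from the messages above. It only compares wall-clock times from the log, so it is an approximation of what the client counts as elapsed time, and the helper name is just for illustration.

```python
from datetime import datetime

FMT = "%d-%b-%Y %H:%M:%S"  # timestamp format used in the BOINC messages above

def seconds_between(start: str, end: str) -> float:
    """Wall-clock seconds between two log timestamps (no time zone handling)."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

task_start = "08-Apr-2011 15:57:41"       # "Starting task ..." line
last_checkpoint = "08-Apr-2011 17:17:42"  # final checkpoint_debug line
abort = "09-Apr-2011 17:17:31"            # "Aborting task ..." line

print(seconds_between(task_start, last_checkpoint))  # 4801.0
print(seconds_between(last_checkpoint, abort))       # 86389.0
print(seconds_between(task_start, abort))            # 91190.0
```

The total of 91190 s (about 25.3 h) is within roughly 30 s of the 91159.686236 limit quoted in the abort message, so the logged limit matches a task that really sat there for a full extra day after its last checkpoint.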

Greetings from Sänger

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 775661463
RAC: 1254555

Hi!

Hmm...the task failed to checkpoint at 17:18 on the 8th. It really looks like the task itself got stuck. I would keep an eye on that card.
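
If you want to catch this kind of hang earlier, a rough Python sketch like the one below could scan the client messages for tasks whose checkpoints have gone quiet. It assumes the checkpoint_debug log flag is enabled (as in the log above); the 10-minute threshold and the function names are hypothetical, and it makes no attempt to tell a hung task from one that has already finished.

```python
import re
from datetime import datetime, timedelta

FMT = "%d-%b-%Y %H:%M:%S"
# Matches lines such as:
# 08-Apr-2011 17:17:42 [Einstein@Home] [checkpoint_debug] result <task> checkpointed
CHECKPOINT = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) .*\[checkpoint_debug\] result (\S+) checkpointed"
)

def last_checkpoints(lines):
    """Return {task_name: datetime of its most recent checkpoint line}."""
    latest = {}
    for line in lines:
        m = CHECKPOINT.match(line)
        if m:
            latest[m.group(2)] = datetime.strptime(m.group(1), FMT)
    return latest

def stalled(lines, now, max_gap=timedelta(minutes=10)):
    """List tasks whose last checkpoint is older than max_gap (hypothetical threshold)."""
    return [task for task, when in last_checkpoints(lines).items() if now - when > max_gap]

sample = [
    "08-Apr-2011 17:16:40 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed",
    "08-Apr-2011 17:17:42 [Einstein@Home] [checkpoint_debug] result PM0079_01451.dm_252_0 checkpointed",
    "08-Apr-2011 17:19:52 [Einstein@Home] Sending scheduler request: To fetch work.",
]
print(stalled(sample, datetime(2011, 4, 8, 17, 20)))  # []
print(stalled(sample, datetime(2011, 4, 8, 18, 0)))   # ['PM0079_01451.dm_252_0']
```

With checkpoints normally arriving about once a minute, a gap of ten minutes or more would have flagged this task long before the elapsed time limit kicked in.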

CU
HB

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Quote:
I would keep an eye on that card.


I will.
But I wonder what's better for you: 100% Einstein on a GT240 (now) or 50% Einstein and 50% Milky on a less-than-100€ double-precision Nvidia (if I have to replace it ;)

Greetings from Sänger

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 775661463
RAC: 1254555

Quote:
Quote:
I would keep an eye on that card.

I will.
But I wonder what's better for you: 100% Einstein on a GT240 (now) or 50% Einstein and 50% Milky on a less-than-100€ double-precision Nvidia (if I have to replace it ;)

What kind of DP card do you have in mind? All the cheaper Fermi models seem to be severely limited in DP performance.

CU
HB

Saenger
Joined: 15 Feb 05
Posts: 403
Credit: 33009522
RAC: 0

Quote:
Quote:
Quote:
I would keep an eye on that card.

I will.
But I wonder what's better for you: 100% Einstein on a GT240 (now) or 50% Einstein and 50% Milky on a less-than-100€ double-precision Nvidia (if I have to replace it ;)

What kind of DP card do you have in mind? All the cheaper Fermi models seem to be severely limited in DP performance.


None in particular. I'll look at what's on the market and within my budget once the need arises.
And I'll check WUProp for performance figures on Einstein and Milky.

Greetings from Sänger
