ABP2 CPU-only applications

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137331180
RAC: 18170

I got a whole bunch last night which I have completed.

This seems to have validated against another Win7 x64 box, but the XP x64 host got a validate error. It might not be related, because I got the WU afterwards.

Checking the successful WUs, some of them validated against Linux boxes.

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686179252
RAC: 550603

Message 96492 in response to message 96491

Quote:

I got a whole bunch last night which I have completed.

This seems to have validated against another Win7 x64 box but the XP x64 got a validate error. It might not be related because I got the wu afterwards.

From checking the successful wu some of them are against linux boxes


There seems to be something wrong with this particular x64 host: it produces lots of validation errors, and if you look inside the result logs, you see that the app on that host checkpoints every second, which is quite insane.

CU
HB
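
A minimal sketch of how the "checkpoints every second" symptom could be spotted from a list of checkpoint times. The timestamp format, the sample values, and the 60-second threshold are all assumptions made for illustration; they are not taken from the actual result logs of that host.

```python
from datetime import datetime
from statistics import median


def checkpoint_intervals(timestamps):
    """Return the gaps in seconds between consecutive checkpoint times."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def checkpoints_too_often(timestamps, min_interval=60.0):
    """True if the median gap is below min_interval (an assumed threshold)."""
    gaps = checkpoint_intervals(timestamps)
    return bool(gaps) and median(gaps) < min_interval


# Hypothetical checkpoint times, one second apart.
sample = [
    "2010-01-19 02:31:00",
    "2010-01-19 02:31:01",
    "2010-01-19 02:31:02",
    "2010-01-19 02:31:03",
]

print(checkpoint_intervals(sample))
print(checkpoints_too_often(sample))
```

Using the median rather than the mean keeps a single long pause (for example a suspended task) from hiding an otherwise very short checkpoint interval.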

MarkJ
Joined: 28 Feb 08
Posts: 437
Credit: 137331180
RAC: 18170

Message 96493 in response to message 96492

Quote:
Quote:

I got a whole bunch last night which I have completed.

This seems to have validated against another Win7 x64 box but the XP x64 got a validate error. It might not be related because I got the wu afterwards.

From checking the successful wu some of them are against linux boxes


There seems to be something wrong with this particular x64 host: it produces lots of validation errors, and if you look inside the result logs, you see that the app on that host checkpoints every second, which is quite insane.

CU
HB

I'll send Fred a PM in case he isn't aware.

Jim Wilkins
Joined: 1 Jun 05
Posts: 33
Credit: 28426884
RAC: 0

Are we still fast tracking ABP2 results back to E@H, or can we let BOINC take its course?

Thanks,
Jim

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 686179252
RAC: 550603

Message 96495 in response to message 96494

Quote:

Are we still fast tracking ABP2 results back to E@H, or can we let BOINC take its course?

Thanks,
Jim

Hi Jim!

The transition phase is over, you can let BOINC figure out what to run first.

Happy crunching
HB

Jim Wilkins
Joined: 1 Jun 05
Posts: 33
Credit: 28426884
RAC: 0

Message 96496 in response to message 96495

Quote:
Quote:

Are we still fast tracking ABP2 results back to E@H, or can we let BOINC take its course?

Thanks,
Jim

Hi Jim!

The transition phase is over, you can let BOINC figure out what to run first.

Happy crunching
HB

Thanks.

Jim

Fred J. Verster
Joined: 27 Apr 08
Posts: 118
Credit: 22451438
RAC: 0

Message 96497 in response to message 96496

Quote:

The transition phase is over, you can let BOINC figure out what to run first.

Happy crunching
HB

Thanks.

Jim

Hi, I answered the PM I got from MarkJ about the checkpoint being written every second, for whatever reason.

Quote:

There seems to be something wrong with this particular x64 host: it produces lots of validation errors, and if you look inside the result logs, you see that the app on that host checkpoints every second, which is quite insane.

CU
HB

Hi, I've looked at my tasks on the stats/task page and noticed these errors.
I will keep an eye on that host. The BOINC install got messed up because I forgot to name the BOINC dir, so it's all over the X:\Documents and Settings dir?!?
It's running SETI Optimized (SSSE3x) and CUDA (9800GTX+) (& QX9650) stock, now.
Also Einstein@Home on CPU+GPU.
Again, sorry to my wingmen (MarkJ) & others(?), and for my late reaction, too.
I noticed 2 kinds of WU: some take 3 hours (CPU-only) and some take 30 minutes, which use the GPU, too.
I found 9 WUs with these checkpoint errors.
Last:
21 Jan 2010 6:39:51 UTC | 23 Jan 2010 10:18:14 UTC | Over | Validate error | Done | 2,089.97 | 15.23 | ---
1st:
19 Jan 2010 2:30:37 UTC | 19 Jan 2010 19:29:44 UTC | Over | Validate error | Done | 2,091.88 | 15.06 | ---

FredJV.

M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything worked like a charm. Both WUs crashed at the same time, with no hints in the system log files.
The interesting thing is that I still get ABP2 WUs stamped to be done by app 1.08, while the actual app should be 1.11.

Part of the BOINC log(UTC +1h):

18-Feb-2010 21:24:35 [Einstein@Home] Starting task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 using einsteinbinary_ABP2 version 108
18-Feb-2010 21:24:37 [Einstein@Home] Started upload of h1_1102.85_S5R4__390_S5R6a_0_0
18-Feb-2010 21:26:58 [Einstein@Home] Finished upload of h1_1102.85_S5R4__390_S5R6a_0_0
18-Feb-2010 21:27:01 [Einstein@Home] Sending scheduler request: To report completed tasks.
18-Feb-2010 21:27:01 [Einstein@Home] Reporting 1 completed tasks, not requesting new tasks
18-Feb-2010 21:27:11 [Einstein@Home] Scheduler request completed
18-Feb-2010 22:01:42 [Einstein@Home] Task h1_1114.55_S5R4__630_S5R6a_1 exited with zero status but no 'finished' file
18-Feb-2010 22:01:42 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:43 [Einstein@Home] Restarting task h1_1114.55_S5R4__630_S5R6a_1 using einstein_S5R6 version 101
18-Feb-2010 22:01:44 [Einstein@Home] Task h1_1114.45_S5R4__585_S5R6a_0 exited with zero status but no 'finished' file
18-Feb-2010 22:01:44 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:44 [Einstein@Home] Restarting task h1_1114.45_S5R4__585_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:01:45 [Einstein@Home] Task h1_1102.95_S5R4__509_S5R6a_0 exited with zero status but no 'finished' file
18-Feb-2010 22:01:45 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:45 [Einstein@Home] Restarting task h1_1102.95_S5R4__509_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:01:47 [Einstein@Home] Task h1_1102.95_S5R4__508_S5R6a_0 exited with zero status but no 'finished' file
18-Feb-2010 22:01:47 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:47 [Einstein@Home] Restarting task h1_1102.95_S5R4__508_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:01:48 [Einstein@Home] Task h1_1102.95_S5R4__507_S5R6a_0 exited with zero status but no 'finished' file
18-Feb-2010 22:01:48 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:48 [Einstein@Home] Restarting task h1_1102.95_S5R4__507_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:01:50 [Einstein@Home] Task h1_1102.80_S5R4__193_S5R6a_2 exited with zero status but no 'finished' file
18-Feb-2010 22:01:50 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:01:50 [Einstein@Home] Restarting task h1_1102.80_S5R4__193_S5R6a_2 using einstein_S5R6 version 101
18-Feb-2010 22:01:51 [Einstein@Home] Computation for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 finished
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_0 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_1 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_2 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_3 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:01:51 [Einstein@Home] Starting h1_1102.85_S5R4__389_S5R6a_0
18-Feb-2010 22:01:51 [Einstein@Home] Starting task h1_1102.85_S5R4__389_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:01:53 [Einstein@Home] Computation for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 finished
18-Feb-2010 22:01:53 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0_0 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 absent
18-Feb-2010 22:01:53 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0_1 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 absent
18-Feb-2010 22:01:53 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0_2 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 absent
18-Feb-2010 22:01:53 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0_3 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_464_0 absent
18-Feb-2010 22:01:53 [Einstein@Home] Starting h1_1102.85_S5R4__388_S5R6a_0
18-Feb-2010 22:01:53 [Einstein@Home] Starting task h1_1102.85_S5R4__388_S5R6a_0 using einstein_S5R6 version 101
18-Feb-2010 22:02:54 [Einstein@Home] Sending scheduler request: To report completed tasks.
18-Feb-2010 22:02:54 [Einstein@Home] Reporting 2 completed tasks, not requesting new tasks
18-Feb-2010 22:04:56 [---] Project communication failed: attempting access to reference site
18-Feb-2010 22:04:59 [---] Internet access OK - project servers may be temporarily down.
18-Feb-2010 22:04:59 [Einstein@Home] Scheduler request failed: Timeout was reached
18-Feb-2010 22:22:24 [Einstein@Home] Task h1_1114.55_S5R4__630_S5R6a_1 exited with zero status but no 'finished' file
18-Feb-2010 22:22:24 [Einstein@Home] If this happens repeatedly you may need to reset the project.
18-Feb-2010 22:22:24 [Einstein@Home] Restarting task h1_1114.55_S5R4__630_S5R6a_1 using einstein_S5R6 version 101
18-Feb-2010 22:22:24 [Einstein@Home] Sending scheduler request: To report completed tasks.
18-Feb-2010 22:22:24 [Einstein@Home] Reporting 2 completed tasks, not requesting new tasks
18-Feb-2010 22:22:26 [Einstein@Home] Task h1_1114.45_S5R4__585_S5R6a_0 exited with zero status but no 'finished' file
18-Feb-2010 22:22:26 [Einstein@Home] If this happens repeatedly you may need to reset the project.

cu,
Michael
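
A log like the one above is easier to read once the repeated events are counted per task. The following is a minimal sketch, assuming the line layout seen in this excerpt (`DD-Mon-YYYY HH:MM:SS [project] message`); the function name `summarize` and the regular expressions are illustrative and not part of BOINC.

```python
import re
from collections import Counter

# Line layout as it appears in the excerpt: timestamp, [project], message.
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.+?)\] (.*)$")
NO_FINISHED_RE = re.compile(
    r"^Task (\S+) exited with zero status but no 'finished' file$"
)
ABSENT_RE = re.compile(r"^Output file \S+ for task (\S+) absent$")


def summarize(log_text):
    """Count 'no finished file' exits and absent output files per task."""
    restarts = Counter()
    missing_outputs = Counter()
    for line in log_text.splitlines():
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip anything that is not a log line
        message = parsed.group(3)
        exited = NO_FINISHED_RE.match(message)
        if exited:
            restarts[exited.group(1)] += 1
            continue
        absent = ABSENT_RE.match(message)
        if absent:
            missing_outputs[absent.group(1)] += 1
    return restarts, missing_outputs


# Four lines copied from the log excerpt in this post.
sample_log = """\
18-Feb-2010 22:01:42 [Einstein@Home] Task h1_1114.55_S5R4__630_S5R6a_1 exited with zero status but no 'finished' file
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_0 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:01:51 [Einstein@Home] Output file p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1_1 for task p2030_54655_15712_0042_G55.43-00.09.C_0.dm_548_1 absent
18-Feb-2010 22:22:24 [Einstein@Home] Task h1_1114.55_S5R4__630_S5R6a_1 exited with zero status but no 'finished' file
"""

restarts, missing_outputs = summarize(sample_log)
for task, count in restarts.items():
    print(f"{task}: {count} exits without 'finished' file")
for task, count in missing_outputs.items():
    print(f"{task}: {count} output files absent")
```

Run over the full excerpt, a summary like this would separate the S5R6 tasks that were restarted from the two ABP2 tasks whose output files were reported absent.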

Bernd Machenschalk
Moderator
Administrator
Joined: 15 Oct 04
Posts: 4267
Credit: 244933393
RAC: 16304

Message 96499 in response to message 96498

Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything worked like a charm. Both WUs crashed at the same time, with no hints in the system log files.

Interesting. Was anything running on this machine that could have eaten up memory at that time?

Quote:
The interesting thing is that I still get ABP2 WUs stamped to be done by app 1.08, while the actual app should be 1.11.

The CUDA App version is at 1.11; the CPU App is 1.08. That's ok.

BM


M. Schmitt
Joined: 27 Jun 05
Posts: 478
Credit: 15872262
RAC: 0

Message 96500 in response to message 96499

Quote:
Quote:
Got two (161484922, 161484846) signal 11 results on my i7 920 root server with the quad ABP2 WUs yesterday. Before and afterwards everything worked like a charm. Both WUs crashed at the same time, with no hints in the system log files.

Interesting. Was anything running on this machine that could have eaten up memory at that time?

Quote:
The interesting thing is that I still get ABP2 WUs stamped to be done by app 1.08, while the actual app should be 1.11.

The CUDA App version is at 1.11; the CPU App is 1.08. That's ok.

BM

The server has 8 GB RAM and low load, with just some Apache instances, a mail server and PostgreSQL running.

[pre]free
             total       used       free     shared    buffers     cached
Mem:       8172168    3611740    4560428          0     514704    2337156
-/+ buffers/cache:     759880    7412288
Swap:      4200888          0    4200888[/pre]

The disks run in an LVM RAID 1 config.
The only thing that "might" have happened is that both WUs ran on the same physical core. This host has only had a few quad ABP2 WUs so far, but in about 5 days there are 25 in a row. If necessary, I can suspend all earlier WUs for further research.

cu,
Michael
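
For reading the `free` output above: the "-/+ buffers/cache" row is the "Mem:" row with buffers and page cache counted as reclaimable. A minimal sketch of that arithmetic, assuming the classic six-column `free` layout shown in this post with values in KiB; the function names are illustrative.

```python
def parse_free(output):
    """Parse the 'Mem:' row of classic `free` output (values in KiB)."""
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Mem:":
            total, used, free, shared, buffers, cached = map(int, fields[1:7])
            return {
                "total": total,
                "used": used,
                "free": free,
                "shared": shared,
                "buffers": buffers,
                "cached": cached,
            }
    raise ValueError("no 'Mem:' row found")


def effective_usage(mem):
    """Return (used by applications, available to applications) in KiB."""
    used_by_apps = mem["used"] - mem["buffers"] - mem["cached"]
    available = mem["free"] + mem["buffers"] + mem["cached"]
    return used_by_apps, available


# The `free` output quoted in this post.
sample = """\
total used free shared buffers cached
Mem: 8172168 3611740 4560428 0 514704 2337156
-/+ buffers/cache: 759880 7412288
Swap: 4200888 0 4200888
"""

used_by_apps, available = effective_usage(parse_free(sample))
print(used_by_apps, available)  # 759880 7412288
```

Both numbers match the "-/+ buffers/cache" row, so at the time `free` was run applications held roughly 0.7 GiB and no swap was in use. This says nothing about memory use at the moment of the crash.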
