BRP4 1.31/1.32 GPU app release: feedback thread

Alex
Joined: 1 Mar 05
Posts: 451
Credit: 502961644
RAC: 5572

Hello folks,

not long ago there was a discussion here @ einstein about performance and the old, old question: nVidia or AMD?

Now things have changed.
Looking at the top hosts list, everyone can see that AMD is in the lead and also holds positions 3, 4 and 6.

No, this post is not sponsored, it's just interesting how fast things can change.

Alexander

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2818766419
RAC: 881426

Just happened to notice a case where BRP4 cuda v1.32 continued to run and show steady %age progress increase when it should have been suspended.

BOINC (v7.0.38) on Windows 7/64-bit decided to run benchmarks:

Quote:
02/12/2012 11:36:49 | | Suspending computation - CPU benchmarks in progress
02/12/2012 11:36:49 | Einstein@Home | [cpu_sched] Preempting p2030.20111210.G177.12-02.64.S.b0s0g0.00000_664_0 (left in memory)
02/12/2012 11:37:21 | | Resuming computation
02/12/2012 11:37:21 | Einstein@Home | [cpu_sched] Resuming p2030.20111210.G177.12-02.64.S.b0s0g0.00000_664_0


[that '(left in memory)' is correct: GPU tasks are always flushed from VRAM when suspended, except in the special case of suspension for benchmarking]

But the stderr_txt for task 324106164 shows no interruption around that time - it even committed a checkpoint while 'suspended':

Quote:
[11:26:26][4468][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 260 MB (764 MB free / 1024 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[11:26:56][4468][INFO ] Checkpoint committed!
[11:27:56][4468][INFO ] Checkpoint committed!
[11:28:56][4468][INFO ] Checkpoint committed!
[11:29:56][4468][INFO ] Checkpoint committed!
[11:30:56][4468][INFO ] Checkpoint committed!
[11:31:56][4468][INFO ] Checkpoint committed!
[11:32:56][4468][INFO ] Checkpoint committed!
[11:33:56][4468][INFO ] Checkpoint committed!
[11:34:57][4468][INFO ] Checkpoint committed!
[11:35:56][4468][INFO ] Checkpoint committed!
[11:36:56][4468][INFO ] Checkpoint committed!
[11:37:57][4468][INFO ] Checkpoint committed!
[11:38:17][4468][INFO ] Statistics: count dirty SumSpec pages 1329 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[11:38:17][4468][INFO ] Data processing finished successfully!
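
The comparison between the two logs can also be done mechanically. The following is a minimal sketch, not part of the app or of BOINC: it assumes both logs use the same local clock, it takes the suspend and resume times from the BOINC event log quoted above, and the function name `checkpoints_during` is made up for illustration.

```python
import re
from datetime import time

# Three lines copied from the stderr_txt quote above.
STDERR_LINES = [
    "[11:35:56][4468][INFO ] Checkpoint committed!",
    "[11:36:56][4468][INFO ] Checkpoint committed!",
    "[11:37:57][4468][INFO ] Checkpoint committed!",
]

# Matches "[HH:MM:SS][pid][INFO ] Checkpoint committed!"
CHECKPOINT = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\]\[\d+\]\[INFO \] Checkpoint committed!"
)


def checkpoints_during(lines, suspended_at, resumed_at):
    """Return checkpoint times that fall inside [suspended_at, resumed_at)."""
    hits = []
    for line in lines:
        match = CHECKPOINT.match(line)
        if match is None:
            continue
        stamp = time(*map(int, match.groups()))
        if suspended_at <= stamp < resumed_at:
            hits.append(stamp)
    return hits


# Suspend/resume times from the BOINC event log (assumed to be the same clock).
inside = checkpoints_during(STDERR_LINES, time(11, 36, 49), time(11, 37, 21))
print([t.isoformat() for t in inside])  # ['11:36:56']
```

Any timestamp it returns is a checkpoint written while BOINC considered the task suspended.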

Bikeman (Heinz-Bernd Eggenstein)
Moderator
Joined: 28 Aug 06
Posts: 3522
Credit: 692158372
RAC: 2427

Hi!

Very interesting indeed to know that this happens during benchmarking as well, as that is easily triggered. The new app version has extra code in it to double-check whether BOINC has asked the app to suspend. We had really hoped that this would help :-(

Cheers
HB

Richard Haselgrove
Joined: 10 Dec 05
Posts: 2142
Credit: 2818766419
RAC: 881426

The plot thickens. Here's the benchmark record for a fast (GTX 670) card running two tasks at once.

Quote:
04/12/2012 10:16:30 | | Running CPU benchmarks
04/12/2012 10:16:30 | | Suspending computation - CPU benchmarks in progress
04/12/2012 10:16:30 | Einstein@Home | [cpu_sched] Preempting p2030.20111211.G192.85-04.06.N.b3s0g0.00000_1336_0 (left in memory)
04/12/2012 10:16:30 | Einstein@Home | [cpu_sched] Preempting p2030.20111211.G192.72-04.29.N.b6s0g0.00000_2728_0 (left in memory)
04/12/2012 10:17:02 | | Resuming computation
04/12/2012 10:17:02 | Einstein@Home | [cpu_sched] Resuming p2030.20111211.G192.85-04.06.N.b3s0g0.00000_1336_0
04/12/2012 10:17:02 | Einstein@Home | [cpu_sched] Resuming p2030.20111211.G192.72-04.29.N.b6s0g0.00000_2728_0


Task 324775302 seemed to continue uninterrupted, with a checkpoint during suspension:

Quote:
[10:14:33][3044][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 549 MB (1500 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 207 MB
[10:14:46][3044][INFO ] Checkpoint committed!
[10:15:46][3044][INFO ] Checkpoint committed!
[10:16:46][3044][INFO ] Checkpoint committed!
[10:17:46][3044][INFO ] Checkpoint committed!
[10:17:58][3044][INFO ] Statistics: count dirty SumSpec pages 3773 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[10:17:58][3044][INFO ] Data processing finished successfully!


But its partner task 324777457 delayed the checkpoint until computation resumed:

Quote:
[10:14:38][3228][INFO ] CUDA global memory status (GPU setup complete):
------> Used in total: 549 MB (1500 MB free / 2049 MB total) -> Used by this application (assuming a single GPU task): 204 MB
[10:14:51][3228][INFO ] Checkpoint committed!
[10:15:51][3228][INFO ] Checkpoint committed!
[10:17:02][3228][INFO ] Checkpoint committed!
[10:18:02][3228][INFO ] Checkpoint committed!
[10:19:02][3228][INFO ] Statistics: count dirty SumSpec pages 1186 (not checkpointed), Page Size 1024, fundamental_idx_hi-window_2: 329052
[10:19:02][3228][INFO ] Data processing finished successfully!
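
A delayed checkpoint like this one shows up as a stretched gap between consecutive timestamps. Here is a rough sketch of how to flag it, assuming a nominal 60-second checkpoint interval (inferred from the logs, not confirmed) and a hypothetical 5-second tolerance. The helper names are illustrative only.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def stretched_gaps(stamps, nominal=60, tolerance=5):
    """Return (previous, current, gap) for gaps longer than nominal + tolerance.

    nominal and tolerance are assumptions read off the logs, not app settings.
    """
    flagged = []
    for previous, current in zip(stamps, stamps[1:]):
        gap = to_seconds(current) - to_seconds(previous)
        if gap > nominal + tolerance:
            flagged.append((previous, current, gap))
    return flagged


# Checkpoint times copied from the two stderr quotes above.
task_324775302 = ["10:14:46", "10:15:46", "10:16:46", "10:17:46"]
task_324777457 = ["10:14:51", "10:15:51", "10:17:02", "10:18:02"]

print(stretched_gaps(task_324775302))  # []
print(stretched_gaps(task_324777457))  # [('10:15:51', '10:17:02', 71)]
```

The single flagged gap ends at 10:17:02, the same second at which BOINC logged "Resuming computation".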


Comparing the run times of the 8 individual sub-tasks, it does look as if one of them was interrupted, and the other took advantage of the freed resources - though I wasn't in a position to see whether either or both of the progress %ages continued to count up.

00:03:54 ... 00:03:52
00:03:47 ... 00:03:49
00:04:00 ... 00:04:00
00:03:25 ... 00:04:24
00:03:49 ... 00:03:51
00:03:54 ... 00:03:53
00:03:54 ... 00:03:54
00:03:47 ... 00:03:51
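
To make the outlier in the list above easier to spot, here is a small sketch that computes the per-row difference between the two columns. The post does not say which column belongs to which task, so they are simply called "left" and "right" here.

```python
def to_seconds(hms):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = map(int, hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Run times copied from the list above, as (left column, right column).
RUN_TIMES = [
    ("00:03:54", "00:03:52"),
    ("00:03:47", "00:03:49"),
    ("00:04:00", "00:04:00"),
    ("00:03:25", "00:04:24"),
    ("00:03:49", "00:03:51"),
    ("00:03:54", "00:03:53"),
    ("00:03:54", "00:03:54"),
    ("00:03:47", "00:03:51"),
]

# Right minus left, in seconds, for each row.
diffs = [to_seconds(right) - to_seconds(left) for left, right in RUN_TIMES]

# 1-based row number of the largest absolute difference.
outlier_row = max(range(len(diffs)), key=lambda i: abs(diffs[i])) + 1

print(diffs)        # [-2, 2, 0, 59, 2, -1, 0, 4]
print(outlier_row)  # 4
```

Only the fourth pair differs by more than a few seconds.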

Blackbird
Joined: 28 Jan 07
Posts: 2
Credit: 46771832
RAC: 377712

I posted this on the BOINC forums and they directed me here:

I have this intermittent problem: sometimes when I return to my computer after it has been idle and therefore computing with BOINC, all my BOINC applications stop computing except the Einstein GPU one. My computer is set to compute only when idle, including for the GPU. Thoughts?

I'm using 7.0.31, Mac OS X 10.8.2.

Below are my startup messages:

Quote:
Tue Dec 18 09:41:54 2012 | | No config file found - using defaults
Tue Dec 18 09:41:54 2012 | | Starting BOINC client version 7.0.31 for x86_64-apple-darwin
Tue Dec 18 09:41:54 2012 | | log flags: file_xfer, sched_ops, task
Tue Dec 18 09:41:54 2012 | | Libraries: libcurl/7.26.0 OpenSSL/0.9.7l zlib/1.2.5 c-ares/1.9.1
Tue Dec 18 09:41:54 2012 | | Data directory: /Library/Application Support/BOINC Data
Tue Dec 18 09:41:54 2012 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz [x86 Family 6 Model 42 Stepping 7]
Tue Dec 18 09:41:54 2012 | | Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES PCID XSAVE OSXSAVE TSCTMR AVX1.0
Tue Dec 18 09:41:54 2012 | | OS: Mac OS X 10.8.2 (Darwin 12.2.0)
Tue Dec 18 09:41:54 2012 | | Memory: 8.00 GB physical, 109.25 GB virtual
Tue Dec 18 09:41:54 2012 | | Disk: 464.96 GB total, 109.00 GB free
Tue Dec 18 09:41:54 2012 | | Local time is UTC -5 hours
Tue Dec 18 09:41:54 2012 | | OpenCL: ATI GPU 0: ATI Radeon HD 6750M (driver version 1.0, device version OpenCL 1.1, 1024MB, 1024MB available)
Tue Dec 18 09:41:54 2012 | rosetta@home | URL http://boinc.bakerlab.org/rosetta/; Computer ID 1494725; resource share 15
Tue Dec 18 09:41:54 2012 | climateprediction.net | URL http://climateprediction.net/; Computer ID 1256030; resource share 10
Tue Dec 18 09:41:54 2012 | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 6086500; resource share 10
Tue Dec 18 09:41:54 2012 | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6247099; resource share 10
Tue Dec 18 09:41:54 2012 | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2224268; resource share 55
Tue Dec 18 09:41:54 2012 | World Community Grid | General prefs: from World Community Grid (last modified 25-May-2011 07:52:02)
Tue Dec 18 09:41:54 2012 | World Community Grid | Host location: none
Tue Dec 18 09:41:54 2012 | World Community Grid | General prefs: using your defaults
Tue Dec 18 09:41:54 2012 | | Reading preferences override file
Tue Dec 18 09:41:54 2012 | | Preferences:
Tue Dec 18 09:41:54 2012 | | max memory usage when active: 5324.80MB
Tue Dec 18 09:41:54 2012 | | max memory usage when idle: 7782.40MB
Tue Dec 18 09:41:54 2012 | | max disk usage: 4.00GB
Tue Dec 18 09:41:54 2012 | | don't compute while active
Tue Dec 18 09:41:54 2012 | | don't use GPU while active
Tue Dec 18 09:41:54 2012 | | suspend work if non-BOINC CPU load exceeds 25 %
Tue Dec 18 09:41:54 2012 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Tue Dec 18 09:41:54 2012 | World Community Grid | Task SN2S_AAW88547_0000116_0457_1 is 0.03 days overdue; you may not get credit for it. Consider aborting it.
Tue Dec 18 09:41:54 2012 | | Not using a proxy
Tue Dec 18 09:41:57 2012 | | Suspending computation - computer is in use
Tue Dec 18 09:41:57 2012 | | Suspending network activity - computer is in use
Tue Dec 18 09:42:36 2012 | World Community Grid | General prefs: from World Community Grid (last modified 25-May-2011 07:52:02)
Tue Dec 18 09:42:36 2012 | World Community Grid | Host location: none
Tue Dec 18 09:42:36 2012 | World Community Grid | General prefs: using your defaults
Tue Dec 18 09:42:36 2012 | | Reading preferences override file
Tue Dec 18 09:42:36 2012 | | Preferences:
Tue Dec 18 09:42:36 2012 | | max memory usage when active: 5324.80MB
Tue Dec 18 09:42:36 2012 | | max memory usage when idle: 7782.40MB
Tue Dec 18 09:42:36 2012 | | max disk usage: 4.00GB
Tue Dec 18 09:42:36 2012 | | don't compute while active
Tue Dec 18 09:42:36 2012 | | don't use GPU while active
Tue Dec 18 09:42:36 2012 | | suspend work if non-BOINC CPU load exceeds 25 %
Tue Dec 18 09:42:36 2012 | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)

Sun Badger*
Joined: 15 Nov 09
Posts: 12
Credit: 3698677
RAC: 0

1) My task on this computer started downloading and then stopped.
2) I receive tasks and they finish, but the results are not retrieved, nor am I given new work.
???????????
