GPU task taking 10 times as long. Stuck at 67%? Should I abort?

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0


Quote:
Quote:
Quote:
Quote:
The cure is to stop and restart BOINC. Suspending the task does not work if you have the pref setting 'keep tasks in memory when suspended' set.

Why? GPU tasks are Always removed from memory when suspended. The 'keep tasks in memory when suspended' setting relates to CPU tasks only.

I was commenting about observed behaviour with AMD GPUs. I didn't ever notice the particular problem of very slow running tasks with nvidia GPUs.

All GPU apps, be they Nvidia, AMD or Intel should exit memory when suspended (assuming they're not running a buggy boinc api)


This is true on my machines, and worse (depending on your perspective) it does not checkpoint nor wait until next checkpoint before exiting.

Quote:

The difference between just suspending tasks, and restarting Boinc/the OS, is that Boinc/OS is possibly still using the same core/thread for feeding the app,
restarting Boinc/OS, most likely will switch which core/thread is being used for the feeding.

What seems to happen, when suspended, is that the GPU application is completely removed from the OS's running processes, but it is left behind in the BOINC "slots" directory along with its last checkpoint (cpt) file. I'm guessing this is also true for CPU apps if the "keep in memory" flag is off.

I never see any affinity for tasks to cores on linux, and in fact each task is serviced randomly by all cores during its runtime. Of course the apps remain in memory while running.

Below are three "top" outputs from the BOINC host. The interesting column is "P", which is the physical core, and this changes at least once per second. I suspended an FGRP4 task between the last two - it stopped and another started, but this time the suspended task remained in memory and on the task list. If I had suspended a BRP6 task, it would have disappeared.

[pre]
top - 17:19:58 up 52 days, 4:29, 4 users, load average: 2.20, 1.94, 1.84
Tasks: 206 total, 2 running, 204 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 2.8%sy, 26.7%ni, 69.5%id, 0.0%wa, 0.0%hi, 0.8%si, 0.0%st
Mem: 3924512k total, 3475204k used, 449308k free, 255976k buffers
Swap: 7812088k total, 13192k used, 7798896k free, 1671844k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P TIME COMMAND
25322 boinc 39 19 218m 208m 2724 R 100 5.4 484:53.56 1 484:53 hsgamma_FGRP4_1
25441 boinc 39 19 16.3g 186m 47m S 5 4.9 10:18.89 3 10:18 einsteinbinary_
25467 boinc 39 19 16.3g 158m 47m S 5 4.1 7:14.20 2 7:14 einsteinbinary_
25487 boinc 39 19 16.3g 116m 47m S 5 3.1 4:45.40 0 4:45 einsteinbinary_
25472 boinc 39 19 16.3g 154m 47m S 5 4.0 6:43.82 2 6:43 einsteinbinary_
1695 boinc 39 19 16.1g 34m 5436 S 0 0.9 129:06.87 2 129:06 boinc

---------------------------------------------------
top - 17:20:08 up 52 days, 4:29, 4 users, load average: 2.10, 1.92, 1.83
Tasks: 206 total, 2 running, 204 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 2.2%sy, 26.6%ni, 69.7%id, 0.0%wa, 0.0%hi, 1.1%si, 0.0%st
Mem: 3924512k total, 3474700k used, 449812k free, 255976k buffers
Swap: 7812088k total, 13192k used, 7798896k free, 1671844k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P TIME COMMAND
25322 boinc 39 19 218m 208m 2724 R 100 5.4 485:03.56 1 485:03 hsgamma_FGRP4_1
25467 boinc 39 19 16.3g 158m 47m S 5 4.1 7:14.75 3 7:14 einsteinbinary_
25472 boinc 39 19 16.3g 154m 47m S 5 4.0 6:44.34 2 6:44 einsteinbinary_
25441 boinc 39 19 16.3g 186m 47m S 5 4.9 10:19.43 2 10:19 einsteinbinary_
25487 boinc 39 19 16.3g 116m 47m S 5 3.1 4:45.92 2 4:45 einsteinbinary_
1695 boinc 39 19 16.1g 34m 5436 S 0 0.9 129:06.89 2 129:06 boinc

---------------------------------------------------
top - 17:21:23 up 52 days, 4:31, 4 users, load average: 1.91, 1.89, 1.82
Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 1.8%sy, 27.3%ni, 69.6%id, 0.1%wa, 0.0%hi, 1.0%si, 0.0%st
Mem: 3924512k total, 3693144k used, 231368k free, 255976k buffers
Swap: 7812088k total, 13192k used, 7798896k free, 1671896k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P TIME COMMAND
25680 boinc 39 19 412m 402m 2712 R 100 10.5 0:48.19 3 0:48 hsgamma_FGRP4_1
25467 boinc 39 19 16.3g 158m 47m S 5 4.1 7:18.76 1 7:18 einsteinbinary_
25441 boinc 39 19 16.3g 186m 47m S 5 4.9 10:23.20 2 10:23 einsteinbinary_
25487 boinc 39 19 16.3g 116m 47m S 5 3.1 4:49.69 2 4:49 einsteinbinary_
25472 boinc 39 19 16.3g 154m 47m S 5 4.0 6:48.16 2 6:48 einsteinbinary_
1695 boinc 39 19 16.1g 34m 5444 S 0 0.9 129:07.08 2 129:07 boinc
25322 boinc 39 19 31320 20m 2724 S 0 0.5 485:30.48 1 485:30 hsgamma_FGRP4_1

---------------------------------------------------
[/pre]
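For anyone who would rather not compare the "P" columns by eye, here is a rough Python sketch that does it. The function names `last_used_cores` and `core_moves` are made up for this example, and it assumes the column layout shown in the outputs above (a header row starting with PID and USER, and a "P" column, which top documents as the last used CPU). The sample rows are copied from the first two snapshots.

```python
from typing import Dict, Tuple

def last_used_cores(top_snapshot: str) -> Dict[int, int]:
    """Map PID -> value of top's "P" (last used CPU) column for one snapshot."""
    lines = top_snapshot.splitlines()
    # Locate the process-table header; everything after it is one row per process.
    header_at = next(i for i, line in enumerate(lines)
                     if line.split()[:2] == ["PID", "USER"])
    columns = lines[header_at].split()
    pid_col, core_col = columns.index("PID"), columns.index("P")
    cores: Dict[int, int] = {}
    for line in lines[header_at + 1:]:
        fields = line.split()
        # Skip blank lines, separators and anything that is not a process row.
        if len(fields) < len(columns) or not fields[pid_col].isdigit():
            continue
        cores[int(fields[pid_col])] = int(fields[core_col])
    return cores

def core_moves(before: str, after: str) -> Dict[int, Tuple[int, int]]:
    """Return {pid: (old_core, new_core)} for processes whose "P" value changed."""
    old, new = last_used_cores(before), last_used_cores(after)
    return {pid: (old[pid], new[pid])
            for pid in old.keys() & new.keys()
            if old[pid] != new[pid]}

HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P TIME COMMAND\n"

# Process rows from the 17:19:58 snapshot above.
FIRST = HEADER + """\
25322 boinc 39 19 218m 208m 2724 R 100 5.4 484:53.56 1 484:53 hsgamma_FGRP4_1
25441 boinc 39 19 16.3g 186m 47m S 5 4.9 10:18.89 3 10:18 einsteinbinary_
25467 boinc 39 19 16.3g 158m 47m S 5 4.1 7:14.20 2 7:14 einsteinbinary_
25487 boinc 39 19 16.3g 116m 47m S 5 3.1 4:45.40 0 4:45 einsteinbinary_
25472 boinc 39 19 16.3g 154m 47m S 5 4.0 6:43.82 2 6:43 einsteinbinary_
1695 boinc 39 19 16.1g 34m 5436 S 0 0.9 129:06.87 2 129:06 boinc
"""

# Process rows from the 17:20:08 snapshot above.
SECOND = HEADER + """\
25322 boinc 39 19 218m 208m 2724 R 100 5.4 485:03.56 1 485:03 hsgamma_FGRP4_1
25467 boinc 39 19 16.3g 158m 47m S 5 4.1 7:14.75 3 7:14 einsteinbinary_
25472 boinc 39 19 16.3g 154m 47m S 5 4.0 6:44.34 2 6:44 einsteinbinary_
25441 boinc 39 19 16.3g 186m 47m S 5 4.9 10:19.43 2 10:19 einsteinbinary_
25487 boinc 39 19 16.3g 116m 47m S 5 3.1 4:45.92 2 4:45 einsteinbinary_
1695 boinc 39 19 16.1g 34m 5436 S 0 0.9 129:06.89 2 129:06 boinc
"""

if __name__ == "__main__":
    for pid, (old_core, new_core) in sorted(core_moves(FIRST, SECOND).items()):
        print(f"PID {pid}: core {old_core} -> {new_core}")
```

Two snapshots ten seconds apart only show where each process last ran at those instants, so this undercounts how often tasks really move; it is just a quick way to confirm that nothing is pinned to one core.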

hth.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0


Hi,
I'm still having problems connecting the dots....

About: "uncheck" Gamma-ray pulsar search #4

Does "Gamma-ray pulsar search #4" translate to "BRP4G"?????
This was not annotated as a GPU task in the choices
on the website - Einstein project preferences.
And it isn't obvious to me..

Which relates to BRP6 on a gpu???

  • Binary Radio Pulsar Search (Arecibo)

OR

  • Binary Radio Pulsar Search (Arecibo, GPU)

Or

  • Binary Radio Pulsar Search (Parkes PMPS XT)

Or some combination???

I guess I'm saying that the Einstein preferences pages names and the names of the BOINC Manager task-page displays
aren't completely obvious....

It *now* dawns on me that I should look at previous_client_state.xml
and make a reference table between the
and the .

previous_client_state.xml indicates that Binary Radio Pulsar Search (Parkes PMPS XT)
is related to einsteinbinary_BRP6.

I just wasn't thinking of looking into the .xml file....
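A reference table like that can be pulled out with a few lines of Python. This is only a sketch under assumptions: it supposes that each application in the state file is an `<app>` element carrying a short `<name>` and a `<user_friendly_name>` (the tag names did not survive in the text above, so treat them as a guess to check against your own file), and the XML fragment below is a hand-made illustration, not a copy of a real file. The function name `app_name_table` is invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Dict

def app_name_table(state_xml: str) -> Dict[str, str]:
    """Map each app's short name to its user-friendly name.

    Assumes <app> elements with <name> and <user_friendly_name> children.
    """
    root = ET.fromstring(state_xml)
    table: Dict[str, str] = {}
    for app in root.iter("app"):
        short = app.findtext("name")
        friendly = app.findtext("user_friendly_name")
        if short and friendly:
            table[short.strip()] = friendly.strip()
    return table

# Hand-written fragment for illustration only; the one pairing shown is the
# one mentioned in this post.
SAMPLE = """
<client_state>
  <app>
    <name>einsteinbinary_BRP6</name>
    <user_friendly_name>Binary Radio Pulsar Search (Parkes PMPS XT)</user_friendly_name>
  </app>
</client_state>
"""

if __name__ == "__main__":
    for short, friendly in sorted(app_name_table(SAMPLE).items()):
        print(f"{short:25s} {friendly}")
```

To run it against a real file, read the file's text and pass it to `app_name_table`; if the parser complains, the file may not be strictly well-formed XML and would need a more forgiving approach.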

Can something be added to the website - in the area of project choices??
... always room for improvement....
( My most hated HR phrase... :-) )

Thanks again,
Jay

AgentB
Joined: 17 Mar 12
Posts: 915
Credit: 513211304
RAC: 0


Quote:
Hi,
I'm still having problems connecting the dots....

Yes it's not obvious but the link E@H Applications should help.

Gary Roberts
Moderator
Joined: 9 Feb 05
Posts: 5872
Credit: 117719005671
RAC: 35004486


Quote:
I guess I'm saying that the Einstein preferences pages names and the names of the BOINC Manager task-page displays
aren't completely obvious....

Yes, it's not obvious if you haven't been following for quite a long time :-).

The applications page, as suggested, is probably the best page to see the relationships (in combination with the project preferences page).

We all use too many acronyms. These are some current ones:-
BRP = Binary Radio Pulsar
FGRP = Fermi Gamma Ray Pulsar
LAT = Large Area Telescope (as carried by NASA's Fermi satellite)
PMPS = Parkes Multibeam Pulsar Survey

There are no current gravitational wave (GW) searches but there soon(ish) should be with brand new higher sensitivity data from advanced LIGO (Laser Interferometer Gravitational-wave Observatory) detectors. There should be some extra acronyms floating around then ;-).

The number attached to an acronym, eg BRP4 or BRP6, is a sort of version number. There would have been a BRP1 (and it would have been CPU only) when E@H first started looking for pulsars in radio telescope data, early 2009 I think. That would have been using Arecibo telescope data only. Along the way the numbers have increased, Parkes radio telescope data has been added to the mix and the apps have largely shifted from CPU to GPU. Because of the power of discrete GPUs, tasks are now 'bundled' to give a suitable computation length. For low power devices (eg Android) 'single' BRP4 tasks are used (credit = 62.5) and these are referred to as BRP4 (Arecibo). BRP4G tasks (Arecibo GPU) are just a bundle of 16 BRP4 tasks suitable for higher power GPUs.

I don't remember exactly when Parkes data started but both BRP5 and BRP6 have been runs using data from this telescope. A certain number of pulsars were discovered in this data originally and quite a few extra ones since E@H has been analysing (and re-analysing at higher sensitivity) this same data.

The gamma ray pulsar search kicked off in 2011 with FGRP1 and we are now at FGRP4. This is currently a CPU only run, although a GPU app was tested for a short while at one stage.

The E@H home page contains a number of interesting links to discoveries made in the various pulsar searches.

Cheers,
Gary.

jay
Joined: 25 Jan 07
Posts: 99
Credit: 84044023
RAC: 0


Thanks again!

I looked at the explanation of E@H applications,
and LO and BEHOLD! There was the 'boinc name' in the section titles.

I now see the obvious...

Thanks again for your patience and explanations!!!

Jay
