Einstein hogs too many CPU cycles on Linux 2.6.8

Michael Karlinsky
Joined: 22 Jan 05
Posts: 888
Credit: 23502182
RAC: 0

There might be a reason for the missing CPU times.
(Observed this on an HP-UX system at work, so not BOINC
related.) If a lot of processes are spawned in a short time
interval, this uses CPU time, but will not be seen via top.
(httpd respawned due to bad httpd.conf in my case.)

You can check this by starting a program like xterm, using
ps to determine its PID, and writing it down. Now start another xterm
and get its PID via ps again. The two PIDs should be close; if not,
then there is a problem with a rapidly restarting process.
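
The PID check described above can be sketched in Python. This is only an illustration: the function names and the threshold of 50 are assumptions made for the example, and the wrap-around handling is approximate (Linux does not restart PID numbering at exactly zero).

```python
import subprocess
import sys


def read_pid_max(default=32768):
    """Return the kernel's PID wrap-around limit, or a default if unreadable."""
    try:
        with open("/proc/sys/kernel/pid_max") as f:
            return int(f.read())
    except (OSError, ValueError):
        return default


def pid_gap(first_pid, second_pid, pid_max):
    """Approximate count of PIDs handed out between two spawns, allowing for wrap-around."""
    return (second_pid - first_pid) % pid_max


def spawn_and_get_pid():
    """Start a trivial child process, wait for it to exit, and return its PID."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    return child.pid


def check_for_respawn_storm(threshold=50):
    """Spawn two children back to back and report how far apart their PIDs are."""
    first = spawn_and_get_pid()
    second = spawn_and_get_pid()
    gap = pid_gap(first, second, read_pid_max())
    # The threshold is an arbitrary assumption, not a documented limit.
    return first, second, gap, gap > threshold
```

Calling check_for_respawn_storm() on an otherwise quiet system should normally return a small gap; a consistently large gap suggests something else is creating processes in between.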

HTH

Michael

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: There might be a reason

Message 17282 in response to message 17281

Quote:

There might be a reason for the missing CPU times.
(Observed this on an HP-UX system at work, so not BOINC
related.) If a lot of processes are spawned in a short time
interval, this uses CPU time, but will not be seen via top.
(httpd respawned due to bad httpd.conf in my case.)

You can check this by starting a program like xterm, using
ps to determine its PID, and writing it down. Now start another xterm
and get its PID via ps again. The two PIDs should be close; if not,
then there is a problem with a rapidly restarting process.

HTH

Michael

Top would show that, if you sorted by PID.

There's a lot of info you can pull out of a 60-second before-and-after picture of your system. I run this command:

top -b -d 60 -n 2 >toplist

and then display the output in two terminal sessions - one terminal shows the before section, the other terminal shows the after section. Switching between the two terminals really brings out the differences.

It does help having a text editor that lets me sort the process lines, so I can reorder by CPU time and compare, then by PID and compare, then by memory and compare. I ought to script all that; it's really handy as a process performance tool.
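
A minimal sketch of how that comparison might be scripted, assuming the batch output has the column layout PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND seen in this thread. The function names are made up for the example, and the sample rows are a trimmed copy of the top output posted further down.

```python
def cpu_seconds(time_field):
    """Convert a top TIME+ value such as '54:59.60' or '2049:58' to seconds."""
    total = 0.0
    for part in time_field.split(":"):
        total = total * 60 + float(part)
    return total


def parse_snapshots(text):
    """Split `top -b` output into one {pid: (command, cpu_seconds)} dict per snapshot."""
    snapshots = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("top - "):
            snapshots.append({})
            in_table = False
        elif fields[:2] == ["PID", "USER"]:
            in_table = True
        elif in_table and snapshots and len(fields) >= 12 and fields[0].isdigit():
            # Field 10 is TIME+, field 11 is the first word of COMMAND.
            snapshots[-1][int(fields[0])] = (fields[11], cpu_seconds(fields[10]))
    return snapshots


def cpu_deltas(before, after):
    """Return (seconds_used, pid, command) for processes that used CPU, largest first."""
    deltas = []
    for pid, (command, seconds) in after.items():
        if pid in before:
            used = round(seconds - before[pid][1], 2)
            if used > 0:
                deltas.append((used, pid, command))
    return sorted(deltas, reverse=True)


# Trimmed sample taken from the two snapshots posted in this thread.
SAMPLE = """\
top - 16:11:56 up 6 days, 17:38, 3 users, load average: 1.71, 1.91, 1.73
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11808 boinc 34 19 18196 4948 1692 S 48.3 2.6 2049:58 einstein_4.81_i
1 root 16 0 1596 536 472 S 0.0 0.3 0:11.55 init
4862 boinc 15 0 4300 2692 1620 R 0.0 1.4 54:59.60 boinc_client
6472 lp 15 0 2576 956 812 R 0.0 0.5 15:23.43 lpd

top - 16:12:57 up 6 days, 17:39, 3 users, load average: 1.66, 1.86, 1.72
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11808 boinc 34 19 18196 4948 1692 S 43.9 2.6 2050:25 einstein_4.81_i
4862 boinc 15 0 4300 2692 1620 S 0.7 1.4 55:00.00 boinc_client
6472 lp 15 0 2576 956 812 S 0.2 0.5 15:23.58 lpd
1 root 16 0 1596 536 472 S 0.0 0.3 0:11.55 init
"""

snapshots = parse_snapshots(SAMPLE)
deltas = cpu_deltas(snapshots[0], snapshots[1])
for used, pid, command in deltas:
    print(f"{used:8.2f}s  {pid:6d}  {command}")
```

On the sample, einstein accounts for 27 seconds of CPU in a 61-second interval, which agrees with the roughly 44% that top reports for it. To use it on a real capture, pass the contents of the toplist file to parse_snapshots instead of SAMPLE.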

Walt


Anonyymi
Joined: 6 Jul 05
Posts: 17
Credit: 1270182
RAC: 0

I ran top -b -d 60 -n 2

Message 17283 in response to message 17282

I ran

top -b -d 60 -n 2 >toplist

Here are the results:

top - 16:11:56 up 6 days, 17:38, 3 users, load average: 1.71, 1.91, 1.73
Tasks: 52 total, 3 running, 49 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.1% us, 1.8% sy, 65.7% ni, 28.9% id, 0.3% wa, 0.1% hi, 0.0% si
Mem: 191820k total, 185596k used, 6224k free, 61228k buffers
Swap: 188960k total, 4112k used, 184848k free, 53616k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11808 boinc 34 19 18196 4948 1692 S 48.3 2.6 2049:58 einstein_4.81_i
1 root 16 0 1596 536 472 S 0.0 0.3 0:11.55 init
2 root 34 19 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/0
3 root 10 -5 0 0 0 S 0.0 0.0 0:08.99 events/0
4 root 16 -5 0 0 0 S 0.0 0.0 0:00.06 khelper
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
23 root 10 -5 0 0 0 S 0.0 0.0 0:00.90 kblockd/0
64 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
63 root 15 0 0 0 0 S 0.0 0.0 0:02.76 kswapd0
650 root 15 0 0 0 0 S 0.0 0.0 0:00.01 kseriod
905 root 15 0 0 0 0 S 0.0 0.0 0:30.34 kjournald
947 root 11 -4 1584 468 404 S 0.0 0.2 0:00.45 udevd
1573 root 15 0 0 0 0 S 0.0 0.0 0:03.90 kapmd
2283 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pccardd
2291 root 19 0 0 0 0 S 0.0 0.0 0:00.00 pccardd
2373 root 15 0 0 0 0 S 0.0 0.0 0:00.00 khubd
3357 root 17 0 0 0 0 S 0.0 0.0 0:00.00 kgameportd
3805 root 19 0 0 0 0 S 0.0 0.0 0:00.00 kIrDAd
4112 root 15 0 2464 856 736 S 0.0 0.4 0:00.00 dhclient
4810 root 16 0 2352 868 748 S 0.0 0.5 0:01.78 syslogd
4843 root 15 0 2472 1528 472 S 0.0 0.8 0:00.38 klogd
4854 root 16 0 1596 564 500 S 0.0 0.3 0:00.28 apmd
4862 boinc 15 0 4300 2692 1620 R 0.0 1.4 54:59.60 boinc_client
4873 messageb 18 0 2208 828 716 S 0.0 0.4 0:00.00 dbus-daemon-1
4882 root 19 0 2336 756 676 S 0.0 0.4 0:00.00 inetd
4916 root 18 0 1616 688 516 S 0.0 0.4 0:00.00 cardmgr
4930 root 16 0 4068 4068 3380 S 0.0 2.1 0:44.97 ntpd
4933 daemon 16 0 1808 592 572 S 0.0 0.3 0:00.00 atd
4936 root 16 0 1868 696 644 S 0.0 0.4 0:00.68 cron
4949 root 17 0 1596 428 424 S 0.0 0.2 0:00.00 getty
4950 root 16 0 1596 428 424 S 0.0 0.2 0:00.00 getty
5009 root 16 0 1592 428 424 S 0.0 0.2 0:00.00 getty
5017 root 16 0 1592 428 424 S 0.0 0.2 0:00.00 getty
5102 anonyy 16 0 4112 1924 1736 S 0.0 1.0 0:02.18 gconfd-2
6469 lp 18 0 2572 936 796 S 0.0 0.5 0:00.00 lpd
6472 lp 15 0 2576 956 812 R 0.0 0.5 15:23.43 lpd
7574 root 15 0 0 0 0 S 0.0 0.0 0:09.09 pdflush
7652 root 15 0 0 0 0 S 0.0 0.0 0:00.15 pdflush
10687 root 16 0 3616 1600 1324 S 0.0 0.8 0:00.03 sshd
11511 anonyy 16 0 4112 2208 1796 S 0.0 1.2 0:01.37 gconfd-2
11541 anonyy 16 0 3664 1972 1224 S 0.0 1.0 0:00.18 bash
12609 anonyy 18 0 3316 1552 984 S 0.0 0.8 0:00.04 startx
12625 anonyy 16 0 2468 676 580 S 0.0 0.4 0:00.00 xinit
12626 root 15 0 22888 12m 2556 S 0.0 6.7 0:09.89 XFree86
12643 anonyy 15 0 5340 2708 2144 S 0.0 1.4 0:00.57 fvwm
12644 anonyy 16 0 5808 2696 2092 S 0.0 1.4 0:00.16 xterm
12645 anonyy 15 0 3636 1952 1228 S 0.0 1.0 0:00.04 bash
12646 anonyy 16 0 96132 27m 17m S 0.0 14.8 0:14.68 mozilla-bin
12696 root 16 0 1592 500 436 S 0.0 0.3 0:00.01 getty
12749 anonyy 16 0 5820 3060 2124 S 0.0 1.6 0:00.20 xterm
12750 anonyy 16 0 3644 1960 1228 S 0.0 1.0 0:00.04 bash
12753 anonyy 15 0 2172 992 776 R 0.0 0.5 0:00.02 top

top - 16:12:57 up 6 days, 17:39, 3 users, load average: 1.66, 1.86, 1.72
Tasks: 52 total, 2 running, 50 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.4% us, 2.1% sy, 97.5% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 191820k total, 185844k used, 5976k free, 61264k buffers
Swap: 188960k total, 4112k used, 184848k free, 53628k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11808 boinc 34 19 18196 4948 1692 S 43.9 2.6 2050:25 einstein_4.81_i
4862 boinc 15 0 4300 2692 1620 S 0.7 1.4 55:00.00 boinc_client
6472 lp 15 0 2576 956 812 S 0.2 0.5 15:23.58 lpd
12644 anonyy 16 0 5808 2696 2092 S 0.0 1.4 0:00.18 xterm
12753 anonyy 16 0 2180 1056 828 R 0.0 0.6 0:00.04 top
905 root 15 0 0 0 0 S 0.0 0.0 0:30.35 kjournald
12749 anonyy 16 0 5820 3060 2124 S 0.0 1.6 0:00.21 xterm
1 root 16 0 1596 536 472 S 0.0 0.3 0:11.55 init
2 root 34 19 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/0
3 root 10 -5 0 0 0 S 0.0 0.0 0:08.99 events/0
4 root 16 -5 0 0 0 S 0.0 0.0 0:00.06 khelper
5 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
23 root 10 -5 0 0 0 S 0.0 0.0 0:00.90 kblockd/0
64 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
63 root 15 0 0 0 0 S 0.0 0.0 0:02.76 kswapd0
650 root 15 0 0 0 0 S 0.0 0.0 0:00.01 kseriod
947 root 11 -4 1584 468 404 S 0.0 0.2 0:00.45 udevd
1573 root 15 0 0 0 0 S 0.0 0.0 0:03.90 kapmd
2283 root 15 0 0 0 0 S 0.0 0.0 0:00.00 pccardd
2291 root 19 0 0 0 0 S 0.0 0.0 0:00.00 pccardd
2373 root 15 0 0 0 0 S 0.0 0.0 0:00.00 khubd
3357 root 17 0 0 0 0 S 0.0 0.0 0:00.00 kgameportd
3805 root 19 0 0 0 0 S 0.0 0.0 0:00.00 kIrDAd
4112 root 15 0 2464 856 736 S 0.0 0.4 0:00.00 dhclient
4810 root 16 0 2352 868 748 S 0.0 0.5 0:01.78 syslogd
4843 root 15 0 2472 1528 472 S 0.0 0.8 0:00.38 klogd
4854 root 16 0 1596 564 500 S 0.0 0.3 0:00.28 apmd
4873 messageb 18 0 2208 828 716 S 0.0 0.4 0:00.00 dbus-daemon-1
4882 root 19 0 2336 756 676 S 0.0 0.4 0:00.00 inetd
4916 root 18 0 1616 688 516 S 0.0 0.4 0:00.00 cardmgr
4930 root 16 0 4068 4068 3380 R 0.0 2.1 0:44.97 ntpd
4933 daemon 16 0 1808 592 572 S 0.0 0.3 0:00.00 atd
4936 root 16 0 1868 696 644 S 0.0 0.4 0:00.68 cron
4949 root 17 0 1596 428 424 S 0.0 0.2 0:00.00 getty
4950 root 16 0 1596 428 424 S 0.0 0.2 0:00.00 getty
5009 root 16 0 1592 428 424 S 0.0 0.2 0:00.00 getty
5017 root 16 0 1592 428 424 S 0.0 0.2 0:00.00 getty
5102 anonyy 15 0 4112 1924 1736 S 0.0 1.0 0:02.18 gconfd-2
6469 lp 18 0 2572 936 796 S 0.0 0.5 0:00.00 lpd
7574 root 15 0 0 0 0 S 0.0 0.0 0:09.09 pdflush
7652 root 15 0 0 0 0 S 0.0 0.0 0:00.15 pdflush
10687 root 16 0 3616 1600 1324 S 0.0 0.8 0:00.03 sshd
11511 anonyy 16 0 4112 2208 1796 S 0.0 1.2 0:01.37 gconfd-2
11541 anonyy 16 0 3664 1972 1224 S 0.0 1.0 0:00.18 bash
12609 anonyy 18 0 3316 1552 984 S 0.0 0.8 0:00.04 startx
12625 anonyy 16 0 2468 676 580 S 0.0 0.4 0:00.00 xinit
12626 root 15 0 22888 12m 2556 S 0.0 6.7 0:09.89 XFree86
12643 anonyy 15 0 5340 2708 2144 S 0.0 1.4 0:00.57 fvwm
12645 anonyy 15 0 3636 1952 1228 S 0.0 1.0 0:00.04 bash
12646 anonyy 16 0 96132 27m 17m S 0.0 14.8 0:14.68 mozilla-bin
12696 root 16 0 1592 500 436 S 0.0 0.3 0:00.01 getty
12750 anonyy 16 0 3644 1960 1228 S 0.0 1.0 0:00.04 bash

I also checked the numbering of subsequently started terminals. Their process IDs were something like 12684-12688, so no problem there.

Anonyymi
Joined: 6 Jul 05
Posts: 17
Credit: 1270182
RAC: 0

I checked stderrdae.txt and stdoutdae.txt and couldn't find any errors.

Desti
Joined: 20 Aug 05
Posts: 117
Credit: 23762214
RAC: 0

RE: I ran top -b -d 60 -n

Message 17285 in response to message 17283

Quote:

I ran

top -b -d 60 -n 2 >toplist

Here are the results:

top - 16:11:56 up 6 days, 17:38, 3 users, load average: 1.71, 1.91, 1.73
Tasks: 52 total, 3 running, 49 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.1% us, 1.8% sy, 65.7% ni, 28.9% id, 0.3% wa, 0.1% hi, 0.0% si
Mem: 191820k total, 185596k used, 6224k free, 61228k buffers
Swap: 188960k total, 4112k used, 184848k free, 53616k cached

Maybe it's waiting for swap space or your system is just i/o bandwidth limited.

Anonyymi
Joined: 6 Jul 05
Posts: 17
Credit: 1270182
RAC: 0

RE: Maybe it's waiting for

Message 17286 in response to message 17285

Quote:

Maybe it's waiting for swap space or your system is just i/o bandwidth limited.

Why would it be waiting for swap space? You are referring to CPU I/O? Isn't any system bandwidth limited? Please explain.

By the way, I also noticed that einstein has a "sticky keys feature". It occasionally wants to leave Shift and Control (or something) on. Can be confusing when you are trying to input passwords :). I have never had sticky keys when einstein isn't running.

Anonyymi
Joined: 6 Jul 05
Posts: 17
Credit: 1270182
RAC: 0

I tried running CPU Burn-in with nice 19. Keyboard works fine. Then I tried with nice 0. No jerks. But as soon as einstein is started keyboard is jerky again.

Here is iostat

Linux 2.6.12-1-686 (best) 13.10.2005

avg-cpu: %user %nice %sys %iowait %idle
3,14 62,46 1,94 1,23 31,22

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 1,80 27,59 23,20 17051770 14336688

Top should say SW if einstein was swapping, shouldn't it?

Running CPU Burn-in niced:

nice -n 19 cpuburn-in &
sar -u 2 5

Linux 2.6.12-1-686 (best) 13.10.2005

17:24:07 CPU %user %nice %system %iowait %idle
17:24:09 all 0,00 99,50 0,50 0,00 0,00
17:24:11 all 0,00 100,00 0,00 0,00 0,00
17:24:13 all 0,50 99,01 0,50 0,00 0,00
17:24:15 all 0,00 100,00 0,00 0,00 0,00
17:24:17 all 0,00 99,50 0,50 0,00 0,00
Average: all 0,10 99,60 0,30 0,00 0,00

Running einstein and sar -u 2 5

Linux 2.6.12-1-686 (best) 13.10.2005

17:27:18 CPU %user %nice %system %iowait %idle
17:27:20 all 1,08 96,77 2,15 0,00 0,00
17:27:23 all 0,00 97,85 2,15 0,00 0,00
17:27:25 all 1,06 95,74 3,19 0,00 0,00
17:27:27 all 0,00 97,85 2,15 0,00 0,00
17:27:29 all 0,00 97,85 2,15 0,00 0,00
Average: all 0,43 97,21 2,36 0,00 0,00

Then as idle as it gets:

Linux 2.6.12-1-686 (best) 13.10.2005

17:33:44 CPU %user %nice %system %iowait %idle
17:33:46 all 0,00 0,00 0,00 0,00 100,00
17:33:48 all 0,00 0,00 0,00 0,00 100,00
17:33:50 all 0,00 0,00 0,00 0,00 100,00
17:33:52 all 0,50 0,00 0,00 0,00 99,50
17:33:54 all 0,00 0,00 0,50 0,00 99,50
Average: all 0,10 0,00 0,10 0,00 99,80

So it seems that einstein is causing about 700% more system activity than CPU Burn-in (an average %system of 2,36 versus 0,30). I wonder why. Could this cause the jerkiness I'm experiencing? Is there any way of producing %system usage in a controllable way?
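
As a quick check of that comparison, here is a small Python sketch that parses the two sar "Average:" lines above, which use a decimal comma, and computes the ratio. The function name is made up for the example.

```python
def parse_sar_average(line):
    """Parse a sar 'Average:' line whose numbers use a decimal comma."""
    fields = line.split()
    names = ("user", "nice", "system", "iowait", "idle")
    values = [float(value.replace(",", ".")) for value in fields[2:7]]
    return dict(zip(names, values))


# Average lines copied from the two sar runs above.
burn = parse_sar_average("Average: all 0,10 99,60 0,30 0,00 0,00")
einstein = parse_sar_average("Average: all 0,43 97,21 2,36 0,00 0,00")

ratio = einstein["system"] / burn["system"]
percent_more = (ratio - 1) * 100
print(f"%system: {einstein['system']} vs {burn['system']}")
print(f"ratio {ratio:.1f}x, i.e. about {percent_more:.0f}% more")
```

The relative difference is large, but in absolute terms it is only about two percentage points of CPU time.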

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

Anonyymi,

Your iostat output just shows the one drive, try "iostat -x" to show extended I/O stats for each partition.

TOP just shows swap space in use. For swap activity, check the swap partitions in the iostat partition detail. "SW" shows that a process is swapped out, not that the process is swapping. That's a bit harder to find out (more at the end).

In your test when you ran cpuburn-in, what did TOP show for CPU use - was it around 50% or did it show nearly 100%? Also, when you "start Einstein" are you starting BOINC which starts Einstein or is BOINC already running and you resume Einstein?

Back on the TOP output from last week, it still shows it's missing like 50% of the CPU. Individual process numbers don't even add up to 50%.

The first interval shows 29% idle and the second one shows 0%. Something else has to be using all that CPU time. Two other processes using it are BOINC and lpd, I wouldn't expect either one to show much CPU. Well, maybe lpd if you have a lot of printer activity.
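
The arithmetic behind the "missing" CPU can be sketched as follows. The numbers are copied from the two top snapshots posted earlier; the function name is made up, and the sketch assumes a single-CPU machine so that the Cpu(s) summary and the per-process %CPU column are on the same scale.

```python
def unaccounted_cpu(idle_percent, iowait_percent, process_percents):
    """Busy CPU share from the summary line minus the per-process %CPU total."""
    busy = 100.0 - idle_percent - iowait_percent
    return round(busy - sum(process_percents), 1)


# First snapshot: 28.9% idle, 0.3% iowait, only einstein shows non-zero %CPU.
first = unaccounted_cpu(28.9, 0.3, [48.3])

# Second snapshot: 0.0% idle, einstein, boinc_client and lpd show non-zero %CPU.
second = unaccounted_cpu(0.0, 0.0, [43.9, 0.7, 0.2])

print(f"first snapshot:  {first}% of CPU not attributed to any process")
print(f"second snapshot: {second}% of CPU not attributed to any process")
```

For the second snapshot that leaves more than half of the CPU time without a process to pin it on.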

Some things to look into -

BOINC is using a lot of CPU. I wouldn't expect BOINC to use 55 minutes of CPU in just six days. Check the logs to see what BOINC is doing. Not just errors but messages that show what it's doing or trying to do.

You can also check whether BOINC is eating CPU cycles. With TOP running with 5 second intervals in one terminal, run killall -9 boinc_client in another terminal (kill itself takes a PID, not a name). The Einstein app will continue running for like 30 more seconds and you can see if things get better. Like does Einstein CPU use jump to 99%? Or does the mouse/system responsiveness get better? Or is the system still messed up until the 30 seconds pass and Einstein also ends?

What is your printer setup like? TOP shows two instances of lpd, is that correct? Somehow that second one doesn't look right. And I wouldn't expect to see it so high in the TOP display unless you were trying to print something. What do "lpc stat" and "lpq" show?

What USB devices are on your system? Are the ones connected actually in use? Like printers, webcams, scanners, external drives and network adapters.

EDIT:

I don't know of a way to get swap activity for a process except by looking at the process in /proc. For process PID look in /proc/PID/stat. The 12th entry is the total number of page faults which have resulted in a page being loaded from disk.

One way to get all the numbers is:

cat /proc/*/stat | less

then look through the output. Each process's line starts with the process ID followed by the process name in parentheses; the name and the one-letter state right after it are the only non-numeric items. So count 10 over from the name.

I have a program somewhere that compares two lists; I'll look for it if someone wants it. Or maybe just rewrite it, it's pretty simple.
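
A rough sketch of such a comparison in Python, not the program mentioned above. It reads field 12 (major page faults) from every /proc/PID/stat and reports which processes gained faults between two readings; all function names are assumptions made for the example.

```python
import glob


def parse_stat_line(line):
    """Return (pid, name, major_faults) from one /proc/PID/stat line."""
    # The name is wrapped in parentheses and may itself contain spaces,
    # so split around the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren])
    name = line[open_paren + 1:close_paren]
    rest = line[close_paren + 1:].split()
    # rest[0] is field 3 (state), so field 12 (major faults) is rest[9].
    return pid, name, int(rest[9])


def read_major_faults():
    """Map pid -> (name, major_faults) for every process visible in /proc."""
    faults = {}
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, name, majflt = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not in the expected form
        faults[pid] = (name, majflt)
    return faults


def fault_deltas(before, after):
    """Return (new_faults, pid, name) for processes whose count grew, largest first."""
    deltas = []
    for pid, (name, majflt) in after.items():
        if pid in before and majflt > before[pid][1]:
            deltas.append((majflt - before[pid][1], pid, name))
    return sorted(deltas, reverse=True)
```

Typical use would be to call read_major_faults(), wait a minute, call it again, and pass both results to fault_deltas. A process that keeps gaining major faults is repeatedly loading pages from disk.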

Walt

Anonyymi
Joined: 6 Jul 05
Posts: 17
Credit: 1270182
RAC: 0

RE: Your iostat output

Message 17289 in response to message 17288

Quote:

Your iostat output just shows the one drive, try "iostat -x" to show
extended I/O stats for each partition.

I rebooted and let einstein run for about 10 hours. Then:

anonyy@best:~$ iostat -p ALL
Linux 2.6.12-1-686 (best) 15.10.2005

avg-cpu: %user %nice %sys %iowait %idle
1,53 96,57 1,71 0,12 0,06

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
ram0 0,00 0,00 0,00 0 0
ram1 0,00 0,00 0,00 0 0
ram2 0,00 0,00 0,00 0 0
ram3 0,00 0,00 0,00 0 0
ram4 0,00 0,00 0,00 0 0
ram5 0,00 0,00 0,00 0 0
ram6 0,00 0,00 0,00 0 0
ram7 0,00 0,00 0,00 0 0
ram8 0,00 0,00 0,00 0 0
ram9 0,00 0,00 0,00 0 0
ram10 0,00 0,00 0,00 0 0
ram11 0,00 0,00 0,00 0 0
ram12 0,00 0,00 0,00 0 0
ram13 0,00 0,00 0,00 0 0
ram14 0,00 0,00 0,00 0 0
ram15 0,00 0,00 0,00 0 0
hda 2,87 50,33 39,79 889362 703120
hda1 0,00 0,00 0,00 0 0
hda2 0,00 0,00 0,00 0 0
hda3 6,34 50,23 39,47 887618 697504
hda4 0,00 0,00 0,00 0 0
hda5 0,04 0,00 0,32 8 5616
hdc 0,00 0,00 0,00 0 0
fd0 0,00 0,00 0,00 0 0

As far as I remember, hda5 is my swap partition.

anonyy@best:~$ iostat -x
Linux 2.6.12-1-686 (best) 15.10.2005

avg-cpu: %user %nice %sys %iowait %idle
1,69 96,39 1,74 0,12 0,06

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda 0,06 3,44 1,34 1,52 50,13 39,75 25,07 19,87 31,36 0,20 70,77 12,30 3,52

Quote:

In your test when you ran cpuburn-in, what did TOP show for CPU use -
was it around 50% or did it show nearly 100%? Also, when you "start
Einstein" are you starting BOINC which starts Einstein or is BOINC
already running and you resume Einstein?

CPU Burn-in kept %cpu nearly at 100%. By starting einstein I mean entering the command "boinc_cmd --set_run_mode auto" and "boinc_cmd --set_run_mode never". I.e. Boinc is already running and I just resume einstein.

Quote:

Back on the TOP output from last week, it still shows it's missing like
50% of the CPU. Individual process numbers don't even add up to 50%.

This doesn't happen with CPU Burn-in. I don't know where to look for the missing time. I've tried running ps -lax, ps maxu, etc. but I just can't find anything else consuming the rest of CPU time.

Quote:

The first interval shows 29% idle and the second one shows 0%. Something
else has to be using all that CPU time. Two other processes using it
are BOINC and lpd, I wouldn't expect either one to show much CPU. Well,
maybe lpd if you have a lot of printer activity.

There certainly isn't a lot of printer activity :)

Quote:

BOINC is using a lot of CPU. I wouldn't expect BOINC to use 55 minutes
of CPU in just six days. Check the logs to see what BOINC is doing. Not
just errors but messages that show what it's doing or trying to do.

stderrdae.txt is about 5 kB. It contains mostly errors about network
connectivity (the network is unreliable in this part of the world). There are
a few messages about resending lost results and overcommitting. Those are
probably due to my trial-and-error installations of Boinc. stdoutdae.txt is
about 70 kB. It contains mostly messages due to missing network
connectivity and suspending and resuming due to user requests. I can't
find anything that I could relate to this problem. If you want I can
post them but they are about 1000 lines altogether. I guess Boinc would
need to be more verbose if you wanted to find out what it's trying to do
and where it blocks I/O in the process.

Quote:

You can also check whether BOINC is eating CPU cycles. With TOP running
with 5 second intervals in one terminal, run killall -9 boinc_client in
another terminal. The Einstein app will continue running for like 30 more
seconds and you can see if things get better. Like does Einstein CPU
use jump to 99%? Or the mouse/system responsiveness get better? Or is
the system still messed up until the 30 seconds pass and Einstein also
ends?

I did this. After boinc_client was killed einstein's CPU usage jumped to
99-100% and keyboard jerkiness disappeared.

Quote:

What is your printer setup like? TOP shows two instances of lpd, is
that correct? Somehow that second one doesn't look right. And I
wouldn't expect to see it so high in the TOP display unless you were
trying to print something. What do "lpc stat" and "lpq" show?

anonyy@best:~$ /usr/sbin/lpc status
lp:
queuing is enabled
printing is enabled
no entries
printer idle
deskjet3820:
queuing is enabled
printing is enabled
no entries
printer idle

This is after I emptied the queue. There was one job in the queue and I
guess lpd was trying to print it since lpc stat said at first that
deskjet3820 is printing. That probably launched the other lpd. The
printer hasn't been online for weeks.

Quote:

What USB devices are on your system? Are the ones connected actually in
use? Like printers, webcams, scanners, external drives and network
adapters.

Just one USB mouse.

About swapping: it would cause audible and distinctive disk activity, wouldn't it? I only hear a few clicks every 10 seconds or so but I think it does that even when Boinc isn't running. BTW, I noticed that HDD is jerky too.

Walt Gribben
Joined: 20 Feb 05
Posts: 219
Credit: 1645393
RAC: 0

RE: RE: RE: Your

Message 17290 in response to message 17289

Quote:

Quote:
Quote:

Your iostat output just shows the one drive, try "iostat -x" to show
extended I/O stats for each partition.

I rebooted and let einstein run for about 10 hours. Then:

anonyy@best:~$ iostat -p ALL

Thanks for the detail. Apparently you're running a newer version of sysstat than I have.

Hda3 shows all the I/O activity; what's on that? If hda5 is swap then your system has very little swap activity and most of that is writes.
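
To put a number on "very little", here is a small Python sketch that adds up the Blk_read and Blk_wrtn columns for the two active partitions. The rows are copied from the iostat -p ALL output quoted above; the function name is made up for the example.

```python
def blocks_per_device(iostat_rows):
    """Map device name -> total blocks read plus written since boot."""
    totals = {}
    for line in iostat_rows.strip().splitlines():
        fields = line.split()
        # Columns: Device tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
        totals[fields[0]] = int(fields[4]) + int(fields[5])
    return totals


# Partition rows copied from the iostat -p ALL output in this thread.
ROWS = """
hda3 6,34 50,23 39,47 887618 697504
hda5 0,04 0,00 0,32 8 5616
"""

totals = blocks_per_device(ROWS)
swap_share = 100.0 * totals["hda5"] / sum(totals.values())
print(f"hda3: {totals['hda3']} blocks, hda5: {totals['hda5']} blocks")
print(f"hda5 share of partition traffic: {swap_share:.2f}%")
```

If hda5 really is the swap partition, it accounts for well under 1% of the block traffic over those 10 hours.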

But then this piece tells all:

Quote:
Quote:

You can also check whether BOINC is eating CPU cycles. With TOP running
with 5 second intervals in one terminal, run killall -9 boinc_client in
another terminal. The Einstein app will continue running for like 30 more
seconds and you can see if things get better. Like does Einstein CPU
use jump to 99%? Or the mouse/system responsiveness get better? Or is
the system still messed up until the 30 seconds pass and Einstein also
ends?

I did this. After boinc_client was killed einstein's CPU usage jumped to
99-100% and keyboard jerkiness disappeared.

That's it then. You said BOINC was complaining about network problems; it could be tying up the CPU in constant retries. Try disabling the network until you know you have a good connection and BOINC needs to communicate with the server:

boinc_cmd --set_network_mode never
boinc_cmd --set_network_mode always

See if that fixes it. If it does you could try running BoincMgr to check what BOINC is doing and change the network settings. If you start it after BOINC is running, it'll just connect.

Also check your preferences for "write to disk at most every __ seconds". It's part of the general preferences in "your account" on the E@H web site. That setting controls how often BOINC and the Einstein application write checkpoints. Although the default setting of 60 seconds is OK, I like to reduce the overhead from checkpointing and set it to 300 seconds.

There's a new version of BOINC almost ready; when it's released you should install it. Normally I'd recommend running the development version, but this one doesn't handle dropped network connections very well. Maybe the opposite of what you're seeing now: it doesn't retry file transfers and doesn't tell you why it's not retrying.

Quote:

About swapping: it would cause audible and distinctive disk activity, wouldn't it? I only hear a few clicks every 10 seconds or so but I think it does that even when Boinc isn't running. BTW, I noticed that HDD is jerky too.

I believe it does. Hearing only a few clicks is usually an indication that your system isn't memory constrained. The activity is probably dirty pages being written from the disk cache to the disk.

But what do you mean by the jerky HDD?

Walt
