From charlesreid1

System Monitoring

What system monitoring entails:

  • inspect and manage processes
  • understand load average
  • available memory
  • monitoring from a shell (top, htop, iotop)
  • disk space, storage available
  • logging
  • log size, log rotate
  • systemd init system
  • systemd journal

Processes

Controlling Processes

To control a process, you first have to find it.

Use the ps command to list currently running processes. The aux options show processes for all users (a), in a user-oriented format (u), including processes not attached to a terminal (x):

$ ps aux

This prints a lot of information. Filter it with grep:

$ ps aux | grep httpd

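The output also shows each process's CPU and memory usage (the %CPU and %MEM columns). Note that the grep command itself usually shows up in the results, since its own command line contains the search term.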
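A minimal sketch of narrowing ps output to a single process. The sleep 300 process is only a stand-in for a real daemon such as httpd, and the variable names are illustrative. It assumes the procps version of ps found on most Linux distributions.

```shell
# Start a throwaway background process to look for (a stand-in for httpd).
sleep 300 &
pid=$!

# Wrapping one letter in brackets stops grep from matching its own
# command line: the regex [s]leep matches "sleep" but not the text "[s]leep".
ps aux | grep '[s]leep 300'

# If the PID is already known, ask ps for just that process.
# Each -o picks one column; the trailing "=" suppresses its header.
ps -p "$pid" -o pid= -o comm= -o %cpu= -o %mem=

# Capture only the command name for use in a script.
name=$(ps -p "$pid" -o comm=)
echo "PID $pid is running: $name"

# Clean up the throwaway process.
kill "$pid"
```

Selecting by PID with -p avoids pattern matching altogether, which is more reliable in scripts than grepping the full listing.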

The PID is in the second column. Use it to kill the process:

$ kill 15800

This sends SIGTERM (signal 15) to the process. SIGTERM is the polite request that the process stop; the process can catch it and clean up before exiting. Linux defines 31 standard signals plus a range of real-time signals, all described in the signal man page:

$ man 7 signal
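The shell's built-in kill can also translate between signal numbers and names, which is quicker than paging through the man page. A short sketch (the variable names are illustrative):

```shell
# List every signal name the shell knows about.
kill -l

# Translate a signal number into its name.
term_name=$(kill -l 15)
kill_name=$(kill -l 9)
echo "15 = SIG$term_name, 9 = SIG$kill_name"
```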

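If you want to send a different signal, such as 12 (SIGUSR2 on most Linux architectures) or 2 (SIGINT, the same signal Ctrl-C sends), pass its number as a flag: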

$ kill -12 15800
$ kill -2 15800

Or, if you are desperate, send SIGKILL (signal 9), which the process cannot catch or ignore:

$ kill -9 15800
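One way to confirm which signal ended a process is to look at its exit status. The sketch below uses a throwaway sleep process as the target instead of a real PID like 15800. The shell reports 128 plus the signal number when a child dies from a signal.

```shell
# Start a disposable background process to practice on.
sleep 300 &
pid=$!

# Plain kill (equivalent to kill -15) sends SIGTERM.
kill "$pid"

# wait returns the child's exit status: 128 + 15 = 143 for SIGTERM.
term_status=0
wait "$pid" || term_status=$?
echo "after SIGTERM: $term_status"

# Repeat with SIGKILL: 128 + 9 = 137.
sleep 300 &
pid=$!
kill -9 "$pid"
kill_status=0
wait "$pid" || kill_status=$?
echo "after SIGKILL: $kill_status"
```

Trying SIGTERM first and reaching for SIGKILL only when the process does not exit gives the process a chance to flush files and release locks.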

If you don't know the PID but you do know the process name, you can use the killall command, which signals every process with that name:

$ killall firefox


Process Load

You can see some information about the load average by showing the contents of /proc/loadavg:

$ cat /proc/loadavg

Or use the uptime command:

$ uptime

This displays the uptime and the load average.

Both uptime and /proc/loadavg (in its first three fields) report three load average numbers, for example

0.63 0.72 0.71

These represent the average system load over the last 1, 5, and 15 minutes, respectively. The load is the number of processes that are running on or waiting for the CPU during each timeframe; on Linux it also counts processes blocked in uninterruptible sleep, typically waiting on disk I/O.

The number that is reported should be thought of as the number of equivalent CPUs that would be required to handle the work. If you see an average load like

9.23 9.81 8.94

that would be perfectly fine for a beefy machine with 16 cores that could chew through the work in no time. However, if this were a Raspberry Pi or a creaky old desktop, it would be a sign the system is getting hammered.
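The comparison above amounts to dividing the load by the number of cores. A minimal sketch, where load_per_core is a hypothetical helper name and nproc (from GNU coreutils) supplies the core count:

```shell
# Hypothetical helper: divide a load average by a core count.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The example load from the text on a 16-core machine and a single-core one.
load_per_core 9.23 16    # prints 0.58: below 1, the cores are keeping up
load_per_core 9.23 1     # prints 9.23: about nine processes competing for one core

# The same check against the live system.
read -r load1 load5 load15 rest < /proc/loadavg
cores=$(nproc)
echo "1-minute load per core: $(load_per_core "$load1" "$cores")"
```

A per-core value that stays well above 1 suggests more runnable work than the machine can handle.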

Memory Load

To check memory, use the free command:

$ free -m

The -m flag reports sizes in mebibytes. Use -h instead for human-readable units.

The used column on the first line shows how much memory is in use. On older versions of free this figure includes the disk cache: memory the kernel uses to hold recently read file data and data waiting to be written to disk.

The size of the cache is also shown in the free output, in the column on the far right labeled "cached" (newer versions combine it into a "buff/cache" column).

This memory is not claimed by processes, so the kernel can shrink it as needed.

The memory that is actually free for use is the second number on the second row ("-/+ buffers/cache") in older versions, and the "available" column in newer versions.

This amount is much larger than the "free" figure on the first line, because it treats the disk cache as reclaimable.
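free gets its numbers from /proc/meminfo, so the same figures can be read directly in a script. A minimal sketch, assuming a Linux kernel recent enough to report MemAvailable (3.14 or later); the variable names are illustrative.

```shell
# Values in /proc/meminfo are reported in kB.
mem_total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)

# Fall back to 0 if the kernel does not report MemAvailable.
mem_avail_kb=${mem_avail_kb:-0}

echo "total:     $((mem_total_kb / 1024)) MiB"
echo "available: $((mem_avail_kb / 1024)) MiB"
```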


Shell Monitoring

The most obvious monitoring program is top.

Also install some other popular tools (as root, or with sudo):

$ apt-get install iotop ncdu htop 

You can use top to look at all processes:

$ top

Or to look at a particular process:

$ top -p 15800

You can change how often top refreshes with the -d flag, which takes a delay in seconds:

$ top -d 2
$ top -d 0.5

Useful keys while top is running:

  • P - sort by cPu usage
  • M - sort by Memory usage
  • k - kill a process (type its process id)

top tells you which processes are hogging CPU and memory.

The summary at the TOP of top shows overall resource usage. On the CPU line, us and sy are user and system CPU usage, wa is time spent waiting on I/O, and id is idle time.
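top can also run non-interactively, which is handy for capturing those summary lines in a script or a log. A small sketch using batch mode (-b) with a single iteration (-n 1), assuming the procps version of top; the variable name is illustrative.

```shell
# -b: batch mode (plain text, no screen control); -n 1: one refresh, then exit.
# sed keeps only the first 12 lines: the summary plus the busiest few processes.
snapshot=$(top -b -n 1 | sed -n '1,12p')
echo "$snapshot"
```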

If I/O is the problem, run iotop, which shows how much data each process is reading from and writing to disk. The TOP of iotop shows the total data being read and written.

To limit the output of iotop to processes that are actually doing I/O:

$ iotop --only


htop

htop is a fancy version of top.



Storage

To monitor storage, use the du and df commands.

Both commands should be paired with the -h flag, for Human-readable (MB or GB file units):

$ df -h

This shows free space on all mounted file systems.
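For scripted checks, df output can be parsed to warn when a file system is filling up. A minimal sketch; the 90% threshold and the variable names are assumptions made for illustration.

```shell
# Hypothetical threshold for this sketch: warn at 90% full.
threshold=90

# -P forces the POSIX one-line-per-filesystem format, which is easier to parse.
# Column 5 is the use percentage; strip the trailing % sign.
usage=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

if [ "$usage" -ge "$threshold" ]; then
    echo "WARNING: / is ${usage}% full"
else
    echo "OK: / is ${usage}% full"
fi
```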

The du command gives a directory-level picture of file system usage. You can run it like this:

$ du -hs directory/

-s shows a Summary (only the total) and -h makes it Human-readable.
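To find out which subdirectory is using the space, du can be limited to one level and sorted by size. A sketch assuming GNU coreutils (for --max-depth and sort -h); it builds a small throwaway tree so it is safe to run anywhere, and the directory and variable names are made up for the example.

```shell
# Build a small throwaway tree so the example is safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/big" "$demo/small"
head -c 5000000 /dev/urandom > "$demo/big/blob"     # about 5 MB
head -c 1000 /dev/urandom > "$demo/small/blob"      # about 1 KB

# One level deep, human-readable sizes, largest first.
du -h --max-depth=1 "$demo" | sort -rh

# Name of the largest immediate subdirectory. Line 1 is the total for
# $demo itself, so the biggest child is on line 2.
largest=$(du --max-depth=1 "$demo" | sort -rn | sed -n '2p' | cut -f2)
echo "largest: $largest"

# Clean up.
rm -rf "$demo"
```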

ncdu

The ncdu command is a very nice way to get a more detailed overview of disk usage in a particular directory. It is a pseudo-graphical terminal interface that presents an easy-to-understand summary of which directories use the most space.

$ ncdu /

will show you a breakdown of your entire filesystem. You can also look at just a particular directory:

$ ncdu /home/joecool/pictures

If you don't want it to scan external mounts, which can swamp a network connection and take forever with large network drives, use the -x option:

$ ncdu -x /home/joecool/

System Logging

/var/log is where most system logs go.

User logins and other authentication events are logged in /var/log/auth.log

Messages from processes, network services, startup services, etc. are in /var/log/messages (on some distributions this file is /var/log/syslog instead).

If you are troubleshooting hardware, check /var/log/dmesg, or use the dmesg command, which prints the kernel's message buffer directly.

APT logs package installs, upgrades, and removals to /var/log/apt/history.log

Monitoring logs in real time

To follow a log in real time, use the tail -f command. For example, to watch login attempts as they happen and verify who can (or cannot) log in, follow /var/log/auth.log:

$ tail -f /var/log/auth.log

This prints the last few lines of the file (10 by default), then keeps watching it and prints any new lines as they are appended.
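The follow behavior can be tried safely on a scratch file instead of a real system log. In this sketch the file names and log lines are invented for the demonstration. The -n 0 option makes tail skip the existing contents and show only what arrives afterwards.

```shell
# Throwaway log file and capture file for the demonstration.
log=$(mktemp)
seen=$(mktemp)

# -n 0 starts at the end of the file, so only lines added from now on are shown.
tail -n 0 -f "$log" > "$seen" &
tail_pid=$!
sleep 1

# Simulate a service appending to its log.
echo "first new line" >> "$log"
echo "second new line" >> "$log"
sleep 3

# Stop following, then look at what tail picked up.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true
seen_count=$(wc -l < "$seen")
cat "$seen"

# Clean up.
rm -f "$log" "$seen"
```

In interactive use there is no need for the background job: run tail -f in one terminal and press Ctrl-C to stop it.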

Rotating logs

To rotate logs, use logrotate.

To rotate logs for the apt process, for example, edit the file /etc/logrotate.d/apt:

/var/log/apt/term.log {
  rotate 12
  monthly
  compress
  missingok
  notifempty
}

/var/log/apt/history.log {
  rotate 12
  monthly
  compress
  missingok
  notifempty
}

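This handles two different log files. The monthly and rotate 12 options rotate each log once a month and keep up to 12 old copies. The compress option compresses the rotated (potentially very fat) log files. missingok suppresses the error if a log file does not exist, and notifempty skips rotation when the log is empty.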

Related

Linux/Networking