System Health Monitor
Copyright 2004-2006 by Brian C. Lane <bcl@brianlane.com>
All Rights Reserved
Licensed under GPL v2.0
This python program monitors your system's network interfaces, memory, load, running processes and disk space (and inode usage). It creates Round Robin Databases of the information and generates graphs every 5 minutes. It includes a user friendly interactive setup mode, and it generates HTML files suitable for inclusion in a webpage or local viewing.
UPDATE - If you are upgrading from v0.5.1 to 0.6.0 you will need to delete your meminfo.rrd file and create a new one with systemhealth.py --check, this is due to the kernel folks adding a new line to /proc/meminfo in v2.6.10 kernel.
UPDATE v0.8
Changed ctime() call to ctime().replace(":","\\:") to escape the : character
for the rrdtool COMMENT sections.
v0.7
You can now add the output of an external program to System Health. Use the
--add argument and follow the prompts. You can pass arguments to the
program and specify the title, rrd filename and type of value.
The output of the program must be a single number on a single line. Both floating point and integer are accepted. This value will be inserted directly into the RRD database for graphing.
The graph type of gauge is used for numbers that represent a level, like a temperature or a system load. Counter type is for number that are always incrementing, like a packet count.
When you are finished adding new programs you need to run systemhealth.py with the --html argument to regenerate the html pages. If you have hand-edited any of the pages you should back them up first and then re-edit.
REQUIREMENTS
This program requires a few things to work correctly.
- Round Robin Database from Tobi Oetiker http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/
- Python http://www.python.org
- A web server or browser for viewing the summary pages
Installation and Setup
system_health.py should be run as a normal user. I suggest creating a new user named health (or tommy or bob -- the name doesn't matter) and running system_health.py from that user's crontab.
As root do the following:
# useradd health
# chmod ugo+x /home/health
# cp -p system_health.py /home/health/bin/
# chown -R health.health /home/health/bin
# su - health
This creates a new user, named health in this case, and a bin directory for system_health.py to run from.
Switch to the new user and run setup, which will create the health_rrd and health_html directories, scan the system for network interfaces, drives and running processes and prompt for which ones to monitor.
$ ./bin/system_health.py --setup
follow the prompts to select what network interfaces, drive partitions, and processes to monitor. Directories will be prompted for and created if they do not already exist (with a prompt before and a check for writeability).
Hit return to accept the defaults. All networks and drive partitions have a default of 'Y' and processes have a default of 'N'
Watch out for drives, some of the devices that are mounted aren't actual drives and should be skipped.
Look in the ~/health_rrd and ~/health_html directories to makes sure the rrd files and html files were created correctly.
Setup the crontab for the new user:
$ crontab -e
and add the following line:
*/5 * * * * /home/health/bin/system_health.py --log --graph 2>&1 /dev/null
After it is running you can add other processes, drives and interfaces to monitor by editing the ~/.syshealthrc file. For example to add monitoring of the sshd daemon you would add the following line to the [processes] section:
sshd = sshd
And then regenerate the html and create the new rrd file by running this as the health user:
./system_health --check --html
One thing to watch out for is that running --html will rewrite all the HTML files, so if you have made any customizations to them they will be lost.
If you have any questions, comments, suggestions, etc. contact me at bcl@brianlane.com (if you want to make sure your email makes it through my carrier-class spam filtering you should add the word slartibartfast to the subject line).
v0.5.1 Was the initial release
v0.6 Changes the license to GPL v2.0 and fixes a problem with newer
kernels (v2.6.10) that include a new entry in /proc/meminfo
Fixed command line processing so that --log and --graph can be used
at the same time.
v0.7 Added external program graphs. See the UPDATE at the top of this file
The License
I have changed (as of release v0.6) licenses to GPL v2.0, although donations are still greatly appreciated and help to motivate me to improve and maintain my software. Donations can be made via paypal payment sent to bcl@brianlane.com
Thanks for using my software,
Brian C. Lane
June 11, 2006
The Basement
