Argus System and Network Monitoring Software
After a very long, in-depth look at web uptime monitoring services and service monitoring companies, I finally came to the conclusion that if I really wanted to monitor a variety of services on a couple dozen machines, the cheapest approach would probably be to install monitoring software on a VPS.
I wanted something relatively lean and fairly easy to set up and get started with, that wouldn’t open additional network ports on my VPS, and that would do basic monitoring.
After all my searching, Argus appeared to fit that description well.
After working with it over the last day, I’m seeing all sorts of interesting possibilities for integrating it with some custom scripts as well. As for ease of setup, I have seen easier things to install and configure, but it could have been a much worse install. Having become very used to Ubuntu and apt-get’ting the software I’m looking for, I found argus-server and argus-client in my apt-cache and installed them, only to find out that they’re a different piece of software with the same name. So I downloaded the package from the Argus site, then configured and installed it the traditional from-source way: good ole make, make install. Things went without a hitch on that front. I had to adjust permissions and tinker with the rc script to get it starting automatically, but it didn’t take much effort to get it up and running.
Then it was time to configure. I started with the example configuration files for users and for monitoring, and quickly got rid of the things I didn’t want or need. One note: the instructions say the user file accepts passwords hashed with the standard crypt utility. After struggling to generate a hashed password (I didn’t want to leave this wide open), I discovered a mkpasswd script in the contrib folder of the unpacked source package. That did the trick, and I’ve removed the password-less logins.
I’ve experimented with restricting access to monitoring groups by username. I have aliases for some items so they can show up in multiple places in the hierarchy without being monitored multiple times. I’ve also tweaked the settings so that it won’t send me reminder emails every five minutes while something’s down. (I’ll get to it; mailbombing me doesn’t help me get to it any quicker.)
All in all, everything is very customizable, from delivery and notification options to monitoring options. I’m having a bit of trouble with a couple of peculiar HTTP servers that give somewhat non-standard responses, but I can’t blame Argus for that; I’ve always had issues monitoring that particular HTTP server software, even with my custom bash scripts. On the plus side, I may be able to bring the workaround from my existing scripts into Argus.
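For anyone fighting similar servers, here is a rough Python sketch of the general idea behind a more forgiving HTTP check. It is not Argus code and not my actual bash script; the function names and the rule "any 2xx or 3xx status line counts as up, regardless of case, spacing, or a missing reason phrase" are assumptions made for illustration.

```python
import re
import socket

# Hypothetical lenient rule: accept anything that looks like "HTTP/x[.y] NNN",
# ignoring case, extra whitespace, and a missing reason phrase.
STATUS_RE = re.compile(r"^\s*HTTP/\d(?:\.\d)?\s+(\d{3})\b", re.IGNORECASE)


def status_code(first_line: str):
    """Return the numeric status code from a status line, or None if unparseable."""
    m = STATUS_RE.match(first_line)
    return int(m.group(1)) if m else None


def looks_up(first_line: str) -> bool:
    """Treat any 2xx or 3xx status as 'service up'."""
    code = status_code(first_line)
    return code is not None and 200 <= code < 400


def fetch_first_line(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> str:
    """Send a minimal HTTP/1.0 GET and return only the first line of the reply."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        data = b""
        # Read until we have at least one full line (or the server closes).
        while b"\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].decode("latin-1").rstrip("\r")
```

Usage would be something like looks_up(fetch_first_line("www.example.com")), with the host name as a placeholder. Splitting the parsing from the network call keeps the tolerant part easy to test against whatever odd status lines a given server actually sends.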
I’ve also started experimenting with an alert for finding a domain on an email blacklist. I see quite a few more advanced features, including a distributed mode where I could set up one master server that collates information from several slave monitoring systems. argus-agent is available in the examples folder and opens up other interesting possibilities for collecting information and reports, such as CPU load and disk space.
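Most email blacklists are published as DNS zones (DNSBLs). To check an IPv4 address you reverse its octets and look the result up under the list’s zone, while domain-based lists (sometimes called RHSBLs) use the domain name itself as the prefix. A listing comes back as an A record, conventionally in 127.0.0.0/8, and "not listed" is an NXDOMAIN. Below is a minimal Python sketch of that lookup, not how Argus does it; the zone names are placeholders, and real lists such as zen.spamhaus.org or dbl.spamhaus.org have their own usage policies.

```python
import ipaddress
import socket


def dnsbl_query_name(target: str, zone: str) -> str:
    """Build the DNS name to query for a blacklist check.

    IPv4 addresses are reversed octet by octet (192.0.2.10 -> 10.2.0.192.<zone>);
    anything else is treated as a domain name and prefixed to the zone.
    """
    try:
        addr = ipaddress.IPv4Address(target)
    except ValueError:
        return f"{target.strip('.').lower()}.{zone}"
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(target: str, zone: str) -> bool:
    """Return True if the blacklist zone answers with a 127.0.0.0/8 address."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(target, zone))
    except socket.gaierror:
        # NXDOMAIN (or any other resolution failure) is treated as "not listed".
        return False
    # Ignore answers outside 127.0.0.0/8, e.g. a resolver that hijacks NXDOMAIN.
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

For example, is_listed("example.com", "rhsbl.example.org") would query example.com.rhsbl.example.org, where rhsbl.example.org stands in for whichever domain-based list you actually use.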
All told, I now have about 105 services monitored by Argus on my VPS and have seen no issues with system performance. Judging from the project’s benchmarks, I should be able to monitor tens of thousands on my relatively slim VPS(!)
Another good feature worth mentioning is the ability to tag or annotate events on particular services. Say you’re testing and you pull the plug on your VPN server (a Linksys WRT54GL router) to see whether your check reports it as down. When you get the alert, you can annotate the outage with “testing – pulled plug to verify test logic” or some such explanation. That means if you’re working with someone else, you can both see whether something is already being looked at and what the story is.
You can also override alerts (and if there is a scheduled maintenance window at the same time every week, you can handle that in your config). You can override an alert indefinitely (until the service comes back up) or for a set number of hours or days, and when you override an alert you can also tag it. So you might note that scheduled maintenance is going on, or give an estimated time of repair. For that reason, you might even want to use Argus as a more publicly accessible network status tool. It’s possible to give passwordless access to it, and you can configure a specific starting area for that passwordless user.
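Argus handles overrides through its own interface and config; the following is only a hypothetical Python model of the rules as described here: an open-ended override that lapses once the service recovers, or a timed one that expires, each carrying a note. The Override class and should_alert function are made-up names for illustration, not Argus internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Override:
    """Hypothetical alert override; not Argus's actual data structure."""
    note: str                           # e.g. "scheduled maintenance, ETA 2 hours"
    expires: Optional[datetime] = None  # None means "until the service comes back up"
    cleared: bool = False               # set once an open-ended override has lapsed

    def observe(self, service_up: bool) -> None:
        # An open-ended override ends for good as soon as the service recovers.
        if self.expires is None and service_up:
            self.cleared = True

    def active(self, now: datetime) -> bool:
        if self.cleared:
            return False
        if self.expires is None:
            return True
        return now < self.expires


def should_alert(service_up: bool, override: Optional[Override], now: datetime) -> bool:
    """Alert only for a down service that is not covered by an active override."""
    if override is not None:
        override.observe(service_up)
    if service_up:
        return False
    return override is None or not override.active(now)


# Example: a two-hour maintenance override with a note attached.
start = datetime(2024, 1, 1, 12, 0)
maint = Override("scheduled maintenance", expires=start + timedelta(hours=2))
print(should_alert(False, maint, start + timedelta(hours=1)))  # False: still overridden
print(should_alert(False, maint, start + timedelta(hours=3)))  # True: override expired
```

The cleared flag is what makes an "until it comes back up" override one-shot: once the service recovers, a later outage alerts normally instead of staying silenced.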
All in all, the last day configuring and testing Argus has been interesting and productive. I suspect this will be a useful tool for quite some time, and it should let me consolidate many of the custom monitoring tools I’ve built over the last few years (possibly even reducing the volume of routine check emails I’ve had going).