Wed 2 Jul 2008
OpenNMS vs Nagios
Posted by Craig under Tech
What’s Wrong With Nagios?
Don’t get me wrong, I like Nagios. I think it’s an excellent piece of software and I have spent many years working with it, but I have just completed a proof of concept and gained approval to deploy OpenNMS as a new Enterprise Grade Network Monitoring System. And the main system targeted for replacement here? That’s right, it’s Nagios, which is primarily running via the remote plugin model, using the NRPE daemon to run scripts on remote hosts and report back to base.
Now anyone who has ever played with Nagios will know that it can be a beast of a thing to set up and get working satisfactorily. In fact, most places will devote a good year or so to the process. As a newbie, sitting in front of a freshly installed Nagios instance and wondering how to get it to do something can be an extremely disheartening experience. Once it’s up and running though it’s usually fairly low maintenance to keep it going, and not too difficult to add new devices or custom plugins as you go along. And, for the most part, it is good at what it does, so why would you want to replace it?
Well, despite having almost unparalleled abilities to monitor at the application level and perform any manner of esoteric checks, Nagios does have its limitations.
A Question of Scale
One of the biggest problems I have encountered with various Nagios implementations is one of scale. Put simply, Nagios does not scale well.
Too Much Information, Too Little Visibility
I have seen Nagios implementations monitoring hundreds or even thousands of hosts and services where the corresponding Host Detail and Service Detail screens are simply so big that they refresh themselves before you can scroll even half way down the page.
The Tactical Overview page gives you a simple view into the number of current issues, but doesn’t tell you at a glance what or where they are.
This makes using it in a NOC something of a chore as you actually have to interact with it to get at the information you require. It also has fairly poor visibility into historical data, although this can be addressed to some extent using additional plugins such as perfparse, and it has little to no reporting output - both of which are things The Business tend to like rather a lot.
Timeout, I Tripped Myself Up
Running custom plugins to fit any ad-hoc monitoring requirement might seem like a good idea, as you have total control over the requirements and the output. For what it’s worth, I like writing Nagios plugins; I’ve written them for the NRPE daemon as well as for places where the plugins are installed locally and run over ssh.
In both setups I have seen a single poll run take longer than the interval allowed for gathering its results, and I have seen systems brought to their knees as a result.
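To put rough numbers on a poll run that cannot finish in the time available, here is a back-of-the-envelope sketch in Python. The function names and figures are made up for illustration, and it assumes every check takes the same time and that checks are spread evenly across a fixed number of parallel workers, which is a simplification and not how Nagios actually schedules.

```python
import math

def poll_run_seconds(num_checks, avg_check_seconds, parallel_workers):
    """Estimate how long one full poll run takes.

    Assumes identical check durations and an even spread over the
    workers (a simplification for illustration only).
    """
    batches = math.ceil(num_checks / parallel_workers)
    return batches * avg_check_seconds

def keeps_up(num_checks, avg_check_seconds, parallel_workers, interval_seconds):
    """True if a poll run finishes before the next one is due to start."""
    run_time = poll_run_seconds(num_checks, avg_check_seconds, parallel_workers)
    return run_time <= interval_seconds

# Hypothetical figures: 3000 checks, 20 workers, 5 minute polling interval.
fast = poll_run_seconds(3000, 1.5, 20)  # 150 batches of 1.5 s each
slow = poll_run_seconds(3000, 4.0, 20)  # 150 batches of 4.0 s each
```

With these invented numbers, the 1.5 second checks fit inside a 300 second interval and the 4 second checks do not, so each run starts before the previous one has finished and the backlog only grows.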
Please Invent Me a Wheel
From my experience this is probably the most misunderstood issue with Nagios; people will spend a long time writing all manner of shell scripts or Perl scripts to plug in to Nagios to return all manner of incredibly useful data, which is all well and good, but most of that information is available already, at significantly less cost (both computational and time), from SNMP.
Yes, Nagios is perfectly capable of polling SNMP, it’s just that I’ve not yet come across anyone who was using it that way by default, and once you have the system set up with dozens or even hundreds of plugins, making the choice to convert to SNMP would be an administrative nightmare.
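As a minimal sketch of what an SNMP-backed check can look like, the Python below fetches the 1 minute load average from an SNMP agent and maps it onto the standard Nagios plugin states. It assumes the net-snmp `snmpget` command is installed on the monitoring host and that the target runs an agent exposing UCD-SNMP-MIB; the function names, the `public` community string and the thresholds are placeholders of mine, not anything from a real deployment.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# UCD-SNMP-MIB laLoad.1: the 1 minute load average.
LOAD_1MIN_OID = ".1.3.6.1.4.1.2021.10.1.3.1"

def snmp_get(host, oid, community="public", timeout=5):
    """Fetch one value with net-snmp's snmpget; -Oqv prints only the value."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    # Strip whitespace and any surrounding quotes from the value.
    return result.stdout.strip().strip('"')

def classify(value, warn, crit):
    """Map a reading onto a Nagios state and a one-line status message."""
    if value >= crit:
        return CRITICAL, f"CRITICAL - load {value:.2f}"
    if value >= warn:
        return WARNING, f"WARNING - load {value:.2f}"
    return OK, f"OK - load {value:.2f}"

def check_load(host, warn=4.0, crit=8.0):
    """Poll the agent and classify; any failure is reported as UNKNOWN."""
    try:
        return classify(float(snmp_get(host, LOAD_1MIN_OID)), warn, crit)
    except (OSError, subprocess.SubprocessError, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN - {exc}"
```

A real plugin would print the message and exit with the state code. The point is that nothing has to be installed on the target beyond the SNMP agent it is probably running already.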
Security?
And here’s the big one…
Ok, so Nagios allows you to write plugins in just about any language and run them on remote servers. Hands up anyone who sees the problem here? Yes, I trust myself and the integrity of my team to be able to write safe plugins that won’t wipe out the remote host when they are run, but what about somebody who worked here years ago whom I never met? Or some new recruit who may come along after I’m gone? Should I trust their code? Should they be allowed to run ad-hoc scripts on just about any production server they feel like with no checks or balances? Because that’s what will happen. If you are using NRPE then most people bundle all the plugins up and install them on each target system rather than picking and choosing which ones are required, so there is the potential to put a script that has never been tested on a system it was never designed to be run on, and then go ahead and run it anyway. Now I don’t know about you, but that strikes me as a recipe for disaster…
And I won’t even mention the time I saw two plugins written in PHP amongst the usual mix of Perl, shell scripts or binaries. What’s so wrong with that? I hear you ask. Only that it then meant that all of the PHP libraries had to be installed just so those two plugins could run. That included all libraries developed in-house, not just the distribution ones.
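One way to see how much dead weight a host is carrying is to compare the plugins installed on it with the ones its NRPE configuration actually calls. The Python below is a rough sketch of that idea, not a hardened audit tool: it relies on the `command[name]=...` syntax used in nrpe.cfg, assumes the first word of each command line is the plugin itself (so a `sudo` or wrapper prefix would fool it), and the sample config and plugin names are invented.

```python
import re
from pathlib import PurePosixPath

# nrpe.cfg command definitions look like: command[name]=/path/to/plugin args...
COMMAND_RE = re.compile(r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmdline>.+)$")

def referenced_plugins(nrpe_cfg_text):
    """Return the file names of the executables that nrpe.cfg commands invoke."""
    names = set()
    for line in nrpe_cfg_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            executable = match.group("cmdline").split()[0]
            names.add(PurePosixPath(executable).name)
    return names

def unreferenced_plugins(installed, nrpe_cfg_text):
    """Installed plugin files that no configured command actually uses."""
    return sorted(set(installed) - referenced_plugins(nrpe_cfg_text))

# Hypothetical host: four plugins shipped, only two wired up.
sample_cfg = """
# comment lines and other settings are ignored
allowed_hosts=127.0.0.1
command[check_load]=/usr/lib/nagios/plugins/check_load -w 4,3,2 -c 8,6,4
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
"""
installed = ["check_load", "check_disk", "check_oracle.pl", "check_legacy_app.php"]
leftover = unreferenced_plugins(installed, sample_cfg)
```

Here `leftover` holds the two scripts that sit on the box without any configured command using them, which is exactly the untested code the bundle-everything approach leaves lying around.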
It’s Not Exactly Broken, But Please Fix It
Obviously for most organisations, setting up something like Nagios represents a significant investment. Remember, there is no such thing as ‘Free’ software: you still have to pay for the time it takes to set up and maintain it, so you have to have some pretty good reasons to want to replace it.
The main driving factor for doing so here was to provide better visibility into the Operational Infrastructure. This meant not only a clearer interface or dashboard showing at a glance where there are any issues or outages, but also better historical trending information and better reporting.
Actually sourcing a replacement system to fulfil the Business Requirements is no mean feat, but at the end of a 5 month project, OpenNMS was chosen as the best solution. I won’t go into the process here, but suffice it to say it had some pretty stiff competition, especially from the likes of Zenoss and Hyperic HQ.
OpenNMS vs Nagios
So, when it comes down to it, why choose OpenNMS over Nagios?
At the end of the day the differences can be brought down to just three points:
- Visibility
- Reporting
- Scalability
There are other important factors - like auto discovery for instance, which Nagios doesn’t do and which OpenNMS makes incredibly easy: with a few clicks through the GUI you can start monitoring your entire network and collecting data with almost no effort. Obviously you will want more from the system than this, you’ll want to set alerts and thresholds for instance, but at least, unlike Nagios, it is very easy to make it start to do something useful.
In terms of the Business Requirements though, as expressed by the three items listed above, OpenNMS has these in spades; you can create multiple customised dashboards, there is a wealth of out of the box reporting functionality as well as customised report creation, and there is the potential for huge scalability, even running a distributed model across multiple servers or locations. But that’s not enough on its own to sell the idea to Management. What makes OpenNMS a better choice for the enterprise?
OpenNMS describes itself as:
the world’s first enterprise grade network management platform developed under the open source model.
But that doesn’t mean much when it comes to selling a solution to Management, who tend to want to know about things like TCO, ROI, and other such important factors. With commercial propositions it is possible to make these kinds of calculations. With an Open Source product this is much more of a grey area. What is it that makes OpenNMS a better proposition than, say, HP’s OpenView? Yes, the software is free, but what is the cost involved to set it up and maintain it? Obviously most Linux Sys Admins are capable of picking up just about any Open Source product and running with it; it may just take longer to get your head around some systems and require more TLC to get them working just how you want them. See my comments above about Nagios; the exact same thing can be said about OpenNMS.
Sometimes, though, you don’t have the luxury of time, so the question then comes down to support. And it just so happens that the makers of OpenNMS have a commercial support company in the shape of The OpenNMS Group, which exists to provide various levels of support agreements and professional services.
There’s Service, and Then There’s Service
If your entire business model is based around the support you can offer for an otherwise free product, and when the competition includes vendor provided solutions where they’ll hold your hand all the way and do everything for you (at the cost of an arm and a leg, while we’re at it), you had better be able to provide something pretty special.
And here’s the kicker - the guys from the OpenNMS Group are the guys who actually write and maintain the software. Who would you rather ask your technical questions to? Some slick salesman with a glossy brochure and, if you’re lucky, a service manual, or the guy who actually wrote that particular module or function? Ask Management that question and you know which answer you’re going to get, but ask the techies, the people who have to work with these systems, and they’ll go for the techies and the developers every time.
In the course of my investigations into OpenNMS I’ve spoken to quite a few people, and seen any number of online sources, all of whom are happy to sing the praises of the OpenNMS Group and the exemplary commitment to service and support they show, and now I have to take a moment to add my praises too.
Even after OpenNMS was chosen as the best fit for our requirements there was still significant reluctance to sign off on the project and actually move towards deployment. There were varying levels of concerns about the scale, scope and capabilities of the product, even in the face of all the evidence I had produced over the previous couple of months, with the biggest remaining issue being the ability to accurately replicate what was currently being monitored by Nagios. Of course, I would estimate that approximately 90% of the Nagios functionality could be reproduced out of the box by SNMP, and the rest could be mirrored either by running the Nagios plugins from within OpenNMS or by using some other native means - with a bit of work, obviously. It didn’t matter how many times I said this though, or how much evidence I produced, the people who would actually have to work with the system wanted further reassurances.
With just one day to go before we gave up and pulled the plug on the project, the maintainer of the OpenNMS project, Tarus Balog (The Mouth of OpenNMS), very kindly agreed to a teleconference with us to address some of the concerns being voiced by the Infrastructure guys. Since we’re in Australia he even took time out of his evening for the purpose. After spending over an hour on the phone with us, fielding all of the questions which were put to him, not only did I manage to get the project signed off straight away, I managed to get agreement to engage the OpenNMS Group - in fact Tarus himself - to come to Australia and perform an Enterprise GreenLight Deployment.
Coincidentally that very same day I read a post on Tarus’ blog about the need to improve their marketing strategy. To the people that matter, I’d like to state that the friendliness and helpfulness of the OpenNMS guys is probably their single biggest asset. After talking to Tarus, their dedication to their cause is obvious, and it certainly helped to sell the product as far as the people who sign the paperwork here were concerned. From a techie point of view I have to say their marketing already works just fine, but hey, anything which helps promote these guys and the amazing levels of service they provide is alright with me.
Disclaimer: I am a Technical Specialist and Hired Gun. I have no affiliation with the OpenNMS Group other than through implementing and using their software.
5 Responses to “ OpenNMS vs Nagios ”
Comments:
July 2nd, 2008 at 11:57 pm
Thanks a bunch, Craig! I’ve written 3 different comments in response to this and deleted them before hitting “Submit” because (ugh) I end up sounding like a sales guy.
Just wanted you to know your take on OpenNMS is appreciated, and I’m glad you “get it” - this is exactly the kind of experience we’re trying to make sure all of our customers and users get.
July 3rd, 2008 at 4:37 am
I absolutely share your opinion. I have strong experience with Nagios and now with OpenNMS. I have decided to do my Open-Source projects with OpenNMS for the same reasons given in your posting. In addition to what you wrote, it is really hard to maintain a large Nagios installation. In Nagios there are absolutely no procedures or processes which help you keep your monitoring up to date. All changes in the network must be entered manually. The discovery possibilities in OpenNMS (capability scan and discovery) work really well for that. And last but not least, Nagios can do absolutely nothing with external commands like SNMP traps and syslogs. The implementations with snmptt and so on -> !!! ROFL !!!
The same goes for notification on those -> the Netways implementation for notifying on SNMP traps -> ROFL too
Look at http://www.netways.de/uploads/media/Martin.Fuerstenau_SNMP.Traphandling.fuer.Nagios.pdf
then you will see why this sucks
July 12th, 2008 at 10:53 pm
I absolutely agree with your comments about the OpenNMS group.
I’m hoping to have the pleasure to collaborate with them on an interesting project. They have worked enormously hard to deliver a great software product and develop their expertise.
We’ve chosen to work with OpenNMS and invest in further development because of their product - and their quality service. Well done Tarus, Jeff, David and the rest of the crew.
Of course, a little marketing polish wouldn’t hurt - as long as you understand it to mean -> engaging in a meaningful two-way dialog with everyone involved in the process - budget holders, managers, engineers.
July 13th, 2008 at 12:11 pm
I believe the open source community needs to re-focus the way it engages its audience (their market, in commercial terms).
We need businesses to support us. To have faith & confidence in us as a community of experts - not as a bunch of geeks with a grudge.
We need to polish our messages - not to go “corporate”, but talk to them as peers. To engage in a dialog that they understand and feel welcomed by.
With the support of the non-technical sides of the business we get more involved, and we earn our status as trusted advisors who are *really* looking out for their interests - not fleecing them via quasi-legal lock-ins.
That requires us all to invest in learning how to present ourselves. A “commercial” & a “community” website alongside materials, webinars, demos and fully-enabled professional partner programmes to support the delivery of quality services - globally.
We need to compete head-to-head with HP, BMC, CA, EMC, IBM and the mid-tier too. We need to give those engineers that want to deploy our solutions the tools to be agents of change. They need business cases, support packages, structured professional services - they need the experience they receive from the traditional proprietary channel.
That’s a long way from where we are today - but at least the software is already there.