Zabbix GUIs and Automation

In the T-DOSE Zabbix talk, which I’m happy to say was both well presented and showed some interesting features, I got called out for a quote I made on Twitter (which just goes to show - you never know where what you said is going to show up and haunt you) about the relevance, and I’d say overemphasis, of the GUI to the Zabbix monitoring system - and to monitoring systems in general. Rather than argue with the speaker (for the record, I hate it when the audience does that) I thought I’d note my objections here instead.

My monitoring system of choice is Nagios. It’s starting to get a little long in the tooth (where can I add a new host on the fly?) but it’s survived this long because it got a lot of things right, including its loose coupling and the fact that it can read directories of config files. Zabbix, and to a degree Hyperic (which has a command line interface that only a satanist could love), are GUI-focused tools (let’s ignore auto-detect for now). To add a host you click around. To add another host, you click around. To add a new check you click around. To add a new group you click around. To… you get the idea.

Now that might have been fine a couple of years ago (well, not really) and it’s an easy, intuitive way to add new config (well, not in Hyperic’s case) but it bothers me on two levels. Firstly, I like the Unix approach that nearly everything is a file. Secondly, I no longer have a handful of hosts - I have hundreds to thousands of the things, all spread around multiple production sites for the usual reasons like load distribution, geographical locality and resilience, and I generate all the configs for those from my Puppet modelling.
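To make the "everything is a file" point concrete, here is a minimal Python sketch of generating one Nagios host definition per machine into a directory that Nagios could pick up via a `cfg_dir` entry. It is a stand-in for illustration only, not the Puppet code mentioned above: the inventory, the function names and the `generic-host` template name are all assumptions.

```python
import tempfile
from pathlib import Path


def render_host(host_name: str, address: str, template: str = "generic-host") -> str:
    """Return a Nagios host object definition as text.

    The template name is an assumption; use whatever host template
    your own Nagios configuration defines.
    """
    return (
        "define host {\n"
        f"    use        {template}\n"
        f"    host_name  {host_name}\n"
        f"    address    {address}\n"
        "}\n"
    )


def write_host_configs(hosts: dict[str, str], cfg_dir: Path) -> list[Path]:
    """Write one .cfg file per host into cfg_dir and return the paths."""
    cfg_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so repeated runs produce the files in a stable order.
    for host_name, address in sorted(hosts.items()):
        path = cfg_dir / f"{host_name}.cfg"
        path.write_text(render_host(host_name, address))
        written.append(path)
    return written


if __name__ == "__main__":
    # Hypothetical inventory; in practice this would come from your model.
    inventory = {"web01": "192.0.2.10", "db01": "192.0.2.20"}
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_host_configs(inventory, Path(tmp) / "hosts"):
            print(path.name)
```

Adding the thousandth host is then the same operation as adding the first: another entry in the inventory, another file in the directory, and no clicking.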

Using the right Puppet modules my servers know about my clients, my clients can aggregate their services and everything stays in sync. While Zabbix allows you to define templates and associate checks using groups (which Nagios can also do), that’s the wrong level for me. My servers have lots of traits I need applied to them (monitoring, trending, logging etc) and I want to define that once and have the artifacts actioned where they’re needed - not have to work around half an API or click through a GUI. To be honest, the fact that the API seems like such an afterthought bothers me (possibly unreasonably so) as I think it shows a community with different needs to mine.

And now on to using the GUI to actually display information. From the presentation I understand that the Zabbix team is moving in the direction I consider correct - anything you can do from the GUI you’ll be able to do from the API. Your monitoring system (and your trending systems) are too important to only access in the way other people think you should. The information in it needs to be presented in different ways to different audiences - and here too I think Nagios (with a little help from MK Livestatus and NagVis) is currently doing an OK job. It’s extensible, I have a full query language for retrieving monitoring state and I can convey the information on a screen that highlights it in the way I like - without making people use the still very unloved Nagios CGIs. Forcing me through a single GUI that allows no comprehensive API access (other than raw SQL) is a losing bet for me.
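For a flavour of what "a full query language for retrieving monitoring state" looks like, here is a small Python sketch that builds an MK Livestatus query and, optionally, sends it over the Livestatus Unix socket. The helper names are made up for illustration and the socket path is an assumption that depends entirely on how Livestatus was installed.

```python
import json
import socket


def build_query(table, columns, filters=()):
    """Build a Livestatus query string asking for JSON output."""
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {condition}" for condition in filters]
    lines.append("OutputFormat: json")
    # A blank line terminates the query.
    return "\n".join(lines) + "\n\n"


def run_query(query, socket_path):
    """Send a query to the Livestatus Unix socket and decode the JSON reply.

    socket_path is an assumption; check your own broker module settings.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        # Close our side so Livestatus knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return json.loads(b"".join(chunks).decode("utf-8"))


if __name__ == "__main__":
    # All services currently in the CRITICAL state (state 2).
    print(build_query("services", ["host_name", "description", "state"], ["state = 2"]))
```

Once the state comes back as plain rows like this, feeding it to a wallboard, a report or a chat bot is a few more lines of code rather than a feature request.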

Hopefully some of that explains my issues with monitoring in general and Zabbix (as it is at the moment) in particular. You may not agree with it - but with the tool chain I have these days, I think a nicer interface without full API access is a bug, not a feature.