Cloud-aware, provider-agnostic monitoring
I’ve never had to deal with monitoring this much before. I set up a few Nagios instances in earlier days, and lately I’ve used Amazon CloudWatch, Pingdom and New Relic for cloud setups, but I don’t consider myself a monitoring expert. At Otomato we are currently focused mainly on software delivery processes: how to get those bits from dev machines into production in the most effective and agile manner without compromising quality. Monitoring, although an important part of running software and assuring successful value delivery (should I say an important part of DevOps?), has been largely out of our scope.
But now we are working on a new project where we have to deal with the full-full cycle – building, deploying and running. Moreover, the requirement is for cloud-provider-agnostic solutions, as this is going to be a multicloud setup.
And with that came the most exciting part of any project – the research!
I didn’t want to go back to Nagios, as my previous encounters with it left a bad aftertaste: cumbersome setup, a dated WebUI and no real support for ephemeral cloud servers. So I started reading up to find out what newer tools are on the market and whether #monitoringsucks less now than it used to.
Here are the links to a few articles that really helped me in my research:
A series of posts on Florin’s blog: http://florin.myip.org/blog/monitoring-cloud-part-1-tools-and-techniques
All James Turnbull has to say about monitoring and his experience with Riemann: https://kartar.net/tags/monitoring/
This comparison of Nagios, Sensu and Icinga2: http://phillbarber.blogspot.co.il/2015/03/nagios-vs-sensu-vs-icinga2.html
And there are more…
To sum things up – I really liked what Riemann has to offer, and I’d like to research it more for a future project, but for now we’ve decided to go with Sensu. Reasons: scalability, cloud-awareness and yes – a good-looking WebUI.
Needless to say – we’ve only seen the tip of the iceberg here – modern monitoring is a world of its own. Metrics collection, event handling, threshold definition, log analysis, anomaly detection – you name it. We promise to revisit all of these in more depth, but until then – please comment here with links and insights of your own.