As a data engineer, my daily duties include using Fluentd to collect logs, Hadoop to store them, and Hive to aggregate and analyze them. Our Hadoop cluster is medium-sized, consisting of 40 nodes with approximately 370TB of DFS space in use. The LINE family apps produce less data than the LINE app itself. While it's nowhere near large enough to be considered big data, it still involves many different data types and Fluentd tags, and over 400 Fluentd processes, because of the various LINE family services tied to it. The Fluentd data flow reaches 150,000 messages per second at peak times.
Monitoring Hadoop and Fluentd is crucial to us, but the monitoring tools available to us were less than ideal, which prompted us to look for better solutions. That's when I learned that the storage development team for the LINE app uses Prometheus and Grafana for monitoring. Prometheus is a next-generation monitoring system with a pull-based architecture and a powerful query language. I was so impressed with what Prometheus was capable of that I repeatedly listened to a podcast in which one of its developers explained its fundamentals.
In the end, I decided to incorporate Prometheus into my working environment. Seeing that many other teams were doing the same, I thought it would be a good idea to gather the people using Prometheus at a meetup where we could share information with each other. This is how we began Prometheus Casual Talks, the first Prometheus meetup event in Japan. We still need to improve speaker diversity, as 4 out of 5 speakers were LINE employees, but I think that ratio is also a testament to how popular Prometheus is inside the company.
In this blog post, I'd like to share my thoughts from the Prometheus meetup and then briefly talk about the upcoming PromCon 2016.