
How to track 500,000 counters every 10 seconds


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

2 replies to this topic

#1 hplus0603   Moderators   -  Reputation: 4903


Posted 27 September 2012 - 11:23 AM

This is networking related, bear with me :-)

At my day job, we solve problems that most indies don't have, like "Oh, our gigabit network link is saturating, how can we route more fiber into our data center?" or "we're running out of space on our 50 terabyte storage cluster; can we get new drives from Dell in time?"

One of the things we do to manage this challenge is Continuous Deployment -- as soon as a code commit is done and tests pass, we deploy it in production. Why wait? Get feedback quicker!

Part of continuous deployment is an immune system, which detects when things "go pear shaped" in response to deploying new code, and automatically rolls back the latest commit.

To do that detection with high accuracy, short detection time, and few false positives, we needed better instrumentation of our application and data center than we could get through existing monitoring tools like Graphite, Zabbix, OpenTSDB, Cacti, etc. Specifically, we needed a very high rate of sampling (every 10 seconds), a very long retention time (10 days for 10-second data, 6 years for downsampled data), and real-time statistics about the sampling (mean, standard deviation, min, and max for each sample).
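To make the "statistics for each sample" part concrete, here is a minimal sketch of how a 10-second bucket could carry mean, standard deviation, min, and max, and how fine buckets could be folded into coarser ones for the long-retention data. This is an illustration only, not istatd's actual code or file layout: the names `Bucket`, `add`, `merge`, and `bucketStart` are made up for this example, and the running sum-of-squares approach is just one common way to do it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical per-interval summary (an assumption for illustration,
// not istatd's real data structure).
struct Bucket {
    std::uint64_t count = 0;
    double sum = 0.0;     // sum of all values seen in the interval
    double sumSq = 0.0;   // sum of squared values, used for the deviation
    double lowest = std::numeric_limits<double>::infinity();
    double highest = -std::numeric_limits<double>::infinity();

    // Record one raw value reported during this interval.
    void add(double v) {
        ++count;
        sum += v;
        sumSq += v * v;
        lowest = std::min(lowest, v);
        highest = std::max(highest, v);
    }

    // Fold a finer-grained bucket into this coarser one (downsampling).
    // Because only sums, counts, and extremes are stored, no raw values
    // are needed to keep the statistics exact.
    void merge(const Bucket& other) {
        count += other.count;
        sum += other.sum;
        sumSq += other.sumSq;
        lowest = std::min(lowest, other.lowest);
        highest = std::max(highest, other.highest);
    }

    double mean() const {
        return count ? sum / static_cast<double>(count) : 0.0;
    }

    // Population standard deviation; clamped at zero to absorb
    // floating-point rounding in the subtraction.
    double stddev() const {
        if (count == 0) return 0.0;
        const double m = mean();
        const double variance = sumSq / static_cast<double>(count) - m * m;
        return std::sqrt(std::max(0.0, variance));
    }
};

// Map a non-negative Unix timestamp (seconds) to the start of its interval.
inline std::int64_t bucketStart(std::int64_t t, std::int64_t interval = 10) {
    return t - (t % interval);
}
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 into one bucket, or into two buckets that are then merged, gives the same mean of 5 and standard deviation of 2, which is the property that makes downsampling cheap. The sum-of-squares formula can lose precision when values are huge and nearly identical; a production system might prefer something like Welford's method.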

Because we couldn't find anything that did this, we wrote it ourselves, using boost::asio and C++. And we've released it as open source! "istatd" has been used for the last year to gather, report, and chart metrics for over 500,000 counters (including the 3 different retention intervals, so 130,000 names) every 10 seconds. It rocks!

And, because it uses boost::asio and does high-throughput networking and I/O, it might be useful for those who want to dive into that kind of network programming with a working, real-life example. Or it might be useful if you just want to track a bunch of counters over time. And if you only have three counters you care about, it'll still work, and draw pretty charts for you; it only uses as many resources as it needs :-)

Check it out at http://github.com/imvu-open/istatd/wiki
Also, I wrote a blog post describing the background: http://engineering.imvu.com/2012/09/26/continuous-monitoring-real-time-statistics-for-a-thousand-servers-and-the-application-they-serve/

Edited by hplus0603, 27 September 2012 - 11:25 AM.

enum Bool { True, False, FileNotFound };


#2 Madhed   Crossbones+   -  Reputation: 2449


Posted 27 September 2012 - 11:48 AM

First I was like: "What? Hplus asking a network question?!"
Then I was like: "That's cool!"


#3 Cornstalks   Crossbones+   -  Reputation: 6966


Posted 27 September 2012 - 11:52 AM

That's exactly what went through my head.

hplus: that project sounds cool. Thanks for sharing! (and thank your company for being cool)
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]



