Performance tuning syslog-ng

Posted on August 13, 2012

You may have noticed a theme in many of our blog posts. While we do push the technology envelope in many ways, we also like to stick to tried-and-true, data-driven methods to help keep us sane.

One of the critical components of our infrastructure is centralised logging. Our systems generate a vast amount of logging, and we like to keep it all. Everything. The whole bag.

When faced with similar situations, other companies have embraced a lot of new technology: Flume, Scribe, Logstash.

At Etsy, we’ve stayed old-school – practically prehistoric by technology standards. Our logging infrastructure still uses syslog as the preferred transport because it Just Works™.

…Mostly.

Syslog-ng is a powerful tool, and it has worked well for us, as long as we've paid a little attention to performance tuning it.
This is a collection of our favourite syslog-ng tuning tips.

Our central syslog server, an 8-core machine with 12GB of RAM, currently handles around 60,000 events per second at peak, at roughly 25% CPU load.

Rule ordering

If, like us, you have a central syslog server receiving all of your hosts' vital information (whether it be system logs, Apache logs, Squid logs, or anything else), you will probably have a lot of filters set up to match certain hostnames and certain log lines, so you can get the logs into their correct places.

For example, you may be trying to collect all Apache access log lines across all of your webservers, so you end up with all the logs in a single file destination (something like /log/web/access.log, perhaps).

This feature is pretty widely used, and extremely powerful for sorting your logs, but it is also very expensive in terms of CPU time. Regex matching in particular has to run against every log event that comes in, and when you're pushing tens of thousands of them per second, it can begin to hurt.

In the case of our Apache access log example, you would use the “flags(final)” attribute to tell syslog-ng to stop processing a line once it has matched, so it doesn't even have to check the other regexes. That is all well and good, but have you considered what order syslog-ng checks those match statements in?
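To make that concrete, here is a rough sketch of a log path using flags(final); the source, filter name and file path are illustrative assumptions, not our actual configuration:

source s_network { tcp(ip(0.0.0.0) port(514)); };

filter f_web_access { program("apache_access"); };   # match on the syslog program name

destination d_web_access { file("/log/web/access.log"); };

log {
    source(s_network);
    filter(f_web_access);
    destination(d_web_access);
    flags(final);   # matched lines stop here and skip every later log statement
};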

For example, we use Chef pretty extensively at Etsy to automate a lot of things; syslog-ng is one of them. Each machine role has its own syslog-ng definitions, and our Chef recipes automatically generate both the client and server config for use with that role. To do this, we template out the configuration and drop the files into /etc/syslog-ng.d/, which is included from the main syslog-ng.conf.

Now, if your biggest traffic (log-wise) happens to be from your webservers, and your config file ends up being called “web.conf”, syslog-ng will quite happily parse all your configs, and when it compiles its rules, the config that checks whether a line is from Apache will end up at the end of the list. You are potentially running tens if not hundreds of other match statements and regexes for no reason whatsoever against the bulk of your traffic.

Luckily the fix is extremely simple: if you use one config file, keep the most used rules at the top. If you use syslog-ng.d, keep your most used rules in a file whose name begins with “0-” to force it to the top of the list. This tiny change alone halved the CPU we were using on our syslog-ng server.
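To illustrate (assuming a wildcard include along these lines; your main config may pull the directory in differently), the files are read in lexical order, so the file names alone decide which rules syslog-ng evaluates first:

# syslog-ng.conf (sketch)
@include "/etc/syslog-ng.d/*.conf"

# Files are included in lexical order, so for example:
#   0-web.conf       <- highest-volume rules (Apache access logs) checked first
#   database.conf
#   squid.conf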

tl;dr: Make sure syslog-ng is parsing the most frequently logged lines first, not last. 

Log windows and buffer sizing

Math time! Syslog-ng has a great feature called “flow control” – when your servers and applications send more traffic than syslog-ng can handle, it will buffer the lines in memory until it can process them. The alternative would be to drop the log lines, resulting in data loss.

The four variables associated with flow control that we will look at are:

max_connections – the maximum number of simultaneous connections a source will accept
log_fetch_limit – the number of messages syslog-ng fetches from a connection in a single poll
log_iw_size – the size of the initial window, used for flow control
log_fifo_size – the number of messages the output buffer can hold

Each of your servers shouldn't open more than one TCP connection to your central syslog server, so make sure you set max_connections to something higher than the number of servers sending to it.

After that, it’s time to break out your calculator and commit these equations to memory:

log_iw_size = max_connections * log_fetch_limit
log_fifo_size = log_iw_size * (10~20)

There is some variance in how you calculate log_fifo_size, which you will have to experiment with. The principle is this:

Syslog-ng will fetch at most log_fetch_limit * max_connections messages each time it polls the sources. Your log_fifo_size should be able to hold many polls before it fills up. When your destination (file on disk? another syslog server?) is not able to accept messages quickly enough, they will accumulate in the log_fifo buffer, so make this BIG.

log_fifo_size = (log_iw_size = max_connections * log_fetch_limit) * 20
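Putting that together as a config sketch (the numbers assume roughly 1,000 client hosts and a fetch limit of 100; they are illustrative, not our actual values):

source s_network {
    tcp(ip(0.0.0.0) port(514)
        max_connections(1000)    # more than the number of hosts sending to us
        log_fetch_limit(100)
        log_iw_size(100000)      # max_connections * log_fetch_limit
    );
};

options {
    log_fifo_size(2000000);      # log_iw_size * 20
};

One thing to keep in mind: the flow-control window (log_iw_size) only applies to log paths that carry flags(flow-control); log_fifo_size is the plain output buffer and is used either way.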

Experiment with disabling power saving features

At Etsy, we take pride in trying to do our bit for the planet. It's what our business is built on: encouraging good behaviours around recycling and re-using precious resources. It's why we announced earlier this year that we are now a member of the B-Corp movement, so we can measure our success in giving back to the world in various ways. One of the criteria for this involves how much energy we use as an organisation; a tough subject when you run a datacenter, which is well known for using huge amounts of power. Because of this, we pride ourselves on working with vendors that also care about their power footprint, and on squeezing out the most savings we can without affecting performance.

However, this isn’t to say that the power saving modes provided with servers are perfect. We have hundreds of servers that can scale back their CPUs and other devices, saving power with absolutely zero measured effect on performance or any other metric (and, believe us, we tested). There are two recorded cases where this hasn't worked out well for us.

Briefly, the principle behind power saving in servers is to “scale back” CPU performance (that is, to decrease the clock speed and thus the power usage) when the server doesn't need it. With many modern machines having multiple CPUs, the CPUs can often be the biggest power draw, so the savings involved here are huge.

Power usage on a typical server. This server would use 472 watts of power if the CPUs were constantly at full speed; the maximum it hit in the last 24 hours was 280 watts.

What if your server demands a lot of CPU power? That's no problem: the CPU can scale back up to full speed almost instantly, with basically no effect on response time.

But what if your server does a lot of context switching, or has very bursty CPU usage? Two prime examples of this are Nagios and syslog-ng. Nagios has to spin up processes to execute checks in a very spiky manner, sometimes hundreds of them at once, so the cost of switching between all of those processes (even if each individual one is tiny on its own) is huge; this is what's known as context switching. A similar thing happens with syslog-ng, where the incoming log events are very bursty, so the CPU ends up spending more time scaling back and forth than doing our processing.

In these two instances, we switched from power saving mode to static full power mode, and the amount of CPU consumed halved. More importantly, much of that CPU had been showing up as system time; the time wasted context switching and waiting for other events dropped dramatically, and all CPU cores can now get to work on whatever events arrive as soon as possible.

There are some great tools that let you watch your CPU scaling in action. For example, i7z is a great command line tool (with a UI, if you fancy) that is easy to get running and gives you a great view into the inner workings of Intel CPUs.

The important point here is that we would actually have purchased a more power-hungry machine to scale with our log traffic if we hadn't found this, somewhat defeating the purpose of the power saving feature in the first place.

tl;dr: Experiment with power settings on your servers and unlock their full potential; if it makes no difference, put it back.

Summary

In total these changes took about 5 days of research and testing, and half a day of effort to implement.
We were regularly hitting the limits of what this server was capable of processing before making these changes, and had considered buying much larger servers and moving to other methods of log aggregation.

A small dose of profiling and planning has reduced the load on the server to a fifth of what it was, reduced our power consumption, and improved the efficiency of the whole system:

This graph illustrates the decrease in CPU usage when we performed two of the steps above: from 70% CPU consumed down to 15%, with two very simple changes.

This post was co-written by Avleen Vig (@avleen) and Laurie Denness (@lozzd) battling to solve hard (and sometimes easy) problems to make operations better for everyone. Why not come and help us?

Category: infrastructure, operations

28 Comments

60k log events / second is a lot of data, do you physically separate this traffic from other back office traffic (e.g. requests to your database servers)?

    We’re not separating it out right now, as it amounts to around 100Mbit/s of traffic. We run at least gigabit ethernet to all our machines, and the runs from the edge switches to the core are 20 gigabit, so we have no concerns there.
    We’re currently using FITB (http://github.com/lozzd/FITB) to monitor traffic on all our uplinks and links to machines, and Network Weathermap (http://www.network-weathermap.com/) to draw a live diagram of all the links to the racks which change colour depending on the utilisation of the link.

[…] Overall, Etsy was able to reduce CPU consumption from 70 percent to 15 percent with a little research and testing to find the best ways to aggregate server logs. All of the technical details, including graphs, are available on the Etsy developer blog. […]

For C-states analysis under Linux one might want to use turbostat[1]. As a bonus, it provides information about Turbo Boost in modern Intel processors.

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=tree;f=tools/power/x86/turbostat

[…] Performance tuning syslog-ng One of the critical components of our infrastructure is centralised logging. Our systems generate a vast amount of logging, and we like to keep it all. Everything. The whole bag. When faced with similar situations, other companies have embraced a lot of new technology: Flume, Scribe, Logstash. At Etsy, we’ve stayed old-school – practically prehistoric by technology standards. Our logging infrastructure still uses syslog as the preferred transport because it Just Works™. … Mostly. […]

[…] SpeedAwarenessMonth.com is sharing the best tips to speed up your website.* Etsy describes their logging strategy.* Joshua Bixby looks at how browser usage varies throughout the day and week.* Bugsnap shares how […]

[…] reading Etsy’s excellent performance tuning blog post I learned about i7z, a nifty little tool for showing you Intel CPU states. It gives details on clock […]

You guys mentioned i7z, but have you used PowerTOP, and if so what do you think about it?

[…] important. Web servers, for example, have no important data since logs are sent constantly to our centralised logging host, and the web code is easily deployed back on to the […]

[…] them and make them available to developers without needing access to systems. Whether this is via a massive central syslog server or a remote service like Papertrail, Loggly or Splunk, logging is important for many reasons. All […]

As always, well written and documented post !

You come with solutions for two under covered tips (rules ordering & power mgmt). Thank you for this !

One question :
With such a huge amount of logs, what kind of solution(s) does your team use for :
– searching into the logs (GUI vs cmdline for ex.) ?
– triggering alerts ?
– having more high level reporting : charts, tendencies etc ?

Thank you for the reading,
Christophe

    For deep diving into logs, we use Splunk. There are other tools out there which other people use (Logstash, Graylog, etc) which we hear also work well.
    Splunk has an API which we script around to create alerts for Nagios.
    We also extract data using the Splunk API, aggregate it and inject it into Graphite.
    From there we use Nagios plugins like this one: , to alert when things go wrong 🙂

    For the charts and such, we use a combination of Splunk’s internal graphing abilities, and using Graphite once the data is injected into it. Graphite is handy because we use it to collect tons of data, and we can then see it all side by side.

Very useful information. Any thoughts on how to measure the health and track performance of the syslog-ng itself? The stats produced by syslog-ng seem to be lacking. Thanks.

Having worked on low level (CPU/GPU) performance optimizations for a number of years, the results in this experiment look extremely fishy, particularly because the load graph uses % CPU usage on the Y axis.

% CPU usage is rather difficult to calculate. It’s often calculated based on the current clock speed. I think it’s extremely likely that your CPU usage is calculated at 70% of your CPU running at .5 GHz (or more likely, half your cores at 1GHz). When you turn off low power mode, it’s showing your CPU running at 15% of 2.5GHz. This is common to see when your CPU power management is working properly. Turning off the power saving mode would simply use more power.

In this case, running at 70% CPU is absolutely fine, since your CPU is working at low clocks. When you add more load, the clocks go up, and you’ll see the load drop when the sleeping cores stop sleeping, and drop again when the CPU goes to full speed.

A more convincing graph would show the average clock along with the CPU load. Or a graph of the actual power usage.

Could you route your log data by using different source ports on the central syslog-ng server (eg all Apache log data comes in on port 20000) as a way to avoid pattern matching routing rule overhead?

    We certainly could, and this was something we considered at the start.
    There is a downside to this approach, as it increases overhead on managing the platform. We would have to manage the mapping of services to syslog ports.
    It gets more complicated if we have multiple services on multiple machines.

    At some point we certainly will outgrow the setup we have now. But it’s been about a year since we made these changes and things are still running far better than we could have hoped. So for the foreseeable future, we’re going to stick with this simple approach 🙂

Which tool/tools are you using for graphing power consumption of servers in Watts/hr

I’m wondering if there is a way to use TCP ports to put different kinds of log traffic into separate “lanes” before it gets to the log server. For example, have web servers send data to port 8080, database servers to port 9090, etc. Then on the log server have separate syslog daemons, or rule sets, or whatever, for each port.

    We certainly looked in to this.
    It’s very possible by running multiple syslog instances. The only reason we didn’t immediately jump to this was a desire to keep the system as absolutely simple as possible.

    We use what we call the “4am rule” to judge this: if you’re on call and get paged for this at 4am, will the change make things easier or more difficult? If it’s noticeably more difficult, it may be an indicator that we need to approach the problem differently.

    Thanks!

I am interested in the chef scripts used to setup & deploy syslog-ng server and client.

Have you made the cookbooks open sourced? Is this something you will think about making public?

    I don’t think we’ve open sourced them yet, but I can ask 🙂

Can you comment on when you would use syslog-ng to process log information, and when you would use logster? Logster seems like a local solution for each specific server, so I can imagine it may be easier to set up, but if you have a syslog-ng system going already, I don’t see why you’d also want to use logster.

    Jason,

    We generally use syslog-ng to ship logs from one server to another. It’s pretty good at that.
    We use Logster to parse the centrally collected logs, extract data (eg, “how many times did we see HTTP/404?”) and send that data to statsd 🙂

    So, two different functions.

Oh I see, I thought syslog-ng also handled looking at the contents. Your answer triggered another question: I thought statsd was intended to be hit with “live” metrics as they happen, and statsd would do some bucketing and aggregation, for example to 10-second chunks. The usage of logster is typically through one-minute cron jobs. So do you just live with the discrepancy, and as far as statsd is concerned, it gets a burst of metrics every minute (when logster runs) and puts them all in the same 10-second bucket?

    Actually you’re right, I misspoke – we don’t send to statsd, we send directly to Graphite or Ganglia from Logster 🙂