
Code as Craft

Juggling Multiple Elasticsearch Instances on a Single Host

Elasticsearch is a distributed search engine built on top of Apache Lucene. At Etsy we use Elasticsearch in a number of different configurations: for Logstash, for powering user-facing search on some large indexes, for analytics, and for many internal applications.

Typically, it is assumed that there is a 1:1 relationship between ES instances and machines. This is straightforward and makes sense if your instance requirements line up well with the host - whether physical, virtualized, or containerized. We run our clusters on bare metal, and for some of them we have more ES instances than physical hosts. We have good reasons for doing this, and here I'll share some of the rationale, along with the configuration options that we've found to be worth tuning.

Why?

Managing JVMs with large heaps is scary business because of garbage collection pause times. Around 31GB is the magic threshold above which you lose the ability to use CompressedOops. In our experience, it is better to have even smaller heaps: not only do GC pause times stay low, but it's also easier to capture and analyze heap dumps!
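
If you want to see where that threshold falls on your particular JVM, HotSpot will report whether a given heap size still gets compressed oops. This is a generic check, not anything ES-specific:

java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops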

To get optimal Lucene performance, it is also important to have sufficient RAM available for OS file caching of the index files.

At the same time, we are running on server-class hardware with plenty of CPU cores and RAM. Our newest search machines are Ivy Bridge with 20 physical (40 virtual) cores, 128GB of RAM, and of course SSDs. If we ran a single node with a small heap on this hardware, we would be wasting both CPU and RAM, because a single small-heap instance can only support a limited amount of shard data.

We currently run 4 ES JVMs per machine with 8GB of heap each. This works out great for us: GC has not been a concern and we are utilizing our hardware effectively.
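
Assuming the stock ES 1.x-era startup script, heap sizing is controlled by the ES_HEAP_SIZE environment variable, so the per-instance sizing above boils down to something like:

export ES_HEAP_SIZE=8g   # the stock startup script sets both -Xms and -Xmx from this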

The settings

os.processors

Elasticsearch uses this setting to configure thread pool and queue sizing. It defaults to Runtime.getRuntime().availableProcessors(). With multiple instances, it is better to spread the CPU resources across them. We set this to ($(nproc) / $nodes_per_host). So if we are running 4 nodes per host on 40-core machines, each of them will configure thread pools and queues as if there were 10 cores.
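
As a rough sketch of how a startup script can derive this value - NODES_PER_HOST is our own variable, and the exact property name for the setting varies across ES versions:

# divide the host's cores evenly across the instances running on it
NODES_PER_HOST=4
# shown as a 1.x-style "es."-prefixed system property; adjust the key to your ES version
ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.processors=$(( $(nproc) / NODES_PER_HOST ))"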

node.name

The default for this setting is to pick a random Marvel comic character at startup. In production, we want something that lets us find the node we want with as little thought and effort as possible. We set this to $hostname-$nodeId (which results in names like "search64-es0" - less whimsical, but far more practical when you're trying to get to the bottom of an issue).
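
In the startup script this is just string assembly; NODE_IDX here is an assumed per-instance index variable, not an ES setting:

NODE_IDX=0   # 0..3, one per instance on this host
ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.node.name=${HOSTNAME}-es${NODE_IDX}"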

http.port, transport.port

If these ports are not specified, ES tries to pick the next available port at startup, starting at a base of 9200 for HTTP and 9300 for its internal transport. We prefer to be explicit and assign ports as $basePort+$nodeIdx from the startup script. This prevents surprises such as an instance that you expect to be down still being bound to its port, causing the 'next available' one to be higher than expected.
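
Continuing the same sketch, explicit port assignment might look like the following. The setting names follow the heading above; on older ES versions the transport setting is spelled transport.tcp.port:

ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.http.port=$(( 9200 + NODE_IDX ))"
ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.transport.port=$(( 9300 + NODE_IDX ))"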

cluster.routing.allocation.awareness.attributes

A key way to achieve failure tolerance with ES is to use replicas, so that if one host goes down, the affected shards stay available. If you're running multiple instances on each physical host, it's entirely possible for all copies of a shard to end up allocated to the same physical host, which isn't going to help you! Thankfully this is avoidable with the use of shard allocation awareness. You can set the hostname as a node attribute on each instance and use that attribute as a factor in shard assignments.

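# tag each node with its physical host, then make shard allocation aware of that attribute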
ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.node.host=${HOSTNAME}"
ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.cluster.routing.allocation.awareness.attributes=host"

path.logs

Without a dedicated log directory for each instance, you would end up with multiple JVMs trying to write to the same log files. An alternative, which we rely on, is to prefix the filenames in logging.yml with the property ${node.name} so that each node's logs are labelled by host and node ID. Another reason to be explicit about node naming!
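
A dedicated per-instance log directory, the first option above, might be wired up like so (the path is purely illustrative); the ${node.name} prefix we use is instead a one-line change to the log filename in logging.yml:

ES_JAVA_OPTS="$ES_JAVA_OPTS -Des.path.logs=/var/log/elasticsearch/${HOSTNAME}-es${NODE_IDX}"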

Minimum Viable Configuration

Elasticsearch has lots of knobs, and it's worth trying to minimize configuration. That said, a lot of ES is optimized for cloud environments, so we occasionally find things worth adjusting, like allocation and recovery throttling. What do you end up tweaking?

You can follow Shikhar on Twitter at @shikhrr