+32 472 40 86 79
thijs@feryn.eu
Thijs Feryn

Tech

23 Aug

The ElasticSearch cat APIs

The ElasticSearch cat APIs allow you to retrieve information from your ElasticSearch cluster in a human readable format

I like ElasticSearch: it’s a great piece of open source technology. Although it was built as a Lucene-based search engine, it can do more than just that. It’s an awesome analytics engine, and it’s also a pretty good NoSQL database.

Interacting with ElasticSearch happens through the REST API and the output is JSON. JSON is cool, JSON is fun, but it’s not really made for human readable output.

That’s where the ElasticSearch cat APIs come into play.

Cat? Are there animals involved?

Not really … the ElasticSearch cat APIs are not related to the feline creatures. The name refers to the cat binary in Unix. Instead of outputting JSON, the cat APIs send their output line by line. No parsing required: items are separated by a newline, properties of an item by a space.

Makes sense, right?

Calling them

Calling them is quite easy: you just issue a GET request to the “_cat” resource of your ElasticSearch server. With curl, that looks like this:

curl "http://localhost:9200/_cat"

This is the output you get:

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}

As you can see, the output contains a pretty extensive list of meta information items you can query.

Let’s try calling a specific one:

curl "http://localhost:9200/_cat/nodes"

Here’s the output:

localhost 127.0.0.1 3 43 1.13 d * Sabretooth

What does all of this mean?

There’s no header line with the column names. Or is there?

By adding the “v” parameter to the query string of the ElasticSearch cat APIs, we get more verbose output that includes a header line.

This is what the URL looks like:

curl "http://localhost:9200/_cat/nodes?v"

And here is some meaningful output:

host ip heap.percent ram.percent load node.role master name
localhost 127.0.0.1 3 45 1.72 d * Sabretooth
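Because the header line names every column, a script can look up a column by name instead of counting positions. Here is a minimal sketch, assuming the verbose output is piped in on standard input. The `cat_column` helper is a hypothetical name of my own, not something ElasticSearch ships, and since it splits on whitespace it would misread values that contain spaces themselves.

```shell
#!/bin/sh
# cat_column NAME: read "?v" output on stdin and print the values of column NAME.
# Hypothetical helper for illustration; fields are split on runs of whitespace.
cat_column() {
    awk -v col="$1" '
        NR == 1 {
            # Find the position of the requested column in the header line.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            # Unknown column: stop with a non-zero exit status.
            if (!idx) exit 1
            next
        }
        { print $idx }
    '
}
```

Against a live server this could be used as `curl -s "http://localhost:9200/_cat/nodes?v" | cat_column heap.percent`.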

We can also limit the number of columns by adding the “h” parameter to the query string.

curl "http://localhost:9200/_cat/nodes?v&h=host,ip,name"

The example above adds the column names and outputs only the host, the IP address and the name of the node.

host ip name
localhost 127.0.0.1 Sabretooth
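If you request the same columns often, the query string can be assembled by a small helper. This is only a sketch: `cat_url` is a hypothetical function name, and `ES_URL` is an assumed environment variable that falls back to the local server used in this post.

```shell
#!/bin/sh
# cat_url API [COLUMN...]: print a cat API URL with a header line (v)
# and, if columns are given, restricted to those columns (h).
# ES_URL is an assumption; it defaults to http://localhost:9200.
cat_url() {
    api=$1
    shift
    url="${ES_URL:-http://localhost:9200}/_cat/$api?v"
    if [ "$#" -gt 0 ]; then
        # Join the remaining arguments with commas for the h parameter.
        cols=$(IFS=,; printf '%s' "$*")
        url="$url&h=$cols"
    fi
    printf '%s\n' "$url"
}
```

Calling `curl "$(cat_url nodes host ip name)"` would then issue the same request as the example above.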

Another thing we can do is perform a “help” call on a specific API. This call gives more information about the meaning of each column.

curl "http://localhost:9200/_cat/nodes?help"

The output will contain a lot more fields than you’d expect. That’s because some API calls do not list certain fields by default, unless you explicitly request them using the “h” parameter.

id | id,nodeId | unique node id
pid | p | process id
host | h | host name
ip | i | ip address
port | po | bound transport port
version | v | es version
build | b | es build hash
jdk | j | jdk version
disk.avail | d,disk,diskAvail | available disk space
heap.current | hc,heapCurrent | used heap
heap.percent | hp,heapPercent | used heap ratio
heap.max | hm,heapMax | max configured heap
ram.current | rc,ramCurrent | used machine memory
ram.percent | rp,ramPercent | used machine memory ratio
ram.max | rm,ramMax | total machine memory
...
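The help output is itself easy to process: each line holds a column name, its aliases and a description, separated by pipes. A rough sketch that extracts just the column names, assuming the help output arrives on standard input (`help_columns` is a hypothetical helper name):

```shell
#!/bin/sh
# help_columns: read "?help" output on stdin and print one column name per line.
# Lines without the "name | aliases | description" layout are skipped.
help_columns() {
    awk -F'|' 'NF >= 3 {
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the padding around the name
        print name
    }'
}
```

For example, `curl -s "http://localhost:9200/_cat/nodes?help" | help_columns` would list every column the nodes API can return.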

What kind of APIs are available?

  • Allocation: information about the resource allocation on each server in the cluster
  • Shards: information about the allocation of (specific) shards on each server in the cluster
  • Master: information about the master server in the cluster
  • Indices: information about (specific) indices in the cluster
  • Segments: low-level information about the Lucene segments in the shards of an index
  • Count: count documents in (specific) indices
  • Recovery: information about shard recovery when a shard is moved to a different node in the cluster
  • Health: display the cluster health
  • Pending tasks: as the name indicates, the cluster-level changes that are queued and have not been executed yet
  • Aliases: information about aliases given to specific indices
  • Thread pool: thread pool statistics per node
  • Plugins: a list of running plugins per node
  • Fielddata: information about how much heap memory the loaded fielddata uses, per field and per node

Pick one and dig deeper?

The cat API documentation is pretty extensive, and I could just quote the docs line by line. That wouldn’t be too useful. Instead I’ll pick one and explain why and how I use it.

The “health” API is the most important one to me. If the cluster is not healthy, searches will not return consistent data sets. Based on the health status, the cluster is in one of three states:

  • Green: it’s all good man. Saul Goodman 😉 The nodes are up, the shards for each index are loaded and the replicas have been recovered on separate nodes
  • Yellow: something is wrong. Not all replicas have been recovered. If a node goes down, there can be data loss
  • Red: some primary data shards are missing. This means there is data loss right now. This is bad but not disastrous, as some nodes might still be rebooting.

You can issue a specific health call from your monitoring system:

curl "http://localhost:9200/_cat/health?h=status"

If the output is not “green”, engineers should be alerted. Very convenient!
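A monitoring check built on this call could look like the sketch below. It maps the status to the exit codes commonly used by Nagios-style plugins (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN); whether your monitoring system expects those codes is an assumption on my part. The function names and the `ES_URL` variable are hypothetical as well.

```shell
#!/bin/sh
# status_to_exit_code STATUS: translate a cluster status into an exit code.
# Assumed convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
status_to_exit_code() {
    # Strip the trailing newline and any padding from the cat output.
    status=$(printf '%s' "$1" | tr -d '[:space:]')
    case "$status" in
        green)  return 0 ;;
        yellow) return 1 ;;
        red)    return 2 ;;
        *)      return 3 ;;   # empty or unexpected output, e.g. server unreachable
    esac
}

# check_cluster: query the health API and translate the result.
# ES_URL is an assumed variable, defaulting to the local server from this post.
check_cluster() {
    status_to_exit_code "$(curl -s --max-time 5 "${ES_URL:-http://localhost:9200}/_cat/health?h=status")"
}
```

Calling `check_cluster` from a cron job or a monitoring agent returns a non-zero exit code as soon as the cluster is not green, and treats an unreachable server as unknown rather than healthy.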

Let’s look at some video footage

I recorded a short video where I feature a couple of cat APIs on a 3 node cluster. The cluster runs on my laptop.

What I’m doing in this video is showing random API calls that are focused on the cluster, the nodes in the cluster and the indices running on the cluster.

I’m creating an index called “myindex” with a type called “mytype”. At first the index is empty, then I’m adding a document, then another one. Using the API calls I’m checking the size of the index, the allocation of the shards in the cluster and the cluster state.

Have a look:

Why should you use the cat APIs?

Long story short: the cat APIs are the easiest way to manage an ElasticSearch cluster.

OK, you can’t really change anything using these APIs, but at least you get a very detailed view on the current status of “things”. And these things could vary.

Questions that could be answered are:

  • Is the cluster doing OK?
  • Are all primary shards loaded? What about the replicas?
  • What’s the master node in the cluster?
  • How much RAM is each node consuming?
  • Which fields does index x have?
  • How is index x scattered across the nodes of our cluster?

Try it yourself, you’ll love it!

Tags: api, cat, elasticsearch, nosql