Alexander Holbreich

Elasticsearch: Working with Indices

Continuing the article series on Elasticsearch, this article explains how to work with indices.

Creating an index

If you’d like to execute these examples, grab your favorite REST tool and make sure you can access your cluster via its REST API. An HTTP PUT on /article-11-17 creates an index named article-11-17, which is pretty convenient:

PUT /article-11-17
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2,
            "refresh_interval" : "5s",
            "priority" : "10"
        }
    }
}
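If you prefer scripting over a REST tool, the same request can be sent from Python. This is a minimal sketch using only the standard library, assuming an unsecured cluster at http://localhost:9200 (adjust the URL and add authentication if your cluster needs it). The function names are hypothetical helpers, not part of any Elasticsearch client library.

```python
import json
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"


def build_create_index_body(shards, replicas, refresh_interval="5s", priority=10):
    """Return the same settings body as the PUT /article-11-17 request."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "refresh_interval": refresh_interval,
                "priority": priority,
            }
        }
    }


def create_index(name, body, base_url=ES_URL):
    """Send PUT /<name> with a JSON body and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: create_index("article-11-17", build_create_index_body(shards=3, replicas=2)). Note that a second PUT on an index that already exists fails, which is why changing settings later goes through a different endpoint.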

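Under settings.index we define some important properties:

number_of_shards - the number of primary shards the index is split into. It can only be set when the index is created.
number_of_replicas - how many replica copies of each primary shard the cluster keeps.
refresh_interval - how often the index is refreshed, i.e. how often newly indexed documents become visible to search (the default is 1s).
priority - the recovery priority of the index: after a node restart, shards of indices with a higher priority are recovered first.

These settings are used frequently, and they can have a significant impact on the performance of your index and your cluster.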

Static and dynamic settings

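Index settings are either static or dynamic. Static settings such as number_of_shards can only be set when the index is created (some static settings can also be changed on a closed index, but number_of_shards cannot). Dynamic settings such as number_of_replicas or refresh_interval can be changed on a live index with the update index settings API, i.e. a PUT on the _settings endpoint of the index:

PUT /article-11-17/_settings
{
    "index" : {
        "number_of_replicas" : 3
    }
}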
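To make the static/dynamic distinction concrete, here is a small sketch of a hypothetical helper that prepares an update-settings request and refuses static settings before they ever reach the cluster. The set of static settings below is an illustrative subset only (number_of_shards, number_of_routing_shards and codec are static), not the complete list from the Elasticsearch documentation.

```python
# Illustrative subset of static index settings; NOT the complete list.
STATIC_INDEX_SETTINGS = {"number_of_shards", "number_of_routing_shards", "codec"}


def build_settings_update(index_name, settings):
    """Return (method, path, body) for an update-settings request.

    Raises ValueError if any setting from the static subset above is present,
    because Elasticsearch rejects such changes on an open index.
    """
    static = sorted(set(settings) & STATIC_INDEX_SETTINGS)
    if static:
        raise ValueError("static settings cannot be updated: " + ", ".join(static))
    return "PUT", f"/{index_name}/_settings", {"index": dict(settings)}
```

Catching such mistakes on the client side is only a convenience; the cluster itself is the authority and will reject invalid updates anyway.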

Number of shards & things to consider

The question of how many primary shards and replicas to use is probably the most important one. But there is no simple answer, because it depends on things like your query patterns, the number of nodes in the cluster and the overall number of documents in the index.

An index with two primary shards and one replica can scale out across four nodes (Picture from Elasticsearch: The Definitive Guide [2.x])
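A quick back-of-the-envelope calculation helps when reasoning about these numbers. Each primary shard gets number_of_replicas copies, and Elasticsearch never allocates two copies of the same shard to the same node. The sketch below (hypothetical helper names, considering a single index in isolation) computes the total number of shard copies, the minimum number of nodes needed before every replica can be assigned (green health), and the maximum number of nodes that can receive a shard of this index.

```python
def total_shard_copies(primaries, replicas):
    """Every primary shard plus `replicas` copies of it."""
    return primaries * (1 + replicas)


def min_nodes_for_green(replicas):
    """Copies of the same shard must live on different nodes, so a primary
    and all of its replicas need replicas + 1 distinct nodes."""
    return replicas + 1


def max_nodes_used(primaries, replicas):
    """With one shard copy per node, nodes beyond this count hold no shard
    of this index."""
    return total_shard_copies(primaries, replicas)
```

For the figure’s example (2 primaries, 1 replica) this gives 4 shard copies, so the index can spread across at most four nodes. For article-11-17 (3 primaries, 2 replicas) it gives 9 copies, at least 3 nodes for green health and at most 9 nodes.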

Refresh interval

refresh_interval - is very important under heavy indexing. A refresh is what makes recently indexed documents visible to search. In many cases (e.g. a logs index) you don’t need new documents to be searchable immediately, but refreshing every second, which is the default, might strongly affect the overall performance of the cluster. So you can go with 5s or 30s in such a case.
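A common variation for one-off bulk loads is to switch periodic refreshes off entirely by setting refresh_interval to "-1", and to restore the normal interval once loading is finished. The sketch below only builds the two update-settings requests; the function names are hypothetical, and it assumes nobody needs to search the index while it is being filled.

```python
def refresh_settings(interval):
    """Body for PUT /<index>/_settings that changes the refresh interval.

    Use a time value such as "30s", or "-1" to disable periodic refreshes.
    """
    return {"index": {"refresh_interval": interval}}


def bulk_load_plan(index_name, normal_interval="5s"):
    """Return the requests to send before and after a bulk load."""
    path = f"/{index_name}/_settings"
    before = ("PUT", path, refresh_settings("-1"))
    after = ("PUT", path, refresh_settings(normal_interval))
    return before, after
```

Both requests are ordinary dynamic-setting updates, so they can be applied while the index stays open.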

Reindexing

Once an index has been created with an unlucky number of primary shards, you cannot change that number on the existing index. However, Elasticsearch provides a very helpful Reindex API that lets you copy (“re-index”) documents from any existing index (even from a remote cluster) into a new index. Create the destination index with the desired settings first: the Reindex API copies documents only, not the settings or mappings of the source index. Here is an example that reindexes a subset of documents, those with a timestamp between 10 and 17 November 2017, from article-11-17 into an articles_experiment index:

POST /_reindex
{
  "source": {
    "index": ["article-11-17"],
    "query": {
      "bool": {
        "must": [
          { "range": { "timestamp": { "gte": "10.11.2017", "lte": "17.11.2017", "format": "dd.MM.yyyy" } } }
        ]
      }
    }
  },
  "dest": {
    "index": "articles_experiment"
  }
}
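The same reindex call can be scripted with the index names and the date range as parameters. This is a minimal standard-library sketch, again assuming an unsecured cluster at http://localhost:9200; build_reindex_body and reindex are hypothetical helper names, and the body they produce has the same shape as the request above.

```python
import json
import urllib.request

# Assumption: a local cluster without security enabled.
ES_URL = "http://localhost:9200"


def build_reindex_body(source, dest, field, gte, lte, date_format="dd.MM.yyyy"):
    """Copy only documents whose `field` lies in [gte, lte] from source to dest."""
    date_range = {"gte": gte, "lte": lte, "format": date_format}
    return {
        "source": {
            "index": [source],
            "query": {"bool": {"must": [{"range": {field: date_range}}]}},
        },
        "dest": {"index": dest},
    }


def reindex(body, base_url=ES_URL):
    """POST the body to the _reindex endpoint and return the decoded response."""
    request = urllib.request.Request(
        f"{base_url}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the body builder separate from the HTTP call makes it easy to inspect or log the request before running it against a real cluster.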

Outlook

Next, I want to say a few words about document type mappings and things like index templates.

