Alexander Holbreich

Thoughts on Elasticsearch

Let me say it up front: Elasticsearch is great for searching. I’m currently busy improving some searches over millions of objects, so I’m getting closer to Elasticsearch. I stumbled over an existing cluster (3 x 64 GB RAM, 32 cores) used for logs (ELK stack), which looks like a good place to explore the existing data and to build new indices (document collections). First, however, I need to get to know Elasticsearch more closely.

What I’ve learned so far

in the first week

Examples

Some examples for those who have never seen it: typical REST calls.

Search query

POST /index/_search
{
  "_source": ["entry_id", "contract_id", "name", "description", "score", "country"],
  "from": 10,
  "size": 200,
  "sort": [{ "@timestamp": { "order": "asc" } }],
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "entry_type": "score processing" } },
        { "term": { "contract_id": "1000" } },
        { "range": { "@timestamp": { "gte": "17:08:2017", "lte": "17:08:2017", "format": "dd:MM:yyyy" } } },
        { "match": { "name": "fantastic" } }
      ]
    }
  }
}

We see a bool query with a single must clause that contains several conditions: match, match_phrase, term, and range. A document has to satisfy all of them to be returned.
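The same request body can also be assembled in code. The following is a minimal Python sketch that uses only the standard library and prints the JSON instead of sending it anywhere. The function name `build_search_body` and its parameters are made up for illustration; only the field names and values come from the query above.

```python
import json


def build_search_body(contract_id, day, name_term, offset=10, page_size=200):
    """Build a bool query whose single `must` clause lists four conditions.

    `build_search_body` is a hypothetical helper, not part of any client library.
    """
    must_conditions = [
        {"match_phrase": {"entry_type": "score processing"}},  # exact phrase
        {"term": {"contract_id": contract_id}},                # exact value
        {"range": {"@timestamp": {"gte": day, "lte": day, "format": "dd:MM:yyyy"}}},
        {"match": {"name": name_term}},                        # analyzed full-text match
    ]
    return {
        "_source": ["entry_id", "contract_id", "name", "description", "score", "country"],
        "from": offset,
        "size": page_size,
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {"bool": {"must": must_conditions}},
    }


body = build_search_body("1000", "17:08:2017", "fantastic")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of POST /index/_search. Building the body as a dictionary keeps the clauses easy to add or remove one at a time.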

New type mapping

PUT /index_name/_mapping/type_name
{
  "type_name": {
    "properties": {
      "entry_id": { "type": "long" },
      "key": { "type": "text" },
      "name": { "type": "text" },
      "description": { "type": "text" },
      "country": { "type": "text" },
      "@timestamp": { "type": "date", "format": "date_optional_time||yyyy-MM-dd HH:mm:ss" },
      "state": { "type": "byte" },
      "contract_id": { "type": "long" }
    }
  }
}

This creates a new type, type_name, inside the index index_name.

Re-indexing

POST /_reindex
{
  "source": {
    "index": "logstash-2017.08.17",
    "_source": ["entry_id", "contract_id", "name", "description", "score", "country"],
    "sort": { "@timestamp": "desc" },
    "query": {
      "bool": {
        "must": [{ "match_phrase": { "entry_type": "score processing" } }]
      }
    }
  },
  "dest": {
    "index": "new_index",
    "type": "new_type"
  }
}

This request creates new_index and fills it with the documents from the source index that match the query section, keeping only the fields listed in _source.
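A reindex body has the same shape every time: a source part (index, fields, query) and a dest part. Here is a small Python sketch, under the assumption that one wants to generate such bodies for several daily indices. The helper name `build_reindex_body` is hypothetical, and the code only prints the JSON; it does not talk to a cluster.

```python
import json


def build_reindex_body(source_index, dest_index, fields, query, dest_type=None):
    """Assemble a _reindex request body (hypothetical helper for illustration)."""
    source = {
        "index": source_index,
        "_source": list(fields),          # copy only these fields
        "sort": {"@timestamp": "desc"},   # same ordering as in the request above
        "query": query,                   # copy only matching documents
    }
    dest = {"index": dest_index}
    if dest_type is not None:
        dest["type"] = dest_type          # mapping type, as used in the request above
    return {"source": source, "dest": dest}


score_query = {
    "bool": {"must": [{"match_phrase": {"entry_type": "score processing"}}]}
}
reindex_body = build_reindex_body(
    "logstash-2017.08.17",
    "new_index",
    ["entry_id", "contract_id", "name", "description", "score", "country"],
    score_query,
    dest_type="new_type",
)
print(json.dumps(reindex_body, indent=2))
```

Serializing with `json.dumps` also guards against the small syntax slips that are easy to make when writing nested JSON by hand.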

Challenges

I’m struggling with the search. I still don’t know how to retrieve all elements but only one child per (parent) id, a kind of GROUP BY. I also don’t know whether it is even possible to retrieve all elements grouped by the parent id field while specifying the group-by function myself.

Going further, what I need is, for example, new synthetic fields while grouping:

I have no clue how to achieve that yet.
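One direction that might be worth exploring is a terms aggregation with a top_hits sub-aggregation: one bucket per parent id, and one hit inside each bucket. The Python sketch below only builds and prints such a request body. It is untested against the cluster described above, `build_one_child_per_parent_body` is a hypothetical helper, and using contract_id as the stand-in for the parent id is an assumption.

```python
import json


def build_one_child_per_parent_body(parent_field, sort_field, max_parents=1000):
    """Request body that returns one document per distinct parent value.

    Hypothetical helper; the aggregation names are arbitrary labels.
    """
    return {
        "size": 0,  # skip the flat hit list, only the buckets are of interest
        "aggs": {
            "by_parent": {
                # one bucket per distinct value; `size` caps the number of buckets
                "terms": {"field": parent_field, "size": max_parents},
                "aggs": {
                    "one_child": {
                        # keep a single document per bucket, the newest one here
                        "top_hits": {
                            "size": 1,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    }
                },
            }
        },
    }


group_body = build_one_child_per_parent_body("contract_id", "@timestamp")
print(json.dumps(group_body, indent=2))
```

Changing the sort inside top_hits changes which child is picked per group. Note that the terms aggregation returns at most `max_parents` buckets, so this does not cover an unbounded number of parents.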

There are also not that many examples of advanced queries. Questions about search queries on StackOverflow or on Elastic’s Discuss platform are often poorly answered or not answered at all, which surprises me a bit.

The same applies to Reindex. I would probably like to use the same “GROUP BY” expression to rebuild the new index and to insert new fields. It looks like this is possible with “Pipelines”, but I have not tried it so far, and it is not easy to understand without examples.

If you have some tips for beginners or any other feedback, please comment.

