Dariusz Mydlarz

Java, Distributed Systems & Software Engineering

How to Improve Elasticsearch Search Results? 🏹

Last month I attended an Elasticsearch training in Warsaw. Among the many things I learnt, I want to share one way to improve the relevancy of matches in a search query.


So let’s assume you’re looking for the phrase "event sourcing using kafka" in the blogs index and you build a query like this:

GET blogs/_search
{
  "query": {
    "match": {
      "content": "event sourcing using kafka"
    }
  }
}

The list of results contains the following entries, sorted by best match:

  1. Kafka — how to configure
  2. New release of Apache Kafka
  3. Event Sourcing using Kafka
  4. Event Store — lessons learnt

Why did that happen? Well, the match clause searches with the OR operator between the passed terms, so every document containing at least one of the terms matches and makes it into the result list. We got the desired result, but it was only third on the list.
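To make the OR behaviour concrete, here is a minimal Python sketch that builds the same match query with the operator spelled out, next to a stricter variant. The helper name match_query is hypothetical and only assembles the request body; it assumes the long form of the match query, where the field maps to an object with "query" and "operator" keys.

```python
import json


def match_query(field, text, operator="or"):
    """Build a match query body with the operator written out (hypothetical helper)."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Equivalent to the short form of the query: one matching term is enough.
any_term = match_query("content", "event sourcing using kafka")

# Stricter variant: a document matches only if it contains every term.
all_terms = match_query("content", "event sourcing using kafka", operator="and")

print(json.dumps(any_term, indent=2))
print(json.dumps(all_terms, indent=2))
```

Switching to "and" narrows the result set instead of reordering it, which is a different trade-off from keeping the wide set of matches and only changing how they are ranked.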

What can be done to improve the relevance of the search? One of the techniques I learnt is to use a bool query with match and match_phrase at the same time. The first one goes into the must clause, while the second goes into should. This is how it looks now:

GET blogs/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "content": "event sourcing using kafka" }
      },
      "should": {
        "match_phrase":  { "content": "event sourcing using kafka" }
      }
    }
  }
}

With this query, we still get the same wide set of documents. The match query in the must clause still searches with the OR operator, but at the same time the should clause improves the score of documents that contain an exact match of the whole phrase we passed.

Now the results should look similar to this:

  1. Event Sourcing using Kafka
  2. Kafka — how to configure
  3. New release of Apache Kafka
  4. Event Store — lessons learnt

This time the desired article got the highest ranking. Play a bit with your Elasticsearch search queries and check whether you can add the same structure to improve their relevancy.
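For experimenting with other fields and phrases, the structure can be wrapped in a small helper. This is only a sketch: the function name match_with_phrase_boost is made up for illustration, and it just builds the request body as a Python dict that you can paste into the console or send with the client of your choice.

```python
import json


def match_with_phrase_boost(field, text):
    """Build a bool query: match in must, match_phrase in should (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # Keeps the wide result set: any term from the text matches.
                "must": {"match": {field: text}},
                # Optional clause: adds to the score when the whole phrase occurs.
                "should": {"match_phrase": {field: text}},
            }
        }
    }


body = match_with_phrase_boost("content", "event sourcing using kafka")

# Print the request in the same shape as the console examples in this post.
print("GET blogs/_search")
print(json.dumps(body, indent=2))
```

Because the phrase sits in should rather than must, documents without the exact phrase are still returned; they simply score lower than those that contain it.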

***

Originally published at softwaremill.com on December 10, 2018.