Elasticlunr Query DSL

January 8th, 2022
#Backend #Elixir #Series

Like most search engines, Elasticlunr lets you build more advanced search queries to suit your requirements. In the rest of this post, I will walk through the query types the library provides and how to use them.

I would like to mention that Elasticlunr tries to replicate the popular Elasticsearch Query DSL (Domain Specific Language) and its behavior, so the learning curve is gentler if you have experience with that search engine. Elasticlunr provides the bool, match, match_all, not, and terms query types, which you can use to retrieve insights about an index.

So, let's go through these query types and their usage, using the blog posts index from the previous blog post.

Bool

The bool query combines other queries and retrieves documents that match the boolean combination of their clauses. Think of these clauses as the conditions that follow a SELECT statement in a relational database. Note that the bool query is used under the hood when you pass a string to Index.search/2.

The bool query is built from one or more clauses, and each clause has a type:

Clause | Description
must | The clause must appear in the matching documents, and this affects the document's score.
must_not | The clause must not appear in the matching document. Scoring is ignored because the clause is executed in the filter context.
filter | Like must, the clause must appear in the matching documents but scoring is ignored for the query.
should | The clause should appear in the matching document.

It's important to note that only scores from the must and should clauses contribute to the final score of the matching document.

# example bool query

Index.search(index, %{
  "query" => %{
    "bool" => %{
      "must" => %{
        "terms" => %{"content" => "use"}
      },
      "should" => %{
        "terms" => %{"category" => "elixir"}
      },
      "filter" => %{
        "match" => %{
          "id" => 3
        }
      },
      "must_not" => %{
        "match" => %{
          "author" => "mika"
        }
      },
      "minimum_should_match" => 1
    }
  }
})

You can use the minimum_should_match parameter to specify how many should clauses a document must match to be returned.

If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.
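
The default rule above can be sketched in plain Elixir. This is only an illustration of the rule as described here, not Elasticlunr's own code: the MinimumShouldMatchSketch module and its default/1 function are hypothetical names, and treating an empty clause as absent is an assumption.

```elixir
defmodule MinimumShouldMatchSketch do
  # Hypothetical helper, not part of Elasticlunr: mirrors the default rule
  # described in the post for a bool query given as a plain map.
  def default(bool_query) when is_map(bool_query) do
    has_should? = present?(bool_query, "should")
    has_must_or_filter? = present?(bool_query, "must") or present?(bool_query, "filter")

    if has_should? and not has_must_or_filter?, do: 1, else: 0
  end

  # Assumption: a clause counts as present when its key holds a non-empty value.
  defp present?(query, key) do
    case Map.get(query, key) do
      nil -> false
      [] -> false
      value when value == %{} -> false
      _ -> true
    end
  end
end

only_should = %{"should" => %{"terms" => %{"category" => "elixir"}}}
with_must = Map.put(only_should, "must", %{"terms" => %{"content" => "use"}})

IO.inspect(MinimumShouldMatchSketch.default(only_should)) # 1
IO.inspect(MinimumShouldMatchSketch.default(with_must))   # 0
```

In the example bool query shown earlier, must and filter clauses are present, so the default would be 0 and the explicit "minimum_should_match" => 1 is what makes the should clause mandatory.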

Match

The match query is the standard query used for full-text search, including support for fuzzy matching. The provided text is analyzed before matching it against documents.

# example match query

Index.search(index, %{
  "query" => %{
    "match" => %{
      "content" => %{
        "query" => "liveview browser"
      }
    }
  }
})

A match query accepts one or more top-level fields to search; in the example above, it's the content field. Note that when you have more than one top-level field, the library rewrites the match query to a bool query internally. The match query accepts the following parameters:

Parameter | Description
query | String you wish to find in the provided field.
expand | Increase token recall, see token expansion.
fuzziness | Maximum edit distance allowed for matching.
boost | Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.
operator | The boolean operator used to interpret the query value. Available values for the operator option are or and and. Defaults to or.
minimum_should_match | Minimum number of clauses that a document must match for it to be returned.
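
To show where these parameters go, here is a sketch of a match query map that sets a few of them alongside query. The parameter names come from the table; the specific values (an integer fuzziness of 1, a boost of 2.0) are assumptions chosen for illustration, since the post does not spell out the accepted types.

```elixir
# Parameters sit next to "query" inside the field's map.
# Assumed values: fuzziness as an integer edit distance, boost as a float.
query = %{
  "query" => %{
    "match" => %{
      "content" => %{
        "query" => "liveview browser",
        "operator" => "and",
        "fuzziness" => 1,
        "boost" => 2.0
      }
    }
  }
}

IO.inspect(query)
```

The resulting map would be passed as the second argument to Index.search, exactly like the other examples in this post.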

Terms

The terms query returns documents that contain the exact terms in a given field. Use it to find documents based on a precise value such as a price, a product ID, or a username.

# example terms query

Index.search(index, %{
  "query" => %{
    "terms" => %{
      "content" => %{
        "value" => "think"
      }
    }
  }
})

Just like the match query, the terms query accepts one or more top-level fields. It accepts the following parameters:

Parameter | Description
value | A term you wish to find in the provided field. The term must exactly match the field value to return a document.
boost | Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.
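
As a sketch of a terms query with more than one top-level field, the map below searches two fields and boosts one of them. The category and author fields are borrowed from the bool example earlier in this post; the boost value of 1.5 is an arbitrary illustration.

```elixir
# Two top-level fields in one terms query; only "category" carries a boost.
query = %{
  "query" => %{
    "terms" => %{
      "category" => %{"value" => "elixir", "boost" => 1.5},
      "author" => %{"value" => "mika"}
    }
  }
}

IO.inspect(query)
```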

Match All

The simplest query: it matches all documents and gives each of them a score of 1.0.

# example match all query

Index.search(index, %{
  "query" => %{
    "match_all" => %{}
  }
})

The match_all query accepts a single parameter:

Parameter | Description
boost | Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.

Not

The not query inverts the result of the nested query, giving each matched document a score of 1.0.

# example not query

Index.search(index, %{
  "query" => %{
    "not" => %{
      "match" => %{
        "content" => "ecto"
      }
    }
  }
})
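
Conceptually, inverting a result is a set difference. The sketch below illustrates that idea with plain Elixir sets; it is not how Elasticlunr implements the not query, and the document ids and the ref/score result shape are hypothetical.

```elixir
# Conceptual sketch only: every id in the index minus the ids matched by the
# nested query, each given a flat score of 1.0.
all_ids = MapSet.new([1, 2, 3, 4])

# Hypothetical ids matched by the nested match query.
matched_by_nested = MapSet.new([2, 4])

inverted =
  all_ids
  |> MapSet.difference(matched_by_nested)
  |> Enum.sort()
  |> Enum.map(fn id -> %{ref: id, score: 1.0} end)

IO.inspect(inverted)
```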

Wrap Up

Phew, we made it to the end. These are the query types you can use to build more advanced queries for your use case. In upcoming posts, I will write about how you can serialize your index and write it to any storage service of your choice.

And don't forget to have a look at the Livebook document so that you can fiddle with each query and see how to tweak it to get the results you want.

Made with 💙 © 2020 Atanda Rasheed