Introduction to Elasticlunr

January 3rd, 2022
#Backend #Elixir #Quickstart #Series


Elasticlunr is a small, full-text search library for use in the Elixir environment. It indexes JSON documents and provides a friendly search interface to retrieve documents.

The library is built for web applications that do not require the deployment complexities of popular search engines, while still taking advantage of the capabilities of the BEAM.

Imagine how much is gained when the search functionality of your application resides in the same environment (the BEAM) as your business logic: search resolves faster, and the number of external services (Elasticsearch, Solr, and so on) to monitor shrinks.

Getting Started

First, add the library to your project's dependencies – see the snippet below – before we proceed to the fundamental parts of the library.

def deps do
  [
    {:elasticlunr, "~> 0.5"}
  ]
end

What's an Index?

An index is a collection of structured data that is referred to when looking for results that are relevant to a specific query.

In an RDBMS, an index can be likened to a table, meaning that you can store, update, delete, and search documents in an index. The difference is that an index has a pipeline that every JSON document passes through before it becomes searchable.

alias Elasticlunr.{Index, Pipeline}

pipeline = Pipeline.new(Pipeline.default_runners())

index = Index.new(pipeline: pipeline)

The above code block creates a new index with a pipeline of default functions that work with the English language.

The new index does not define the expected structure of the JSON documents to be indexed. To fix this, let's assume we are building an index of blog posts, and each post consists of the author, content, category, and title attributes.

index =
  index
  |> Index.add_field("title")
  |> Index.add_field("author")
  |> Index.add_field("content")
  |> Index.add_field("category")

Based on the changes made to the index, it now expects JSON documents with the following structure:

%{
  "id" => "...",
  "title" => "...",
  "author" => "...",
  "category" => "...",
  "content" => "..."
}

Note that only the fields specified on the index will be indexed; fields on a document that the index does not recognize are ignored. By default, every index assumes that the unique identifier on documents is id, and this behavior is configurable.
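For instance, if your documents use a different unique identifier, you could point the index at it when creating it. This is a sketch assuming the constructor accepts a ref option for this purpose – check the library documentation for the exact option name:

```elixir
alias Elasticlunr.{Index, Pipeline}

pipeline = Pipeline.new(Pipeline.default_runners())

# Assumption: a :ref option overrides the default "id" identifier field.
index =
  Index.new(pipeline: pipeline, ref: "slug")
  |> Index.add_field("title")
```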

Indexing Documents

Following the example above, to make the blog posts searchable we need to add them to the index so that they can be analyzed and transformed appropriately.

documents = [
  %{
    "id" => 1,
    "author" => "Mark Ericksen",
    "title" => "Saving and Restoring LiveView State using the Browser",
    "category" => "elixir liveview browser",
    "content" => "There are multiple ways to save and restore state for your LiveView processes. You can use an external cache like Redis, your database, or even the browser itself. Sometimes there are situations where you either can’t or don’t want to store the state on the server. In situations like that, you do have the option of storing the state in the user’s browser. This post explains how you use the browser to store state and how your LiveView process can get it back later. We’ll go through the code so you can add something similar to your own project. We cover what data to store, how to do it securely, and restoring the state on demand."
  },
  %{
    "id" => 2,
    "author" => "Mika Kalathil",
    "title" => "Creating Reusable Ecto Code",
    "category" => "elixir ecto sql",
    "content" => "Creating a highly reusable Ecto API is one of the ways we can create long-term sustainable code for ourselves, while growing it with our application to allow for infinite combination possibilites and high code reusability. If we write our Ecto code correctly, we can not only have a very well defined split between query definition and combination/execution using our context but also have the ability to re-use the queries we design individually, together with others to create larger complex queries."
  },
  %{
    "id" => 3,
    "author" => "Mark Ericksen",
    "title" => "ThinkingElixir 079: Collaborative Music in LiveView with Nathan Willson",
    "category" => "elixir podcast liveview",
    "content" => "In episode 79 of Thinking Elixir, we talk with Nathan Willson about GEMS, his collaborative music generator written in LiveView. He explains how it’s built, the JS sound library integrations, what could be done by Phoenix and what is done in the browser. Nathan shares how he deployed it globally to 10 regions using Fly.io. We go over some of the challenges he overcame creating an audio focused web application. It’s a fun open-source project that pushes the boundaries of what we think LiveView apps can do!"
  },
  %{
    "id" => 4,
    "title" => "ThinkingElixir 078: Logflare with Chase Granberry",
    "author" => "Mark Ericksen",
    "category" => "elixir podcast logging logflare",
    "content" => "In episode 78 of Thinking Elixir, we talk with Chase Granberry about Logflare. We learn why Chase started the company, what Logflare does, how it’s built on Elixir, about their custom Elixir logger, where the data is stored, how it’s queried, and more! We talk about dealing with the constant stream of log data, how Logflare is collecting and displaying metrics, and talk more about Supabase acquiring the company!"
  }
]

index = Index.add_documents(index, documents)

Searching the Index

After building our index, we need to query it for relevant posts based on our criteria. But before we do that, let me explain what the search results look like:

[
  %{
    matched: 3,
    positions: %{
      "category" => [{7, 4}],
      "content" => [{27, 4}, {239, 4}],
      "title" => [{18, 4}]
    },
    ref: 2,
    score: 1.1853013470487557
  }
]

As seen above, a list of maps is returned, and each map contains the keys matched, positions, ref, and score. See the definitions below:

  • matched: the number of fields in which the given query matched
  • score: how well the document ranks relative to the other results
  • ref: the document's unique identifier (id by default)
  • positions: a map from field names to {offset, length} tuples marking where the matching terms occur in that field's text
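The {offset, length} tuples in positions can be mapped back onto the original text with String.slice/3 – useful for highlighting matches. For example, taking the start of document 2's content from above, the tuple {27, 4} points at the matched token (plain Elixir, no library calls involved):

```elixir
content =
  "Creating a highly reusable Ecto API is one of the ways we can " <>
    "create long-term sustainable code for ourselves"

# Each position is an {offset, length} tuple into the field's text.
positions = [{27, 4}]

matches =
  Enum.map(positions, fn {offset, length} ->
    String.slice(content, offset, length)
  end)

IO.inspect(matches)
# => ["Ecto"]
```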

Now that we understand what each field stands for, let's search the index for the word "elixir". We get the results below:

results = Index.search(index, "elixir")

# value of results variable
[
  %{
    matched: 2,
    positions: %{
      "category" => [{0, 6}],
      "content" => [{26, 7}, {157, 7}, {184, 6}]
    },
    ref: 4,
    score: 0.4791274455433391
  },
  %{
    matched: 2,
    positions: %{
      "category" => [{0, 6}],
      "content" => [{26, 7}]
    },
    ref: 3,
    score: 0.3984945971205254
  },
  %{
    matched: 1,
    positions: %{"category" => [{0, 6}]},
    ref: 2,
    score: 0.28834807779546184
  },
  %{
    matched: 1,
    positions: %{"category" => [{0, 6}]},
    ref: 1,
    score: 0.28834807779546184
  }
]

Looking at the results, we can identify which documents match the search criteria. These snippets cover the fundamental way to use Elasticlunr in your application.
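Since each result only carries the document's ref, a common follow-up step is resolving results back to the full documents. A minimal sketch in plain Elixir, using a lookup map you would build yourself from your document list (documents_by_id is my own name, not part of the library):

```elixir
documents = [
  %{"id" => 1, "title" => "Saving and Restoring LiveView State using the Browser"},
  %{"id" => 2, "title" => "Creating Reusable Ecto Code"}
]

# Search results arrive sorted by score; each map carries the document ref.
results = [
  %{ref: 2, score: 0.39, matched: 2, positions: %{}},
  %{ref: 1, score: 0.29, matched: 1, positions: %{}}
]

# Build an id => document lookup, then map refs back to the documents.
documents_by_id = Map.new(documents, fn doc -> {doc["id"], doc} end)

posts = Enum.map(results, fn %{ref: ref} -> documents_by_id[ref] end)

IO.inspect(Enum.map(posts, & &1["title"]))
# => ["Creating Reusable Ecto Code",
#     "Saving and Restoring LiveView State using the Browser"]
```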

Nested Document Attributes

As seen in the previous examples, all the documents indexed were without nested attributes. But imagine a situation where your data source returns records with nested attributes and you want to search by those attributes – that's possible with Elasticlunr.

For this use case, let's assume our data source returns a list of users with their addresses, and you want to index this information. To make this work, include the top-level attribute (address) when defining your index structure:

alias Elasticlunr.{Index, Pipeline}

pipeline = Pipeline.new(Pipeline.default_runners())

users_index =
  Index.new(pipeline: pipeline)
  |> Index.add_field("name")
  |> Index.add_field("address")
  |> Index.add_field("education")

Elasticlunr automatically flattens the nested attributes, so that when using the advanced query DSL you can use dot notation to filter search results. Now, let's add a few user records to the index:

documents = [
  %{
    "id" => 1,
    "name" => "rose mary",
    "education" => "BSc.",
    "address" => %{
      "line1" => "Brooklyn Street",
      "line2" => "4181",
      "city" => "Portland",
      "state" => "Oregon",
      "country" => "USA"
    }
  },
  %{
    "id" => 2,
    "name" => "jason richard",
    "education" => "Msc.",
    "address" => %{
      "line1" => "Crown Street",
      "line2" => "2057",
      "city" => "St Malo",
      "state" => "Quebec",
      "country" => "CA"
    }
  },
  %{
    "id" => 3,
    "name" => "peters book",
    "education" => "BSc.",
    "address" => %{
      "line1" => "Murry Street",
      "line2" => "2285",
      "city" => "Norfolk",
      "state" => "Virginia",
      "country" => "USA"
    }
  },
  %{
    "id" => 4,
    "name" => "jason mount",
    "education" => "Highschool",
    "address" => %{
      "line1" => "Aspen Court",
      "line2" => "2057",
      "city" => "Boston",
      "state" => "Massachusetts",
      "country" => "USA"
    }
  }
]

users_index = Index.add_documents(users_index, documents)
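Conceptually, the flattening mentioned earlier turns each nested map into dot-notation keys such as "address.city". The sketch below is purely illustrative – it is not the library's actual implementation:

```elixir
defmodule Flatten do
  # Recursively flatten nested maps into "parent.child" keys.
  def flatten(map, prefix \\ "") do
    Enum.reduce(map, %{}, fn {key, value}, acc ->
      full_key = if prefix == "", do: key, else: "#{prefix}.#{key}"

      case value do
        %{} = nested -> Map.merge(acc, flatten(nested, full_key))
        _ -> Map.put(acc, full_key, value)
      end
    end)
  end
end

flattened = Flatten.flatten(%{"name" => "rose mary", "address" => %{"city" => "Portland"}})
IO.inspect(flattened)
# => %{"name" => "rose mary", "address.city" => "Portland"}
```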

After indexing the user records, searching the index for "jason murry" yields the results below:

results = Index.search(users_index, "jason murry")

# value of results variable
[
  %{
    matched: 1,
    positions: %{"address.line1" => [{0, 5}]},
    ref: 3,
    score: 0.6910333283103078
  },
  %{
    matched: 1,
    positions: %{"name" => [{0, 5}]},
    ref: 4,
    score: 0.47830918795337357
  },
  %{
    matched: 1,
    positions: %{"name" => [{0, 5}]},
    ref: 2,
    score: 0.47830918795337357
  }
]

Index Manager

One question you will face when using this library is where to keep your indexes; look no further than the manager provided by the library. The manager exposes CRUD functions to help you manage an index after mutating its state.

alias Elasticlunr.{Index, IndexManager}

index = Index.new(name: "test_index")

{:ok, _} = IndexManager.save(index)

index = Index.add_field(index, "title")

...

# after making changes to the index, like adding documents or new fields,
# call IndexManager.update/1 so the manager holds the updated index
index = IndexManager.update(index)

...

# use IndexManager.get/1 to retrieve the index from the manager

index = IndexManager.get("test_index")

:not_running = IndexManager.get("non_existent_index")

See the documentation to find other available functions on the manager.

Conclusion

I'm glad to see you reach the end of this blog post, whose aim is to introduce you to the basics of this not-so-bright full-text search library – it's a capable library 😄. In the next part of this series, I will cover the advanced query DSL and index serialization for storage.

In short, I can't wait to see how you use it in your applications and the problems it solves 🤗.


Made with 💙 © 2020 Atanda Rasheed