How to Use Elasticsearch Scoring


Nov 10, 2025 - 12:15

Elasticsearch is a powerful, distributed search and analytics engine built on Apache Lucene. One of its most critical yet often misunderstood features is scoring: the algorithmic process that determines how relevant each document is to a given search query. Without a solid grasp of Elasticsearch scoring, even well-structured indexes and precise queries can return misleading or suboptimal results. Whether you're building an e-commerce product search, a content recommendation engine, or a log analysis dashboard, understanding and fine-tuning scoring is essential to delivering accurate, fast, and user-satisfying search experiences.

Scoring in Elasticsearch is not a black box. It's a transparent, configurable, and highly customizable system rooted in the TF-IDF (Term Frequency-Inverse Document Frequency) model and extended with modern enhancements like BM25, field boosts, function scores, and custom scripts. This tutorial will guide you through the mechanics of Elasticsearch scoring, show you how to control it with practical examples, and equip you with best practices to optimize your search relevance across real-world use cases.

Step-by-Step Guide

Understanding the Default Scoring Mechanism: BM25

Elasticsearch uses BM25 (Best Match 25) as its default scoring algorithm starting from version 5.0. BM25 is an improvement over the older TF-IDF model, offering better handling of term saturation and document length normalization. It calculates relevance based on three primary factors:

  • Term Frequency (TF): How often the search term appears in the document. More occurrences increase relevance, but with diminishing returns.
  • Inverse Document Frequency (IDF): How rare the term is across the entire index. Rare terms carry more weight.
  • Document Length Normalization: Shorter documents are rewarded when they contain the query term, as they are more likely to be focused on the topic.
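The interaction of these three factors can be sketched in plain Python. This is an illustrative reimplementation of the Lucene BM25 formula with Elasticsearch's default constants (k1 = 1.2, b = 0.75), meant for building intuition rather than for running against a cluster:

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Score one term in one document using the Lucene/Elasticsearch BM25 variant.

    tf: occurrences of the term in this document
    doc_len / avg_doc_len: this document's field length vs. the index average
    doc_count / doc_freq: total documents vs. documents containing the term
    """
    # Rare terms get a larger IDF; common terms approach zero
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term frequency saturates: each extra occurrence adds less than the last
    norm_tf = tf / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf

# Diminishing returns: going from tf=2 to tf=3 adds less than tf=1 to tf=2
s1 = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=50)
s2 = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=50)
s3 = bm25_term_score(tf=3, doc_len=100, avg_doc_len=100, doc_count=1000, doc_freq=50)
```

Comparing a rare term (doc_freq=5) with a common one (doc_freq=500) under the same tf shows the IDF effect directly.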

To see how Elasticsearch scores your documents, add the explain=true parameter to any search request. For example:

GET /products/_search
{
  "query": {
    "match": {
      "name": "wireless headphones"
    }
  },
  "explain": true
}

The response will include an explanation object for each hit, breaking down the score into its constituent parts. This is invaluable for debugging why certain documents rank higher than others.

Step 1: Indexing Data with Appropriate Field Types

Scoring effectiveness begins at indexing. Ensure your fields are mapped correctly. Text fields are analyzed and used for full-text search, while keyword fields are not analyzed. Misusing a keyword field for full-text search will yield only exact, whole-value matches, with no meaningful relevance ranking.

Example mapping for a product index:

PUT /products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "standard",
        "boost": 2.0
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "category": {
        "type": "keyword"
      },
      "price": {
        "type": "float"
      }
    }
  }
}

Notice the boost: 2.0 on the name field. This means matches in the product name will contribute twice as much to the overall score as matches in the description. Be aware that index-time boosts like this are deprecated in recent Elasticsearch versions; query-time boosts, shown later, achieve the same effect and are easier to change. Either way, weighting fields is a foundational step in influencing relevance.

Step 2: Constructing Basic Match Queries

The simplest way to trigger scoring is with a match query:

GET /products/_search
{
  "query": {
    "match": {
      "name": "wireless headphones"
    }
  }
}

Elasticsearch will analyze the query string wireless headphones into two terms, then find documents containing either or both. Each term contributes to the BM25 score. Documents with both terms will generally score higher than those with only one.

To see how scoring behaves, compare results from:

  • A document with wireless headphones in the name
  • A document with wireless in the name and headphones in the description
  • A document with wireless headphones in the description only

With the boost on name, the first document should rank highest, even if the third document has both terms, because term location matters.

Step 3: Using Boolean Logic with Bool Queries

For complex relevance control, use the bool query. It allows you to combine multiple clauses: must, should, must_not, and filter.

Each should clause contributes to the score; must clauses are required but also contribute. filter clauses affect document inclusion but not scoring.

Example: Boost documents that are in stock and have high ratings:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "wireless headphones"
          }
        }
      ],
      "should": [
        {
          "term": {
            "in_stock": true
          }
        },
        {
          "range": {
            "rating": {
              "gte": 4.5
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

In this example, a product must contain wireless headphones in the name (must), but gets a scoring boost if it's in stock or has a rating of 4.5 or higher (should). The minimum_should_match: 1 ensures at least one of the should conditions is met for inclusion.
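Since query bodies are plain JSON, they are often assembled programmatically before being sent with an HTTP client or an official Elasticsearch client. A minimal sketch in Python (the helper name and structure are illustrative, not part of any client library):

```python
def build_bool_query(required_text, should_clauses, minimum_should_match=1):
    """Assemble a bool query body like the one above as a plain dict.

    required_text: {field: text} pairs that become must/match clauses
    should_clauses: pre-built optional clauses that add to the score
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {f: q}} for f, q in required_text.items()],
                "should": should_clauses,
                "minimum_should_match": minimum_should_match,
            }
        }
    }

body = build_bool_query(
    {"name": "wireless headphones"},
    [
        {"term": {"in_stock": True}},
        {"range": {"rating": {"gte": 4.5}}},
    ],
)
```

Building queries as data rather than strings makes it easy to add or remove should clauses per request, for example to apply per-user boosts.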

Step 4: Applying Field-Level Boosts

Field boosts multiply the score contribution of a term match in a specific field. You can apply them directly in the query:

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "wireless headphones",
      "fields": [
        "name^3",
        "description^1.5",
        "category^0.5"
      ]
    }
  }
}

Here, matches in the name field are weighted twice as heavily as matches in the description (3 vs. 1.5), and six times as heavily as matches in the category. This is extremely useful when you know certain fields are more indicative of relevance.

Be cautious: excessive boosting can lead to overfitting. A field boost of 10x might cause irrelevant documents with a single keyword match in that field to dominate results.

Step 5: Using Function Score Queries for Custom Logic

Function score queries allow you to modify scores using mathematical functions, scripts, or decay functions. This is where Elasticsearch scoring becomes truly powerful.

Example: Boost products with recent updates and higher sales volume:

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "wireless headphones"
        }
      },
      "functions": [
        {
          "gauss": {
            "last_updated": {
              "origin": "now",
              "scale": "7d",
              "offset": "2d",
              "decay": 0.5
            }
          }
        },
        {
          "weight": 1.5,
          "filter": {
            "range": {
              "sales_last_month": {
                "gte": 100
              }
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "sum"
    }
  }
}

This query applies two scoring functions:

  1. A gaussian decay function on last_updated: Documents updated within the last 2 days (the offset) keep the full score; beyond that, relevance decays along a gaussian curve, dropping to 50% once the document is scale (7 days) past the offset, i.e. about 9 days old.
  2. A weight function: Adds a 1.5x multiplier if the product sold more than 100 units last month.

The score_mode controls how the results of the individual functions are combined with each other; here multiply means they are multiplied together. The boost_mode controls how that combined function score is combined with the base BM25 score; here sum means it is added. Available options include multiply, sum, avg, max, min, and (for boost_mode) replace.
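To build intuition for the decay parameters, here is a small Python sketch of the gauss formula as documented: the score is exp(-(max(0, distance - offset))^2 / (2 * sigma^2)), with sigma derived from scale and decay so that the score equals decay exactly at distance offset + scale:

```python
import math

def gauss_decay(distance_days, scale_days, offset_days=0.0, decay=0.5):
    """Replicate Elasticsearch's gauss decay function (distances in days).

    Score is 1.0 within `offset` of the origin and exactly `decay`
    once the distance reaches offset + scale.
    """
    adjusted = max(0.0, distance_days - offset_days)
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    return math.exp(-(adjusted ** 2) / (2.0 * sigma_sq))

# The query above: scale=7d, offset=2d, decay=0.5
fresh = gauss_decay(1, 7, 2)    # inside the offset: full score (1.0)
edge = gauss_decay(9, 7, 2)     # offset + scale: exactly the decay value (0.5)
stale = gauss_decay(20, 7, 2)   # far past the scale: close to zero
```

Plotting this function for a few candidate scale/decay pairs is a quick way to choose parameters before touching the query.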

Step 6: Using Script Scores for Advanced Customization

For highly specific business logic, use script-based scoring. Scripts are written in Painless (Elasticsearch's secure scripting language).

Example: Score products based on a custom formula combining price, rating, and popularity:

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "name": "wireless headphones"
        }
      },
      "script_score": {
        "script": {
          "source": "(_score * 0.6) + (doc['rating'].value * 0.3) + (Math.log10(doc['sales_last_month'].value + 1) * 0.1)"
        }
      },
      "boost_mode": "replace"
    }
  }
}

This script:

  • Takes the original BM25 score and weights it at 60%
  • Adds 30% from the product's rating (on a 1-5 scale)
  • Adds 10% based on the logarithm of sales (to avoid extreme outliers)

The boost_mode: replace means the final score is entirely determined by the script; no original BM25 score is retained. This is powerful but requires careful tuning to avoid losing semantic relevance.
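Because the script is ordinary arithmetic, you can prototype the weighting offline before deploying it. A Python mirror of the formula above (the sample inputs are invented for illustration):

```python
import math

def custom_score(bm25_score, rating, sales_last_month):
    """Mirror the Painless script above in plain Python for offline tuning."""
    return (
        (bm25_score * 0.6)
        + (rating * 0.3)
        + (math.log10(sales_last_month + 1) * 0.1)
    )

# A strong text match with weak signals vs. a weaker match with strong signals:
strong_match = custom_score(bm25_score=8.0, rating=2.0, sales_last_month=10)
strong_signals = custom_score(bm25_score=5.0, rating=4.8, sales_last_month=5000)
```

With these weights the strong text match still wins, which is usually what you want; shifting weight from `_score` toward rating and sales changes that balance, and trying such shifts on a spreadsheet of real documents is far cheaper than reindexing and re-querying.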

Step 7: Testing and Iterating with Explain

Always use explain=true during development. It reveals the exact formula Elasticsearch used to calculate each document's score.

Look for:

  • Which terms contributed most
  • Whether boosts were applied correctly
  • If any function scores were ignored or misconfigured

Example output snippet:

"explanation": {

"value": 4.2,

"description": "sum of:",

"details": [

{

"value": 2.1,

"description": "weight(name:wireless in 12) [PerFieldSimilarity], result of:",

"details": [...]

},

{

"value": 1.5,

"description": "function score, score mode [sum]",

"details": [

{

"value": 1.0,

"description": "function score: gauss(last_updated), score of 0.8"

}

]

}

]

}

Use this feedback loop to adjust boosts, functions, or filters until results align with user expectations.

Best Practices

1. Start Simple, Then Add Complexity

Many teams jump straight into function score queries and custom scripts. This is a mistake. Begin with basic match queries and field boosts. Only introduce complexity when you observe clear relevance gaps. Over-engineering scoring leads to unmaintainable code and unpredictable behavior.

2. Use Field Boosts Before Function Scores

Field boosts are simpler, faster, and easier to debug than function scores. If you need to prioritize one field over another (e.g., title over body), use ^2 or ^3 instead of writing a script.

3. Avoid Over-Boosting

Boosting a field by 10x or more can cause the system to ignore semantic relevance entirely. A document with a single keyword match in a heavily boosted field may outrank a document with multiple relevant terms across multiple fields. This leads to poor user experience.

4. Normalize Numerical Features

If you're using script scoring with numerical fields like price, rating, or sales, normalize them first. A product with 10,000 sales shouldn't score 100x higher than one with 100 sales. Use logarithmic scaling: Math.log10(sales + 1) creates a much more natural relevance curve.
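A quick sketch of the effect, where each 10x increase in the raw count adds roughly one point to the normalized value:

```python
import math

def log_normalize(value):
    """Dampen a raw count so each 10x increase adds roughly one point."""
    return math.log10(value + 1)

# 10,000 sales is 100x more than 100 sales,
# but only ~2 points higher after log scaling
small = log_normalize(100)
large = log_normalize(10_000)
```

The same idea applies to views, shares, or comment counts; any heavy-tailed signal benefits from this kind of dampening before it is mixed into a score.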

5. Use Filters for Non-Relevance Criteria

Don't use must or should for criteria that don't affect relevance, like date ranges, categories, or availability. Use filter clauses instead. Filters are cached and don't impact scoring, making queries faster and more predictable.

6. Test with Real User Queries

Don't rely on hypothetical queries. Collect real search terms from your application logs. Test your scoring configuration against a representative sample of 50-100 real queries. Measure precision (how many top results are relevant) and recall (how many relevant results appear in the top 10).
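Precision at k is straightforward to compute once you have relevance judgments for a query. A minimal sketch (the document IDs are invented for illustration):

```python
def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Fraction of the top-k returned results that were judged relevant."""
    top_k = ranked_doc_ids[:k]
    return sum(1 for d in top_k if d in relevant_ids) / len(top_k)

# Judged-relevant docs for one real query vs. what the engine returned
relevant = {"p1", "p4", "p7"}
returned = ["p1", "p9", "p4", "p2", "p5"]
p = precision_at_k(returned, relevant, k=5)
```

Averaging this across your sample of real queries before and after a scoring change gives you a single number to compare configurations with.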

7. Monitor Scoring Over Time

As your data grows, scoring behavior can shift. A term that was rare becomes common. A product category becomes oversaturated. Schedule periodic reviews of your scoring logic, especially after major data updates or feature launches.

8. Document Your Scoring Logic

Scoring configurations are often the most opaque part of a search system. Create a living document that explains:

  • Which fields are boosted and why
  • What function scores are applied and their business rationale
  • How changes are tested and validated

This ensures knowledge doesn't live only in one engineer's head.

9. Use Query Time vs. Index Time Boosts Wisely

Boosts can be applied at index time (in the mapping) or query time (in the search request). Query-time boosts are more flexible and recommended. Index-time boosts are static and harder to change without reindexing.

10. Consider User Personalization

For advanced applications, incorporate user behavior into scoring. For example, if a user frequently clicks on products in a certain category, temporarily boost that category in their search results. Use stored user preferences or session data to dynamically adjust the should clauses or function scores.

Tools and Resources

Elasticsearch Explain API

As mentioned, the explain=true parameter is your most important tool. It turns scoring from an abstract concept into a transparent, inspectable process. Always use it during development and debugging.

Kibana Dev Tools

Kibana's Dev Tools console provides a clean interface for testing queries, viewing responses, and analyzing scores. You can save and share query templates, making collaboration easier.

Elasticsearch Ranking Evaluation API

Elasticsearch includes the _rank_eval API, which lets you evaluate the quality of your search results against a set of labeled judgments. For example:

POST /products/_rank_eval
{
  "requests": [
    {
      "id": "query_1",
      "request": {
        "query": {
          "match": {
            "name": "wireless headphones"
          }
        }
      },
      "ratings": [
        { "_index": "products", "_id": "123", "rating": 2 },
        { "_index": "products", "_id": "456", "rating": 5 }
      ]
    }
  ],
  "metric": {
    "mean_reciprocal_rank": {
      "relevant_rating_threshold": 2
    }
  }
}

This API calculates metrics like Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG), giving you a quantitative measure of how well your scoring performs.
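Both metrics are easy to sanity-check by hand. An illustrative Python version (not the API's internal implementation):

```python
import math

def mrr(ranked_relevance_lists):
    """Mean Reciprocal Rank: average of 1/rank of the first relevant hit."""
    total = 0.0
    for rels in ranked_relevance_lists:
        for i, rel in enumerate(rels, start=1):
            if rel:
                total += 1.0 / i
                break
    return total / len(ranked_relevance_lists)

def dcg(gains):
    """Discounted Cumulative Gain: graded relevance discounted by log2(rank+1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

# Two queries: first relevant hit at rank 2, then at rank 1 -> MRR = 0.75
m = mrr([[0, 1, 0], [1, 0, 0]])
# One query with graded ratings 5, 2, 0 down the ranking
d = dcg([5, 2, 0])
```

Running the same calculation over the _rank_eval response is a good way to verify you are interpreting the returned metric correctly.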

Logstash and Query Analytics

Use Logstash or your application logs to capture user search queries. Analyze them with Kibana to identify:

  • Top search terms
  • Queries with low click-through rates
  • Queries returning no results

This data informs which queries need scoring improvements.

Open Source Libraries

  • elasticsearch-dsl-py (Python): A high-level library for building complex queries programmatically.
  • elasticsearch-js (JavaScript/Node.js): Official client with support for all scoring features.
  • Searchkick (Ruby on Rails): Simplifies Elasticsearch integration and includes built-in relevance tuning.

Documentation and Community

The official Elasticsearch reference documentation covers similarity settings, the function_score query, and the explain and ranking evaluation APIs in depth, and the Elastic discussion forums are an active place to ask relevance-tuning questions.

Real Examples

Example 1: E-Commerce Product Search

Scenario: A user searches for running shoes. You want results to prioritize:

  • Products with running shoes in the name
  • Products with high ratings (≥4.5)
  • Products with high sales volume
  • Products currently in stock
  • Products updated in the last 30 days

Implementation:

GET /products/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "running shoes",
          "fields": [
            "name^4",
            "description^1.2"
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "last_updated": {
              "origin": "now",
              "scale": "30d",
              "decay": 0.7
            }
          }
        },
        {
          "weight": 1.8,
          "filter": {
            "range": {
              "rating": {
                "gte": 4.5
              }
            }
          }
        },
        {
          "weight": 1.5,
          "filter": {
            "range": {
              "sales_last_month": {
                "gte": 50
              }
            }
          }
        },
        {
          "weight": 1.3,
          "filter": {
            "term": {
              "in_stock": true
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  },
  "size": 10
}

Results: Products with running shoes in the name and high ratings appear first. Even if a product has running in the name and shoes in the description, it still ranks well due to the multi-match. In-stock items with recent updates get an extra nudge.

Example 2: Content Search for a News Site

Scenario: A user searches for climate change policy. You want to prioritize:

  • Articles from major publishers
  • Recent articles (last 7 days)
  • Articles with high engagement (shares, comments)
  • Articles tagged with policy or government

Implementation:

GET /articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "content": "climate change policy"
        }
      },
      "functions": [
        {
          "gauss": {
            "published_at": {
              "origin": "now",
              "scale": "7d",
              "decay": 0.8
            }
          }
        },
        {
          "weight": 2.0,
          "filter": {
            "term": {
              "publisher": "the-new-york-times"
            }
          }
        },
        {
          "script_score": {
            "script": {
              "source": "Math.log10(doc['shares'].value + doc['comments'].value + 1) * 0.8"
            }
          }
        },
        {
          "weight": 1.2,
          "filter": {
            "terms": {
              "tags": [
                "policy",
                "government",
                "regulation"
              ]
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

Result: A recent article from The New York Times about climate policy with 5,000 shares ranks higher than an older article from a lesser-known blog, even if the blog article has more keyword matches.

Example 3: Internal Knowledge Base Search

Scenario: Employees search for onboarding checklist. You want to prioritize:

  • Documents edited by the HR team
  • Documents viewed frequently by other employees
  • Documents updated in the last 90 days

Implementation:

GET /kb_articles/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "content": "onboarding checklist"
        }
      },
      "functions": [
        {
          "weight": 1.7,
          "filter": {
            "term": {
              "author_department": "hr"
            }
          }
        },
        {
          "script_score": {
            "script": {
              "source": "Math.log10(doc['views_last_30d'].value + 1) * 1.2"
            }
          }
        },
        {
          "gauss": {
            "last_edited": {
              "origin": "now",
              "scale": "90d",
              "decay": 0.6
            }
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply"
    }
  }
}

Result: The most-viewed, HR-approved, recently updated document rises to the top, even if another document has more keyword matches but is outdated or authored by an inactive team.

FAQs

What is the difference between TF-IDF and BM25?

TF-IDF (Term Frequency-Inverse Document Frequency) is an older scoring model that rewards rare terms and frequent occurrences but doesn't account for document length. BM25 improves upon TF-IDF by normalizing scores based on document length and introducing a saturation function for term frequency, meaning that after a certain number of occurrences, additional matches add diminishing returns. Elasticsearch uses BM25 by default because it performs better in real-world scenarios.

Can I use custom scoring algorithms in Elasticsearch?

Yes. You can use script_score with Painless scripts to implement any custom scoring logic, including machine learning models if you precompute scores externally. However, complex scripts can impact performance, so use them judiciously and test thoroughly.

Why is my document not appearing in search results even though it matches the query?

It may be scoring too low. Use explain=true to see the score breakdown. Common causes: the term appears in a low-boosted field, the document is too long (reducing relevance), or a filter is excluding it. Also check if the field is mapped as keyword instead of text.

How do I boost documents from a specific category without affecting relevance?

Use a should clause with a term filter inside a bool query. For example:

"should": [

{

"term": {

"category": "electronics"

}

}

]

This adds a small relevance boost without forcing inclusion. When the bool query also contains a must clause, should clauses are optional by default (minimum_should_match defaults to 0), so the category match nudges the score without being required.

Does boosting a field make it more important than other fields?

Yes, but not unconditionally. The boost multiplies into the term scores rather than overriding them, so a field boosted by 10x won't necessarily dominate if the content in other fields is significantly more relevant. However, excessive boosting can skew results, so use moderation.

How often should I re-evaluate my scoring strategy?

At least quarterly. If your data changes rapidly (e.g., new products, trending topics), monthly reviews are recommended. Use the Ranking Evaluation API to track performance over time.

Can I use Elasticsearch scoring with multilingual content?

Yes. Use language-specific analyzers (e.g., analyzer: "french") for each field. BM25 works across languages, but proper tokenization and stemming are essential. Consider using multi-fields (sub-fields defined via the fields mapping parameter) to index the same content with multiple analyzers for better recall.

What happens if I disable scoring entirely?

You can wrap a query in constant_score, or move clauses into filter context, to assign all matching documents the same score. This is useful for filtering or when relevance is determined externally (e.g., by a machine learning model). However, you lose the benefit of Elasticsearch's relevance ranking.

Conclusion

Elasticsearch scoring is not just a technical detail; it's the heartbeat of your search experience. When configured correctly, it transforms a basic keyword matcher into an intelligent, context-aware engine that understands user intent, business priorities, and content quality. From the default BM25 algorithm to advanced function scores and custom scripts, Elasticsearch gives you unprecedented control over relevance.

But with great power comes great responsibility. The key to mastering Elasticsearch scoring lies in iterative testing, transparent documentation, and a deep understanding of your users' needs. Start with simple field boosts. Use the explain API religiously. Measure performance with real queries. Avoid over-engineering. And always remember: the goal is not to maximize scores; it's to maximize user satisfaction.

As your data grows and your use cases evolve, your scoring strategy must evolve too. Treat it as a living system, not a one-time configuration. By applying the principles and practices outlined in this guide, you'll build search experiences that are not just fast and accurate, but genuinely useful.