How to Search Data in Elasticsearch


Nov 10, 2025 - 12:12


Elasticsearch is a powerful, distributed search and analytics engine built on Apache Lucene. It enables real-time search across vast datasets with high scalability and performance. Whether you're analyzing log files, powering e-commerce product discovery, or building full-text search applications, mastering how to search data in Elasticsearch is essential for any developer, data engineer, or analyst working with modern data stacks.

Unlike traditional relational databases that rely on structured queries and rigid schemas, Elasticsearch excels at unstructured and semi-structured data. It uses inverted indexes to deliver sub-second search results, supports complex queries including fuzzy matching, phrase searches, and boolean logic, and integrates seamlessly with tools like Kibana, Logstash, and Beats. Understanding how to effectively search data in Elasticsearch unlocks the ability to extract meaningful insights from massive volumes of information quickly and accurately.

This comprehensive guide walks you through the entire process, from basic queries to advanced search techniques, ensuring you can confidently retrieve, filter, and analyze data in Elasticsearch. By the end, you'll know how to construct efficient queries, apply best practices, leverage key tools, and troubleshoot common issues, all critical skills for production-grade search applications.

Step-by-Step Guide

1. Setting Up Elasticsearch

Before you can search data, you need a running Elasticsearch instance. The easiest way to get started is with Docker. Run the following command to start Elasticsearch 8.x with security disabled (suitable for local testing only):

docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Once the container is up, verify the installation by sending a GET request to http://localhost:9200. You should receive a JSON response containing cluster details like version, name, and cluster UUID.
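The cluster-info response is plain JSON, so any client can parse it. Here is a minimal Python sketch using a hypothetical, abridged response body for illustration; the field values on your machine will differ:

```python
import json

# Representative (abridged) cluster-info response; values are illustrative.
sample_response = """
{
  "name": "es-node-1",
  "cluster_name": "docker-cluster",
  "cluster_uuid": "abc123",
  "version": { "number": "8.12.0", "lucene_version": "9.9.1" },
  "tagline": "You Know, for Search"
}
"""

info = json.loads(sample_response)
print(info["version"]["number"])  # server version
print(info["cluster_name"])       # cluster name
```

Checking `version.number` this way is a quick sanity test that your client and server versions are compatible.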

For production environments, consider deploying Elasticsearch on Kubernetes (for example with the ECK operator) or a managed offering such as Elastic Cloud on AWS or Azure. Ensure proper security configurations, including TLS encryption, role-based access control (RBAC), and firewall rules.

2. Indexing Sample Data

To practice searching, you need data. Elasticsearch stores data in indices, which are similar to tables in relational databases. Each index contains documents: JSON objects representing individual records.

Let's create an index called products and add a few sample documents:

PUT /products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "price": { "type": "float" },
      "category": { "type": "keyword" },
      "in_stock": { "type": "boolean" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
    }
  }
}

Now insert sample documents:

POST /products/_bulk
{ "index": { "_id": "1" } }
{ "name": "Wireless Bluetooth Headphones", "description": "Noise-cancelling headphones with 30-hour battery life", "price": 199.99, "category": "Electronics", "in_stock": true, "created_at": "2024-01-15 10:30:00" }
{ "index": { "_id": "2" } }
{ "name": "Organic Cotton T-Shirt", "description": "Soft, eco-friendly cotton t-shirt in multiple colors", "price": 29.99, "category": "Clothing", "in_stock": true, "created_at": "2024-01-16 14:20:00" }
{ "index": { "_id": "3" } }
{ "name": "Smart Thermostat", "description": "Programmable thermostat with mobile app control", "price": 249.99, "category": "Electronics", "in_stock": false, "created_at": "2024-01-14 09:15:00" }
{ "index": { "_id": "4" } }
{ "name": "Yoga Mat", "description": "Non-slip, eco-friendly yoga mat with carrying strap", "price": 45.50, "category": "Sports", "in_stock": true, "created_at": "2024-01-17 11:45:00" }
{ "index": { "_id": "5" } }
{ "name": "Coffee Maker", "description": "Programmable drip coffee maker with thermal carafe", "price": 89.99, "category": "Home", "in_stock": true, "created_at": "2024-01-13 16:10:00" }

After indexing, confirm the data is present using:

GET /products/_search
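Clients typically build the _bulk body programmatically. The body is newline-delimited JSON (one action line, then one document line per record) and must end with a trailing newline. A minimal Python sketch of that serialization, using two hypothetical documents:

```python
import json

# Hypothetical sample docs, mirroring the ones indexed above.
docs = {
    "1": {"name": "Wireless Bluetooth Headphones", "price": 199.99},
    "2": {"name": "Organic Cotton T-Shirt", "price": 29.99},
}

# The _bulk API expects an action line followed by a document line
# for each record, and a trailing newline at the very end of the body.
lines = []
for doc_id, doc in docs.items():
    lines.append(json.dumps({"index": {"_id": doc_id}}))
    lines.append(json.dumps(doc))
payload = "\n".join(lines) + "\n"

print(payload)
```

Forgetting the trailing newline is a common cause of bulk-request errors.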

3. Basic Search Queries

The most fundamental search in Elasticsearch is the _search endpoint. It accepts a JSON body with a query object.

Match All Query: returns all documents in the index:

GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

Match Query: searches for terms within analyzed text fields:

GET /products/_search
{
  "query": {
    "match": {
      "name": "headphones"
    }
  }
}

This returns the Wireless Bluetooth Headphones document because "headphones" appears in the name field. Elasticsearch analyzes the query text, tokenizes it, and matches the tokens against the inverted index.

Term Query: searches for exact values in keyword fields (not analyzed):

GET /products/_search
{
  "query": {
    "term": {
      "category": "Electronics"
    }
  }
}

Unlike match, term does not analyze the input. It looks for exact matches. This is ideal for filters on structured fields like category, in_stock, or id.
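The distinction can be illustrated with a toy simulation in plain Python (not Elasticsearch itself): a match query analyzes its input the same way the field was analyzed at index time, while a term query compares the stored keyword value byte-for-byte:

```python
def analyze(text):
    # Roughly standard-analyzer behavior: lowercase and split on whitespace.
    return text.lower().split()

# Inverted index for the analyzed `name` field: token -> doc ids.
docs = {"1": "Wireless Bluetooth Headphones", "3": "Smart Thermostat"}
inverted = {}
for doc_id, name in docs.items():
    for token in analyze(name):
        inverted.setdefault(token, set()).add(doc_id)

def match_query(field_index, text):
    # The query text is analyzed too, so case differences disappear.
    hits = set()
    for token in analyze(text):
        hits |= field_index.get(token, set())
    return hits

# A keyword field stores the untouched value, so lookup is exact.
categories = {"1": "Electronics", "3": "Electronics"}

def term_query(field_values, value):
    return {d for d, v in field_values.items() if v == value}

print(match_query(inverted, "HEADPHONES"))    # {'1'}: analysis lowercases both sides
print(term_query(categories, "electronics"))  # set(): exact compare, case matters
```

This is why a term query for a lowercased value against a keyword field holding "Electronics" finds nothing, while a match query shrugs off case differences.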

4. Combining Queries with Boolean Logic

Elasticsearch supports complex queries using the bool query, which combines multiple conditions using must, should, must_not, and filter.

Must (AND): all conditions must be true:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "coffee" } },
        { "range": { "price": { "gte": 50, "lte": 100 } } }
      ]
    }
  }
}

This returns only the coffee maker, since it both matches the term "coffee" in the name and falls within the price range.

Should (OR): at least one condition must be true:

GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "category": "Electronics" } },
        { "term": { "category": "Sports" } }
      ],
      "minimum_should_match": 1
    }
  }
}

Must Not (NOT): exclude documents matching a condition:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "mat" } }
      ],
      "must_not": [
        { "term": { "category": "Sports" } }
      ]
    }
  }
}

This returns no results, because the only document with "mat" in its description is the yoga mat, which belongs to the Sports category.
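When bool queries are assembled from application code, a small builder function keeps the clause structure consistent. Here is a hypothetical Python helper (not part of any official client) that produces the same request bodies shown above:

```python
import json

def bool_query(must=None, should=None, must_not=None,
               filter_=None, minimum_should_match=None):
    # `filter_` has a trailing underscore because `filter` is a Python builtin.
    # Only non-empty clause lists are included, matching hand-written bodies.
    clause = {}
    if must:
        clause["must"] = must
    if should:
        clause["should"] = should
    if must_not:
        clause["must_not"] = must_not
    if filter_:
        clause["filter"] = filter_
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": clause}}

body = bool_query(
    must=[{"match": {"name": "coffee"}}],
    filter_=[{"range": {"price": {"gte": 50, "lte": 100}}}],
)
print(json.dumps(body, indent=2))
```

Passing the resulting dict as the request body to any HTTP client (or the official Python client's `search(body=...)` style API) yields the same queries as the console examples.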

5. Filtering Results with Filter Context

Use filter context for conditions that don't affect scoring. Filter clauses are cached and execute faster than scoring queries because they only answer yes or no.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "headphones" } }
      ],
      "filter": [
        { "term": { "in_stock": true } }
      ]
    }
  }
}

Here, the match query determines relevance (score), while the filter ensures only in-stock items are returned. Filters are ideal for narrowing results by status, date ranges, or categories.

6. Sorting and Pagination

Sort results by one or more fields:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "price": { "order": "asc" } },
    { "created_at": { "order": "asc" } }
  ]
}

Note that sorting on an analyzed text field such as name is rejected by default; sort on keyword, numeric, or date fields instead.

Paginate using from and size:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "from": 2,
  "size": 2
}

This skips the first two results and returns the next two. For deep pagination (>10,000 documents), use search_after instead of from for better performance:

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    { "price": "asc" },
    { "_id": "asc" }
  ],
  "size": 2,
  "search_after": [29.99, "2"]
}

Use the last sort values from the previous response as the search_after parameter to fetch the next page.
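The paging loop itself is easy to get wrong, so here is an offline Python sketch of the search_after pattern, simulated over the five sample products rather than a live cluster. Each page carries forward the last hit's sort values instead of an offset:

```python
# The five sample products, keyed by price and _id as the sort tuple.
products = [
    {"_id": "2", "price": 29.99},
    {"_id": "4", "price": 45.50},
    {"_id": "5", "price": 89.99},
    {"_id": "1", "price": 199.99},
    {"_id": "3", "price": 249.99},
]
products.sort(key=lambda d: (d["price"], d["_id"]))

def page(after, size=2):
    # Return the next `size` hits strictly after the cursor's sort values,
    # mimicking what search_after does server-side.
    if after is None:
        return products[:size]
    return [d for d in products if (d["price"], d["_id"]) > after][:size]

cursor, pages = None, []
while True:
    hits = page(cursor)
    if not hits:
        break
    pages.append([d["_id"] for d in hits])
    last = hits[-1]
    cursor = (last["price"], last["_id"])  # feed the last sort values back in

print(pages)  # [['2', '4'], ['5', '1'], ['3']]
```

Because each request is anchored to the previous page's last sort values, memory cost stays flat no matter how deep you page.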

7. Highlighting Search Terms

Highlighting helps users see where their search terms matched in the results. Enable it with the highlight parameter:

GET /products/_search
{
  "query": {
    "match": {
      "description": "coffee maker"
    }
  },
  "highlight": {
    "fields": {
      "description": {}
    }
  }
}

The response includes a highlight section with <em> tags around matched terms, making it easy to render in UIs:

"highlight": {

"description": [

"Programmable drip coffee maker with thermal carafe"

]

}
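Conceptually, the highlighter wraps matched tokens in tags. A rough offline Python approximation (a simple regex, far cruder than the real highlighter, which works on analyzed tokens) shows the idea:

```python
import re

def highlight(text, query_terms, tag="em"):
    # Wrap each case-insensitive whole-word match in <em>...</em>,
    # roughly what the highlighter does for a match query.
    pattern = r"\b(" + "|".join(re.escape(t) for t in query_terms) + r")\b"
    return re.sub(pattern, rf"<{tag}>\1</{tag}>", text, flags=re.IGNORECASE)

snippet = highlight("Programmable drip coffee maker with thermal carafe",
                    ["coffee", "maker"])
print(snippet)
# Programmable drip <em>coffee</em> <em>maker</em> with thermal carafe
```

The real highlighter also handles stemming, fragmenting long fields, and custom pre/post tags, which this sketch ignores.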

8. Aggregations for Data Analysis

Aggregations allow you to group and summarize data, similar to SQL's GROUP BY. They're invaluable for dashboards and analytics.

Terms Aggregation: count documents by category:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      }
    }
  }
}

Because category is explicitly mapped as keyword, it is aggregatable directly; with dynamic mapping you would use the generated category.keyword sub-field instead.

Result:

"aggregations": {

"categories": {

"buckets": [

{ "key": "Electronics", "doc_count": 2 },

{ "key": "Clothing", "doc_count": 1 },

{ "key": "Home", "doc_count": 1 },

{ "key": "Sports", "doc_count": 1 }

]

}

}

Metrics Aggregation: calculate the average price:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "price"
      }
    }
  }
}

Combine aggregations for powerful insights:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}

This returns the average price per category, perfect for product performance analysis.
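To check your understanding of what the nested aggregation computes, the same grouping can be reproduced offline over the five sample documents in plain Python:

```python
from collections import defaultdict

# The five sample documents from step 2 (category and price only).
docs = [
    {"category": "Electronics", "price": 199.99},
    {"category": "Clothing", "price": 29.99},
    {"category": "Electronics", "price": 249.99},
    {"category": "Sports", "price": 45.50},
    {"category": "Home", "price": 89.99},
]

# Terms bucket per category, with an avg_price sub-aggregation per bucket.
buckets = defaultdict(list)
for d in docs:
    buckets[d["category"]].append(d["price"])

result = {
    cat: {"doc_count": len(prices),
          "avg_price": round(sum(prices) / len(prices), 2)}
    for cat, prices in buckets.items()
}
print(result["Electronics"])  # {'doc_count': 2, 'avg_price': 224.99}
```

Elasticsearch performs exactly this bucket-then-metric computation, but distributed across shards and over millions of documents.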

9. Using the Query DSL for Advanced Scenarios

Elasticsearch's Query DSL (Domain Specific Language) is a JSON-based language for constructing complex queries. Beyond basic match and term queries, explore:

  • Prefix Query: match terms starting with a given string: "prefix": { "name": "wire" }
  • Wildcard Query: use * and ? for pattern matching: "wildcard": { "name": "*head*" }
  • Regexp Query: use regular expressions: "regexp": { "name": ".*coffee.*" }
  • Fuzzy Query: handle typos: "fuzzy": { "name": "hedphones" }
  • Range Query: filter by numeric or date ranges: "range": { "price": { "gt": 50 } }
  • Exists Query: find documents where a field is present: "exists": { "field": "in_stock" }

Use these sparingly: wildcard and regexp queries can be slow on large datasets, especially with leading wildcards. Prefer prefix or term queries where possible.
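Fuzzy queries are built on Levenshtein edit distance; with "AUTO" fuzziness, Elasticsearch allows roughly one edit for terms of 3 to 5 characters and two edits for longer terms. A small Python sketch of the distance computation shows why "hedphones" matches "headphones":

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance:
    # minimum number of insertions, deletions, and substitutions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

print(edit_distance("hedphones", "headphones"))  # 1: one insertion fixes the typo
```

One edit is within the AUTO budget for a term this long, so the fuzzy query still finds the document; the prefix_length parameter then limits how much of this work the engine has to do.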

Best Practices

1. Use Keyword Fields for Filtering and Aggregations

Always map fields used for exact matching, sorting, or aggregations as keyword, not text. Text fields are analyzed and split into tokens, making them unsuitable for exact lookups. For example, a term query for "Electronics" against a text field fails because the analyzer indexed the token as "electronics". With dynamic mapping, use the auto-generated category.keyword sub-field instead.

2. Avoid Deep Pagination

Using from and size beyond 10,000 documents degrades performance and consumes memory. Use search_after for infinite scrolling or cursor-based pagination. It's stateless and scales efficiently.

3. Optimize Index Settings for Your Use Case

For write-heavy workloads, reduce replicas and increase refresh intervals. For search-heavy applications, add shards, but not too many; aim for roughly 10 to 50 GB per shard. Use index.codec to compress data and reduce disk usage.

4. Use Filters Instead of Queries When Possible

Filters are cached and faster. If you don't need relevance scoring (e.g., filtering by date or status), use filter context within a bool query.

5. Limit Returned Fields with Source Filtering

Use _source to return only necessary fields:

GET /products/_search
{
  "_source": ["name", "price", "category"],
  "query": {
    "match_all": {}
  }
}

This reduces network traffic and improves response times, especially with large documents.

6. Monitor Query Performance with Profile API

To debug slow queries, use the profile endpoint:

GET /products/_search
{
  "profile": true,
  "query": {
    "match": {
      "description": "coffee"
    }
  }
}

The response includes detailed timing for each phase of the query, which helps identify bottlenecks like expensive filters or large result sets.

7. Index Data with Proper Mapping Upfront

Always define mappings explicitly before indexing. Auto-detection can lead to incorrect types (e.g., treating numbers as strings). Use index templates to enforce consistent mappings across time-based indices.

8. Use Index Lifecycle Management (ILM)

For time-series data (logs, metrics), use ILM to automatically rollover indices, delete old data, and optimize storage. This prevents unbounded growth and ensures performance stability.

9. Secure Your Cluster

Enable X-Pack security features: authenticate users, assign roles, encrypt traffic with TLS, and restrict network access. Never expose Elasticsearch to the public internet without authentication.

10. Test Queries with Realistic Data Volumes

Performance characteristics change dramatically at scale. Use synthetic data generators or real production snapshots to test query latency, memory usage, and throughput before deploying to production.

Tools and Resources

1. Kibana

Kibana is the official visualization and exploration tool for Elasticsearch. It provides:

  • Dev Tools console for writing and testing queries
  • Discover tab for browsing raw documents
  • Visualize and Dashboard builders for aggregations
  • Monitoring and alerting features

Access Kibana at http://localhost:5601 after installing it alongside Elasticsearch.

2. Postman and cURL

Use Postman for GUI-based API testing or cURL for scripting and automation:

curl -X GET "localhost:9200/products/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": "headphones"
    }
  }
}'

3. Elasticsearch API Reference

Always refer to the official documentation: Elasticsearch Search API. It includes examples, parameters, and version-specific behavior.

4. Elasticsearch Query DSL Visualizer

Elastic's own Query DSL guide and third-party online visualizers help you build and understand complex queries without writing code.

5. Elasticsearch Plugins

Install plugins for extended functionality:

  • analysis-icu: enhanced Unicode and ICU-based text analysis
  • analysis-phonetic: Soundex and Metaphone encoders for fuzzy matching
  • custom stop word lists: configurable via the built-in stop token filter (no plugin required)

6. OpenSearch

For an open-source alternative, consider OpenSearch, a fork of Elasticsearch 7.10.2 with active community development. It supports nearly identical APIs and is used by AWS.

7. Learning Resources

  • Elasticsearch: The Definitive Guide (free online book by Elastic)
  • Elasticsearch in Action (book by Radu Gheorghe and others)
  • Elastic University: free courses on search, analytics, and administration
  • YouTube channels: Elastic, Logz.io, and DevOps with Elasticsearch

8. Monitoring Tools

Use Prometheus + Grafana, Datadog, or Elastic Observability to monitor cluster health, query latency, heap usage, and disk I/O. Set alerts for high CPU, low disk space, or slow search times.

Real Examples

Example 1: E-Commerce Product Search

Scenario: A retail website needs to search products by name, filter by category and price, sort by relevance and popularity, and show highlights.

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "wireless headphones",
            "fields": ["name^3", "description"],
            "type": "best_fields"
          }
        }
      ],
      "filter": [
        { "term": { "in_stock": true } },
        { "range": { "price": { "lte": 250 } } }
      ]
    }
  },
  "sort": [
    { "_score": { "order": "desc" } },
    { "popularity_score": { "order": "desc" } }
  ],
  "highlight": {
    "fields": {
      "name": {},
      "description": {}
    }
  },
  "_source": ["name", "price", "category", "in_stock"],
  "size": 10
}

Key features:

  • multi_match with field boosting (name^3) prioritizes matches in the product name
  • filter ensures only in-stock, affordable items appear
  • sort combines relevance and custom popularity score
  • highlight improves UX by marking search terms

Example 2: Log Analysis with Time-Based Filtering

Scenario: A DevOps team searches application logs for errors in the last 24 hours.

GET /logs-2024.06.15/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "error" } },
        { "match": { "service": "payment-service" } }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "now-24h",
              "lte": "now"
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "@timestamp": { "order": "desc" } }
  ],
  "size": 50,
  "_source": ["@timestamp", "service", "level", "message"]
}

Uses:

  • Time-based index pattern (logs-YYYY.MM.DD)
  • now-24h for dynamic time ranges
  • Sorted by timestamp for chronological review

Example 3: User Behavior Analytics

Scenario: A SaaS company analyzes user clickstream data to find top-used features.

GET /user_events/_search
{
  "size": 0,
  "aggs": {
    "feature_clicks": {
      "terms": {
        "field": "feature_name.keyword",
        "size": 10,
        "order": { "_count": "desc" }
      }
    },
    "avg_session_duration": {
      "avg": {
        "field": "session_duration_seconds"
      }
    }
  }
}

Result: the top 10 most-clicked features and the average session length, used to prioritize product improvements.

Example 4: Fuzzy Search for Typo Tolerance

Scenario: A search bar accepts misspelled queries like "iphon" instead of "iPhone".

GET /products/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "iphon",
        "fuzziness": "AUTO",
        "prefix_length": 1
      }
    }
  }
}

Uses fuzzy matching with prefix_length to limit the performance cost. Returns "iPhone" even with a typo.

FAQs

What is the difference between match and term queries in Elasticsearch?

The match query analyzes the input text and searches across analyzed fields (like text), making it ideal for full-text search. The term query looks for exact matches in non-analyzed fields (like keyword) and is used for filters, aggregations, and exact lookups.

Why is my Elasticsearch search slow?

Slow searches can be caused by: too many shards, large result sets, unoptimized mappings, deep pagination, complex nested queries, or insufficient hardware. Use the Profile API to identify bottlenecks and optimize accordingly.

Can I search across multiple indices at once?

Yes. You can search multiple indices by specifying them in the URL: GET /products,logs/_search or use wildcards: GET /logs-*/_search. You can also search all indices with GET /_search.

How do I handle case-insensitive searches?

For text fields, the standard analyzer already lowercases tokens, so match queries are case-insensitive by default. For exact matches on keyword fields, use a normalizer with a lowercase filter.

What is the maximum number of results Elasticsearch can return?

By default, Elasticsearch limits results to 10,000 for performance reasons. To increase this, adjust the index.max_result_window setting, but consider using search_after instead for better scalability.

How do I delete documents matching a search query?

Use the Delete By Query API:

POST /products/_delete_by_query
{
  "query": {
    "match": {
      "category": "Outdated"
    }
  }
}

Does Elasticsearch support SQL queries?

Yes, via the SQL API: GET /_sql?format=txt with a JSON body containing "query": "SELECT name FROM products WHERE price > 100". It translates SQL into native Elasticsearch queries.

How do I update a document in Elasticsearch?

Use the Update API:

POST /products/_update/1
{
  "doc": {
    "in_stock": false
  }
}

Or reindex the entire document using PUT /products/_doc/1.

Can I use Elasticsearch for real-time analytics?

Absolutely. With aggregations, ingest pipelines, and near real-time indexing (default refresh interval: 1 second), Elasticsearch is ideal for dashboards, monitoring, and live analytics.

What happens if my Elasticsearch cluster runs out of disk space?

Elasticsearch will block writes to prevent data loss. Monitor disk usage and configure ILM policies to auto-delete old indices or move them to colder storage tiers.

Conclusion

Searching data in Elasticsearch is both an art and a science. Mastering the Query DSL, understanding the difference between query and filter contexts, leveraging aggregations for analytics, and applying best practices for performance and scalability are critical skills for anyone working with modern search and data systems.

From e-commerce product discovery to log analysis and user behavior tracking, Elasticsearch powers some of the most demanding search applications in the world. By following the step-by-step guide, adopting the recommended best practices, using the right tools, and studying real-world examples, you can build fast, accurate, and scalable search experiences that deliver real business value.

Remember: Elasticsearch is not a replacement for relational databases; it's a complementary tool optimized for search and analytics. Use it where it shines: full-text search, fuzzy matching, real-time filtering, and large-scale aggregation. With the knowledge in this guide, you're equipped to harness its full potential and solve complex data retrieval challenges with confidence.