
Internal Working of Elasticsearch: From Log Ingestion to Storage
When it comes to searching and analyzing large volumes of logs, few technologies are as widely adopted as Elasticsearch. Whether you’re running it standalone, as part of the ELK stack (Elasticsearch, Logstash, Kibana), or with Beats and Fluentd, Elasticsearch has become the backbone of log analytics, observability, and search at scale.
But while many engineers know how to query logs in Kibana or ship logs with Logstash/Fluentd, far fewer understand what’s happening under the hood: how logs are ingested, transformed, indexed, and stored inside Elasticsearch.
In this post, I’ll break down the internal architecture of Elasticsearch ingestion and storage, covering:
- How documents are ingested.
- The role of analyzers and inverted indexes.
- How data is written to shards, segments, and Lucene.
- How Elasticsearch balances durability with speed using transaction logs and segment merges.
- ASCII flow diagrams for clarity.
Why SREs Should Care About Elasticsearch Internals
Understanding Elasticsearch’s internals is critical when you’re responsible for its reliability, performance, and scale:
- Explains why disk I/O spikes during heavy indexing.
- Helps tune refresh intervals, replicas, and segment merging.
- Makes sense of cluster sizing and shard allocation decisions.
- Crucial for troubleshooting slow queries or indexing backlogs.
- Lets you set the right expectations for retention, durability, and fault tolerance.
High-Level Elasticsearch Architecture
At a high level, Elasticsearch works like this:
- Clients (Logstash, Beats, Fluentd, Apps) send JSON documents to Elasticsearch.
- Elasticsearch nodes accept the requests and route them to the appropriate shard.
- Logs are processed through analyzers (tokenizers, filters) to build inverted indexes.
- Data is written to in-memory structures and persisted to disk.
- Periodically, data is flushed into Lucene segments and merged for efficiency.
- Queries search across inverted indexes spread across primary and replica shards.
Log Ingestion Flow
Let’s say your application logs a request like this:
{
"timestamp": "2025-08-20T13:00:00Z",
"level": "ERROR",
"message": "Database connection failed for user admin"
}
When this document is sent to Elasticsearch, here’s the journey it takes:
- Routing to the right shard (based on _id or custom routing).
- Parsing the JSON into an internal document format.
- Analyzing text fields (like message) using analyzers.
- Writing to the transaction log (translog) for durability.
- Storing in an in-memory buffer for fast indexing.
- Flushing to segment files on disk (Lucene).
- Merging segments over time to optimize queries.
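To make this concrete, here is a minimal sketch of sending that document with the official Python client (elasticsearch-py, 8.x-style API). The endpoint and the index name app-logs are assumptions for illustration, not part of any particular setup:

from elasticsearch import Elasticsearch

# Assumed local cluster; adjust for your environment.
es = Elasticsearch("http://localhost:9200")

doc = {
    "timestamp": "2025-08-20T13:00:00Z",
    "level": "ERROR",
    "message": "Database connection failed for user admin",
}

# The routing step happens server-side: the target shard is chosen
# roughly as shard = hash(_routing) % number_of_primary_shards,
# where _routing defaults to the document's _id.
resp = es.index(index="app-logs", document=doc)
print(resp["result"], resp["_shards"])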
Transaction Log (Translog) — The WAL of Elasticsearch
Just like Prometheus uses a Write-Ahead Log (WAL), Elasticsearch uses a Transaction Log (translog).
- Every document written is first appended to the translog.
- This ensures durability: if a node crashes, Elasticsearch replays the translog to recover operations that were never flushed to segments.
- Each shard keeps its own translog on disk, for example under:
  <data-dir>/nodes/0/indices/<index-uuid>/<shard>/translog/
  (the exact layout varies by version; newer releases drop the nodes/0 level)
This makes Elasticsearch resilient to crashes, but it also means heavy indexing generates a lot of disk writes.
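For write-heavy logging clusters, the translog fsync behavior is tunable. A hedged sketch against the same hypothetical app-logs index:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# "request" (the default) fsyncs the translog before acknowledging each
# write; "async" fsyncs on an interval instead (index.translog.sync_interval,
# default 5s), trading a small window of acknowledged-but-unsynced writes
# for higher indexing throughput.
es.indices.put_settings(
    index="app-logs",
    settings={"index.translog.durability": "async"},
)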
In-Memory Buffers and Refresh
After being written to the translog, documents go into an in-memory buffer.
- This buffer is periodically "refreshed" (default: 1s).
- Refresh creates a new Lucene segment and makes documents searchable almost instantly.
- But this comes at a cost: frequent refreshes = high I/O.
That's why tuning index.refresh_interval is a common optimization for log-heavy systems.
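For example (again a sketch, with app-logs as a stand-in index name), stretching the refresh interval to 30s:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Fewer refreshes => fewer tiny segments and less I/O, at the cost of
# documents taking up to 30s to become searchable.
es.indices.put_settings(
    index="app-logs",
    settings={"index.refresh_interval": "30s"},  # "-1" disables refresh entirely
)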
Lucene Segments (Immutable Data Files)
At the heart of Elasticsearch lies Apache Lucene.
- Every document is stored inside segments.
- Segments are immutable files on disk.
- Each segment contains an inverted index (mapping terms → documents).
Example for the message field:
"Database connection failed for user admin"
After analysis (tokenization, lowercasing, stopword removal), this might become:
["database", "connection", "failed", "user", "admin"]
These terms go into the inverted index:
database → [doc 1]
connection → [doc 1]
failed → [doc 1]
user → [doc 1]
admin → [doc 1]
This is what makes Elasticsearch queries so fast — it doesn’t scan raw logs, it searches inverted indexes.
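You can watch this analysis step happen yourself with the _analyze API. A small sketch; note that the default standard analyzer tokenizes and lowercases but does not drop stopwords like "for" unless a stop filter is configured:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

resp = es.indices.analyze(
    analyzer="standard",
    text="Database connection failed for user admin",
)
print([t["token"] for t in resp["tokens"]])
# -> ['database', 'connection', 'failed', 'for', 'user', 'admin']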
Segment Merging
Because refresh happens frequently, Elasticsearch creates many small segments.
- Small segments = inefficient queries.
- Elasticsearch runs background merge processes to combine small segments into larger ones.
- During merges, deleted documents are purged and storage is optimized.
This explains why you see CPU and disk I/O spikes during indexing-heavy workloads.
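One practical consequence: once a time-based index stops receiving writes (say, yesterday's daily index), you can merge it down explicitly. A hedged sketch; the index name is illustrative, and force merge should only be run against read-only indices:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Collapse the index to a single segment; expect a CPU/disk spike while
# the merge runs, then faster queries and space reclaimed from deletes.
es.indices.forcemerge(index="app-logs-2025.08.19", max_num_segments=1)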
Shards and Replicas
- Each Elasticsearch index is split into shards.
- Shards are just Lucene indexes behind the scenes.
- Primary shards handle writes.
- Replica shards ensure redundancy and distribute queries.
For example:
- An index with 3 primary shards and 1 replica = 6 shards total.
- Each shard contains its own set of segments, translogs, and inverted indexes.
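Shard counts are fixed when the index is created (replica counts can be changed later). A sketch of the 3-primary / 1-replica example above:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

es.indices.create(
    index="app-logs",
    settings={
        "index.number_of_shards": 3,    # primaries; immutable after creation
        "index.number_of_replicas": 1,  # one copy per primary => 6 shards total
    },
)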
End-to-End Log Ingestion Flow
Log Producer (App / Logstash / Beats / Fluentd)
           |
           v
+----------------------+
|  Elasticsearch Node  |
+----------------------+
           |
           v
+----------------------+
|   Routing to Shard   |
+----------------------+
           |
           v
+----------------------+
|   Transaction Log    |  (translog for durability)
+----------------------+
           |
           v
+----------------------+
|   In-Memory Buffer   |  (documents waiting to be flushed)
+----------------------+
           |
           v
+----------------------+
|    Lucene Segment    |  (immutable, searchable)
+----------------------+
           |
           v
+----------------------+
|    Segment Merge     |  (optimize storage, remove deletes)
+----------------------+
           |
           v
+----------------------+
|   Searchable Index   |
+----------------------+
Data Path Inside a Shard
Incoming Document
           |
           v
+----------------------+
|       Analyzer       |  (tokenizer + filters)
+----------------------+
           |
           v
+----------------------+
|        Tokens        |  (terms like "database", "failed")
+----------------------+
           |
           v
+----------------------+
|    Inverted Index    |  (term → doc mapping)
+----------------------+
           |
           v
+----------------------+
| Segment File (Disk)  |
+----------------------+
Query Execution Path
             Search Query
                  |
                  v
       +----------------------+
       |   Coordinator Node   |
       +----------------------+
            |              |
            v              v
      +-----------+  +-----------+
      |  Primary  |  |  Replica  |
      |   Shard   |  |   Shard   |
      +-----------+  +-----------+
            |              |
            v              v
    Partial Results  Partial Results
             \            /
              \          /
               v        v
       +----------------------+
       | Merge & Rank Results |
       +----------------------+
                  |
                  v
        Final Results to Client
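In application code, this whole fan-out-and-merge path is triggered by a single search call. A sketch against the hypothetical app-logs index:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# The coordinator fans this out to one copy of every shard, then merges
# and ranks the partial results before returning them.
resp = es.search(
    index="app-logs",
    query={"match": {"message": "database connection failed"}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["message"])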
Why This Matters for SREs
- Durability vs Performance Tradeoff
  - Translog ensures durability but increases disk writes.
  - Tuning flush and sync intervals is key.
- Refresh Intervals
  - The default 1s refresh is great for real-time search.
  - For pure log storage, increasing the refresh interval improves indexing throughput.
- Segment Merging Overhead
  - Merges consume CPU/disk.
  - On log-heavy clusters, plan for this overhead.
- Shard Sizing
  - Too many small shards = overhead.
  - Too few large shards = hotspots.
  - Rule of thumb: 20–50 GB per shard for logs.
- Retention & Deletion
  - Deleting old indices is cheaper than deleting documents.
  - Use ILM (Index Lifecycle Management) for automation (see the sketch after this list).
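As referenced in the retention item above, here's a hedged sketch of an ILM policy; the policy name, rollover thresholds, and retention window are all illustrative:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Roll the write index over daily (or at ~50GB per primary shard), then
# delete indices 30 days after rollover.
es.ilm.put_lifecycle(
    name="logs-30d",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "1d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    },
)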
Analogy — Elasticsearch as a Library
Think of Elasticsearch log ingestion as running a huge library:
- Translog → rough notes written down before putting books on shelves.
- In-Memory Buffer → a cart holding new arrivals waiting to be shelved.
- Lucene Segments → shelves of books (immutable, well-organized).
- Segment Merging → reorganizing shelves to save space and remove old junk.
- Shards & Replicas → multiple wings of the library with duplicate collections for reliability.
So, how is Elasticsearch evolving?
Elasticsearch is evolving with:
- Frozen tiers → store old logs cheaply on object storage.
- Searchable snapshots → query directly from S3/Blob without restoring.
- ILM policies → automate log aging (hot → warm → cold → frozen).
But the fundamental model of translog → memory → segments → merged indexes remains core.
So, to Conclude:
Elasticsearch may look like “just a search engine,” but its internals are closer to a distributed database + search system hybrid.
Understanding the flow from log ingestion to Lucene segments helps SREs and DevOps engineers:
- Tune performance.
- Control disk usage.
- Debug indexing slowdowns.
- Build reliable observability stacks with EFK/ELK.
So next time your cluster spikes CPU during indexing, or your queries slow down, you’ll know exactly what’s happening under the hood.
<Note: I've used ChatGPT for a few things while researching and restructuring this blog!>