
Internal Working of Elasticsearch: From Log Ingestion to Storage
When it comes to searching and analyzing large volumes of logs, few technologies are as widely adopted as Elasticsearch. Whether you’re running it standalone, as part of the ELK stack (Elasticsearch, Logstash, Kibana), or with Beats and Fluentd, Elasticsearch has become the backbone of log analytics, observability, and search at scale.
But while many engineers know how to query logs in Kibana or ship logs with Logstash/Fluentd, far fewer understand what’s happening under the hood: how logs are ingested, transformed, indexed, and stored inside Elasticsearch.
In this post, I’ll break down the internal architecture of Elasticsearch ingestion and storage, covering:
- How documents are ingested.
- The role of analyzers and inverted indexes.
- How data is written to shards, segments, and Lucene.
- How Elasticsearch balances durability with speed using transaction logs and segment merges.
- ASCII flow diagrams for clarity.
Why SREs Should Care About Elasticsearch Internals
Understanding Elasticsearch’s internals is critical when you’re responsible for its reliability, performance, and scale:
- Explains why disk I/O spikes during heavy indexing.
- Helps tune refresh intervals, replicas, and segment merging.
- Makes sense of cluster sizing and shard allocation decisions.
- Crucial for troubleshooting slow queries or indexing backlogs.
- Lets you set the right expectations for retention, durability, and fault tolerance.
High-Level Elasticsearch Architecture
At a high level, Elasticsearch works like this:
- Clients (Logstash, Beats, Fluentd, Apps) send JSON documents to Elasticsearch.
- Elasticsearch nodes accept the requests and route them to the appropriate shard.
- Logs are processed through analyzers (tokenizers, filters) to build inverted indexes.
- Data is written to in-memory structures and persisted to disk.
- Periodically, data is flushed into Lucene segments and merged for efficiency.
- Queries search across inverted indexes spread across primary and replica shards.
Log Ingestion Flow
Let’s say your application logs a request like this:
{
"timestamp": "2025-08-20T13:00:00Z",
"level": "ERROR",
"message": "Database connection failed for user admin"
}
When this document is sent to Elasticsearch, here’s the journey it takes:
- Routing to the right shard (based on _id or custom routing).
- Parsing the JSON into an internal document format.
- Analyzing text fields (like message) using analyzers.
- Writing to the transaction log (translog) for durability.
- Storing in an in-memory buffer for fast indexing.
- Flushing to segment files on disk (Lucene).
- Merging segments over time to optimize queries.
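To make this concrete, here is a minimal sketch of sending that document with the official Python client (elasticsearch-py, 8.x-style API). The endpoint and the index name app-logs are assumptions for illustration, not part of any particular setup:

from elasticsearch import Elasticsearch

# Assumed local cluster; adjust for your environment.
es = Elasticsearch("http://localhost:9200")

doc = {
    "timestamp": "2025-08-20T13:00:00Z",
    "level": "ERROR",
    "message": "Database connection failed for user admin",
}

# The routing step happens server-side: the target shard is chosen
# roughly as shard = hash(_routing) % number_of_primary_shards,
# where _routing defaults to the document's _id.
resp = es.index(index="app-logs", document=doc)
print(resp["result"], resp["_shards"])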
Transaction Log (Translog) — The WAL of Elasticsearch
Just like Prometheus uses a Write-Ahead Log (WAL), Elasticsearch uses a Transaction Log (translog).
- Every document written is first appended to the translog.
- This ensures durability: if a node crashes, Elasticsearch replays the translog to recover operations that were never flushed to segments.
- Each shard keeps its own translog on disk, for example under:
  <data-dir>/nodes/0/indices/<index-uuid>/<shard>/translog/
  (the exact layout varies by version; newer releases drop the nodes/0 level)
This makes Elasticsearch resilient to crashes, but it also means heavy indexing generates a lot of disk writes.
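For write-heavy logging clusters, the translog fsync behavior is tunable. A hedged sketch against the same hypothetical app-logs index:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# "request" (the default) fsyncs the translog before acknowledging each
# write; "async" fsyncs on an interval instead (index.translog.sync_interval,
# default 5s), trading a small window of acknowledged-but-unsynced writes
# for higher indexing throughput.
es.indices.put_settings(
    index="app-logs",
    settings={"index.translog.durability": "async"},
)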
In-Memory Buffers and Refresh
After being written to the translog, documents go into an in-memory buffer.
- This buffer is periodically "refreshed" (default: 1s).
- Refresh creates a new Lucene segment and makes documents searchable almost instantly.
- But this comes at a cost: frequent refreshes = high I/O.
That's why tuning index.refresh_interval is a common optimization for log-heavy systems.
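For example (again a sketch, with app-logs as a stand-in index name), stretching the refresh interval to 30s:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Fewer refreshes => fewer tiny segments and less I/O, at the cost of
# documents taking up to 30s to become searchable.
es.indices.put_settings(
    index="app-logs",
    settings={"index.refresh_interval": "30s"},  # "-1" disables refresh entirely
)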
Lucene Segments (Immutable Data Files)
At the heart of Elasticsearch lies Apache Lucene.
- Every document is stored inside segments.
- Segments are immutable files on disk.
- Each segment contains an inverted index (mapping terms → documents).
Example for the message field:
"Database connection failed for user admin"
After analysis (tokenization, lowercasing, stopword removal), this might become:
["database", "connection", "failed", "user", "admin"]
These terms go into the inverted index:
database → [doc 1]
connection → [doc 1]
failed → [doc 1]
user → [doc 1]
admin → [doc 1]
This is what makes Elasticsearch queries so fast — it doesn’t scan raw logs, it searches inverted indexes.
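You can watch this analysis step happen yourself with the _analyze API. A small sketch; note that the default standard analyzer tokenizes and lowercases but does not drop stopwords like "for" unless a stop filter is configured:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

resp = es.indices.analyze(
    analyzer="standard",
    text="Database connection failed for user admin",
)
print([t["token"] for t in resp["tokens"]])
# -> ['database', 'connection', 'failed', 'for', 'user', 'admin']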
Segment Merging
Because refresh happens frequently, Elasticsearch creates many small segments.
- Small segments = inefficient queries.
- Elasticsearch runs background merge processes to combine small segments into larger ones.
- During merges, deleted documents are purged and storage is optimized.
This explains why you see CPU and disk I/O spikes during indexing-heavy workloads.
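One practical consequence: once a time-based index stops receiving writes (say, yesterday's daily index), you can merge it down explicitly. A hedged sketch; the index name is illustrative, and force merge should only be run against read-only indices:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Collapse the index to a single segment; expect a CPU/disk spike while
# the merge runs, then faster queries and space reclaimed from deletes.
es.indices.forcemerge(index="app-logs-2025.08.19", max_num_segments=1)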
Shards and Replicas
- Each Elasticsearch index is split into shards.
- Shards are just Lucene indexes behind the scenes.
- Primary shards handle writes.
- Replica shards ensure redundancy and distribute queries.
For example:
- An index with 3 primary shards and 1 replica = 6 shards total.
- Each shard contains its own set of segments, translogs, and inverted indexes.
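Shard counts are fixed when the index is created (replica counts can be changed later). A sketch of the 3-primary / 1-replica example above:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

es.indices.create(
    index="app-logs",
    settings={
        "index.number_of_shards": 3,    # primaries; immutable after creation
        "index.number_of_replicas": 1,  # one copy per primary => 6 shards total
    },
)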
End-to-End Log Ingestion Flow
Log Producer (App / Logstash / Beats / Fluentd)
           |
           v
+----------------------+
|  Elasticsearch Node  |
+----------------------+
           |
           v
+----------------------+
|   Routing to Shard   |
+----------------------+
           |
           v
+----------------------+
|   Transaction Log    |  (translog for durability)
+----------------------+
           |
           v
+----------------------+
|   In-Memory Buffer   |  (documents waiting to be flushed)
+----------------------+
           |
           v
+----------------------+
|    Lucene Segment    |  (immutable, searchable)
+----------------------+
           |
           v
+----------------------+
|    Segment Merge     |  (optimize storage, remove deletes)
+----------------------+
           |
           v
+----------------------+
|   Searchable Index   |
+----------------------+
Data Path Inside a Shard
Incoming Document
           |
           v
+----------------------+
|       Analyzer       |  (tokenizer + filters)
+----------------------+
           |
           v
+----------------------+
|        Tokens        |  (terms like "database", "failed")
+----------------------+
           |
           v
+----------------------+
|    Inverted Index    |  (term → doc mapping)
+----------------------+
           |
           v
+----------------------+
| Segment File (Disk)  |
+----------------------+
Query Execution Path
             Search Query
                  |
                  v
       +----------------------+
       |   Coordinator Node   |
       +----------------------+
            |              |
            v              v
      +-----------+  +-----------+
      |  Primary  |  |  Replica  |
      |   Shard   |  |   Shard   |
      +-----------+  +-----------+
            |              |
            v              v
    Partial Results  Partial Results
             \            /
              \          /
               v        v
       +----------------------+
       | Merge & Rank Results |
       +----------------------+
                  |
                  v
        Final Results to Client
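In application code, this whole fan-out-and-merge path is triggered by a single search call. A sketch against the hypothetical app-logs index:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# The coordinator fans this out to one copy of every shard, then merges
# and ranks the partial results before returning them.
resp = es.search(
    index="app-logs",
    query={"match": {"message": "database connection failed"}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["message"])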
Why This Matters for SREs
- Durability vs Performance Tradeoff
  - Translog ensures durability but increases disk writes.
  - Tuning flush and sync intervals is key.
- Refresh Intervals
  - The default 1s refresh is great for real-time search.
  - For pure log storage, increasing the refresh interval improves indexing throughput.
- Segment Merging Overhead
  - Merges consume CPU/disk.
  - On log-heavy clusters, plan for this overhead.
- Shard Sizing
  - Too many small shards = overhead.
  - Too few large shards = hotspots.
  - Rule of thumb: 20–50 GB per shard for logs.
- Retention & Deletion
  - Deleting old indices is cheaper than deleting documents.
  - Use ILM (Index Lifecycle Management) for automation (see the sketch after this list).
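As referenced in the retention item above, here's a hedged sketch of an ILM policy; the policy name, rollover thresholds, and retention window are all illustrative:

from elasticsearch import Elasticsearch
es = Elasticsearch("http://localhost:9200")  # assumed local cluster

# Roll the write index over daily (or at ~50GB per primary shard), then
# delete indices 30 days after rollover.
es.ilm.put_lifecycle(
    name="logs-30d",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_age": "1d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    },
)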
Analogy — Elasticsearch as a Library
Think of Elasticsearch log ingestion as running a huge library:
- Translog → rough notes written down before putting books on shelves.
- In-Memory Buffer → a cart holding new arrivals waiting to be shelved.
- Lucene Segments → shelves of books (immutable, well-organized).
- Segment Merging → reorganizing shelves to save space and remove old junk.
- Shards & Replicas → multiple wings of the library with duplicate collections for reliability.
So, how is Elasticsearch evolving?
Elasticsearch is evolving with:
- Frozen tiers → store old logs cheaply on object storage.
- Searchable snapshots → query directly from S3/Blob without restoring.
- ILM policies → automate log aging (hot → warm → cold → frozen).
But the fundamental model of translog → memory → segments → merged indexes remains core.
So, to Conclude:
Elasticsearch may look like “just a search engine,” but its internals are closer to a distributed database + search system hybrid.
Understanding the flow from log ingestion to Lucene segments helps SREs and DevOps engineers:
- Tune performance.
- Control disk usage.
- Debug indexing slowdowns.
- Build reliable observability stacks with EFK/ELK.
So next time your cluster spikes CPU during indexing, or your queries slow down, you’ll know exactly what’s happening under the hood.
<Note: I've used ChatGPT for a few things while researching and restructuring this blog!>