How Blockchain Indexers Work: Building Real-Time DeFi Data Pipelines
On-chain data is relentless. Every block adds thousands of events — swaps, deposits, liquidations, transfers — and if you're building a DeFi product, you need to query that data fast without hammering a node. That's what blockchain indexers solve.
The problem with raw RPC calls
Every Ethereum node exposes a JSON-RPC interface. You can call eth_getLogs, eth_call, or eth_getTransactionReceipt directly. But this approach falls apart quickly at scale. Fetching the full event history for a smart contract means paginating across potentially millions of blocks. Rate limits kick in. Latency spikes. And every new page load triggers the same expensive queries.
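To make the pagination cost concrete, here is a small sketch (all helper names are illustrative, not from any library) that builds the sequence of eth_getLogs JSON-RPC payloads a full-history fetch implies:

```typescript
// Build the JSON-RPC payloads needed to page eth_getLogs over a block range.
// PAGE_SIZE and buildLogPages are illustrative names, not a real library API.
const PAGE_SIZE = 2_000; // many providers cap the block range per request

interface GetLogsRequest {
  jsonrpc: "2.0";
  id: number;
  method: "eth_getLogs";
  params: [{ address: string; fromBlock: string; toBlock: string }];
}

function buildLogPages(
  address: string,
  fromBlock: number,
  toBlock: number
): GetLogsRequest[] {
  const pages: GetLogsRequest[] = [];
  for (let start = fromBlock; start <= toBlock; start += PAGE_SIZE) {
    const end = Math.min(start + PAGE_SIZE - 1, toBlock);
    pages.push({
      jsonrpc: "2.0",
      id: pages.length + 1,
      method: "eth_getLogs",
      params: [
        {
          address,
          fromBlock: "0x" + start.toString(16),
          toBlock: "0x" + end.toString(16),
        },
      ],
    });
  }
  return pages;
}

// A contract deployed at block 10M, queried at block 19M:
// ~4,500 requests just to read its full history once.
```

Every user who triggers this path repeats the same work, which is exactly why the answer has to be a persistent index rather than a cache.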
The naive solution — cache it in Redis — only delays the problem. What you actually need is a persistent, indexed copy of the blockchain state relevant to your application. That's an indexer.
What an indexer actually does
At its core, an indexer is a process that listens to a blockchain node in real time, decodes raw event logs into structured records, and writes them to a database your application can query normally.
The pipeline has three stages:
1. Ingestion — subscribe to new blocks via WebSocket or poll via HTTP. For each block, fetch logs matching your contract addresses and event signatures.
2. Decoding — raw logs are ABI-encoded hex. Use the contract ABI to decode them into typed structs: a Transfer(address,address,uint256) event becomes a proper object with from, to, and amount fields.
3. Storage — write the decoded events to PostgreSQL (or MongoDB if you prefer schema flexibility). Index on block number, contract address, and relevant fields.
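The decoding stage is worth seeing by hand. In production you would hand the ABI to a decoder (for example, ethers' Interface.parseLog), but for a simple event like an ERC-20 Transfer the mechanics are just slicing hex, as this sketch shows:

```typescript
// Decoding an ERC-20 Transfer log by hand. Indexed parameters land in
// topics (left-padded to 32 bytes); non-indexed parameters land in data.
interface RawLog {
  topics: string[]; // topics[0] = event signature hash, then indexed params
  data: string;     // ABI-encoded non-indexed params
}

interface TransferEvent {
  from: string;
  to: string;
  amount: bigint;
}

// keccak256("Transfer(address,address,uint256)")
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

function decodeTransfer(log: RawLog): TransferEvent | null {
  if (log.topics[0] !== TRANSFER_TOPIC) return null;
  return {
    // An address is the last 20 bytes (40 hex chars) of the padded topic.
    from: "0x" + log.topics[1].slice(26),
    to: "0x" + log.topics[2].slice(26),
    amount: BigInt(log.data), // a single uint256 in the data field
  };
}
```

Returning null for non-matching topics lets the same function run over an unfiltered log stream.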
Building the Curvance indexer
When I built the indexer for Curvance — a DeFi lending/borrowing protocol — the requirement was simple: keep a fresh, queryable snapshot of all user positions, market states, and liquidation events across multiple chains.
The stack was Node.js + Ethers.js + PostgreSQL. Here's the rough architecture that worked:
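The loop tying the three stages together can be sketched like this (the names are illustrative, and the RPC and database clients are injected so the loop stays testable):

```typescript
// Core indexer loop: resume from the cursor, fetch logs in batches,
// decode, upsert, then advance the cursor. All names are illustrative.
interface Deps {
  getHeadBlock(): Promise<number>;
  getLogs(fromBlock: number, toBlock: number): Promise<unknown[]>;
  decode(log: unknown): object | null;
  upsertEvents(events: object[]): Promise<void>;
  readCursor(): Promise<number>;          // last indexed block
  writeCursor(block: number): Promise<void>;
}

const BATCH = 500;

async function runOnce(deps: Deps): Promise<number> {
  const cursor = await deps.readCursor();
  const head = await deps.getHeadBlock();
  let from = cursor + 1;
  while (from <= head) {
    const to = Math.min(from + BATCH - 1, head);
    const logs = await deps.getLogs(from, to);
    const events = logs
      .map((l) => deps.decode(l))
      .filter((e): e is object => e !== null);
    await deps.upsertEvents(events);
    await deps.writeCursor(to); // crash-safe: a restart resumes from here
    from = to + 1;
  }
  return head;
}
```

Run runOnce on a timer or on each new-head WebSocket notification; because the cursor only advances after a successful write, the worst case after a crash is re-indexing one batch, which the upsert makes harmless.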
The trickiest part wasn't the happy path; it was handling chain reorgs. When a block is reorganized out of the canonical chain, any events you indexed from it must be rolled back. The solution: never delete indexed events. Instead, mark each one with a status column and serve only confirmed events, meaning those from blocks at least 12 confirmations deep.
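The confirmation rule reduces to a couple of pure functions (a sketch using the 12-confirmation depth and hypothetical names):

```typescript
// Reorg safety: events are written as "pending" and promoted to "confirmed"
// only once buried deep enough. The app queries confirmed rows only.
const CONFIRMATIONS = 12;

type Status = "pending" | "confirmed";

function statusFor(eventBlock: number, headBlock: number): Status {
  return headBlock - eventBlock >= CONFIRMATIONS ? "confirmed" : "pending";
}

// On each new head, find pending events that are now safe to promote.
// Events from reorged-out blocks simply never get promoted; a cleanup
// pass can mark them orphaned once their block hash no longer matches.
function promotable(pendingBlocks: number[], headBlock: number): number[] {
  return pendingBlocks.filter((b) => headBlock - b >= CONFIRMATIONS);
}
```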
Performance tricks that actually mattered
- Batch eth_getLogs calls across block ranges (500 blocks per request) instead of one block at a time.
- Use ON CONFLICT DO UPDATE (upsert) instead of checking existence before insert — cuts write latency in half.
- Partition the events table by month, so queries scoped to recent data only scan one partition.
- Maintain a cursor table that records the last indexed block. On restart, resume from there instead of re-indexing everything.
- For multi-chain setups, run one indexer worker per chain in separate processes to avoid head-of-line blocking.
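The upsert and cursor tricks above boil down to two statements. This is a sketch against a hypothetical schema, not the actual Curvance tables; the conflict key (tx_hash, log_index) works because that pair uniquely identifies a log on a chain:

```typescript
// Upsert write path, sketched as parameterized SQL for node-postgres-style
// $1..$n placeholders. Table and column names are hypothetical.
const UPSERT_EVENT = `
  INSERT INTO events (tx_hash, log_index, block_number, contract, name, payload, status)
  VALUES ($1, $2, $3, $4, $5, $6, 'pending')
  ON CONFLICT (tx_hash, log_index)
  DO UPDATE SET block_number = EXCLUDED.block_number,
                payload      = EXCLUDED.payload,
                status       = EXCLUDED.status;
`;

// One cursor row per chain keeps the multi-chain workers independent.
const ADVANCE_CURSOR = `
  INSERT INTO cursors (chain_id, last_block)
  VALUES ($1, $2)
  ON CONFLICT (chain_id) DO UPDATE SET last_block = EXCLUDED.last_block;
`;
```

Re-indexing a batch after a crash then just rewrites identical rows instead of raising duplicate-key errors.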
When to use an off-the-shelf indexer
Building your own indexer makes sense when you have custom business logic baked into the transformation step — for example, computing a user's net position across multiple contract events. For simpler cases, tools like The Graph, Ponder, or Envio will get you there faster with less infrastructure to maintain.
The rule of thumb: if you can define your data needs as a subgraph schema, use The Graph. If you need joins across off-chain data, custom aggregations, or real-time mutation callbacks — roll your own.
The source for the Curvance indexer is not public, but the architecture described here is production-tested. If you're building something similar and want to talk through the design, reach out.