How Blockchain Indexers Work: Building Real-Time DeFi Data Pipelines
On-chain data is relentless. Every block adds thousands of events — swaps, deposits, liquidations, transfers — and if you're building a DeFi product, you need to query that data fast without hammering a node. That's what blockchain indexers solve.
The problem with raw RPC calls
Every Ethereum node exposes a JSON-RPC interface. You can call eth_getLogs, eth_call, or eth_getTransactionReceipt directly. But this approach falls apart quickly at scale. Fetching the full event history for a smart contract means paginating across potentially millions of blocks. Rate limits kick in. Latency spikes. And every new page load triggers the same expensive queries.
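To make the pagination cost concrete, here is a small sketch (all helper names are illustrative, not from any library) that builds the sequence of eth_getLogs JSON-RPC payloads a full-history fetch implies:

```typescript
// Build the JSON-RPC payloads needed to page eth_getLogs over a block range.
// PAGE_SIZE and buildLogPages are illustrative names, not a real library API.
const PAGE_SIZE = 2_000; // many providers cap the block range per request

interface GetLogsRequest {
  jsonrpc: "2.0";
  id: number;
  method: "eth_getLogs";
  params: [{ address: string; fromBlock: string; toBlock: string }];
}

function buildLogPages(
  address: string,
  fromBlock: number,
  toBlock: number
): GetLogsRequest[] {
  const pages: GetLogsRequest[] = [];
  for (let start = fromBlock; start <= toBlock; start += PAGE_SIZE) {
    const end = Math.min(start + PAGE_SIZE - 1, toBlock);
    pages.push({
      jsonrpc: "2.0",
      id: pages.length + 1,
      method: "eth_getLogs",
      params: [
        {
          address,
          fromBlock: "0x" + start.toString(16),
          toBlock: "0x" + end.toString(16),
        },
      ],
    });
  }
  return pages;
}

// A contract deployed at block 10M, queried at block 19M:
// ~4,500 requests just to read its full history once.
```

Every user who triggers this path repeats the same work, which is exactly why the answer has to be a persistent index rather than a cache.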
The naive solution — cache it in Redis — only delays the problem. What you actually need is a persistent, indexed copy of the blockchain state relevant to your application. That's an indexer.
What an indexer actually does
At its core, an indexer is a process that listens to a blockchain node in real time, decodes raw event logs into structured records, and writes them to a database your application can query normally.
The pipeline has three stages:
1. Ingestion — subscribe to new blocks via WebSocket or poll via HTTP. For each block, fetch logs matching your contract addresses and event signatures.
2. Decoding — raw logs are ABI-encoded hex. Use the contract ABI to decode them into typed structs: a Transfer(address,address,uint256) event becomes a proper object with from, to, and amount fields.
3. Storage — write the decoded events to PostgreSQL (or MongoDB if you prefer schema flexibility). Index on block number, contract address, and relevant fields.
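The decoding stage is worth seeing by hand. In production you would hand the ABI to a decoder (for example, ethers' Interface.parseLog), but for a simple event like an ERC-20 Transfer the mechanics are just slicing hex, as this sketch shows:

```typescript
// Decoding an ERC-20 Transfer log by hand. Indexed parameters land in
// topics (left-padded to 32 bytes); non-indexed parameters land in data.
interface RawLog {
  topics: string[]; // topics[0] = event signature hash, then indexed params
  data: string;     // ABI-encoded non-indexed params
}

interface TransferEvent {
  from: string;
  to: string;
  amount: bigint;
}

// keccak256("Transfer(address,address,uint256)")
const TRANSFER_TOPIC =
  "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef";

function decodeTransfer(log: RawLog): TransferEvent | null {
  if (log.topics[0] !== TRANSFER_TOPIC) return null;
  return {
    // An address is the last 20 bytes (40 hex chars) of the padded topic.
    from: "0x" + log.topics[1].slice(26),
    to: "0x" + log.topics[2].slice(26),
    amount: BigInt(log.data), // a single uint256 in the data field
  };
}
```

Returning null for non-matching topics lets the same function run over an unfiltered log stream.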
Building the Curvance indexer
When I built the indexer for Curvance — a DeFi lending/borrowing protocol — the requirement was simple: keep a fresh, queryable snapshot of all user positions, market states, and liquidation events across multiple chains.
The stack was Node.js + Ethers.js + PostgreSQL. Here's the rough architecture that worked:
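The loop tying the three stages together can be sketched like this (the names are illustrative, and the RPC and database clients are injected so the loop stays testable):

```typescript
// Core indexer loop: resume from the cursor, fetch logs in batches,
// decode, upsert, then advance the cursor. All names are illustrative.
interface Deps {
  getHeadBlock(): Promise<number>;
  getLogs(fromBlock: number, toBlock: number): Promise<unknown[]>;
  decode(log: unknown): object | null;
  upsertEvents(events: object[]): Promise<void>;
  readCursor(): Promise<number>;          // last indexed block
  writeCursor(block: number): Promise<void>;
}

const BATCH = 500;

async function runOnce(deps: Deps): Promise<number> {
  const cursor = await deps.readCursor();
  const head = await deps.getHeadBlock();
  let from = cursor + 1;
  while (from <= head) {
    const to = Math.min(from + BATCH - 1, head);
    const logs = await deps.getLogs(from, to);
    const events = logs
      .map((l) => deps.decode(l))
      .filter((e): e is object => e !== null);
    await deps.upsertEvents(events);
    await deps.writeCursor(to); // crash-safe: a restart resumes from here
    from = to + 1;
  }
  return head;
}
```

Run runOnce on a timer or on each new-head WebSocket notification; because the cursor only advances after a successful write, the worst case after a crash is re-indexing one batch, which the upsert makes harmless.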
The trickiest part wasn't the happy path; it was handling chain reorgs. When a block is reorganized out of the canonical chain, any events you indexed from it must be rolled back. The solution: never delete indexed events. Instead, mark each one with a status column and serve only confirmed events, meaning those from blocks at least 12 confirmations deep.
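The confirmation rule reduces to a couple of pure functions (a sketch using the 12-confirmation depth and hypothetical names):

```typescript
// Reorg safety: events are written as "pending" and promoted to "confirmed"
// only once buried deep enough. The app queries confirmed rows only.
const CONFIRMATIONS = 12;

type Status = "pending" | "confirmed";

function statusFor(eventBlock: number, headBlock: number): Status {
  return headBlock - eventBlock >= CONFIRMATIONS ? "confirmed" : "pending";
}

// On each new head, find pending events that are now safe to promote.
// Events from reorged-out blocks simply never get promoted; a cleanup
// pass can mark them orphaned once their block hash no longer matches.
function promotable(pendingBlocks: number[], headBlock: number): number[] {
  return pendingBlocks.filter((b) => headBlock - b >= CONFIRMATIONS);
}
```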
Performance tricks that actually mattered
- Batch eth_getLogs calls across block ranges (500 blocks per request) instead of one block at a time.
- Use ON CONFLICT DO UPDATE (upsert) instead of checking existence before insert — cuts write latency in half.
- Partition the events table by month, so queries scoped to recent data only scan one partition.
- Maintain a cursor table that records the last indexed block. On restart, resume from there instead of re-indexing everything.
- For multi-chain setups, run one indexer worker per chain in separate processes to avoid head-of-line blocking.
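The upsert and cursor tricks above boil down to two statements. This is a sketch against a hypothetical schema, not the actual Curvance tables; the conflict key (tx_hash, log_index) works because that pair uniquely identifies a log on a chain:

```typescript
// Upsert write path, sketched as parameterized SQL for node-postgres-style
// $1..$n placeholders. Table and column names are hypothetical.
const UPSERT_EVENT = `
  INSERT INTO events (tx_hash, log_index, block_number, contract, name, payload, status)
  VALUES ($1, $2, $3, $4, $5, $6, 'pending')
  ON CONFLICT (tx_hash, log_index)
  DO UPDATE SET block_number = EXCLUDED.block_number,
                payload      = EXCLUDED.payload,
                status       = EXCLUDED.status;
`;

// One cursor row per chain keeps the multi-chain workers independent.
const ADVANCE_CURSOR = `
  INSERT INTO cursors (chain_id, last_block)
  VALUES ($1, $2)
  ON CONFLICT (chain_id) DO UPDATE SET last_block = EXCLUDED.last_block;
`;
```

Re-indexing a batch after a crash then just rewrites identical rows instead of raising duplicate-key errors.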
When to use an off-the-shelf indexer
Building your own indexer makes sense when you have custom business logic baked into the transformation step — for example, computing a user's net position across multiple contract events. For simpler cases, tools like The Graph, Ponder, or Envio will get you there faster with less infrastructure to maintain.
The rule of thumb: if you can define your data needs as a subgraph schema, use The Graph. If you need joins across off-chain data, custom aggregations, or real-time mutation callbacks — roll your own.
The source for the Curvance indexer is not public, but the architecture described here is production-tested. If you're building something similar and want to talk through the design, reach out.