Database Architecture: From Blockchain Events to Queryable Intelligence

Behind EF-Map's responsive interface sits a sophisticated database architecture that transforms raw blockchain events into actionable intelligence. We're indexing millions of on-chain transactions, aggregating player activity, and serving queries in milliseconds—all while staying synchronized with a blockchain that processes thousands of events per hour.

This post pulls back the curtain on our database design: how we ingest blockchain data, optimize for spatial queries, and serve frontend requests at scale.

The Problem: Blockchain Data is Slow and Expensive

EVE Frontier runs on a blockchain, which creates exciting opportunities (transparency, ownership, programmability) but also technical challenges:

1. RPC latency: Blockchain queries take 200-500ms vs. 1-5ms for traditional databases

2. Rate limits: Public RPC endpoints limit requests to ~100/sec

3. No spatial indexes: Blockchains don't have PostGIS or geospatial query support

4. Event ordering: Transactions arrive out of order due to block reorganizations

We needed a local database that mirrors blockchain state but optimizes for our specific queries: "Which systems had mining activity in the last hour?" or "Show all gates owned by this corporation."

Solution: PostgreSQL + Continuous Indexing

We built a three-layer architecture:

Layer 1: Primordium Indexer (Blockchain → Postgres)

We run an indexer based on Lattice's Primordium framework. It subscribes to Smart Assembly events on-chain and writes them to Postgres in near real-time.

The indexer handles:

  1. Deduplication: the UNIQUE(tx_hash, event_type, gate_id) constraint in the schema below makes replayed events harmless
  2. Reorganizations: events from orphaned blocks are reconciled against the canonical chain
  3. Ordering: block numbers are stored and indexed, so out-of-order arrivals can be resolved

Here's a simplified schema for Smart Gate events:

CREATE TABLE smart_gate_events (
    event_id BIGSERIAL PRIMARY KEY,
    tx_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL,
    event_type VARCHAR(50), -- 'created', 'acl_updated', 'destroyed'
    
    -- Gate identification
    gate_id VARCHAR(66) NOT NULL,
    system_from VARCHAR(66),
    system_to VARCHAR(66),
    
    -- Ownership and access
    owner_id VARCHAR(66),
    acl_characters TEXT[],
    acl_corporations TEXT[],
    
    -- Metadata
    gate_name VARCHAR(255),
    metadata JSONB,
    
    UNIQUE(tx_hash, event_type, gate_id)
);

-- Indexes for common queries
CREATE INDEX idx_gate_events_gate ON smart_gate_events(gate_id, timestamp DESC);
CREATE INDEX idx_gate_events_system ON smart_gate_events(system_from);
CREATE INDEX idx_gate_events_owner ON smart_gate_events(owner_id);
CREATE INDEX idx_gate_events_block ON smart_gate_events(block_number);
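
Writes are idempotent against that unique constraint, which is what makes replays after a reorg safe. A sketch of the insert the indexer might issue (all values are placeholders):

-- Replayed or duplicate events hit the unique constraint and are skipped
INSERT INTO smart_gate_events
    (tx_hash, block_number, timestamp, event_type,
     gate_id, system_from, system_to, owner_id, gate_name, metadata)
VALUES
    ('0xabc...', 1234567, now(),  -- block timestamp in practice
     'created',
     '0xdef...', '0x111...', '0x222...', '0x333...', 'Example Gate', '{}')
ON CONFLICT (tx_hash, event_type, gate_id) DO NOTHING;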

The indexer processes blocks at ~30 blocks/sec, keeping our database <5 seconds behind chain tip under normal load.

Layer 2: Reference Data (Static Star Map)

Not all data comes from the blockchain. Star positions, system names, region boundaries—these are game constants that rarely change. We load them once from CCP's Static Data Export (SDE):

CREATE TABLE solar_systems (
    system_id VARCHAR(66) PRIMARY KEY,
    system_name VARCHAR(100) NOT NULL,
    region_id VARCHAR(66),
    constellation_id VARCHAR(66),
    
    -- 3D position (light-years)
    pos_x DOUBLE PRECISION,
    pos_y DOUBLE PRECISION,
    pos_z DOUBLE PRECISION,
    
    -- 3D point for proximity queries, e.g. built with
    -- ST_MakePoint(pos_x, pos_y, pos_z); no SRID, since these are
    -- game coordinates rather than Earth coordinates
    position geometry(POINTZ),
    
    -- Game mechanics
    security_status NUMERIC(3,2),
    star_type VARCHAR(50)
);

-- PostGIS spatial index (the _nd opclass indexes all three dimensions)
CREATE INDEX idx_systems_position ON solar_systems USING GIST(position gist_geometry_ops_nd);

The geometry(POINTZ) column enables 3D spatial queries using PostGIS:

-- Find all systems within 50 light-years of a target
SELECT s2.system_name,
       ST_3DDistance(s1.position, s2.position) AS distance_ly
FROM solar_systems s1
JOIN solar_systems s2
  ON ST_3DDWithin(s1.position, s2.position, 50)
WHERE s1.system_id = '0x1234...'
  AND s2.system_id <> s1.system_id
ORDER BY distance_ly;

This runs in ~10ms for typical queries, even with 200,000+ systems. Without PostGIS, we'd need to calculate Euclidean distance in application code (slow and error-prone).

Layer 3: Aggregation Tables (Hourly Rollups)

Raw event data is too granular for frontend queries. We run hourly aggregation jobs that pre-compute statistics:

CREATE TABLE system_activity_hourly (
    system_id VARCHAR(66),
    hour TIMESTAMPTZ,
    
    -- Activity metrics
    mining_events INTEGER DEFAULT 0,
    destruction_events INTEGER DEFAULT 0,
    deployment_events INTEGER DEFAULT 0,
    
    -- Economic metrics
    mining_value_isk DECIMAL(20,2) DEFAULT 0,
    destruction_value_isk DECIMAL(20,2) DEFAULT 0,
    
    -- Player counts
    unique_characters INTEGER DEFAULT 0,
    unique_corporations INTEGER DEFAULT 0,
    
    PRIMARY KEY (system_id, hour)
);
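
This table turns the introduction's "which systems had mining activity in the last hour?" question into a query over pre-aggregated rows rather than millions of raw events. A sketch:

-- Systems with mining activity in the most recent complete hour
SELECT system_id, mining_events, unique_characters
FROM system_activity_hourly
WHERE hour = date_trunc('hour', now()) - INTERVAL '1 hour'
  AND mining_events > 0
ORDER BY mining_events DESC;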

-- Materialized view for region-level rollups
CREATE MATERIALIZED VIEW region_activity_daily AS
SELECT 
    ss.region_id,
    DATE(sah.hour) as day,
    SUM(sah.mining_events) as mining_events,
    SUM(sah.mining_value_isk) as mining_value,
    SUM(sah.unique_characters) as unique_characters  -- sum of hourly uniques: an upper bound, not a daily distinct count
FROM system_activity_hourly sah
JOIN solar_systems ss ON sah.system_id = ss.system_id
GROUP BY ss.region_id, DATE(sah.hour);

-- Refresh materialized view every hour
REFRESH MATERIALIZED VIEW region_activity_daily;
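
Something has to run that refresh on a schedule. One option, assuming the pg_cron extension is available (job name and timing below are illustrative):

-- Refresh the rollup five minutes past every hour via pg_cron
SELECT cron.schedule(
    'refresh-region-activity',
    '5 * * * *',
    'REFRESH MATERIALIZED VIEW region_activity_daily'
);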

Now when the frontend asks "show region activity for the last 30 days," we query the materialized view (30 rows per region) instead of millions of raw events. Query time drops from 5 seconds to 20ms.
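
Serving that request looks roughly like this (placeholder region id):

-- Region activity for the last 30 days, served from the rollup
SELECT day, mining_events, mining_value, unique_characters
FROM region_activity_daily
WHERE region_id = '0xabcd...'
  AND day >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY day;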

Query Optimization: The 95% Rule

We profiled our database access and found 95% of queries fell into 5 patterns:

  1. Get current state for a single entity (gate, system, character)
  2. List entities by region or constellation
  3. Aggregate activity for last N hours
  4. Find entities within distance of a point
  5. Search by name or owner

We optimized aggressively for these patterns:

Pattern 1: Entity State → Maintain a current_state table updated on each event:

CREATE TABLE smart_gate_current (
    gate_id VARCHAR(66) PRIMARY KEY,
    -- Latest state (denormalized for speed)
    system_from VARCHAR(66),
    system_to VARCHAR(66),
    owner_id VARCHAR(66),
    acl_characters TEXT[],
    last_updated TIMESTAMPTZ,
    last_event_id BIGINT REFERENCES smart_gate_events(event_id)
);
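
Each incoming event folds into this table with an upsert. A sketch with placeholder values; the final WHERE clause keeps an out-of-order, older event from overwriting newer state:

-- Upsert latest gate state; ignore events older than what we already have
INSERT INTO smart_gate_current AS cur
    (gate_id, system_from, system_to, owner_id,
     acl_characters, last_updated, last_event_id)
VALUES
    ('0xdef...', '0x111...', '0x222...', '0x333...',
     ARRAY['0x444...'], now(), 42)
ON CONFLICT (gate_id) DO UPDATE SET
    system_from    = EXCLUDED.system_from,
    system_to      = EXCLUDED.system_to,
    owner_id       = EXCLUDED.owner_id,
    acl_characters = EXCLUDED.acl_characters,
    last_updated   = EXCLUDED.last_updated,
    last_event_id  = EXCLUDED.last_event_id
WHERE cur.last_event_id < EXCLUDED.last_event_id;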

Pattern 2: Spatial Listing → B-tree indexes on region/constellation columns
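
The supporting DDL is ordinary B-tree indexes (index names are illustrative):

CREATE INDEX idx_systems_region ON solar_systems(region_id);
CREATE INDEX idx_systems_constellation ON solar_systems(constellation_id);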

Pattern 3: Time Series → Pre-aggregated hourly/daily tables

Pattern 4: Proximity → PostGIS spatial indexes

Pattern 5: Search → Full-text search indexes:

CREATE INDEX idx_systems_name_search 
    ON solar_systems 
    USING GIN(to_tsvector('english', system_name));
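
The matching query has to use the same to_tsvector expression as the index definition to benefit from it; the search term here is illustrative:

-- Full-text name search backed by the GIN index above
SELECT system_id, system_name
FROM solar_systems
WHERE to_tsvector('english', system_name)
      @@ plainto_tsquery('english', 'nova')
LIMIT 20;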

These optimizations brought our p95 query latency from 800ms to 15ms—a 53x improvement.

Scaling: Read Replicas and Connection Pooling

As traffic grew, we hit connection limits. Postgres defaults to ~100 concurrent connections, but our worker processes needed more.

We implemented PgBouncer for connection pooling:

[databases]
efmap = host=localhost port=5432 dbname=efmap

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5

This lets 1,000 clients share 20 database connections: a server connection is returned to the pool as soon as each transaction finishes, dramatically reducing resource usage. The trade-off of transaction pooling is that session state (SET, LISTEN/NOTIFY, session-level advisory locks) doesn't survive across transactions, so application code can't rely on it.
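
PgBouncer's built-in admin console (the special pgbouncer database on its listen port, 6432 by default) makes pool saturation easy to check:

-- Run from the PgBouncer admin console, e.g. psql -p 6432 pgbouncer
SHOW POOLS;   -- active/waiting clients and server connections per pool
SHOW STATS;   -- per-database request and traffic counters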

For read-heavy workloads (frontend queries), we added a read replica that lags ~2 seconds behind the primary:

// Using node-postgres (pg): separate pools for reads and writes
import { Pool } from 'pg';

// Route read queries to the replica
const dbRead = new Pool({
    host: 'postgres-replica.internal',
    max: 20
});

// Route writes to the primary
const dbWrite = new Pool({
    host: 'postgres-primary.internal',
    max: 5
});

This offloaded 80% of queries from the primary, improving write throughput.

Real-Time Monitoring: Grafana Dashboards

We visualize database health in Grafana: indexer lag behind the chain tip, query latency percentiles, connection pool saturation, and replica lag.
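
Most panels are plain SQL against Postgres itself. Two examples, assuming the pg_stat_statements extension is enabled:

-- Slowest queries by mean execution time (needs pg_stat_statements)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Replication lag, run on the replica
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;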

This observability has caught several production issues before users noticed them.

Lessons for Blockchain-Indexed Databases

Building this system taught us several key principles:

1. Index locally, serve globally. Never query the blockchain directly from user-facing APIs. Always use a local database.

2. Spatial queries need spatial indexes. PostGIS is a game-changer for geography-based games.

3. Aggregate aggressively. Raw events are for auditing, not querying. Pre-compute statistics.

4. Connection pooling is essential. Database connections are expensive—reuse them.

5. Monitor everything. Blockchain indexing is complex—you need visibility to diagnose issues quickly.

Future Enhancements

We're planning several database improvements as EF-Map and its traffic continue to grow.

Our database architecture is the invisible foundation that makes EF-Map feel fast and responsive. Users see instant search results, smooth map interactions, and up-to-date statistics—but behind the scenes, it's hundreds of optimized queries, indexes, and aggregation pipelines working in concert.

Interested in the technical details? We're considering open-sourcing our indexer schema and query patterns—let us know if that would be valuable for your own blockchain gaming projects.
