All articles
ArticleMarch 6, 2026

Real-time Product Analytics with Postgres → ClickHouse → dbt → Grafana

By AI Guy in LA

Real-time Product Analytics with Postgres → ClickHouse → dbt → Grafana

If you run your app on Postgres and need sub-second analytics without crushing OLTP, this stack works reliably in production:
– Ingest: Postgres → ClickHouse (MaterializedPostgreSQL or Kafka + Debezium)
– Transform: dbt (dbt-clickhouse)
– Serve: Grafana dashboards + alerts

Use cases
– Funnel and retention analysis
– Feature usage and cohort metrics
– Operational dashboards (latency, errors, throughput)
– Finance-lite rollups (orders, MRR, refunds)

Reference architecture
– Source DB: Postgres 14+ on managed cloud (RDS/Cloud SQL)
– Analytics store: ClickHouse Cloud or self-managed cluster
– Replication: ClickHouse MaterializedPostgreSQL (for simplicity) or Kafka (for multi-sink / heavy transforms)
– Transform: dbt runner (GitHub Actions, Dagster, or Airflow)
– Viz: Grafana with ClickHouse plugin
– Storage/lineage: S3/GCS parquet exports (optional but recommended)
– Secrets: environment or Vault; network via PrivateLink/VPC peering

Table design in Postgres
– Prefer narrow event table + dimension tables
– Use UUID or bigint IDs; avoid wide JSONB for hot paths
– For events:
– event_id (UUID, PK)
– user_id
– occurred_at (timestamptz)
– event_name (text)
– properties (jsonb) — sparse, normalized over time
– For orders/subscriptions:
– immutable facts + status changes as events
– avoid in-place updates on hot rows

ClickHouse schema (denormalized for reads)
– Use MergeTree with time + shard key
– Example:

CREATE TABLE analytics.events
(
event_id UUID,
user_id String,
event_name LowCardinality(String),
occurred_at DateTime64(3, ‘UTC’),
properties_json JSON,
ingest_ts DateTime DEFAULT now()
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(occurred_at)
ORDER BY (event_name, occurred_at, user_id)
SETTINGS index_granularity = 8192;

– Add projections or materialized views for frequent queries (daily rollups, funnels)

Option A: Direct Postgres → ClickHouse via MaterializedPostgreSQL
Good when you control Postgres and schema churn is moderate.

Steps
1) Enable logical replication in Postgres:
– rds.logical_replication = 1 (RDS) or wal_level = logical
– Create a replication user with REPLICATION
2) In ClickHouse:

CREATE DATABASE pg_repl ENGINE = MaterializedPostgreSQL(
‘postgres-host:5432’, ‘app_db’, ‘replicator_user’, ‘REDACTED’,
‘public’, — schema
16 — max threads
);

— Exposed tables appear under pg_repl.

3) Hydrate analytics tables from the replicated ones:
CREATE TABLE analytics.events AS
SELECT
event_id,
user_id::String,
event_name,
occurred_at,
properties as properties_json
FROM pg_repl.events
ENGINE = MergeTree
PARTITION BY toYYYYMM(occurred_at)
ORDER BY (event_name, occurred_at, user_id);

Notes
– MaterializedPostgreSQL tails WAL and applies row-level changes
– For deletes/updates, ensure primary keys exist in Postgres
– For high write volume, separate heavy tables into their own ClickHouse databases to parallelize apply

Option B: Postgres → Kafka (Debezium) → ClickHouse
Use when you need multi-sink, schema registry, or custom buses.

– Debezium captures CDC to Kafka with Avro/JSON schema
– ClickHouse reads via Kafka engine + materialized views
– Handle upserts with ReplacingMergeTree or CollapsingMergeTree keyed by primary_id + version/flag

dbt transforms (dbt-clickhouse)
– Create semantic models and rollups; keep them small and incremental
– Example model: sessions_daily.sql

{{
config(
materialized=’incremental’,
unique_key=’session_day_user’,
incremental_strategy=’delete+insert’
)
}}

WITH sessions AS (
SELECT
user_id,
toStartOfDay(occurred_at) AS session_day,
countIf(event_name = ‘session_start’) AS starts,
countIf(event_name = ‘session_end’) AS ends
FROM {{ source(‘analytics’, ‘events’) }}
{% if is_incremental() %}
WHERE occurred_at >= date_sub(day, 3, today())
{% endif %}
GROUP BY user_id, session_day
)
SELECT
concat(user_id, ‘_’, toString(session_day)) AS session_day_user,
user_id,
session_day,
starts,
ends
FROM sessions;

– Schedule:
– Small models every 5 minutes
– Heavy rollups hourly
– Backfills in off-peak

Grafana dashboards
– Install ClickHouse data source
– Panels:
– Events per minute with 1m/5m rollups
– DAU/WAU/MAU with asof joins
– Feature adoption by cohort week
– p50/p95 latency by endpoint
– Error rate by service/version
– Add alert rules on thresholds (error_rate > 2%, DAU drop > 20% d/d)

Performance and cost tips
– ClickHouse:
– Partition by month, order by (event_name, occurred_at, user_id)
– Use LowCardinality for strings
– Keep JSON for long tail props, but extract top N to typed columns for filters
– Set max_threads per query; use quotas for noisy tenants
– dbt:
– Favor incremental + small windows
– Stage raw to clean, then marts; limit cross-joins
– Postgres:
– Keep CDC slot monitored; alert on replication lag
– Index primary keys; avoid mass vacuum stalls
– Storage:
– TTL old raw events to S3 via ClickHouse S3 TTL or periodic exports
– Costs:
– Use ClickHouse Cloud autoscaling with sane concurrency caps
– Compress JSON and avoid SELECT *

Reliability and ops
– Health checks:
– Replication delay (Postgres WAL LSN vs ClickHouse apply lag)
– Kafka consumer lag (if using Debezium)
– dbt run success rates and duration SLAs
– Schema changes:
– Additive first (new columns), then backfill, then drop
– Use dbt contracts/tests and column-level lineage
– Dedupe:
– For CDC upserts, use ReplacingMergeTree with a version column
– For at-least-once ingestion, dedupe on event_id in dbt staging
– Backfills:
– Snapshot Postgres to S3 (pg_dump or logical snapshot), bulk copy into ClickHouse, then reattach WAL

Security
– Restrict replication role to specific DB and tables
– Private networking between services
– Rotate secrets; limit Grafana viewer vs editor roles
– PII handling: hash/email_tokenize in ingestion; store raw PII in a separate restricted table

Quick start checklist
– Stand up ClickHouse Cloud and Grafana
– Enable Postgres logical replication
– Create MaterializedPostgreSQL database in ClickHouse
– Define analytics.events table and initial projections
– Configure dbt-clickhouse; build staging and marts
– Publish Grafana dashboards and alerts
– Add monitors for lag, error rates, and costs

What this delivers
– Sub-second reads on billions of events
– Minimal load on Postgres OLTP
– Clear, versioned transforms
– Actionable dashboards and alerts for product and ops

AI Guyin LA

Practical AI engineering and consulting for real organizations in Los Angeles. Websites, chatbots, workflow automation, and custom software — built with privacy-first guardrails.

Based in Los Angeles, California

Get a free audit

30 minutes. No pitch. Three practical recommendations for your site and workflows.

Book your audit

© 2026 AI Guy in LA. Built locally in Los Angeles, CA.

Privacy-first · Compliance-aware · Human in the loop