Engineering · Distributed systems
Why I picked Redis Streams over Phoenix PubSub for InstaEscrow
Phoenix PubSub is the obvious default for Elixir-native real-time. Three things made me reach for Redis Streams instead. The trade is worth it.
19 November 2024 · 6 min read
InstaEscrow is the M-Pesa escrow product I built solo for Kenya's social commerce market. Buyers fund a hold; sellers ship; buyers confirm; funds release. The whole flow is event-driven on the backend. Every state change emits something downstream services and clients listen to: notifications, ledgers, mobile push, dispute timelines, the AI analytics pipeline. All of it sits on top of one question: what's the right transport for cross-service events?
For an Elixir-first stack, Phoenix PubSub is the obvious default. It's local, fast, BEAM-native, and ships with Phoenix. I started there. Two weeks in, I ripped it out and replaced it with Redis Streams. Here's why.
The three things PubSub couldn't do
1. Persistence across consumer downtime
Phoenix PubSub is a fan-out broker, not a log. If you publish to a topic and the only listener is a worker that's currently restarting, the message disappears. Forever. There's no replay. There's no durability. For real-time UI updates that's fine. A missed event re-renders on the next interaction. For a financial ledger entry, it's catastrophic.
Redis Streams is a log. XADD appends. XREAD consumes. A worker that crashes after reading but before acknowledging will read the same entry again on restart. Combined with consumer groups and explicit XACK, this gives me at-least-once delivery with idempotent handlers, which is exactly the contract a payment system needs.
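To make the contract concrete, here's a minimal sketch of at-least-once delivery with an idempotent handler. The production code is Elixir; this is an illustrative Python simulation with an in-memory stand-in for a stream and consumer group, and all names are hypothetical.

```python
class FakeStream:
    """In-memory stand-in for XADD / XREADGROUP / XACK semantics."""
    def __init__(self):
        self.entries = []       # append-only log of (entry_id, payload)
        self.delivered = set()  # ids handed to the consumer group
        self.acked = set()      # ids the consumer explicitly acknowledged
        self.seq = 0

    def xadd(self, payload):
        entry_id = f"{self.seq}-0"
        self.seq += 1
        self.entries.append((entry_id, payload))
        return entry_id

    def xreadgroup(self):
        # Pending entries (delivered but never acked) are redelivered first,
        # which is what a restarted worker sees after a crash.
        for entry_id, payload in self.entries:
            if entry_id in self.delivered and entry_id not in self.acked:
                return entry_id, payload
        for entry_id, payload in self.entries:
            if entry_id not in self.delivered:
                self.delivered.add(entry_id)
                return entry_id, payload
        return None

    def xack(self, entry_id):
        self.acked.add(entry_id)

# The handler is idempotent: it records which entry ids it has already
# applied, so a redelivery is a no-op instead of a duplicate ledger entry.
# (In a real payment system the processed-set would be durable, e.g. written
# in the same database transaction as the ledger entry.)
processed = set()
ledger = []

def handle(entry_id, payload):
    if entry_id in processed:
        return
    ledger.append(payload)
    processed.add(entry_id)

stream = FakeStream()
stream.xadd({"event": "escrow_funded", "amount": 500})

# Delivery 1: the worker applies the entry, then "crashes" before acking.
eid, payload = stream.xreadgroup()
handle(eid, payload)

# Restart: the same entry comes back as pending; idempotency absorbs it.
eid2, payload2 = stream.xreadgroup()
handle(eid2, payload2)
stream.xack(eid2)
```

The entry is delivered twice but the ledger records it once, which is the whole point: redelivery is the stream's job, deduplication is the handler's.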
2. Reconnection replay for SSE clients
InstaEscrow's mobile app and dashboard hold long-lived SSE connections. On a backgrounded phone the connection drops every few minutes. Reconnecting clients send a Last-Event-ID header and expect to receive everything they missed during the gap.
With Phoenix PubSub, replay is your problem to build. You'd need to log events to Postgres or some other store, query on reconnect, merge with the live PubSub feed, deduplicate. That's an entire subsystem.
With Redis Streams, replay is the primitive. Each entry has a Redis-assigned ID like 1714405391000-0. The reconnect becomes XREAD STREAMS user:42 1714405391000-0. Redis returns everything after that ID (the start is exclusive), then I switch the connection to a blocking XREAD BLOCK for live events. No deduplication needed because IDs are monotonic.
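The reconnect logic reduces to one comparison on those IDs. A sketch in Python (the stream contents and function names here are illustrative, not InstaEscrow's actual code):

```python
def parse_id(entry_id):
    # Redis stream IDs are "<milliseconds>-<sequence>". Both parts are
    # integers, and the pair is strictly increasing within a stream.
    ms, seq = entry_id.split("-")
    return (int(ms), int(seq))

def replay_since(stream, last_event_id):
    """Entries strictly after last_event_id, mirroring XREAD's exclusive start."""
    cutoff = parse_id(last_event_id)
    return [(eid, ev) for eid, ev in stream if parse_id(eid) > cutoff]

# A per-user stream as (id, event) pairs, oldest first.
user_stream = [
    ("1714405391000-0", "escrow_funded"),
    ("1714405391000-1", "seller_notified"),
    ("1714405395000-0", "item_shipped"),
]

# The client reconnects with Last-Event-ID: 1714405391000-0 and catches up;
# after the replay, the server switches to a blocking read for live events.
missed = replay_since(user_stream, "1714405391000-0")
```

Note the tuple comparison: two entries added in the same millisecond differ only in the sequence part, and Python's tuple ordering handles that the same way Redis does.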
3. Operability
Phoenix PubSub is opaque. There's no “show me the queue depth,” no pending message count, no way to tell whether a slow subscriber's mailbox is quietly growing toward an OOM. You're staring at BEAM observability through the wrong end of the telescope.
Redis Streams is loud. XLEN shows the stream length. XPENDING shows messages delivered to a consumer group but not yet acknowledged, your in-flight work, with idle time per message. XINFO GROUPS shows the last-delivered ID per consumer group. Every Redis-aware monitoring stack already understands these primitives. When something is slow, I know within seconds where the lag is.
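The per-message idle time from XPENDING is what turns this into alerting. A sketch of the check, with made-up sample data shaped like what a Redis client returns for an extended XPENDING call (the threshold and worker names are hypothetical):

```python
STALL_THRESHOLD_MS = 30_000  # alert if an in-flight message sits this long

# (entry_id, consumer, idle_ms, delivery_count) — illustrative sample data.
pending = [
    ("1714405391000-0", "ledger-worker-1", 1_200, 1),
    ("1714405391000-1", "ledger-worker-1", 45_000, 3),  # stuck: redelivered 3x
]

def stalled(pending, threshold_ms=STALL_THRESHOLD_MS):
    """Entry ids that have been in flight longer than the threshold."""
    return [eid for eid, _, idle_ms, _ in pending if idle_ms > threshold_ms]

alerts = stalled(pending)
```

A rising delivery count on the same entry is the other signal worth watching: it usually means a handler is crashing on that message rather than merely running slow.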
The trade
I lose two things by leaving PubSub.
One extra hop. A Phoenix process publishes to Redis instead of broadcasting in-memory. For a hot single-region cluster this is sub-millisecond; Redis runs on the same VPC. For cross-service events, the latency is dominated by the eventual Phoenix Channel push back to the client anyway, so the marginal cost is invisible.
An operational dependency. Redis becomes critical infrastructure. If Redis is down, eventing stops. I accept this because Redis was already critical for caching and rate-limiting, and because the durable state lives in Postgres (with pgBackRest doing continuous WAL archiving), so a temporarily unhealthy Redis pauses eventing without losing committed state.
Where this falls down
Don't use Redis Streams for true broadcast (fan-out to thousands of connected clients in a single region). PubSub still wins there: every subscriber gets the message pushed in-process, with no per-client blocking read against Redis. Specifically: I use Phoenix PubSub for in-region client fan-out after a Streams consumer has read the canonical event. Streams is the source of truth; PubSub is the local distribution layer.
The shape that fell out
InstaEscrow ended up with a clean two-layer model: services publish domain events to Redis Streams (one stream per aggregate type); a per-user fan-out worker picks up events relevant to a user and broadcasts them locally via Phoenix PubSub to that user's connected SSE/WS sessions. New services subscribe to the streams without any coordination with the publishers. Replay just works.
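The two-layer shape can be sketched in a few lines. Again this is an illustrative Python simulation, not the Elixir code: the subscriber registry stands in for Phoenix PubSub, the entry dict stands in for what the worker reads off the stream, and every name here is hypothetical.

```python
from collections import defaultdict

# Layer 2 stand-in: a local topic -> sessions registry (the Phoenix PubSub role).
subscribers = defaultdict(list)

def subscribe(topic, session):
    subscribers[topic].append(session)

def broadcast(topic, event):
    # Push to whoever is connected right now; nobody connected means no work.
    for session in subscribers[topic]:
        session.append(event)

def fanout_worker(entry):
    # Layer 1: `entry` came off a Redis Stream via a consumer group, so it's
    # durable and replayable regardless of who is connected.
    user_id, event = entry["user_id"], entry["event"]
    broadcast(f"user:{user_id}", event)

# Two live sessions for the same user (phone + dashboard), one canonical event.
phone, dashboard = [], []
subscribe("user:42", phone)
subscribe("user:42", dashboard)

fanout_worker({"user_id": 42, "event": "funds_released"})
```

The asymmetry is the point: the stream read is durable and per-group, the local broadcast is ephemeral and per-session, and neither layer has to compensate for the other's weakness.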
That's the answer I'd give in an interview: PubSub is great for local fan-out, Streams is great for cross-service eventing, and you want both.