Outbox Pattern
Implement the transactional outbox pattern to reliably publish events to Kafka after a database write.
The situation
Your payments team just shipped a feature: every order placed by a customer gets mirrored into a ledger service for finance reporting. The integration looks simple: the order service publishes an order.created event to Kafka, and the ledger service consumes it. Everything worked fine during testing.
Then, in production, order-service's PostgreSQL server experienced an outage. During the outage, every POST /orders returned a 500. After recovery the next day, the finance team asked why their ledger totals don't match the order totals: the finance ledger contains orders that don't exist in the orders database. Ledger entries are immutable for accounting and legal reasons, so the finance team can't simply delete these phantom entries; instead they would have to create thousands of credit notes by hand.
Your job: work out how this happened, then fix it so that order-service and ledger-service totals stay consistent under any circumstances.
Your dev environment on nurburg.dev
The environment has two Node.js services, two Postgres databases, and a Kafka cluster pre-provisioned for you.
| Resource | Detail |
| ---------------- | ----------------------------------------------- |
| order-service | port 3000, database orderdb on pg-orders |
| ledger-service | port 4000, database ledgerdb on pg-ledger |
| Kafka topic | outbox-events |
Starting the services
Run each service in a separate terminal from its directory:
# order-service
cd order-service && npm run dev
# ledger-service
cd ledger-service && npm run dev
Connecting to the databases
# order-service database
cd order-service && npm run psql
# ledger-service database
cd ledger-service && npm run psql
Schema files
⚠️ BEFORE STARTING DEVELOPMENT ⚠️
- order-service/schema.sql has the schema for order-service. Apply it before development using npm run psql -- -f schema.sql.
- ledger-service/schema.sql defines the ledger_entries table. Apply it before development using npm run psql -- -f schema.sql.
- Existing table schemas must not be altered in any way. New tables may be added to schema.sql. During development, create any new tables manually by connecting to Postgres from the command-line tool as described in the preceding section.
Hitting the API
Use curl commands to check the API behavior:
# Create an order
curl -X POST http://localhost:3000/orders \
-H "Content-Type: application/json" \
-d '{"customerId": "customer-1", "amount": 100, "currency": "USD"}'
# Check order total
curl http://localhost:3000/orders/total
# Check ledger total
curl http://localhost:4000/ledger/total
Observable symptoms
The current implementation publishes to Kafka before committing the transaction. Reproduce the failure:
- Start both services, order-service and ledger-service, using the commands from the section "Starting the services".
- Create a few orders using curl. You'll see a few failures at the beginning, then all successes.
- Compare the orders total and the ledger total. You'll notice the orders total is less than the ledger total.
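The failure mode can be reproduced in miniature. Below is a hedged sketch in plain JavaScript, using in-memory arrays as stand-ins for the Kafka topic and the orders table (not the real nurburg-libs clients), showing why publish-before-commit leaks phantom events:

```javascript
// In-memory stand-ins: an array as the Kafka topic, an array as the orders table.
const topic = [];
const ordersTable = [];

function insertOrder(order) {
  // Simulates the Postgres outage: every write fails.
  throw new Error('connection refused');
}

// The buggy handler: publish the event first, then write the row.
function createOrderBuggy(order) {
  topic.push({ type: 'order.created', order }); // event escapes to "Kafka"
  insertOrder(order);                           // this write never lands
  ordersTable.push(order);                      // never reached
}

try {
  createOrderBuggy({ id: 'o-1', amount: 100 });
} catch (err) {
  // The API returned a 500, yet the event was already published.
}

console.log(`events: ${topic.length}, orders: ${ordersTable.length}`); // events: 1, orders: 0
```

The ledger service will happily consume that orphaned event, which is exactly why its total ends up higher than the orders total.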
The task
Implement the transactional outbox pattern in order-service:
- Add an outbox table to order-service/schema.sql. Each row holds the serialised event payload and a published flag. Remember to create this table in order-service's database by connecting to it.
- In POST /orders, write the orders row and an outbox row in the same transaction. Remove the synchronous Kafka producer.send from the request path entirely.
- Add a background relay process (a setInterval loop in the same process) that polls unpublished outbox rows, produces them to Kafka, and marks them published = true.
- ledger-service must consume outbox-events and persist entries to ledger_entries so that GET /ledger/total reflects all processed orders.
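The steps above can be sketched end to end. This is an illustration only, with in-memory arrays standing in for the orderdb tables and the Kafka topic; the real solution would use nurburg-libs' PgPool and Kafka clients, and the two inserts in createOrder would share one BEGIN/COMMIT:

```javascript
// In-memory stand-ins for the orders table, the outbox table, and the Kafka topic.
const orders = [];
const outbox = [];
const topic = [];

// POST /orders: write the order row and the outbox row atomically.
// Here two synchronous pushes stand in for a single database transaction.
function createOrder(order) {
  orders.push(order);
  outbox.push({
    id: `evt-${order.id}`,
    payload: { type: 'order.created', order },
    published: false,
  });
}

// One relay tick: poll unpublished rows, produce them, mark them published.
function relayTick() {
  for (const row of outbox.filter((r) => !r.published)) {
    topic.push(row.payload); // producer.send in the real service
    row.published = true;    // UPDATE outbox SET published = true WHERE id = ...
  }
}

createOrder({ id: 'o-1', amount: 100 });
createOrder({ id: 'o-2', amount: 50 });
relayTick(); // in the service this runs on a setInterval

console.log(topic.length, outbox.every((r) => r.published)); // 2 true
```

If the database write fails, neither the order nor the outbox row exists, so no event is ever relayed; if it succeeds, the event is guaranteed to be picked up on a later tick even if Kafka was down at request time.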
🏆 Success criteria: after posting 10 orders and waiting a few seconds, the order and ledger totals should match, even if you observed failures while creating orders. 🏆
Constraints
- Use nurburg-libs for all database (PgPool) and Kafka (Kafka) clients. Direct use of pg or kafkajs bypasses the test-harness instrumentation and will break scoring.
- Don't modify anything inside .nurburgdev/; that directory is owned by the eval engine.
- Don't alter the orders table schema in order-service/schema.sql; you may add new tables.
- Don't alter the ledger_entries table schema in ledger-service/schema.sql.
- GET /healthcheck on both services must always return 200 OK, regardless of Kafka or database state.
Evaluation criteria
- Functional tests: the test suite posts 10 orders while simulating failures, waits 5 seconds, then checks that the ledger-service and order-service totals match.
Hints
Hint 1: outbox table shape
Your outbox table needs at minimum: a primary key, the serialised event payload, a published boolean defaulting to false, and a created_at timestamp. Something like:
CREATE TABLE outbox (
id TEXT PRIMARY KEY DEFAULT gen_random_uuid()::TEXT,
payload JSONB NOT NULL,
published BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Hint 2: safe polling under concurrency
If you ever run multiple relay instances, two workers could pick up the same row. Use SELECT ... FOR UPDATE SKIP LOCKED to grab a batch of unpublished rows without contention:
SELECT * FROM outbox
WHERE published = false
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
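One way to wire that query into a relay pass (a hedged sketch: the client and producer interfaces below are assumptions standing in for the nurburg-libs PgPool and Kafka clients, which may differ):

```javascript
// One relay pass: lock a batch inside a transaction, publish each row,
// mark it published, then commit to release the row locks.
async function relayOnce(client, producer) {
  await client.query('BEGIN');
  const { rows } = await client.query(
    `SELECT * FROM outbox
     WHERE published = false
     ORDER BY created_at
     LIMIT 10
     FOR UPDATE SKIP LOCKED`
  );
  for (const row of rows) {
    await producer.send({
      topic: 'outbox-events',
      messages: [{ value: JSON.stringify(row.payload) }],
    });
    await client.query('UPDATE outbox SET published = true WHERE id = $1', [row.id]);
  }
  // While this transaction held the locks, any concurrent worker's
  // SKIP LOCKED query simply skipped these rows instead of blocking.
  await client.query('COMMIT');
  return rows.length;
}
```

Note the crash window: if the process dies after producer.send but before the UPDATE commits, the row stays unpublished and will be sent again on the next pass, which is why delivery is at-least-once.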
What you're actually learning
- Transactional outbox pattern: writing events to a database table inside the same transaction as the domain record, then relaying them asynchronously.
- Dual-write problem: why writing to two systems (DB + message broker) in sequence is fundamentally unreliable.
- At-least-once delivery: the relay may publish a message more than once if it crashes between send and mark-published. Consumers should be idempotent.
- FOR UPDATE SKIP LOCKED: a Postgres primitive that makes polling queues safe under concurrent workers without explicit locking overhead.
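Because delivery is at-least-once, the consumer side must tolerate duplicates. A minimal sketch of an idempotent consumer, deduplicating by event id (in ledger-service this would typically be a unique constraint on an event-id column plus INSERT ... ON CONFLICT DO NOTHING; a Set stands in here):

```javascript
// In-memory stand-ins for the dedupe index and the ledger_entries table.
const seen = new Set();
const ledgerEntries = [];

function handleEvent(event) {
  if (seen.has(event.id)) return; // duplicate delivery: ignore
  seen.add(event.id);
  ledgerEntries.push({ eventId: event.id, amount: event.order.amount });
}

const evt = { id: 'evt-o-1', order: { id: 'o-1', amount: 100 } };
handleEvent(evt);
handleEvent(evt); // relay crashed after send and republished the same event

console.log(ledgerEntries.length); // 1
```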