10x Throughput for Digital Banking
We migrated Meridian's 8-year-old Rails monolith to a Go microservices architecture with full Kubernetes orchestration. Result: deployment time dropped from 40 minutes to 6, and outages essentially disappeared.
99.97%
Uptime
Up from 98.2% before migration (3 outages/month → 0 in last 6 months)
47ms
p95 Latency
Down from 890ms under the Rails monolith during peak hours
42.3%
Infrastructure Cost Reduction
Right-sized Kubernetes pods replaced over-provisioned EC2 instances
6 min
Deploy Time
Down from 40-minute rolling restarts, now shipped as zero-downtime canary deploys
The Challenge
What We Were Up Against
Meridian Financial had been running a Ruby on Rails monolith for eight years. What started as a fast MVP had grown into a 340K-line codebase handling 2.3 million daily transactions. The single PostgreSQL instance was hitting connection pool limits during market hours, Sidekiq background workers were single-threaded and backing up queues by 40+ minutes during peak load, and the deployment process required a 40-minute rolling restart that the team could only risk running once a week — usually on Sunday nights. The breaking point came when three production outages in a single month triggered an SLA review from their largest institutional client.
Database Bottleneck
Single PostgreSQL instance handling both OLTP and reporting queries. Connection pool maxing out at 2.3M daily transactions, causing request queuing during US market hours (9:30 AM – 4:00 PM ET).
Deployment Risk
40-minute rolling restarts with no canary deployment capability. Every deploy was all-or-nothing, limiting releases to weekly Sunday windows and making hotfixes terrifying.
Queue Backlog
Sidekiq workers processing transaction reconciliation were single-threaded. During peak hours, the queue would back up by 40+ minutes, delaying settlement confirmations to partner banks.
Observability Gaps
No distributed tracing, no structured logging. When outages hit, the team was SSH-ing into production boxes and tailing log files to diagnose issues.
Constraints & Requirements
Zero downtime during migration — $12M+ daily transaction volume couldn't be interrupted
SOC 2 Type II compliance must be maintained throughout
Existing API contracts with 14 partner bank integrations couldn't change
Team of 6 Rails developers needed to be productive during the transition
Our Approach
How We Built It
We chose a strangler fig pattern over a big-bang rewrite. Each service was extracted incrementally behind a facade, letting the legacy system continue operating while new services came online. Go was selected over Rust — the team needed a language with a gentler learning curve and faster hiring pipeline, and Go's goroutine model was a natural fit for the concurrent transaction processing workload.
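At its core, the strangler fig facade is just a routing decision at the gateway: extracted paths go to the new services, everything else falls through to the monolith. A minimal sketch — the prefixes and upstream names below are illustrative, not Meridian's actual routes:

```go
package main

import (
	"fmt"
	"strings"
)

// migratedPrefixes lists the route prefixes already extracted from the
// monolith; the list grows one service at a time. Prefixes are illustrative.
var migratedPrefixes = []string{"/auth", "/transactions"}

// routeTarget is the strangler-fig facade decision: paths matching an
// extracted prefix go to the new Go services, everything else falls
// through to the legacy Rails monolith.
func routeTarget(path string) string {
	for _, p := range migratedPrefixes {
		if strings.HasPrefix(path, p) {
			return "go-services"
		}
	}
	return "rails-monolith"
}

func main() {
	fmt.Println(routeTarget("/auth/login"))    // go-services
	fmt.Println(routeTarget("/reports/daily")) // rails-monolith
}
```

Because the decision lives in one place, cutting a new service over is a one-line change to the prefix list rather than a client-facing migration.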
Audit & Foundation
Weeks 1–6: Mapped all domain boundaries in the monolith. Set up Kubernetes cluster, CI/CD pipelines, observability stack (Datadog APM + structured logging), and the API gateway that would route traffic between legacy and new services.
Auth & Identity Service
Weeks 7–12: Extracted the authentication service first — it had the clearest boundaries and lowest risk. Implemented JWT-based auth with refresh token rotation, replacing the legacy cookie-based session system.
Transaction Processing Engine
Weeks 13–22: The core extraction. Built the transaction processing service in Go with Apache Kafka for event streaming. Chose Kafka over RabbitMQ because we needed message replay capability for transaction audit trails — a hard SOC 2 requirement.
Data Migration & Read Replicas
Weeks 23–28: Migrated from single PostgreSQL to a primary + 2 read replica setup. Separated reporting queries onto replicas, freeing the primary for transactional writes. Used pglogical for zero-downtime replication setup.
Legacy Decommission & Optimization
Weeks 29–34: Decommissioned remaining Rails routes, optimized Kubernetes resource allocation based on 6 weeks of production metrics, and implemented auto-scaling policies.
Key Features
What We Built
Event-Driven Transaction Pipeline
Replaced synchronous transaction processing with a Kafka-based event pipeline that handles 2.3M+ daily transactions with sub-50ms latency.
Technical Detail
Each transaction flows through a 3-stage pipeline: validation → processing → settlement. Kafka partitioning by account ID ensures ordering guarantees per account while allowing parallel processing across accounts. Dead letter queues with automatic retry handle transient failures.
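The per-account ordering guarantee comes down to a deterministic key-to-partition mapping. A simplified sketch, using FNV-1a as a stand-in for the murmur2 hash that Kafka's default partitioner actually applies:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps an account ID to a Kafka partition. Every event for a
// given account lands on the same partition, preserving per-account
// ordering, while events for different accounts process in parallel across
// partitions. FNV-1a stands in here for Kafka's default murmur2 hash.
func partitionFor(accountID string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(accountID))
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	p := partitionFor("acct-123", 12)
	fmt.Println(p == partitionFor("acct-123", 12)) // true: deterministic
	fmt.Println(p >= 0 && p < 12)                  // true: valid partition
}
```

Because the mapping is a pure function of the account ID, a consumer rebalance never interleaves two events for the same account.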
Zero-Downtime Deployment Pipeline
Canary deployments with automated rollback reduced deploy time from 40 minutes to 6 minutes and eliminated deployment-related outages entirely.
Technical Detail
ArgoCD manages GitOps-based deployments to EKS. Each deploy starts with 5% canary traffic, auto-promotes after 3 minutes if error rate stays below 0.1%, and auto-rolls back if p99 latency exceeds 200ms. The full rollout completes in 6 minutes.
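The promotion gates can be made explicit as a small decision function. In production these checks run inside ArgoCD's analysis, not in application code, and the "hold" branch (elevated errors but acceptable latency) is our assumption about behavior between the two documented thresholds:

```go
package main

import "fmt"

// canaryDecision encodes the gates described above: roll back when the
// canary's p99 latency exceeds 200ms, promote when its error rate stays
// below 0.1%. The "hold" branch is an assumption about the in-between case;
// the real policy lives in ArgoCD analysis templates.
func canaryDecision(errorRate, p99Millis float64) string {
	switch {
	case p99Millis > 200:
		return "rollback"
	case errorRate >= 0.001: // 0.1%
		return "hold"
	default:
		return "promote"
	}
}

func main() {
	fmt.Println(canaryDecision(0.0005, 150)) // promote
	fmt.Println(canaryDecision(0.0005, 250)) // rollback
}
```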
CQRS Read/Write Separation
Separated read and write paths to eliminate database contention during peak trading hours.
Technical Detail
Write operations hit the primary PostgreSQL instance through the Go transaction service. Read operations (dashboards, reports, partner queries) are served from eventually-consistent read replicas with a max lag of 150ms. The API gateway routes based on HTTP method and path prefix.
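The gateway's read/write split is essentially a function of HTTP method and path prefix. A hedged sketch — the prefixes are illustrative, not Meridian's actual routes:

```go
package main

import (
	"fmt"
	"strings"
)

// readPrefixes are the reporting paths served from replicas; the names
// here are placeholders for illustration.
var readPrefixes = []string{"/dashboards", "/reports", "/partners"}

// dbTarget mirrors the gateway rule: non-read requests always hit the
// primary; reads on reporting paths go to eventually-consistent replicas
// (max lag ~150ms); all other reads stay on the primary for consistency.
func dbTarget(method, path string) string {
	if method != "GET" && method != "HEAD" {
		return "primary"
	}
	for _, p := range readPrefixes {
		if strings.HasPrefix(path, p) {
			return "replica"
		}
	}
	return "primary"
}

func main() {
	fmt.Println(dbTarget("POST", "/transactions")) // primary
	fmt.Println(dbTarget("GET", "/reports/daily")) // replica
}
```

Keeping transactional reads on the primary sidesteps read-your-writes anomalies; only reporting traffic tolerates the 150ms replica lag.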
Distributed Tracing & Alerting
Full request tracing from API gateway through all microservices, with intelligent alerting that reduced mean time to detection from 23 minutes to under 90 seconds.
Technical Detail
Datadog APM with OpenTelemetry instrumentation. Every request gets a trace ID propagated through Kafka headers. PagerDuty integration with escalation policies based on severity. Custom Datadog monitors for transaction settlement SLAs.
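Propagating the trace through Kafka amounts to carrying the W3C `traceparent` header on each record. A stdlib-only sketch — real code would use the Kafka client's header type and OpenTelemetry's propagator API rather than raw strings:

```go
package main

import "fmt"

// kafkaHeader mirrors the key/value record headers Kafka supports; a real
// producer would use its client library's header type (sarama, kafka-go,
// etc.) together with OpenTelemetry's TextMapPropagator.
type kafkaHeader struct {
	Key   string
	Value []byte
}

// injectTrace attaches the W3C traceparent so consumers continue the same
// distributed trace that began at the API gateway.
func injectTrace(headers []kafkaHeader, traceparent string) []kafkaHeader {
	return append(headers, kafkaHeader{Key: "traceparent", Value: []byte(traceparent)})
}

// extractTrace recovers the traceparent on the consumer side; an empty
// string means the producer didn't propagate one.
func extractTrace(headers []kafkaHeader) string {
	for _, h := range headers {
		if h.Key == "traceparent" {
			return string(h.Value)
		}
	}
	return ""
}

func main() {
	// Example traceparent from the W3C Trace Context specification.
	h := injectTrace(nil, "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
	fmt.Println(extractTrace(h) != "") // true: trace survives the broker hop
}
```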
Tech Stack
Why We Chose What We Chose
Backend
Go 1.22
Goroutine concurrency model ideal for transaction processing. Faster hiring than Rust, with comparable performance for this workload.
Apache Kafka
Needed message replay for SOC 2 audit trails. RabbitMQ doesn't support replay without additional tooling.
PostgreSQL 16
Existing data model was relational. Migrating to a different database type would have doubled the project timeline.
gRPC
Inter-service communication needed strong typing and was latency-sensitive. REST was reserved for external-facing APIs only.
Infrastructure
AWS EKS
Client was already on AWS. EKS gave us managed Kubernetes without the overhead of self-hosting the control plane.
Terraform
Infrastructure as code for reproducible environments. Chose over Pulumi because the ops team already knew HCL.
ArgoCD
GitOps-based deployments with built-in canary analysis. Preferred over Flux for its UI and rollback capabilities.
PgBouncer
Connection pooling to handle 2.3M daily transactions without exhausting PostgreSQL's connection limit.
Observability
Datadog APM
End-to-end distributed tracing with Kafka span propagation. Client already had a Datadog contract.
PagerDuty
On-call rotation management with escalation policies. Integrated with Datadog for automated incident creation.
OpenTelemetry
Vendor-agnostic instrumentation. If the client switches from Datadog, the instrumentation stays.
CI/CD
GitHub Actions
Client's code was already on GitHub. Actions eliminated the need for a separate CI server.
Trivy
Container image scanning for CVE detection, required for SOC 2 compliance.
SonarQube
Static analysis for code quality gates. Blocks merges if coverage drops below 85%.
Impact
Before & After
Metric: Before → After
Deploy Frequency: 1x/week (Sunday nights) → 14x/week average
Deploy Duration: 40 minutes → 6 minutes
Incident Response (MTTD): 23 minutes → 87 seconds
Transaction Latency (p95): 890ms → 47ms
Monthly Outages: 3 average → 0 in last 6 months
Infrastructure Cost: $38K/month → $21.9K/month
Engineering Quality
How We Ship
Test Coverage
87% unit, 94% integration on critical transaction paths
CI/CD Pipeline
GitHub Actions — 11-minute full suite with parallelized test shards
Monitoring
Datadog APM + PagerDuty with 90-second MTTD
Deploy Frequency
14 deploys/week average via ArgoCD canary
“The team didn't just rewrite our stack — they understood why our old architecture failed under load and designed something that actually scales with our transaction volume. The strangler fig approach meant we never had to do a scary big-bang cutover. Our institutional clients noticed the latency improvement before we even told them about the migration.”
David Park
CTO, Meridian Financial
Ongoing
What's Next
Extracting the notification service (currently still in the Rails monolith)
Implementing multi-region failover for disaster recovery
Building a real-time fraud detection pipeline on the Kafka event stream