Tutorial Oct 24, 2023 · By Alex Chen

Canary Deployments: The Safest Way to Ship to Production

A practical guide to incrementally rolling out changes and catching regressions before they hit your entire user base.

What is a canary deployment — and why the name?

A canary deployment is a risk-mitigation technique in which you release a new version of an application to a small subset of users before rolling it out to everyone. The name comes from the canary in a coal mine: if the canary dies, the miners know to get out, or at least to turn back.

Figure 1: Traffic flow in a typical canary rollout, showing the traffic split between the stable version and the canary version.

Unlike an all-at-once release, where every change affects the entire fleet immediately, a canary lets you validate behavior under real production traffic while only a small slice of users is exposed. If the new code introduces a memory leak, you'll see it in the canary metrics long before it can take down the rest of production.

Step-by-step guide

We recommend a 5% → 20% → 100% rollout strategy, validating key metrics at each step.

1. Traffic Split

Direct 5% of production traffic to the new version. The rest remains on the stable version. This is usually done via a load balancer or service mesh configuration.
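
To make the split concrete, here is a minimal sketch of the routing decision in Python. It assumes traffic is keyed by a user ID and hashed into 100 buckets, so each user consistently lands on the same version. The function names are hypothetical, and in practice your load balancer or service mesh makes this decision rather than application code.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent% of users, else "stable"."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route(u, 5) == "canary" for u in users) / len(users)
    print(f"canary share: {share:.1%}")
```

Because the bucket depends only on the user ID, a user who is in the canary group at 5% stays in it when you raise the percentage, which keeps their experience consistent during the rollout.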

2. Metric Observation

Monitor latency, error rates, and custom business metrics. Launchpad aggregates these in real time. Look for anomalies in the canary pod's logs and metrics compared to the stable baseline.
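
Comparing against the baseline can be sketched as follows. This is an illustration, not Launchpad's implementation: the `WindowStats` type, the minimum sample size, and the "twice the baseline" ratio are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowStats:
    """Request counts for one version over one observation window."""
    requests: int
    errors_5xx: int

    @property
    def error_rate(self) -> float:
        return self.errors_5xx / self.requests if self.requests else 0.0


def canary_is_anomalous(
    stable: WindowStats,
    canary: WindowStats,
    min_requests: int = 500,   # assumed: below this, the sample is too small to judge
    max_ratio: float = 2.0,    # assumed: canary may be at most 2x the baseline rate
    floor: float = 0.001,      # assumed: ignore differences below 0.1% absolute
) -> Optional[bool]:
    """Return True/False once there is enough data, or None if there is not."""
    if canary.requests < min_requests:
        return None
    allowed = max(stable.error_rate * max_ratio, floor)
    return canary.error_rate > allowed
```

Returning `None` for small samples matters at 5% traffic: a handful of errors on a few dozen requests says very little, so the check waits instead of raising a false alarm.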

3. Promote or Abort

If metrics look good, increase traffic to the next step, 20%. If an error spike occurs, Launchpad can automatically roll back. Once the 20% step also looks healthy, proceed to 100%.
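
The promote-or-abort loop can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `gate_passes` is a hypothetical callback standing in for whatever metric checks you run at each step, and a rollback is represented as sending 0% of traffic to the canary.

```python
from typing import Callable, List, Sequence, Tuple


def run_rollout(
    steps: Sequence[int],
    gate_passes: Callable[[int], bool],
) -> Tuple[str, List[int]]:
    """Walk through the traffic steps, aborting on the first failed gate.

    Returns the final status and the sequence of canary traffic percentages.
    """
    history: List[int] = []
    for percent in steps:
        history.append(percent)       # shift this share of traffic to the canary
        if not gate_passes(percent):  # validate key metrics at this step
            history.append(0)         # abort: send all traffic back to stable
            return "rolled_back", history
    return "promoted", history


if __name__ == "__main__":
    # Healthy release: every gate passes.
    print(run_rollout((5, 20, 100), lambda percent: True))
    # Regression that only shows up under more load, at the 20% step.
    print(run_rollout((5, 20, 100), lambda percent: percent < 20))
```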

Common failure modes and configuration

The most dangerous part of a canary rollout is the "blind spot": aggregate metrics look good while the user experience is degraded. Configure the following gates to catch these cases.

  • Latency Spikes: Set a 200ms threshold on p95 latency. If the canary's p95 exceeds it, pause the rollout immediately.
  • Error Rate Thresholds: Define an absolute limit (e.g., 0.5% 5xx errors). Launchpad's gate engine enforces this.
  • Database Connection Pool Exhaustion: Monitor connection counts. A count pinned at the pool limit points to exhaustion, while a sudden drop in connections on the new pod often means it's crashing under load.
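
The gates above can be sketched as a single check. This is an illustrative Python version, not Launchpad's gate engine: the function names, the nearest-rank percentile method, and the connection-pool check are assumptions for the example, while the 200ms and 0.5% limits come from the list above.

```python
from typing import List, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def failed_gates(
    latencies_ms: Sequence[float],
    requests: int,
    errors_5xx: int,
    open_connections: int,
    pool_size: int,
    p95_limit_ms: float = 200.0,
    error_rate_limit: float = 0.005,  # 0.5% expressed as a fraction
) -> List[str]:
    """Return the names of the gates the canary currently fails."""
    failures: List[str] = []
    if p95(latencies_ms) > p95_limit_ms:
        failures.append("latency_p95")
    if requests and errors_5xx / requests > error_rate_limit:
        failures.append("error_rate_5xx")
    if open_connections >= pool_size:  # every pooled connection is in use
        failures.append("db_pool_exhausted")
    return failures
```

An empty list means the canary passes all three gates. Any non-empty result is a reason to pause or roll back, and the gate names tell you which signal to investigate first.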

Code sample: Canary block

Here is a minimal configuration for a canary rollout on Launchpad. We use the canary stage to define the rollout strategy.

launchpad.yml
pipeline: canary-rollout
on: [push]

stages:
  - deploy:stable
  - canary:
      traffic: 5%            # share of production traffic sent to the canary
      metric: http_5xx       # metric the gate watches
      threshold: 0.5
      pause: 60s
      on_fail: rollback      # gate failed: roll back
      on_success: promote    # gate passed: promote
      notify: slack#alerts
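
If you want to sanity-check values like these before a deploy, a small validator helps. The sketch below is a hypothetical Python helper, not part of Launchpad: it assumes the canary block has already been loaded into a dictionary, that `traffic` is written as a percentage string, and that `pause` is a number of seconds with an `s` suffix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryStage:
    traffic_percent: float
    metric: str
    threshold: float
    pause_seconds: int
    on_fail: str
    on_success: str


def parse_canary(raw: dict) -> CanaryStage:
    """Validate the canary block and convert '5%' / '60s' into numbers."""
    traffic = str(raw["traffic"])
    pause = str(raw["pause"])
    if not traffic.endswith("%") or not pause.endswith("s"):
        raise ValueError("expected traffic like '5%' and pause like '60s'")
    percent = float(traffic[:-1])
    if not 0 < percent <= 100:
        raise ValueError("traffic must be above 0% and at most 100%")
    return CanaryStage(
        traffic_percent=percent,
        metric=str(raw["metric"]),
        threshold=float(raw["threshold"]),
        pause_seconds=int(pause[:-1]),
        on_fail=str(raw["on_fail"]),
        on_success=str(raw["on_success"]),
    )
```

Failing fast on a malformed value is cheaper than discovering mid-rollout that a gate was never armed.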

Comparison: Canary vs. Blue-Green vs. Rolling

Understanding the trade-offs helps you choose the right strategy for your stack.

Canary

Shifts traffic gradually. Best for building confidence step by step. More complex to implement at scale.

Blue-Green

Switch traffic instantly between two identical environments. Zero downtime, but double infrastructure costs.

Rolling

Update pods one by one. Simple to understand. Higher risk of partial failure affecting a subset of users.
