Quick Definition
Trunk based development (TBD) is a source control workflow where developers integrate small, frequent changes directly into a single mainline branch (the “trunk”), using feature flags or very short-lived branches to keep the trunk deployable.
Analogy: Think of a kitchen where everyone adds small ingredients to the pot frequently and tastes continuously so the final dish is always close to ready, versus cooking separate full dishes and merging them at the end.
Formal technical line: A continuous integration practice that emphasizes a single shared branch with frequent commits, automated builds, and fast feedback loops.
Common meanings:
- The most common meaning: Continuous integration on a single main branch with small commits and feature toggles.
- A release discipline: Using trunk as the continuously releasable artifact.
- A collaboration model: Emphasizing pair programming and frequent syncs.
- A deployment strategy component: Often paired with canary or progressive delivery.
What is trunk based development?
What it is:
- A development workflow where the main branch (trunk) is the central integration point and is kept in a releasable state.
- Developers commit small, incremental changes frequently (often multiple times per day).
- Feature flags, toggles, or branch-by-abstraction are used for incomplete work to avoid long-lived feature branches.
What it is NOT:
- Not equivalent to trunk-only without any gating; not a license to merge broken code.
- Not a replacement for proper review, testing, or CI/CD controls.
- Not the same as simply deleting feature branches; it requires cultural and automation changes.
Key properties and constraints:
- Short-lived changes: branches, if used, should live hours to a few days.
- Continuous integration: every commit triggers automated builds and tests.
- Trunk must be deployable: automated tests and gating ensure trunk quality.
- Feature management: robust toggles and rollout controls are required.
- Fast feedback: build/test/merge times must be short to avoid developer blocking.
- Permission and policy: lightweight gating with automated policy checks.
Where it fits in modern cloud/SRE workflows:
- Works well with immutable infrastructure and pipelines that build artifacts in CI and deploy via CD.
- Aligns with GitOps and declarative infrastructure flows where the trunk describes desired state.
- Enables SREs to focus on service-level metrics and reliability instead of long merge conflict firefights.
- Facilitates progressive delivery patterns like canaries and feature flags for safety.
Diagram description (text-only):
- Developers fork local workspaces and run local tests.
- Small change committed and pushed to trunk or a short-lived branch.
- CI pipeline runs unit tests, lint, build artifact, and publishes to registry.
- CD pipeline deploys to staging, runs integration tests and canary rollout to production.
- Feature flag controls exposure; monitoring and SLO checks gate further rollout.
- If rollback needed, either toggle off feature flag or roll forward with patch.
trunk based development in one sentence
A workflow that prioritizes frequent, small merges into a single shared mainline combined with automation and feature management to keep the mainline continuously deployable.
trunk based development vs related terms (TABLE REQUIRED)
| ID | Term | How it differs from trunk based development | Common confusion |
|---|---|---|---|
| T1 | GitFlow | Uses long-lived branches and release branches | Often thought of as CI-focused |
| T2 | Feature Branching | Long-lived branches per feature | People mix short feature branches with feature flagging |
| T3 | GitHub Flow | Similar but emphasizes pull requests for trunk | Assumed to always be trunk based |
| T4 | Trunk-only | Enforced direct commits to trunk with no branches | Sometimes used interchangeably with TBD |
| T5 | Release Branching | Branch per release for stabilization | Confused as safer alternative |
| T6 | Forking Workflow | Contributors work in forks and PRs to trunk | Mistaken as same as short-lived branches |
Row Details (only if any cell says “See details below”)
- None
Why does trunk based development matter?
Business impact:
- Faster time to market: Smaller changes integrate faster, reducing lead time for features and fixes.
- Reduced merge risk: Frequent integration minimizes large merge conflicts that delay releases.
- Increased customer trust: Faster bug fixes and incremental improvements maintain service reliability.
- Risk mitigation: Feature flags and progressive delivery reduce blast radius for new changes.
Engineering impact:
- Higher developer productivity: Less context switching and smaller review scopes.
- Faster recovery: Smaller change sets are easier to reason about when incidents occur.
- Velocity retention: Teams spend less time cleaning up integration debt and more time delivering value.
SRE framing:
- SLIs/SLOs become central acceptance gates for deployments.
- Error budgets drive release velocity; SREs and product teams negotiate acceptable risk.
- Toil reduction: Automation of CI/CD and gating reduces repetitive manual integration tasks.
- On-call impact: Smaller changes reduce cognitive load during incident triage but require reliable observability.
What breaks in production (realistic examples):
- A feature toggle was enabled in production by mistake, exposing unfinished API endpoints and causing increased 5xx errors.
- A small change to a shared library caused unexpected serialization regressions across multiple services.
- A build artifact with a transitive dependency update introduced latency spikes under load.
- A configuration drift between staging and production caused a dependency to fail post-deploy.
- An automated migration coupled with a partial rollout resulted in data inconsistency for a subset of users.
Where is trunk based development used? (TABLE REQUIRED)
| ID | Layer/Area | How trunk based development appears | Typical telemetry | Common tools |
|---|---|---|---|---|
| L1 | Edge and CDN | Config and small rules updated on trunk with staged rollout | Config deploy success, latency, cache hit | CI, CD, feature flags |
| L2 | Network | IaC changes via trunk with blue-green rollouts | Provision time, error rates | Terraform, pipelines |
| L3 | Service / API | Frequent small commits and toggles for new endpoints | Latency, error rate, throughput | CI, CD, observability |
| L4 | Application UI | Canary releases and feature flags for UX changes | Conversion, JS errors, load times | Feature flags, A/B tools |
| L5 | Data / DB | Migrations kept small and gated, trunk deploys migration steps | Migration success, slow queries | Migration tools, CI |
| L6 | Kubernetes | GitOps manifests in trunk and progressive rollout strategies | Pod restarts, rollout duration | GitOps, k8s, helm |
| L7 | Serverless | Small function updates deployed via trunk with traffic split | Invocation errors, cold starts | Managed cloud deploy tools |
| L8 | CI/CD | Pipelines defined in trunk as code, atomic changes | Pipeline success, duration | CI systems, pipeline-as-code |
| L9 | Observability | Dashboards and alerts updated in trunk for services | Alert rate, dashboard changes | Prometheus, tracing |
| L10 | Security | Policy-as-code in trunk with automated scans | Scan failures, policy violations | SAST, policy engines |
Row Details (only if needed)
- None
When should you use trunk based development?
When it’s necessary:
- Teams need rapid integration and deployment with continuous delivery goals.
- High change frequency where merge conflicts and integration debt are slowing delivery.
- Strong automation exists for CI, tests, and progressive delivery.
- SRE and product teams require tight control of release risk via feature flags and SLOs.
When it’s optional:
- Projects with infrequent releases and low collaboration overhead.
- Prototypes or experiments where separate branches are easier for isolated work.
- Teams lacking investment in automation and monitoring may opt for gradual adoption.
When NOT to use / overuse it:
- When regulations require isolated review and long stabilization windows for audit before merging (unless automation can satisfy audit).
- When a single commit can change many unrelated systems and there is no feature flagging or rollback mechanism.
- In monorepos without scalable CI, where builds take hours and block developers.
Decision checklist:
- If you have automated CI tests and < 15 minute CI feedback -> adopt TBD.
- If you need long stabilization windows for compliance -> consider gated release branches with trunk-style continuous integration.
- If team size is small and release cadence slow -> optional to adopt; evaluate costs.
- If using microservices with independent deploys and CI, TBD often improves velocity.
Maturity ladder:
- Beginner: Local testing, trunk with PRs, basic CI, manual toggles.
- Intermediate: Full CI, automated tests, basic feature flag system, canary deploys.
- Advanced: GitOps, progressive delivery, automated SLO gates, full observability and chaos testing.
Example decisions:
- Small team (5 developers): Enable trunk commits with mandatory CI and short-lived branches under 24 hours; use simple feature flags for risky features.
- Large enterprise (200+ devs): Use trunk with clear ownership, scoped directories, CI parallelization, feature flag orchestration, and SLO gates; maintain branch protections and policy-as-code.
How does trunk based development work?
Components and workflow:
- Developer creates a small change locally and runs local tests.
- Commit pushed to trunk or a short-lived branch that will be merged quickly.
- CI pipeline triggers unit tests, linting, and artifact build.
- Built artifact is stored/versioned in an artifact registry.
- CD pipeline deploys to staging; integration tests run automatically.
- Feature flag controls exposure; rollout begins as canary or percentage.
- Observability and SLO checks validate behavior; if green, progressively increase traffic.
- If issues, toggle off or patch quickly and continue.
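As a rough illustration, the rollout portion of this workflow can be sketched in Python. The `fetch_metrics`, `set_traffic`, and threshold values below are hypothetical placeholders for whatever your mesh, load balancer, and observability stack actually provide; this is a sketch of the gating logic, not a production controller.

```python
import time

# Hypothetical SLO check; in practice this would query your metrics backend.
def slo_healthy(metrics: dict, max_error_rate: float = 0.01) -> bool:
    """Return True if the canary's error rate is within budget."""
    return metrics["errors"] / max(metrics["requests"], 1) <= max_error_rate

def progressive_rollout(stages, fetch_metrics, set_traffic, bake_seconds=0):
    """Walk traffic through canary stages, aborting on an SLO breach."""
    for pct in stages:
        set_traffic(pct)
        time.sleep(bake_seconds)      # bake time before judging the stage
        if not slo_healthy(fetch_metrics()):
            set_traffic(0)            # abort: route all traffic off the canary
            return False
    return True
```

The important property is that every stage is judged against the same SLO check, and a breach at any stage sends traffic back to the stable version rather than continuing the rollout.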
Data flow and lifecycle:
- Code -> CI -> Artifact -> Registry -> CD -> Environment(s) -> Monitoring -> Feedback
- Feature flags control runtime behavior without requiring code rollback.
- Migrations are broken into backward-compatible steps coordinated by toggles and short-lived runbooks.
Edge cases and failure modes:
- Long-running database migrations that cannot be toggled require careful choreography.
- Shared library changes can cause cascading failures if not properly versioned.
- CI pipeline flakiness can block merges and erode developer trust.
- Feature flag debt where toggles are not removed increases complexity.
Practical examples (pseudocode commands, not in tables):
- Example CI step: run unit tests, build artifact, publish to registry, and run contract tests against staging.
- Feature toggle usage: check flag before executing new logic and default to safe fallback.
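A minimal sketch of the toggle pattern above, assuming a hypothetical in-memory flag store (a real deployment would use a feature flag service SDK, and the function names here are illustrative):

```python
# Minimal feature-flag guard with a safe default; a real system would use a
# flag service SDK instead of this hypothetical in-memory store.
FLAGS = {"new-fraud-check": False}    # default OFF keeps trunk deployable

def flag_enabled(name: str) -> bool:
    # Unknown flags resolve to off so unfinished code stays dark.
    return FLAGS.get(name, False)

def legacy_check(tx: dict) -> str:
    return "allow"

def new_fraud_check(tx: dict) -> str:
    # Incomplete logic can live on trunk because the flag keeps it dark.
    return "review" if tx.get("amount", 0) > 1000 else "allow"

def score_transaction(tx: dict) -> str:
    if flag_enabled("new-fraud-check"):
        return new_fraud_check(tx)    # new path, exposed only when flagged on
    return legacy_check(tx)           # safe fallback path
```

The key property is that unknown or unset flags resolve to the safe fallback, so half-finished code merged to trunk stays dark until deliberately enabled.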
Typical architecture patterns for trunk based development
- Microservice + Feature Flags: Use small services with independent deploys and centralized feature flag manager for rollout control.
- GitOps for Kubernetes: Manifests live in trunk; Git push triggers reconciliation by cluster controllers.
- Monorepo with CI Matrix: Single trunk with scoped builds and cached artifacts to reduce CI time.
- Branch-by-abstraction: Use abstraction layers to merge incomplete features safely without toggles.
- Serverless incremental rollout: Use traffic splitting in managed services combined with trunk-based deployment.
Failure modes & mitigation (TABLE REQUIRED)
| ID | Failure mode | Symptom | Likely cause | Mitigation | Observability signal |
|---|---|---|---|---|---|
| F1 | Broken trunk | CI failures after merge | Insufficient tests or flaky tests | Improve test coverage and gate merges | CI failure rate |
| F2 | Feature flag misconfig | Users see unfinished feature | Flag misconfiguration or missing guard | Harden flag checks and default to off | Spike in errors after rollout |
| F3 | Long CI time | Developers delayed by builds | Unoptimized pipelines or monorepo scale | Parallelize and cache builds | Pipeline duration metric |
| F4 | Shared lib regression | Multiple services fail | Unversioned lib changes | Publish semver artifact and pin versions | Error bursts across services |
| F5 | Migration outage | Data errors or downtime | Non-backward compatible migrations | Split migrations and use feature toggles | DB error rate and latency |
| F6 | Config drift | Production-only failures | Manual config changes outside trunk | Enforce GitOps and policy checks | Config drift alerts |
| F7 | Alert storms | On-call overload | Poor alert thresholds after deploy | Tune alerts and use grouping | Alert per deploy metric |
Row Details (only if needed)
- None
Key Concepts, Keywords & Terminology for trunk based development
- Trunk — The mainline branch in source control — Single integration point — Confusion with trunk-only
- Feature toggle — Runtime switch for features — Allows partial rollout — Toggle debt if not removed
- Short-lived branch — Branch lives hours to days — Minimizes merge conflicts — Becomes long-lived if workflow breaks
- Continuous integration — Automated build and test per commit — Reduces integration risk — Flaky CI undermines trust
- Continuous delivery — Deployable artifact for frequent release — Lowers lead time — Requires automated pipelines
- Canary deployment — Gradual traffic rollout to subset — Limits blast radius — Poor canary metrics cause blind spots
- Blue-green deployment — Swap between identical environments — Fast rollback path — Costlier resource footprint
- GitOps — Declarative desired state in Git — Single source of truth for infra — Needs reconciliation controllers
- Feature management — Governance around toggles — Controls exposure — Risk of inconsistent flag states
- Roll forward — Fix by applying new change rather than reverting — Faster when small changes — Risky without fast feedback
- Rollback — Revert to previous known good state — Recovery option — Heavy-handed if feature flags available
- Artifact registry — Stores built artifacts — Enables immutable deployments — Mismanagement can cause version confusion
- Semantic versioning — Versioning scheme for artifacts — Helps compatibility — Not always strictly followed
- Immutable infrastructure — Deploy new instances instead of modifying existing — Reduces configuration drift — Requires good CI
- Contract testing — Ensures service interfaces remain compatible — Prevents integration failures — Needs consumer/provider discipline
- Tracing — Distributed request visibility — Helps diagnose regressions — Overhead if not sampled
- Observability — Metrics, logs, traces combined — Essential for SLO-based gates — Incomplete signals cause blind spots
- SLI — Service level indicator — Measures behavior tied to user experience — Incorrect definition misleads
- SLO — Service level objective — Target for SLI to manage reliability — Too strict SLO blocks innovation
- Error budget — Allowable error within SLO — Balances speed vs reliability — Misused as a license for unsafe releases
- Progressive delivery — Controlled gradual rollouts — Aligns with TBD — Requires orchestration and metrics
- Git branch protection — Rules on who can merge — Prevents risky merges — Overly strict rules slow velocity
- Pull request — Code review mechanism — Enables collaboration — Can be misused as lengthening isolation
- Pipeline as code — CI/CD pipeline definitions in repo — Versioned with code — Misconfigured pipeline code risks builds
- Infrastructure as code — Declarative infra in version control — Enables review and drift detection — Requires secrets handling
- Feature flag orchestration — Tooling to manage flags centrally — Reduces human error — Adds another control plane to maintain
- Short-lived feature branch — Temporary isolated development — Allows work without immediate trunk changes — Must be short-lived
- Code owner reviews — Designated reviewers for specific files — Ensures domain expertise — Can bottleneck merges
- Pair programming — Two developers work on same change — Reduces defects — Resource allocation challenge
- Monorepo — Single repo for many services — Simplifies cross-service changes — Requires scaled CI
- Polyrepo — Many repos per service — Clear ownership — Cross-repo changes harder
- Backward compatible change — New code compatible with older consumers — Enables safe rollouts — Requires careful design
- Forward compatible change — Older code runs with newer dependencies — Harder to achieve but valuable for stepwise rollout
- Toggle lifecycle — Creation, use, and removal of a flag — Prevents tech debt — Skipping removal causes complexity
- Runbook — Operational steps for incidents — Actionable instructions — Poor maintenance renders them useless
- Playbook — Higher-level guidance for incidents — Contextual steps — Too generic to be actionable
- Chaos engineering — Controlled failure injection — Validates resilience — Needs guardrails to avoid harm
- Observability debt — Missing or poor metrics/traces — Hinders diagnosis — Invest in instrumentation early
- Dependency pinning — Locking library versions — Prevents surprise updates — May delay security fixes
- Drift detection — Identify config divergence — Prevents unexpected behavior — Needs continuous scans
- Security scanning — Automated vulnerability checks — Reduces risk — Requires pipeline integration
- Deployment window — Preferred time for releases — Balances users and risk — Rigid windows can slow fixes
- Auditability — Traceable history of changes — Necessary for compliance — Requires policy-as-code
How to Measure trunk based development (Metrics, SLIs, SLOs) (TABLE REQUIRED)
| ID | Metric/SLI | What it tells you | How to measure | Starting target | Gotchas |
|---|---|---|---|---|---|
| M1 | Lead time for changes | Time from commit to production | Time between commit and prod deploy | < 1 day for many teams | Monorepo builds can inflate this |
| M2 | Deployment frequency | How often code reaches production | Number of successful deploys per day/week | Multiple deploys per day | Frequency without quality is harmful |
| M3 | Change failure rate | Fraction of deploys causing failures | Incidents per deploy | < 5% initially | Depends on incident severity |
| M4 | Mean time to recovery | Time to restore service after failure | Time from incident start to resolution | < 1 hour desirable | On-call staffing affects this |
| M5 | CI pass rate | Percent of successful CI runs | Successful CI runs / total runs | > 95% | Flaky tests distort signal |
| M6 | Time to merge | Time PRs spend open before merge | Average PR age | < 24 hours | Review bottlenecks increase this |
| M7 | Feature flag coverage | Share of risky features behind flags | Count of features with flags | Aim high for risky changes | Over-flagging causes debt |
| M8 | SLI compliance | SLI within SLO over window | Query SLI over time window | SLO dependent | Misdefined SLI misleads |
| M9 | Build duration | Time CI takes to build and test | CI pipeline timing | 10-30 minutes | Heavy integration tests slow devs |
| M10 | Rollback rate | Fraction of deploys rolled back | Rollbacks per deploys | Low number targeted | Rollback may be rare when flags exist |
Row Details (only if needed)
- None
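As an illustration, M1 (lead time) and M2 (deployment frequency) can be derived from simple deploy records. The record shape below is hypothetical; real pipelines would pull these timestamps from CI/CD metadata.

```python
from datetime import datetime, timedelta

# Hypothetical deploy records: (commit_time, production_deploy_time) pairs.
def lead_times(records):
    """Lead time for changes (M1): commit-to-production durations."""
    return [deploy - commit for commit, deploy in records]

def deploys_per_day(records, window_days: int) -> float:
    """Deployment frequency (M2) over a window."""
    return len(records) / window_days

records = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 12)),   # 3h lead time
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 11)),  # 1h lead time
]
```

Attaching commit and deploy IDs to telemetry (as recommended elsewhere in this guide) is what makes these measurements cheap to automate.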
Best tools to measure trunk based development
Tool — CI/CD system (example: Git-based CI)
- What it measures for trunk based development: Build duration, pass rate, deployment frequency.
- Best-fit environment: Monorepo and polyrepo; cloud or on-prem CI.
- Setup outline:
- Define pipeline as code in repo.
- Run unit tests and lint on push.
- Build artifact and publish.
- Trigger CD on artifact publish.
- Strengths:
- Central control of build process.
- Integrates with SCM hooks.
- Limitations:
- Can be slow without parallelization.
- Requires maintenance for complex pipelines.
Tool — Feature flag system
- What it measures for trunk based development: Flag usage, exposure, user segments.
- Best-fit environment: Any environment requiring targeted rollouts.
- Setup outline:
- Integrate SDKs into services.
- Create flags and set defaults.
- Implement targeting and rollout rules.
- Strengths:
- Low-risk releases and quick rollback.
- Fine-grained control over traffic.
- Limitations:
- Operational overhead and toggle debt.
- Security needs for flag management.
Tool — Observability platform (metrics, tracing)
- What it measures for trunk based development: SLIs, latency, error rates, traces per request.
- Best-fit environment: Microservices and distributed apps.
- Setup outline:
- Instrument code for metrics and traces.
- Define SLI queries.
- Create dashboards and SLOs.
- Strengths:
- Deep insight into runtime behavior.
- Enables SLO gating for releases.
- Limitations:
- Storage and cost for high-cardinality traces.
- Requires consistent instrumentation.
Tool — GitOps controller
- What it measures for trunk based development: Manifest drift, deployment reconciliations, commit-to-deploy time.
- Best-fit environment: Kubernetes clusters.
- Setup outline:
- Store manifests in trunk.
- Configure controller to watch repo and apply changes.
- Monitor reconciliation status.
- Strengths:
- Declarative infrastructure and audit trail.
- Quick recovery via Git.
- Limitations:
- Needs RBAC and policy integration.
- Handling secrets requires care.
Tool — Artifact registry
- What it measures for trunk based development: Published artifact versions and promotion state.
- Best-fit environment: Any build-and-deploy pipeline.
- Setup outline:
- Configure CI to publish artifacts.
- Tag and promote artifacts through environments.
- Strengths:
- Immutable artifacts aid traceability.
- Enables rollbacks and reproducibility.
- Limitations:
- Storage costs and lifecycle management.
Recommended dashboards & alerts for trunk based development
Executive dashboard:
- Panels: Deployment frequency, lead time trend, overall SLO compliance, change failure rate.
- Why: High-level health indicators for business stakeholders.
On-call dashboard:
- Panels: Current active incidents, error rates by service, latency percentiles, recent deploys, alerts by severity.
- Why: Immediate triage context and correlation to recent changes.
Debug dashboard:
- Panels: Recent traces for failing requests, per-endpoint latency, recent deploys and commit IDs, feature flag states, DB query latency.
- Why: Provides detailed signals for restoring service.
Alerting guidance:
- Page (pager) vs ticket: Page only for severe, user-impacting SLO breaches or production outage. Create ticket for lower-severity degradations and non-urgent regressions.
- Burn-rate guidance: If error budget burn exceeds a defined multiple (e.g., 4x baseline), consider pausing risky changes; adjust thresholds per SLO.
- Noise reduction tactics: Deduplicate alerts by grouping using service and cluster tags, dedupe similar symptoms into single signals, suppress alerts during known maintenance windows.
Implementation Guide (Step-by-step)
1) Prerequisites
- Version control system with a trunk branch.
- Automated CI pipeline configured.
- Artifact registry available.
- Feature flag system or equivalent toggle mechanism.
- Observability (metrics, logs, traces) and SLO definitions.
- Security scans integrated in the pipeline.
2) Instrumentation plan
- Identify SLIs per service and user journey.
- Add metrics for request success, latency percentiles, and throughput.
- Instrument trace points for the critical path.
- Ensure logs are structured and correlate with request IDs.
3) Data collection
- Configure agents or SDKs to send metrics and traces to the observability platform.
- Set retention and sampling to balance cost and fidelity.
- Ensure deployment metadata (commit ID, deploy ID) is attached to telemetry.
4) SLO design
- Define SLIs that reflect user experience (e.g., 95th percentile latency).
- Set SLO windows (30 days / 14 days) and derive error budgets.
- Create burn-rate alert thresholds to manage release pacing.
5) Dashboards
- Build executive, on-call, and debug dashboards.
- Link deployment metadata to charts to correlate regressions with commits.
6) Alerts & routing
- Define alert rules for SLO breaches, deploy failures, and critical errors.
- Route alerts to the appropriate on-call teams and escalation policies.
- Implement dedupe and grouping rules.
7) Runbooks & automation
- Create runbooks for common issues and deploy regressions.
- Automate rollback steps or feature flag toggling where possible.
- Include runbook executor scripts for common remediation commands.
8) Validation (load/chaos/game days)
- Run load tests that mirror production patterns before wide rollout.
- Run chaos experiments on staging and on limited production canaries.
- Use game days to validate runbooks and on-call processes.
9) Continuous improvement
- Retrospect after incidents and releases.
- Remove stale feature flags and reduce toggle debt.
- Invest in CI performance and test reliability.
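To make the SLO design step concrete, the error budget implied by an SLO window is a one-line calculation; for example, a 99.9% availability SLO over 30 days leaves roughly 43.2 minutes of allowable downtime:

```python
# Error budget implied by an SLO over a window.
# e.g. a 99.9% availability SLO over 30 days -> ~43.2 minutes of downtime.
def error_budget_minutes(slo: float, window_days: int) -> float:
    return (1.0 - slo) * window_days * 24 * 60
```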
Checklists
Pre-production checklist:
- CI pipeline green for trunk and PRs.
- Unit and integration tests passing for the change.
- Feature flag created and default safe state set.
- SLI checks defined for impacted flows.
- Deployment plan and rollback/runbook documented.
Production readiness checklist:
- Artifact published and immutable.
- Gradual rollout strategy defined (percentage, stages).
- Monitoring dashboards ready and linked to deploy metadata.
- Alert routing checked and on-call person available.
- Backout toggle or rollback path validated.
Incident checklist specific to trunk based development:
- Identify whether the incident correlates with recent deploy ID or commit.
- Determine feature flag state for affected service and toggle off if safe.
- If rollback required, either roll artifact or deploy a hotfix to trunk.
- Capture telemetry and traces for root cause analysis.
- Run postmortem using deploy metadata to determine fix and prevent recurrence.
Examples (Kubernetes and managed cloud service)
- Kubernetes example:
  - What to do: Store manifests in trunk, use a GitOps controller, implement canary with a traffic-splitting service mesh, attach the deploy ID to pod labels.
  - What to verify: Reconciliation success, pod health, canary error rates under SLO thresholds.
  - What “good” looks like: Canary completes with stable SLIs and the full rollout completes automatically.
- Managed cloud service example (serverless):
  - What to do: Build the function artifact in CI, publish a version, use managed traffic-splitting to route 10% to the new version, attach deploy metadata.
  - What to verify: Invocation errors, cold start latency, downstream resource saturation.
  - What “good” looks like: No SLO violations during 10% traffic and a gradual increase to 100%.
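A ramp schedule like the 10%-then-gradual-increase pattern in the serverless example can be generated mechanically; the doubling factor below is an arbitrary illustration, not a recommendation:

```python
# Hypothetical ramp schedule: start small and roughly double until full traffic.
def ramp_stages(start_pct: int = 10, factor: int = 2):
    stages, pct = [], start_pct
    while pct < 100:
        stages.append(pct)
        pct *= factor
    stages.append(100)   # always finish at full traffic
    return stages
```

Each stage in the schedule should be held long enough for SLI checks to be meaningful before traffic advances.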
Use Cases of trunk based development
- Microservice API evolution – Context: Rapid API iteration across many small services. – Problem: Merge conflicts and increased rollback risk. – Why TBD helps: Small commits and feature flags enable safe evolution. – What to measure: Deployment frequency, change failure rate, API error rates. – Typical tools: CI, artifact registry, feature flags, observability.
- Frontend UI experiments – Context: Frequent UX A/B tests and incremental improvements. – Problem: Long-lived branches block other UI work. – Why TBD helps: Feature toggles and canaries reduce user impact. – What to measure: JS error rate, conversion metrics, rollout adoption. – Typical tools: Feature flags, analytics, Sentry-like error tracking.
- Database schema change – Context: Need to evolve schema without downtime. – Problem: Long locks and incompatible migrations. – Why TBD helps: Break migrations into backward-compatible steps and coordinate via toggles. – What to measure: Migration latency, DB error rate, query latency change. – Typical tools: Migration management, feature flags, observability.
- Kubernetes cluster config changes – Context: Adjust cluster autoscaling or operator updates. – Problem: Hard to test cluster-level changes safely. – Why TBD helps: GitOps and progressive rollout minimize risk. – What to measure: Pod restart count, node utilization, reconcile failures. – Typical tools: GitOps controller, kube metrics, service mesh.
- Serverless function updates – Context: Frequent changes to handlers and event logic. – Problem: Cold starts and throttling after mass deploys. – Why TBD helps: Traffic split and small deploys reduce impact. – What to measure: Invocation errors, latency p95, throttles. – Typical tools: Managed deploy tools, feature flags.
- Shared library rollout – Context: Core library update used by many services. – Problem: Breaking changes propagate quickly. – Why TBD helps: Feature flags and pinned versions allow controlled adoption. – What to measure: Failures across services, consumer compatibility. – Typical tools: Artifact registry, canary consumers, contract tests.
- Regulatory change deployment – Context: New compliance code required across services. – Problem: Need audit trail and controlled rollout. – Why TBD helps: Trunk provides a single source of truth and auditable commits. – What to measure: Compliance checks passed, deployment traceability. – Typical tools: Policy-as-code, CI scans, audit logging.
- Observability config updates – Context: Adding or modifying instrumentation config. – Problem: Broken dashboards or alert storms after changes. – Why TBD helps: Small changes reviewed and rolled out gradually with SLI checks. – What to measure: Alert rate, dashboard errors, telemetry volume. – Typical tools: Metrics platform, log pipeline, dashboard-as-code.
- Data pipeline change – Context: ETL transformation update impacting downstream reports. – Problem: Data loss or schema mismatch. – Why TBD helps: Small commits and staged rollout prevent large blast radius. – What to measure: Data lag, job errors, output schema validity. – Typical tools: CI for data jobs, feature flags, data quality checks.
- Security patching at scale – Context: Critical vulnerability requires fast patching. – Problem: Coordinating across many repos and services. – Why TBD helps: Fast commits to trunk combined with automated pipelines accelerate patch rollout. – What to measure: Patch deployment coverage, vulnerability scan results. – Typical tools: Dependency scanners, CI, artifact registry.
Scenario Examples (Realistic, End-to-End)
Scenario #1 — Kubernetes canary for user-facing API
Context: A payments API service needs a new fraud detection check.
Goal: Deploy new logic with minimal risk to payment success rates.
Why trunk based development matters here: Frequent commits and small changes reduce review friction; canary rollout and feature flags control exposure.
Architecture / workflow: Trunk contains service code and k8s manifests; GitOps controller applies manifests; service mesh manages traffic splitting; observability records SLIs.
Step-by-step implementation:
- Implement new fraud check behind flag.
- Commit to trunk and CI builds artifact.
- Publish artifact and update manifest in trunk.
- GitOps controller reconciles and creates new deployment.
- Split 5% traffic to canary via service mesh.
- Monitor SLI for payment success rate and latency.
- If stable over the monitoring window, increase to 25%, then 100%.
What to measure: Payment success SLI, error rate, latency p95, CPU/memory.
Tools to use and why: CI/CD, GitOps controller, service mesh, feature flag system, observability stack.
Common pitfalls: Flag not defaulting to a safe state; missing deploy metadata in telemetry.
Validation: Canary passes SLO checks for 30 minutes at each stage.
Outcome: New fraud detection rolled out safely with near-zero impact.
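Scenario #1 lists missing deploy metadata in telemetry as a common pitfall. A minimal sketch of stamping every emitted event with deploy metadata, so incidents can be correlated with the commit that shipped them (field names and values here are hypothetical):

```python
import json

# Hypothetical deploy metadata, injected at deploy time (e.g., via environment).
DEPLOY_META = {"commit_id": "abc1234", "deploy_id": "deploy-42"}

def emit_event(event: dict) -> str:
    """Serialize a log/metric event with deploy metadata attached."""
    return json.dumps({**event, **DEPLOY_META}, sort_keys=True)
```

With every event carrying `commit_id` and `deploy_id`, on-call responders can filter telemetry by deploy and immediately see whether an incident correlates with a recent trunk merge.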
Scenario #2 — Serverless managed-PaaS progressive rollout
Context: A serverless image-processing function with heavy traffic.
Goal: Reduce deployment risk and monitor cold-start impact.
Why trunk based development matters here: Trunk-based CI ensures rapid patching and consistent artifacts; traffic splitting controls rollouts.
Architecture / workflow: CI builds the function artifact and publishes a version; the cloud provider manages the traffic shift; logs and metrics correlate with the deploy ID.
Step-by-step implementation:
- Add change guarded by config flag.
- CI publishes new function version.
- Route 1% traffic to new version via provider traffic split.
- Monitor invocation errors and latency.
- If metrics are stable, increase traffic gradually.
What to measure: Invocation errors, cold-start latency p95, downstream queue depth.
Tools to use and why: Managed cloud deploy, feature flags, logging and metrics.
Common pitfalls: Throttling at downstream services; insufficient sampling of cold-start traces.
Validation: Function passes metric thresholds for 24 hours at each traffic increment.
Outcome: Safe rollout without full user impact.
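The first step, guarding the change behind a config flag, might look like the sketch below. The handler, the resize functions, and the `NEW_RESIZER_ENABLED` env var are all hypothetical names; the point is that both code paths ship on trunk and the flag defaults to the safe (old) path.

```python
import os

# Both paths live on trunk; a config flag (an env var here) picks one
# at runtime. All names are illustrative, not a provider API.

def resize_v1(event: dict) -> dict:
    return {"image": event["image"], "engine": "v1"}

def resize_v2(event: dict) -> dict:
    return {"image": event["image"], "engine": "v2"}

def use_new_resizer() -> bool:
    # Default to "false": the safe state if the config is missing.
    return os.environ.get("NEW_RESIZER_ENABLED", "false").lower() == "true"

def handle(event: dict) -> dict:
    return resize_v2(event) if use_new_resizer() else resize_v1(event)
```

Because the default is off, a missing or mis-deployed config cannot accidentally expose the new path.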
Scenario #3 — Postmortem after production incident
Context: A production outage after a trunk merge introduced a regression in serialization.
Goal: Restore service and prevent recurrence via process change.
Why trunk based development matters here: Small commits make rollback and triage faster; feature flagging would have reduced the blast radius.
Architecture / workflow: The trunk commit triggered a CI run that missed an integration contract test.
Step-by-step implementation:
- Identify commit ID from deploy metadata.
- Toggle off affected feature flag or roll forward with fix to trunk.
- Postmortem: root cause is missing contract test; add contract test to CI and require it before merge.
- Implement deployment SLO gating to block rollout on SLI degradation.
What to measure: Time to identify the faulty commit, time to recovery, number of similar regressions.
Tools to use and why: CI, tracing, contract test framework.
Common pitfalls: Missing deploy metadata makes traceability hard.
Validation: The new contract test prevents a similar regression over the next 30 days.
Outcome: Faster detection and improved CI coverage.
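The contract test the postmortem calls for can be as simple as a schema assertion run in CI on the producer's serialized output. The payload fields below are illustrative, not the actual service contract.

```python
import json

# Consumer-driven contract check: the serialized payment event must keep
# the fields (and types) the consumer relies on. Names are illustrative.

REQUIRED_FIELDS = {"payment_id": str, "amount_cents": int, "currency": str}

def serialize_payment(payment: dict) -> str:
    return json.dumps(payment)

def check_contract(serialized: str) -> list:
    """Return a list of contract violations (empty means the contract holds)."""
    data = json.loads(serialized)
    violations = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            violations.append(f"missing field: {field}")
        elif not isinstance(data[field], ftype):
            violations.append(f"wrong type for {field}")
    return violations
```

Wired into CI as a required check, a serialization regression like the one in this scenario fails the build before the commit reaches trunk.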
Scenario #4 — Cost-performance trade-off for a data pipeline
Context: ETL job costs spike during rollout of an optimized transformation.
Goal: Balance cost savings with throughput and latency.
Why trunk based development matters here: Small merges enable benchmarking and incremental rollout.
Architecture / workflow: Trunk contains pipeline code and job definitions; CI builds images; the scheduling system runs canary jobs on a sample dataset.
Step-by-step implementation:
- Implement the optimized transform behind a toggle so the legacy path remains the default.
- Run canary job on sample dataset in staging.
- Measure runtime cost, throughput, and output accuracy.
- If cost/performance acceptable, rollout to 10% of production data.
- Monitor job failures and data quality.
What to measure: Job runtime, cost per run, data correctness metrics.
Tools to use and why: CI, job scheduler, data quality tools, observability.
Common pitfalls: Optimizations cause numerical drift in results.
Validation: Data quality checks pass and cost drops as expected.
Outcome: Controlled cost optimization with acceptable accuracy.
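Checking output accuracy of the canary run can be sketched as a tolerance comparison against the baseline transform. The relative tolerance here is an assumption to tune per pipeline; the "numerical drift" pitfall above is exactly what it catches.

```python
# Compare canary (optimized) output against the baseline transform on a
# sample dataset, allowing bounded numeric drift. Tolerance is a
# per-pipeline assumption, not a universal default.

def outputs_match(baseline: list, candidate: list, rel_tol: float = 1e-6) -> bool:
    """True if candidate matches baseline element-wise within rel_tol."""
    if len(baseline) != len(candidate):
        return False
    return all(
        abs(b - c) <= rel_tol * max(abs(b), abs(c), 1.0)
        for b, c in zip(baseline, candidate)
    )
```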
Common Mistakes, Anti-patterns, and Troubleshooting
Format: Symptom -> Root cause -> Fix
- Symptom: CI failing frequently -> Root cause: Flaky tests -> Fix: Isolate and fix flaky tests, quarantine tests, add retry only after root cause fixed.
- Symptom: Long PR ages -> Root cause: Review bottleneck -> Fix: Expand code owners, use lightweight reviews, pair programming.
- Symptom: Feature visible to users prematurely -> Root cause: Flag default misconfigured -> Fix: Default feature flags to off in production, require review for flag enablement.
- Symptom: Rollback required often -> Root cause: Release without canary checks -> Fix: Implement progressive delivery and SLO gates before full rollout.
- Symptom: Multiple services break after lib update -> Root cause: Unpinned dependencies -> Fix: Use semver, release compatibility tests, consumer-driven contract tests.
- Symptom: High time to merge -> Root cause: Slow CI -> Fix: Parallelize tests, use cached artifacts, split heavy tests to nightly.
- Symptom: Production-only bug -> Root cause: Config drift -> Fix: Use GitOps for config, enforce drift detection.
- Symptom: Observability gaps during incident -> Root cause: Missing traces or metrics -> Fix: Add critical path instrumentation and request IDs.
- Symptom: Alert fatigue -> Root cause: Poor thresholding and duplicate alerts -> Fix: Tune thresholds, dedupe alerts, implement grouping.
- Symptom: Toggle debt accumulates -> Root cause: No lifecycle for flags -> Fix: Add TTLs and ownership for flags, remove stale flags during review.
- Symptom: Unauthorized config change -> Root cause: Direct cluster changes bypassing trunk -> Fix: Enforce policy and GitOps, restrict permissions.
- Symptom: Slow rollouts in monorepo -> Root cause: Full-repo builds for small change -> Fix: Implement path-based CI and partial builds.
- Symptom: Data inconsistency after migration -> Root cause: Non-backward compatible migration -> Fix: Use expand/contract pattern and toggles for schema changes.
- Symptom: Secrets leaked in logs -> Root cause: Improper logging config -> Fix: Mask secrets, enforce log sanitization.
- Symptom: Overly permissive merges -> Root cause: Weak branch protection -> Fix: Enforce CI pass and reviews before merge.
- Symptom: Lack of traceability for deploys -> Root cause: Missing deploy ID metadata -> Fix: Attach commit and deploy IDs to telemetry.
- Symptom: Build artifacts mismatched -> Root cause: Non-immutable artifact promotion -> Fix: Use immutable tags and promote artifacts between environments.
- Symptom: Slow incident response -> Root cause: Poor runbooks -> Fix: Create targeted runbooks with exact commands and verification steps.
- Symptom: Monitoring cost explosion -> Root cause: High-cardinality telemetry without sampling -> Fix: Apply sampling and reduce cardinality in instrumentation.
- Symptom: Developers avoid trunk -> Root cause: Fear of breaking trunk -> Fix: Improve CI reliability and add guard rails like feature toggles.
- Observability pitfall: Too many metrics without context -> Root cause: No defined SLIs -> Fix: Define SLI/SLO and focus instrumentation on them.
- Observability pitfall: High-cardinality metrics everywhere -> Root cause: Uncontrolled labels -> Fix: Limit label cardinality; aggregate where possible.
- Observability pitfall: Missing deploy correlation -> Root cause: No deploy metadata in spans/metrics -> Fix: Add deploy id tags to metrics and logs.
- Observability pitfall: Logs unstructured -> Root cause: Inconsistent logging libraries -> Fix: Standardize logging schema and enforce via linting.
- Observability pitfall: Alerts fire for expected spikes -> Root cause: No suppression for scheduled jobs -> Fix: Use maintenance windows and spike suppression rules.
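The toggle-debt fix above (TTLs and ownership for flags) can be automated with a small audit job that runs weekly and files removal tasks. The flag registry shape and the 90-day TTL are assumptions.

```python
from datetime import date, timedelta

# Stale-flag audit sketch: every flag carries an owner and a created
# date; flags older than the TTL are reported for removal. The registry
# format is an assumption, not a specific flag platform's API.

FLAG_TTL = timedelta(days=90)

def stale_flags(registry: list, today: date) -> list:
    """Return names of flags whose created date is older than the TTL."""
    return [
        f["name"] for f in registry
        if today - f["created"] > FLAG_TTL
    ]
```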
Best Practices & Operating Model
Ownership and on-call:
- Define ownership at service level; owners responsible for SLOs and release readiness.
- On-call rotates among engineers familiar with service; include release duty in rotation when high-risk changes occur.
Runbooks vs playbooks:
- Runbooks: Specific step-by-step procedures for incidents; keep concise and executable.
- Playbooks: Higher-level strategies and decision trees; useful for triage and escalation.
Safe deployments:
- Prefer canaries and progressive delivery.
- Automate rollback by toggling flags or promoting an earlier immutable artifact.
- Keep deploys small and frequent.
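Rollback by promoting an earlier immutable artifact can be sketched as picking the last known-good tag from the promotion history. The record shape is an assumption, not any registry's API.

```python
from typing import List, Optional

# The history is ordered oldest -> newest; the newest entry is the
# current (suspect) deploy. Redeploying the returned tag is the rollback.

def rollback_target(history: List[dict]) -> Optional[str]:
    """Return the most recent healthy tag before the current deploy,
    or None if there is nothing to roll back to."""
    for entry in reversed(history[:-1]):
        if entry["healthy"]:
            return entry["tag"]
    return None
```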
Toil reduction and automation:
- Automate repetitive test and deployment steps.
- Automate flag lifecycle tasks like flag removal reminders.
- Prioritize automation for build caching, artifact promotion, and deployment gating.
Security basics:
- Run SAST and dependency scans in CI before merge.
- Protect trunk with branch policies and automated policy-as-code checks.
- Treat feature flags and their management plane as security-sensitive.
Weekly/monthly routines:
- Weekly: Review failed builds, flaky tests, and open toggles.
- Monthly: Remove stale flags, review SLO attainment, and cleanup unused artifacts.
What to review in postmortems related to trunk based development:
- Which commit(s) introduced the issue and why traceability failed.
- Whether feature flags were used and how effective they were.
- CI pipeline gaps, test coverage issues, and recommendations for automation.
What to automate first:
- CI tests for trunk and PR gating.
- Automated artifact publishing and promotion.
- Feature flag creation and enforcement in pipeline.
- Attach deploy metadata to telemetry.
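Attaching deploy metadata to telemetry, the last item above, can be as small as stamping every emitted event with IDs injected at build time so an incident traces straight back to a trunk commit. The env-var names are illustrative, not a platform convention.

```python
import os

# Stamp every telemetry event with the commit and deploy IDs that the
# CI pipeline exported into the runtime environment.

def deploy_metadata() -> dict:
    return {
        "commit_id": os.environ.get("GIT_COMMIT", "unknown"),
        "deploy_id": os.environ.get("DEPLOY_ID", "unknown"),
    }

def emit(event: dict) -> dict:
    """Attach deploy metadata to an event before shipping it."""
    return {**event, **deploy_metadata()}
```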
Tooling & Integration Map for trunk based development
| ID | Category | What it does | Key integrations | Notes |
|---|---|---|---|---|
| I1 | CI System | Builds and tests commits | SCM, artifact registry, security scanners | Core for TBD |
| I2 | CD / GitOps | Deploys artifacts and reconciles manifest | CI, k8s, service mesh | Use for cluster automation |
| I3 | Feature Flags | Controls runtime behavior | SDKs, CI, observability | Manage flag lifecycle |
| I4 | Observability | Metrics, logs, traces | CI, deployments, feature flags | SLI/SLO source |
| I5 | Artifact Registry | Stores immutable artifacts | CI, CD | Promote artifacts between envs |
| I6 | Policy-as-code | Enforces compliance checks | SCM, CI | Gate merges and manifests |
| I7 | Contract Testing | Validates service contracts | CI, consumer repos | Prevent integration regressions |
| I8 | Dependency Scanners | Finds vulnerable dependencies | CI, artifact registry | Automate security checks |
| I9 | Service Mesh | Traffic control for canaries | CD, k8s, observability | Fine-grained rollout control |
| I10 | Runbook Executor | Automates runbook steps | On-call, CI | Reduces manual toil |
Frequently Asked Questions (FAQs)
How do I start adopting trunk based development?
Start small: ensure CI tests are reliable, add feature flags for risky changes, and require CI pass for merges. Run a pilot on one service.
How do I keep trunk deployable?
Enforce CI gating, automated tests, and small commits; use feature flags for incomplete work.
How do I handle database migrations with TBD?
Use the expand/contract pattern: add compatible columns first, deploy code that writes both formats, migrate data, then remove old schema.
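A minimal sketch of the expand phase: application code writes both the old and the new shape so either schema version can serve reads during the migration. The column names and the split logic are hypothetical.

```python
# Expand phase of expand/contract: the row carries the old column and
# the new columns at the same time, flipped on via a feature flag.

NEW_COLUMNS_ENABLED = True  # toggled via the flag system during rollout

def build_row(full_name: str) -> dict:
    row = {"name": full_name}          # old column, still read by old code
    if NEW_COLUMNS_ENABLED:
        first, _, last = full_name.partition(" ")
        row["first_name"] = first      # new columns added in expand phase
        row["last_name"] = last
    return row
```

Once all readers use the new columns, the contract phase drops the old column and the toggle.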
What’s the difference between trunk based and GitFlow?
GitFlow uses long-lived branches like develop and release; trunk based emphasizes a single mainline and frequent small merges.
What’s the difference between trunk based and GitHub Flow?
GitHub Flow also merges into a single main branch, but work happens on feature branches of any lifetime reviewed via pull request; trunk based development adds the constraint that branches stay short-lived (hours to a couple of days) and integration happens frequently.
What’s the difference between trunk based and trunk-only?
Trunk-only implies no branches at all; trunk based allows short-lived branches and flags but keeps trunk releasable.
How do I measure success with trunk based development?
Track lead time, deployment frequency, change failure rate, MTTR, and SLO compliance.
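Two of these metrics can be computed directly from a deploy log; a minimal sketch, assuming a simple record shape:

```python
# DORA-style metrics from a deploy log. The record shape (a
# "caused_incident" boolean per deploy) is an assumption.

def deployment_frequency(deploys: list, days: int) -> float:
    """Average deploys per day over the window."""
    return len(deploys) / days if days else 0.0

def change_failure_rate(deploys: list) -> float:
    """Fraction of deploys that caused an incident or rollback."""
    if not deploys:
        return 0.0
    failed = sum(1 for d in deploys if d["caused_incident"])
    return failed / len(deploys)
```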
How do I manage feature flags at scale?
Use a centralized flag management system, enforce ownership, and remove flags when no longer needed.
How do I prevent toggle debt?
Add TTLs and code-review checks to remove flags, and track flag inventory as part of backlog.
How do I avoid CI becoming a bottleneck?
Parallelize steps, use caching, run fast unit tests on push, and schedule heavy integration tests separately.
How do I make rollbacks safer?
Favor roll forward fixes and feature flag toggling; keep immutable artifacts and a documented rollback path.
How do I reduce noisy alerts after deploys?
Tune thresholds, group alerts, and align alerts with SLOs; suppress known maintenance windows.
How do I ensure compliance in trunk based workflows?
Integrate policy-as-code checks into CI and require audit logs for deploys and config changes.
How do I coordinate across multiple teams?
Use shared trunk conventions, scoped ownership, and standardized CI templates; create cross-team SLO agreements.
How do I keep feature flags secure?
Restrict access to flag management, log flag changes, and integrate flag approvals into workflow.
How do I train teams for TBD?
Run workshops on feature flags, CI best practices, and SLO-driven development; run game days to practice.
How do I know if trunk based development is right for my team?
Assess CI reliability, release frequency needs, and capacity to invest in feature management and observability.
Conclusion
Trunk based development is a practical, modern approach to source control and release management that reduces integration risk and enables faster, safer delivery when paired with automation, feature management, and observability. It requires cultural change, investment in CI/CD, and disciplined flag governance.
Next 7 days plan:
- Day 1: Audit current CI runtimes, flaky tests, and trunk health.
- Day 2: Identify one service to pilot short-lived branches and feature flags.
- Day 3: Add deploy metadata and basic SLIs for the pilot service.
- Day 4: Implement a feature flag for an upcoming change and merge to trunk.
- Day 5: Run a staged canary rollout and validate SLOs.
- Day 6: Review and remove any trivial toggle debt and document runbooks.
- Day 7: Hold a retro and define next sprint goals to scale adoption.
Appendix — trunk based development Keyword Cluster (SEO)
- Primary keywords
- trunk based development
- trunk based development workflow
- trunk based development guide
- trunk based development examples
- trunk based development vs gitflow
- trunk based development best practices
- trunk based development feature flags
- trunk based development CI CD
- trunk based development SLOs
- trunk based development GitOps
- Related terminology
- feature toggles
- feature flags lifecycle
- continuous integration practices
- continuous delivery pipelines
- canary deployment strategy
- blue green deployment pattern
- service mesh canary
- GitOps for Kubernetes
- trunk vs branch workflows
- short lived branches
- PR review for trunk
- semantic versioning in CI
- artifact registry management
- deploy metadata tagging
- SLI SLO error budget
- observability for releases
- tracing and deploy correlation
- monitoring deploy impact
- contract testing CI
- policy as code gating
- branch protection rules
- deployment frequency metric
- lead time for changes metric
- change failure rate metric
- mean time to recovery metric
- feature flag orchestration
- toggle debt management
- expand contract migration pattern
- immutable infrastructure pipelines
- pipeline as code examples
- build caching strategies
- parallel CI pipelines
- test flakiness mitigation
- runbooks for deploy rollback
- playbooks for incident triage
- chaos engineering game days
- on-call rotation for releases
- deployment runbook checklist
- staged rollout best practices
- traffic splitting techniques
- blue green vs canary comparison
- serverless progressive delivery
- Kubernetes GitOps workflows
- monorepo CI strategies
- polyrepo coordination tips
- dependency pinning in CI
- automated security scanning in pipeline
- audit logging for trunk changes
- deploy rollback automation
- telemetry tagging with commit ID
- observability debt reduction
- alert grouping and dedupe
- burn rate alerting guidance
- feature flag security controls
- flag default safety rules
- feature flag removal checklist
- flag ownership assignment
- SLO based deploy gating
- incremental database migration steps
- canary metrics to monitor
- artifact promotion workflow
- CI to CD handoff best practices
- GitOps reconciliation alerts
- manifest as code patterns
- helm and kustomize in trunk
- service level indicators examples
- SLO starting targets
- error budget policy examples
- observability dashboards for release
- executive release metrics
- on-call debug dashboard layout
- deploy correlating dashboards
- feature flag telemetry
- feature experimentation and A B testing
- release orchestration tools list
- trunk based development adoption plan
- trunk based development pilot checklist
- trunk based development maturity model
- trunk based development training plan
- trunk based development pitfalls
- trunk based development common mistakes
- trunk based development troubleshooting steps
- trunk based development incident response
- trunk based development postmortem checklist
- trunk based development security checks
- trunk based development compliance workflows
- trunk based development enterprise scale
- trunk based development for startups
- trunk based development for microservices
- trunk based development for serverless
- trunk based development for data pipelines
- trunk based development for frontend apps
- trunk based development for backend services
- trunk based development metrics dashboard
- trunk based development automation priorities
- trunk based development feature flagging patterns
- trunk based development Git workflow templates
- trunk based development CI best practices
- trunk based development CD best practices
- trunk based development observability checklist
- trunk based development SLI examples
- trunk based development SLO examples
- trunk based development alerting strategy
- trunk based development runbook examples
- trunk based development game day exercises
- trunk based development cost optimization tradeoff
- trunk based development performance tuning
- trunk based development rollback strategies
- trunk based development roll forward approach
- trunk based development release governance
- trunk based development cross-team coordination
- trunk based development tooling map
- trunk based development integration tests
- trunk based development contract tests
- trunk based development service ownership
- trunk based development observability pitfalls
- trunk based development test isolation
- trunk based development canary validation steps
- trunk based development blue green switch steps
- trunk based development feature gating rules
- trunk based deployment checklist
- trunk based release checklist
- trunk based migration pattern
- trunk based data pipeline strategy
- trunk based development serverless strategies
- trunk based development Kubernetes best practices
- trunk based development GitOps integration
- trunk based development artifact strategy
- trunk based development CI optimization
- trunk based development SLO driven development