Deployment Strategy
Part of: MPAC SmartPOS Cloud Platform - Product Requirements
Version: 2.0
Last Updated: 2026-01-28
Overview
This document defines the deployment strategy for the MPAC SmartPOS Cloud Platform. The strategy emphasizes zero-downtime deployments, safe rollback mechanisms, and backward-compatible database migrations. The blue-green deployment pattern ensures that new versions can be thoroughly tested before switching production traffic, with instant rollback capability if issues arise.
Blue-Green Deployment
Purpose: Enable zero-downtime deployments with instant rollback capability.
Deployment Flow
Blue Environment (current production)
└─ Running v2.0.0
└─ Handling 100% production traffic
└─ Target Group: TG-Blue
Green Environment (new version)
└─ Deploy v2.1.0
└─ ECS tasks started with new image
└─ Health checks pass
└─ Run smoke tests
└─ Run integration tests
└─ If all pass:
   └─ Switch Route 53 weighted routing to Green (5% traffic)
   └─ Monitor for 15 minutes (error rates, latency, logs)
   └─ If stable: Increase to 50% traffic
      └─ Monitor for 30 minutes
      └─ If stable: Increase to 100% traffic
         └─ Monitor for 1 hour
         └─ If stable: Decommission Blue environment
   └─ If issues: Rollback to Blue (instant)
Deployment Phases
Phase 1: Green Environment Preparation
Provision Green Environment:
- Create new ECS task definitions with v2.1.0 image
- Start ECS tasks in Green target group
- Wait for health checks to pass (all tasks healthy)
Pre-Production Validation:
- Run smoke tests against Green environment
- Verify database connectivity
- Check external API integrations
- Validate authentication flows
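The validation steps above could be scripted as a single gate that must pass before any traffic shifts. The following is a minimal sketch, assuming each step is wrapped in a callable that returns True on success; the check names and the stand-in callables are hypothetical placeholders, not existing platform code.

```python
from typing import Callable, Dict, List, Tuple


def run_validation(checks: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every check and return (all_passed, names_of_failed_checks).

    A check that raises is treated as failed, so one broken probe
    cannot abort the rest of the validation run.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return (not failed, failed)


def _raise_timeout() -> bool:
    """Hypothetical probe that simulates an unresponsive external API."""
    raise TimeoutError("external API did not respond")


# Hypothetical stand-ins for real probes against the Green environment.
example_checks = {
    "smoke_tests": lambda: True,
    "database_connectivity": lambda: True,
    "external_api_integrations": _raise_timeout,
    "authentication_flows": lambda: True,
}

passed, failed_checks = run_validation(example_checks)
```

Collecting every failure, rather than stopping at the first one, gives the operator the full picture before deciding whether Green is fit to receive canary traffic.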
Phase 2: Traffic Shifting (Gradual Cutover)
5% Traffic (Canary):
- Update Route 53 weighted routing policy:
  Blue: Weight 95
  Green: Weight 5
- Monitor for 15 minutes:
  - Error rate < 0.1%
  - P95 latency < 500ms
  - No 5xx errors
  - Alert on anomalies
50% Traffic:
- Update Route 53 weighted routing policy:
  Blue: Weight 50
  Green: Weight 50
- Monitor for 30 minutes:
  - Compare Blue vs Green metrics
  - Check for memory leaks
  - Validate transaction success rates
100% Traffic (Full Cutover):
- Update Route 53 weighted routing policy:
  Blue: Weight 0
  Green: Weight 100
- Monitor for 1 hour:
  - Sustained performance
  - No anomalies in logs
  - Business metrics stable
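The three traffic stages can be read as a small state machine: apply weights, observe, then either advance or fall back to Blue. The sketch below illustrates that control flow only. `set_weights` and `is_stable` are hypothetical callbacks (for example, a Route 53 update and a metrics check that blocks for the monitoring window); they are assumptions, not existing platform functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Stage:
    green_weight: int     # Route 53 weight for Green; Blue gets 100 - green_weight
    monitor_minutes: int  # observation window before advancing


# Stages as listed above: 5% for 15 minutes, 50% for 30 minutes, 100% for 1 hour.
STAGES = (Stage(5, 15), Stage(50, 30), Stage(100, 60))


def run_cutover(
    set_weights: Callable[[int, int], None],
    is_stable: Callable[[Stage], bool],
    stages: Sequence[Stage] = STAGES,
) -> Tuple[bool, List[Tuple[int, int]]]:
    """Walk through the stages; roll back to Blue on the first unstable stage.

    Returns (succeeded, history), where history lists the (blue, green)
    weights that were applied, in order.
    """
    history: List[Tuple[int, int]] = []

    def apply(blue: int, green: int) -> None:
        set_weights(blue, green)
        history.append((blue, green))

    for stage in stages:
        apply(100 - stage.green_weight, stage.green_weight)
        if not is_stable(stage):
            apply(100, 0)  # rollback: all traffic back to Blue
            return False, history
    return True, history
```

Passing the callbacks in keeps the sequencing logic testable without AWS access; a dry run can record the weights instead of applying them.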
Phase 3: Blue Environment Decommission
- Wait Period: Keep Blue running for 1 hour after 100% cutover
- Verification: Confirm no rollback needed
- Decommission:
  - Stop Blue ECS tasks
  - Preserve Blue task definition for emergency rollback
  - Archive Blue environment configuration
Traffic Shifting Automation
CloudFormation Configuration (a Terraform setup would define the equivalent weighted records):
# Route 53 Weighted Routing Policy
Route53RecordBlue:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref HostedZoneId
    Name: api.mpac-cloud.com
    Type: A
    SetIdentifier: "blue-environment"
    Weight: !Ref BlueWeight # Initially 100
    AliasTarget:
      DNSName: !GetAtt ALBBlue.DNSName
      HostedZoneId: !GetAtt ALBBlue.CanonicalHostedZoneID
Route53RecordGreen:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref HostedZoneId
    Name: api.mpac-cloud.com
    Type: A
    SetIdentifier: "green-environment"
    Weight: !Ref GreenWeight # Initially 0
    AliasTarget:
      DNSName: !GetAtt ALBGreen.DNSName
      HostedZoneId: !GetAtt ALBGreen.CanonicalHostedZoneID
Monitoring During Deployment
Key Metrics to Watch:
Metrics:
  - ErrorRate:
      Threshold: "< 0.1%"
      Alert: "error_rate > 0.5%"
  - Latency:
      P50: "< 100ms"
      P95: "< 500ms"
      P99: "< 1000ms"
      Alert: "p95 > 1000ms"
  - Success Rate:
      Threshold: "> 99.9%"
      Alert: "success_rate < 99.5%"
  - Database Connections:
      Alert: "connection_errors > 10"
  - Memory Usage:
      Alert: "memory_usage > 85%"
Rollback Triggers
Automatic Rollback Conditions:
- Error rate > 1% for 5 consecutive minutes
- P95 latency > 2000ms for 5 consecutive minutes
- Critical service unavailable (database, Redis)
- More than 10 5xx errors in 1 minute
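The automatic conditions above can be evaluated over a rolling series of per-minute samples. The following is a sketch under stated assumptions: `MinuteSample` and its fields are hypothetical names for whatever the monitoring system exports, and "critical service unavailable" is reduced to a single boolean.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class MinuteSample:
    error_rate_pct: float       # percentage of failed requests in this minute
    p95_latency_ms: float
    count_5xx: int
    critical_services_up: bool  # database and Redis both reachable


def _sustained(flags: Sequence[bool], minutes: int) -> bool:
    """True if the most recent `minutes` samples all breached the limit."""
    return len(flags) >= minutes and all(flags[-minutes:])


def rollback_reason(samples: List[MinuteSample]) -> Optional[str]:
    """Return the first matching rollback reason, or None if none applies.

    `samples` is ordered oldest to newest, one entry per minute.
    """
    if not samples:
        return None
    latest = samples[-1]
    if not latest.critical_services_up:
        return "critical service unavailable"
    if latest.count_5xx > 10:
        return "more than 10 5xx errors in 1 minute"
    if _sustained([s.error_rate_pct > 1.0 for s in samples], 5):
        return "error rate > 1% for 5 consecutive minutes"
    if _sustained([s.p95_latency_ms > 2000 for s in samples], 5):
        return "P95 latency > 2000ms for 5 consecutive minutes"
    return None
```

The immediate conditions (service outage, 5xx burst) are checked on the latest sample only, while the sustained conditions require five consecutive breaching minutes, so a single noisy minute does not trigger a rollback.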
Manual Rollback Decision:
- Business metric anomalies (transaction volume drop)
- Customer reports of issues
- Security concerns
- Data integrity issues
Database Migrations
Purpose: Safe, backward-compatible schema changes that don't block deployments or require downtime.
Backward-Compatible Migrations Only
Principle: All migrations must support both old and new code running simultaneously during blue-green deployment.
Two-Phase Deploy
Phase 1: Additive Schema Changes
Deploy schema changes that ADD new structures without removing old ones:
Add new columns (nullable or with defaults):
-- Migration 001: Add new column
ALTER TABLE merchants ADD COLUMN external_id TEXT NULL;
-- Add index for new column
CREATE INDEX idx_merchants_external_id ON merchants(external_id);
Deploy code changes:
- New code v2.1.0 uses external_id column
- Old code v2.0.0 ignores external_id column
- Both versions work correctly
Gradual cutover:
- Blue (v2.0.0) and Green (v2.1.0) run simultaneously
- No conflicts because migration is additive
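To make the "both versions work" claim concrete, the sketch below applies the additive migration to an in-memory SQLite database and then writes rows the way each code version would. SQLite stands in for the production database purely for illustration, and the two insert functions are hypothetical simplifications of the v2.0.0 and v2.1.0 data access code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE merchants (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Migration 001 (additive): a nullable column plus its index.
conn.execute("ALTER TABLE merchants ADD COLUMN external_id TEXT")
conn.execute("CREATE INDEX idx_merchants_external_id ON merchants(external_id)")


def insert_merchant_v200(name: str) -> None:
    """Old code (v2.0.0): names its columns and never mentions external_id."""
    conn.execute("INSERT INTO merchants (name) VALUES (?)", (name,))


def insert_merchant_v210(name: str, external_id: str) -> None:
    """New code (v2.1.0): also writes the new column."""
    conn.execute(
        "INSERT INTO merchants (name, external_id) VALUES (?, ?)",
        (name, external_id),
    )


# Blue and Green write to the same table during the gradual cutover.
insert_merchant_v200("Blue Cafe")
insert_merchant_v210("Green Bakery", "ext-001")

rows = conn.execute(
    "SELECT name, external_id FROM merchants ORDER BY id"
).fetchall()
```

Rows written by the old code simply carry NULL in `external_id`. This only holds when the old code names its columns explicitly; an `INSERT` without a column list would start failing once the table gains a column.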
Phase 2: Deprecation and Cleanup
After full cutover, remove deprecated columns in a separate migration:
- Wait period: 24 hours after 100% traffic on Green
- Verify: No Blue environment running
- Deploy cleanup migration:
  -- Migration 002: Remove deprecated column (after v2.1.0 rollout)
  ALTER TABLE merchants DROP COLUMN old_column_name;
Migration Examples
Example 1: Renaming a Column
Avoid this (breaks old code):
-- BAD: Old code will break
ALTER TABLE payments
    RENAME COLUMN amount TO total_amount;
Do this instead (two-phase):
-- Phase 1: Add new column, populate from old column
ALTER TABLE payments
    ADD COLUMN total_amount DECIMAL(15, 2);
UPDATE payments
    SET total_amount = amount
    WHERE total_amount IS NULL;
-- Deploy new code that writes both amount and total_amount and reads total_amount
-- Old code still reads/writes amount only, so rows it writes during the cutover
-- have no total_amount; re-run the UPDATE above once Blue is stopped
-- Phase 2: After full cutover (24h later) and the final backfill
ALTER TABLE payments
    DROP COLUMN amount;
Example 2: Changing Column Type
Avoid this (breaks old code):
-- BAD: Type change breaks old code
ALTER TABLE devices
    ALTER COLUMN device_id TYPE UUID USING device_id::UUID;
Do this instead (two-phase):
-- Phase 1: Add new column with new type
ALTER TABLE devices
    ADD COLUMN device_uuid UUID;
-- Populate new column
UPDATE devices
    SET device_uuid = device_id::UUID
    WHERE device_uuid IS NULL;
-- Deploy new code that writes both columns and reads device_uuid
-- Old code continues using device_id; re-run the UPDATE above once Blue is stopped
-- Phase 2: After full cutover (24h later)
ALTER TABLE devices
    DROP COLUMN device_id;
-- Renaming back is itself a rename: do it only once no running code references device_uuid
ALTER TABLE devices
    RENAME COLUMN device_uuid TO device_id;
Example 3: Adding NOT NULL Constraint
Safe approach:
-- Phase 1: Add column as nullable with default
ALTER TABLE stores
    ADD COLUMN timezone TEXT DEFAULT 'Asia/Tokyo';
-- Backfill existing rows
UPDATE stores
    SET timezone = 'Asia/Tokyo'
    WHERE timezone IS NULL;
-- Phase 2: After verification (24h later)
ALTER TABLE stores
    ALTER COLUMN timezone SET NOT NULL;
Migration Tools
svc-portal (Python/Alembic)
# Create new migration
cd mpac-smartpos/svc-portal
uv run alembic revision --autogenerate -m "Add external_id to merchants"
# Review generated migration
# Edit migration to ensure backward compatibility
# Apply migration
uv run alembic upgrade head
# Rollback if needed
uv run alembic downgrade -1
svc-smarttab and mpac-pgw (Go/golang-migrate)
# Create new migration
cd mpac-smartpos/svc-smarttab
migrate create -ext sql -dir db/migrations -seq add_external_id_to_merchants
# Edit migration files:
# - 000001_add_external_id_to_merchants.up.sql
# - 000001_add_external_id_to_merchants.down.sql
# Apply migration
migrate -path db/migrations -database "postgresql://user:pass@localhost/smarttab" up
# Rollback if needed
migrate -path db/migrations -database "postgresql://user:pass@localhost/smarttab" down 1
Migration Validation
Pre-Deployment Checklist:
- [ ] Migration tested in local development environment
- [ ] Migration tested in staging with production-like data volume
- [ ] Rollback migration tested and verified
- [ ] Both old and new code tested with migration applied
- [ ] No DROP statements for columns still in use
- [ ] No ALTER TYPE without dual-column approach
- [ ] Indexes added for new columns
- [ ] Migration time estimated (< 5 minutes for production)
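Two checklist items (no premature DROP, no in-place type change) lend themselves to an automated pre-merge check. The following is a rough sketch of such a check, assuming migrations are plain SQL text; the function names are hypothetical, and the regular expressions are a heuristic rather than a SQL parser. A flagged statement is a prompt for human review, since Phase 2 cleanup migrations legitimately contain DROP COLUMN.

```python
import re
from typing import List

# Statements that are unsafe while Blue and Green run side by side.
_RISKY_PATTERNS = [
    (re.compile(r"\bDROP\s+COLUMN\b", re.IGNORECASE), "DROP COLUMN"),
    (re.compile(r"\bRENAME\s+COLUMN\b", re.IGNORECASE), "RENAME COLUMN"),
    (
        re.compile(r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b", re.IGNORECASE),
        "ALTER COLUMN ... TYPE",
    ),
]


def strip_sql_comments(sql: str) -> str:
    """Remove `-- ...` line comments so commented-out SQL is not flagged."""
    return re.sub(r"--[^\n]*", "", sql)


def find_risky_statements(sql: str) -> List[str]:
    """Return labels of risky statement kinds found in the migration text."""
    body = strip_sql_comments(sql)
    return [label for pattern, label in _RISKY_PATTERNS if pattern.search(body)]
```

For example, the BAD rename from Example 1 is reported as `["RENAME COLUMN"]`, while the additive Migration 001 produces an empty list.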
Rollback Procedure
Purpose: Quick recovery from failed deployments or issues discovered post-deployment.
Instant Rollback (Traffic Shift)
When to Use: Issues detected during or immediately after deployment.
Steps:
Detect Issue:
- Automated alerts (error rate, latency)
- Manual identification (customer reports, monitoring)
Initiate Rollback:
# Update Route 53 weights back to Blue
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123456 \
  --change-batch file://rollback-to-blue.json
Contents of rollback-to-blue.json (the Route 53 API requires EvaluateTargetHealth on alias targets):
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.mpac-cloud.com",
        "Type": "A",
        "SetIdentifier": "blue-environment",
        "Weight": 100,
        "AliasTarget": {
          "DNSName": "blue-alb.us-east-1.elb.amazonaws.com",
          "HostedZoneId": "Z1234567890ABC",
          "EvaluateTargetHealth": false
        }
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "api.mpac-cloud.com",
        "Type": "A",
        "SetIdentifier": "green-environment",
        "Weight": 0,
        "AliasTarget": {
          "DNSName": "green-alb.us-east-1.elb.amazonaws.com",
          "HostedZoneId": "Z1234567890ABC",
          "EvaluateTargetHealth": false
        }
      }
    }
  ]
}
Verify Rollback:
- Monitor error rates drop
- Verify traffic routing to Blue
- Check health metrics stabilize
Post-Rollback Actions:
- Investigate root cause
- Fix issues in Green environment
- Test fix in staging
- Retry deployment when ready
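Timeline: 30-60 seconds for traffic to fully shift back to Blue, because Route 53 serves alias records that point to a load balancer with a 60-second TTL. Clients or resolvers that cache DNS answers longer may keep reaching Green until their cache expires.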
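Rather than hand-editing the change batch for every weight change, the file could be generated. The sketch below builds the same structure as rollback-to-blue.json for arbitrary weights. The ALB DNS names and hosted zone ID are the placeholder values from the example above, and `build_change_batch` is a hypothetical helper, not an existing tool.

```python
import json
from typing import Any, Dict

RECORD_NAME = "api.mpac-cloud.com"

# Placeholder ALB details taken from the example above; real values differ.
ALB_TARGETS = {
    "blue-environment": ("blue-alb.us-east-1.elb.amazonaws.com", "Z1234567890ABC"),
    "green-environment": ("green-alb.us-east-1.elb.amazonaws.com", "Z1234567890ABC"),
}


def build_change_batch(blue_weight: int, green_weight: int) -> Dict[str, Any]:
    """Build a Route 53 change batch that sets both weighted records."""
    weights = {"blue-environment": blue_weight, "green-environment": green_weight}
    changes = []
    for set_id, weight in weights.items():
        if not 0 <= weight <= 255:  # Route 53 accepts weights from 0 to 255
            raise ValueError(f"weight for {set_id} out of range: {weight}")
        dns_name, zone_id = ALB_TARGETS[set_id]
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": RECORD_NAME,
                "Type": "A",
                "SetIdentifier": set_id,
                "Weight": weight,
                "AliasTarget": {
                    "DNSName": dns_name,
                    "HostedZoneId": zone_id,
                    # Required by the Route 53 API for alias records.
                    "EvaluateTargetHealth": False,
                },
            },
        })
    return {"Changes": changes}


# Full rollback: all traffic to Blue, none to Green.
rollback_json = json.dumps(build_change_batch(100, 0), indent=2)
```

Writing `rollback_json` to a file yields an input for `--change-batch file://...`; the same helper covers the 95/5 and 50/50 stages.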
Database Migration Rollback
When to Use: Schema changes caused issues or need to be reverted.
Important: Only rollback migrations if:
- No data has been written to new columns
- Old schema is still compatible
- Tested rollback in staging first
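The first precondition, that nothing has been written to the new columns, can be verified with a query before running a downgrade. The following is a minimal sketch using an in-memory SQLite database as a stand-in for the production database; `new_column_is_unused` is a hypothetical helper name.

```python
import sqlite3


def new_column_is_unused(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """True if no row holds a non-NULL value in `column`.

    `table` and `column` are interpolated into the SQL text, so they are
    restricted to plain identifiers and must come from migration metadata,
    never from user input.
    """
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NOT NULL"
    (count,) = conn.execute(query).fetchone()
    return count == 0


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchants (id INTEGER PRIMARY KEY, name TEXT, external_id TEXT)"
)
conn.execute("INSERT INTO merchants (name) VALUES ('Blue Cafe')")
safe_before_write = new_column_is_unused(conn, "merchants", "external_id")

# Once new code has written to the column, dropping it would lose data.
conn.execute("UPDATE merchants SET external_id = 'ext-001'")
safe_after_write = new_column_is_unused(conn, "merchants", "external_id")
```

When the check returns False, a forward-fix is usually the safer route, because the downgrade would discard the values already written.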
Steps:
Stop Green Environment:
# Scale down Green ECS tasks to 0
aws ecs update-service \
  --cluster mpac-prod \
  --service svc-portal-green \
  --desired-count 0
Rollback Migration (svc-portal):
cd mpac-smartpos/svc-portal
uv run alembic downgrade -1
Rollback Migration (svc-smarttab):
cd mpac-smartpos/svc-smarttab
migrate -path db/migrations -database "$DB_URL" down 1
Verify Database State:
-- Check schema version
SELECT * FROM alembic_version;   -- svc-portal
SELECT * FROM schema_migrations; -- svc-smarttab
-- Verify tables (psql meta-command)
\d merchants
Restart Blue Environment:
- Ensure Blue uses old code compatible with rolled-back schema
Warning: Database rollbacks are risky. Prefer forward-fixes when possible.
Feature Flag Rollback
When to Use: Disable specific features without full deployment rollback.
Mechanism:
# Feature flag service (LaunchDarkly or custom)
from feature_flags import is_enabled

@app.post("/v1/payments/create")
async def create_payment(request: PaymentRequest):
    if is_enabled("new_payment_flow", user=request.user):
        return await new_payment_flow(request)
    else:
        return await legacy_payment_flow(request)
Rollback Steps:
Identify Problematic Feature:
- Correlate errors with feature flag usage
Disable Feature Flag:
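# Via LaunchDarkly UI or API (semantic patch; "production" is an assumed environment key)
curl -X PATCH https://app.launchdarkly.com/api/v2/flags/default/new_payment_flow \
  -H "Authorization: api-key-xyz" \
  -H "Content-Type: application/json; domain-model=launchdarkly.semanticpatch" \
  -d '{"environmentKey": "production", "instructions": [{"kind": "turnFlagOff"}]}'
Verify Rollback: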
- Monitor metrics improve
- Confirm feature disabled for all users
Timeline: Instant (< 5 seconds for flag propagation)
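For the "custom" option, the flag lookup can be very small. The sketch below shows one possible in-memory shape for it, assuming flags are either off, on for everyone, or on for a listed set of users; `FlagStore` and the user IDs are hypothetical, and a real service would persist and distribute this state.

```python
from typing import Dict, Optional, Set


class FlagStore:
    """In-memory feature flag store, for illustration only."""

    def __init__(self) -> None:
        self._enabled: Dict[str, bool] = {}
        self._allowed_users: Dict[str, Optional[Set[str]]] = {}

    def set_flag(self, name: str, enabled: bool, users: Optional[Set[str]] = None) -> None:
        """Enable or disable a flag; `users` limits an enabled flag to those IDs."""
        self._enabled[name] = enabled
        self._allowed_users[name] = set(users) if users is not None else None

    def is_enabled(self, name: str, user: Optional[str] = None) -> bool:
        # Unknown or disabled flags fall back to the legacy code path.
        if not self._enabled.get(name, False):
            return False
        allowed = self._allowed_users.get(name)
        return allowed is None or user in allowed


flags = FlagStore()
flags.set_flag("new_payment_flow", True, users={"merchant-42"})
canary_user_sees_new_flow = flags.is_enabled("new_payment_flow", user="merchant-42")
other_user_sees_new_flow = flags.is_enabled("new_payment_flow", user="merchant-7")

# Rollback: a single write turns the feature off for everyone.
flags.set_flag("new_payment_flow", False)
after_rollback = flags.is_enabled("new_payment_flow", user="merchant-42")
```

Defaulting unknown flags to off means a missing or mistyped flag name routes requests to the legacy flow instead of the untested one.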
Rollback Decision Matrix
| Issue Type | Rollback Method | Timeline | Risk |
|---|---|---|---|
| High Error Rate | Traffic shift to Blue | 30-60s | Low |
| Performance Degradation | Traffic shift to Blue | 30-60s | Low |
| Feature Bug | Feature flag disable | < 5s | Very Low |
| Database Corruption | Database rollback + traffic shift | 5-10min | High |
| Security Vulnerability | Immediate traffic shift | 30-60s | Low |
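If rollback is driven by automation or a runbook tool, the matrix can be encoded as a lookup so that alerts map to a consistent first response. This is a sketch only: the issue-type keys and step names are hypothetical identifiers transcribed from the rows above, and timelines are deliberately left out.

```python
from typing import Dict, List, Tuple

# (ordered rollback steps, risk level) per issue type, transcribed from the matrix.
ROLLBACK_MATRIX: Dict[str, Tuple[List[str], str]] = {
    "high_error_rate": (["traffic_shift_to_blue"], "low"),
    "performance_degradation": (["traffic_shift_to_blue"], "low"),
    "feature_bug": (["feature_flag_disable"], "very_low"),
    "database_corruption": (["database_rollback", "traffic_shift_to_blue"], "high"),
    "security_vulnerability": (["traffic_shift_to_blue"], "low"),
}


def plan_rollback(issue_type: str) -> Tuple[List[str], str]:
    """Look up rollback steps and risk; unknown issue types need a human decision."""
    try:
        return ROLLBACK_MATRIX[issue_type]
    except KeyError:
        raise ValueError(
            f"no rollback rule for {issue_type!r}; escalate manually"
        ) from None
```

Raising on unknown issue types, instead of guessing a default, keeps cases the matrix does not cover in the manual rollback decision path.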
Cross-References
Related Sections
- AWS Infrastructure - Infrastructure components used in deployment
- Environments - Environment-specific deployment configurations
- CI/CD Pipeline - Automated deployment execution
Related Technical Sections
- Database Architecture - Schema design patterns
- Security Architecture - Security considerations during deployment