Deployment Guide
How to deploy the mpac-obs observability stack locally and to AWS.
Local Development
Prerequisites
- Docker Desktop (macOS/Windows) or Docker Engine + Docker Compose v2 (Linux)
- At least 4 GB of RAM allocated to Docker
Start the Stack
From the repository root:
```bash
make up
```
Or manually:
```bash
cd stack
docker compose up -d
```
Verify Health
```bash
make status
```
All five services should show as "running" (or "healthy" for those with health checks):
| Service | Health Check URL |
|---|---|
| Prometheus | http://localhost:9090/-/healthy |
| Loki | http://localhost:3100/ready |
| Tempo | http://localhost:3200/ready |
| Alloy | http://localhost:12345/-/ready |
| Grafana | http://localhost:3000/api/health |
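The endpoints in the table can also be probed in one pass. The following is a minimal sketch, assuming `curl` is installed; the function names `health_endpoints` and `check_stack_health` are illustrative and are not part of the repository's Makefile.

```bash
# Health endpoints copied from the table above (service name, then URL).
health_endpoints() {
  cat <<'EOF'
Prometheus http://localhost:9090/-/healthy
Loki http://localhost:3100/ready
Tempo http://localhost:3200/ready
Alloy http://localhost:12345/-/ready
Grafana http://localhost:3000/api/health
EOF
}

# Probe every endpoint and print one line per service.
# An endpoint counts as failed if it is unreachable, times out after
# 5 seconds, or answers with an HTTP status of 400 or above.
# Returns 1 if any endpoint failed, 0 otherwise.
check_stack_health() {
  failed=0
  while read -r name url; do
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "ok    $name"
    else
      echo "FAIL  $name ($url)"
      failed=1
    fi
  done <<EOF
$(health_endpoints)
EOF
  return "$failed"
}
```

After sourcing the snippet, running `check_stack_health` prints one status line per service, and its exit code can gate later steps in a script.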
Access Points
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| Tempo | http://localhost:3200 | - |
| Alloy UI | http://localhost:12345 | - |
Stop the Stack
```bash
make down
```
Clean Restart (Remove All Data)
```bash
make clean
make up
```
AWS Deployment
Architecture
In AWS, the observability stack runs on ECS Fargate with the following topology:
- Alloy: ECS Fargate service, receives OTLP from application services
- Prometheus, Loki, Tempo: ECS Fargate services (or EC2 for Dev Hybrid)
- Grafana: ECS Fargate service with persistent EFS volume
Service discovery is handled by AWS Cloud Map, providing DNS names like:
```
alloy.mpac-obs.{env}.local:4317
prometheus.mpac-obs.{env}.local:9090
```
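The naming pattern can be expressed as a small helper. This is only a sketch: `service_endpoint` is a hypothetical function name, and the pattern is inferred from the two example names above.

```bash
# Build a Cloud Map endpoint following the pattern
# <service>.mpac-obs.<env>.local:<port>.
service_endpoint() {
  service="$1"
  env_name="$2"
  port="$3"
  printf '%s.mpac-obs.%s.local:%s\n' "$service" "$env_name" "$port"
}
```

For example, `service_endpoint alloy dev 4317` prints `alloy.mpac-obs.dev.local:4317`.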
Prerequisites
- AWS CLI configured with appropriate credentials
- Access to the deployment IAM role
- The `mpac-infra` repository (contains CloudFormation templates)
Deploy via GitHub Actions
- Go to Actions -> CD - Deploy Observability Stack
- Click Run workflow to open the workflow options
- Select the target environment (dev / staging / prod)
- Optionally enable Dry run to validate without deploying
- Click Run workflow again to start the run
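The same workflow can probably be triggered from a terminal with the GitHub CLI. The sketch below assumes `gh` is installed and authenticated, and that the workflow's inputs are named `environment` and `dry_run`; those input names are guesses and should be checked against the workflow file. `valid_env` and `trigger_deploy` are hypothetical helper names.

```bash
# Accept only the three environments named in this guide.
valid_env() {
  case "$1" in
    dev|staging|prod) return 0 ;;
    *) return 1 ;;
  esac
}

# Trigger the deployment workflow by its display name.
# ASSUMPTION: the inputs are called "environment" and "dry_run".
trigger_deploy() {
  env_name="$1"
  dry_run="${2:-false}"
  if ! valid_env "$env_name"; then
    echo "unknown environment: $env_name" >&2
    return 2
  fi
  gh workflow run "CD - Deploy Observability Stack" \
    -f environment="$env_name" \
    -f dry_run="$dry_run"
}
```

With those assumptions, `trigger_deploy dev true` would request a dry run against dev, and an unrecognized environment is rejected before `gh` is called.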
Deploy Manually
```bash
# Ensure AWS credentials are configured
aws sts get-caller-identity
# Deploy infrastructure (from mpac-infra repo)
cd mpac-infra/observability
make deploy ENV=dev
# Update configs (push to S3); run from the mpac-obs repository root, where stack/ lives
aws s3 sync stack/alloy/ s3://mpac-obs-configs-dev/alloy/
aws s3 sync stack/prometheus/ s3://mpac-obs-configs-dev/prometheus/
aws s3 sync stack/loki/ s3://mpac-obs-configs-dev/loki/
aws s3 sync stack/tempo/ s3://mpac-obs-configs-dev/tempo/
# Trigger ECS service update
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --force-new-deployment
```
Environment Configuration
| Parameter | Dev | Staging | Production |
|---|---|---|---|
| Prometheus retention | 14 days | 14 days | 30 days |
| Loki retention | 14 days | 14 days | 30 days |
| Tempo retention | 14 days | 14 days | 30 days |
| Alloy batch size | 1000 | 1000 | 2000 |
| Grafana replicas | 1 | 1 | 2 |
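The per-environment values lend themselves to small lookup helpers. The sketch below is illustrative only: the function names are hypothetical, the bucket pattern `mpac-obs-configs-<env>` is generalized from the dev example in the manual deployment commands, and whether the stack actually passes retention to Prometheus as a command-line flag is an assumption.

```bash
# Command used for AWS calls; override with AWS=echo to preview without side effects.
AWS="${AWS:-aws}"

# Retention in days per environment, taken from the table above.
retention_days() {
  case "$1" in
    dev|staging)     echo 14 ;;
    prod|production) echo 30 ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Render the Prometheus retention flag for an environment.
prometheus_retention_flag() {
  days="$(retention_days "$1")" || return 1
  echo "--storage.tsdb.retention.time=${days}d"
}

# Sync each component's config directory to the environment's bucket.
# ASSUMPTION: buckets are named mpac-obs-configs-<env>.
sync_configs() {
  env_name="$1"
  for component in alloy prometheus loki tempo; do
    $AWS s3 sync "stack/$component/" "s3://mpac-obs-configs-$env_name/$component/"
  done
}
```

Running `AWS=echo sync_configs staging` prints the four `s3 sync` commands instead of executing them, which is a cheap way to review the targets before touching a bucket.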
Rollback
To roll back to a previous configuration:
```bash
# List previous task definition revisions
aws ecs list-task-definitions --family-prefix mpac-obs-alloy
# Update service to use previous revision
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --task-definition mpac-obs-alloy:<previous-revision>
```
Troubleshooting
Alloy Not Receiving Data
- Check Alloy is running: `curl http://localhost:12345/-/ready`
- Verify OTLP ports are open: `nc -zv localhost 4317`
- Check Alloy logs: `docker compose logs alloy`
- Verify the bearer token matches between sender and receiver
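One way to exercise the bearer token end to end is to send an empty OTLP/HTTP export and look at the status code. This is a sketch under stated assumptions: it presumes Alloy exposes an OTLP/HTTP receiver (4318 is the conventional port, while this guide only names 4317) and that the token is expected in an `Authorization: Bearer` header. `otlp_probe` is a hypothetical helper name.

```bash
# HTTP client; override with CURL=echo to preview the request without sending it.
CURL="${CURL:-curl}"

# POST an empty OTLP/HTTP trace export and print only the HTTP status code.
# ASSUMPTIONS: an OTLP/HTTP receiver is enabled and bearer-token auth is used.
otlp_probe() {
  base_url="$1"
  token="$2"
  $CURL -s -o /dev/null -w '%{http_code}\n' \
    -X POST "$base_url/v1/traces" \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d '{"resourceSpans":[]}'
}
```

If those assumptions hold, `otlp_probe http://localhost:4318 "$TOKEN"` would typically print `200` when the token is accepted and `401` when it is rejected; `000` means no connection was made.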
Grafana Shows No Data
- Check datasource connectivity in Grafana: Settings -> Data Sources -> Test
- Verify Prometheus has data: `curl http://localhost:9090/api/v1/targets`
- Verify Loki has data: `curl http://localhost:3100/loki/api/v1/labels`
- Check that services are sending telemetry (look for OTLP connection logs)
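The targets response is verbose, so a quick count of unhealthy scrape targets can help. The helper below is a rough sketch that text-matches the `health` field in the JSON; a JSON-aware tool such as `jq` would be more robust. `count_down_targets` is a hypothetical name.

```bash
# Count scrape targets reported as "down" in a /api/v1/targets response
# read from standard input. This is a plain text match, not a JSON parse.
count_down_targets() {
  grep -o '"health" *: *"down"' | wc -l | tr -d ' '
}
```

Typical use is `curl -s http://localhost:9090/api/v1/targets | count_down_targets`; any number above zero means at least one target is failing its scrape.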
High Memory Usage
- Loki: Reduce `max_size_mb` in embedded cache config
- Prometheus: Reduce `storage.tsdb.retention.time`
- Alloy: Reduce `send_batch_size` in batch processor
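Before tuning any of these settings locally, it helps to see which container is actually using the memory. A minimal sketch, assuming the stack runs under Docker as described in Local Development; `stack_memory` is a hypothetical helper name.

```bash
# Docker client; override with DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# Print a one-off memory snapshot for all running containers.
stack_memory() {
  $DOCKER stats --no-stream \
    --format 'table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}'
}
```

The output lists every running container, not only the observability stack, so look for the rows matching the five services.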