Deployment Guide

How to deploy the mpac-obs observability stack locally and to AWS.

Local Development

Prerequisites

  • Docker Desktop (macOS/Windows) or Docker Engine + Docker Compose v2 (Linux)
  • At least 4 GB of RAM allocated to Docker

Start the Stack

From the repository root:

bash
make up

Or manually:

bash
cd stack
docker compose up -d

Verify Health

bash
make status

All five services should show as "running" (or "healthy" for those with health checks):

| Service | Health Check URL |
|---|---|
| Prometheus | http://localhost:9090/-/healthy |
| Loki | http://localhost:3100/ready |
| Tempo | http://localhost:3200/ready |
| Alloy | http://localhost:12345/-/ready |
| Grafana | http://localhost:3000/api/health |
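The health checks in the table above can be scripted. The following is a minimal sketch rather than something shipped in the repository: the function name check_stack_health and the 5-second timeout are assumptions, while the URLs are the ones listed above.

```shell
# Hypothetical helper (not part of mpac-obs): probes each health URL listed above.
# Prints one OK/FAIL line per service and returns non-zero if any probe fails.
check_stack_health() {
  failed=0
  while read -r name url; do
    # --fail makes curl exit non-zero on HTTP errors; the 5 s timeout is an assumption
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done <<'EOF'
Prometheus http://localhost:9090/-/healthy
Loki http://localhost:3100/ready
Tempo http://localhost:3200/ready
Alloy http://localhost:12345/-/ready
Grafana http://localhost:3000/api/health
EOF
  return "$failed"
}
```

Running check_stack_health shortly after make up may report failures while containers are still starting; its exit status can be used to gate a follow-up script.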

Access Points

| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| Prometheus | http://localhost:9090 | - |
| Loki | http://localhost:3100 | - |
| Tempo | http://localhost:3200 | - |
| Alloy UI | http://localhost:12345 | - |

Stop the Stack

bash
make down

Clean Restart (Remove All Data)

bash
make clean
make up

AWS Deployment

Architecture

In AWS, the observability stack runs on ECS Fargate with the following topology:

  • Alloy: ECS Fargate service, receives OTLP from application services
  • Prometheus, Loki, Tempo: ECS Fargate services (or EC2 for Dev Hybrid)
  • Grafana: ECS Fargate service with persistent EFS volume

Service discovery is handled by AWS Cloud Map, providing DNS names like:

  • alloy.mpac-obs.{env}.local:4317
  • prometheus.mpac-obs.{env}.local:9090

Prerequisites

  • AWS CLI configured with appropriate credentials
  • Access to the deployment IAM role
  • The mpac-infra repository (contains CloudFormation templates)

Deploy via GitHub Actions

  1. Go to Actions -> CD - Deploy Observability Stack
  2. Open the Run workflow dropdown
  3. Select the target environment (dev / staging / prod)
  4. Optionally enable Dry run to validate without deploying
  5. Click the Run workflow button to start the run

Deploy Manually

bash
# Ensure AWS credentials are configured
aws sts get-caller-identity

# Deploy infrastructure (from mpac-infra repo)
cd mpac-infra/observability
make deploy ENV=dev

# Update configs (push to S3)
aws s3 sync stack/alloy/ s3://mpac-obs-configs-dev/alloy/
aws s3 sync stack/prometheus/ s3://mpac-obs-configs-dev/prometheus/
aws s3 sync stack/loki/ s3://mpac-obs-configs-dev/loki/
aws s3 sync stack/tempo/ s3://mpac-obs-configs-dev/tempo/

# Force a new deployment so the tasks restart with the updated config
# (repeat for each service whose config changed)
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --force-new-deployment
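The sync and redeploy steps above can be wrapped in a loop over all four services. This is a sketch under stated assumptions: the helper names are hypothetical, the ECS service names for Prometheus, Loki, and Tempo are assumed to match their config directory names (this guide only confirms alloy), and the aws ecs wait services-stable call is an addition that is not part of the documented procedure.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: pushes each config directory to S3, then restarts the service.
# Assumes bucket mpac-obs-configs-<env> and cluster mpac-obs-<env>, as in the commands above.
# Service names other than "alloy" are assumptions.
push_configs_and_redeploy() {
  env_name=$1
  for svc in alloy prometheus loki tempo; do
    run_or_print aws s3 sync "stack/$svc/" "s3://mpac-obs-configs-$env_name/$svc/" || return 1
    run_or_print aws ecs update-service --cluster "mpac-obs-$env_name" \
      --service "$svc" --force-new-deployment || return 1
    # Block until the deployment has settled before moving to the next service
    run_or_print aws ecs wait services-stable --cluster "mpac-obs-$env_name" \
      --services "$svc" || return 1
  done
}
```

Setting DRY_RUN=1 before calling push_configs_and_redeploy dev prints the twelve commands without touching AWS, which is a cheap way to review them first.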

Environment Configuration

| Parameter | Dev | Staging | Production |
|---|---|---|---|
| Prometheus retention | 14 days | 14 days | 30 days |
| Loki retention | 14 days | 14 days | 30 days |
| Tempo retention | 14 days | 14 days | 30 days |
| Alloy batch size | 1000 | 1000 | 2000 |
| Grafana replicas | 1 | 1 | 2 |

Rollback

To roll back to a previous configuration:

bash
# List previous task definition revisions
aws ecs list-task-definitions --family-prefix mpac-obs-alloy

# Update service to use previous revision
aws ecs update-service \
  --cluster mpac-obs-dev \
  --service alloy \
  --task-definition mpac-obs-alloy:<previous-revision>
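To avoid reading the revision number by eye, the previous revision can be derived from the one the service currently runs. The sketch below assumes that revisions are numbered consecutively and that the prior revision has not been deregistered; previous_task_def is a hypothetical helper name.

```shell
# Hypothetical helper: given a task definition ARN or family:revision,
# prints family:(revision - 1). Fails if there is no earlier revision.
previous_task_def() {
  current=${1##*/}          # drop the ARN prefix, leaving family:revision
  family=${current%:*}
  revision=${current##*:}
  case "$revision" in
    ''|*[!0-9]*) echo "not a family:revision value: $1" >&2; return 1 ;;
  esac
  if [ "$revision" -le 1 ]; then
    echo "no earlier revision for $family" >&2
    return 1
  fi
  echo "$family:$((revision - 1))"
}

# Example use (commented out because it calls AWS):
# current=$(aws ecs describe-services --cluster mpac-obs-dev --services alloy \
#   --query 'services[0].taskDefinition' --output text)
# aws ecs update-service --cluster mpac-obs-dev --service alloy \
#   --task-definition "$(previous_task_def "$current")"
```

If the revision immediately before the current one was never deployed, pick the intended revision from the list-task-definitions output instead.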

Troubleshooting

Alloy Not Receiving Data

  1. Check Alloy is running: curl http://localhost:12345/-/ready
  2. Verify the OTLP gRPC port is open: nc -zv localhost 4317
  3. Check Alloy logs: docker compose logs alloy
  4. Verify the bearer token matches between sender and receiver
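The first three checks above can be run in order with a short script that stops at the first failure. This is only a sketch: diagnose_alloy is a hypothetical name, the 5-second timeout and 20-line log tail are arbitrary choices, and the docker compose call assumes the script is run from the stack directory.

```shell
# Hypothetical helper: walks the local Alloy checks in order.
# Returns 1 if Alloy is not ready, 2 if the OTLP gRPC port is closed, 0 otherwise.
diagnose_alloy() {
  if ! curl --silent --fail --max-time 5 --output /dev/null \
      http://localhost:12345/-/ready; then
    echo "Alloy is not ready; recent logs follow:"
    # Assumes the current directory is stack/, where the compose file lives
    docker compose logs --tail 20 alloy
    return 1
  fi
  if ! nc -z localhost 4317 >/dev/null 2>&1; then
    echo "Alloy is ready, but OTLP gRPC port 4317 is not reachable"
    return 2
  fi
  echo "Alloy is ready and port 4317 is open; compare the bearer token next"
}
```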

Grafana Shows No Data

  1. Check datasource connectivity in Grafana: Settings -> Data Sources -> Test
  2. Verify Prometheus scrape targets are up: curl http://localhost:9090/api/v1/targets
  3. Verify Loki has data: curl http://localhost:3100/loki/api/v1/labels
  4. Check that services are sending telemetry (look for OTLP connection logs)
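The Prometheus targets response is verbose JSON, and each target in it carries a health field (up, down, or unknown). As a rough sketch, the hypothetical helper below counts targets reported as down using only grep; a JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: reads a /api/v1/targets response on stdin and
# prints how many targets report "health":"down".
count_down_targets() {
  # grep exits non-zero when nothing matches, so "|| true" keeps the pipeline clean
  { grep -o '"health"[[:space:]]*:[[:space:]]*"down"' || true; } | wc -l | tr -d ' '
}

# Example use against the local stack (commented out because it needs Prometheus running):
# curl --silent http://localhost:9090/api/v1/targets | count_down_targets
```

A non-zero count suggests Prometheus cannot scrape one or more targets, which would explain empty panels even when the Grafana datasource test passes.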

High Memory Usage

  • Loki: Reduce max_size_mb in embedded cache config
  • Prometheus: Reduce storage.tsdb.retention.time
  • Alloy: Reduce send_batch_size in batch processor

MPAC — MP-Solution Advanced Cloud Service