# mpac-obs Observability Architecture

## Overview
mpac-obs is the centralized observability stack for the MPAC platform, providing:
- Logs - Aggregated via Loki
- Metrics - Stored in Prometheus
- Traces - Distributed tracing via Tempo
## Deployment Topology

### Where is Alloy Deployed?

Alloy runs inside the observability cluster (mpac-obs), **not** as sidecars in the application clusters.
```
+-----------------------------------------------------------------------------+
|                             APPLICATION CLUSTERS                            |
+------------------------+------------------------+---------------------------+
|    mpac-pgw cluster    | mpac-smartpos cluster  |      other clusters       |
|   +------------+       |   +------------+       |                           |
|   |  mpac-pgw  |       |   | svc-portal |       |                           |
|   |   :8080    |       |   |   :8002    |       |                           |
|   +-----+------+       |   +-----+------+       |                           |
|         |              |         |              |                           |
|   +-----+------+       |         |              |                           |
|   |svc-smarttab|       |         |              |                           |
|   |   :8081    |       |         |              |                           |
|   +-----+------+       |         |              |                           |
+---------+--------------+---------+--------------+---------------------------+
          |                        |
          |    OTLP (4317/4318)    |
          v                        v
+-----------------------------------------------------------------------------+
|                       MPAC-OBS CLUSTER (Observability)                      |
|                                                                             |
|   +-------------------------------------------------------------------+    |
|   |                         ALLOY (Collector)                         |    |
|   |                    :4317 (gRPC)   :4318 (HTTP)                    |    |
|   |   +-----------+    +-----------+    +-----------+                 |    |
|   |   |   OTLP    |    |   Batch   |    |  Export   |                 |    |
|   |   | Receiver  |--->| Processor |--->|  Router   |                 |    |
|   |   +-----------+    +-----------+    +-----+-----+                 |    |
|   +-------------------------------------------+-----------------------+    |
|                                               |                             |
|            +--------------------------+-------+----------------+            |
|            |                          |                        |            |
|            v                          v                        v            |
|  +-------------------+     +-------------------+     +-------------------+  |
|  |    PROMETHEUS     |     |       LOKI        |     |       TEMPO       |  |
|  |     (Metrics)     |     |      (Logs)       |     |     (Traces)      |  |
|  |       :9090       |     |       :3100       |     |       :3200       |  |
|  +---------+---------+     +---------+---------+     +---------+---------+  |
|            |                         |                         |            |
|            +-------------------------+-------------------------+            |
|                                      |                                      |
|                                      v                                      |
|                            +-------------------+                            |
|                            |      GRAFANA      |                            |
|                            |   (Dashboards)    |                            |
|                            |       :3000       |                            |
|                            +-------------------+                            |
+-----------------------------------------------------------------------------+
```

### Why Centralized Alloy (Not Sidecars)?
| Approach | Pros | Cons |
|---|---|---|
| Centralized (Current) | Single config, easier management, lower resource overhead | Single point of failure, network hop |
| Sidecar per service | Resilient, lower latency | Config sprawl, higher resource usage |
For MPAC platform scale (~15,000 RPS), centralized Alloy is sufficient. Consider sidecars only if:
- Network latency between clusters exceeds 10ms
- You need per-service buffering for reliability
- Service teams need independent collector configs
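The collector pipeline from the diagram above maps onto Alloy components roughly as follows. This is a hedged sketch, not the contents of `stack/alloy/config.alloy`: the component labels, TLS settings, and backend hostnames are illustrative assumptions; only the batch settings (5s / 1000 spans), the component names, and the remote-write/push paths echo what this document states elsewhere.

```alloy
// Sketch only: labels ("default", "tempo") and endpoints are assumptions.
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }
  output {
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  timeout         = "5s"   // batch settings per the trace flow below
  send_batch_size = 1000
  output {
    metrics = [otelcol.exporter.prometheus.default.input]
    logs    = [otelcol.exporter.loki.default.input]
    traces  = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls { insecure = true }
  }
}

otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint { url = "http://prometheus:9090/api/v1/write" }
}

otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint { url = "http://loki:3100/loki/api/v1/push" }
}
```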
## Data Flow

### 1. Traces (OTLP -> Tempo)

```
Application (OpenTelemetry SDK)
    |
    | OTLP gRPC/HTTP
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch (5s, 1000 spans)
    | otelcol.exporter.otlp
    v
Tempo:4317
    |
    | stores traces
    v
Grafana (TraceQL queries via :3200)
```

### 2. Metrics (OTLP -> Prometheus)
```
Application (OpenTelemetry SDK)
    |
    | OTLP metrics
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch
    | otelcol.exporter.prometheus
    | prometheus.remote_write
    v
Prometheus:9090/api/v1/write
    |
    | stores metrics (TSDB, 14d retention)
    v
Grafana (PromQL queries)
```

### 3. Logs (Two Paths)
#### Path A: OTLP Logs (from applications)

```
Application (OpenTelemetry SDK)
    |
    | OTLP logs
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.exporter.loki
    v
Loki:3100/loki/api/v1/push
```

#### Path B: Docker Logs (container stdout/stderr) - Local Only
```
Docker Container (stdout/stderr)
    |
    | Docker socket
    v
Alloy (discovery.docker)
    |
    | loki.source.docker
    | loki.process.json_logs (parse JSON, extract fields)
    v
Loki:3100/loki/api/v1/push
```

## Service Discovery
### Local Development (Docker Compose)

Services communicate via Docker network DNS:

- `prometheus:9090`
- `loki:3100`
- `tempo:4317`
- `alloy:4317`

Alloy discovers containers via a read-only Docker socket mount:

```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
```

### AWS Production (ECS Fargate)
Services use AWS Cloud Map for DNS:

- `prometheus.mpac-obs.{env}.local:9090`
- `loki.mpac-obs.{env}.local:3100`
- `tempo.mpac-obs.{env}.local:4317`
- `alloy.mpac-obs.{env}.local:4317`

Application services connect to Alloy via:

```bash
OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.{env}.local:4317
```

## Port Reference
| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Alloy | 4317 | gRPC | OTLP receiver (apps send here) |
| Alloy | 4318 | HTTP | OTLP receiver (apps send here) |
| Alloy | 12345 | HTTP | Alloy UI/health |
| Prometheus | 9090 | HTTP | Metrics API, remote write |
| Loki | 3100 | HTTP | Log push/query API |
| Tempo | 3200 | HTTP | Trace query API (Grafana) |
| Tempo | 4317 | gRPC | OTLP receiver (internal, from Alloy) |
| Grafana | 3000 | HTTP | Dashboard UI |
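For local development, these ports are published by the Compose stack. A hedged excerpt of what the corresponding `docker-compose.yml` mappings could look like (illustrative only; see `stack/docker-compose.yml` for the real file):

```yaml
services:
  alloy:
    ports:
      - "4317:4317"    # OTLP gRPC receiver
      - "4318:4318"    # OTLP HTTP receiver
      - "12345:12345"  # Alloy UI/health
  grafana:
    ports:
      - "3000:3000"    # Dashboard UI
```

In AWS, the same ports are reachable via the Cloud Map DNS names listed under Service Discovery rather than host mappings.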
## Configuration Files

```
mpac-obs/
├── stack/
│   ├── alloy/
│   │   └── config.alloy          # Alloy pipeline configuration
│   ├── prometheus/
│   │   └── prometheus.yml        # Prometheus scrape config
│   ├── loki/
│   │   └── config.yaml           # Loki storage/ingestion config
│   ├── tempo/
│   │   └── config.yaml           # Tempo storage/receiver config
│   ├── grafana/
│   │   ├── provisioning/
│   │   │   └── datasources/      # Auto-configured datasources
│   │   └── dashboards/           # Pre-built dashboards
│   └── docker-compose.yml        # Local stack definition
└── ...
```

## Application Integration
### Go Services (mpac-pgw, svc-smarttab)

```go
import "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"

// Configure the OTLP trace exporter (ctx is a context.Context from the caller).
// Handle the error instead of discarding it.
exporter, err := otlptracegrpc.New(ctx,
	otlptracegrpc.WithEndpoint("localhost:4317"),
	otlptracegrpc.WithInsecure(),
)
if err != nil {
	return err
}
```

Environment variables:
```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317                 # Local
OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.dev.local:4317  # AWS
OTEL_SERVICE_NAME=mpac-pgw.backend
```

### Python Services (svc-portal)
```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="localhost:4317",
    insecure=True,
)
```

## Correlation: Traces <-> Logs <-> Metrics
All telemetry is correlated via:
- trace_id / span_id - Links logs to specific traces
- service label - Groups all telemetry by service name
- request_id / corr_id - Application-level correlation
Alloy extracts these fields from JSON logs:

`trace_id`, `span_id`, `request_id`, `corr_id`, `merchant_id`, `store_id`

In Grafana:
- Click trace -> "Logs for this trace"
- Click log -> "View trace"
- Dashboard variables filter all panels by service
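The log-to-trace and trace-to-log links rely on Grafana datasource configuration. A hedged sketch of how the provisioned datasources could wire this up; the UIDs, regex, and URLs below are illustrative assumptions, not copied from `stack/grafana/provisioning/datasources/`:

```yaml
apiVersion: 1
datasources:
  - name: Loki
    uid: loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turns the trace_id field in a JSON log line into a "View trace" link.
        - name: TraceID
          matcherRegex: '"trace_id":"(\w+)"'
          datasourceUid: tempo
          url: '$${__value.raw}'
  - name: Tempo
    uid: tempo
    type: tempo
    url: http://tempo:3200
    jsonData:
      tracesToLogsV2:
        # "Logs for this trace" queries Loki filtered by the trace ID.
        datasourceUid: loki
        filterByTraceID: true
```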