
mpac-obs Observability Architecture

Overview

mpac-obs is the centralized observability stack for the MPAC platform, providing:

  • Logs - Aggregated via Loki
  • Metrics - Stored in Prometheus
  • Traces - Distributed tracing via Tempo

Deployment Topology

Where is Alloy Deployed?

Alloy runs inside the observability cluster (mpac-obs), NOT as sidecars in application clusters.

+-----------------------------------------------------------------------------+
|                            APPLICATION CLUSTERS                             |
+-------------------------+-------------------------+-------------------------+
|    mpac-pgw cluster     |  mpac-smartpos cluster  |     other clusters      |
|    +-------------+      |    +-------------+      |                         |
|    |  mpac-pgw   |      |    | svc-portal  |      |                         |
|    |    :8080    |      |    |    :8002    |      |                         |
|    +------+------+      |    +------+------+      |                         |
|           |             |           |             |                         |
|    +------+------+      |           |             |                         |
|    | svc-smarttab|      |           |             |                         |
|    |    :8081    |      |           |             |                         |
|    +------+------+      |           |             |                         |
+-----------+-------------+-----------+-------------+-------------------------+
            |                         |
            |    OTLP (4317/4318)     |
            |                         |
            v                         v
+-----------------------------------------------------------------------------+
|                    MPAC-OBS CLUSTER (Observability)                       |
|                                                                              |
|  +-------------------------------------------------------------------+      |
|  |                       ALLOY (Collector)                            |      |
|  |                       :4317 (gRPC) :4318 (HTTP)                    |      |
|  |  +-----------+    +-----------+    +-----------+                   |      |
|  |  |   OTLP    |    |   Batch   |    |  Export   |                   |      |
|  |  | Receiver  |--->| Processor |--->|  Router   |                   |      |
|  |  +-----------+    +-----------+    +-----+-----+                   |      |
|  +------------------------------------------+------------------------+      |
|                                             |                                |
|              +------------------------------+-------------------+            |
|              |                              |                   |            |
|              v                              v                   v            |
|  +-------------------+     +-------------------+     +-----------------+     |
|  |   PROMETHEUS      |     |      LOKI         |     |     TEMPO       |     |
|  |   (Metrics)       |     |     (Logs)        |     |    (Traces)     |     |
|  |   :9090           |     |     :3100         |     |    :3200        |     |
|  +---------+---------+     +---------+---------+     +--------+--------+     |
|            |                         |                        |              |
|            +-------------------------+------------------------+              |
|                                      |                                       |
|                                      v                                       |
|                          +-------------------+                               |
|                          |     GRAFANA       |                               |
|                          |   (Dashboards)    |                               |
|                          |     :3000         |                               |
|                          +-------------------+                               |
+-----------------------------------------------------------------------------+

Why Centralized Alloy (Not Sidecars)?

| Approach | Pros | Cons |
|---|---|---|
| Centralized (current) | Single config, easier management, lower resource overhead | Single point of failure, extra network hop |
| Sidecar per service | Resilient, lower latency | Config sprawl, higher resource usage |

At the MPAC platform's current scale (~15,000 RPS), a centralized Alloy is sufficient. Consider sidecars only if:

  • Network latency between clusters exceeds 10ms
  • You need per-service buffering for reliability
  • Service teams need independent collector configs

Data Flow

1. Traces (OTLP -> Tempo)

Application (OpenTelemetry SDK)
    |
    | OTLP gRPC/HTTP
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch (5s, 1000 spans)
    | otelcol.exporter.otlp
    v
Tempo:4317
    |
    | stores traces
    v
Grafana (TraceQL queries via :3200)
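
The trace leg above maps onto Alloy components roughly as follows. This is a minimal sketch; the component labels and the `tempo:4317` address are illustrative, and the actual pipeline lives in `stack/alloy/config.alloy`:

```alloy
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  timeout         = "5s"
  send_batch_size = 1000
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}

otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo:4317"
    tls {
      insecure = true
    }
  }
}
```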

2. Metrics (OTLP -> Prometheus)

Application (OpenTelemetry SDK)
    |
    | OTLP metrics
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.processor.batch
    | otelcol.exporter.prometheus
    | prometheus.remote_write
    v
Prometheus:9090/api/v1/write
    |
    | stores metrics (TSDB, 14d retention)
    v
Grafana (PromQL queries)
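
The metrics leg differs only in its tail: Alloy converts OTLP metrics and remote-writes them to Prometheus. A minimal sketch (labels and URLs are illustrative; see `stack/alloy/config.alloy` for the real config):

```alloy
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.default.receiver]
}

prometheus.remote_write "default" {
  endpoint {
    url = "http://prometheus:9090/api/v1/write"
  }
}
```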

3. Logs (Two Paths)

Path A: OTLP Logs (from applications)

Application (OpenTelemetry SDK)
    |
    | OTLP logs
    v
Alloy:4317/4318
    |
    | otelcol.receiver.otlp
    | otelcol.exporter.loki
    v
Loki:3100/loki/api/v1/push
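
In Alloy, this OTLP-to-Loki hop is a single converter component feeding a Loki writer. A minimal sketch (component labels are illustrative):

```alloy
otelcol.exporter.loki "default" {
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```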

Path B: Docker Logs (container stdout/stderr) - Local Only

Docker Container (stdout/stderr)
    |
    | Docker socket
    v
Alloy (discovery.docker)
    |
    | loki.source.docker
    | loki.process.json_logs (parse JSON, extract fields)
    v
Loki:3100/loki/api/v1/push
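
Path B can be sketched with Alloy's Docker components. Labels and the extracted fields shown here are illustrative; the real pipeline is in `stack/alloy/config.alloy`:

```alloy
discovery.docker "containers" {
  host = "unix:///var/run/docker.sock"
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.docker.containers.targets
  forward_to = [loki.process.json_logs.receiver]
}

loki.process "json_logs" {
  // Parse each JSON log line and promote selected fields.
  stage.json {
    expressions = {
      level    = "level",
      trace_id = "trace_id",
    }
  }
  stage.labels {
    values = { level = "level" }
  }
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "http://loki:3100/loki/api/v1/push"
  }
}
```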

Service Discovery

Local Development (Docker Compose)

Services communicate via Docker network DNS:

  • prometheus:9090
  • loki:3100
  • tempo:4317
  • alloy:4317

Alloy discovers containers via Docker socket mount:

```yaml
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro
```

AWS Production (ECS Fargate)

Services use AWS Cloud Map for DNS:

  • prometheus.mpac-obs.{env}.local:9090
  • loki.mpac-obs.{env}.local:3100
  • tempo.mpac-obs.{env}.local:4317
  • alloy.mpac-obs.{env}.local:4317

Application services connect to Alloy via:

OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.{env}.local:4317

Port Reference

| Service | Port | Protocol | Purpose |
|---|---|---|---|
| Alloy | 4317 | gRPC | OTLP receiver (apps send here) |
| Alloy | 4318 | HTTP | OTLP receiver (apps send here) |
| Alloy | 12345 | HTTP | Alloy UI/health |
| Prometheus | 9090 | HTTP | Metrics API, remote write |
| Loki | 3100 | HTTP | Log push/query API |
| Tempo | 3200 | HTTP | Trace query API (Grafana) |
| Tempo | 4317 | gRPC | OTLP receiver (internal, from Alloy) |
| Grafana | 3000 | HTTP | Dashboard UI |

Configuration Files

mpac-obs/
├── stack/
│   ├── alloy/
│   │   └── config.alloy          # Alloy pipeline configuration
│   ├── prometheus/
│   │   └── prometheus.yml        # Prometheus scrape config
│   ├── loki/
│   │   └── config.yaml           # Loki storage/ingestion config
│   ├── tempo/
│   │   └── config.yaml           # Tempo storage/receiver config
│   ├── grafana/
│   │   ├── provisioning/
│   │   │   └── datasources/      # Auto-configured datasources
│   │   └── dashboards/           # Pre-built dashboards
│   └── docker-compose.yml        # Local stack definition
└── ...

Application Integration

Go Services (mpac-pgw, svc-smarttab)

```go
import (
    "log"

    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
)

// Configure the OTLP trace exporter pointing at Alloy.
exporter, err := otlptracegrpc.New(ctx,
    otlptracegrpc.WithEndpoint("localhost:4317"),
    otlptracegrpc.WithInsecure(),
)
if err != nil {
    log.Fatalf("create OTLP exporter: %v", err)
}
```

Environment variables:

```bash
OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317                 # Local
OTEL_EXPORTER_OTLP_ENDPOINT=alloy.mpac-obs.dev.local:4317  # AWS
OTEL_SERVICE_NAME=mpac-pgw.backend
```

Python Services (svc-portal)

```python
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="localhost:4317",
    insecure=True,
)
```

Correlation: Traces <-> Logs <-> Metrics

All telemetry is correlated via:

  1. trace_id / span_id - Links logs to specific traces
  2. service label - Groups all telemetry by service name
  3. request_id / corr_id - Application-level correlation

Alloy extracts these fields from JSON logs:

trace_id, span_id, request_id, corr_id, merchant_id, store_id

In Grafana:

  • Click trace -> "Logs for this trace"
  • Click log -> "View trace"
  • Dashboard variables filter all panels by service
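
These trace/log jumps are wired up through Grafana datasource provisioning. A minimal sketch of what `grafana/provisioning/datasources/` might contain (names, UIDs, and the matcher regex are illustrative, not the repo's actual config):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    uid: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # Turn the trace_id found in a log line into a link to Tempo.
        - name: trace_id
          matcherRegex: '"trace_id":"(\w+)"'
          url: '${__value.raw}'
          datasourceUid: tempo
  - name: Tempo
    type: tempo
    uid: tempo
    url: http://tempo:3200
```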

MPAC — MP-Solution Advanced Cloud Service