Grafana Mimir Skill
Comprehensive guide for Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.
What is Mimir?
Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:
- Overcomes Prometheus limitations - Scalability and long-term retention
- Multi-tenant by default - Built-in tenant isolation via the X-Scope-OrgID header
- Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
- 100% Prometheus compatible - PromQL queries, remote write protocol
- Part of LGTM+ Stack - Logs, Grafana, Traces, Metrics unified observability

Architecture Overview
Core Components

| Component | Purpose |
| --- | --- |
| Distributor | Validates requests, routes incoming metrics to ingesters via hash ring |
| Ingester | Stores time-series data in memory, flushes to object storage |
| Querier | Executes PromQL queries against data from ingesters and store-gateways |
| Query Frontend | Caches query results, optimizes and splits queries |
| Query Scheduler | Manages per-tenant query queues for fairness |
| Store-Gateway | Provides access to historical metric blocks in object storage |
| Compactor | Consolidates and optimizes stored metric data blocks |
| Ruler | Evaluates recording and alerting rules (optional) |
| Alertmanager | Handles alert routing and deduplication (optional) |

Data Flow
Write Path:

    Prometheus/OTel → Distributor → Ingester → Object Storage
                          ↓
                      Hash Ring (routes by series)

Read Path:

    Query → Query Frontend → Query Scheduler → Querier
                                                  ↓
                                          Ingesters (recent)
                                                  ↓
                                          Store-Gateway (historical)
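The hash-ring routing step on the write path can be illustrated with a small sketch. This is not Mimir's implementation: the SHA-256 token function, the 64 tokens per ingester, the replication factor of 3, and the ingester names are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit ring position (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class HashRing:
    """Toy consistent-hash ring: each ingester owns several tokens."""

    def __init__(self, ingesters, tokens_per_ingester=64):
        self._ring = sorted(
            (token_for(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [token for token, _ in self._ring]

    def owners(self, tenant, series, replication_factor=3):
        """Walk clockwise from the series hash, collecting distinct ingesters."""
        start = bisect.bisect_right(self._tokens, token_for(f"{tenant}/{series}"))
        found = []
        for offset in range(len(self._ring)):
            name = self._ring[(start + offset) % len(self._ring)][1]
            if name not in found:
                found.append(name)
            if len(found) == replication_factor:
                break
        return found


# Hypothetical ingester names; the same series always maps to the same owners.
ring = HashRing(["ingester-0", "ingester-1", "ingester-2", "ingester-3"])
print(ring.owners("tenant-a", 'up{job="api"}'))
```

Because the position depends only on the tenant and series, adding or removing one ingester moves only the series whose tokens neighbor it, which is the property that lets ingesters scale horizontally.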
Deployment Modes

1. Monolithic Mode (-target=all)
- All components in a single process
- Best for: Development, testing, small-scale (~1M series)
- Horizontally scalable by deploying multiple instances
- Not recommended for large-scale (all components scale together)

2. Microservices Mode (Distributed) - Recommended for Production
    # Using mimir-distributed Helm chart
    distributor:
      replicas: 3
    ingester:
      replicas: 3
      zoneAwareReplication:
        enabled: true
    querier:
      replicas: 3
    query_frontend:
      replicas: 2
    query_scheduler:
      replicas: 2
    store_gateway:
      replicas: 3
    compactor:
      replicas: 1
Helm Deployment
Add Repository

    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update

Install Distributed Mimir

    helm install mimir grafana/mimir-distributed \
      --namespace monitoring \
      --values values.yaml
Pre-Built Values Files

| File | Purpose |
| --- | --- |
| values.yaml | Non-production testing with MinIO |
| small.yaml | ~1 million series (single replicas, not HA) |
| large.yaml | Production (~10 million series) |

Production Values Example
    mimir:
      structuredConfig:
        # Deployment mode
        multitenancy_enabled: true

        # Storage configuration
        common:
          storage:
            backend: azure  # or s3, gcs
            azure:
              account_name: ${AZURE_STORAGE_ACCOUNT}
              account_key: ${AZURE_STORAGE_KEY}
              endpoint_suffix: blob.core.windows.net
        blocks_storage:
          azure:
            container_name: mimir-blocks
        alertmanager_storage:
          azure:
            container_name: mimir-alertmanager
        ruler_storage:
          azure:
            container_name: mimir-ruler
    # Distributor
    distributor:
      replicas: 3
      resources:
        requests:
          cpu: 1
          memory: 2Gi
        limits:
          memory: 4Gi

    # Ingester
    ingester:
      replicas: 3
      zoneAwareReplication:
        enabled: true
      persistentVolume:
        enabled: true
        size: 50Gi
      resources:
        requests:
          cpu: 2
          memory: 8Gi
        limits:
          memory: 16Gi

    # Querier
    querier:
      replicas: 3
      resources:
        requests:
          cpu: 1
          memory: 2Gi
        limits:
          memory: 8Gi

    # Query Frontend
    query_frontend:
      replicas: 2
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          memory: 2Gi

    # Query Scheduler
    query_scheduler:
      replicas: 2

    # Store Gateway
    store_gateway:
      replicas: 3
      persistentVolume:
        enabled: true
        size: 20Gi
      resources:
        requests:
          cpu: 500m
          memory: 2Gi
        limits:
          memory: 8Gi

    # Compactor
    compactor:
      replicas: 1
      persistentVolume:
        enabled: true
        size: 50Gi
      resources:
        requests:
          cpu: 1
          memory: 4Gi
        limits:
          memory: 8Gi

    # Gateway for external access
    gateway:
      enabledNonEnterprise: true
      replicas: 2

    # Monitoring
    metaMonitoring:
      serviceMonitor:
        enabled: true

Storage Configuration
Critical Requirements
- Must create buckets manually - Mimir doesn't create them
- Separate buckets required - blocks_storage, alertmanager_storage, and ruler_storage cannot share the same bucket+prefix
- Azure: Hierarchical namespace must be disabled
Azure Blob Storage

    mimir:
      structuredConfig:
        common:
          storage:
            backend: azure
            azure:
              account_name:
        blocks_storage:
          azure:
            container_name: mimir-blocks
        alertmanager_storage:
          azure:
            container_name: mimir-alertmanager
        ruler_storage:
          azure:
            container_name: mimir-ruler

AWS S3

    mimir:
      structuredConfig:
        common:
          storage:
            backend: s3
            s3:
              endpoint: s3.us-east-1.amazonaws.com
              region: us-east-1
              access_key_id: ${AWS_ACCESS_KEY_ID}
              secret_access_key: ${AWS_SECRET_ACCESS_KEY}
        blocks_storage:
          s3:
            bucket_name: mimir-blocks
        alertmanager_storage:
          s3:
            bucket_name: mimir-alertmanager
        ruler_storage:
          s3:
            bucket_name: mimir-ruler

Google Cloud Storage

    mimir:
      structuredConfig:
        common:
          storage:
            backend: gcs
            gcs:
              service_account: ${GCS_SERVICE_ACCOUNT_JSON}
        blocks_storage:
          gcs:
            bucket_name: mimir-blocks
        alertmanager_storage:
          gcs:
            bucket_name: mimir-alertmanager
        ruler_storage:
          gcs:
            bucket_name: mimir-ruler

Limits Configuration

    mimir:
      structuredConfig:
        limits:
          # Ingestion limits
          ingestion_rate: 25000        # Samples/sec per tenant
          ingestion_burst_size: 50000  # Burst size
          max_global_series_per_metric: 10000
          max_global_series_per_user: 1000000
          max_label_names_per_series: 30
          max_label_name_length: 1024
          max_label_value_length: 2048

          # Query limits
          max_fetched_series_per_query: 100000
          max_fetched_chunks_per_query: 2000000
          max_query_lookback: 0  # No limit
          max_query_parallelism: 32

          # Retention
          compactor_blocks_retention_period: 365d  # 1 year

          # Out-of-order samples
          out_of_order_time_window: 5m

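To make the relationship between `ingestion_rate` and `ingestion_burst_size` concrete, here is a minimal token-bucket sketch. It assumes the limiter behaves like a standard token bucket; the class and method names are hypothetical and the clock is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Toy per-tenant rate limiter: refills at `rate` samples/sec up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0

    def allow(self, samples, now):
        """Return True if a push of `samples` fits in the bucket at time `now`."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if samples > self.tokens:
            return False
        self.tokens -= samples
        return True


# Values taken from the limits block above.
bucket = TokenBucket(rate=25000, burst=50000)
print(bucket.allow(50000, now=0.0))  # True: a full bucket absorbs the burst
print(bucket.allow(1000, now=0.0))   # False: the bucket is now empty
print(bucket.allow(25000, now=1.0))  # True: one second refills 25000 samples
```

The burst size bounds how much a tenant can send at once after an idle period, while the rate bounds the sustained throughput.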
Per-Tenant Overrides (Runtime Configuration)

    # runtime-config.yaml
    overrides:
      tenant1:
        ingestion_rate: 50000
        max_global_series_per_user: 2000000
        compactor_blocks_retention_period: 730d  # 2 years
      tenant2:
        ingestion_rate: 75000
        max_global_series_per_user: 5000000

Enable runtime configuration:

    mimir:
      structuredConfig:
        runtime_config:
          file: /etc/mimir/runtime-config.yaml
          period: 10s

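The way overrides layer on top of the global limits can be sketched as a dictionary merge. This is an illustration only: the dictionaries reuse a subset of the values from this guide, and `effective_limits` is a hypothetical helper, not a Mimir API.

```python
# Global defaults, as set under `limits` in the main configuration.
DEFAULT_LIMITS = {
    "ingestion_rate": 25000,
    "max_global_series_per_user": 1000000,
    "compactor_blocks_retention_period": "365d",
}

# Per-tenant overrides, as loaded from the runtime configuration file.
OVERRIDES = {
    "tenant1": {"ingestion_rate": 50000, "compactor_blocks_retention_period": "730d"},
    "tenant2": {"ingestion_rate": 75000, "max_global_series_per_user": 5000000},
}


def effective_limits(tenant, defaults=DEFAULT_LIMITS, overrides=OVERRIDES):
    """Return the defaults with any tenant-specific keys replaced."""
    return {**defaults, **overrides.get(tenant, {})}


print(effective_limits("tenant1"))
print(effective_limits("unknown-tenant"))  # falls back to the defaults
```

A tenant with no entry simply inherits every default, and a tenant with an entry overrides only the keys it lists.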
High Availability Configuration
HA Tracker for Prometheus Deduplication

    mimir:
      structuredConfig:
        distributor:
          ha_tracker:
            enable_ha_tracker: true
            kvstore:
              store: memberlist
        limits:
          # The cluster and replica label names are per-tenant limits
          accept_ha_samples: true
          ha_cluster_label: cluster
          ha_replica_label: replica
        memberlist:
          join_members:
            - mimir-gossip-ring.monitoring.svc.cluster.local:7946

Prometheus Configuration:

    global:
      external_labels:
        cluster: prom-team1
        replica: replica1

    remote_write:
      - url: http://mimir-gateway:8080/api/v1/push
        headers:
          X-Scope-OrgID: my-tenant

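The deduplication performed by the HA tracker can be sketched as a per-cluster replica election. This is a simplified model rather than Mimir's code: the `HATracker` class is hypothetical, and the 30-second failover timeout is an assumed value.

```python
class HATracker:
    """Toy HA tracker: accept samples from one elected replica per cluster."""

    def __init__(self, failover_timeout=30.0):
        self.failover_timeout = failover_timeout  # seconds (assumed value)
        self.elected = {}  # (tenant, cluster) -> (replica, last_seen)

    def accept(self, tenant, cluster, replica, now):
        """Return True if samples from this replica should be ingested."""
        key = (tenant, cluster)
        current = self.elected.get(key)
        if current is None or current[0] == replica:
            # First replica seen, or the elected one: refresh its timestamp.
            self.elected[key] = (replica, now)
            return True
        if now - current[1] > self.failover_timeout:
            # The elected replica went quiet: fail over to this one.
            self.elected[key] = (replica, now)
            return True
        return False  # duplicate data from a non-elected replica


tracker = HATracker()
print(tracker.accept("my-tenant", "prom-team1", "replica1", now=0.0))   # True
print(tracker.accept("my-tenant", "prom-team1", "replica2", now=5.0))   # False
print(tracker.accept("my-tenant", "prom-team1", "replica2", now=45.0))  # True
```

Both Prometheus replicas must share the same cluster label value and differ only in the replica label for this election to apply.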
Zone-Aware Replication

    ingester:
      zoneAwareReplication:
        enabled: true
        zones:
          - name: zone-a
            nodeSelector:
              topology.kubernetes.io/zone: us-east-1a
          - name: zone-b
            nodeSelector:
              topology.kubernetes.io/zone: us-east-1b
          - name: zone-c
            nodeSelector:
              topology.kubernetes.io/zone: us-east-1c

    store_gateway:
      zoneAwareReplication:
        enabled: true

Shuffle Sharding
Limits tenant data to a subset of instances for fault isolation:
    mimir:
      structuredConfig:
        limits:
          # Write path
          ingestion_tenant_shard_size: 3

          # Read path
          max_queriers_per_tenant: 5
          store_gateway_tenant_shard_size: 3

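The idea behind shuffle sharding can be shown with a deterministic per-tenant subset selection. This sketch is an assumption-laden simplification: Mimir selects shards through its hash ring, whereas this example seeds a random generator from a SHA-256 hash of the tenant ID, and `tenant_shard` is a hypothetical helper.

```python
import hashlib
import random


def tenant_shard(tenant, instances, shard_size):
    """Pick a stable subset of instances for a tenant.

    A shard size of 0 (or one covering every instance) is treated here as
    "use all instances".
    """
    if shard_size <= 0 or shard_size >= len(instances):
        return sorted(instances)
    seed = int.from_bytes(hashlib.sha256(tenant.encode()).digest()[:8], "big")
    return sorted(random.Random(seed).sample(sorted(instances), shard_size))


ingesters = [f"ingester-{i}" for i in range(9)]  # hypothetical instance names
print(tenant_shard("tenant-a", ingesters, 3))
print(tenant_shard("tenant-b", ingesters, 3))
```

Because each tenant maps to its own small subset, a tenant that overloads its ingesters affects only the tenants whose shards overlap with it.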
OpenTelemetry Integration
OTLP Metrics Ingestion

OpenTelemetry Collector Config:

    receivers:
      otlp:
        protocols:
          grpc:
          http:

    exporters:
      otlphttp:
        endpoint: http://mimir-gateway:8080/otlp
        headers:
          X-Scope-OrgID: "my-tenant"

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [otlphttp]

Exponential Histograms (Experimental)

    // Go SDK configuration
    Aggregation: metric.AggregationBase2ExponentialHistogram{
        MaxSize:  160, // Maximum buckets
        MaxScale: 20,  // Scale factor
    }

Key Benefits:
- Explicit min/max values (no estimation needed)
- Better accuracy for extreme percentiles
- Native OTLP format preservation

Multi-Tenancy

    mimir:
      structuredConfig:
        multitenancy_enabled: true
        no_auth_tenant: anonymous  # Used when multitenancy disabled

Query with tenant header:
    curl -H "X-Scope-OrgID: tenant-a" \
      "http://mimir:8080/prometheus/api/v1/query?query=up"

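A client can check a tenant ID against the constraints this guide lists before it sets the header. The sketch below builds the URL and headers for an instant query without sending anything; `is_valid_tenant_id` and `build_instant_query` are hypothetical helpers, and the base URL is the one from the curl example.

```python
import re
from urllib.parse import urlencode

# Allowed characters: alphanumeric plus ! - _ . * ' ( )
_ALLOWED = re.compile(r"[A-Za-z0-9!\-_.*'()]+")
_RESERVED = {".", "..", "__mimir_cluster"}


def is_valid_tenant_id(tenant_id):
    """Check length, reserved names, and the allowed character set."""
    if not tenant_id or len(tenant_id) > 150:
        return False
    if tenant_id in _RESERVED:
        return False
    return _ALLOWED.fullmatch(tenant_id) is not None


def build_instant_query(base_url, tenant_id, promql):
    """Return (url, headers) for an instant query scoped to one tenant."""
    if not is_valid_tenant_id(tenant_id):
        raise ValueError(f"invalid tenant ID: {tenant_id!r}")
    url = f"{base_url}/prometheus/api/v1/query?{urlencode({'query': promql})}"
    return url, {"X-Scope-OrgID": tenant_id}


url, headers = build_instant_query("http://mimir:8080", "tenant-a", "up")
print(url)
print(headers)
```

Rejecting a bad tenant ID on the client side gives a clearer error than a failed request, and slashes are excluded by the character set itself.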
Tenant ID Constraints:
- Max 150 characters
- Allowed: alphanumeric, ! - _ . * ' ( )
- Prohibited: . or .. alone, __mimir_cluster, slashes

API Reference
Ingestion Endpoints
    # Prometheus remote write
    POST /api/v1/push

    # OTLP metrics
    POST /otlp/v1/metrics

    # InfluxDB line protocol
    POST /api/v1/push/influx/write

Query Endpoints
    # Instant query
    GET,POST /prometheus/api/v1/query?query=

    # Range query
    GET,POST /prometheus/api/v1/query_range?query=

    # Labels
    GET,POST /prometheus/api/v1/labels
    GET /prometheus/api/v1/label/{name}/values

    # Series
    GET,POST /prometheus/api/v1/series

    # Exemplars
    GET,POST /prometheus/api/v1/query_exemplars

    # Cardinality
    GET,POST /prometheus/api/v1/cardinality/label_names
    GET,POST /prometheus/api/v1/cardinality/active_series

Administrative Endpoints
    # Flush ingester data
    GET,POST /ingester/flush

    # Prepare shutdown
    GET,POST,DELETE /ingester/prepare-shutdown

    # Ring status
    GET /ingester/ring
    GET /distributor/ring
    GET /store-gateway/ring
    GET /compactor/ring

    # Tenant stats
    GET /distributor/all_user_stats
    GET /api/v1/user_stats
    GET /api/v1/user_limits

    # Health & Config
    GET /ready
    GET /metrics
    GET /config
    GET /config?mode=diff
    GET /runtime_config

Azure Identity Configuration
User-Assigned Managed Identity
- Create Identity:
az identity create \
--name mimir-identity \
--resource-group
IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group
- Assign to Node Pool:
az vmss identity assign \
--resource-group
- Grant Storage Permission:
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id $IDENTITY_PRINCIPAL_ID \
--scope /subscriptions//resourceGroups/
- Configure Mimir:
    mimir:
      structuredConfig:
        common:
          storage:
            azure:
              user_assigned_id:

Workload Identity Federation
- Create Federated Credential:
az identity federated-credential create \
--name mimir-federated \
--identity-name mimir-identity \
--resource-group
- Configure Helm Values:
    serviceAccount:
      annotations:
        azure.workload.identity/client-id:
    podLabels:
      azure.workload.identity/use: "true"

Troubleshooting
Common Issues
- Container Not Found (Azure)
Create required containers
az storage container create --name mimir-blocks --account-name
- Authorization Failure (Azure)
Verify RBAC assignment
az role assignment list --scope /subscriptions//resourceGroups/
Assign if missing
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee-object-id
Restart pod to refresh token
kubectl delete pod -n monitoring
- Ingester OOM
    ingester:
      resources:
        limits:
          memory: 16Gi  # Increase memory
- Query Timeout
    mimir:
      structuredConfig:
        querier:
          timeout: 5m
          max_concurrent: 20
- High Cardinality
    mimir:
      structuredConfig:
        limits:
          max_global_series_per_user: 5000000
          max_global_series_per_metric: 50000
Diagnostic Commands
Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir
Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100
Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100
Verify readiness
kubectl exec -it
Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring
Check configuration
kubectl exec -it
Validate configuration before deployment
mimir -modules -config.file
Key Metrics to Monitor
Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))
Series count per tenant
sum by (user) (cortex_ingester_memory_series)
Query latency
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"(prometheus|api_prom)_api_v1_query.*"}[5m])))
Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total
Store-gateway block sync
cortex_bucket_store_blocks_loaded
Circuit Breakers (Ingester)

    mimir:
      structuredConfig:
        ingester:
          push_circuit_breaker:
            enabled: true
            request_timeout: 2s
            failure_threshold_percentage: 10
            cooldown_period: 10s
          read_circuit_breaker:
            enabled: true
            request_timeout: 30s

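How `failure_threshold_percentage` and `cooldown_period` drive the closed, open, and half-open cycle can be sketched with a small state machine. This is a generic circuit breaker, not Mimir's implementation: the `CircuitBreaker` class and its `min_requests` parameter are hypothetical, and the half-open state here admits requests until the first result is recorded.

```python
class CircuitBreaker:
    """Toy circuit breaker with closed, open, and half-open states."""

    def __init__(self, failure_threshold_percentage=10, cooldown_period=10.0,
                 min_requests=10):
        self.threshold = failure_threshold_percentage
        self.cooldown_period = cooldown_period  # seconds
        self.min_requests = min_requests        # hypothetical minimum sample size
        self._close()

    def _close(self):
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = None

    def _open(self, now):
        self.state = "open"
        self.opened_at = now

    def allow(self, now):
        """Return True if a request may proceed at time `now`."""
        if self.state == "open":
            if now - self.opened_at < self.cooldown_period:
                return False
            self.state = "half-open"  # cooldown elapsed: allow a trial request
        return True

    def record(self, ok, now):
        """Record a request outcome and update the state."""
        if self.state == "half-open":
            if ok:
                self._close()
            else:
                self._open(now)
            return
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_requests and 100 * self.failures / total >= self.threshold:
            self._open(now)
```

With the values from the push circuit breaker above, one failure in ten requests opens the breaker, requests are rejected for ten seconds, and a successful trial request closes it again.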
States:
- Closed - Normal operation
- Open - Stops forwarding to failing instances
- Half-open - Limited trial requests after cooldown

External Resources
- Official Mimir Documentation
- Mimir Helm Chart
- Configuration Reference
- HTTP API Reference
- Mimir GitHub Repository