Containerization and Orchestration (Docker / k8s / Helm)
From local Compose to a production K8s Operator, with health checks, PVCs, and a backup CronJob
Which one should you pick?
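| Scenario | Recommendation |
|---|---|
| Local development / CI | Single Docker container |
| A small project deployed on one server | Docker Compose |
| Mid-sized team, multiple services | Compose + Caddy/Traefik, or a managed RDS |
| Consistency across environments, K8s already in place | MariaDB Operator + Helm |
| Fully managed, minimal ops effort | Don't self-host; use a cloud RDS |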
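Warning: self-managing a database on K8s is an order of magnitude harder than self-managing stateless services. Backups, HA, network partitions, and volume migration are each a pitfall. If your team has fewer than 5 people, a managed service is strongly recommended.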
Docker (for development)
docker run -d --name mariadb-dev \
  -e MARIADB_ROOT_PASSWORD=dev \
  -e MARIADB_DATABASE=app \
  -p 3306:3306 \
  -v mariadb-data:/var/lib/mysql \
  mariadb:11.4
Run tests on tmpfs (can be around 10× faster)
docker run -d --rm --name mariadb-test \
  --tmpfs /var/lib/mysql:rw,size=2g \
  -e MARIADB_ROOT_PASSWORD=test \
  -e MARIADB_DATABASE=test \
  -p 3307:3306 \
  mariadb:11.4
Custom my.cnf
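# my.cnf
[mariadb]
innodb_buffer_pool_size=1G
# MariaDB sizes the redo log with innodb_log_file_size (innodb_redo_log_capacity is MySQL-only)
innodb_log_file_size=512M
innodb_flush_log_at_trx_commit=1
slow_query_log=1
long_query_time=1
# Mount it into the container
docker run -d --name mariadb-dev \
  -e MARIADB_ROOT_PASSWORD=dev \
  -v "$PWD/my.cnf:/etc/mysql/conf.d/custom.cnf:ro" \
  -v mariadb-data:/var/lib/mysql \
  mariadb:11.4
Docker Compose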
# docker-compose.yml
services:
  mariadb:
    image: mariadb:11.4
    restart: unless-stopped
    environment:
      MARIADB_ROOT_PASSWORD: ${ROOT_PASS}
      MARIADB_DATABASE: app
      MARIADB_USER: app
      MARIADB_PASSWORD: ${APP_PASS}
    volumes:
      - mariadb-data:/var/lib/mysql
      - ./my.cnf:/etc/mysql/conf.d/custom.cnf:ro
    ports:
      - "127.0.0.1:3306:3306"   # bind to localhost only
    healthcheck:
      test: ["CMD", "mariadb-admin", "ping", "-h", "localhost", "-u", "root", "-p${ROOT_PASS}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
  app:
    image: yourapp:latest
    depends_on:
      mariadb:
        condition: service_healthy
    environment:
      DATABASE_URL: mysql://app:${APP_PASS}@mariadb:3306/app?charset=utf8mb4
volumes:
  mariadb-data:
Add automated backups
  # Add under services: in docker-compose.yml
  backup:
    image: mariadb:11.4
    depends_on:
      mariadb:
        condition: service_healthy
    volumes:
      - ./backups:/backups
    environment:
      MYSQL_PWD: ${ROOT_PASS}
    entrypoint: ["sh", "-c"]
    command:
      - |
        # Dump once a day and keep 7 days of archives ($$ escapes $ for Compose)
        while true; do
          ts=$$(date +%F-%H%M)
          mariadb-dump -h mariadb -u root --single-transaction --routines --triggers \
            --all-databases | gzip > /backups/dump-$$ts.sql.gz
          find /backups -name 'dump-*.sql.gz' -mtime +7 -delete
          sleep 86400
        done
Kubernetes (production)
Three common options:
| Option | Maintainer | Complexity |
|---|---|---|
| mariadb-operator | mmontes / community | Simple |
| Bitnami Charts | Bitnami | Moderate |
| KubeDB | AppsCode (commercial) | Complex |
Getting started with mariadb-operator
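helm repo add mariadb-operator https://helm.mariadb.com/mariadb-operator
# Recent chart versions ship the CRDs as a separate chart; install it first if your version does
helm install mariadb-operator-crds mariadb-operator/mariadb-operator-crds
helm install mariadb-operator mariadb-operator/mariadb-operator \
  --namespace mariadb-operator --create-namespace
Define a MariaDB CR: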
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-prod
  namespace: app
spec:
  rootPasswordSecretKeyRef:
    name: mariadb-root
    key: password
  database: app
  username: app
  passwordSecretKeyRef:
    name: mariadb-app
    key: password
  image: mariadb:11.4
  port: 3306
  replicas: 3   # 1 primary, 2 replicas
  replication:
    enabled: true
    primary:
      podIndex: 0
      automaticFailover: true
    replica:
      waitPoint: AfterSync
  storage:
    size: 100Gi
    storageClassName: gp3
  myCnf: |
    [mariadb]
    bind-address=*
    default_storage_engine=InnoDB
    innodb_buffer_pool_size=4G
    innodb_flush_log_at_trx_commit=1
    sync_binlog=1
    max_connections=500
    slow_query_log=1
    long_query_time=1
    binlog_format=ROW
    log-bin
  resources:
    requests:
      memory: 6Gi
      cpu: 2
    limits:
      memory: 8Gi
      cpu: 4
  metrics:
    enabled: true   # bundles an exporter and a ServiceMonitor
  service:
    type: ClusterIP
  primaryService:
    type: ClusterIP
Automated backup CR
apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: nightly-backup
  namespace: app
spec:
  mariaDbRef: { name: mariadb-prod }
  schedule:
    cron: "0 3 * * *"   # daily at 03:00
    suspend: false
  maxRetention: 720h   # 30 days
  storage:
    s3:
      bucket: my-backups
      prefix: mariadb-prod
      endpoint: s3.amazonaws.com
      region: us-east-1
      accessKeyIdSecretKeyRef: { name: s3-creds, key: access }
      secretAccessKeySecretKeyRef: { name: s3-creds, key: secret }
Restore CR (one-step restore)
apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore-from-2026-05-17
spec:
  mariaDbRef: { name: mariadb-restore-target }
  s3:
    bucket: my-backups
    key: mariadb-prod/2026-05-17/dump.sql.gz
Bitnami Helm Chart
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install mariadb bitnami/mariadb \
  --set auth.rootPassword="$ROOT_PASS" \
  --set auth.database=app \
  --set primary.persistence.size=100Gi \
  --set primary.resources.requests.memory=4Gi \
  --set primary.configuration="[mariadb]
innodb_buffer_pool_size=3G
slow_query_log=1
" \
  --set metrics.enabled=true
Simple to install, but the HA complexity is yours to manage.
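Multi-line values passed through --set are easy to break with shell quoting, so the same settings can live in a values file instead. The following is a minimal sketch, not the chart's prescribed workflow: write_mariadb_values and values.yaml are hypothetical names, the keys are the ones used in the --set flags above, and the root password is assumed to be in the ROOT_PASS environment variable and to contain no double quotes or backslashes.

```shell
#!/bin/sh
# write_mariadb_values (hypothetical helper) prints a Helm values file that
# carries the same settings as the --set flags above.
# Assumption: ROOT_PASS is set and has no '"' or '\' characters, because it is
# embedded in a double-quoted YAML string without escaping.
write_mariadb_values() {
  cat <<EOF
auth:
  rootPassword: "${ROOT_PASS}"
  database: app
primary:
  persistence:
    size: 100Gi
  resources:
    requests:
      memory: 4Gi
  configuration: |-
    [mariadb]
    innodb_buffer_pool_size=3G
    slow_query_log=1
metrics:
  enabled: true
EOF
}

# Usage (not executed here):
#   write_mariadb_values > values.yaml
#   helm install mariadb bitnami/mariadb -f values.yaml
```

Keeping the file out of version control, or replacing the inline password with a Secret reference, avoids leaving the root password in shell history.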
Health check details
Liveness vs Readiness
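livenessProbe:
  exec:
    command:
      - mariadb-admin
      - ping
      - -h
      - localhost
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # The server must answer a query; on a replica, the IO thread must also be running.
        mariadb -h localhost -u root -p"$MARIADB_ROOT_PASSWORD" -e 'SELECT 1' >/dev/null || exit 1
        status=$(mariadb -h localhost -u root -p"$MARIADB_ROOT_PASSWORD" -e 'SHOW REPLICA STATUS\G') || exit 1
        # Empty output means this pod is the primary, which is ready once it answers queries.
        # (A trailing "|| true" here would make the probe succeed unconditionally.)
        [ -z "$status" ] || printf '%s\n' "$status" | grep -q 'Slave_IO_Running: Yes'
  initialDelaySeconds: 30
  periodSeconds: 10
Startup probe (for slow-starting databases)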
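startupProbe:
  exec:
    command: [mariadb-admin, ping, -h, localhost]
  failureThreshold: 60
  periodSeconds: 5
This keeps the liveness probe from killing the pod during startup: liveness and readiness checks are held back until the startup probe succeeds, here for up to 60 × 5 s = 300 s.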
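The replication part of a readiness check is easier to test when it is a small function that only parses text. The sketch below is one possible shape, under stated assumptions: replica_status_ok and readiness_check are hypothetical names, the Slave_SQL_Running check is an addition beyond the IO-thread check discussed above, and the script is assumed to run inside the MariaDB container where MARIADB_ROOT_PASSWORD is set.

```shell
#!/bin/sh
# replica_status_ok (hypothetical helper) reads the text output of
# SHOW REPLICA STATUS\G on stdin.
# Empty input means the server is not a replica (for example the primary),
# which counts as ready. Otherwise both replication threads must report Yes.
replica_status_ok() {
  _status=$(cat)
  if [ -z "$_status" ]; then
    return 0
  fi
  printf '%s\n' "$_status" | grep -q 'Slave_IO_Running: Yes' || return 1
  printf '%s\n' "$_status" | grep -q 'Slave_SQL_Running: Yes' || return 1
  return 0
}

# readiness_check (hypothetical helper) is what an exec probe could call:
# the server must answer a query, and a replica must be replicating.
readiness_check() {
  mariadb -h localhost -u root -p"$MARIADB_ROOT_PASSWORD" \
    -e 'SELECT 1' >/dev/null || return 1
  mariadb -h localhost -u root -p"$MARIADB_ROOT_PASSWORD" \
    -e 'SHOW REPLICA STATUS\G' | replica_status_ok
}
```

Because replica_status_ok never touches the database, it can be exercised with canned status text before the script is mounted into a pod.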
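Storage
| Option | Pros | Cons |
|---|---|---|
| EBS gp3 / Cloud SSD | Durable data | Moving across nodes requires detach/attach |
| Local NVMe | Fastest | Data is lost if the node dies |
| OpenEBS / Longhorn | Replication across nodes | Performance overhead |
| Rook Ceph | Full-featured | Complex |
Rule of thumb: a cloud provider's CSI driver with EBS-style block storage, combined with MariaDB replication for HA, is a simple and stable combination.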
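As an illustration of the CSI-driver-plus-block-storage combination, the sketch below prints a StorageClass manifest named gp3, matching the storageClassName used in the MariaDB CR. It assumes AWS with the EBS CSI driver installed; print_gp3_storageclass is a hypothetical helper, and the Retain reclaim policy is a choice made for this example, not something the text above prescribes.

```shell
#!/bin/sh
# print_gp3_storageclass (hypothetical helper) emits a StorageClass manifest.
# Assumption: the AWS EBS CSI driver (provisioner ebs.csi.aws.com) is installed.
# WaitForFirstConsumer delays volume creation until a pod is scheduled, so the
# volume is provisioned in the same zone as the node that will mount it.
print_gp3_storageclass() {
  cat <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
EOF
}

# Usage (not executed here):
#   print_gp3_storageclass | kubectl apply -f -
```

Other clouds follow the same pattern with their own provisioner name and parameters.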
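Exposing the database to applications
ClusterIP (recommended)
Application pods in the same cluster connect through internal DNS:
mariadb-prod.app.svc.cluster.local:3306
LoadBalancer
Use this only when external access is required (for example, BI tools). An IP allowlist and strict TLS are strongly recommended.
Ingress / Gateway
MariaDB speaks a TCP protocol, so it cannot go through an HTTP Ingress. Use an NLB or another TCP load balancer instead.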
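The ClusterIP address shown earlier follows the pattern service.namespace.svc.cluster.local:port. A small sketch of building it in a deploy script, assuming the cluster uses the default cluster.local domain; service_addr is a hypothetical helper name.

```shell
#!/bin/sh
# service_addr NAME NAMESPACE [PORT] (hypothetical helper) prints the
# in-cluster address of a Service. PORT defaults to 3306.
# Assumption: the cluster DNS domain is the default, cluster.local.
service_addr() {
  printf '%s.%s.svc.cluster.local:%s\n' "$1" "$2" "${3:-3306}"
}

# Example:
#   service_addr mariadb-prod app
#   prints mariadb-prod.app.svc.cluster.local:3306
```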
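Monitoring
metrics:
  enabled: true
The operator runs mysqld_exporter automatically; pair its ServiceMonitor with a PrometheusRule for alerting and you are set.
Dashboard: use Grafana's "MySQL Overview" dashboard, which is compatible with MariaDB.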
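To make the PrometheusRule side concrete, here is a minimal sketch that prints a single alert on the exporter's mysql_up metric. It assumes the Prometheus Operator CRDs are installed; print_mariadb_alert_rule, the rule name, the 2m duration, and the severity label are illustrative choices, and your Prometheus may additionally require specific labels on the rule object before it is picked up.

```shell
#!/bin/sh
# print_mariadb_alert_rule (hypothetical helper) emits a PrometheusRule with
# one alert that fires when mysqld_exporter reports the server as unreachable.
# Assumption: Prometheus Operator CRDs (monitoring.coreos.com/v1) are installed.
print_mariadb_alert_rule() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mariadb-prod-alerts
  namespace: app
spec:
  groups:
    - name: mariadb
      rules:
        - alert: MariaDBDown
          expr: mysql_up == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "mysqld_exporter cannot reach MariaDB"
EOF
}

# Usage (not executed here):
#   print_mariadb_alert_rule | kubectl apply -f -
```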
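Upgrades
spec:
  image: mariadb:11.4   # change to mariadb:11.4.5 (patch release)
  updateStrategy:
    type: RollingUpdate
Patch releases within a series: a rolling update works fine.
Across release series (11.4 → 11.8 → 12.x): back up, stand up a new cluster, then switch traffic over. Do not upgrade in place.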
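The rule above can be encoded as a guard in a deploy script. This is a sketch under simple assumptions: versions are bare dotted strings such as 11.4.5 (no image name or suffix), and series and upgrade_path are hypothetical helper names. It prints rolling when only the patch level changes and migrate when the major.minor series changes.

```shell
#!/bin/sh
# series VERSION (hypothetical helper) prints the major.minor part of a
# dotted version, e.g. 11.4.5 becomes 11.4 and 11.4 stays 11.4.
series() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# upgrade_path FROM TO (hypothetical helper) prints "rolling" when both
# versions share a major.minor series, and "migrate" (back up, new cluster,
# switch traffic) otherwise.
upgrade_path() {
  if [ "$(series "$1")" = "$(series "$2")" ]; then
    echo rolling
  else
    echo migrate
  fi
}

# Example:
#   upgrade_path 11.4 11.4.5   prints rolling
#   upgrade_path 11.4.5 11.8.1 prints migrate
```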
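Common pitfalls
- CrashLoopBackOff on first start: the probes begin too early. Add a startupProbe.
- OOMKilled: the limits are too tight or innodb_buffer_pool_size is too large. Keep the memory limit ≥ buffer pool + 1G of overhead.
- PVC fails to attach on another node: the CSI driver is misconfigured or the nodes are in different AZs. Switch to a storageClass whose volumes can move anywhere within the region.
- Replication lag spikes: the replica node is short on CPU/IO, or binlog_format=STATEMENT is triggering slow operations.
- Service discovery fails: a headless Service was used where a regular Service was needed, or the other way round. The operator usually handles this for you.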
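The memory rule in the OOMKilled item (limit ≥ buffer pool + 1G of overhead) is simple enough to check before applying a manifest. A minimal sketch working in MiB; min_memory_limit_mib and limit_is_safe are hypothetical helper names, and the fixed 1024 MiB overhead is taken directly from the rule above rather than measured.

```shell
#!/bin/sh
# min_memory_limit_mib BUFFER_POOL_MIB (hypothetical helper) prints the
# smallest container memory limit in MiB allowed by the rule:
# buffer pool + 1 GiB (1024 MiB) of overhead.
min_memory_limit_mib() {
  echo $(( $1 + 1024 ))
}

# limit_is_safe LIMIT_MIB BUFFER_POOL_MIB (hypothetical helper) succeeds
# when the configured limit satisfies the rule.
limit_is_safe() {
  [ "$1" -ge "$(min_memory_limit_mib "$2")" ]
}

# Example: the MariaDB CR earlier uses a 4G buffer pool with an 8Gi limit.
#   min_memory_limit_mib 4096   prints 5120
#   limit_is_safe 8192 4096     succeeds
```

Connection buffers and temporary tables also consume memory, so busy servers may need more headroom than this lower bound.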