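Vault and Consul Integration Best Practices

1. Architecture Design

1.1 High-Availability Architecture

  • Vault cluster: 3 nodes (1 active + 2 standby)
  • Consul cluster: 3 server nodes (1 leader + 2 followers)
  • Network: Tailscale private network (tailnet)
  • Storage: Consul serves as Vault's storage backend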

1.2 Node Layout

Roles are assigned at runtime and move to another node on failover.

Vault nodes:
  - ch4.tailnet-68f9.ts.net:8200 (active)
  - ash3c.tailnet-68f9.ts.net:8200 (standby)
  - warden.tailnet-68f9.ts.net:8200 (standby)

Consul nodes:
  - ch4.tailnet-68f9.ts.net:8500 (leader)
  - ash3c.tailnet-68f9.ts.net:8500 (follower)
  - warden.tailnet-68f9.ts.net:8500 (follower)
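
A quick way to see which of the Vault nodes listed above is reachable is to query each node's health endpoint. The following is a minimal sketch: the hostnames come from the list above, while the HTTPS scheme, the 5-second timeout, and the function names are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Vault hostnames from the node list; scheme and timeout are assumptions.
VAULT_NODES=(
  "ch4.tailnet-68f9.ts.net"
  "ash3c.tailnet-68f9.ts.net"
  "warden.tailnet-68f9.ts.net"
)

# Print the HTTP status code of /v1/sys/health for one node.
# curl prints 000 when the node cannot be reached.
vault_health_code() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "https://$1:8200/v1/sys/health" || true
}

# Print one "<node> <code>" line per Vault node.
check_vault_nodes() {
  local node
  for node in "${VAULT_NODES[@]}"; do
    printf '%s %s\n' "$node" "$(vault_health_code "$node")"
  done
}
```

Calling `check_vault_nodes` prints one line per node. With Vault's default health codes, the active node typically answers 200 and an unsealed standby answers 429.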

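2. Vault Configuration Best Practices

2.1 Storage Backend Configuration

storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"

  # High availability and service registration
  datacenter   = "dc1"
  service      = "vault"
  service_tags = "vault-server"

  # Session settings for the HA lock
  session_ttl    = "15s"
  lock_wait_time = "15s"

  # Consistency: "strong" requests consistent reads from Consul
  consistency_mode = "strong"

  # Request concurrency and service registration
  max_parallel         = 128
  disable_registration = false
}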

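2.2 Listener Configuration

listener "tcp" {
  # API listener
  address = "0.0.0.0:8200"

  # Cluster (server-to-server) listener. It is set on the same stanza, not in a
  # second listener block, and defaults to the API port + 1.
  cluster_address = "0.0.0.0:8201"

  # Enable TLS in production
  tls_cert_file   = "/opt/vault/tls/vault.crt"
  tls_key_file    = "/opt/vault/tls/vault.key"
  tls_min_version = "tls12"
}

# Traffic on the cluster port is protected by TLS that Vault manages itself,
# so the cluster listener needs no certificate settings of its own.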

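2.3 Cluster Configuration

# {{ ansible_host }} is a Jinja2 placeholder that Ansible fills in per node.

# API address advertised to clients (Tailscale network)
api_addr = "https://{{ ansible_host }}:8200"

# Cluster address advertised to the other Vault nodes (Tailscale network)
cluster_addr = "https://{{ ansible_host }}:8201"

# Cluster name
cluster_name = "vault-cluster"

# Keep mlock enabled in production (disable_mlock = false) so secrets are not swapped to disk
disable_mlock = false

# Logging
log_level  = "INFO"
log_format = "json"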

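3. Consul Configuration Best Practices

3.1 Service Registration Configuration

# Needed only when Vault's own registration is turned off (disable_registration = true);
# otherwise Vault registers the "vault" service in Consul itself.
services {
  name = "vault"
  tags = ["vault-server", "secrets"]
  port = 8200

  check {
    name     = "vault-health"
    # HTTPS, because the Vault listener has TLS enabled. The certificate must be
    # valid for this address, or the check needs tls_skip_verify = true.
    http     = "https://127.0.0.1:8200/v1/sys/health"
    interval = "10s"
    timeout  = "3s"
  }
}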
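
The health check above depends on the status code that `/v1/sys/health` returns, so it helps to know what each code means. The sketch below maps Vault's default health codes to a node state; the function name is hypothetical. Consul's HTTP check treats any 2xx as passing, 429 as warning, and everything else as critical, so an unsealed standby shows up as a warning unless the check URL adds `?standbyok=true`.

```shell
# Map a /v1/sys/health status code to a node state (Vault's default codes).
vault_state_from_code() {
  case "$1" in
    200) echo "active" ;;               # initialized, unsealed, active
    429) echo "standby" ;;              # unsealed standby
    472) echo "dr-secondary" ;;         # disaster recovery secondary
    473) echo "performance-standby" ;;  # performance standby
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}
```

For example, `vault_state_from_code "$(curl -s -o /dev/null -w '%{http_code}' https://127.0.0.1:8200/v1/sys/health)"` prints the state of the local node.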

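3.2 ACL Configuration

acl {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true

  tokens {
    # Default token: applied to every request that reaches this agent without a
    # token, including Vault's when its storage stanza sets no `token`.
    # Setting `token` in Vault's storage "consul" stanza instead limits these
    # privileges to Vault alone.
    default = "{{ vault_consul_token }}"
  }
}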
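
With `default_policy = "deny"`, the token Vault uses needs an explicit policy. The sketch below writes one modeled on the policy shown in Vault's Consul storage documentation; the file path, the policy name `vault-storage`, and the function name are assumptions, and the rules should be checked against the Vault and Consul versions in use.

```shell
# Write a Consul ACL policy for Vault's storage backend to the file given as $1.
write_vault_policy() {
  cat > "$1" <<'EOF'
key_prefix "vault/" {
  policy = "write"
}
service "vault" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
session_prefix "" {
  policy = "write"
}
EOF
}

# Usage (requires a Consul token that may manage ACLs):
#   write_vault_policy /tmp/vault-policy.hcl
#   consul acl policy create -name vault-storage -rules @/tmp/vault-policy.hcl
#   consul acl token create -description "Vault storage" -policy-name vault-storage
```

The `key_prefix` rule matches the `path = "vault/"` setting of the storage stanza, and the `service` rule matches the registered service name `vault`.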

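4. Security Best Practices

4.1 TLS Configuration

  • Use TLS for all communication between Vault nodes
  • Use TLS for communication between Consul nodes
  • Use TLS for client-to-Vault communication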
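
TLS only protects traffic while the certificates are still valid, so an expiry check is a useful companion to the points above. This is a minimal sketch using `openssl x509 -checkend`; the function name and the 30-day default are assumptions.

```shell
# Succeed if the certificate at $1 stays valid for at least $2 more days (default 30).
cert_valid_for_days() {
  local cert="$1" days="${2:-30}"
  # -checkend N exits 0 if the certificate does not expire within N seconds.
  openssl x509 -in "$cert" -noout -checkend "$(( days * 86400 ))" > /dev/null
}
```

For example, `cert_valid_for_days /opt/vault/tls/vault.crt 14 || echo "renew soon"` warns when the Vault certificate expires within two weeks.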

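4.2 Authentication Configuration

# Auth methods are enabled through the CLI or API, not in the server configuration file.

# AppRole authentication
vault auth enable approle

# LDAP authentication
vault auth enable ldap
vault write auth/ldap/config \
  url="ldap://authentik.tailnet-68f9.ts.net:389" \
  userdn="ou=users,dc=authentik,dc=local" \
  groupdn="ou=groups,dc=authentik,dc=local"

# OIDC authentication (a client ID, client secret, and default role are also needed before logins work)
vault auth enable oidc
vault write auth/oidc/config \
  oidc_discovery_url="https://authentik1.git-4ta.live/application/o/vault/"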

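5. Monitoring and Auditing

5.1 Audit Logs

# Audit devices are enabled through the CLI or API, not in the server configuration file.

# File audit device
vault audit enable file file_path=/opt/vault/logs/audit.log format=json

# Syslog audit device
vault audit enable syslog facility=AUTH tag=vault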

5.2 Telemetry Configuration

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = false

  # Metric name prefix
  metrics_prefix = "vault"
}
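
Once `prometheus_retention_time` is set, Vault serves metrics in Prometheus format from `/v1/sys/metrics`. The sketch below fetches them with curl; it assumes `VAULT_ADDR` and `VAULT_TOKEN` are set in the environment and that the token is allowed to read `sys/metrics`. The function name is hypothetical.

```shell
# Fetch Prometheus-format metrics from the Vault node at $VAULT_ADDR.
vault_metrics() {
  curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
    "${VAULT_ADDR}/v1/sys/metrics?format=prometheus"
}
```

For example, `vault_metrics | grep -c '^vault_'` counts the sample lines that carry the `vault` prefix, which is a quick check that telemetry is active.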

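6. Backup and Recovery

6.1 Automated Backup Script

#!/bin/bash
# /opt/vault/scripts/backup.sh
#
# Vault keeps its data in Consul here, so the Vault backup is a Consul snapshot.
# (vault operator raft snapshot save works only with integrated Raft storage.)

export CONSUL_HTTP_ADDR="http://127.0.0.1:8500"

# Create the snapshot (needs a suitably privileged Consul token when ACLs are enabled)
consul snapshot save "/opt/vault/backups/vault-$(date +%Y%m%d-%H%M%S).snapshot"

# Remove old backups (keep 7 days)
find /opt/vault/backups -name "vault-*.snapshot" -mtime +7 -delete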

6.2 Consul Snapshot

#!/bin/bash
# /opt/consul/scripts/backup.sh

# The consul CLI reads the agent address from CONSUL_HTTP_ADDR
export CONSUL_HTTP_ADDR="http://127.0.0.1:8500"

# Create a Consul snapshot
consul snapshot save "/opt/consul/backups/consul-$(date +%Y%m%d-%H%M%S).snapshot"
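
A snapshot is only useful if it can be read back, so it is worth verifying each one right after it is written. The following sketch saves a snapshot and then runs `consul snapshot inspect` on it; the function name is hypothetical and the backup directory is passed in by the caller.

```shell
# Save a Consul snapshot into directory $1 and verify that it can be read back.
# Prints the snapshot path on success and returns non-zero on any failure.
backup_consul() {
  local dir="$1" file
  file="${dir}/consul-$(date +%Y%m%d-%H%M%S).snapshot"
  consul snapshot save "$file" &&
    consul snapshot inspect "$file" > /dev/null &&
    echo "$file"
}
```

For example, `backup_consul /opt/consul/backups || echo "backup failed" >&2` could be run from cron; the schedule itself is a deployment decision.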

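7. Failover and Disaster Recovery

7.1 Automatic Failover

  • Vault elects a new active node automatically through a lock held in Consul; with Consul storage, Vault does not run Raft itself
  • Consul elects a new leader automatically using the Raft protocol
  • Standby Vault nodes forward or redirect requests to the active node, so clients reach the new active node after a failover

7.2 Disaster Recovery Procedure

  1. Stop all Vault nodes
  2. Restore the data into Consul from a snapshot (consul snapshot restore)
  3. Start the Vault cluster and unseal each node, unless auto-unseal is configured
  4. Verify service status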
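
The four steps above can be sketched as a single function. This is an illustration under stated assumptions, not a tested runbook: `run_on` is a hypothetical helper that runs a command on a remote node over ssh, the systemd unit is assumed to be named `vault`, and the hostnames are the Vault nodes from section 1.2.

```shell
# Vault hostnames from the node list; run_on is a hypothetical helper (plain ssh here).
VAULT_NODES=(ch4.tailnet-68f9.ts.net ash3c.tailnet-68f9.ts.net warden.tailnet-68f9.ts.net)
run_on() { ssh "$1" "${@:2}"; }

# Restore Vault's data from the Consul snapshot file given as $1.
restore_vault() {
  local snapshot="$1" node
  # 1. Stop all Vault nodes
  for node in "${VAULT_NODES[@]}"; do
    run_on "$node" sudo systemctl stop vault || return 1
  done
  # 2. Restore the data into Consul
  consul snapshot restore "$snapshot" || return 1
  # 3. Start the Vault cluster (nodes come up sealed unless auto-unseal is configured)
  for node in "${VAULT_NODES[@]}"; do
    run_on "$node" sudo systemctl start vault || return 1
  done
  # 4. Verify service status; vault status exits 0 when unsealed and 2 when sealed
  vault status
}
```

Stopping Vault first keeps it from writing to Consul while the restore replaces the key-value data underneath it.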

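8. Performance Optimization

8.1 Cache Configuration

# Vault server read cache in front of the storage backend.
# cache_size counts entries; the default is 131072, so a value of 1000 shrinks the cache.
disable_cache = false
cache_size    = "1000"

# A cache block with a persist section is Vault Agent configuration, not server
# configuration, and its "kubernetes" persistence type applies only to an Agent
# running in a Kubernetes pod.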

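8.2 Connection Concurrency Configuration

storage "consul" {
  # Maximum number of concurrent requests Vault sends to Consul (default 128).
  # Set this in the single storage stanza from section 2.1; Vault accepts only one.
  # max_requests_per_second is not a documented parameter of this backend.
  max_parallel = 128
}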

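9. Deployment Checklist

9.1 Pre-Deployment Checks

  • Consul cluster is healthy
  • Network connectivity tested
  • TLS certificates configured
  • Firewall rules configured
  • Storage space checked

9.2 Post-Deployment Verification

  • Vault cluster status checked
  • Service registration verified
  • Authentication tested
  • Backups tested
  • Monitoring metrics verified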
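
Part of the post-deployment list can be automated. The sketch below runs three of the checks with the CLI commands used elsewhere in this document and reports PASS or FAIL for each; the function names are hypothetical, and it assumes `vault` and `consul` are already configured through their usual environment variables.

```shell
# Run one named check and report PASS or FAIL without aborting the run.
run_check() {
  local name="$1"; shift
  if "$@" > /dev/null 2>&1; then
    echo "PASS ${name}"
  else
    echo "FAIL ${name}"
    return 1
  fi
}

# Succeed if a service named exactly "vault" is registered in Consul.
vault_registered() {
  consul catalog services | grep -qx "vault"
}

# Run all checks; return non-zero if any of them failed.
post_deploy_checks() {
  local failed=0
  run_check "vault status" vault status || failed=1   # exits 2 when sealed
  run_check "consul members" consul members || failed=1
  run_check "vault service registered" vault_registered || failed=1
  return "$failed"
}
```

Authentication, backup, and metrics checks depend on site-specific credentials and are left out of this sketch.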

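10. Common Problems and Solutions

10.1 Common Problems

  1. Vault cannot connect to Consul

    • Check network connectivity
    • Verify that the Consul service is running
    • Check ACL permissions
  2. Split-brain cluster

    • Check for network partitions
    • Verify Raft log consistency in Consul
    • Run the disaster recovery procedure
  3. Performance problems

    • Tune max_parallel
    • Enable caching
    • Optimize the network configuration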

10.2 Troubleshooting Commands

# Check Vault status
vault status

# List Consul members
consul members

# List registered services
consul catalog services

# Follow the Vault logs
journalctl -u vault -f

# Follow the Consul logs
journalctl -u consul -f