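Vault and Consul Integration Best Practices
1. Architecture Design
1.1 High-Availability Architecture
- Vault cluster: 3 nodes (1 active leader + 2 standby followers)
- Consul cluster: 3 nodes (1 leader + 2 followers)
- Network: Tailscale secure network
- Storage: Consul as the Vault storage backend
1.2 Node Distribution
Vault nodes: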
- ch4.tailnet-68f9.ts.net:8200 (Leader)
- ash3c.tailnet-68f9.ts.net:8200 (Follower)
- warden.tailnet-68f9.ts.net:8200 (Follower)
Consul nodes:
- ch4.tailnet-68f9.ts.net:8500 (Leader)
- ash3c.tailnet-68f9.ts.net:8500 (Follower)
- warden.tailnet-68f9.ts.net:8500 (Follower)
2. Vault Configuration Best Practices
2.1 Storage Backend Configuration
storage "consul" {
  address = "127.0.0.1:8500"
  path = "vault/"
  # High-availability settings
  datacenter = "dc1"
  service = "vault"
  service_tags = "vault-server"
  # Session settings
  session_ttl = "15s"
  lock_wait_time = "15s"
  # Consistency settings
  consistency_mode = "strong"
  # Concurrency and service registration settings
  max_parallel = 128
  disable_registration = false
}
2.2 Listener Configuration
listener "tcp" {
  # Client API address
  address = "0.0.0.0:8200"
  # Cluster (server-to-server) address; it is set on the same listener
  # through cluster_address rather than in a separate listener block
  cluster_address = "0.0.0.0:8201"
  # Enable TLS in production
  tls_cert_file = "/opt/vault/tls/vault.crt"
  tls_key_file = "/opt/vault/tls/vault.key"
  tls_min_version = "tls12"
}
2.3 Cluster Configuration
# API address, reachable over the Tailscale network
api_addr = "https://{{ ansible_host }}:8200"
# Cluster address, reachable over the Tailscale network
cluster_addr = "https://{{ ansible_host }}:8201"
# Cluster name
cluster_name = "vault-cluster"
# Keep mlock enabled (it should stay enabled in production)
disable_mlock = false
# Logging
log_level = "INFO"
log_format = "json"
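3. Consul Configuration Best Practices
3.1 Service Registration Configuration
services {
  name = "vault"
  tags = ["vault-server", "secrets"]
  port = 8200
  check {
    name = "vault-health"
    # The Vault listener serves TLS (section 2.2), so the check uses https
    http = "https://127.0.0.1:8200/v1/sys/health"
    # Needed only if the certificate does not cover 127.0.0.1
    tls_skip_verify = true
    interval = "10s"
    timeout = "3s"
  }
}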
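The health check above relies on the way `/v1/sys/health` encodes node state in its HTTP status code. Consul treats any 2xx response as passing and 429 as a warning, so a standby Vault node shows up as a warning rather than a failure. The following is a minimal sketch of that mapping, assuming Vault's default status codes; the function names `vault_health_state` and `vault_node_state` are hypothetical helpers, not part of either tool.

```shell
#!/bin/bash
# Map the HTTP status code returned by Vault's /v1/sys/health endpoint
# (default behavior, no query parameters) to a readable state.
vault_health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Query one node and print its state. curl prints only the status code;
# "000" (mapped to "unknown") means the node could not be reached.
vault_node_state() {
  local code
  code="$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' "https://$1:8200/v1/sys/health")" || true
  vault_health_state "$code"
}
```

For example, `vault_node_state ch4.tailnet-68f9.ts.net` prints the state of the first node listed in section 1.2.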
3.2 ACL Configuration
acl {
  enabled = true
  default_policy = "deny"
  enable_token_persistence = true
  # Token carrying the permissions the Vault service needs
  tokens {
    default = "{{ vault_consul_token }}"
  }
}
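With `default_policy = "deny"`, the token Vault uses must be granted access explicitly. The sketch below renders a Consul ACL policy of the shape commonly used for Vault's Consul storage backend and creates a token for it. The key prefix `vault/` and service name `vault` come from section 2.1; the policy name `vault-storage` and both function names are hypothetical, and the exact rules should be checked against your Consul version.

```shell
#!/bin/bash
# Print a Consul ACL policy for Vault's Consul storage backend.
# The key prefix "vault/" and service name "vault" match section 2.1.
render_vault_policy() {
  cat <<'EOF'
key_prefix "vault/" {
  policy = "write"
}
service "vault" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
session_prefix "" {
  policy = "write"
}
EOF
}

# Create the policy and a token bound to it. Assumes a management token
# is available in CONSUL_HTTP_TOKEN. "vault-storage" is a hypothetical name.
create_vault_token() {
  local rules_file
  rules_file="$(mktemp)"
  render_vault_policy > "$rules_file"
  consul acl policy create -name "vault-storage" -rules "@${rules_file}"
  consul acl token create -description "Vault storage backend" -policy-name "vault-storage"
  rm -f "$rules_file"
}
```

Rendering the policy separately from creating it makes it possible to review or version the rules before anything is written to Consul.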
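4. Security Best Practices
4.1 TLS Configuration
- All communication between Vault nodes uses TLS
- Communication between Consul nodes uses TLS
- Client-to-Vault communication uses TLS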
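4.2 Authentication Configuration
# Enable multiple auth methods. Auth methods are enabled at runtime with
# the CLI or API rather than in the server configuration file.
# AppRole auth
vault auth enable approle
# LDAP auth
vault auth enable ldap
vault write auth/ldap/config \
  url="ldap://authentik.tailnet-68f9.ts.net:389" \
  userdn="ou=users,dc=authentik,dc=local" \
  groupdn="ou=groups,dc=authentik,dc=local"
# OIDC auth
vault auth enable oidc
vault write auth/oidc/config \
  oidc_discovery_url="https://authentik1.git-4ta.live/application/o/vault/"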
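5. Monitoring and Auditing
5.1 Audit Logging
# Audit devices are enabled with the CLI or API
# File audit device
vault audit enable file file_path="/opt/vault/logs/audit.log" format="json"
# Syslog audit device
vault audit enable syslog facility="AUTH" tag="vault"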
5.2 Telemetry Configuration
telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = false
  # Metric name prefix
  metrics_prefix = "vault"
}
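6. Backup and Recovery
6.1 Automatic Backup Script
#!/bin/bash
# /opt/vault/scripts/backup.sh
set -euo pipefail
export VAULT_ADDR="https://vault.git-4ta.live"
export VAULT_TOKEN="$(cat /opt/vault/token)"
# Create a snapshot. `vault operator raft snapshot` requires Vault's
# integrated (Raft) storage; with the Consul backend from section 2.1,
# Vault's data is captured by the Consul snapshot in section 6.2.
vault operator raft snapshot save "/opt/vault/backups/vault-$(date +%Y%m%d-%H%M%S).snapshot"
# Remove old backups (keep 7 days)
find /opt/vault/backups -name "vault-*.snapshot" -mtime +7 -delete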
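The naming and retention steps of the script above can be factored into small functions so they can be reused for both Vault and Consul backups and exercised against a scratch directory first. This is a sketch only; `snapshot_path` and `prune_snapshots` are hypothetical helper names, and `-maxdepth` assumes GNU or BSD `find`.

```shell
#!/bin/bash
# Build a timestamped snapshot path of the form <dir>/<prefix>-<timestamp>.snapshot
snapshot_path() {
  local dir="$1" prefix="$2"
  printf '%s/%s-%s.snapshot\n' "$dir" "$prefix" "$(date +%Y%m%d-%H%M%S)"
}

# Delete snapshots older than the given number of days.
# Refuses to run when the directory is missing, so a typo cannot make
# find operate on an unexpected path.
prune_snapshots() {
  local dir="$1" prefix="$2" days="$3"
  [ -d "$dir" ] || { echo "backup dir not found: $dir" >&2; return 1; }
  find "$dir" -maxdepth 1 -type f -name "${prefix}-*.snapshot" -mtime "+${days}" -delete
}
```

A call such as `prune_snapshots /opt/vault/backups vault 7` reproduces the 7-day retention from the script.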
6.2 Consul Snapshot
#!/bin/bash
# /opt/consul/scripts/backup.sh
# The consul CLI reads the agent address from CONSUL_HTTP_ADDR
export CONSUL_HTTP_ADDR="http://127.0.0.1:8500"
# Create a Consul snapshot
consul snapshot save "/opt/consul/backups/consul-$(date +%Y%m%d-%H%M%S).snapshot"
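7. Failover and Disaster Recovery
7.1 Automatic Failover
- Vault automatically promotes a standby to active through a leader lock held in Consul (Raft-based election applies only to Vault's integrated storage)
- Consul automatically elects a new leader using the Raft protocol
- Clients automatically reconnect to the new leader node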
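One way a client-side script can follow a failover is to probe each node's health endpoint and use the first one that answers 200, which Vault returns only from an initialized, unsealed, active node. The sketch below assumes the node names from section 1.2; `probe_code` and `find_active_node` are hypothetical helpers.

```shell
#!/bin/bash
# Vault nodes from section 1.2
VAULT_NODES="ch4.tailnet-68f9.ts.net ash3c.tailnet-68f9.ts.net warden.tailnet-68f9.ts.net"

# Print the HTTP status code of a node's health endpoint ("000" if unreachable).
probe_code() {
  curl -s -o /dev/null --max-time 3 -w '%{http_code}' "https://$1:8200/v1/sys/health" || true
}

# Print the first node that reports 200 (initialized, unsealed, active).
# Returns non-zero when no node is active, for example during a failover.
find_active_node() {
  local node
  for node in $VAULT_NODES; do
    if [ "$(probe_code "$node")" = "200" ]; then
      echo "$node"
      return 0
    fi
  done
  return 1
}
```

A caller could then set `VAULT_ADDR="https://$(find_active_node):8200"` and retry after a short delay when the function returns non-zero.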
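7.2 Disaster Recovery Procedure
- Stop all Vault nodes
- Restore the data from Consul (for example, from a snapshot taken as in section 6.2)
- Start the Vault cluster (and unseal each node unless auto-unseal is configured)
- Verify service status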
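The procedure above can be sketched as a script that prints its commands by default and executes them only when `DRY_RUN=0`. Several things here are assumptions: SSH access to each node, `sudo` rights, a systemd unit named `vault` (as implied by the `journalctl -u vault` command in section 10.2), and the hypothetical helper names `run` and `disaster_recovery`.

```shell
#!/bin/bash
# Vault nodes from section 1.2
VAULT_NODES="ch4.tailnet-68f9.ts.net ash3c.tailnet-68f9.ts.net warden.tailnet-68f9.ts.net"

# Run a command, or only print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Stop Vault everywhere, restore Consul, start Vault, check status.
# $1 is the path to a Consul snapshot file.
disaster_recovery() {
  local snapshot="$1" node
  for node in $VAULT_NODES; do
    run ssh "$node" sudo systemctl stop vault
  done
  run consul snapshot restore "$snapshot"
  for node in $VAULT_NODES; do
    run ssh "$node" sudo systemctl start vault
  done
  run vault status
}
```

Running `disaster_recovery /path/to/consul.snapshot` with the default dry-run setting lists the commands in order, which is a cheap way to review the plan before executing it.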
8. Performance Optimization
8.1 Cache Configuration
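# Read cache of the Vault server's storage layer (top-level server options)
disable_cache = false
# Cache size, in number of entries
cache_size = 1000
# A persist stanza (type = "kubernetes") belongs to the Vault Agent cache,
# not to the server configuration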
8.2 Connection Pool Configuration
storage "consul" {
  # Maximum number of concurrent requests to Consul
  max_parallel = 128
}
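9. Deployment Checklist
9.1 Pre-Deployment Checks
- Consul cluster is healthy
- Network connectivity tested
- TLS certificates configured
- Firewall rules configured
- Storage space checked
9.2 Post-Deployment Verification
- Vault cluster status checked
- Service registration verified
- Authentication tested
- Backups tested
- Monitoring metrics verified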
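Part of the post-deployment list can be automated with a small check runner that executes each command and tallies the results. The sketch below uses only commands that appear in section 10.2; `run_check` and `verify_deployment` are hypothetical helpers, and the checks cover cluster status and service registration only, not authentication, backups, or metrics.

```shell
#!/bin/bash
PASSED=0
FAILED=0

# Run one named check; the remaining arguments are the command to execute.
run_check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    PASSED=$((PASSED + 1))
    echo "PASS  $name"
  else
    FAILED=$((FAILED + 1))
    echo "FAIL  $name"
  fi
}

# Post-deployment checks built from commands in section 10.2.
verify_deployment() {
  run_check "Vault cluster status" vault status
  run_check "Consul members" consul members
  run_check "Vault registered in Consul" sh -c 'consul catalog services | grep -qx vault'
  echo "passed=$PASSED failed=$FAILED"
  [ "$FAILED" -eq 0 ]
}
```

Because `verify_deployment` returns non-zero when any check fails, it can gate a deployment pipeline step directly.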
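10. Common Issues and Solutions
10.1 Common Issues
- Vault cannot connect to Consul
  - Check network connectivity
  - Verify the Consul service status
  - Check ACL permissions
- Cluster split (split-brain)
  - Check for network partitions
  - Verify Raft log consistency
  - Run the disaster recovery procedure
- Performance problems
  - Adjust the connection pool size
  - Enable caching
  - Tune the network configuration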
10.2 Troubleshooting Commands
# Check Vault status
vault status
# List Consul members
consul members
# List registered services
consul catalog services
# Follow Vault logs
journalctl -u vault -f
# Follow Consul logs
journalctl -u consul -f