# Nomad Jobs Backup Management

This document describes how to manage and restore backups of Nomad job configurations.

## 📁 Backup Storage Locations

### Local Backups
- **Directory**: `/root/mgmt/backups/nomad-jobs-YYYYMMDD-HHMMSS/`
- **Archive**: `/root/mgmt/nomad-jobs-backup-YYYYMMDD.tar.gz`

### Consul KV Backups
- **Data**: `backup/nomad-jobs/YYYYMMDD/data`
- **Metadata**: `backup/nomad-jobs/YYYYMMDD/metadata`
- **Index**: `backup/nomad-jobs/index`

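The key layout above is regular enough to derive from a date. The following is a minimal sketch, assuming a hypothetical helper named `backup_kv_key` (it is not part of the original tooling), that builds the Consul KV key for a given day and rejects malformed input:

```bash
# Hypothetical helper (not from the original tooling): print the Consul KV key
# for the backup taken on DAY (YYYYMMDD). KIND is "data" or "metadata".
backup_kv_key() {
  local day="$1" kind="$2"
  case "$day" in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
    *) echo "invalid date (want YYYYMMDD): $day" >&2; return 1 ;;
  esac
  case "$kind" in
    data|metadata) ;;
    *) echo "invalid kind (want data or metadata): $kind" >&2; return 1 ;;
  esac
  echo "backup/nomad-jobs/${day}/${kind}"
}
```

It could be used as `consul kv get "$(backup_kv_key 20251004 metadata)"`, which keeps the key format in one place.
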
## 📋 Current Backups

### 2025-10-04 Backup
- **Backup time**: 2025-10-04 07:44:11
- **Backup type**: Full Nomad jobs configuration
- **File count**: 25 `.nomad` files
- **Original size**: 208 KB
- **Compressed size**: 13 KB
- **Consul KV path**: `backup/nomad-jobs/20251004/data`

#### Service Status
- ✅ **Traefik** (`traefik-cloudflare-v1`) - SSL certificates OK
- ✅ **Vault** (`vault-cluster`) - three-node high-availability cluster
- ✅ **Waypoint** (`waypoint-server`) - Web UI reachable

#### Domain and Certificates
- **Domain**: `*.git4ta.me`
- **Certificates**: Let's Encrypt (Cloudflare DNS challenge)
- **Status**: All certificates valid

## 🔧 Backup Management Commands

### List Backups
```bash
# Show the backup index stored in Consul KV
consul kv get backup/nomad-jobs/index

# Show the metadata for a specific backup
consul kv get backup/nomad-jobs/20251004/metadata
```

### Restore a Backup
```bash
# Fetch the backup archive from Consul KV
consul kv get backup/nomad-jobs/20251004/data > nomad-jobs-backup-20251004.tar.gz

# Extract the archive
tar -xzf nomad-jobs-backup-20251004.tar.gz

# Inspect the restored files
ls -la backups/nomad-jobs-20251004-074411/
```

### Create a New Backup
```bash
# Capture the timestamp once so every step uses the same directory name
# (calling `date` in each command can return a different second each time)
ts="$(date +%Y%m%d-%H%M%S)"
day="${ts%%-*}"
backup_dir="backups/nomad-jobs-${ts}"

# Create the local backup directory
mkdir -p "$backup_dir"

# Copy the current configuration
cp -r components "$backup_dir/"
cp -r nomad-jobs "$backup_dir/"
cp waypoint-server.nomad "$backup_dir/"

# Compress the backup
tar -czf "nomad-jobs-backup-${day}.tar.gz" "$backup_dir/"

# Store the archive in Consul KV
consul kv put "backup/nomad-jobs/${day}/data" "@nomad-jobs-backup-${day}.tar.gz"
```

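Consul rejects KV values above its per-value size limit, which defaults to 512 KiB unless the operator has raised it. The archive recorded here is 13 KB, but it will grow with the job set. The following is a minimal sketch, assuming a hypothetical wrapper `kv_put_archive` and the default limit, that checks the archive size before uploading:

```bash
# Hypothetical wrapper: refuse to upload an archive larger than MAX_BYTES.
# 524288 bytes (512 KiB) is Consul's default per-value limit; pass a third
# argument if your cluster is configured differently.
kv_put_archive() {
  local key="$1" archive="$2" max_bytes="${3:-524288}"
  local size
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # wc pads its output with spaces on some platforms, so strip them.
  size="$(wc -c < "$archive" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    echo "archive is ${size} bytes, over the ${max_bytes}-byte limit" >&2
    return 1
  fi
  consul kv put "$key" "@${archive}"
}
```

A call such as `kv_put_archive backup/nomad-jobs/20251004/data nomad-jobs-backup-20251004.tar.gz` fails early with a readable message instead of surfacing a server-side error.
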
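## 📊 Backup Strategy

### Backup Frequency
- **Scheduled backups**: weekly is recommended
- **Before major changes**: before deploying a new service or making significant configuration changes
- **Emergencies**: when a service misbehaves, back up the current state immediately
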
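The weekly cadence recommended above could be monitored with a staleness check. This is a minimal sketch, assuming a hypothetical function `backup_is_stale` that relies on the archive's modification time; it is an illustration, not part of the original tooling:

```bash
# Hypothetical check: succeed (exit 0) when ARCHIVE is missing or was last
# modified at least DAYS days ago (default 7); fail (exit 1) when it is fresh.
backup_is_stale() {
  local archive="$1" days="${2:-7}"
  # No archive at all counts as stale.
  [ -f "$archive" ] || return 0
  # find -mtime +N matches files whose age in whole days is greater than N,
  # so +$((days - 1)) means "at least DAYS full days old".
  [ -n "$(find "$archive" -mtime "+$((days - 1))" -print)" ]
}
```

For example, `backup_is_stale /root/mgmt/nomad-jobs-backup-20251004.tar.gz && echo "backup overdue"` could run from a scheduled job.
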
### Backup Contents
- All `.nomad` files
- Configuration file templates
- Service dependencies
- Network and storage configuration

### Backup Verification
```bash
# Verify the archive is readable and count its entries
tar -tzf nomad-jobs-backup-20251004.tar.gz | wc -l

# Check that the key job files are present
tar -tzf nomad-jobs-backup-20251004.tar.gz | grep -E "(traefik|vault|waypoint)"
```

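The two checks above can be combined into one command that fails loudly. The following is a minimal sketch, assuming a hypothetical function `verify_backup`; the names to look for are passed as arguments and matched as plain substrings of the archive listing:

```bash
# Hypothetical helper: verify that ARCHIVE is a readable gzip tarball and that
# each remaining argument appears somewhere in its file listing.
verify_backup() {
  local archive="$1" listing name
  shift
  # tar exits non-zero when the archive is truncated or not a tarball.
  if ! listing="$(tar -tzf "$archive" 2>/dev/null)"; then
    echo "corrupt or unreadable: $archive" >&2
    return 1
  fi
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -q -- "$name"; then
      echo "missing from archive: $name" >&2
      return 1
    fi
  done
  echo "ok: $archive"
}
```

For this backup, `verify_backup nomad-jobs-backup-20251004.tar.gz traefik vault waypoint` would exit non-zero if the archive is damaged or any of the three job files is absent.
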
## 🚨 Recovery Procedure

### Emergency Recovery
1. **Stop all services**
   ```bash
   nomad job stop traefik-cloudflare-v1
   nomad job stop vault-cluster
   nomad job stop waypoint-server
   ```

2. **Restore the backup**
   ```bash
   consul kv get backup/nomad-jobs/20251004/data > restore.tar.gz
   tar -xzf restore.tar.gz
   ```

3. **Redeploy**
   ```bash
   nomad job run backups/nomad-jobs-20251004-074411/components/traefik/jobs/traefik-cloudflare.nomad
   nomad job run backups/nomad-jobs-20251004-074411/nomad-jobs/vault-cluster.nomad
   nomad job run backups/nomad-jobs-20251004-074411/waypoint-server.nomad
   ```

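The redeploy step above can be rehearsed before touching the cluster. This is a minimal sketch, assuming two hypothetical helpers (`run` and `redeploy_from_backup`) and a hypothetical `DRY_RUN` variable; it checks that all three job files exist, then validates and runs each one with the standard `nomad job validate` and `nomad job run` commands:

```bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: redeploy the three jobs from an extracted backup DIR,
# in the same order as the manual steps.
redeploy_from_backup() {
  local dir="$1" job
  local jobs="components/traefik/jobs/traefik-cloudflare.nomad
nomad-jobs/vault-cluster.nomad
waypoint-server.nomad"
  # Check every file first so a missing one aborts before anything is deployed.
  for job in $jobs; do
    if [ ! -f "$dir/$job" ]; then
      echo "missing job file: $dir/$job" >&2
      return 1
    fi
  done
  for job in $jobs; do
    run nomad job validate "$dir/$job" || return 1
    run nomad job run "$dir/$job" || return 1
  done
}
```

Running `DRY_RUN=1 redeploy_from_backup backups/nomad-jobs-20251004-074411` prints the six commands without executing them; dropping `DRY_RUN` performs the redeploy.
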
### Partial Recovery
```bash
# Restore a single service only
cp backups/nomad-jobs-20251004-074411/components/traefik/jobs/traefik-cloudflare.nomad components/traefik/jobs/
nomad job run components/traefik/jobs/traefik-cloudflare.nomad
```

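When only one job file is needed, it can be pulled out of the archive without unpacking everything over the working tree. The following is a minimal sketch, assuming a hypothetical function `extract_job_file`; the member path must match the archive listing exactly (check it with `tar -tzf`):

```bash
# Hypothetical helper: extract one MEMBER of ARCHIVE into DEST (default: a
# fresh temporary directory) and print the path of the extracted file.
extract_job_file() {
  local archive="$1" member="$2" dest="${3:-}"
  if [ -z "$dest" ]; then
    dest="$(mktemp -d)" || return 1
  fi
  # Naming a member after the archive makes tar extract only that entry.
  tar -xzf "$archive" -C "$dest" "$member" || return 1
  echo "${dest}/${member}"
}
```

The printed path can be compared against the live file with `diff` before copying it into `components/traefik/jobs/`, so the restore does not silently overwrite newer changes.
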
## 📝 Backup Log

| Date | Backup type | Service status | Size | Consul KV path |
|------|-------------|----------------|------|----------------|
| 2025-10-04 | Full backup | All running | 13 KB | `backup/nomad-jobs/20251004/data` |

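## ⚠️ Notes

1. **Certificates**: SSL certificates are stored inside the container and are lost on restart
2. **Consul KV**: Important configuration lives in Consul KV and must be backed up separately
3. **Network configuration**: Tailscale network configuration must be recorded separately
4. **Credential security**: Vault and Waypoint credentials are stored in Consul KV
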
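For the Consul KV item above, one option is the built-in `consul kv export` command, which writes a JSON dump of a key prefix that `consul kv import` can later load. This is a minimal sketch, assuming a hypothetical function `export_kv`; because the dump may contain the credentials mentioned in item 4, the file is created readable by its owner only:

```bash
# Hypothetical helper: dump every key under PREFIX to OUT as JSON.
# The subshell keeps the restrictive umask from leaking into the caller.
export_kv() {
  local prefix="$1" out="$2"
  (
    umask 077
    consul kv export "$prefix" > "$out"
  )
}
```

A call might look like `export_kv vault/ consul-kv-vault.json`, where `vault/` is a placeholder prefix; restoring would then use `consul kv import @consul-kv-vault.json`. Keep such dumps out of shared or unencrypted locations.
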
## 🔍 Troubleshooting

### Corrupted Backup
```bash
# Check the integrity of the backup archive
tar -tzf nomad-jobs-backup-20251004.tar.gz > /dev/null && echo "backup OK" || echo "backup corrupted"
```

### Consul KV Access Problems
```bash
# Check connectivity to Consul
consul members

# Check the KV store
consul kv get backup/nomad-jobs/index
```

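The two checks above distinguish an unreachable agent from an unreadable key. The following is a minimal sketch, assuming a hypothetical function `diagnose_kv` that relies only on the exit status of `consul members` and `consul kv get`:

```bash
# Hypothetical helper: report which layer fails when reading KEY.
# Prints one of: agent-unreachable, key-unreadable, ok.
diagnose_kv() {
  local key="$1"
  if ! consul members >/dev/null 2>&1; then
    echo "agent-unreachable"
    return 1
  fi
  # A failure here can mean the key does not exist or that the current
  # token is not allowed to read it.
  if ! consul kv get "$key" >/dev/null 2>&1; then
    echo "key-unreadable"
    return 2
  fi
  echo "ok"
}
```

For example, `diagnose_kv backup/nomad-jobs/index` narrows the problem to the agent or to the key before any restore is attempted.
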
---

**Last updated**: 2025-10-04 07:45:00  
**Backup status**: ✅ Current backup is complete and usable  
**Service status**: ✅ All services running normally