This commit is contained in:
Houzhong Xu 2025-10-09 01:22:22 +00:00
parent 1c994f9f60
commit eab95c8c80
136 changed files with 11001 additions and 849 deletions

README-Backup.md Normal file
@@ -0,0 +1,162 @@
# Nomad Jobs Backup Management
This document describes how to manage and restore backups of the Nomad job configurations.
## 📁 Backup Storage Locations
### Local Backups
- **Path**: `/root/mgmt/backups/nomad-jobs-YYYYMMDD-HHMMSS/`
- **Archive**: `/root/mgmt/nomad-jobs-backup-YYYYMMDD.tar.gz`
### Consul KV Backups
- **Data**: `backup/nomad-jobs/YYYYMMDD/data`
- **Metadata**: `backup/nomad-jobs/YYYYMMDD/metadata`
- **Index**: `backup/nomad-jobs/index`
## 📋 Current Backups
### 2025-10-04 Backup
- **Backup time**: 2025-10-04 07:44:11
- **Backup type**: full Nomad jobs configuration
- **File count**: 25 `.nomad` files
- **Original size**: 208KB
- **Compressed size**: 13KB
- **Consul KV path**: `backup/nomad-jobs/20251004/data`
#### Service Status
- ✅ **Traefik** (`traefik-cloudflare-v1`) - SSL certificates valid
- ✅ **Vault** (`vault-cluster`) - three-node high-availability cluster
- ✅ **Waypoint** (`waypoint-server`) - Web UI reachable
#### Domains and Certificates
- **Domain**: `*.git4ta.me`
- **Certificates**: Let's Encrypt (Cloudflare DNS challenge)
- **Status**: all certificates valid
## 🔧 Backup Management Commands
### List Backups
```bash
# View the backup index in Consul KV
consul kv get backup/nomad-jobs/index
# View the metadata of a specific backup
consul kv get backup/nomad-jobs/20251004/metadata
```
### Restore a Backup
```bash
# Pull the backup out of Consul KV
consul kv get backup/nomad-jobs/20251004/data > nomad-jobs-backup-20251004.tar.gz
# Extract the backup
tar -xzf nomad-jobs-backup-20251004.tar.gz
# Inspect the backup contents
ls -la backups/nomad-jobs-20251004-074411/
```
### Create a New Backup
```bash
# Capture the timestamp once, so every step uses the same directory name
TS=$(date +%Y%m%d-%H%M%S)
DAY=$(date +%Y%m%d)
# Create the local backup directory
mkdir -p "backups/nomad-jobs-$TS"
# Back up the current configuration
cp -r components "backups/nomad-jobs-$TS/"
cp -r nomad-jobs "backups/nomad-jobs-$TS/"
cp waypoint-server.nomad "backups/nomad-jobs-$TS/"
# Compress the backup
tar -czf "nomad-jobs-backup-$DAY.tar.gz" "backups/nomad-jobs-$TS/"
# Store it in Consul KV
consul kv put "backup/nomad-jobs/$DAY/data" @nomad-jobs-backup-$DAY.tar.gz
```
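The metadata entry listed earlier can be written alongside the data blob. A minimal sketch reusing the variables from the block above; the JSON field names are illustrative, not a fixed schema:
```bash
# Record what was backed up; field names here are an assumption, not a fixed schema
consul kv put "backup/nomad-jobs/$DAY/metadata" \
  "{\"timestamp\":\"$TS\",\"files\":$(tar -tzf "nomad-jobs-backup-$DAY.tar.gz" | wc -l)}"
```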
## 📊 Backup Strategy
### Backup Frequency
- **Scheduled backups**: weekly is recommended (a cron sketch follows this list)
- **Before major changes**: before deploying new services or making significant configuration changes
- **Emergencies**: back up the current state immediately when a service misbehaves
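For the weekly schedule, a minimal cron entry, assuming the steps above are collected into a hypothetical `/root/mgmt/scripts/backup-nomad-jobs.sh`:
```bash
# Run every Monday at 03:00; the script path is an assumption, not an existing file
0 3 * * 1 /root/mgmt/scripts/backup-nomad-jobs.sh >> /var/log/nomad-backup.log 2>&1
```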
### Backup Contents
- All `.nomad` files
- Configuration file templates
- Service dependency definitions
- Network and storage configuration
### Backup Verification
```bash
# Verify backup integrity by counting its entries
tar -tzf nomad-jobs-backup-20251004.tar.gz | wc -l
# Check that the key files are present
tar -tzf nomad-jobs-backup-20251004.tar.gz | grep -E "(traefik|vault|waypoint)"
```
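A stronger check than counting entries is to record a checksum at backup time and verify it on restore; a sketch (the `.sha256` sidecar file is an assumption, not part of the current layout):
```bash
# At backup time
sha256sum nomad-jobs-backup-20251004.tar.gz > nomad-jobs-backup-20251004.tar.gz.sha256
# At restore time
sha256sum -c nomad-jobs-backup-20251004.tar.gz.sha256
```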
## 🚨 Recovery Procedures
### Emergency Recovery
1. **Stop all services**
```bash
nomad job stop traefik-cloudflare-v1
nomad job stop vault-cluster
nomad job stop waypoint-server
```
2. **Restore the backup**
```bash
consul kv get backup/nomad-jobs/20251004/data > restore.tar.gz
tar -xzf restore.tar.gz
```
3. **Redeploy**
```bash
nomad job run backups/nomad-jobs-20251004-074411/components/traefik/jobs/traefik-cloudflare.nomad
nomad job run backups/nomad-jobs-20251004-074411/nomad-jobs/vault-cluster.nomad
nomad job run backups/nomad-jobs-20251004-074411/waypoint-server.nomad
```
### Partial Recovery
```bash
# Restore a single service only
cp backups/nomad-jobs-20251004-074411/components/traefik/jobs/traefik-cloudflare.nomad components/traefik/jobs/
nomad job run components/traefik/jobs/traefik-cloudflare.nomad
```
## 📝 Backup Log
| Date | Backup Type | Service Status | Size | Consul KV Path |
|------|-------------|----------------|------|----------------|
| 2025-10-04 | Full backup | All running | 13KB | `backup/nomad-jobs/20251004/data` |
## ⚠️ Caveats
1. **Certificate backups**: SSL certificates are stored inside the container and are lost on restart
2. **Consul KV**: critical configuration lives in Consul KV and must be backed up separately
3. **Network configuration**: Tailscale network settings need to be recorded separately
4. **Credential safety**: Vault and Waypoint credentials are stored in Consul KV
## 🔍 Troubleshooting
### Corrupted Backups
```bash
# Check archive integrity
tar -tzf nomad-jobs-backup-20251004.tar.gz > /dev/null && echo "backup intact" || echo "backup corrupted"
```
### Consul KV Access Problems
```bash
# Check Consul connectivity
consul members
# Check the KV store
consul kv get backup/nomad-jobs/index
```
---
**Last updated**: 2025-10-04 07:45:00
**Backup status**: ✅ current backup intact and usable
**Service status**: ✅ all services running normally

README-Traefik.md Normal file
@@ -0,0 +1,166 @@
# Traefik Configuration Management Guide
## 🎯 Best Practice: Separating Configuration from the Application
### ⚠️ Important: Avoid Clumsy Workflows
**❌ Wrong approach (clumsy):**
- Editing the Nomad job file to add a new domain
- Redeploying the entire Traefik service
- Embedding routing configuration inside the application definition
**✅ Right approach (clean and professional):**
## Separated Configuration Architecture
### 1. Configuration File Locations
- **Dynamic configuration**: `/root/mgmt/components/traefik/config/dynamic.yml`
- **Application configuration**: `/root/mgmt/components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad`
### 2. Key Properties
- ✅ **Hot reload**: Traefik is configured with the `file` provider and `watch: true`
- ✅ **Automatic pickup**: edits to the YAML configuration file take effect without a restart
- ✅ **Separation of concerns**: configuration is fully decoupled from the application, following best practice
### 3. Workflow for Adding a New Domain
Edit the configuration file:
```bash
vim /root/mgmt/components/traefik/config/dynamic.yml
```
Add the new service and router entries; they take effect on save, with no restart required:
```yaml
# New service entry
services:
  new-service-cluster:
    loadBalancer:
      servers:
        - url: "https://new-service.tailnet-68f9.ts.net:8080"
      healthCheck:
        path: "/health"
        interval: "30s"
        timeout: "15s"
# New router entry
routers:
  new-service-ui:
    rule: "Host(`new-service.git-4ta.live`)"
    service: new-service-cluster
    entryPoints:
      - websecure
    tls:
      certResolver: cloudflare
```
### 4. Architectural Benefits
- 🚀 **Zero downtime**: configuration changes require no service restart
- 🔧 **Flexible management**: configuration and application are managed independently
- 📝 **Version control**: the configuration file can be versioned on its own
- 🎯 **Professional standard**: matches modern DevOps best practice
## Current Service Configuration
### Configured Services
1. **Consul cluster**
   - Domain: `consul.git-4ta.live`
   - Backend: multi-node load balancing
   - Health check: `/v1/status/leader`
2. **Nomad cluster**
   - Domain: `nomad.git-4ta.live`
   - Backend: multi-node load balancing
   - Health check: `/v1/status/leader`
3. **Waypoint**
   - Domain: `waypoint.git-4ta.live`
   - Backend: `hcp1.tailnet-68f9.ts.net:9701`
   - Protocol: HTTPS (certificate verification skipped)
4. **Vault**
   - Domain: `vault.git-4ta.live`
   - Backend: `warden.tailnet-68f9.ts.net:8200`
   - Health check: `/ui/`
5. **Authentik**
   - Domain: `authentik.git-4ta.live`
   - Backend: `authentik.tailnet-68f9.ts.net:9443`
   - Protocol: HTTPS (certificate verification skipped)
   - Health check: `/flows/-/default/authentication/`
6. **Traefik dashboard**
   - Domain: `traefik.git-4ta.live`
   - Service: built-in dashboard
### SSL Certificate Management
- **Certificate resolver**: Cloudflare DNS challenge
- **Automatic renewal**: Let's Encrypt certificates are managed automatically
- **Storage location**: `/opt/traefik/certs/acme.json`
- **Forced HTTPS**: all HTTP requests are redirected to HTTPS
## Troubleshooting
### Check Service Status
```bash
# Query the Traefik API
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/overview
# Inspect router configuration
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/http/routers
# Inspect service configuration
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/http/services
```
### Check Certificate Status
```bash
# Inspect the served SSL certificate
openssl s_client -connect consul.git-4ta.live:443 -servername consul.git-4ta.live < /dev/null 2>/dev/null | openssl x509 -noout -subject -issuer
# Inspect the certificate store
ssh root@hcp1 "cat /opt/traefik/certs/acme.json | jq '.cloudflare.Certificates'"
```
### View Logs
```bash
# Tail the Traefik logs (select an allocation by job name)
nomad alloc logs -tail -job traefik-cloudflare-v1
# Filter for errors
nomad alloc logs -tail -job traefik-cloudflare-v1 | grep -i "error\|warn\|fail"
```
## Best Practices
1. **Configuration management**
   - Always manage routing in the `dynamic.yml` file
   - Avoid editing the Nomad job file
   - Keep the configuration file under version control
2. **Service discovery**
   - Prefer Tailscale network addresses
   - Configure appropriate health checks
   - For HTTPS backends with self-signed certificates, skip verification via a serversTransport (see the sketch after this list)
3. **SSL certificates**
   - Rely on the Cloudflare DNS challenge
   - Monitor automatic certificate renewal
   - Check certificate status regularly
4. **Monitoring and logging**
   - Enable the Traefik API for monitoring
   - Configure access logs
   - Check service health regularly
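A minimal sketch of the serversTransport pattern this repo already uses for the Waypoint and Authentik backends in its dynamic configuration; the names here are illustrative:
```yaml
http:
  serversTransports:
    backend-insecure:            # name is illustrative; the repo uses e.g. waypoint-insecure
      insecureSkipVerify: true   # accept the backend's self-signed certificate
  services:
    example-cluster:
      loadBalancer:
        servers:
          - url: "https://example.tailnet-68f9.ts.net:9443"
        serversTransport: backend-insecure
```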
## Remember
**Separating configuration from the application is a core principle of modern infrastructure management.**
This architecture improves flexibility and maintainability, and it reflects professional DevOps practice.

README-Vault.md Normal file
@@ -0,0 +1,120 @@
# Vault Configuration
## Overview
Vault has been migrated under Nomad management and runs on the ch4, ash3c, and warden nodes in a high-availability deployment.
## Access Information
### Vault Service Addresses
- **Active node**: `http://100.117.106.136:8200` (ch4)
- **Standby node**: `http://100.116.80.94:8200` (ash3c)
- **Standby node**: `http://100.122.197.112:8200` (warden)
- **Web UI**: `http://100.117.106.136:8200/ui`
### Credentials
- **Unseal key**: `/iHuxLbHWmx5xlJhqaTUMniiRc71eO1UAwNJj/lDWow=`
- **Root token**: `hvs.dHtno0cCpWtFYMCvJZTgGmfn`
## Usage
### Environment Variables
```bash
export VAULT_ADDR=http://100.117.106.136:8200
export VAULT_TOKEN=hvs.dHtno0cCpWtFYMCvJZTgGmfn
```
### Basic Commands
```bash
# Check Vault status
vault status
# If Vault is sealed, unseal it with the unseal key
vault operator unseal /iHuxLbHWmx5xlJhqaTUMniiRc71eO1UAwNJj/lDWow=
# Log in to the Vault CLI with the token
vault login hvs.dHtno0cCpWtFYMCvJZTgGmfn
```
## Storage Locations
### Consul KV
- **Unseal key**: `vault/unseal-key`
- **Root token**: `vault/root-token`
- **Configuration**: `vault/config/dev`
### Local Backups
- **Backup directory**: `/root/vault-backup/`
- **Initialization script**: `/root/mgmt/scripts/vault-init.sh`
## Deployment Details
### Nomad Job
- **Job name**: `vault-cluster-nomad`
- **Job file**: `/root/mgmt/nomad-jobs/vault-cluster.nomad`
- **Nodes**: ch4, ash3c, warden
- **Parallelism**: all three nodes run simultaneously
### Configuration Highlights
- **Storage backend**: Consul (an illustrative config sketch follows this list)
- **High availability**: enabled
- **Seal type**: Shamir
- **Key shares**: 1
- **Threshold**: 1
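A minimal sketch of what a matching Vault server configuration looks like. The actual configuration for this cluster is stored in Consul KV under `vault/config/dev`, so this is illustrative only:
```hcl
# Sketch of a Vault server config matching the highlights above
storage "consul" {
  address = "127.0.0.1:8500"   # local Consul agent; address is an assumption
  path    = "vault/"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = true           # matches the http:// endpoints listed above
}

ui = true
```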
## Troubleshooting
### If Vault Is Sealed
```bash
# 1. Check the status
vault status
# 2. Unseal every node with the unseal key
# ch4
export VAULT_ADDR=http://100.117.106.136:8200
vault operator unseal /iHuxLbHWmx5xlJhqaTUMniiRc71eO1UAwNJj/lDWow=
# ash3c
export VAULT_ADDR=http://100.116.80.94:8200
vault operator unseal /iHuxLbHWmx5xlJhqaTUMniiRc71eO1UAwNJj/lDWow=
# warden
export VAULT_ADDR=http://100.122.197.112:8200
vault operator unseal /iHuxLbHWmx5xlJhqaTUMniiRc71eO1UAwNJj/lDWow=
# 3. Verify the unseal state
vault status
```
### If You Lose the Credentials
```bash
# Fetch them from Consul KV
consul kv get vault/unseal-key
consul kv get vault/root-token
```
### Restarting Vault
```bash
# Restart the Nomad job
nomad job restart vault-cluster-nomad
# Or restart a specific allocation
nomad alloc restart <allocation-id>
```
## Security Notes
⚠️ **Important**:
- Keep the unseal key and root token safe
- Do not use the root token for day-to-day operations in production (see the sketch below)
- Create users and policies with appropriately limited permissions
- Rotate keys and tokens regularly
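A minimal sketch of replacing the root token with a scoped one; the policy name and secret path are hypothetical, not part of this repo:
```bash
# Define a read-only policy for one KV path (name and path are illustrative)
vault policy write app-readonly - <<'EOF'
path "secret/data/app/*" {
  capabilities = ["read", "list"]
}
EOF

# Issue a token bound to that policy with a 72h TTL
vault token create -policy=app-readonly -ttl=72h
```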
## Change History
- **2025-10-04**: migrated Vault under Nomad management
- **2025-10-04**: re-initialized Vault and captured new credentials
- **2025-10-04**: tuned the deployment strategy for three-node parallel operation
---
*Last updated: 2025-10-04*
*Maintainer: ben*

README-Waypoint.md Normal file
@@ -0,0 +1,157 @@
# Waypoint Configuration and Usage Guide
## Service Information
- **Server address**: `hcp1.tailnet-68f9.ts.net:9702` (gRPC)
- **HTTP API**: `hcp1.tailnet-68f9.ts.net:9701` (HTTPS)
- **Web UI**: `https://waypoint.git4ta.me/auth/token`
## Authentication
### Auth Token
```
3K4wQUdH1dfES7e2KRygoJ745wgjDCG6X7LmLCAseEs3a5jrK185Yk4ZzYQUDvwEacPTfaF5hbUW1E3JNA7fvMthHWrkAFyRZoocmjCqj72YfJRzXW7KsurdSoMoKpEVJyiWRxPAg3VugzUx
```
### Token Storage
- **Consul KV**: `waypoint/auth-token`
- **Retrieval command**: `consul kv get waypoint/auth-token`
## Access Methods
### 1. Web UI
```
https://waypoint.git4ta.me/auth/token
```
Log in with the auth token above.
### 2. CLI
```bash
# Create a context
waypoint context create \
  -server-addr=hcp1.tailnet-68f9.ts.net:9702 \
  -server-tls-skip-verify \
  -set-default waypoint-server
# Verify the connection
waypoint server info
```
### 3. Using the Auth Token
```bash
# Set the environment variable
export WAYPOINT_TOKEN="3K4wQUdH1dfES7e2KRygoJ745wgjDCG6X7LmLCAseEs3a5jrK185Yk4ZzYQUDvwEacPTfaF5hbUW1E3JNA7fvMthHWrkAFyRZoocmjCqj72YfJRzXW7KsurdSoMoKpEVJyiWRxPAg3VugzUx"
# Or pass it via the -server-auth-token flag
waypoint server info -server-auth-token="$WAYPOINT_TOKEN"
```
## Service Configuration
### Nomad Job
- **File**: `/root/mgmt/waypoint-server.nomad`
- **Node**: `hcp1.tailnet-68f9.ts.net`
- **Database**: `/opt/waypoint/waypoint.db`
- **gRPC port**: 9702
- **HTTP port**: 9701
### Traefik Routing
- **Domain**: `waypoint.git4ta.me`
- **Backend**: `https://hcp1.tailnet-68f9.ts.net:9701`
- **TLS**: certificate verification skipped (`insecureSkipVerify: true`)
## Common Commands
### Server Management
```bash
# Check server status
waypoint server info
# Get the server cookie
waypoint server cookie
# Create a snapshot backup
waypoint server snapshot
```
### Project Management
```bash
# List all projects
waypoint list projects
# Initialize a new project
waypoint init
# Deploy the application
waypoint up
# View deployment status
waypoint list deployments
```
### Application Management
```bash
# List applications
waypoint list apps
# View application logs
waypoint logs -app=<app-name>
# Run a command in an application
waypoint exec -app=<app-name> <command>
```
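`waypoint init` and `waypoint up` above expect a `waypoint.hcl` in the project directory. A minimal sketch targeting this Nomad cluster; the project name, app name, and build plugin are placeholders, not an existing file in this repo:
```hcl
# waypoint.hcl - illustrative only; names are hypothetical
project = "example-project"

app "example-app" {
  build {
    use "docker" {}        # build a container image from the local Dockerfile
  }

  deploy {
    use "nomad" {
      datacenter = "dc1"   # matches the datacenter used by the jobs in this repo
    }
  }
}
```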
## Troubleshooting
### 1. Connection Problems
```bash
# Check whether the server job is running
nomad job status waypoint-server
# Check whether the ports are listening
netstat -tlnp | grep 970
```
### 2. Authentication Problems
```bash
# Re-bootstrap the server (this generates a new token)
nomad job stop waypoint-server
ssh hcp1.tailnet-68f9.ts.net "rm -f /opt/waypoint/waypoint.db"
nomad job run /root/mgmt/waypoint-server.nomad
waypoint server bootstrap -server-addr=hcp1.tailnet-68f9.ts.net:9702 -server-tls-skip-verify
```
### 3. Web UI Access Problems
- Make sure the correct path is used: `/auth/token`
- Check the Traefik routing configuration
- Verify that the SSL certificate is valid
## Integrations
### Nomad
```bash
# Configure Nomad as the runtime platform
waypoint config source-set -type=nomad nomad-platform \
  addr=http://localhost:4646
```
### Vault
```bash
# Configure the Vault integration
waypoint config source-set -type=vault vault-secrets \
  addr=http://localhost:8200 \
  token=<vault-token>
```
## Security Notes
1. **Token protection**: the auth token grants full access; keep it safe
2. **Network exposure**: the server listens on all interfaces; make sure the firewall is configured accordingly
3. **TLS verification**: the current setup skips TLS verification; enable it in production
4. **Backups**: back up the `/opt/waypoint/waypoint.db` database file regularly
## Changelog
- **2025-10-04**: initial deployment and configuration
- **2025-10-04**: obtained the auth token and stored it in Consul KV
- **2025-10-04**: configured Traefik routing and Web UI access

README.md
@@ -1,586 +1,284 @@
# 🏗️ Infrastructure Management Project
# Management Infrastructure
This is a modern multi-cloud infrastructure management platform focused on the integrated management of OpenTofu, Ansible, and Nomad + Podman.
## 🚨 Known Critical Issue
## 📝 Important Reminders (Sticky Note)
### Nomad Consul KV Template Syntax Problem
### ✅ Consul Cluster Status Update
**Problem description:**
Nomad cannot read configuration from Consul KV and reports: `Missing: kv.block(config/dev/cloudflare/token)`
**Current status**: the Consul cluster is healthy and all nodes are running normally.
**Root cause:**
1. **The Nomad clients are not configured to talk to Consul** - Nomad cannot reach Consul KV
2. **The template syntax is correct** - `{{ key "path/to/key" }}` is the right syntax
3. **The Consul KV data exists** - `config/dev/cloudflare/token` is present
**Cluster details**:
- **Leader**: warden (100.122.197.112:8300)
- **Node count**: 3 server nodes
- **Health**: all nodes pass their health checks
- **Node list**:
- master (100.117.106.136) - primary node, Korea
- ash3c (100.116.80.94) - server node, USA
- warden (100.122.197.112) - server node, Beijing (current cluster leader)
**Solutions:**
1. **Short term** - hard-code the token into the configuration file
2. **Long term** - configure the Nomad clients to connect to Consul
**Configuration status**:
- The Ansible inventory now matches the actual cluster state
- All nodes run in server mode
- bootstrap_expect=3, matching the actual node count
**Core requirements:**
- **Centralized storage** → Consul KV holds all sensitive configuration
- **Decentralized deployment** → Nomad reads configuration from Consul and deploys to multiple nodes
- **Direct reads** → Nomad's template system reads configuration straight from Consul KV
**Dependency chain**:
- Tailscale (day 1) ✅
- Ansible (day 2) ✅
- Nomad (day 3) ✅
- Consul (day 4) ✅ **done**
- Terraform (day 5) ✅ **good progress**
- Vault (day 6) ⏳ planned
- Waypoint (day 7) ⏳ planned
**Current status:**
- ✅ Consul KV storage works
- ✅ The Traefik service runs normally
- ❌ Nomad cannot read Consul KV (the connection still has to be configured)
**Next steps**:
- Continue with Terraform state management
- Prepare the Vault secrets-management integration
- Plan the Waypoint application deployment flow
**Next steps:**
1. Configure the Nomad clients to connect to Consul (a sketch of the stanza follows below)
2. Restore the template syntax that reads from Consul KV
3. Achieve genuinely centralized configuration management
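A minimal sketch of the missing client-side `consul` stanza; the address is one of the cluster's existing HTTP endpoints, and the same block appears in the sed-based repair jobs later in this commit:
```hcl
# /etc/nomad.d/nomad.hcl - client-side Consul connection (sketch)
consul {
  address          = "warden.tailnet-68f9.ts.net:8500"
  auto_advertise   = true
  client_auto_join = true
}
```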
---
## 🎯 Project Features
## 🎯 Traefik Configuration Architecture: Separating Configuration from the Application
- **🌩️ Multi-cloud support**: Oracle Cloud, Huawei Cloud, Google Cloud, AWS, DigitalOcean
- **🏗️ Infrastructure as code**: cloud resources managed with OpenTofu
- **⚙️ Configuration management**: automated configuration and deployment with Ansible
- **🐳 Container orchestration**: Nomad cluster management with the Podman container runtime
- **🔄 CI/CD**: automated pipelines with Gitea Actions
- **📊 Monitoring**: a Prometheus + Grafana monitoring stack
- **🔐 Security**: layered security controls and compliance
### ⚠️ Important: Avoid Clumsy Workflows
## 🔄 Architecture Layers and Responsibilities
**❌ Wrong approach (clumsy):**
- Editing the Nomad job file to add a new domain
- Redeploying the entire Traefik service
- Embedding routing configuration inside the application definition
### ⚠️ Important: Distinguishing Terraform from Nomad
**✅ Right approach (clean and professional):**
This project uses a layered architecture with a clear division of responsibilities between tools:
### Separated Configuration Architecture
#### 1. **Terraform/OpenTofu layer - infrastructure lifecycle management**
- **Responsibility**: manage the lifecycle of compute resources (virtual machines) provided by cloud vendors
- **Scope**:
- Create, update, and delete VM instances
- Manage network resources (VCNs, subnets, security groups, etc.)
- Manage storage resources (block storage, object storage, etc.)
- Manage load balancers and other cloud services
- **Goal**: keep the underlying infrastructure correctly configured and its state managed
**1. Configuration file locations:**
- **Dynamic configuration**: `/root/mgmt/components/traefik/config/dynamic.yml`
- **Application configuration**: `/root/mgmt/components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad`
#### 2. **Nomad layer - application scheduling and orchestration**
- **Responsibility**: allocate resources and orchestrate applications inside VMs that are already running
- **Scope**:
- Schedule and run containerized applications on existing VMs
- Manage application lifecycles (start, stop, update)
- Allocate and limit resources (CPU, memory, storage)
- Service discovery and load balancing
- **Goal**: run application services efficiently on existing infrastructure
#### 3. **The key distinction**
- **Terraform** manages the lifecycle of **the virtual machines themselves**
- **Nomad** schedules the applications **inside the virtual machines**
- **Terraform** decides "which VMs exist"
- **Nomad** decides "what runs on the VMs"
#### 4. **Example workflow**
```
1. Terraform creates the VM (cloud-provider layer)
2. The VM boots and runs its operating system
3. The Nomad client is installed and configured on the VM
4. Nomad schedules and runs application containers on the VM
```
**Important**: these two layers must not be conflated. Terraform should not manage application-level resources, and Nomad should not create VMs. Strictly observing this layering is key to the project's success.
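To make the boundary concrete, a sketch of the two layers in HCL; the resource arguments and job names are hypothetical, not files in this repo:
```hcl
# Terraform/OpenTofu layer (tofu/) - decides "which VMs exist"
resource "oci_core_instance" "worker" {
  # ... shape, image, subnet - cloud-provider lifecycle only ...
}

# Nomad layer (jobs/) - decides "what runs on the VMs"
job "web" {
  datacenters = ["dc1"]
  group "web" {
    task "nginx" {
      driver = "podman"
      config {
        image = "docker.io/library/nginx:alpine"
      }
    }
  }
}
```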
## 📁 Project Structure
```
mgmt/
├── .gitea/workflows/    # CI/CD workflows
├── tofu/                # OpenTofu infrastructure code (infrastructure lifecycle management)
│   ├── environments/    # Environment configs (dev/staging/prod)
│   ├── modules/         # Reusable modules
│   ├── providers/       # Cloud provider configs
│   └── shared/          # Shared configuration
├── configuration/       # Ansible configuration management
│   ├── inventories/     # Host inventories
│   ├── playbooks/       # Playbooks
│   ├── templates/       # Template files
│   └── group_vars/      # Group variables
├── jobs/                # Nomad job definitions (application scheduling and orchestration)
│   ├── consul/          # Consul cluster configuration
│   └── podman/          # Podman-related jobs
├── configs/             # Configuration files
│   ├── nomad-master.hcl # Nomad server configuration
│   └── nomad-ash3c.hcl  # Nomad client configuration
├── docs/                # Documentation
├── security/            # Security configuration
│   ├── certificates/    # Certificate files
│   └── policies/        # Security policies
├── tests/               # Test scripts and reports
│   ├── mcp_servers/     # MCP server test scripts
│   ├── mcp_server_test_report.md  # MCP server test report
│   └── legacy/          # Legacy test scripts
├── tools/               # Tools and utilities
├── playbooks/           # Core Ansible playbooks
└── Makefile             # Project management commands
```
**Layering notes**:
- The **tofu/** directory holds Terraform/OpenTofu code that manages the lifecycle of cloud compute resources
- The **jobs/** directory holds Nomad job definitions that schedule applications inside existing VMs
- The two directories are strictly separated to keep responsibility boundaries clear
**Note:** the project has migrated from Docker Swarm to Nomad + Podman; the old swarm directory is no longer used. All intermediate scripts and test files have been cleaned up, keeping only core configuration files in line with GitOps principles.
## 🔄 GitOps Principles
This project follows a GitOps workflow to keep infrastructure state consistent with the code in the Git repository:
- **Declarative configuration**: all infrastructure and application configuration is stored declaratively in Git
- **Version control and audit**: every change goes through a Git commit, providing full history and an audit trail
- **Automated synchronization**: CI/CD pipelines automatically apply changes from Git to the live environment
- **State convergence**: the system continuously monitors actual state and corrects any drift from the desired state
### GitOps Workflow
1. **Declare the desired state**: define the desired infrastructure and application state in Git
2. **Commit changes**: apply changes through Git commits
3. **Automatic sync**: the CI/CD system detects the change and applies it to the environment
4. **State verification**: the system verifies that actual state matches the desired state
5. **Monitoring and alerting**: state is monitored continuously and alerts fire on drift
This workflow ensures consistency, repeatability, and reliability, while providing full change history and rollback capability.
## 🚀 Quick Start
### 1. Environment Setup
**2. Key properties:**
- ✅ **Hot reload**: Traefik is configured with the `file` provider and `watch: true`
- ✅ **Automatic pickup**: edits to the YAML configuration file take effect without a restart
- ✅ **Separation of concerns**: configuration is fully decoupled from the application, following best practice
**3. Workflow for adding a new domain:**
```bash
# Clone the project
git clone <repository-url>
cd mgmt
# Edit only the configuration file
vim /root/mgmt/components/traefik/config/dynamic.yml
# Check environment status
./mgmt.sh status
# Add the new router entry
routers:
  new-service-ui:
    rule: "Host(`new-service.git-4ta.live`)"
    service: new-service-cluster
    entryPoints:
      - websecure
    tls:
      certResolver: cloudflare
# Quick deploy (for development environments)
./mgmt.sh deploy
# Takes effect on save - no restart needed!
```
### 2. Cloud Provider Configuration
**4. Architectural benefits:**
- 🚀 **Zero downtime**: configuration changes require no service restart
- 🔧 **Flexible management**: configuration and application are managed independently
- 📝 **Version control**: the configuration file can be versioned on its own
- 🎯 **Professional standard**: matches modern DevOps best practice
**Remember: separating configuration from the application is a core principle of modern infrastructure management.**
---
## Architecture Overview
### Centralized + Decentralized Architecture
**Centralized storage:**
- **Consul KV** → stores all sensitive configuration (tokens, certificates, keys)
- **Consul service discovery** → service registration and discovery
- **Consul health checks** → service health checking
**Decentralized deployment:**
- **Asia node**: `warden.tailnet-68f9.ts.net` (Beijing)
- **Asia node**: `ch4.tailnet-68f9.ts.net` (Korea)
- **Americas node**: `ash3c.tailnet-68f9.ts.net` (USA)
### Service Endpoints
- `https://consul.git-4ta.live` → Consul UI
- `https://traefik.git-4ta.live` → Traefik dashboard
- `https://nomad.git-4ta.live` → Nomad UI
- `https://vault.git-4ta.live` → Vault UI
- `https://waypoint.git-4ta.live` → Waypoint UI
- `https://authentik.git-4ta.live` → Authentik identity provider
### Technology Stack
- **Nomad** → workload orchestration
- **Consul** → service discovery and configuration management
- **Traefik** → reverse proxy and load balancing
- **Cloudflare** → DNS and SSL certificate management
- **Waypoint** → application deployment platform
- **Authentik** → identity and access management
## 部署状态
### ✅ 已完成
- [x] Cloudflare token 存储到 Consul KV
- [x] 泛域名解析 `*.git-4ta.live` 配置
- [x] Traefik 配置和部署
- [x] SSL 证书自动获取
- [x] 所有服务端点配置
- [x] Vault 迁移到 Nomad 管理
- [x] Vault 高可用三节点部署
- [x] Waypoint 服务器部署和引导
- [x] Waypoint 认证 token 获取和存储
- [x] Nomad jobs 配置备份到 Consul KV
- [x] Authentik 容器部署和SSH密钥配置
- [x] Traefik 配置架构优化(配置与应用分离)
### ⚠️ 待解决
- [ ] Nomad 客户端 Consul 连接配置
- [ ] 恢复从 Consul KV 读取配置
- [ ] 实现真正的集中化配置管理
---
## Quick Start
### Check Service Status
```bash
# Copy the configuration template
cp tofu/environments/dev/terraform.tfvars.example tofu/environments/dev/terraform.tfvars
# Edit the file and fill in your cloud provider credentials
vim tofu/environments/dev/terraform.tfvars
# Check every service
curl -k -I https://consul.git4ta.tech
curl -k -I https://traefik.git4ta.tech
curl -k -I https://nomad.git4ta.tech
curl -k -I https://waypoint.git4ta.tech
```
### 3. Initialize the Infrastructure
### Deploy Traefik
```bash
# Initialize OpenTofu
./mgmt.sh tofu init
# Review the execution plan
./mgmt.sh tofu plan
# Apply the infrastructure changes
cd tofu/environments/dev && tofu apply
cd /root/mgmt
nomad job run components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad
```
### 4. Deploy Nomad Services
### Manage the Traefik Configuration (Recommended)
```bash
# Deploy the Consul cluster
nomad run /root/mgmt/jobs/consul/consul-cluster.nomad
# Adding a new domain only requires editing the configuration file
vim /root/mgmt/components/traefik/config/dynamic.yml
# View Nomad jobs
nomad job status
# View node status
nomad node status
# Takes effect on save - no restart needed!
# That is the elegance of separating configuration from the application
```
### ⚠️ Important: Network Access Notes
**Tailscale network access**
- The Nomad and Consul services in this project are reached over the Tailscale network
- When accessing Nomad (port 4646) and Consul (port 8500), you must use the Tailscale-assigned IP addresses
- Wrong: `http://127.0.0.1:4646` or `http://localhost:8500` (will not connect)
- Right: `http://100.x.x.x:4646` or `http://100.x.x.x:8500` (using the Tailscale IP)
**Getting the Tailscale IP**
### Check Consul KV
```bash
# Show this node's Tailscale IP
tailscale ip -4
# List every node in the Tailscale network
tailscale status
consul kv get config/dev/cloudflare/token
consul kv get -recurse config/
```
**Common problems**
- On a "connection refused" error, confirm you used the correct Tailscale IP
- Make sure the Tailscale service is up and running
- Check that network policy allows access to the relevant ports over the Tailscale interface
- For more detail, see: [Consul and Nomad access lessons learned](.gitea/issues/consul-nomad-access-lesson.md)
### 🔄 Nomad Leader Rotation and Access Strategy
**Nomad's leader mechanism**
- Nomad uses the Raft protocol for distributed consensus; a cluster has exactly one leader node
- The leader handles all writes and coordinates cluster state
- If the leader fails, the cluster automatically elects a new one
**Access strategy during leader rotation**
1. **Discover the leader dynamically**
### Backup Management
```bash
# Query the current leader
curl -s http://<any-Nomad-server-IP>:4646/v1/status/leader
# Example response: "100.90.159.68:4647"
# List backups
consul kv get backup/nomad-jobs/index
# Call the API using the returned leader address
curl -s http://100.90.159.68:4646/v1/nodes
# Show the latest backup's metadata
consul kv get backup/nomad-jobs/20251004/metadata
# Restore a backup
consul kv get backup/nomad-jobs/20251004/data > restore.tar.gz
tar -xzf restore.tar.gz
```
2. **Load-balancing options**
- **DNS load balancing**: use Consul DNS to resolve `nomad.service.consul` to the current leader (see the query sketch after this list)
- **Proxy-level load balancing**: add health checks in Nginx/HAProxy so traffic is routed to the active leader node
- **Client retries**: implement retry logic in clients that tries other server nodes when a connection fails
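For the Consul DNS option above, a quick check of what `nomad.service.consul` resolves to, using Consul's DNS port (8600) from this repo's configuration:
```bash
# Query Consul DNS directly on one of the server nodes
dig @100.122.197.112 -p 8600 nomad.service.consul +short
```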
---
3. **Recommended access pattern**
## Key Files
- `components/traefik/config/dynamic.yml` → **Traefik dynamic configuration (the recommended place to edit)**
- `components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad` → Traefik Nomad job
- `README-Traefik.md` → **Traefik configuration management guide (required reading)**
- `infrastructure/opentofu/environments/dev/` → Terraform infrastructure configuration
- `deployment/ansible/inventories/production/hosts` → server inventory
- `README-Vault.md` → Vault configuration and usage
- `README-Waypoint.md` → Waypoint configuration and usage
- `README-Backup.md` → backup management and recovery
- `nomad-jobs/vault-cluster.nomad` → Vault Nomad job
- `waypoint-server.nomad` → Waypoint Nomad job
---
## 🔧 Service Initialization Notes
### Vault Initialization
**Current status:** Vault uses local file storage and must be initialized
**Initialization steps:**
```bash
# Use a leader-discovery script
#!/bin/bash
# Take any one Nomad server IP
SERVER_IP="100.116.158.95"
# Query the current leader
LEADER=$(curl -s http://${SERVER_IP}:4646/v1/status/leader | sed 's/"//g')
# Run commands against the leader address
nomad node status -address=http://${LEADER}
# 1. Check Vault status
curl -s http://warden.tailnet-68f9.ts.net:8200/v1/sys/health
# 2. Initialize Vault (if it returns "no available server")
vault operator init -address=http://warden.tailnet-68f9.ts.net:8200
# 3. Save the unseal keys and root token
# 4. Unseal Vault
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-1>
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-2>
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-3>
```
4. **High-availability configuration**
- Add all Nomad server nodes to the client configuration
- Clients automatically connect to whichever server node is available
- Write operations are automatically redirected to the leader node
**🔑 Vault key material (final initialization, 2025-10-04):**
```
Unseal Key 1: 5XQ6vSekewZj9SigcIS8KcpnsOyEzgG5UFe/mqPVXkre
Unseal Key 2: vmLu+Ry+hajWjQhX3YVnZG72aZRn5cowcUm5JIVtv/kR
Unseal Key 3: 3eDhfnHZnG9OT6RFOhpoK/aO5TghPypz4XPlXxFMm52F
Unseal Key 4: LWGkYB7qD3GPPc/nRuqKmMUiQex8ygYF1BkSXA1Tov3J
Unseal Key 5: rIidFy7d/SxcPOCrNy569VZ86I56oMQxqL7qVgM+PYPy
**Notes**
- Nomad leader rotation is automatic and normally needs no manual intervention
- During a leader election the cluster may briefly be unable to process writes
- Implement retry logic in applications to absorb transient failures during leader changes
Root Token: hvs.OgVR2hEihbHM7qFxtFr7oeo3
```
## 🛠️ Common Commands
**Configuration notes:**
- **Storage**: file (local filesystem)
- **Path**: `/opt/nomad/data/vault-storage` (persistent storage)
- **Port**: 8200
- **UI**: enabled
- **Important**: persistent storage is configured, so keys are not lost on restart
| Command | Description |
|---------|-------------|
| `make status` | Show a project status overview |
| `make deploy` | Quick-deploy all services |
| `make cleanup` | Tear down all deployed services |
| `cd tofu/environments/dev && tofu <cmd>` | OpenTofu management commands |
| `nomad job status` | Show Nomad job status |
| `nomad node status` | Show Nomad node status |
| `podman ps` | List running containers |
| `ansible-playbook playbooks/configure-nomad-clients.yml` | Configure the Nomad clients |
| `./run_tests.sh` or `make test-mcp` | Run all MCP server tests |
| `make test-kali` | Run the Kali Linux quick health check |
| `make test-kali-security` | Run the Kali Linux security tool tests |
| `make test-kali-full` | Run the full Kali Linux test suite |
### Waypoint Initialization
## 🌩️ Supported Cloud Providers
**Current status:** Waypoint is running normally; it may need re-initialization
### Oracle Cloud Infrastructure (OCI)
- ✅ Compute instances
- ✅ Networking (VCN, subnets, security groups)
- ✅ Storage (block storage, object storage)
- ✅ Load balancers
### Huawei Cloud
- ✅ Elastic Cloud Server (ECS)
- ✅ Virtual Private Cloud (VPC)
- ✅ Elastic Load Balance (ELB)
- ✅ Elastic Volume Service (EVS)
### Google Cloud Platform
- ✅ Compute Engine
- ✅ VPC networking
- ✅ Cloud Load Balancing
- ✅ Persistent Disk
### Amazon Web Services
- ✅ EC2 instances
- ✅ VPC networking
- ✅ Application Load Balancer
- ✅ EBS storage
### DigitalOcean
- ✅ Droplets
- ✅ VPC networking
- ✅ Load Balancers
- ✅ Block Storage
## 🔄 CI/CD Flows
### Infrastructure Deployment Flow
1. **Code commit** → triggers Gitea Actions
2. **OpenTofu plan** → generates an execution plan
3. **Manual review** → changes are approved
4. **OpenTofu apply** → infrastructure changes are applied
5. **Ansible deploy** → applications are configured and deployed
### Application Deployment Flow
1. **Application code update** → container image is built
2. **Image push** → image is pushed to the registry
3. **Nomad job update** → the job definition is updated
4. **Nomad deploy** → the service is rolled out
5. **Health checks** → the deployment is verified
## 📊 Monitoring and Observability
### Monitoring Components
- **Prometheus**: metrics collection and storage
- **Grafana**: dashboards and visualization
- **AlertManager**: alert management
- **Node Exporter**: system metrics export
### Log Management
- **ELK Stack**: Elasticsearch + Logstash + Kibana
- **Fluentd**: log collection and forwarding
- **Structured logging**: standardized JSON format
## 🔐 Security Best Practices
### Infrastructure Security
- **Network isolation**: VPCs, security groups, firewalls
- **Access control**: IAM roles and policies
- **Data encryption**: in transit and at rest
- **Key management**: cloud-provider key management services
### Application Security
- **Container security**: image scanning, least privilege
- **Network security**: service mesh, TLS termination
- **Secrets management**: Docker Secrets, Ansible Vault
- **Security auditing**: log monitoring and audit trails
## 🧪 Testing Strategy
### Infrastructure Tests
- **Syntax checks**: OpenTofu validate
- **Security scans**: Checkov, tfsec
- **Compliance checks**: OPA (Open Policy Agent)
### Application Tests
- **Unit tests**: application code tests
- **Integration tests**: cross-service integration tests
- **End-to-end tests**: full-workflow tests
### MCP Server Tests
The project ships a complete MCP (Model Context Protocol) server test suite under `tests/mcp_servers/`:
- **context7 server tests**: verify initialization, the tool list, and search
- **qdrant server tests**: exercise document add, search, and delete
- **qdrant-ollama server tests**: verify the vector database plus LLM integration
The test scripts include shell and Python variants and can drive the MCP servers directly over the JSON-RPC protocol. Detailed results and fixes are recorded in `tests/mcp_server_test_report.md`.
Run the tests:
**Initialization steps:**
```bash
# Run a single test script
cd tests/mcp_servers
./test_local_mcp_servers.sh
# 1. Check Waypoint status
curl -I https://waypoint.git-4ta.live
# Or run the Python tests
python test_mcp_servers_simple.py
# 2. Re-initialize if needed
waypoint server init -server-addr=https://waypoint.git-4ta.live
# 3. Configure the Waypoint CLI
waypoint auth login -server-addr=https://waypoint.git-4ta.live
```
### Kali Linux System Tests
The project includes a complete Kali Linux system test suite under `configuration/playbooks/test/`. The tests cover:
**Configuration notes:**
- **Storage**: local database at `/opt/waypoint/waypoint.db`
- **Ports**: HTTP 9701, gRPC 9702
- **UI**: enabled
1. **Quick health check** (`kali-health-check.yml`): basic system status checks
2. **Security tool tests** (`kali-security-tools.yml`): verify installation and function of the security tools
3. **Full system test** (`test-kali.yml`): comprehensive system testing and report generation
4. **Full test suite** (`kali-full-test-suite.yml`): runs every test in order
### Consul Service Registration
Run the tests:
```bash
# Kali Linux quick health check
make test-kali
**Registered services:**
- ✅ **vault**: `vault.git-4ta.live` (tags: vault, secrets, kv)
- ✅ **waypoint**: `waypoint.git-4ta.live` (tags: waypoint, ci-cd, deployment)
- ✅ **consul**: `consul.git-4ta.live` (tags: consul, service-discovery)
- ✅ **traefik**: `traefik.git-4ta.live` (tags: traefik, proxy, load-balancer)
- ✅ **nomad**: `nomad.git-4ta.live` (tags: nomad, scheduler, orchestrator)
# Kali Linux security tool tests
make test-kali-security
**Health checks:**
- **vault**: `/v1/sys/health`
- **waypoint**: `/`
- **consul**: `/v1/status/leader`
- **traefik**: `/ping`
- **nomad**: `/v1/status/leader`
# Full Kali Linux test suite
make test-kali-full
```
## 📚 Documentation
- [Consul cluster troubleshooting](docs/consul-cluster-troubleshooting.md)
- [Disk management](docs/disk-management.md)
- [Nomad NFS setup](docs/nomad-nfs-setup.md)
- [Consul-Terraform integration](docs/setup/consul-terraform-integration.md)
- [OCI credentials setup](docs/setup/oci-credentials-setup.md)
- [Oracle Cloud setup](docs/setup/oracle-cloud-setup.md)
## 🤝 Contributing
1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is released under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
If you run into problems or have questions:
1. Check the [documentation](docs/)
2. Search the [issues](../../issues)
3. Open a new [issue](../../issues/new)
## ⚠️ Important Lessons Learned
### Distinguishing Terraform from Nomad
**Problem**: it is easy to conflate the responsibilities of Terraform and Nomad, which leads to a muddled architecture.
**Root cause**: although both are infrastructure management tools, Terraform and Nomad operate at different layers and manage different kinds of resources.
**Solution**:
1. **Make the layering explicit**
- **Terraform/OpenTofu**: lifecycle management of the compute resources (VMs) provided by cloud vendors
- **Nomad**: application scheduling and orchestration inside existing VMs
2. **Keep responsibility boundaries clear**
- Terraform decides "which VMs exist"
- Nomad decides "what runs on the VMs"
- Neither should manage the other's resources
3. **Separate the workflows**
```
1. Terraform creates the VM (cloud-provider layer)
2. The VM boots and runs its operating system
3. The Nomad client is installed and configured on the VM
4. Nomad schedules and runs application containers on the VM
```
**Important**: strictly observing this layering is key to the project's success. Blurring the two layers' responsibilities leads to architectural confusion and management pain.
### Consul and Nomad Access Problems
**Problem**: attempts to reach Consul at `http://localhost:8500` or `http://127.0.0.1:8500` fail to connect.
**Root cause**: in this project, Consul and Nomad run in the cluster under Nomad + Podman and are reached over the Tailscale network. The services do not run locally, so localhost cannot reach them.
**Solution**:
1. **Use the Tailscale IP**: services must be reached via their Tailscale-assigned addresses
```bash
# Show this node's Tailscale IP
tailscale ip -4
# List every node in the Tailscale network
tailscale status
# Reach Consul (using the actual Tailscale IP)
curl http://100.x.x.x:8500/v1/status/leader
# Reach Nomad (using the actual Tailscale IP)
curl http://100.x.x.x:4646/v1/status/leader
```
2. **Service discovery**: the Consul cluster has 3 nodes and the Nomad cluster has more than ten; identify which node a service actually runs on
3. **Cluster layout**:
- Consul cluster: 3 nodes (kr-master, us-ash3c, bj-warden)
- Nomad cluster: more than ten nodes, both servers and clients
**Important**: during development and debugging, always use Tailscale IPs rather than localhost to reach cluster services. This is a basic requirement of the project's architecture and must be strictly observed.
### Consul Cluster Configuration Management Lessons
**Problem**: the Consul cluster's configuration files drifted from its actual running state, causing confusion and misconfiguration.
**Root cause**: the node information in the Ansible inventory did not match the actual Consul cluster state, including node roles, node count, and the bootstrap-expect value.
**Solution**:
1. **Verify cluster state regularly**: use the Consul API to check the actual state and keep the configuration files in sync
```bash
# List the Consul cluster's nodes
curl -s http://<consul-server>:8500/v1/catalog/nodes
# Show node details
curl -s http://<consul-server>:8500/v1/agent/members
# Show the cluster leader
curl -s http://<consul-server>:8500/v1/status/leader
```
2. **Keep the configuration files consistent**: all related inventory files (such as `csol-consul-nodes.ini`, `consul-nodes.ini`, `consul-cluster.ini`) must agree on:
- the server node list and count
- the client node list and count
- the `bootstrap_expect` value (it must match the actual number of server nodes)
- node roles and IP addresses
3. **Identify node roles correctly**: confirm each node's actual role via the API, so server nodes are not misconfigured as clients or vice versa
```json
// Example node record returned by the API
{
  "Name": "warden",
  "Addr": "100.122.197.112",
  "Port": 8300,
  "Status": 1,
  "ProtocolVersion": 2,
  "Delegate": 1,
  "Server": true  // confirms the node's role
}
```
4. **Update procedure**: when configuration and reality diverge, proceed as follows:
- fetch the actual cluster state via the API
- update every related configuration file to match
- make sure all files agree with each other
- update the comments and notes in the files to reflect the latest cluster state
**Case study**:
- **Initial state**: the configuration showed 2 server nodes and 5 client nodes, with `bootstrap_expect=2`
- **Actual state**: the Consul cluster ran 3 server nodes (master, ash3c, warden) and no clients, with expect=3
- **Fix**: every configuration file was updated to 3 server nodes, all client entries were removed, and `bootstrap_expect` was set to 3
**Important**: the Consul cluster configuration must match the actual running state exactly. Any mismatch can destabilize the cluster or break functionality. Verifying state via the Consul API and updating the files promptly is key to stable operation.
## 🎉 Acknowledgements
Thanks to every developer and community member who has contributed to this project!
## Script Organization
The project's scripts have been reorganized by function under `scripts/`:
- `scripts/setup/` - environment setup and initialization
- `scripts/deployment/` - deployment scripts
- `scripts/testing/` - test scripts
- `scripts/utilities/` - utility scripts
- `scripts/mcp/` - MCP server scripts
- `scripts/ci-cd/` - CI/CD scripts
See the [script index](scripts/SCRIPT_INDEX.md) for details.
---
**Last updated:** 2025-10-08 02:55 UTC
**Status:** services running normally; Traefik configuration architecture refactored; Authentik integrated

@@ -12,16 +12,18 @@
- "100.116.80.94:8300" # ash3c (美国)
tasks:
- name: Update APT cache
- name: Update APT cache (ignore GPG errors)
apt:
update_cache: yes
force_apt_get: yes
ignore_errors: yes
- name: Install consul via APT (assumes the repo already exists)
apt:
name: consul={{ consul_version }}-*
state: present
update_cache: yes
register: consul_installed
force_apt_get: yes
ignore_errors: yes
- name: Create consul user (if not exists)
user:

@@ -1,59 +0,0 @@
---
# Ansible Inventory for Consul Client Deployment
all:
children:
consul_servers:
hosts:
master.tailnet-68f9.ts.net:
ansible_host: 100.117.106.136
region: korea
warden.tailnet-68f9.ts.net:
ansible_host: 100.122.197.112
region: beijing
ash3c.tailnet-68f9.ts.net:
ansible_host: 100.116.80.94
region: usa
nomad_servers:
hosts:
# Nomad server nodes also need a Consul client
semaphore.tailnet-68f9.ts.net:
ansible_host: 100.116.158.95
region: korea
ch3.tailnet-68f9.ts.net:
ansible_host: 100.86.141.112
region: switzerland
ash1d.tailnet-68f9.ts.net:
ansible_host: 100.81.26.3
region: usa
ash2e.tailnet-68f9.ts.net:
ansible_host: 100.103.147.94
region: usa
ch2.tailnet-68f9.ts.net:
ansible_host: 100.90.159.68
region: switzerland
de.tailnet-68f9.ts.net:
ansible_host: 100.120.225.29
region: germany
onecloud1.tailnet-68f9.ts.net:
ansible_host: 100.98.209.50
region: unknown
nomad_clients:
hosts:
# Nodes that still need a Consul client deployed
influxdb1.tailnet-68f9.ts.net:
ansible_host: "{{ influxdb1_ip }}" # 需要填入实际IP
region: beijing
browser.tailnet-68f9.ts.net:
ansible_host: "{{ browser_ip }}" # 需要填入实际IP
region: beijing
# hcp1 already has a Consul client; reconfiguration is optional
# hcp1.tailnet-68f9.ts.net:
# ansible_host: 100.97.62.111
# region: beijing
vars:
ansible_user: root
ansible_ssh_private_key_file: ~/.ssh/id_rsa
consul_datacenter: dc1

authentik-traefik-setup.md Normal file
@@ -0,0 +1,192 @@
# Authentik Traefik Proxy Setup Guide
## Overview
Traefik proxying has been configured for Authentik, providing automatic SSL certificate management and domain-based access.
## Configuration Details
### Authentik Service
- **Container IP**: 192.168.31.144
- **HTTP port**: 9000 (optional)
- **HTTPS port**: 9443 (primary)
- **Container status**: running normally
- **SSH auth**: key-based authentication configured; no password required
### Traefik Proxy Configuration
#### Service
```yaml
authentik-cluster:
  loadBalancer:
    servers:
      - url: "https://192.168.31.144:9443"  # Authentik container HTTPS port
    serversTransport: authentik-insecure
    healthCheck:
      path: "/flows/-/default/authentication/"
      interval: "30s"
      timeout: "15s"
```
#### Router
```yaml
authentik-ui:
  rule: "Host(`authentik.git-4ta.live`)"
  service: authentik-cluster
  entryPoints:
    - websecure
  tls:
    certResolver: cloudflare
```
## DNS Requirements
A DNS record must be added in Cloudflare for the following domain:
### A Record
```
authentik.git-4ta.live  A  <hcp1's Tailscale IP>
```
### Getting hcp1's Tailscale IP
```bash
# Option 1: via the Tailscale CLI
tailscale ip -4 hcp1
# Option 2: via ping
ping hcp1.tailnet-68f9.ts.net
```
## Deployment Steps
### 1. Update the Traefik Configuration
```bash
# Redeploy the Traefik job
nomad job run components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad
```
### 2. Configure the DNS Record
Add an A record in the Cloudflare dashboard:
- **Name**: authentik
- **Type**: A
- **Content**: <hcp1's Tailscale IP>
- **TTL**: Auto
### 3. Verify the SSL Certificate
```bash
# Check that the certificate is issued automatically
curl -I https://authentik.git-4ta.live
# Expect a 200 status code and a valid SSL certificate
```
### 4. Test Access
```bash
# Open the Authentik Web UI
open https://authentik.git-4ta.live
# Or test with curl
curl -k https://authentik.git-4ta.live
```
## Health Checks
### Authentik Health Check Endpoint
- **Path**: `/if/flow/default-authentication-flow/`
- **Interval**: 30 seconds
- **Timeout**: 15 seconds
### Check Service Status
```bash
# Check the Traefik router state
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/http/routers | jq '.[] | select(.name=="authentik-ui")'
# Check the service health state
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/http/services | jq '.[] | select(.name=="authentik-cluster")'
```
## Troubleshooting
### Common Problems
1. **DNS resolution**
```bash
# Check DNS resolution
nslookup authentik.git-4ta.live
# Query Cloudflare DNS directly
dig @1.1.1.1 authentik.git-4ta.live
```
2. **SSL certificates**
```bash
# Check the certificate
openssl s_client -connect authentik.git-4ta.live:443 -servername authentik.git-4ta.live
# Inspect the Traefik certificate store
ls -la /opt/traefik/certs/
```
3. **Service connectivity**
```bash
# Check the Authentik container state
sshpass -p "Aa313131@ben" ssh -o StrictHostKeyChecking=no root@pve "pct exec 113 -- netstat -tlnp | grep 9000"
# Check the Traefik logs
nomad alloc logs -f -job traefik-cloudflare-v1
```
### Debug Commands
```bash
# Inspect the Traefik configuration
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/rawdata | jq '.routers[] | select(.name=="authentik-ui")'
# Inspect service discovery
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/rawdata | jq '.services[] | select(.name=="authentik-cluster")'
# Inspect the middlewares
curl -s http://hcp1.tailnet-68f9.ts.net:8080/api/rawdata | jq '.middlewares'
```
## Next Steps
Once the setup is complete you can:
1. **Configure an OAuth2 provider**
- Create an OAuth2 application in Authentik
- Configure the callback URLs
- Set the client credentials
2. **Integrate the HCP services**
- Configure OAuth2 authentication for the Nomad UI
- Configure OAuth2 authentication for the Consul UI
- Configure OIDC authentication for Vault (see the sketch after this list)
3. **Manage users**
- Create user groups and permissions
- Configure multi-factor authentication
- Define access policies
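For the Vault OIDC item above, a minimal sketch. It assumes an Authentik provider application with the slug `vault` (Authentik exposes discovery at `/application/o/<slug>/`); the client ID/secret are placeholders:
```bash
# Enable and point Vault's OIDC auth method at Authentik (sketch; slug is assumed)
vault auth enable oidc
vault write auth/oidc/config \
  oidc_discovery_url="https://authentik.git-4ta.live/application/o/vault/" \
  oidc_client_id="<client-id>" \
  oidc_client_secret="<client-secret>" \
  default_role="default"
```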
## Security Notes
1. **Network security**
- The Authentik container uses an internal IP (192.168.31.144)
- Access goes through the Traefik proxy; the container is not exposed directly
2. **SSL/TLS**
- Automatic SSL certificates via Cloudflare
- Forced HTTPS redirects
- Modern TLS protocol support
3. **Access control**
- Consider configuring an IP allowlist
- Enable multi-factor authentication
- Rotate keys regularly
---
**Setup completed**: $(date)
**Configuration file**: `/root/mgmt/components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad`
**Domain**: `authentik.git-4ta.live`
**Status**: pending deployment and testing

@@ -0,0 +1,99 @@
# Nomad Jobs Backup
**Backup time**: 2025-10-04 07:44:11
**Reason**: all services running normally; SSL certificates fully configured
## Current Running State
### ✅ Deployed and Working Services
1. **Traefik** (`traefik-cloudflare-v1`)
- File: `components/traefik/jobs/traefik-cloudflare.nomad`
- Status: running, SSL certificates valid
- Domain: `*.git4ta.me`
- Certificates: Let's Encrypt (Cloudflare DNS challenge)
2. **Vault** (`vault-cluster`)
- File: `nomad-jobs/vault-cluster.nomad`
- Status: three-node cluster running
- Nodes: ch4, ash3c, warden
- Configuration: stored in Consul KV under `vault/config`
3. **Waypoint** (`waypoint-server`)
- File: `waypoint-server.nomad`
- Status: running
- Node: hcp1
- Web UI: `https://waypoint.git4ta.me/auth/token`
### 🔧 Key Configuration
#### Traefik
- SSL certificates obtained via the Cloudflare DNS challenge
- Certificate storage: `/local/acme.json` (local storage)
- Domain: `git4ta.me`
- Routed services: consul, nomad, vault, waypoint
#### Vault
- Three-node high-availability cluster
- Configuration stored centrally in Consul KV
- Uses the `exec` driver
- Services registered in Consul
#### Waypoint
- Uses the `raw_exec` driver
- HTTPS API: 9701, gRPC: 9702
- Bootstrapped, with the auth token captured
### 📋 Service Endpoints
- `https://consul.git4ta.me` → Consul UI
- `https://traefik.git4ta.me` → Traefik Dashboard
- `https://nomad.git4ta.me` → Nomad UI
- `https://vault.git4ta.me` → Vault UI
- `https://waypoint.git4ta.me/auth/token` → Waypoint UI
### 🔑 Key Credentials
#### Vault
- Unseal keys: stored in Consul KV under `vault/unseal-keys`
- Root token: stored in Consul KV under `vault/root-token`
- Documentation: `/root/mgmt/README-Vault.md`
#### Waypoint
- Auth token: stored in Consul KV under `waypoint/auth-token`
- Documentation: `/root/mgmt/README-Waypoint.md`
### 🚀 Deployment Commands
```bash
# Deploy Traefik
nomad job run components/traefik/jobs/traefik-cloudflare.nomad
# Deploy Vault
nomad job run nomad-jobs/vault-cluster.nomad
# Deploy Waypoint
nomad job run waypoint-server.nomad
```
### 📝 Notes
1. **Certificate management**: certificates live in the Traefik container at `/local/acme.json` and are lost on container restart
2. **Vault configuration**: all configuration is loaded dynamically from Consul KV; the job must be restarted after changes
3. **Network configuration**: all services use Tailscale network addresses
4. **Backup strategy**: back up the configuration and credentials in Consul KV regularly
### 🔄 Recovery Steps
To restore this state:
1. Restore the Consul KV configuration
2. Deploy in order: Traefik → Vault → Waypoint
3. Verify every service endpoint is reachable
4. Check the SSL certificate status
---
**Backup completed**: 2025-10-04 07:44:11
**Backed up by**: AI Assistant
**Status**: all services running normally ✅

@@ -0,0 +1,19 @@
# Consul Configuration
## Deployment
```bash
nomad job run components/consul/jobs/consul-cluster.nomad
```
## Job Details
- **Job name**: `consul-cluster-nomad`
- **Type**: service
- **Nodes**: master, ash3c, warden
## Access
- Master: `http://master.tailnet-68f9.ts.net:8500`
- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500`
- Warden: `http://warden.tailnet-68f9.ts.net:8500`

@@ -0,0 +1,88 @@
# Consul configuration file
# Contains the full Consul configuration, including variables and storage-related settings
# Base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"
# Enable the UI
ui_config {
  enabled = true
}
# Datacenter
datacenter = "dc1"
# Server configuration
server = true
bootstrap_expect = 3
# Network configuration
client_addr = "0.0.0.0"
bind_addr = "{{ GetInterfaceIP `eth0` }}"
advertise_addr = "{{ GetInterfaceIP `eth0` }}"
# Ports
ports {
  dns = 8600
  http = 8500
  https = -1
  grpc = 8502
  grpc_tls = 8503
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}
# Cluster join
retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"]
# Service discovery
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true
# Performance tuning
performance {
  raft_multiplier = 1
}
# Logging
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"
# Security
encrypt = "YourEncryptionKeyHere"
# Connection settings
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"
# Autopilot
autopilot {
  cleanup_dead_servers = true
  last_contact_threshold = "200ms"
  max_trailing_logs = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag = ""
  disable_upgrade_migration = false
  upgrade_version_tag = ""
}
# Snapshots
snapshot {
  enabled = true
  interval = "24h"
  retain = 30
  name = "consul-snapshot-{{.Timestamp}}"
}
# Backups
backup {
  enabled = true
  interval = "6h"
  retain = 7
  name = "consul-backup-{{.Timestamp}}"
}

@@ -0,0 +1,93 @@
# Consul configuration template
# This file uses consul-template syntax to pull values dynamically from the KV store,
# following the config/{environment}/{provider}/{region_or_service}/{key} layout.
# KV paths are composed with printf, since Go templates have no string `+` operator.
# Base configuration
data_dir = "{{ keyOrDefault (printf `config/%s/consul/cluster/data_dir` (env `ENVIRONMENT`)) `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault (printf `config/%s/consul/cluster/raft_dir` (env `ENVIRONMENT`)) `/opt/consul/raft` }}"
# Enable the UI
ui_config {
  enabled = {{ keyOrDefault (printf `config/%s/consul/ui/enabled` (env `ENVIRONMENT`)) `true` }}
}
# Datacenter
datacenter = "{{ keyOrDefault (printf `config/%s/consul/cluster/datacenter` (env `ENVIRONMENT`)) `dc1` }}"
# Server configuration
server = true
bootstrap_expect = {{ keyOrDefault (printf `config/%s/consul/cluster/bootstrap_expect` (env `ENVIRONMENT`)) `3` }}
# Network configuration
client_addr = "{{ keyOrDefault (printf `config/%s/consul/network/client_addr` (env `ENVIRONMENT`)) `0.0.0.0` }}"
bind_addr = "{{ GetInterfaceIP (keyOrDefault (printf `config/%s/consul/network/bind_interface` (env `ENVIRONMENT`)) `ens160`) }}"
advertise_addr = "{{ GetInterfaceIP (keyOrDefault (printf `config/%s/consul/network/advertise_interface` (env `ENVIRONMENT`)) `ens160`) }}"
# Ports
ports {
  dns = {{ keyOrDefault (printf `config/%s/consul/ports/dns` (env `ENVIRONMENT`)) `8600` }}
  http = {{ keyOrDefault (printf `config/%s/consul/ports/http` (env `ENVIRONMENT`)) `8500` }}
  https = {{ keyOrDefault (printf `config/%s/consul/ports/https` (env `ENVIRONMENT`)) `-1` }}
  grpc = {{ keyOrDefault (printf `config/%s/consul/ports/grpc` (env `ENVIRONMENT`)) `8502` }}
  grpc_tls = {{ keyOrDefault (printf `config/%s/consul/ports/grpc_tls` (env `ENVIRONMENT`)) `8503` }}
  serf_lan = {{ keyOrDefault (printf `config/%s/consul/ports/serf_lan` (env `ENVIRONMENT`)) `8301` }}
  serf_wan = {{ keyOrDefault (printf `config/%s/consul/ports/serf_wan` (env `ENVIRONMENT`)) `8302` }}
  server = {{ keyOrDefault (printf `config/%s/consul/ports/server` (env `ENVIRONMENT`)) `8300` }}
}
# Cluster join - node IPs resolved dynamically
retry_join = [
  "{{ keyOrDefault (printf `config/%s/consul/nodes/master/ip` (env `ENVIRONMENT`)) `100.117.106.136` }}",
  "{{ keyOrDefault (printf `config/%s/consul/nodes/ash3c/ip` (env `ENVIRONMENT`)) `100.116.80.94` }}",
  "{{ keyOrDefault (printf `config/%s/consul/nodes/warden/ip` (env `ENVIRONMENT`)) `100.122.197.112` }}"
]
# Service discovery
enable_service_script = {{ keyOrDefault (printf `config/%s/consul/service/enable_service_script` (env `ENVIRONMENT`)) `true` }}
enable_script_checks = {{ keyOrDefault (printf `config/%s/consul/service/enable_script_checks` (env `ENVIRONMENT`)) `true` }}
enable_local_script_checks = {{ keyOrDefault (printf `config/%s/consul/service/enable_local_script_checks` (env `ENVIRONMENT`)) `true` }}
# Performance tuning
performance {
  raft_multiplier = {{ keyOrDefault (printf `config/%s/consul/performance/raft_multiplier` (env `ENVIRONMENT`)) `1` }}
}
# Logging
log_level = "{{ keyOrDefault (printf `config/%s/consul/cluster/log_level` (env `ENVIRONMENT`)) `INFO` }}"
enable_syslog = {{ keyOrDefault (printf `config/%s/consul/log/enable_syslog` (env `ENVIRONMENT`)) `false` }}
log_file = "{{ keyOrDefault (printf `config/%s/consul/log/log_file` (env `ENVIRONMENT`)) `/var/log/consul/consul.log` }}"
# Security
encrypt = "{{ keyOrDefault (printf `config/%s/consul/cluster/encrypt_key` (env `ENVIRONMENT`)) `YourEncryptionKeyHere` }}"
# Connection settings
reconnect_timeout = "{{ keyOrDefault (printf `config/%s/consul/connection/reconnect_timeout` (env `ENVIRONMENT`)) `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault (printf `config/%s/consul/connection/reconnect_timeout_wan` (env `ENVIRONMENT`)) `30s` }}"
session_ttl_min = "{{ keyOrDefault (printf `config/%s/consul/connection/session_ttl_min` (env `ENVIRONMENT`)) `10s` }}"
# Autopilot
autopilot {
  cleanup_dead_servers = {{ keyOrDefault (printf `config/%s/consul/autopilot/cleanup_dead_servers` (env `ENVIRONMENT`)) `true` }}
  last_contact_threshold = "{{ keyOrDefault (printf `config/%s/consul/autopilot/last_contact_threshold` (env `ENVIRONMENT`)) `200ms` }}"
  max_trailing_logs = {{ keyOrDefault (printf `config/%s/consul/autopilot/max_trailing_logs` (env `ENVIRONMENT`)) `250` }}
  server_stabilization_time = "{{ keyOrDefault (printf `config/%s/consul/autopilot/server_stabilization_time` (env `ENVIRONMENT`)) `10s` }}"
  redundancy_zone_tag = ""
  disable_upgrade_migration = {{ keyOrDefault (printf `config/%s/consul/autopilot/disable_upgrade_migration` (env `ENVIRONMENT`)) `false` }}
  upgrade_version_tag = ""
}
# Snapshots
snapshot {
  enabled = {{ keyOrDefault (printf `config/%s/consul/snapshot/enabled` (env `ENVIRONMENT`)) `true` }}
  interval = "{{ keyOrDefault (printf `config/%s/consul/snapshot/interval` (env `ENVIRONMENT`)) `24h` }}"
  retain = {{ keyOrDefault (printf `config/%s/consul/snapshot/retain` (env `ENVIRONMENT`)) `30` }}
  name = "{{ keyOrDefault (printf `config/%s/consul/snapshot/name` (env `ENVIRONMENT`)) `consul-snapshot-{{.Timestamp}}` }}"
}
# Backups
backup {
  enabled = {{ keyOrDefault (printf `config/%s/consul/backup/enabled` (env `ENVIRONMENT`)) `true` }}
  interval = "{{ keyOrDefault (printf `config/%s/consul/backup/interval` (env `ENVIRONMENT`)) `6h` }}"
  retain = {{ keyOrDefault (printf `config/%s/consul/backup/retain` (env `ENVIRONMENT`)) `7` }}
  name = "{{ keyOrDefault (printf `config/%s/consul/backup/name` (env `ENVIRONMENT`)) `consul-backup-{{.Timestamp}}` }}"
}

@@ -0,0 +1,50 @@
job "consul-clients-additional" {
datacenters = ["dc1"]
type = "service"
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "ch2|ch3|de"
}
group "consul-client" {
count = 3
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-config-dir=/etc/consul.d",
"-data-dir=/opt/consul",
"-node=${node.unique.name}",
"-bind=${attr.unique.network.ip-address}",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-client=0.0.0.0"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "http"
path = "/v1/status/leader"
interval = "30s"
timeout = "5s"
}
}
}
}
}

@@ -0,0 +1,154 @@
job "consul-clients-dedicated" {
datacenters = ["dc1"]
type = "service"
group "consul-client-hcp1" {
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
port "http" {
static = 8500
}
}
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-data-dir=/opt/consul",
"-node=hcp1",
"-bind=100.97.62.111",
"-advertise=100.97.62.111",
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
"-client=0.0.0.0",
"-http-port=8500",
"-datacenter=dc1"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "script"
command = "consul"
args = ["members"]
interval = "10s"
timeout = "3s"
}
}
}
}
group "consul-client-influxdb1" {
constraint {
attribute = "${node.unique.name}"
value = "influxdb1"
}
network {
port "http" {
static = 8500
}
}
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-data-dir=/opt/consul",
"-node=influxdb1",
"-bind=100.100.7.4",
"-advertise=100.100.7.4",
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
"-client=0.0.0.0",
"-http-port=8500",
"-datacenter=dc1"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "script"
command = "consul"
args = ["members"]
interval = "10s"
timeout = "3s"
}
}
}
}
group "consul-client-browser" {
constraint {
attribute = "${node.unique.name}"
value = "browser"
}
network {
port "http" {
static = 8500
}
}
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-data-dir=/opt/consul",
"-node=browser",
"-bind=100.116.112.45",
"-advertise=100.116.112.45",
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
"-client=0.0.0.0",
"-http-port=8500",
"-datacenter=dc1"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "script"
command = "consul"
args = ["members"]
interval = "10s"
timeout = "3s"
}
}
}
}
}

@@ -0,0 +1,66 @@
job "consul-clients-dedicated" {
datacenters = ["dc1"]
type = "service"
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "hcp1|influxdb1|browser"
}
group "consul-client" {
count = 3
update {
max_parallel = 3
min_healthy_time = "5s"
healthy_deadline = "2m"
progress_deadline = "5m"
auto_revert = false
}
network {
port "http" {
static = 8500
}
}
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-data-dir=/opt/consul",
"-node=${node.unique.name}",
"-bind=${attr.unique.network.ip-address}",
"-advertise=${attr.unique.network.ip-address}",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-client=0.0.0.0",
"-http-port=${NOMAD_PORT_http}",
"-datacenter=dc1"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "http"
path = "/v1/status/leader"
interval = "10s"
timeout = "3s"
}
}
}
}
}

@@ -0,0 +1,43 @@
job "consul-clients" {
datacenters = ["dc1"]
type = "system"
group "consul-client" {
count = 0 # system job, runs on all nodes
task "consul-client" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-config-dir=/etc/consul.d",
"-data-dir=/opt/consul",
"-node=${node.unique.name}",
"-bind=${attr.unique.network.ip-address}",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301"
]
}
resources {
cpu = 100
memory = 128
}
service {
name = "consul-client"
port = "http"
check {
type = "http"
path = "/v1/status/leader"
interval = "30s"
timeout = "5s"
}
}
}
}
}

@@ -0,0 +1,115 @@
job "consul-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "consul-ch4" {
constraint {
attribute = "${node.unique.name}"
value = "ch4"
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.117.106.136",
"-advertise=100.117.106.136",
"-retry-join=100.116.80.94",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
constraint {
attribute = "${node.unique.name}"
value = "ash3c"
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.116.80.94",
"-advertise=100.116.80.94",
"-retry-join=100.117.106.136",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.122.197.112",
"-advertise=100.122.197.112",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

@@ -0,0 +1,66 @@
job "consul-ui-service" {
datacenters = ["dc1"]
type = "service"
group "consul-ui" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
network {
mode = "host"
port "http" {
static = 8500
host_network = "tailscale0"
}
}
service {
name = "consul-ui"
port = "http"
tags = [
"traefik.enable=true",
"traefik.http.routers.consul-ui.rule=PathPrefix(`/consul`)",
"traefik.http.routers.consul-ui.priority=100"
]
check {
type = "http"
path = "/v1/status/leader"
interval = "10s"
timeout = "2s"
}
}
task "consul-ui" {
driver = "exec"
config {
command = "/usr/bin/consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.122.197.112",
"-advertise=100.122.197.112",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-ui",
"-http-port=8500"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

@@ -0,0 +1,8 @@
# Nomad Configuration
## Jobs
- `install-podman-driver.nomad` - installs the Podman driver
- `nomad-consul-config.nomad` - Nomad-Consul configuration
- `nomad-consul-setup.nomad` - Nomad-Consul setup
- `nomad-nfs-volume.nomad` - NFS volume configuration

@@ -16,7 +16,7 @@ job "nomad-consul-config" {
command = "sh"
args = [
"-c",
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
]
}
@@ -31,7 +31,7 @@ job "nomad-consul-config" {
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "master|ash3c|browser|influxdb1|hcp1|warden"
value = "ch4|ash3c|browser|influxdb1|hcp1|warden"
}
task "update-nomad-config" {
@@ -41,7 +41,7 @@ job "nomad-consul-config" {
command = "sh"
args = [
"-c",
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
]
}

@@ -0,0 +1,23 @@
job "nomad-consul-setup" {
datacenters = ["dc1"]
type = "system"
group "nomad-config" {
task "setup-consul" {
driver = "exec"
config {
command = "sh"
args = [
"-c",
"if grep -q 'server.*enabled.*true' /etc/nomad.d/nomad.hcl; then sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl; else sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl; fi && systemctl restart nomad"
]
}
resources {
cpu = 100
memory = 128
}
}
}
}

@@ -0,0 +1,28 @@
# Traefik Configuration
## Deployment
```bash
nomad job run components/traefik/jobs/traefik.nomad
```
## Configuration Highlights
- Binds explicitly to the Tailscale IP (100.97.62.111)
- Geographically ordered Consul cluster endpoints (Beijing → Korea → USA)
- Relaxed health checks suited to trans-Pacific links
- No service health checks, to avoid flapping
## Access
- Dashboard: `http://hcp1.tailnet-68f9.ts.net:8080/dashboard/`
- Direct IP: `http://100.97.62.111:8080/dashboard/`
- Consul LB: `http://hcp1.tailnet-68f9.ts.net:80`
## Troubleshooting
If services flap:
1. Check whether RFC1918 private addresses are in use
2. Confirm Tailscale network connectivity
3. Increase the health-check interval
4. Account for the network latency of geographic distance

@@ -0,0 +1,28 @@
job "test-simple" {
datacenters = ["dc1"]
type = "service"
group "test" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
task "test" {
driver = "exec"
config {
command = "sleep"
args = ["3600"]
}
resources {
cpu = 100
memory = 64
}
}
}
}

@@ -0,0 +1,213 @@
job "traefik-cloudflare-v1" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "https" {
static = 443
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
http:
redirections:
entrypoint:
to: websecure
scheme: https
permanent: true
websecure:
address: "0.0.0.0:443"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
defaultRule: "Host(`{{ .Name }}.git4ta.me`)"
file:
filename: /local/dynamic.yml
watch: true
certificatesResolvers:
cloudflare:
acme:
email: houzhongxu.houzhongxu@gmail.com
storage: /local/acme.json
dnsChallenge:
provider: cloudflare
delayBeforeCheck: 30s
resolvers:
- "1.1.1.1:53"
- "1.0.0.1:53"
log:
level: DEBUG
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:4646" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8200" # 韩国,活跃节点
- url: "http://ash3c.tailnet-68f9.ts.net:8200" # 美国,备用节点
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,备用节点
healthCheck:
path: "/v1/sys/health"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git4ta.me`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-dashboard:
rule: "Host(`traefik.git4ta.me`)"
service: dashboard@internal
middlewares:
- dashboard_redirect@internal
- dashboard_stripprefix@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git4ta.me`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git4ta.me`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git4ta.me`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
EOF
destination = "local/dynamic.yml"
}
template {
data = <<EOF
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
CLOUDFLARE_DNS_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
CLOUDFLARE_ZONE_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
EOF
destination = "local/cloudflare.env"
env = true
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,217 @@
job "traefik-consul-kv" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
file:
filename: /local/dynamic.yml
watch: true
metrics:
prometheus:
addEntryPointsLabels: true
addServicesLabels: true
addRoutersLabels: true
log:
level: INFO
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
traefik-stripprefix:
stripPrefix:
prefixes:
- "/traefik"
nomad-stripprefix:
stripPrefix:
prefixes:
- "/nomad"
consul-redirect:
redirectRegex:
regex: "^/consul/?$"
replacement: "/consul/ui/"
permanent: false
nomad-redirect:
redirectRegex:
regex: "^/nomad/?$"
replacement: "/nomad/ui/"
permanent: false
traefik-redirect:
redirectRegex:
regex: "^/traefik/?$"
replacement: "/traefik/dashboard/"
permanent: false
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://ch2.tailnet-68f9.ts.net:4646" # Nomad server leader
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
routers:
consul-redirect:
rule: "Path(`/consul`) || Path(`/consul/`)"
service: consul-cluster
middlewares:
- consul-redirect
entryPoints:
- web
priority: 100
consul-ui:
rule: "PathPrefix(`/consul/ui`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- web
priority: 5
consul-api:
rule: "PathPrefix(`/consul/v1`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- web
priority: 5
traefik-api:
rule: "PathPrefix(`/traefik/api`)"
service: api@internal
middlewares:
- traefik-stripprefix
entryPoints:
- web
priority: 6
traefik-dashboard:
rule: "PathPrefix(`/traefik/dashboard`)"
service: dashboard@internal
middlewares:
- traefik-stripprefix
entryPoints:
- web
priority: 5
traefik-redirect:
rule: "Path(`/traefik`) || Path(`/traefik/`)"
middlewares:
- "traefik-redirect"
entryPoints:
- web
priority: 100
nomad-redirect:
rule: "Path(`/nomad`) || Path(`/nomad/`)"
service: nomad-cluster
middlewares:
- nomad-redirect
entryPoints:
- web
priority: 100
nomad-ui:
rule: "PathPrefix(`/nomad/ui`)"
service: nomad-cluster
middlewares:
- nomad-stripprefix
entryPoints:
- web
priority: 5
nomad-api:
rule: "PathPrefix(`/nomad/v1`)"
service: nomad-cluster
middlewares:
- nomad-stripprefix
entryPoints:
- web
priority: 5
EOF
destination = "local/dynamic.yml"
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,150 @@
job "traefik-consul-lb" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
update {
min_healthy_time = "60s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = false
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "hcp1.tailnet-68f9.ts.net:80"
traefik:
address: "100.97.62.111:8080"
providers:
file:
filename: /local/dynamic.yml
watch: true
metrics:
prometheus:
addEntryPointsLabels: true
addServicesLabels: true
addRoutersLabels: true
log:
level: INFO
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
traefik-stripprefix:
stripPrefix:
prefixes:
- "/traefik"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "PathPrefix(`/consul`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- web
traefik-dashboard:
rule: "PathPrefix(`/traefik`)"
service: dashboard@internal
middlewares:
- traefik-stripprefix
entryPoints:
- web
EOF
destination = "local/dynamic.yml"
}
resources {
cpu = 500
memory = 512
}
service {
name = "consul-lb"
port = "http"
check {
name = "consul-lb-health"
type = "http"
path = "/consul/v1/status/leader"
interval = "30s"
timeout = "5s"
}
}
service {
name = "traefik-dashboard"
port = "traefik"
check {
name = "traefik-dashboard-health"
type = "http"
path = "/api/rawdata"
interval = "30s"
timeout = "5s"
}
}
}
}
}

View File

@ -0,0 +1,40 @@
job "traefik-no-service" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--api.dashboard=true",
"--api.insecure=true",
"--providers.file.directory=/tmp",
"--entrypoints.web.address=:80"
]
}
resources {
cpu = 200
memory = 128
}
}
}
}

View File

@ -0,0 +1,68 @@
job "traefik-simple" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
log:
level: INFO
EOF
destination = "local/traefik.yml"
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -11,9 +11,9 @@ job "traefik-consul-lb" {
}
update {
min_healthy_time = "5s"
healthy_deadline = "10m"
progress_deadline = "15m"
min_healthy_time = "60s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = false
}
@ -56,6 +56,12 @@ providers:
filename: /local/dynamic.yml
watch: true
metrics:
prometheus:
addEntryPointsLabels: true
addServicesLabels: true
addRoutersLabels: true
log:
level: INFO
EOF
@ -65,13 +71,24 @@ EOF
template {
data = <<EOF
http:
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
traefik-stripprefix:
stripPrefix:
prefixes:
- "/traefik"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://master.tailnet-68f9.ts.net:8500" # 备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 备用
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
@ -79,8 +96,18 @@ http:
routers:
consul-api:
rule: "PathPrefix(`/`)"
rule: "PathPrefix(`/consul`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- web
traefik-dashboard:
rule: "PathPrefix(`/traefik`)"
service: dashboard@internal
middlewares:
- traefik-stripprefix
entryPoints:
- web
EOF
@ -92,6 +119,32 @@ EOF
memory = 512
}
service {
name = "consul-lb"
port = "http"
check {
name = "consul-lb-health"
type = "http"
path = "/consul/v1/status/leader"
interval = "30s"
timeout = "5s"
}
}
service {
name = "traefik-dashboard"
port = "traefik"
check {
name = "traefik-dashboard-health"
type = "http"
path = "/api/rawdata"
interval = "30s"
timeout = "5s"
}
}
}
}
}

View File

@ -0,0 +1,7 @@
# Vault Configuration
## Jobs
- `vault-cluster-exec.nomad` - Vault cluster (exec driver)
- `vault-cluster-podman.nomad` - Vault cluster (podman driver)
- `vault-dev-warden.nomad` - Vault development environment (see the deploy sketch below)
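A minimal deploy-and-verify sketch, assuming these job files live under `components/vault/jobs/` (the directory is an assumption, not confirmed by this listing) and that the Nomad and Vault CLIs can reach the cluster:
```bash
# Deploy the exec-driver cluster variant (path assumed)
nomad job run components/vault/jobs/vault-cluster-exec.nomad

# Watch allocation placement
nomad job status vault-cluster-exec

# Check seal status on one of the cluster nodes
VAULT_ADDR=http://ch4.tailnet-68f9.ts.net:8200 vault status
```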

View File

@ -2,7 +2,7 @@ job "vault-cluster-exec" {
datacenters = ["dc1"]
type = "service"
group "vault-master" {
group "vault-ch4" {
count = 1
  # Use an attribute that exists instead of a Consul version check
@ -14,7 +14,7 @@ job "vault-cluster-exec" {
constraint {
attribute = "${node.unique.name}"
value = "kr-master"
value = "ch4"
}
network {

View File

@ -0,0 +1,241 @@
job "vault-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "vault-ch4" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "ch4"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
group "vault-ash3c" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "ash3c"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
group "vault-warden" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "warden"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
}

View File

@ -0,0 +1,157 @@
job "vault" {
datacenters = ["dc1"]
type = "service"
  # Constrain to the warden, ch4, and ash3c nodes only
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "^(warden|ch4|ash3c)$"
}
group "vault" {
count = 3
    # Ensure each node runs exactly one instance
constraint {
operator = "distinct_hosts"
value = "true"
}
    # Network configuration
network {
port "http" {
static = 8200
to = 8200
}
}
    # Service discovery configuration - includes version info
service {
name = "vault"
port = "http"
      # Add version tags to avoid check rejections
tags = [
"vault",
"secrets",
"version:1.20.3"
]
check {
name = "vault-health"
type = "http"
path = "/v1/sys/health"
interval = "10s"
timeout = "3s"
method = "GET"
}
      # Health check configuration
check {
name = "vault-sealed-check"
type = "script"
command = "/bin/sh"
args = ["-c", "vault status -format=json | jq -r '.sealed' | grep -q 'false'"]
interval = "30s"
timeout = "5s"
task = "vault"
}
}
    # Task configuration
task "vault" {
driver = "raw_exec"
      # Resource configuration
resources {
cpu = 500
memory = 1024
}
      # Environment variables
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Template configuration - the Vault config file
template {
data = <<EOF
ui = true
storage "consul" {
address = "127.0.0.1:8500"
path = "vault"
}
# HTTP listener (no TLS, since Nomad handles the load balancing)
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = 1
}
# Disable mlock to avoid permission issues
disable_mlock = true
# Logging configuration
log_level = "INFO"
log_format = "json"
# Performance tuning
max_lease_ttl = "168h"
default_lease_ttl = "24h"
# HA configuration
ha_storage "consul" {
address = "127.0.0.1:8500"
path = "vault"
}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
      # Startup command - run Vault in server mode with the rendered config
      config {
        command = "/usr/bin/vault"
        args = [
          "server",
          "-config=/local/vault.hcl"
        ]
}
      # Restart policy
restart {
attempts = 3
interval = "30m"
delay = "15s"
mode = "fail"
}
}
    # Update strategy
update {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
    # Migration strategy
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
}

View File

@ -0,0 +1,213 @@
job "traefik-cloudflare-v1" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "https" {
static = 443
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
http:
redirections:
entrypoint:
to: websecure
scheme: https
permanent: true
websecure:
address: "0.0.0.0:443"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
defaultRule: "Host(`{{ .Name }}.git4ta.me`)"
file:
filename: /local/dynamic.yml
watch: true
certificatesResolvers:
cloudflare:
acme:
email: houzhongxu.houzhongxu@gmail.com
storage: /local/acme.json
dnsChallenge:
provider: cloudflare
delayBeforeCheck: 30s
resolvers:
- "1.1.1.1:53"
- "1.0.0.1:53"
log:
level: DEBUG
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,优先
- url: "http://ch4.tailnet-68f9.ts.net:4646" # 韩国,备用
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8200" # 韩国,活跃节点
- url: "http://ash3c.tailnet-68f9.ts.net:8200" # 美国,备用节点
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,备用节点
healthCheck:
path: "/v1/sys/health"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git4ta.me`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-dashboard:
rule: "Host(`traefik.git4ta.me`)"
service: dashboard@internal
middlewares:
- dashboard_redirect@internal
- dashboard_stripprefix@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git4ta.me`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git4ta.me`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git4ta.me`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
EOF
destination = "local/dynamic.yml"
}
template {
data = <<EOF
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
CLOUDFLARE_DNS_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
CLOUDFLARE_ZONE_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
EOF
destination = "local/cloudflare.env"
env = true
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,241 @@
job "vault-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "vault-ch4" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "ch4"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
group "vault-ash3c" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "ash3c"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
group "vault-warden" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "warden"
}
network {
port "http" {
static = 8200
to = 8200
}
}
task "vault" {
driver = "exec"
consul {
namespace = "default"
}
resources {
cpu = 500
memory = 1024
}
env {
VAULT_ADDR = "http://127.0.0.1:8200"
}
      # Read the configuration from Consul
template {
data = <<EOF
{{ key "vault/config" }}
EOF
destination = "local/vault.hcl"
perms = "644"
wait {
min = "2s"
max = "10s"
}
}
config {
command = "vault"
args = [
"server",
"-config=/local/vault.hcl"
]
}
restart {
attempts = 2
interval = "30m"
delay = "15s"
mode = "fail"
}
}
update {
max_parallel = 3
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
progress_deadline = "10m"
auto_revert = true
canary = 0
}
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
}

View File

@ -0,0 +1,49 @@
job "waypoint-server" {
datacenters = ["dc1"]
type = "service"
group "waypoint" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
network {
port "http" {
static = 9701
}
port "grpc" {
static = 9702
}
}
task "waypoint" {
driver = "raw_exec"
config {
command = "/usr/local/bin/waypoint"
args = [
"server", "run",
"-accept-tos",
"-vvv",
"-db=/opt/waypoint/waypoint.db",
"-listen-grpc=0.0.0.0:9702",
"-listen-http=0.0.0.0:9701"
]
}
resources {
cpu = 500
memory = 512
}
env {
WAYPOINT_LOG_LEVEL = "DEBUG"
}
}
}
}

1
cf-tokens.txt Normal file
View File

@ -0,0 +1 @@
CF Token: 0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr

95
compile-nomad-armv7.sh Normal file
View File

@ -0,0 +1,95 @@
#!/bin/bash
# Automated Nomad ARMv7 build script
# Intended for the onecloud1 node
set -e
echo "🚀 Starting the Nomad ARMv7 build..."
# Check the host architecture
ARCH=$(uname -m)
echo "📋 Host architecture: $ARCH"
# Set the Go cross-compilation environment
export GOOS=linux
export GOARCH=arm
export GOARM=7
export CGO_ENABLED=0
echo "🔧 Build environment:"
echo "  GOOS=$GOOS"
echo "  GOARCH=$GOARCH"
echo "  GOARM=$GOARM"
echo "  CGO_ENABLED=$CGO_ENABLED"
# Check the Go toolchain
if ! command -v go &> /dev/null; then
    echo "❌ Go is not installed; installing..."
    # Install Go (assumes an Ubuntu/Debian system)
    sudo apt update
    sudo apt install -y golang-go
fi
GO_VERSION=$(go version)
echo "✅ Go version: $GO_VERSION"
# Create the build directory
BUILD_DIR="/tmp/nomad-build"
mkdir -p $BUILD_DIR
cd $BUILD_DIR
echo "📥 Cloning the Nomad sources..."
if [ -d "nomad" ]; then
    echo "🔄 Updating the existing checkout..."
    cd nomad
    git pull
else
    git clone https://github.com/hashicorp/nomad.git
    cd nomad
fi
# Switch to the latest stable release
echo "🏷️ Checking out the latest stable tag..."
git checkout $(git describe --tags --abbrev=0)
# Build
echo "🔨 Building..."
make dev
# Check the build result
if [ -f "bin/nomad" ]; then
    echo "✅ Build succeeded!"
    # Show file details
    file bin/nomad
    ls -lh bin/nomad
    # Back up the existing Nomad binary
    if [ -f "/usr/bin/nomad" ]; then
        echo "💾 Backing up the existing Nomad binary..."
        sudo cp /usr/bin/nomad /usr/bin/nomad.backup.$(date +%Y%m%d-%H%M%S)
    fi
    # Install the new build
    echo "📦 Installing the new build..."
    sudo cp bin/nomad /usr/bin/nomad
    sudo chmod +x /usr/bin/nomad
    # Verify the installation
    echo "🔍 Verifying the installation..."
    /usr/bin/nomad version
    echo "🎉 Nomad ARMv7 build installed!"
else
    echo "❌ Build failed!"
    exit 1
fi
# Clean up
echo "🧹 Removing build files..."
cd /
rm -rf $BUILD_DIR
echo "✨ Done!"

View File

@ -2,10 +2,25 @@ job "consul-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "consul-master" {
group "consul-ch4" {
constraint {
attribute = "${node.unique.name}"
value = "master"
value = "ch4"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
@ -16,18 +31,18 @@ job "consul-cluster-nomad" {
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-bootstrap-expect=2",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.117.106.136",
"-advertise=100.117.106.136",
"-retry-join=100.116.80.94",
"-retry-join=100.122.197.112",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
"-serf-wan-port=8302",
]
}
@ -45,6 +60,21 @@ job "consul-cluster-nomad" {
value = "ash3c"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
@ -53,13 +83,12 @@ job "consul-cluster-nomad" {
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.116.80.94",
"-advertise=100.116.80.94",
"-retry-join=100.117.106.136",
"-retry-join=100.122.197.112",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",
@ -82,6 +111,21 @@ job "consul-cluster-nomad" {
value = "warden"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
@ -90,13 +134,12 @@ job "consul-cluster-nomad" {
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.122.197.112",
"-advertise=100.122.197.112",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",

View File

@ -0,0 +1,158 @@
job "consul-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "consul-ch4" {
constraint {
attribute = "${node.unique.name}"
value = "ch4"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
constraint {
attribute = "${node.unique.name}"
value = "ash3c"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=warden.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
network {
port "http" {
static = 8500
}
port "server" {
static = 8300
}
port "serf-lan" {
static = 8301
}
port "serf-wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind={{ env "NOMAD_IP_http" }}",
"-advertise={{ env "NOMAD_IP_http" }}",
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

View File

@ -0,0 +1,43 @@
job "juicefs-controller" {
datacenters = ["dc1"]
type = "system"
group "controller" {
task "plugin" {
driver = "podman"
config {
image = "juicedata/juicefs-csi-driver:v0.14.1"
args = [
"--endpoint=unix://csi/csi.sock",
"--logtostderr",
"--nodeid=${node.unique.id}",
"--v=5",
"--by-process=true"
]
privileged = true
}
csi_plugin {
id = "juicefs-nfs"
type = "controller"
mount_dir = "/csi"
}
resources {
cpu = 100
memory = 512
}
env {
POD_NAME = "csi-controller"
}
}
}
}

View File

@ -0,0 +1,38 @@
job "juicefs-csi-controller" {
datacenters = ["dc1"]
type = "system"
group "controller" {
task "juicefs-csi-driver" {
driver = "podman"
config {
image = "juicedata/juicefs-csi-driver:v0.14.1"
args = [
"--endpoint=unix://csi/csi.sock",
"--logtostderr",
"--nodeid=${node.unique.id}",
"--v=5"
]
privileged = true
}
env {
POD_NAME = "juicefs-csi-controller"
POD_NAMESPACE = "default"
NODE_NAME = "${node.unique.id}"
}
csi_plugin {
id = "juicefs0"
type = "controller"
mount_dir = "/csi"
}
resources {
cpu = 100
memory = 512
}
}
}
}

View File

@ -1,23 +0,0 @@
job "nomad-consul-setup" {
datacenters = ["dc1"]
type = "system"
group "nomad-config" {
task "setup-consul" {
driver = "exec"
config {
command = "sh"
args = [
"-c",
"if grep -q 'server.*enabled.*true' /etc/nomad.d/nomad.hcl; then sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl; else sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl; fi && systemctl restart nomad"
]
}
resources {
cpu = 100
memory = 128
}
}
}
}

View File

@ -0,0 +1,43 @@
# NFS CSI Volume Definition for Nomad
# This file defines a CSI volume so the NFS storage shows up in the Nomad UI (registration sketch below)
volume "nfs-shared-csi" {
  type = "csi"
  # CSI plugin name
  source = "csi-nfs"
  # Capacity settings
  capacity_min = "1GiB"
  capacity_max = "10TiB"
  # Access mode - multi-node read/write
  access_mode = "multi-node-multi-writer"
  # Mount options
  mount_options {
    fs_type = "nfs4"
    mount_flags = "rw,relatime,vers=4.2"
  }
  # Topology constraint - keep the volume on nodes that have the NFS mount
  topology_request {
    required {
      topology {
        "node" = "{{ range $node := nomadNodes }}{{ if eq $node.Status "ready" }}{{ $node.Name }}{{ end }}{{ end }}"
      }
    }
  }
  # Volume parameters
  parameters {
    server = "snail"
    share = "/fs/1000/nfs/Fnsync"
  }
}
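How a definition like this might be put to use, assuming the `csi-nfs` plugin is already registered and the spec is saved as `nfs-shared-csi.hcl` (the filename is an assumption):
```bash
# Register the volume with the cluster
nomad volume register nfs-shared-csi.hcl

# Confirm it is visible and schedulable
nomad volume status nfs-shared-csi
```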

View File

@ -0,0 +1,22 @@
# Dynamic Host Volume Definition for NFS
# This file defines a dynamic host volume so the NFS storage shows up in the Nomad UI
volume "nfs-shared-dynamic" {
  type = "host"
  # Use a dynamic host volume
  source = "fnsync"
  # Read-only setting
  read_only = false
  # Capacity information, for display purposes
  capacity_min = "1GiB"
  capacity_max = "10TiB"
}

View File

@ -0,0 +1,22 @@
# NFS Host Volume Definition for Nomad UI
# This file defines a host volume so the NFS storage shows up in the Nomad UI
volume "nfs-shared-host" {
  type = "host"
  # Use a host volume
  source = "fnsync"
  # Read-only setting
  read_only = false
  # Capacity information, for display purposes
  capacity_min = "1GiB"
  capacity_max = "10TiB"
}

View File

@ -0,0 +1,123 @@
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
authentik-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
healthCheck:
path: "/ui/"
interval: "30s"
timeout: "15s"
authentik-cluster:
loadBalancer:
servers:
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
serversTransport: authentik-insecure
healthCheck:
path: "/flows/-/default/authentication/"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git4ta.tech`)"
service: consul-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
middlewares:
- consul-stripprefix
consul-ui:
rule: "Host(`consul.git-4ta.live`) && PathPrefix(`/ui`)"
service: consul-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-api:
rule: "Host(`nomad.git-4ta.live`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git-4ta.live`) && PathPrefix(`/ui`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git-4ta.live`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git-4ta.live`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
authentik-ui:
rule: "Host(`authentik1.git-4ta.live`)"
service: authentik-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare

View File

@ -0,0 +1,254 @@
job "traefik-cloudflare-v2" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "hcp1"
}
volume "traefik-certs" {
type = "host"
read_only = false
source = "traefik-certs"
}
network {
mode = "host"
port "http" {
static = 80
}
port "https" {
static = 443
}
port "traefik" {
static = 8080
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
env {
CLOUDFLARE_EMAIL = "houzhongxu.houzhongxu@gmail.com"
CLOUDFLARE_DNS_API_TOKEN = "HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3"
CLOUDFLARE_ZONE_API_TOKEN = "HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3"
}
volume_mount {
volume = "traefik-certs"
destination = "/opt/traefik/certs"
read_only = false
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
debug: true
entryPoints:
web:
address: "0.0.0.0:80"
http:
redirections:
entrypoint:
to: websecure
scheme: https
permanent: true
websecure:
address: "0.0.0.0:443"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
file:
filename: /local/dynamic.yml
watch: true
certificatesResolvers:
cloudflare:
acme:
email: {{ env "CLOUDFLARE_EMAIL" }}
storage: /opt/traefik/certs/acme.json
dnsChallenge:
provider: cloudflare
delayBeforeCheck: 30s
resolvers:
- "1.1.1.1:53"
- "1.0.0.1:53"
log:
level: DEBUG
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
authentik-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
healthCheck:
path: "/ui/"
interval: "30s"
timeout: "15s"
authentik-cluster:
loadBalancer:
servers:
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
serversTransport: authentik-insecure
healthCheck:
path: "/flows/-/default/authentication/"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git-4ta.live`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-dashboard:
rule: "Host(`traefik.git-4ta.live`)"
service: dashboard@internal
middlewares:
- dashboard_redirect@internal
- dashboard_stripprefix@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git-4ta.live`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git-4ta.live`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git-4ta.live`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
authentik-ui:
rule: "Host(`authentik.git-4ta.live`)"
service: authentik-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
EOF
destination = "local/dynamic.yml"
}
template {
data = <<EOF
CLOUDFLARE_EMAIL={{ env "CLOUDFLARE_EMAIL" }}
CLOUDFLARE_DNS_API_TOKEN={{ env "CLOUDFLARE_DNS_API_TOKEN" }}
CLOUDFLARE_ZONE_API_TOKEN={{ env "CLOUDFLARE_ZONE_API_TOKEN" }}
EOF
destination = "local/cloudflare.env"
env = true
}
      # Test certificate permission control
template {
data = "-----BEGIN CERTIFICATE-----\nTEST CERTIFICATE FOR PERMISSION CONTROL\n-----END CERTIFICATE-----"
destination = "/opt/traefik/certs/test-cert.pem"
perms = 600
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,239 @@
job "traefik-cloudflare-v2" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
volume "traefik-certs" {
type = "host"
read_only = false
source = "traefik-certs"
}
network {
mode = "host"
port "http" {
static = 80
}
port "https" {
static = 443
}
port "traefik" {
static = 8080
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
volume_mount {
volume = "traefik-certs"
destination = "/opt/traefik/certs"
read_only = false
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
http:
redirections:
entrypoint:
to: websecure
scheme: https
permanent: true
websecure:
address: "0.0.0.0:443"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
file:
filename: /local/dynamic.yml
watch: true
certificatesResolvers:
cloudflare:
acme:
email: houzhongxu.houzhongxu@gmail.com
storage: /opt/traefik/certs/acme.json
dnsChallenge:
provider: cloudflare
delayBeforeCheck: 30s
resolvers:
- "1.1.1.1:53"
- "1.0.0.1:53"
log:
level: DEBUG
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
authentik-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
healthCheck:
path: "/ui/"
interval: "30s"
timeout: "15s"
authentik-cluster:
loadBalancer:
servers:
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
serversTransport: authentik-insecure
healthCheck:
path: "/flows/-/default/authentication/"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git-4ta.live`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-dashboard:
rule: "Host(`traefik.git-4ta.live`)"
service: dashboard@internal
middlewares:
- dashboard_redirect@internal
- dashboard_stripprefix@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git-4ta.live`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git-4ta.live`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git-4ta.live`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
authentik-ui:
rule: "Host(`authentik.git4ta.tech`)"
service: authentik-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
EOF
destination = "local/dynamic.yml"
}
template {
data = <<EOF
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
CLOUDFLARE_DNS_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
CLOUDFLARE_ZONE_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
EOF
destination = "local/cloudflare.env"
env = true
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,249 @@
job "traefik-cloudflare-v3" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
volume "traefik-certs" {
type = "host"
read_only = false
source = "traefik-certs"
}
network {
mode = "host"
port "http" {
static = 80
}
port "https" {
static = 443
}
port "traefik" {
static = 8080
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
env {
CLOUDFLARE_EMAIL = "locksmithknight@gmail.com"
CLOUDFLARE_DNS_API_TOKEN = "0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr"
CLOUDFLARE_ZONE_API_TOKEN = "0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr"
}
volume_mount {
volume = "traefik-certs"
destination = "/opt/traefik/certs"
read_only = false
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "0.0.0.0:80"
http:
redirections:
entrypoint:
to: websecure
scheme: https
permanent: true
websecure:
address: "0.0.0.0:443"
traefik:
address: "0.0.0.0:8080"
providers:
consulCatalog:
endpoint:
address: "warden.tailnet-68f9.ts.net:8500"
scheme: "http"
watch: true
exposedByDefault: false
prefix: "traefik"
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
file:
filename: /local/dynamic.yml
watch: true
certificatesResolvers:
cloudflare:
acme:
email: {{ env "CLOUDFLARE_EMAIL" }}
storage: /opt/traefik/certs/acme.json
dnsChallenge:
provider: cloudflare
delayBeforeCheck: 30s
log:
level: DEBUG
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
serversTransports:
waypoint-insecure:
insecureSkipVerify: true
authentik-insecure:
insecureSkipVerify: true
middlewares:
consul-stripprefix:
stripPrefix:
prefixes:
- "/consul"
waypoint-auth:
replacePathRegex:
regex: "^/auth/token(.*)$"
replacement: "/auth/token$1"
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国Leader
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京Follower
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
nomad-cluster:
loadBalancer:
servers:
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国Leader
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国Follower
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
waypoint-cluster:
loadBalancer:
servers:
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
serversTransport: waypoint-insecure
vault-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
healthCheck:
path: "/ui/"
interval: "30s"
timeout: "15s"
authentik-cluster:
loadBalancer:
servers:
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
serversTransport: authentik-insecure
healthCheck:
path: "/flows/-/default/authentication/"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "Host(`consul.git-4ta.live`)"
service: consul-cluster
middlewares:
- consul-stripprefix
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-dashboard:
rule: "Host(`traefik.git-4ta.live`)"
service: dashboard@internal
middlewares:
- dashboard_redirect@internal
- dashboard_stripprefix@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
traefik-api:
rule: "Host(`traefik.git-4ta.live`) && PathPrefix(`/api`)"
service: api@internal
entryPoints:
- websecure
tls:
certResolver: cloudflare
nomad-ui:
rule: "Host(`nomad.git-4ta.live`)"
service: nomad-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
waypoint-ui:
rule: "Host(`waypoint.git-4ta.live`)"
service: waypoint-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
vault-ui:
rule: "Host(`vault.git-4ta.live`)"
service: vault-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
authentik-ui:
rule: "Host(`authentik1.git-4ta.live`)"
service: authentik-cluster
entryPoints:
- websecure
tls:
certResolver: cloudflare
EOF
destination = "local/dynamic.yml"
}
template {
data = <<EOF
CLOUDFLARE_EMAIL=locksmithknight@gmail.com
CLOUDFLARE_DNS_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
CLOUDFLARE_ZONE_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
EOF
destination = "local/cloudflare.env"
env = true
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,65 @@
# Consul Server Configuration for onecloud1
datacenter = "dc1"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "onecloud1"
bind_addr = "100.98.209.50"
# Server mode
server = true
bootstrap_expect = 4
# Join existing cluster
retry_join = [
"100.117.106.136", # ch4
"100.122.197.112", # warden
"100.116.80.94" # ash3c
]
# Performance optimization
performance {
raft_multiplier = 5
}
# Ports configuration
ports {
grpc = 8502
http = 8500
dns = 8600
server = 8300
serf_lan = 8301
serf_wan = 8302
}
# Enable Connect for service mesh
connect {
enabled = true
}
# Cache configuration for performance
cache {
entry_fetch_max_burst = 42
entry_fetch_rate = 30
}
# Node metadata
node_meta = {
region = "unknown"
zone = "nomad-client"
}
# UI enabled for servers
ui_config {
enabled = true
}
# ACL configuration (if needed)
acl = {
enabled = false
default_policy = "allow"
}
# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7
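Assuming this file is deployed into the agent's configuration directory (e.g. `/etc/consul.d/`, an assumption), the rendered configuration can be syntax-checked before restarting the agent:
```bash
# Validate the full configuration directory before a restart
consul validate /etc/consul.d/
```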

View File

@ -0,0 +1,219 @@
# Configuration Examples: Reaching Consul Through Traefik
## 🎯 Goal
Let other nodes reach the services through `consul.git4ta.me` and `nomad.git4ta.me` instead of connecting to IPs directly.
## ✅ Current-State Checks
### Consul smart detection
```bash
# Leader detection
curl -s https://consul.git4ta.me/v1/status/leader
# Returns: "100.117.106.136:8300" (ch4 is the leader)
# Node currently routed to
curl -s https://consul.git4ta.me/v1/agent/self | jq -r '.Config.NodeName'
# Returns: "ash3c" (Traefik routed to ash3c)
```
### Nomad smart detection
```bash
# Leader detection
curl -s https://nomad.git4ta.me/v1/status/leader
# Returns: "100.90.159.68:4647" (ch2 is the leader)
```
## 🔧 Node Configuration Examples
### 1. Consul client configuration
#### Current configuration (direct connection)
```hcl
# /etc/consul.d/consul.hcl
datacenter = "dc1"
node_name = "client-node"
retry_join = [
  "warden.tailnet-68f9.ts.net:8301",
  "ch4.tailnet-68f9.ts.net:8301",
  "ash3c.tailnet-68f9.ts.net:8301"
]
```
#### New configuration (through Traefik)
```hcl
# /etc/consul.d/consul.hcl
datacenter = "dc1"
node_name = "client-node"
# Join Consul through Traefik
retry_join = ["consul.git4ta.me:8301"]
# Or use the HTTP API
addresses {
  http = "consul.git4ta.me"
}
ports {
  http = 8500
}
```
### 2. Nomad client configuration
#### Current configuration (direct connection)
```hcl
# /etc/nomad.d/nomad.hcl
consul {
  address = "http://warden.tailnet-68f9.ts.net:8500"
}
```
#### New configuration (through Traefik)
```hcl
# /etc/nomad.d/nomad.hcl
consul {
  address = "https://consul.git4ta.me:8500"
  # Or over plain HTTP
  # address = "http://consul.git4ta.me:8500"
}
```
### 3. Vault configuration
#### Current configuration (direct connection)
```hcl
# Consul KV: vault/config
storage "consul" {
  address = "ch4.tailnet-68f9.ts.net:8500"
  path = "vault/"
}
service_registration "consul" {
  address = "ch4.tailnet-68f9.ts.net:8500"
  service = "vault"
}
```
#### New configuration (through Traefik)
```hcl
# Consul KV: vault/config
storage "consul" {
  address = "consul.git4ta.me:8500"
  path = "vault/"
}
service_registration "consul" {
  address = "consul.git4ta.me:8500"
  service = "vault"
}
```
## 🚀 Rollout Steps
### Step 1: Verify the Traefik routes
```bash
# Test the Consul route
curl -I https://consul.git4ta.me/v1/status/leader
# Test the Nomad route
curl -I https://nomad.git4ta.me/v1/status/leader
```
### Step 2: Update the node configuration
```bash
# Run on the target node
# Back up the existing configuration
cp /etc/consul.d/consul.hcl /etc/consul.d/consul.hcl.backup
cp /etc/nomad.d/nomad.hcl /etc/nomad.d/nomad.hcl.backup
# Rewrite the Consul configuration
sed -i 's/warden\.tailnet-68f9\.ts\.net:8301/consul.git4ta.me:8301/g' /etc/consul.d/consul.hcl
sed -i 's/ch4\.tailnet-68f9\.ts\.net:8301/consul.git4ta.me:8301/g' /etc/consul.d/consul.hcl
sed -i 's/ash3c\.tailnet-68f9\.ts\.net:8301/consul.git4ta.me:8301/g' /etc/consul.d/consul.hcl
# Rewrite the Nomad configuration
sed -i 's/warden\.tailnet-68f9\.ts\.net:8500/consul.git4ta.me:8500/g' /etc/nomad.d/nomad.hcl
sed -i 's/ch4\.tailnet-68f9\.ts\.net:8500/consul.git4ta.me:8500/g' /etc/nomad.d/nomad.hcl
sed -i 's/ash3c\.tailnet-68f9\.ts\.net:8500/consul.git4ta.me:8500/g' /etc/nomad.d/nomad.hcl
```
### Step 3: Restart the services
```bash
# Restart Consul
systemctl restart consul
# Restart Nomad
systemctl restart nomad
# Restart Vault (where applicable)
systemctl restart vault
```
### Step 4: Verify connectivity
```bash
# Check the Consul connection
consul members
# Check the Nomad connection
nomad node status
# Check the Vault connection
vault status
```
## 📊 Performance Comparison
### Latency test
```bash
# Direct connection
time curl -s http://ch4.tailnet-68f9.ts.net:8500/v1/status/leader
# Through Traefik
time curl -s https://consul.git4ta.me/v1/status/leader
```
### Reliability test
```bash
# Test failover:
# 1. Stop Consul on ch4
# 2. Check whether Traefik automatically routes to another node
curl -s https://consul.git4ta.me/v1/status/leader
```
## 🎯 Advantages
### 1. Single entry point
- **Before**: every node had to know the IPs of all Consul/Nomad nodes
- **Now**: only `consul.git4ta.me` and `nomad.git4ta.me` need to be known
### 2. Automatic failover
- **Before**: nodes had to be configured with multiple IPs by hand
- **Now**: Traefik routes to healthy nodes automatically
### 3. Simpler configuration
- **Before**: hard-coded IP addresses, hard to maintain
- **Now**: domain names, easy to manage and update
### 4. Load balancing
- **Before**: every request hit the same node
- **Now**: Traefik can spread requests across multiple nodes (see the probe loop below)
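A quick way to observe the spreading in practice, assuming `jq` is installed; with load balancing in effect the reported node name should vary across iterations:
```bash
# Ask repeatedly which Consul agent served the request
for i in $(seq 1 5); do
  curl -s https://consul.git4ta.me/v1/agent/self | jq -r '.Config.NodeName'
done
```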
## ⚠️ Caveats
### 1. Port mapping
- **Traefik external**: 443 (HTTPS) / 80 (HTTP)
- **Service internal**: 8500 (Consul), 4646 (Nomad)
- **Required**: Traefik port-forwarding configuration
### 2. SSL certificates
- **HTTPS**: needs a valid certificate (see the inspection command below)
- **HTTP**: a self-signed certificate will do
### 3. Single point of failure
- **Risk**: Traefik becomes a single point of failure
- **Mitigation**: run Traefik itself highly available
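One way to inspect the certificate actually served for the Consul hostname, assuming `openssl` is available on the client:
```bash
# Show the subject and validity window of the certificate served on 443
echo | openssl s_client -connect consul.git4ta.me:443 -servername consul.git4ta.me 2>/dev/null \
  | openssl x509 -noout -subject -dates
```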
---
**Conclusion**: entirely feasible. Fronting Consul and Nomad with Traefik is a solid architectural improvement that buys better maintainability and reliability.

View File

@ -0,0 +1,191 @@
# Plan for Connecting to Consul Through Traefik
## 🎯 Goal
Let every node reach Consul via `consul.git4ta.me` instead of connecting to IP addresses directly.
## ✅ Feasibility Check
### Test results
```bash
# Reach the Consul API through Traefik
curl -s https://consul.git4ta.me/v1/status/leader
# Returns: "100.117.106.136:8300" (ch4 is the leader)
curl -s https://consul.git4ta.me/v1/agent/self | jq -r '.Config.NodeName'
# Returns: "warden" (the node currently routed to)
```
### Advantages
1. **Single entry point**: every service is reached through a domain name
2. **Automatic failover**: Traefik routes to healthy Consul nodes automatically
3. **Simpler configuration**: no hard-coded IP addresses
4. **Load balancing**: requests can be spread across multiple Consul nodes
## 🔧 Configuration Options
### Option 1: Modify the existing node configuration
#### Consul client configuration
```hcl
# /etc/consul.d/consul.hcl
datacenter = "dc1"
node_name = "node-name"
# Join Consul through Traefik
retry_join = ["consul.git4ta.me:8500"]
# Or connect over HTTP
addresses {
  http = "consul.git4ta.me"
  https = "consul.git4ta.me"
}
ports {
  http = 8500
  https = 8500
}
```
#### Nomad configuration
```hcl
# /etc/nomad.d/nomad.hcl
consul {
  address = "https://consul.git4ta.me:8500"
  # or
  # address = "http://consul.git4ta.me:8500"
}
```
#### Vault configuration
```hcl
# In Consul KV vault/config
storage "consul" {
  address = "consul.git4ta.me:8500"
  path = "vault/"
}
service_registration "consul" {
  address = "consul.git4ta.me:8500"
  service = "vault"
  service_tags = "vault-server"
}
```
### Option 2: Add a dedicated service-discovery configuration
#### Add Consul service discovery in Traefik
```yaml
# Add to dynamic.yml
services:
  consul-api:
    loadBalancer:
      servers:
        - url: "http://ch4.tailnet-68f9.ts.net:8500"  # Leader
        - url: "http://warden.tailnet-68f9.ts.net:8500"  # Follower
        - url: "http://ash3c.tailnet-68f9.ts.net:8500"  # Follower
      healthCheck:
        path: "/v1/status/leader"
        interval: "30s"
        timeout: "15s"
routers:
  consul-api:
    rule: "Host(`consul.git4ta.me`)"
    service: consul-api
    entryPoints:
      - websecure
    tls:
      certResolver: cloudflare
```
## 🚨 Caveats
### 1. Port mapping
- **Traefik external port**: 443 (HTTPS) / 80 (HTTP)
- **Consul internal port**: 8500
- **Required**: Traefik port-forwarding configuration
### 2. SSL certificates
- **HTTPS**: needs a valid SSL certificate
- **HTTP**: a self-signed certificate, or skipping verification, will do
### 3. Health checks
- **Path**: `/v1/status/leader`
- **Interval**: 30 seconds
- **Timeout**: 15 seconds (reproduced by hand in the sketch below)
### 4. Failover
- **Automatic switchover**: Traefik routes to healthy nodes automatically
- **Leader election**: Consul elects a new leader on its own
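The health check Traefik performs can be reproduced by hand to verify each backend, mirroring the documented path and 15-second timeout:
```bash
# Probe every backend the way Traefik's health check does
for host in ch4 warden ash3c; do
  printf '%s: ' "$host"
  curl -fsS --max-time 15 "http://${host}.tailnet-68f9.ts.net:8500/v1/status/leader" || printf 'unhealthy'
  echo
done
```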
## 🔄 Rollout Steps
### Step 1: Verify the Traefik configuration
```bash
# Check whether Traefik already routes to Consul
curl -I https://consul.git4ta.me/v1/status/leader
```
### Step 2: Update the node configuration
```bash
# Back up the existing configuration
cp /etc/consul.d/consul.hcl /etc/consul.d/consul.hcl.backup
# Switch the configuration to the domain name
sed -i 's/warden\.tailnet-68f9\.ts\.net:8500/consul.git4ta.me:8500/g' /etc/consul.d/consul.hcl
```
### Step 3: Restart the services
```bash
# Restart Consul
systemctl restart consul
# Restart Nomad
systemctl restart nomad
# Restart Vault
systemctl restart vault
```
### Step 4: Verify connectivity
```bash
# Check the Consul connection
consul members
# Check the Nomad connection
nomad node status
# Check the Vault connection
vault status
```
## 📊 Performance Impact
### Latency
- **Direct connection**: ~1-2ms
- **Through Traefik**: ~5-10ms (an extra 3-8ms)
### Throughput
- **Traefik's limit**: depends on the Traefik configuration
- **Recommendation**: monitor Traefik's performance metrics
### Reliability
- **Gain**: automatic failover
- **Risk**: Traefik as a single point of failure
## 🎯 Recommendation
**Option 1 is recommended**, because it is:
1. **Simple and direct**: only configuration files change
2. **Backward compatible**: existing functionality is untouched
3. **Easy to maintain**: one managed entry point
**Rollout priority**:
1. ✅ **Traefik configuration** - done
2. 🔄 **Consul clients** - needs changes
3. 🔄 **Nomad configuration** - needs changes
4. 🔄 **Vault configuration** - needs changes
---
**Conclusion**: entirely feasible. Fronting Consul with Traefik is a worthwhile architectural improvement.

View File

@ -0,0 +1,57 @@
---
- name: Clean up Consul configuration from dedicated clients
hosts: hcp1,influxdb1,browser
become: yes
tasks:
- name: Stop Consul service
systemd:
name: consul
state: stopped
enabled: no
- name: Disable Consul service
systemd:
name: consul
enabled: no
- name: Kill any remaining Consul processes
shell: |
pkill -f consul || true
sleep 2
pkill -9 -f consul || true
ignore_errors: yes
- name: Remove Consul systemd service file
file:
path: /etc/systemd/system/consul.service
state: absent
- name: Remove Consul configuration directory
file:
path: /etc/consul.d
state: absent
- name: Remove Consul data directory
file:
path: /opt/consul
state: absent
- name: Reload systemd daemon
systemd:
daemon_reload: yes
- name: Verify Consul is stopped
shell: |
if pgrep -f consul; then
echo "Consul still running"
exit 1
else
echo "Consul stopped successfully"
fi
register: consul_status
failed_when: consul_status.rc != 0
- name: Display cleanup status
debug:
msg: "Consul cleanup completed on {{ inventory_hostname }}"

View File

@ -0,0 +1,55 @@
---
- name: Configure Consul Auto-Discovery
hosts: all
become: yes
vars:
consul_servers:
- "warden.tailnet-68f9.ts.net:8301"
- "ch4.tailnet-68f9.ts.net:8301"
- "ash3c.tailnet-68f9.ts.net:8301"
tasks:
- name: Backup current nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
remote_src: yes
backup: yes
- name: Update Consul configuration for auto-discovery
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED CONSUL CONFIG"
block: |
consul {
retry_join = [
"warden.tailnet-68f9.ts.net:8301",
"ch4.tailnet-68f9.ts.net:8301",
"ash3c.tailnet-68f9.ts.net:8301"
]
server_service_name = "nomad"
client_service_name = "nomad-client"
}
        insertbefore: '^consul \{'
- name: Restart Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
- name: Wait for Nomad to be ready
wait_for:
port: 4646
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30
- name: Verify Consul connection
shell: |
NOMAD_ADDR=http://localhost:4646 nomad node status | grep -q "ready"
register: nomad_ready
failed_when: nomad_ready.rc != 0
retries: 3
delay: 10
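
A sketch of how this playbook might be invoked; the playbook filename is an assumption, while the inventory path appears elsewhere in this commit:
```bash
# Dry-run against a single host first, then apply to all
ansible-playbook -i deployment/ansible/inventories/production/inventory.ini \
  consul-auto-discovery.yml --limit ch2 --check
ansible-playbook -i deployment/ansible/inventories/production/inventory.ini \
  consul-auto-discovery.yml
```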

View File

@ -0,0 +1,75 @@
---
- name: Remove Consul configuration from Nomad servers
hosts: semaphore,ash1d,ash2e,ch2,ch3,onecloud1,de
become: yes
tasks:
- name: Remove entire Consul configuration block
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED CONSUL CONFIG"
state: absent
- name: Remove Consul configuration lines
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^consul \{'
state: absent
- name: Remove Consul configuration content
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ address ='
state: absent
- name: Remove Consul service names
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ server_service_name ='
state: absent
- name: Remove Consul client service name
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ client_service_name ='
state: absent
- name: Remove Consul auto-advertise
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ auto_advertise ='
state: absent
- name: Remove Consul server auto-join
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ server_auto_join ='
state: absent
- name: Remove Consul client auto-join
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ client_auto_join ='
state: absent
- name: Remove Consul closing brace
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^}'
state: absent
- name: Restart Nomad service
systemd:
name: nomad
state: restarted
- name: Wait for Nomad to be ready
wait_for:
port: 4646
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30
- name: Display completion message
debug:
msg: "Removed Consul configuration from {{ inventory_hostname }}"

View File

@ -0,0 +1,32 @@
---
- name: Enable Nomad Client Mode on Servers
hosts: ch2,ch3,de
become: yes
tasks:
- name: Enable Nomad client mode
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^client \{'
line: 'client {'
state: present
# NOTE: lineinfile edits only the last matching line; this assumes the client
# block's "enabled = false" is the final such line in nomad.hcl
- name: Enable client mode
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ enabled = false'
line: ' enabled = true'
state: present
- name: Restart Nomad service
systemd:
name: nomad
state: restarted
- name: Wait for Nomad to be ready
wait_for:
port: 4646
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30

View File

@ -0,0 +1,62 @@
---
- name: Fix all master references to ch4
hosts: localhost
gather_facts: no
vars:
files_to_fix:
- "scripts/diagnose-consul-sync.sh"
- "scripts/register-traefik-to-all-consul.sh"
- "deployment/ansible/playbooks/update-nomad-consul-config.yml"
- "deployment/ansible/templates/nomad-server.hcl.j2"
- "deployment/ansible/templates/nomad-client.hcl"
- "deployment/ansible/playbooks/fix-nomad-consul-roles.yml"
- "deployment/ansible/onecloud1_nomad.hcl"
- "ansible/templates/consul-client.hcl.j2"
- "ansible/consul-client-deployment.yml"
- "ansible/consul-client-simple.yml"
tasks:
- name: Replace master.tailnet-68f9.ts.net with ch4.tailnet-68f9.ts.net
replace:
path: "{{ item }}"
regexp: 'master\.tailnet-68f9\.ts\.net'
replace: 'ch4.tailnet-68f9.ts.net'
loop: "{{ files_to_fix }}"
when: item is file
- name: Replace master hostname references
replace:
path: "{{ item }}"
regexp: '\bmaster\b'
replace: 'ch4'
loop: "{{ files_to_fix }}"
when: item is file
- name: Replace master IP references in comments
replace:
path: "{{ item }}"
regexp: '# master'
replace: '# ch4'
loop: "{{ files_to_fix }}"
when: item is file
- name: Fix inventory files
replace:
path: "{{ item }}"
regexp: 'master ansible_host=master'
replace: 'ch4 ansible_host=ch4'
loop:
- "deployment/ansible/inventories/production/inventory.ini"
- "deployment/ansible/inventories/production/csol-consul-nodes.ini"
- "deployment/ansible/inventories/production/nomad-clients.ini"
- "deployment/ansible/inventories/production/master-ash3c.ini"
- "deployment/ansible/inventories/production/consul-nodes.ini"
- "deployment/ansible/inventories/production/vault.ini"
- name: Fix IP address references (100.117.106.136 comments)
replace:
path: "{{ item }}"
regexp: '100\.117\.106\.136.*# master'
replace: '100.117.106.136 # ch4'
loop: "{{ files_to_fix }}"
when: item is file

View File

@ -72,7 +72,7 @@
"description": "Consul客户端节点用于服务发现和健康检查",
"nodes": [
{
"name": "master",
"name": "ch4",
"host": "100.117.106.136",
"user": "ben",
"password": "3131",

View File

@ -2,21 +2,21 @@
# Server nodes (7 server nodes)
# ⚠️ Warning: with great power comes great responsibility - operations on server nodes demand extreme caution!
# ⚠️ Any operation on a server node can affect the stability of the entire cluster!
semaphore ansible_host=semaphore.tailnet-68f9.ts.net ansible_user=root ansible_password=313131 ansible_become_password=313131
semaphore ansible_host=semaphore.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
ash1d ansible_host=ash1d.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch2 ansible_host=ch2.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch3 ansible_host=ch3.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
onecloud1 ansible_host=onecloud1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
de ansible_host=de.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
hcp1 ansible_host=hcp1.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
[nomad_clients]
# Client nodes
master ansible_host=master.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
# Client nodes (5 client nodes)
ch4 ansible_host=ch4.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash3c ansible_host=ash3c.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
browser ansible_host=browser.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
influxdb1 ansible_host=influxdb1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
hcp1 ansible_host=hcp1.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
warden ansible_host=warden.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
[nomad_nodes:children]

View File

@ -11,7 +11,7 @@ ash1d ansible_host=ash1d ansible_user=ben ansible_become=yes ansible_become_pass
ash2e ansible_host=ash2e ansible_user=ben ansible_become=yes ansible_become_pass=3131
[oci_a1]
master ansible_host=master ansible_port=60022 ansible_user=ben ansible_become=yes ansible_become_pass=3131
ch4 ansible_host=ch4 ansible_user=ben ansible_become=yes ansible_become_pass=3131
ash3c ansible_host=ash3c ansible_user=ben ansible_become=yes ansible_become_pass=3131

View File

@ -0,0 +1,62 @@
---
- name: Configure Nomad Dynamic Host Volumes for NFS
hosts: nomad_clients
become: yes
vars:
nfs_server: "snail"
nfs_share: "/fs/1000/nfs/Fnsync"
mount_point: "/mnt/fnsync"
tasks:
- name: Stop Nomad service
systemd:
name: nomad
state: stopped
- name: Update Nomad configuration for dynamic host volumes
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} DYNAMIC HOST VOLUMES CONFIGURATION"
block: |
client {
# Enable dynamic host volumes
host_volume "fnsync" {
path = "{{ mount_point }}"
read_only = false
}
# Add NFS-related node metadata
meta {
nfs_server = "{{ nfs_server }}"
nfs_share = "{{ nfs_share }}"
nfs_mounted = "true"
}
}
insertafter: 'client {'
- name: Start Nomad service
systemd:
name: nomad
state: started
enabled: yes
- name: Wait for Nomad to start
wait_for:
port: 4646
delay: 10
timeout: 60
- name: Check Nomad status
command: nomad node status
register: nomad_status
ignore_errors: yes
- name: Display Nomad status
debug:
var: nomad_status.stdout_lines
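Once Nomad restarts, the new host volume and metadata should be visible on each client. A minimal check, assuming it is run on one of the configured nodes:

```bash
# Host volumes are listed in the verbose node view
nomad node status -self -verbose | grep -A2 "Host Volumes"

# The NFS metadata set by the playbook should also appear
nomad node status -self -verbose | grep nfs_
```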

View File

@ -0,0 +1,41 @@
---
- name: Deploy the Nomad server configuration template
hosts: nomad_servers
become: yes
tasks:
- name: Deploy the Nomad configuration file
template:
src: nomad-server.hcl.j2
dest: /etc/nomad.d/nomad.hcl
backup: yes
owner: root
group: root
mode: '0644'
- name: Restart the Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
- name: Wait for the Nomad service to start
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: Query Nomad service state
systemd:
name: nomad
register: nomad_status
- name: Display service state
debug:
msg: "{{ inventory_hostname }} Nomad service state: {{ nomad_status.status.ActiveState }}"

View File

@ -0,0 +1,39 @@
---
- name: Emergency fix for the Nomad bootstrap_expect setting
hosts: nomad_servers
become: yes
tasks:
- name: Set bootstrap_expect to 3
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ bootstrap_expect = \d+'
line: ' bootstrap_expect = 3'
backup: yes
- name: Restart the Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
- name: Wait for the Nomad service to start
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: Query Nomad service state
systemd:
name: nomad
register: nomad_status
- name: Display Nomad service state
debug:
msg: "{{ inventory_hostname }} Nomad service state: {{ nomad_status.status.ActiveState }}"

View File

@ -0,0 +1,103 @@
---
- name: Fix ch4 Nomad configuration - convert from server to client
hosts: ch4
become: yes
vars:
ansible_host: 100.117.106.136
tasks:
- name: Backup current Nomad config
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.backup
remote_src: yes
backup: yes
- name: Update Nomad config to client mode
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED CLIENT CONFIG"
block: |
server {
enabled = false
}
client {
enabled = true
network_interface = "tailscale0"
servers = [
"semaphore.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"ch3.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647"
]
meta {
consul = "true"
consul_version = "1.21.5"
consul_server = "true"
}
}
insertbefore: '^server \{'
- name: Update client block
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED CLIENT BLOCK"
block: |
client {
enabled = true
network_interface = "tailscale0"
servers = [
"semaphore.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"ch3.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647"
]
meta {
consul = "true"
consul_version = "1.21.5"
consul_server = "true"
}
}
insertbefore: '^client \{'
- name: Restart Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
- name: Wait for Nomad to be ready
wait_for:
port: 4646
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30
- name: Verify Nomad client status
shell: |
NOMAD_ADDR=http://localhost:4646 nomad node status | grep -q "ready"
register: nomad_ready
failed_when: nomad_ready.rc != 0
until: nomad_ready.rc == 0
retries: 3
delay: 10
- name: Display completion message
debug:
msg: |
✅ Successfully converted ch4 from Nomad server to client
✅ Nomad service restarted
✅ Configuration updated
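After the conversion, ch4 should drop out of the server set and register as a client. A quick verification from any cluster member:

```bash
# ch4 must no longer appear in the raft server list...
nomad server members

# ...but should report as a ready client node
nomad node status -verbose | grep ch4
```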

View File

@ -0,0 +1,82 @@
---
- name: Fix master node - rename to ch4 and restore SSH port 22
hosts: master
become: yes
vars:
new_hostname: ch4
old_hostname: master
tasks:
- name: Backup current hostname
copy:
content: "{{ old_hostname }}"
dest: /etc/hostname.backup
mode: '0644'
when: ansible_hostname == old_hostname
- name: Update hostname to ch4
hostname:
name: "{{ new_hostname }}"
when: ansible_hostname == old_hostname
- name: Update /etc/hostname file
copy:
content: "{{ new_hostname }}"
dest: /etc/hostname
mode: '0644'
when: ansible_hostname == old_hostname
- name: Update /etc/hosts file
lineinfile:
path: /etc/hosts
regexp: '^127\.0\.1\.1.*{{ old_hostname }}'
line: '127.0.1.1 {{ new_hostname }}'
state: present
when: ansible_hostname == old_hostname
- name: Update Tailscale hostname
shell: |
tailscale set --hostname={{ new_hostname }}
when: ansible_hostname == old_hostname
- name: Backup SSH config
copy:
src: /etc/ssh/sshd_config
dest: /etc/ssh/sshd_config.backup
remote_src: yes
backup: yes
- name: Restore SSH port to 22
lineinfile:
path: /etc/ssh/sshd_config
regexp: '^Port '
line: 'Port 22'
state: present
- name: Restart SSH service
systemd:
name: ssh
state: restarted
enabled: yes
- name: Wait for SSH to be ready on port 22
wait_for:
port: 22
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30
- name: Test SSH connection on port 22
ping:
delegate_to: "{{ inventory_hostname }}"
vars:
ansible_port: 22
- name: Display completion message
debug:
msg: |
✅ Successfully renamed {{ old_hostname }} to {{ new_hostname }}
✅ SSH port restored to 22
✅ Tailscale hostname updated
🔄 Please update your inventory file to use the new hostname and port

View File

@ -0,0 +1,71 @@
---
- name: Install and configure Consul clients on all nodes
hosts: all
become: yes
vars:
consul_servers:
- "100.117.106.136" # ch4 (韩国)
- "100.122.197.112" # warden (北京)
- "100.116.80.94" # ash3c (美国)
tasks:
- name: Get Tailscale IP address
shell: ip addr show tailscale0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1
register: tailscale_ip_result
changed_when: false
- name: Set Tailscale IP fact
set_fact:
tailscale_ip: "{{ tailscale_ip_result.stdout }}"
- name: Install Consul
apt:
name: consul
state: present
update_cache: yes
- name: Create Consul data directory
file:
path: /opt/consul/data
state: directory
owner: consul
group: consul
mode: '0755'
- name: Create Consul log directory
file:
path: /var/log/consul
state: directory
owner: consul
group: consul
mode: '0755'
- name: Create Consul config directory
file:
path: /etc/consul.d
state: directory
owner: consul
group: consul
mode: '0755'
- name: Generate Consul client configuration
template:
src: consul-client.hcl.j2
dest: /etc/consul.d/consul.hcl
owner: consul
group: consul
mode: '0644'
notify: restart consul
- name: Enable and start Consul service
systemd:
name: consul
enabled: yes
state: started
daemon_reload: yes
handlers:
- name: restart consul
systemd:
name: consul
state: restarted
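Once the agents start, cluster membership can be confirmed from any Consul server. A minimal check:

```bash
# Every node should be listed as alive; the three servers show type "server",
# everything else "client"
consul members
```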

View File

@ -0,0 +1,91 @@
---
- name: Install NFS CSI Plugin for Nomad
hosts: nomad_nodes
become: yes
vars:
nomad_user: nomad
nomad_plugins_dir: /opt/nomad/plugins
csi_driver_version: "v4.0.0"
csi_driver_url: "https://github.com/kubernetes-csi/csi-driver-nfs/releases/download/{{ csi_driver_version }}/csi-nfs-driver"
tasks:
- name: Stop Nomad service
systemd:
name: nomad
state: stopped
- name: Create plugins directory
file:
path: "{{ nomad_plugins_dir }}"
state: directory
owner: "{{ nomad_user }}"
group: "{{ nomad_user }}"
mode: '0755'
- name: Download NFS CSI driver
get_url:
url: "{{ csi_driver_url }}"
dest: "{{ nomad_plugins_dir }}/csi-nfs-driver"
owner: "{{ nomad_user }}"
group: "{{ nomad_user }}"
mode: '0755'
- name: Install required packages for CSI
package:
name:
- nfs-common
- mount
state: present
- name: Create CSI mount directory
file:
path: /opt/nomad/csi
state: directory
owner: "{{ nomad_user }}"
group: "{{ nomad_user }}"
mode: '0755'
- name: Update Nomad configuration for CSI plugin
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} CSI PLUGIN CONFIGURATION"
block: |
plugin_dir = "{{ nomad_plugins_dir }}"
plugin "csi-nfs" {
type = "csi"
config {
driver_name = "nfs.csi.k8s.io"
mount_dir = "/opt/nomad/csi"
health_timeout = "30s"
log_level = "INFO"
}
}
insertafter: 'data_dir = "/opt/nomad/data"'
- name: Start Nomad service
systemd:
name: nomad
state: started
enabled: yes
- name: Wait for Nomad to start
wait_for:
port: 4646
delay: 10
timeout: 60
- name: Check Nomad status
command: nomad node status
register: nomad_status
ignore_errors: yes
- name: Display Nomad status
debug:
var: nomad_status.stdout_lines
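If the driver ends up registered with the cluster, Nomad reports it as a CSI plugin. A sketch of the check, assuming the plugin ID `csi-nfs` from the configuration above:

```bash
# List CSI plugins known to the cluster
nomad plugin status

# Detailed health for the NFS plugin
nomad plugin status csi-nfs
```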

View File

@ -0,0 +1,33 @@
---
- name: Start all Nomad servers to form a cluster
hosts: nomad_servers
become: yes
tasks:
- name: Query Nomad service state
systemd:
name: nomad
register: nomad_status
- name: Start the Nomad service if it is not running
systemd:
name: nomad
state: started
enabled: yes
when: nomad_status.status.ActiveState != "active"
- name: Wait for the Nomad service to start
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: Display Nomad service state
debug:
msg: "{{ inventory_hostname }} Nomad service state: {{ nomad_status.status.ActiveState }}"

View File

@ -0,0 +1,61 @@
# Consul Client Configuration for {{ inventory_hostname }}
datacenter = "dc1"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname }}"
bind_addr = "{{ hostvars[inventory_hostname]['tailscale_ip'] }}"
# Client mode (not server)
server = false
# Connect to Consul servers (points at the three-node cluster)
retry_join = [
{% for server in consul_servers %}
"{{ server }}"{% if not loop.last %},{% endif %}
{% endfor %}
]
# Performance optimization
performance {
raft_multiplier = 5
}
# Ports configuration
ports {
grpc = 8502
http = 8500
dns = 8600
}
# Enable Connect for service mesh
connect {
enabled = true
}
# Cache configuration for performance
cache {
entry_fetch_max_burst = 42
entry_fetch_rate = 30
}
# Node metadata
node_meta = {
region = "unknown"
zone = "nomad-{{ 'server' if 'server' in group_names else 'client' }}"
}
# UI disabled for clients
ui_config {
enabled = false
}
# ACL configuration (if needed)
acl = {
enabled = false
default_policy = "allow"
}
# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7

View File

@ -0,0 +1,106 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ ansible_hostname }}"
bind_addr = "0.0.0.0"
addresses {
http = "{{ ansible_host }}"
rpc = "{{ ansible_host }}"
serf = "{{ ansible_host }}"
}
advertise {
http = "{{ ansible_host }}:4646"
rpc = "{{ ansible_host }}:4647"
serf = "{{ ansible_host }}:4648"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = [
"semaphore.tailnet-68f9.ts.net:4648",
"ash1d.tailnet-68f9.ts.net:4648",
"ash2e.tailnet-68f9.ts.net:4648",
"ch2.tailnet-68f9.ts.net:4648",
"ch3.tailnet-68f9.ts.net:4648",
"onecloud1.tailnet-68f9.ts.net:4648",
"de.tailnet-68f9.ts.net:4648",
"hcp1.tailnet-68f9.ts.net:4648"
]
}
}
{% if ansible_hostname == 'hcp1' %}
client {
enabled = true
network_interface = "tailscale0"
servers = [
"semaphore.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"ch3.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647",
"hcp1.tailnet-68f9.ts.net:4647"
]
host_volume "traefik-certs" {
path = "/opt/traefik/certs"
read_only = false
}
host_volume "fnsync" {
path = "/mnt/fnsync"
read_only = false
}
meta {
consul = "true"
consul_version = "1.21.5"
consul_client = "true"
}
gc_interval = "5m"
gc_disk_usage_threshold = 80
gc_inode_usage_threshold = 70
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
{% endif %}
consul {
address = "ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = false
client_auto_join = true
}
telemetry {
collection_interval = "1s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}

View File

@ -19,7 +19,7 @@
- ip: "100.120.225.29"
hostnames: ["de"]
- ip: "100.117.106.136"
hostnames: ["master"]
hostnames: ["ch4"]
- ip: "100.116.80.94"
hostnames: ["ash3c", "influxdb1"]
- ip: "100.116.112.45"

View File

@ -0,0 +1,56 @@
---
- name: Update the Nomad server configuration to add hcp1 as a peer
hosts: nomad_servers
become: yes
vars:
hcp1_ip: "100.97.62.111"
# NOTE: an even raft server count adds no fault tolerance; odd sizes (3/5/7) are the usual choice
bootstrap_expect: 8
tasks:
- name: Back up the original configuration file
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak
remote_src: yes
backup: yes
- name: Add hcp1 to the retry_join list
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ retry_join = \['
line: ' retry_join = ["{{ hcp1_ip }}",'
backup: yes
- name: Update bootstrap_expect to 8
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^ bootstrap_expect = \d+'
line: ' bootstrap_expect = {{ bootstrap_expect }}'
backup: yes
- name: Restart the Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
- name: Wait for the Nomad service to start
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: Query Nomad service state
systemd:
name: nomad
register: nomad_status
- name: Display Nomad service state
debug:
msg: "Nomad service state: {{ nomad_status.status.ActiveState }}"

View File

@ -0,0 +1,72 @@
---
- name: Remove Consul configuration from all Nomad servers
hosts: semaphore,ash1d,ash2e,ch2,ch3,onecloud1,de
become: yes
tasks:
- name: Create clean Nomad server configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ inventory_hostname }}"
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
addresses {
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
}
advertise {
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 7
retry_join = ["ash1d.tailnet-68f9.ts.net","ash2e.tailnet-68f9.ts.net","ch2.tailnet-68f9.ts.net","ch3.tailnet-68f9.ts.net","onecloud1.tailnet-68f9.ts.net","de.tailnet-68f9.ts.net"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
dest: /etc/nomad.d/nomad.hcl
mode: '0644'
- name: Restart Nomad service
systemd:
name: nomad
state: restarted
- name: Wait for Nomad to be ready
wait_for:
port: 4646
host: "{{ ansible_default_ipv4.address }}"
delay: 5
timeout: 30
- name: Display completion message
debug:
msg: "Removed Consul configuration from {{ inventory_hostname }}"

View File

@ -0,0 +1,62 @@
# Consul Client Configuration for {{ inventory_hostname }}
datacenter = "dc1"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname }}"
bind_addr = "{{ ansible_host }}"
# Client mode (not server)
server = false
# Connect to Consul servers (points at the three-node cluster)
retry_join = [
{% for server in consul_servers %}
"{{ server }}"{% if not loop.last %},{% endif %}
{% endfor %}
]
# Performance optimization
performance {
raft_multiplier = 5
}
# Ports configuration
ports {
grpc = 8502
http = 8500
dns = 8600
}
# Enable Connect for service mesh
connect {
enabled = true
}
# Cache configuration for performance
cache {
entry_fetch_max_burst = 42
entry_fetch_rate = 30
}
# Node metadata
node_meta = {
region = "unknown"
zone = "nomad-{{ 'server' if 'server' in group_names else 'client' }}"
}
# UI disabled for clients
ui_config {
enabled = false
}
# ACL configuration (if needed)
acl = {
enabled = false
default_policy = "allow"
}
# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7

View File

@ -49,6 +49,11 @@ client {
read_only = false
}
host_volume "vault-storage" {
path = "/opt/nomad/data/vault-storage"
read_only = false
}
# Disable the Docker driver; use Podman only
options {
"driver.raw_exec.enable" = "1"

View File

@ -2,20 +2,20 @@ datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ server_name }}"
name = "{{ ansible_hostname }}"
bind_addr = "{{ server_name }}.tailnet-68f9.ts.net"
bind_addr = "0.0.0.0"
addresses {
http = "{{ server_name }}.tailnet-68f9.ts.net"
rpc = "{{ server_name }}.tailnet-68f9.ts.net"
serf = "{{ server_name }}.tailnet-68f9.ts.net"
http = "{{ ansible_host }}"
rpc = "{{ ansible_host }}"
serf = "{{ ansible_host }}"
}
advertise {
http = "{{ server_name }}.tailnet-68f9.ts.net:4646"
rpc = "{{ server_name }}.tailnet-68f9.ts.net:4647"
serf = "{{ server_name }}.tailnet-68f9.ts.net:4648"
http = "{{ ansible_host }}:4646"
rpc = "{{ ansible_host }}:4647"
serf = "{{ ansible_host }}:4648"
}
ports {
@ -26,18 +26,56 @@ ports {
server {
enabled = true
bootstrap_expect = 7
bootstrap_expect = 3
server_join {
retry_join = [
{%- for server in groups['nomad_servers'] -%}
{%- if server != inventory_hostname -%}
"{{ server }}.tailnet-68f9.ts.net"{% if not loop.last %},{% endif %}
{%- endif -%}
{%- endfor -%}
"semaphore.tailnet-68f9.ts.net:4648",
"ash1d.tailnet-68f9.ts.net:4648",
"ash2e.tailnet-68f9.ts.net:4648",
"ch2.tailnet-68f9.ts.net:4648",
"ch3.tailnet-68f9.ts.net:4648",
"onecloud1.tailnet-68f9.ts.net:4648",
"de.tailnet-68f9.ts.net:4648",
"hcp1.tailnet-68f9.ts.net:4648"
]
}
}
{% if ansible_hostname == 'hcp1' %}
client {
enabled = false
enabled = true
network_interface = "tailscale0"
servers = [
"semaphore.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"ch3.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647",
"hcp1.tailnet-68f9.ts.net:4647"
]
host_volume "traefik-certs" {
path = "/opt/traefik/certs"
read_only = false
}
host_volume "fnsync" {
path = "/mnt/fnsync"
read_only = false
}
meta {
consul = "true"
consul_version = "1.21.5"
consul_client = "true"
}
gc_interval = "5m"
gc_disk_usage_threshold = 80
gc_inode_usage_threshold = 70
}
plugin "nomad-driver-podman" {
@ -48,20 +86,21 @@ plugin "nomad-driver-podman" {
}
}
}
{% endif %}
consul {
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
address = "ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = true
server_auto_join = false
client_auto_join = true
}
vault {
enabled = true
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
telemetry {
collection_interval = "1s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}

View File

@ -64,7 +64,7 @@ plugin "nomad-driver-podman" {
}
consul {
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
address = "ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
@ -74,7 +74,7 @@ consul {
vault {
enabled = true
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
address = "http://ch4.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true

View File

@ -0,0 +1,45 @@
# Vault Configuration for {{ inventory_hostname }}
# Storage backend - Consul
storage "consul" {
address = "127.0.0.1:8500"
path = "vault/"
# Consul datacenter
datacenter = "{{ vault_datacenter }}"
# Service registration
service = "vault"
service_tags = "vault-server"
# Session TTL
session_ttl = "15s"
lock_wait_time = "15s"
}
# Listener configuration
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = 1
}
# API address - uses the Tailscale network address
api_addr = "http://{{ ansible_host }}:8200"
# Cluster address - uses the Tailscale network address
cluster_addr = "http://{{ ansible_host }}:8201"
# UI
ui = true
# Cluster name
cluster_name = "{{ vault_cluster_name }}"
# Disable mlock for development (remove in production)
disable_mlock = true
# Log level
log_level = "INFO"
# Plugin directory
plugin_directory = "/opt/vault/plugins"
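A rendered configuration can be sanity-checked against the health endpoint once Vault is up. A minimal sketch, assuming the template above rendered for ch4:

```bash
# 200 = initialized and unsealed, 429 = standby,
# 501 = not initialized, 503 = sealed
curl -s -o /dev/null -w "%{http_code}\n" \
  http://ch4.tailnet-68f9.ts.net:8200/v1/sys/health
```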

View File

@ -0,0 +1,34 @@
[Unit]
Description=Vault
Documentation=https://www.vaultproject.io/docs/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/vault.d/vault.hcl
StartLimitIntervalSec=60
StartLimitBurst=3
[Service]
Type=notify
User=vault
Group=vault
ProtectSystem=full
ProtectHome=read-only
PrivateTmp=yes
PrivateDevices=yes
SecureBits=keep-caps
AmbientCapabilities=CAP_IPC_LOCK
CapabilityBoundingSet=CAP_SYSLOG CAP_IPC_LOCK
NoNewPrivileges=yes
ExecStart=/usr/bin/vault server -config=/etc/vault.d/vault.hcl
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
Restart=on-failure
RestartSec=5
TimeoutStopSec=30
StartLimitInterval=60
StartLimitBurst=3
LimitNOFILE=65536
LimitMEMLOCK=infinity
[Install]
WantedBy=multi-user.target

View File

@ -0,0 +1,66 @@
---
- name: Initialize Vault Cluster
hosts: ch4 # initialize on one node only
become: yes
tasks:
- name: Check if Vault is already initialized
uri:
url: "http://{{ ansible_host }}:8200/v1/sys/health"
method: GET
status_code: [200, 429, 472, 473, 501, 503]
register: vault_health
- name: Initialize Vault (only if not initialized)
uri:
url: "http://{{ ansible_host }}:8200/v1/sys/init"
method: POST
body_format: json
body:
secret_shares: 5
secret_threshold: 3
status_code: 200
register: vault_init_result
when: not vault_health.json.initialized
- name: Save initialization results to local file
copy:
content: |
# Vault Cluster Initialization Results
Generated on: {{ ansible_date_time.iso8601 }}
Initialized by: {{ inventory_hostname }}
## Root Token
{{ vault_init_result.json.root_token }}
## Unseal Keys
{% for key in vault_init_result.json['keys'] %}
Key {{ loop.index }}: {{ key }}
{% endfor %}
## Base64 Unseal Keys
{% for key in vault_init_result.json.keys_base64 %}
Key {{ loop.index }} (base64): {{ key }}
{% endfor %}
## Important Notes
- Store these keys securely and separately
- You need 3 out of 5 keys to unseal Vault
- Root token provides full access to Vault
- Consider revoking root token after initial setup
dest: /tmp/vault-init-results.txt
delegate_to: localhost
when: vault_init_result is defined and vault_init_result.json is defined
- name: Display initialization results
debug:
msg: |
Vault initialized successfully!
Root Token: {{ vault_init_result.json.root_token }}
Unseal Keys: {{ vault_init_result.json['keys'] }}
when: vault_init_result is defined and vault_init_result.json is defined
- name: Display already initialized message
debug:
msg: "Vault is already initialized on {{ inventory_hostname }}"
when: vault_health.json.initialized
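Initialization leaves Vault sealed; three of the five keys from the results file unseal each node. A minimal sketch, run against one node at a time:

```bash
export VAULT_ADDR=http://ch4.tailnet-68f9.ts.net:8200

# Repeat with three different unseal keys from /tmp/vault-init-results.txt
vault operator unseal <unseal-key-1>
vault operator unseal <unseal-key-2>
vault operator unseal <unseal-key-3>

# "Sealed: false" confirms the node is ready
vault status
```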

View File

@ -0,0 +1,85 @@
---
- name: Deploy Vault Cluster with Consul Integration
hosts: ch4,ash3c,warden
become: yes
vars:
vault_version: "1.15.2"
vault_datacenter: "dc1"
vault_cluster_name: "vault-cluster"
tasks:
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
- name: Add HashiCorp GPG key (if not exists)
shell: |
if [ ! -f /etc/apt/sources.list.d/hashicorp.list ]; then
curl -fsSL https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
fi
args:
creates: /etc/apt/sources.list.d/hashicorp.list
- name: Install Vault
apt:
name: vault
state: present
update_cache: yes
allow_downgrade: yes
- name: Create vault user and directories
block:
- name: Create vault data directory
file:
path: /opt/vault/data
state: directory
owner: vault
group: vault
mode: '0755'
- name: Create vault config directory
file:
path: /etc/vault.d
state: directory
owner: vault
group: vault
mode: '0755'
- name: Generate Vault configuration
template:
src: vault.hcl.j2
dest: /etc/vault.d/vault.hcl
owner: vault
group: vault
mode: '0640'
notify: restart vault
- name: Create Vault systemd service
template:
src: vault.service.j2
dest: /etc/systemd/system/vault.service
owner: root
group: root
mode: '0644'
notify:
- reload systemd
- restart vault
- name: Enable and start Vault service
systemd:
name: vault
enabled: yes
state: started
daemon_reload: yes
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart vault
systemd:
name: vault
state: restarted

View File

@ -0,0 +1,67 @@
---
- name: Verify Vault Cluster Status
hosts: ch4,ash3c,warden
become: yes
tasks:
- name: Check Vault service status
systemd:
name: vault
register: vault_service_status
- name: Display Vault service status
debug:
msg: "Vault service on {{ inventory_hostname }}: {{ vault_service_status.status.ActiveState }}"
- name: Check Vault process
shell: ps aux | grep vault | grep -v grep
register: vault_process
ignore_errors: yes
- name: Display Vault process
debug:
msg: "Vault process on {{ inventory_hostname }}: {{ vault_process.stdout_lines }}"
- name: Check Vault port 8200
wait_for:
port: 8200
host: "{{ ansible_default_ipv4.address }}"
timeout: 10
register: vault_port_check
ignore_errors: yes
- name: Display port check result
debug:
msg: "Vault port 8200 on {{ inventory_hostname }}: {{ 'OPEN' if vault_port_check.failed == false else 'CLOSED' }}"
- name: Get Vault status
uri:
url: "http://{{ ansible_default_ipv4.address }}:8200/v1/sys/health"
method: GET
status_code: [200, 429, 472, 473, 501, 503]
register: vault_health
ignore_errors: yes
- name: Display Vault health status
debug:
msg: "Vault health on {{ inventory_hostname }}: {{ vault_health.json if vault_health.json is defined else 'Connection failed' }}"
- name: Check Consul integration
uri:
url: "http://127.0.0.1:8500/v1/kv/vault/?recurse"
method: GET
register: consul_vault_kv
ignore_errors: yes
- name: Display Consul Vault KV
debug:
msg: "Consul Vault KV on {{ inventory_hostname }}: {{ 'Found vault keys' if consul_vault_kv.status == 200 else 'No vault keys found' }}"
- name: Check Vault logs for errors
shell: journalctl -u vault --no-pager -n 10 | grep -i error || echo "No errors found"
register: vault_logs
ignore_errors: yes
- name: Display Vault error logs
debug:
msg: "Vault errors on {{ inventory_hostname }}: {{ vault_logs.stdout_lines }}"

View File

@ -38,6 +38,12 @@ terraform {
source = "hashicorp/vault"
version = "~> 4.0"
}
# Cloudflare Provider
cloudflare = {
source = "cloudflare/cloudflare"
version = "~> 3.0"
}
}
#
@ -53,10 +59,17 @@ provider "consul" {
datacenter = "dc1"
}
# Vault provider configuration
provider "vault" {
address = var.vault_config.address
token = var.vault_token
# Fetch the Cloudflare configuration from Consul
data "consul_keys" "cloudflare_config" {
key {
name = "token"
path = "config/dev/cloudflare/token"
}
}
# Cloudflare provider configuration
provider "cloudflare" {
api_token = data.consul_keys.cloudflare_config.var.token
}
# Fetch the Oracle Cloud configuration from Consul
@ -185,8 +198,28 @@ module "nomad_cluster" {
depends_on = [module.oracle_cloud]
}
# Nomad cluster output
output "nomad_cluster" {
description = "Nomad 多数据中心集群信息"
value = module.nomad_cluster
# Cloudflare zones lookup
data "cloudflare_zones" "available" {
filter {
status = "active"
}
}
data "cloudflare_accounts" "available" {}
# Cloudflare connectivity test output
output "cloudflare_connectivity_test" {
description = "Cloudflare API 连通性测试结果"
value = {
zones_count = length(data.cloudflare_zones.available.zones)
accounts_count = length(data.cloudflare_accounts.available.accounts)
zones = [for zone in data.cloudflare_zones.available.zones : {
name = zone.name
id = zone.id
}]
accounts = [for account in data.cloudflare_accounts.available.accounts : {
name = account.name
id = account.id
}]
}
}

View File

@ -17,7 +17,7 @@ output "cluster_overview" {
name = "dc2"
location = "Korea (KR)"
provider = "oracle"
node = "master"
node = "ch4"
ip = try(oci_core_instance.nomad_kr_node[0].public_ip, "pending")
status = "deployed"
} : null

View File

@ -0,0 +1,305 @@
# Vault and Consul Integration Best Practices
## 1. Architecture Design
### 1.1 High-Availability Architecture
- **Vault cluster**: 3 nodes (1 leader + 2 followers)
- **Consul cluster**: 3 nodes (1 leader + 2 followers)
- **Network**: Tailscale secure network
- **Storage**: Consul as Vault's storage backend
### 1.2 Node Layout
```
Vault nodes:
- ch4.tailnet-68f9.ts.net:8200 (Leader)
- ash3c.tailnet-68f9.ts.net:8200 (Follower)
- warden.tailnet-68f9.ts.net:8200 (Follower)
Consul nodes:
- ch4.tailnet-68f9.ts.net:8500 (Leader)
- ash3c.tailnet-68f9.ts.net:8500 (Follower)
- warden.tailnet-68f9.ts.net:8500 (Follower)
```
## 2. Vault Configuration Best Practices
### 2.1 Storage Backend
```hcl
storage "consul" {
address = "127.0.0.1:8500"
path = "vault/"
# High availability
datacenter = "dc1"
service = "vault"
service_tags = "vault-server"
# Session settings
session_ttl = "15s"
lock_wait_time = "15s"
# Consistency
consistency_mode = "strong"
# Failover settings
max_parallel = 128
disable_registration = false
}
```
### 2.2 Listener
The cluster port is set through `cluster_address` inside the TCP listener rather than a second listener block:
```hcl
listener "tcp" {
address = "0.0.0.0:8200"
cluster_address = "0.0.0.0:8201"
# Enable TLS in production
tls_cert_file = "/opt/vault/tls/vault.crt"
tls_key_file = "/opt/vault/tls/vault.key"
tls_min_version = "tls12"
}
```
### 2.3 Cluster Settings
```hcl
# API address - over the Tailscale network
api_addr = "https://{{ ansible_host }}:8200"
# Cluster address - over the Tailscale network
cluster_addr = "https://{{ ansible_host }}:8201"
# Cluster name
cluster_name = "vault-cluster"
# Keep mlock enabled in production
disable_mlock = false
# Logging
log_level = "INFO"
log_format = "json"
```
## 3. Consul Configuration Best Practices
### 3.1 Service Registration
```hcl
services {
name = "vault"
tags = ["vault-server", "secrets"]
port = 8200
check {
name = "vault-health"
http = "http://127.0.0.1:8200/v1/sys/health"
interval = "10s"
timeout = "3s"
}
}
```
### 3.2 ACLs
```hcl
acl {
enabled = true
default_policy = "deny"
enable_token_persistence = true
# Token used by the Vault service
tokens {
default = "{{ vault_consul_token }}"
}
}
```
## 4. Security Best Practices
### 4.1 TLS
- TLS for all Vault node-to-node traffic
- TLS for Consul node-to-node traffic
- TLS for client-to-Vault traffic
### 4.2 Authentication
Auth methods are enabled at runtime through the CLI/API rather than in the server configuration file:
```bash
# AppRole authentication
vault auth enable approle
# LDAP authentication
vault auth enable ldap
vault write auth/ldap/config \
url="ldap://authentik.tailnet-68f9.ts.net:389" \
userdn="ou=users,dc=authentik,dc=local" \
groupdn="ou=groups,dc=authentik,dc=local"
# OIDC authentication (client ID/secret are placeholders)
vault auth enable oidc
vault write auth/oidc/config \
oidc_discovery_url="https://authentik1.git-4ta.live/application/o/vault/" \
oidc_client_id="<client-id>" \
oidc_client_secret="<client-secret>"
```
## 5. Monitoring and Auditing
### 5.1 Audit Logs
Audit devices are likewise enabled at runtime:
```bash
# File audit device
vault audit enable file file_path=/opt/vault/logs/audit.log format=json
# Syslog audit device
vault audit enable syslog facility=AUTH tag=vault
```
### 5.2 Telemetry
```hcl
telemetry {
prometheus_retention_time = "30s"
disable_hostname = false
}
```
## 6. Backup and Recovery
### 6.1 Automated Backup Script
Note: `vault operator raft snapshot` applies only when Vault runs on integrated (Raft) storage; with the Consul backend used here, back up through Consul snapshots instead (see 6.2).
```bash
#!/bin/bash
# /opt/vault/scripts/backup.sh
VAULT_ADDR="https://vault.git-4ta.live"
VAULT_TOKEN="$(cat /opt/vault/token)"
# Create a snapshot (integrated-storage deployments only)
vault operator raft snapshot save /opt/vault/backups/vault-$(date +%Y%m%d-%H%M%S).snapshot
# Prune old backups (keep 7 days)
find /opt/vault/backups -name "vault-*.snapshot" -mtime +7 -delete
```
### 6.2 Consul Snapshots
```bash
#!/bin/bash
# /opt/consul/scripts/backup.sh
CONSUL_ADDR="http://127.0.0.1:8500"
# Create a Consul snapshot
consul snapshot save /opt/consul/backups/consul-$(date +%Y%m%d-%H%M%S).snapshot
```
## 7. Failover and Disaster Recovery
### 7.1 Automatic Failover
- Vault fails over automatically via its HA leader lock in Consul
- Consul elects a new leader automatically via Raft
- Clients reconnect to the new leader automatically
### 7.2 Disaster Recovery Procedure
1. Stop all Vault nodes
2. Restore the data from Consul (see the sketch below)
3. Start the Vault cluster
4. Verify service health
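A minimal sketch of steps 1-4, assuming a snapshot produced by the 6.2 backup script (the snapshot filename below is hypothetical):

```bash
# Step 1: stop Vault on every node first
sudo systemctl stop vault

# Step 2: restore the Consul KV data that backs Vault
consul snapshot restore /opt/consul/backups/consul-20251004-074411.snapshot

# Steps 3-4: bring the cluster back and confirm its state
sudo systemctl start vault
vault status
```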
## 8. Performance Tuning
### 8.1 Cache
The server-side read cache is sized with the top-level `cache_size` setting (number of entries in the in-memory LRU):
```hcl
cache_size = "32000"
```
### 8.2 Connection Pool
```hcl
storage "consul" {
# Connection pool settings
max_parallel = 128
}
```
## 9. Deployment Checklist
### 9.1 Before Deployment
- [ ] Consul cluster healthy
- [ ] Network connectivity tested
- [ ] TLS certificates in place
- [ ] Firewall rules configured
- [ ] Disk space checked
### 9.2 After Deployment
- [ ] Vault cluster status verified
- [ ] Service registration verified
- [ ] Authentication tested
- [ ] Backups tested
- [ ] Monitoring metrics verified
## 10. Common Issues and Solutions
### 10.1 Common Issues
1. **Vault cannot reach Consul**
- Check network connectivity
- Verify the Consul service is healthy
- Check ACL permissions
2. **Cluster split-brain**
- Check for network partitions
- Verify Raft log consistency
- Run the disaster recovery procedure
3. **Performance problems**
- Tune the connection pool size
- Enable caching
- Optimize the network configuration
### 10.2 Troubleshooting Commands
```bash
# Check Vault status
vault status
# List Consul members
consul members
# List registered services
consul catalog services
# Tail Vault logs
journalctl -u vault -f
# Tail Consul logs
journalctl -u consul -f
```

BIN
gh.tar.gz Normal file

Binary file not shown.

View File

@ -0,0 +1,42 @@
# Cloudflare connectivity test
# Exercises the Cloudflare API using the token stored in Consul
# Fetch the Cloudflare configuration from Consul
data "consul_keys" "cloudflare_config" {
key {
name = "token"
path = "config/dev/cloudflare/token"
}
}
# Cloudflare Provider
provider "cloudflare" {
api_token = data.consul_keys.cloudflare_config.var.token
}
# Cloudflare API - list active zones
data "cloudflare_zones" "available" {
filter {
status = "active"
}
}
# Cloudflare API - list accounts
data "cloudflare_accounts" "available" {}
# Cloudflare connectivity test output
output "cloudflare_connectivity_test" {
description = "Cloudflare API 连通性测试结果"
value = {
zones_count = length(data.cloudflare_zones.available.zones)
accounts_count = length(data.cloudflare_accounts.available.accounts)
zones = [for zone in data.cloudflare_zones.available.zones : {
name = zone.name
id = zone.id
}]
accounts = [for account in data.cloudflare_accounts.available.accounts : {
name = account.name
id = account.id
}]
}
}

View File

@ -0,0 +1,66 @@
# Oracle Cloud compute instances - Korea region
# ch4 (ARM)
resource "oci_core_instance" "ch4" {
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config.var.tenancy_ocid
availability_domain = "CSRd:AP-CHUNCHEON-1-AD-1"
shape = "VM.Standard.A1.Flex"
display_name = "ch4"
shape_config {
ocpus = 4
memory_in_gbs = 24
}
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}
# ch2
resource "oci_core_instance" "ch2" {
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config.var.tenancy_ocid
availability_domain = "CSRd:AP-CHUNCHEON-1-AD-1"
shape = "VM.Standard.E2.1.Micro"
display_name = "ch2"
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}
# ch3
resource "oci_core_instance" "ch3" {
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config.var.tenancy_ocid
availability_domain = "CSRd:AP-CHUNCHEON-1-AD-1"
shape = "VM.Standard.E2.1.Micro"
display_name = "ch3"
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}

View File

@ -0,0 +1,4 @@
# Availability domain lookup (Korea tenancy connectivity test)
data "oci_identity_availability_domains" "kr_test" {
compartment_id = data.consul_keys.oracle_config.var.tenancy_ocid
}

View File

@ -44,6 +44,12 @@ terraform {
source = "digitalocean/digitalocean"
version = "~> 2.0"
}
# Cloudflare Provider
cloudflare = {
source = "cloudflare/cloudflare"
version = "~> 4.0"
}
}
#
@ -65,64 +71,7 @@ provider "vault" {
token = var.vault_token
}
# Fetch the Oracle Cloud configuration from Consul
data "consul_keys" "oracle_config" {
key {
name = "tenancy_ocid"
path = "config/dev/oracle/kr/tenancy_ocid"
}
key {
name = "user_ocid"
path = "config/dev/oracle/kr/user_ocid"
}
key {
name = "fingerprint"
path = "config/dev/oracle/kr/fingerprint"
}
key {
name = "private_key"
path = "config/dev/oracle/kr/private_key"
}
}
# Fetch the Oracle Cloud US-region configuration from Consul
data "consul_keys" "oracle_config_us" {
key {
name = "tenancy_ocid"
path = "config/dev/oracle/us/tenancy_ocid"
}
key {
name = "user_ocid"
path = "config/dev/oracle/us/user_ocid"
}
key {
name = "fingerprint"
path = "config/dev/oracle/us/fingerprint"
}
key {
name = "private_key"
path = "config/dev/oracle/us/private_key"
}
}
# OCI provider using the configuration fetched from Consul
provider "oci" {
tenancy_ocid = data.consul_keys.oracle_config.var.tenancy_ocid
user_ocid = data.consul_keys.oracle_config.var.user_ocid
fingerprint = data.consul_keys.oracle_config.var.fingerprint
private_key = data.consul_keys.oracle_config.var.private_key
region = "ap-chuncheon-1"
}
# OCI Provider
provider "oci" {
alias = "us"
tenancy_ocid = data.consul_keys.oracle_config_us.var.tenancy_ocid
user_ocid = data.consul_keys.oracle_config_us.var.user_ocid
fingerprint = data.consul_keys.oracle_config_us.var.fingerprint
private_key = data.consul_keys.oracle_config_us.var.private_key
region = "us-ashburn-1"
}
# Oracle Cloud provider configuration moved to oracle.tf
# Oracle Cloud module disabled - VCN quota limit issue
# module "oracle_cloud" {

View File

@ -0,0 +1,61 @@
# Oracle Cloud Infrastructure
# Provider credentials are stored in Consul KV
# Fetch the Oracle Cloud (Korea) configuration from Consul
data "consul_keys" "oracle_config" {
key {
name = "tenancy_ocid"
path = "config/dev/oracle/kr/tenancy_ocid"
}
key {
name = "user_ocid"
path = "config/dev/oracle/kr/user_ocid"
}
key {
name = "fingerprint"
path = "config/dev/oracle/kr/fingerprint"
}
key {
name = "private_key"
path = "config/dev/oracle/kr/private_key"
}
}
# Fetch the Oracle Cloud (US) configuration from Consul
data "consul_keys" "oracle_config_us" {
key {
name = "tenancy_ocid"
path = "config/dev/oracle/us/tenancy_ocid"
}
key {
name = "user_ocid"
path = "config/dev/oracle/us/user_ocid"
}
key {
name = "fingerprint"
path = "config/dev/oracle/us/fingerprint"
}
key {
name = "private_key"
path = "config/dev/oracle/us/private_key"
}
}
# OCI provider - Korea region
provider "oci" {
tenancy_ocid = data.consul_keys.oracle_config.var.tenancy_ocid
user_ocid = data.consul_keys.oracle_config.var.user_ocid
fingerprint = data.consul_keys.oracle_config.var.fingerprint
private_key = data.consul_keys.oracle_config.var.private_key
region = "ap-chuncheon-1"
}
# OCI provider - US region (aliased)
provider "oci" {
alias = "us"
tenancy_ocid = data.consul_keys.oracle_config_us.var.tenancy_ocid
user_ocid = data.consul_keys.oracle_config_us.var.user_ocid
fingerprint = data.consul_keys.oracle_config_us.var.fingerprint
private_key = data.consul_keys.oracle_config_us.var.private_key
region = "us-ashburn-1"
}

View File

@ -0,0 +1,72 @@
# Oracle Cloud compute instances - US region
# ash1d
resource "oci_core_instance" "ash1d" {
provider = oci.us
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
availability_domain = "TZXJ:US-ASHBURN-AD-1"
shape = "VM.Standard.E2.1.Micro"
display_name = "ash1d"
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}
# ash2e
resource "oci_core_instance" "ash2e" {
provider = oci.us
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
availability_domain = "TZXJ:US-ASHBURN-AD-1"
shape = "VM.Standard.E2.1.Micro"
display_name = "ash2e"
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}
# ash3c
resource "oci_core_instance" "ash3c" {
provider = oci.us
# Compartment resolved from the Consul-stored tenancy OCID
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
availability_domain = "TZXJ:US-ASHBURN-AD-1"
shape = "VM.Standard.A1.Flex"
display_name = "ash3c"
shape_config {
ocpus = 4
memory_in_gbs = 24
}
# Protect against accidental destroy/recreate
lifecycle {
prevent_destroy = true
ignore_changes = [
source_details,
metadata,
create_vnic_details,
time_created
]
}
}

View File

@ -0,0 +1,5 @@
# Availability domain lookup (US tenancy connectivity test)
data "oci_identity_availability_domains" "us_test" {
provider = oci.us
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
}

View File

@ -17,7 +17,7 @@ output "cluster_overview" {
name = "dc2"
location = "Korea (KR)"
provider = "oracle"
node = "master"
node = "ch4"
ip = try(module.oracle_korea_node[0].public_ip, "pending")
status = "deployed"
} : null

75
nomad-de-correct.hcl Normal file
View File

@ -0,0 +1,75 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "de"
bind_addr = "0.0.0.0"
addresses {
http = "de.tailnet-68f9.ts.net"
rpc = "de.tailnet-68f9.ts.net"
serf = "de.tailnet-68f9.ts.net"
}
advertise {
http = "de.tailnet-68f9.ts.net:4646"
rpc = "de.tailnet-68f9.ts.net:4647"
serf = "de.tailnet-68f9.ts.net:4648"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = [
"semaphore.tailnet-68f9.ts.net:4648",
"ash1d.tailnet-68f9.ts.net:4648",
"ash2e.tailnet-68f9.ts.net:4648",
"ch2.tailnet-68f9.ts.net:4648",
"ch3.tailnet-68f9.ts.net:4648",
"onecloud1.tailnet-68f9.ts.net:4648",
"de.tailnet-68f9.ts.net:4648",
"hcp1.tailnet-68f9.ts.net:4648"
]
}
}
client {
enabled = true
servers = [
"ch3.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"hcp1.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647",
"semaphore.tailnet-68f9.ts.net:4647"
]
network_interface = "tailscale0"
cgroup_parent = ""
}
consul {
address = "ch4.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = false
client_auto_join = true
}
telemetry {
collection_interval = "1s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}

Some files were not shown because too many files have changed in this diff Show More